Which optimizer are you using in HAWQ and GPDB respectively? Were the tables analyzed before you ran the query?
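
If it helps, both can be checked directly. A minimal sketch, assuming the standard Greenplum-style GUC and catalog names are present in your HAWQ and GPDB builds:

    -- Which optimizer is in use?  In GPDB/HAWQ, optimizer=on selects ORCA
    -- (the new query optimizer); optimizer=off selects the legacy planner.
    SHOW optimizer;

    -- Have the tables been analyzed?  reltuples/relpages usually stay at
    -- their default of 0 until ANALYZE (or VACUUM) has run on the table.
    SELECT relname, reltuples, relpages
    FROM   pg_class
    WHERE  relname IN ('orders', 'lineitem');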
Best regards,
Ruilong Huo

On Mon, Jan 16, 2017 at 9:32 PM, Guo Kai <[email protected]> wrote:

> Hi, All.
>
> I ran TPC-H on a HAWQ cluster. The cluster consists of 1 master node and
> 3 segment nodes, the same layout as the HDFS cluster. I also built a GPDB
> cluster on the same nodes.
>
> Each node (each VM, to be exact) has the same configuration: 4 GB memory,
> 1 vCPU, and a shared host network. I used dbgen to generate 3 GB of data,
> then loaded it into both HAWQ and GPDB.
>
> I ran Query 4 of TPC-H. GPDB finished it in a few seconds, but HAWQ was
> stuck for tens of minutes or more.
>
> Query 4 is as follows:
>
> select
>     o_orderpriority,
>     count(*) as order_count
> from
>     orders
> where
>     o_orderdate >= date '1996-03-01'
>     and o_orderdate < date '1996-03-01' + interval '3 month'
>     and exists (
>         select
>             *
>         from
>             lineitem
>         where
>             l_orderkey = o_orderkey
>             and l_commitdate < l_receiptdate
>     )
> group by
>     o_orderpriority
> order by
>     o_orderpriority
> LIMIT 1;
>
> Moreover, I retrieved both query plans. I found that HAWQ produced a plan
> with a huge cost while GPDB produced a cheap one. The obvious difference
> between them is that HAWQ chose a nested loop join while GPDB chose a hash
> join, even though I checked that the parameter 'enable_nestloop' is off by
> default.
>
> The GPDB plan is as follows:
>
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=407633.93..407633.96 rows=1 width=72)
>    ->  Gather Motion 12:1  (slice3; segments: 12)  (cost=407633.93..407633.96 rows=1 width=72)
>          Merge Key: partial_aggregation.o_orderpriority
>          ->  Limit  (cost=407633.93..407633.94 rows=1 width=72)
>                ->  Sort  (cost=407633.93..407633.94 rows=1 width=72)
>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>                      ->  HashAggregate  (cost=407633.91..407633.92 rows=1 width=72)
>                            Group By: orders.o_orderpriority
>                            ->  Redistribute Motion 12:12  (slice2; segments: 12)  (cost=407633.88..407633.90 rows=1 width=72)
>                                  Hash Key: orders.o_orderpriority
>                                  ->  HashAggregate  (cost=407633.88..407633.88 rows=1 width=72)
>                                        Group By: orders.o_orderpriority
>                                        ->  HashAggregate  (cost=404041.00..406436.25 rows=19961 width=74)
>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>                                              ->  Hash Join  (cost=89634.53..402843.37 rows=19961 width=26)
>                                                    Hash Cond: lineitem.l_orderkey = orders.o_orderkey
>                                                    ->  Append-only Columnar Scan on lineitem  (cost=0.00..295217.60 rows=499906 width=4)
>                                                          Filter: l_commitdate < l_receiptdate
>                                                    ->  Hash  (cost=87388.56..87388.56 rows=14974 width=34)
>                                                          ->  Redistribute Motion 12:12  (slice1; segments: 12)  (cost=0.00..87388.56 rows=14974 width=34)
>                                                                Hash Key: orders.o_orderkey
>                                                                ->  Append-only Columnar Scan on orders  (cost=0.00..83795.00 rows=14974 width=34)
>                                                                      Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
> (23 rows)
>
> The HAWQ plan is as follows:
>
>                                  QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>    ->  Gather Motion 18:1  (slice4; segments: 18)  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>          Merge Key: partial_aggregation.o_orderpriority
>          ->  Limit  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>                ->  Sort  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>                      ->  HashAggregate  (cost=424555364732.84..424555364732.85 rows=1 width=72)
>                            Group By: orders.o_orderpriority
>                            ->  Redistribute Motion 18:18  (slice3; segments: 18)  (cost=424555364732.80..424555364732.82 rows=1 width=72)
>                                  Hash Key: orders.o_orderpriority
>                                  ->  HashAggregate  (cost=424555364732.80..424555364732.80 rows=1 width=72)
>                                        Group By: orders.o_orderpriority
>                                        ->  HashAggregate  (cost=424555361238.53..424555363568.04 rows=12942 width=74)
>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>                                              ->  Result  (cost=122726.35..424555360073.77 rows=12942 width=26)
>                                                    ->  Redistribute Motion 18:18  (slice2; segments: 18)  (cost=122726.35..424555360073.77 rows=12942 width=26)
>                                                          Hash Key: orders.ctid
>                                                          ->  Nested Loop  (cost=122726.35..424555355414.73 rows=12942 width=26)
>                                                                Join Filter: lineitem.l_orderkey = orders.o_orderkey
>                                                                ->  Parquet table Scan on lineitem  (cost=0.00..318758.60 rows=333271 width=4)
>                                                                      Filter: l_commitdate < l_receiptdate
>                                                                ->  Materialize  (cost=122726.35..154180.79 rows=174747 width=34)
>                                                                      ->  Broadcast Motion 18:18  (slice1; segments: 18)  (cost=0.00..119580.91 rows=174747 width=34)
>                                                                            ->  Parquet table Scan on orders  (cost=0.00..86379.00 rows=9709 width=34)
>                                                                                  Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>  Settings:  default_hash_table_bucket_number=18
> (26 rows)
>
> Please suggest explanations for why this happens. Thank you!
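
One quick experiment while you check the above: collect statistics and re-plan the query, to see whether HAWQ switches to a hash join. A minimal sketch, assuming the stock PostgreSQL/Greenplum commands behave the same way in HAWQ:

    -- Give the optimizer real row counts to work with.
    ANALYZE orders;
    ANALYZE lineitem;

    -- Re-plan Query 4; with statistics in place the join should be costed
    -- realistically instead of at the ~4.2e11 seen in the plan above.
    EXPLAIN
    select o_orderpriority, count(*) as order_count
    from orders
    where o_orderdate >= date '1996-03-01'
      and o_orderdate < date '1996-03-01' + interval '3 month'
      and exists (
          select * from lineitem
          where l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate)
    group by o_orderpriority
    order by o_orderpriority
    LIMIT 1;

Also note that in PostgreSQL-derived planners, enable_nestloop=off does not forbid nested loops; it only inflates their cost, so one can still be chosen when the planner believes there is no usable alternative. That would be consistent with the enormous cost estimate in your HAWQ plan.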
