'enable_nestloop' is off, but hawq still adopts a nested loop, which means
HashJoinPath is not considered for some reason in the first pass. A usable
hash clause does exist ('lineitem.l_orderkey = orders.o_orderkey'), so it may
be a cost issue or a pathkey issue in the old planner. Note that
enable_nestloop = off does not remove nest-loop paths; the planner only adds
disable_cost to them, so a nest loop can still win when no hash-join path is
generated at all. We need to debug JoinPathStrategy::hashInnerAndOuter;
the main logic is below:
```
// If we found any usable hashclauses, make paths from them.
if (IsValidNodeList(hashClauses)) {
  // We consider both the cheapest-total-cost and cheapest-startup-cost
  // outer paths. There's no need to consider any but the
  // cheapest-total-cost inner path, however.
  Path *cheapestStartupOuter = outerRel->cheapestStartupPath;
  Path *cheapestTotalOuter = outerRel->cheapestTotalPath;
  Path *cheapestTotalInner = innerRel->cheapestTotalPath;

  if (cheapestStartupOuter)
    tryHashJoinPath(root, joinRel, joinType, sjInfo, cheapestStartupOuter,
                    cheapestTotalInner, restrictList, hashClauses.get(),
                    mergeClauseList);

  // Skip the cheapest-total outer if it is the same path we just tried.
  if (cheapestTotalOuter != cheapestStartupOuter)
    tryHashJoinPath(root, joinRel, joinType, sjInfo, cheapestTotalOuter,
                    cheapestTotalInner, restrictList, hashClauses.get(),
                    mergeClauseList);
}
```
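If it turns out the hash path *is* generated but then discarded, the next suspect is the add_path-style dominance pruning applied after tryHashJoinPath. Below is a minimal, self-contained sketch of that pruning rule (the names Path, dominates, and addPath are hypothetical, not HAWQ source): a candidate is kept only if no already-kept path is at least as good on startup cost, total cost, and ordering, so a mis-costed hash-join path can be rejected immediately.

```cpp
#include <algorithm>
#include <vector>

// Simplified model of a planner path: costs plus an ordering tag.
struct Path {
    double startup_cost;
    double total_cost;
    int    pathkeys;   // 0 = unordered; nonzero = some useful sort order
};

// 'a' makes 'b' redundant if it is no worse on both costs and provides
// at least as useful an ordering (simplified here to "same ordering, or
// 'b' has none").
bool dominates(const Path& a, const Path& b) {
    return a.startup_cost <= b.startup_cost
        && a.total_cost   <= b.total_cost
        && (a.pathkeys == b.pathkeys || b.pathkeys == 0);
}

// Keep 'candidate' only if nothing in 'kept' dominates it; also drop
// any kept paths the candidate now dominates. Returns whether the
// candidate survived.
bool addPath(std::vector<Path>& kept, const Path& candidate) {
    for (const Path& p : kept)
        if (dominates(p, candidate))
            return false;  // candidate rejected on arrival
    kept.erase(std::remove_if(kept.begin(), kept.end(),
                              [&](const Path& p) {
                                  return dominates(candidate, p);
                              }),
               kept.end());
    kept.push_back(candidate);
    return true;
}
```

Setting a breakpoint where the real equivalent of addPath rejects a candidate would show whether the hash-join path is never built or built and then pruned on cost.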
2017-01-17 9:36 GMT+08:00 Guo Kai <[email protected]>:
> I use the default optimizer, not orca. The resource manager setting is
> 'none', not YARN.
>
> After analyzing all the tables, the problem persists.
>
> 2017-01-16 21:50 GMT+08:00 Ruilong Huo <[email protected]>:
>
>> Which optimizer do you use in each of them? Are the tables analyzed
>> before you run the query?
>>
>> Best regards,
>> Ruilong Huo
>>
>> On Mon, Jan 16, 2017 at 9:32 PM, Guo Kai <[email protected]> wrote:
>>
>>>
>>>> Hi, All.
>>>>
>>>> I ran TPC-H on a hawq cluster. The cluster consists of 1 master node
>>>> and 3 segment nodes, matching the layout of the HDFS cluster. Meanwhile, I
>>>> built a gpdb cluster on the same nodes.
>>>>
>>>> Each node (actually each VM) has the same configuration -- 4GB memory, 1
>>>> vcpu, and shared host networking. I used dbgen to generate 3GB of data, then
>>>> loaded it into hawq and gpdb.
>>>>
>>>> I ran Query 4 of TPC-H. gpdb finished it in a few seconds, but hawq got
>>>> stuck for tens of minutes or more.
>>>>
>>>> Query 4 is as follows:
>>>>
>>>> select
>>>>     o_orderpriority,
>>>>     count(*) as order_count
>>>> from
>>>>     orders
>>>> where
>>>>     o_orderdate >= date '1996-03-01'
>>>>     and o_orderdate < date '1996-03-01' + interval '3 month'
>>>>     and exists (
>>>>         select
>>>>             *
>>>>         from
>>>>             lineitem
>>>>         where
>>>>             l_orderkey = o_orderkey
>>>>             and l_commitdate < l_receiptdate
>>>>     )
>>>> group by
>>>>     o_orderpriority
>>>> order by
>>>>     o_orderpriority
>>>> LIMIT 1;
>>>>
>>>> Moreover, I retrieved both query plans. I found that hawq produced a plan
>>>> with a huge cost while gpdb's was small. The obvious difference between them
>>>> is that hawq adopted a nested loop join while gpdb adopted a hash join. I
>>>> checked that the parameter 'enable_nestloop' is off by default.
>>>>
>>>> The gpdb plan is as follows:
>>>>
>>>>                                 QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Limit  (cost=407633.93..407633.96 rows=1 width=72)
>>>>    ->  Gather Motion 12:1  (slice3; segments: 12)  (cost=407633.93..407633.96 rows=1 width=72)
>>>>          Merge Key: partial_aggregation.o_orderpriority
>>>>          ->  Limit  (cost=407633.93..407633.94 rows=1 width=72)
>>>>                ->  Sort  (cost=407633.93..407633.94 rows=1 width=72)
>>>>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>>>>                      ->  HashAggregate  (cost=407633.91..407633.92 rows=1 width=72)
>>>>                            Group By: orders.o_orderpriority
>>>>                            ->  Redistribute Motion 12:12  (slice2; segments: 12)  (cost=407633.88..407633.90 rows=1 width=72)
>>>>                                  Hash Key: orders.o_orderpriority
>>>>                                  ->  HashAggregate  (cost=407633.88..407633.88 rows=1 width=72)
>>>>                                        Group By: orders.o_orderpriority
>>>>                                        ->  HashAggregate  (cost=404041.00..406436.25 rows=19961 width=74)
>>>>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>>>>                                              ->  Hash Join  (cost=89634.53..402843.37 rows=19961 width=26)
>>>>                                                    Hash Cond: lineitem.l_orderkey = orders.o_orderkey
>>>>                                                    ->  Append-only Columnar Scan on lineitem  (cost=0.00..295217.60 rows=499906 width=4)
>>>>                                                          Filter: l_commitdate < l_receiptdate
>>>>                                                    ->  Hash  (cost=87388.56..87388.56 rows=14974 width=34)
>>>>                                                          ->  Redistribute Motion 12:12  (slice1; segments: 12)  (cost=0.00..87388.56 rows=14974 width=34)
>>>>                                                                Hash Key: orders.o_orderkey
>>>>                                                                ->  Append-only Columnar Scan on orders  (cost=0.00..83795.00 rows=14974 width=34)
>>>>                                                                      Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>>>> (23 rows)
>>>>
>>>> The hawq plan is as follows:
>>>>
>>>>                                 QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Limit  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>>>>    ->  Gather Motion 18:1  (slice4; segments: 18)  (cost=424555364732.86..424555364732.88 rows=1 width=72)
>>>>          Merge Key: partial_aggregation.o_orderpriority
>>>>          ->  Limit  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>>>>                ->  Sort  (cost=424555364732.86..424555364732.86 rows=1 width=72)
>>>>                      Sort Key (Limit): partial_aggregation.o_orderpriority
>>>>                      ->  HashAggregate  (cost=424555364732.84..424555364732.85 rows=1 width=72)
>>>>                            Group By: orders.o_orderpriority
>>>>                            ->  Redistribute Motion 18:18  (slice3; segments: 18)  (cost=424555364732.80..424555364732.82 rows=1 width=72)
>>>>                                  Hash Key: orders.o_orderpriority
>>>>                                  ->  HashAggregate  (cost=424555364732.80..424555364732.80 rows=1 width=72)
>>>>                                        Group By: orders.o_orderpriority
>>>>                                        ->  HashAggregate  (cost=424555361238.53..424555363568.04 rows=12942 width=74)
>>>>                                              Group By: orders.ctid::bigint, orders.gp_segment_id
>>>>                                              ->  Result  (cost=122726.35..424555360073.77 rows=12942 width=26)
>>>>                                                    ->  Redistribute Motion 18:18  (slice2; segments: 18)  (cost=122726.35..424555360073.77 rows=12942 width=26)
>>>>                                                          Hash Key: orders.ctid
>>>>                                                          ->  Nested Loop  (cost=122726.35..424555355414.73 rows=12942 width=26)
>>>>                                                                Join Filter: lineitem.l_orderkey = orders.o_orderkey
>>>>                                                                ->  Parquet table Scan on lineitem  (cost=0.00..318758.60 rows=333271 width=4)
>>>>                                                                      Filter: l_commitdate < l_receiptdate
>>>>                                                                ->  Materialize  (cost=122726.35..154180.79 rows=174747 width=34)
>>>>                                                                      ->  Broadcast Motion 18:18  (slice1; segments: 18)  (cost=0.00..119580.91 rows=174747 width=34)
>>>>                                                                            ->  Parquet table Scan on orders  (cost=0.00..86379.00 rows=9709 width=34)
>>>>                                                                                  Filter: o_orderdate >= '1996-03-01'::date AND o_orderdate < '1996-06-01 00:00:00'::timestamp without time zone
>>>>  Settings: default_hash_table_bucket_number=18
>>>> (26 rows)
>>>>
>>>> Could anyone suggest an explanation for why this happens? Thank you!
>>>>
>>>>
>>
>
>
> --
> Guo Kai,
> [email protected]
>