Hello, I noticed that ExecChooseHashTableSize() in nodeHash.c fails on Assert(nbuckets > 0) when an extremely large number of rows is expected.
BACKTRACE:
#0 0x0000003f79432625 in raise () from /lib64/libc.so.6
#1 0x0000003f79433e05 in abort () from /lib64/libc.so.6
#2 0x000000000092600a in ExceptionalCondition (conditionName=0xac1ea0
"!(nbuckets > 0)",
errorType=0xac1d88 "FailedAssertion", fileName=0xac1d40 "nodeHash.c",
lineNumber=545) at assert.c:54
#3 0x00000000006851ff in ExecChooseHashTableSize (ntuples=60521928028,
tupwidth=8, useskew=1 '\001',
numbuckets=0x7fff146bff04, numbatches=0x7fff146bff00,
num_skew_mcvs=0x7fff146bfefc) at nodeHash.c:545
#4 0x0000000000701735 in initial_cost_hashjoin (root=0x253a318,
workspace=0x7fff146bffc0, jointype=JOIN_SEMI,
hashclauses=0x257e4f0, outer_path=0x2569a40, inner_path=0x2569908,
sjinfo=0x2566f40, semifactors=0x7fff146c0168)
at costsize.c:2592
#5 0x000000000070e02a in try_hashjoin_path (root=0x253a318, joinrel=0x257d940,
outer_path=0x2569a40, inner_path=0x2569908,
hashclauses=0x257e4f0, jointype=JOIN_SEMI, extra=0x7fff146c0150) at
joinpath.c:543
See the following EXPLAIN output, taken from a build configured without --enable-cassert.
The planner expects 60.5B rows for the self join of a relation with 72M rows.
(This estimation is probably far too high.)
[kaigai@ayu ~]$ (echo EXPLAIN; cat ~/tpcds/query95.sql) | psql tpcds100
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9168667273.07..9168667273.08 rows=1 width=20)
CTE ws_wh
-> Custom Scan (GpuJoin) (cost=3342534.49..654642911.88 rows=60521928028
width=24)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys: (ws_order_number), JoinQual:
((ws_warehouse_sk <> ws_warehouse_sk) AND (ws_order_number = ws_order_number)),
nrows (ratio: 84056.77%)
-> Custom Scan (BulkScan) on web_sales ws1_1
(cost=0.00..3290612.48 rows=72001248 width=16)
-> Seq Scan on web_sales ws2 (cost=0.00..3290612.48 rows=72001248
width=16)
-> Sort (cost=8514024361.19..8514024361.20 rows=1 width=20)
Sort Key: (count(DISTINCT ws1.ws_order_number))
:
This crash was triggered by Assert(nbuckets > 0), and nbuckets is calculated
as follows.
    /*
     * If there's not enough space to store the projected number of tuples and
     * the required bucket headers, we will need multiple batches.
     */
    if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
    {
        /* We'll need multiple batches */
        long        lbuckets;
        double      dbatch;
        int         minbatch;
        long        bucket_size;

        /*
         * Estimate the number of buckets we'll want to have when work_mem is
         * entirely full.  Each bucket will contain a bucket pointer plus
         * NTUP_PER_BUCKET tuples, whose projected size already includes
         * overhead for the hash code, pointer to the next tuple, etc.
         */
        bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple));
        lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
        lbuckets = Min(lbuckets, max_pointers);
        nbuckets = (int) lbuckets;
        bucket_bytes = nbuckets * sizeof(HashJoinTuple);
        :
        :
    }

    Assert(nbuckets > 0);
    Assert(nbatch > 0);
In my case, hash_table_bytes was 101017630802 and bucket_size was 48, so
my_log2(hash_table_bytes / bucket_size) = 31. lbuckets then becomes negative,
because both the constant "1" and the result of my_log2() are int32, so the
shift "1 << 31" overflows. Min(lbuckets, max_pointers) therefore picks
0x80000000, which is assigned to nbuckets and triggers the Assert().
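Just to illustrate the arithmetic outside the planner: the following toy
program is not PostgreSQL code, it only mimics my_log2() and assumes 32-bit
int and 64-bit long, as on my machine. It reproduces the wrap-around and
shows that doing the shift with a long constant stays positive.

#include <stdio.h>

/* rough stand-in for PostgreSQL's my_log2(): smallest p with 2^p >= num */
static int
fake_log2(long num)
{
    int     p = 0;

    while ((1L << p) < num)
        p++;
    return p;
}

int
main(void)
{
    long    hash_table_bytes = 101017630802L;   /* values from this report */
    long    bucket_size = 48;
    int     p = fake_log2(hash_table_bytes / bucket_size);
    long    lbuckets;

    /* "1" and p are both int, so the shift is evaluated in 32 bits and wraps */
    lbuckets = 1 << p;
    printf("p = %d, int shift:  %ld\n", p, lbuckets);   /* p = 31, -2147483648 */

    /* doing the shift with a long constant keeps the result positive */
    lbuckets = 1L << p;
    printf("p = %d, long shift: %ld\n", p, lbuckets);   /* 2147483648 */

    return 0;
}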
The attached patch fixes the problem.
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <[email protected]>
pgsql-fix-hash-nbuckets.patch
