date:20250402

Re: partition table optimizer join cost misestimation

2025-04-02 Thread James Pang

Andrei,
   Yes, from explain output, since optimizer already get the merge_append
cost but not take account into total cost, that make a big difference.  I
shared table DDLs and explain analyze,buffers output , I think  the data
maybe generated by other way to reproduce this issue. sorry for not sharing
the commercial production data here.

Andrei Lepikhov  於 2025年4月2日週三 下午7:03寫道：

> On 4/2/25 12:18, James Pang wrote:
> > Hi,
> > Postgresq v14.8, we found optimizer doest not take "merge append"
> > cost into sql plan total cost and then make a bad sql plan. attached
> > please find details.
> I suppose there is a different type of issue.
> MegeJoin sometimes doesn't need to scan the whole inner or outer side
> (see the MergeScanSelCache structure and how it is used in the cost
> estimation routine).
>
> So, the cost can be less because the optimizer predicted that only a
> small part of the Append will be scanned and used some sort of
> interpolation between startup cost and total cost.
>
> But to be sure, could you send the results of EXPLAIN ANALYZE VERBOSE?
> If you also send the data to reproduce the case, we may find the source
> of the problem more precisely.
>
> --
> regards, Andrei Lepikhov
>
\d+ tablea
Partitioned table "mct.tablea"
  Column   |Type | Collation | Nullable |  
Default  | Storage  | Compression | Stats target | Description
---+-+---+--+---+--+-+--+-
 mmm_id | bigint  |   | not null |  
 | plain| |  |
 clu_id| bigint  |   |  |   
| plain| |  |
 _id | bigint  |   |  | 
  | plain| |  |
 zzz_id   | bigint  |   |  |
   | plain| |  |
 ssr_id | bigint  |   |  |  
 | plain| |  |
 ssr_type   | bigint  |   |  |  
 | plain| |  |
 mmm_stat| integer |   |  |   | 
plain| |  |
 time  | timestamp without time zone |   | not null |   
| plain| |  |
 protocol  | integer |   |  |   
| plain| |  |
 uuun  | character varying(20)   |   |  |   
| extended | |  |
 lastmodifiedtime  | timestamp without time zone |   |  | 
sysdate() | plain| |  |
 mmm_sss_id | bigint  |   |  |   | 
plain| |  |
 remote_addr | character varying(20)   |   |  | 
  | extended | |  |
Partition key: RANGE ("time")
Indexes:
"tablea_pkey" PRIMARY KEY, btree (mmm_id, "time")
"tablea_idx1" btree (zzz_id, mmm_sss_id, mmm_stat)
"tablea_idx2" btree (mmm_sss_id)
Publications:
"_24503"
Triggers:
x AFTER DELETE ON tablea FOR EACH ROW WHEN (USER !~~ 'SPLEX%'::text) 
EXECUTE FUNCTION trigger_fct_xx()
tr_tablea_update BEFORE UPDATE ON tablea FOR EACH ROW WHEN (USER !~~ 
'SPLEX%'::text) EXECUTE FUNCTION trigger_fct_tr_x
tus_update()
Partitions: tablea_p1970 FOR VALUES FROM ('1970-01-01 00:00:00') TO 
('2023-09-07 00:00:00'),
tablea_p20250319 FOR VALUES FROM ('2025-03-19 00:00:00') TO 
('2025-03-20 00:00:00'),
tablea_p20250320 FOR VALUES FROM ('2025-03-20 00:00:00') TO 
('2025-03-21 00:00:00'),
tablea_p20250321 FOR VALUES FROM ('2025-03-21 00:00:00') TO 
('2025-03-22 00:00:00'),
tablea_p20250322 FOR VALUES FROM ('2025-03-22 00:00:00') TO 
('2025-03-23 00:00:00'),
tablea_p20250323 FOR VALUES FROM ('2025-03-23 00:00:00') TO 
('2025-03-24 00:00:00'),
tablea_p20250324 FOR VALUES FROM ('2025-03-24 00:00:00') TO 
('2025-03-25 00:00:00'),
tablea_p20250325 FOR VALUES FROM ('2025-03-25 00:00:00') TO 
('2025-03-26 00:00:00'),
tablea_p20250326 FOR VALUES FROM ('2025-03-26 00:00:00') TO 
('2025-03-27 00:00:00'),
tablea_p20250327 FOR VALUES FROM ('2025-03-27 00:00:00') TO 
('2025-03-28 00:00:00'),
tablea_p20250328 FOR VALUES FROM ('2025-03-28 00:00:00') TO 
('2025-03-29 00:00:00'),
tablea_p20250329 FOR VALUES FROM ('2025-03-29 00:00:00') TO 
('2025-03-30 00:00:00'),
tablea_p20250330 FOR VALUES FROM ('2025-03-30 00:00:00') TO 
('2025-03-31 00:00:00'),
tablea_p20250331 FOR V

Re: partition table optimizer join cost misestimation

2025-04-02 Thread Andrei Lepikhov


On 4/3/25 02:46, James Pang wrote:

Andrei,
    Yes, from explain output, since optimizer already get the 
merge_append cost but not take account into total cost, that make a big 
difference.  I shared table DDLs and explain analyze,buffers output , I 
think  the data maybe generated by other way to reproduce this issue. 
sorry for not sharing the commercial production data here.
I think it is almost enough to identify the issue. Let me ask you the 
following questions:

1. Can you provide min and max values in columns cc.sss_id and aa.mmm_id?
2. How often do you analyze your *parent* tables tablea and tablec? 
Remember, if you want to build statistics on a partitioned table (not a 
partition), you have to explicitly call


ANALYZE tablea,tablec;

mentioning these tables in the analyze list.

3. May you provide a dump of pg_statistic on attributes cc.sss_id and 
aa.mmm_id?
4. Is there a possibility of changing a single code line and rebuilding 
your DB instance to check a conjecture?


--
regards, Andrei Lepikhov

Re: partition table optimizer join cost misestimation

2025-04-02 Thread Andrei Lepikhov


On 4/2/25 12:18, James Pang wrote:

Hi,
    Postgresq v14.8, we found optimizer doest not take "merge append" 
cost into sql plan total cost and then make a bad sql plan. attached 
please find details.

I suppose there is a different type of issue.
MegeJoin sometimes doesn't need to scan the whole inner or outer side 
(see the MergeScanSelCache structure and how it is used in the cost 
estimation routine).


So, the cost can be less because the optimizer predicted that only a 
small part of the Append will be scanned and used some sort of 
interpolation between startup cost and total cost.


But to be sure, could you send the results of EXPLAIN ANALYZE VERBOSE? 
If you also send the data to reproduce the case, we may find the source 
of the problem more precisely.


--
regards, Andrei Lepikhov

partition table optimizer join cost misestimation

2025-04-02 Thread James Pang

Hi,
   Postgresq v14.8, we found optimizer doest not take "merge append" cost
into sql plan total cost and then make a bad sql plan. attached please find
details.

  Query: masking table and column names :

   select 
   from tablea aa
inner join tableb bb on aa.ind1 = bb.ind1
inner join tablec cc  on aa.ind2 = cc.ind2
   where aa.ind3 in ($1)
   order by cc.id3;

optimizer, it choose merge join with full index scan with a lot of rows and
huge IO needed,
looks like optimizer does not take into account the "Merge Append" huge
cost into the whole SQL plan cost,
so it thought only Sort and Merge join cost and make it to choose mergejoin
instead of nestloop.

--bad plan, it took hundreds of seconds to complete.
Sort  (cost=9624.33..9677.60 rows=21306 width=981)
   Sort Key: cc.id3
   ->  Nested Loop  (cost=99.78..2717.44 rows=21306 width=981)
 ->  Merge Join  (cost=99.35..2009.19 rows=21306 width=915)
   Merge Cond: (cc.ind2 = aa.id2)
   ->  Merge Append  (cost=8.15..17046177.32 rows=98194074
width=903)<<< merge append huge cost but looks like optimizer
does not take this into total plan cost yet.

--good plan, off mergejoin to make it chose nestloop and it took only 20
milliseconds to complete.

 Sort  (cost=19618.71..19671.98 rows=21306 width=981)
   Sort Key: cc.ind2
   ->  Nested Loop  (cost=0.72..12711.82 rows=21306 width=981)
 ->  Nested Loop  (cost=0.29..12003.57 rows=21306 width=915)
   ->  Append  (cost=0.29..42.46 rows=30 width=28)

Thanks,

James

Query: masking table and column names :

   select 
   from tablea aa 
inner join tableb bb on aa.ind1 = bb.ind1 
inner join tablec cc  on aa.ind2 = cc.ind2
   where aa.ind3 in ($1)
   order by cc.id3;

optimizer, it choose merge join with full index scan with a lot of rows and 
huge IO needed,
looks like optimizer does not take into account the "Merge Append" huge cost , 
so it thought only 
Sort and Merge join cost. that make 

--bad plan, it took hundreds of seconds to complete.
Sort  (cost=9624.33..9677.60 rows=21306 width=981)
   Sort Key: cc.id3
   ->  Nested Loop  (cost=99.78..2717.44 rows=21306 width=981)
 ->  Merge Join  (cost=99.35..2009.19 rows=21306 width=915)
   Merge Cond: (cc.ind2 = aa.id2)
   ->  Merge Append  (cost=8.15..17046177.32 rows=98194074 
width=903)<<< merge append huge cost but looks like optimizer does 
not take this into total plan cost yet.

--good plan, off mergejoin to make it chose nestloop and it took only 20 
milliseconds to complete. 

 Sort  (cost=19618.71..19671.98 rows=21306 width=981)
   Sort Key: cc.ind2
   ->  Nested Loop  (cost=0.72..12711.82 rows=21306 width=981)
 ->  Nested Loop  (cost=0.29..12003.57 rows=21306 width=915)
   ->  Append  (cost=0.29..42.46 rows=30 width=28)
   

   

explain execute slowsql9(123456789);

QUERY PLAN

-
-
 Sort  (cost=9624.33..9677.60 rows=21306 width=981)
   Sort Key: cc.id3
   ->  Nested Loop  (cost=99.78..2717.44 rows=21306 width=981)
 ->  Merge Join  (cost=99.35..2009.19 rows=21306 width=915)
   Merge Cond: (cc.ind2 = aa.id2)
   ->  Merge Append  (cost=8.15..17046177.32 rows=98194074 
width=903)<<< merge append huge cost but looks like optimizer does 
not take this into total plan cost yet.
 Sort Key: cc.ind2
 ->  Index Scan using tablec_p1970_ind2_idx on tablec_p1970 
cc_1  (cost=0.12..2.34 rows=1 width=532)
 ->  Index Scan using tablec_p20250318_ind2_idx on 
tablec_p20250318 cc_2  (cost=0.29..24653.56 rows=158631 width=861)
 ->  Index Scan using tablec_p20250319_ind2_idx on 
tablec_p20250319 cc_3  (cost=0.29..25078.61 rows=157041 width=861)
 ->  Index Scan using tablec_p20250320_ind2_idx on 
tablec_p20250320 cc_4  (cost=0.29..26336.20 rows=162959 width=868)
 ->  Index Scan using tablec_p20250321_ind2_idx on 
tablec_p20250321 cc_5  (cost=0.29..25405.45 rows=158288 width=866)
 ->  Index Scan using tablec_p20250322_ind2_idx on 
tablec_p20250322 cc_6  (cost=0.42..26346.47 rows=185312 width=837)
 ->  Index Scan using tablec_p20250323_ind2_idx on 
tablec_p20250323 cc_7  (cost=0.42..60430.87 rows=370927 width=918)
 ->  Index Scan using tablec_p20250324_ind2_idx on 
tablec_p20250324 cc_8  (cost=0.29..24798.45 rows=153782 width=862)
 ->  Index Scan using tablec_p20250325_ind2_idx on 
tablec_p20250325 cc_9  (cost=0.29..23069.31 rows=144129 width=854)
 ->  Index Scan using

Re: partition table optimizer join cost misestimation

Re: partition table optimizer join cost misestimation

Re: partition table optimizer join cost misestimation

partition table optimizer join cost misestimation

4 matches

Site Navigation

Mail list logo

Footer information