Postgresql equal join on function with columns not use index

James Pang (chaolpan) Mon, 12 Jun 2023 01:10:08 -0700

Hi,
       We migrate from Oracle to Postgresql14.8, one SQL has regression in 
Postgres  run in 5800 milliseconds in Postgresql v14.8,  but the same SQL got 
done in several hundred milliseconds in Oracle database.
       With multiple table JOINs, if the join condition is  
tablea.column1=tableb.column1, optimizer will use the index to filter data in 
nest loops, but if  tablea.column1=regexp_replace(tableb.column1....),
Optimizer will not be able to use the index on tablea.column1, then it do a 
table scan and nestloop to produce a lot rows then use 
tablea.column1=regexp_replace(tableb.column1....) as a filter.  As a workaround 
we create a view then use tablea.column1=view.column1 that works.
     Is it expected ?    details as below.


           SELECT DISTINCT a.xxx, b.xxx as TOLLFREE FROM tableA a, tableB b
                        WHERE a.column1 = regexp_replace(b.column1,'[^0-9]','') 
AND b.column2 = $1 AND b.column3= $2
                                AND NOT EXISTS (SELECT 1 FROM tableC c
                                WHERE c.xxid = b.xxid
                                AND c.xxtype = case when b.col4 = 1 then 
'TollFree' else 'Toll' end
                                AND c.xxid = $3)

     Unique  (cost=423374.60..423377.87 rows=436 width=21) (actual 
time=6070.963..6071.054 rows=395 loops=1)
   Buffers: shared hit=148
   ->  Sort  (cost=423374.60..423375.69 rows=436 width=21) (actual 
time=6070.963..6070.992 rows=397 loops=1)
         Sort Key: a.xx, b.xx
         Sort Method: quicksort  Memory: 56kB
         Buffers: shared hit=148
         ->  Nested Loop  (cost=0.69..423355.48 rows=436 width=21) (actual 
time=120.338..6070.669 rows=397 loops=1)
               Join Filter: ((a.column1)::text = 
regexp_replace((b.column1)::text, '[^0-9]'::text, ''::text))                    
                             <<<optimizer only do filter after nest loops with 
a lot of rows
               Rows Removed by Join Filter: 1511155
               Buffers: shared hit=145
               ->  Seq Scan on tableA a  (cost=0.00..161.12 rows=7712 width=25) 
(actual time=0.022..1.380 rows=7712 loops=1)
                     Buffers: shared hit=84
               ->  Materialize  (cost=0.69..153.12 rows=207 width=21) (actual 
time=0.000..0.011 rows=196 loops=7712)
                     Buffers: shared hit=58
                     ->  Nested Loop Anti Join  (cost=0.69..152.09 rows=207 
width=21) (actual time=0.069..0.278 rows=196 loops=1)
                           Join Filter: ((c.xxid = b.xxid) AND ((c.xxxx)::text 
= CASE WHEN (b.column2 = 1) THEN 'aaa'::text ELSE 'bbb'::t
ext END))
                           Buffers: shared hit=58
                           ->  Index Scan using idx_xxx on tableB b  
(cost=0.42..146.55 rows=207 width=29) (actual time=0.047..0.207 rows=196 
loops=1)
                                 Index Cond: ((colum3 = 40957) AND (column2 = 
1))
                                 Buffers: shared hit=56
                           ->  Materialize  (cost=0.27..1.40 rows=1 width=15) 
(actual time=0.000..0.000 rows=0 loops=196)
                                 Buffers: shared hit=2
                                 ->  Index Only Scan using pk_xxxx on tableC c  
(cost=0.27..1.39 rows=1 width=15
) (actual time=0.020..0.020 rows=0 loops=1)
                                       Index Cond: (xxxid = 12407262)
                                       Heap Fetches: 0
                                       Buffers: shared hit=2

If we create a view ,the SQL got done in several million seconds,
CREATE VIEW tableBREGXP as (select 
xx,column2,column3,xxid,regexp_replace(column1,'[^0-9]','') as column1 from 
tableB);
           SELECT DISTINCT a.xxx, b.xxx as TOLLFREE FROM tableA a, tableBREGXP 
b                                <<< replace the tableB with view name.
                        WHERE a.column1 = b.column1 AND b.column2 = $1 AND 
b.column3= $2                   <<< use b.column1 to replace 
regexp_replace((b.column1)::text, '[^0-9]'::text, ''::text))
                                AND NOT EXISTS (SELECT 1 FROM tableC c
                                WHERE c.xxid = b.xxid
                                AND c.xxtype = case when b.col4 = 1 then 
'TollFree' else 'Toll' end
                                AND c.xxid = $3)

HashAggregate  (cost=408.19..412.76 rows=457 width=21) (actual 
time=4.524..4.644 rows=395 loops=1)
   Group Key: a.xxx, b.xx
   Batches: 1  Memory Usage: 61kB
   Buffers: shared hit=693
   ->  Nested Loop  (cost=0.97..405.90 rows=457 width=21) (actual 
time=0.154..4.205 rows=397 loops=1)
         Buffers: shared hit=693
         ->  Nested Loop Anti Join  (cost=0.69..214.97 rows=217 width=40) 
(actual time=0.137..2.877 rows=196 loops=1)
               Join Filter: ((c.xxyid = b.xxid) AND ((c.xxxxx)::text = CASE 
WHEN (b.column2 = 1) THEN 'To
llFree'::text ELSE 'Toll'::text END))
               Buffers: shared hit=55
               ->  Index Scan using idx_xxx on b  (cost=0.42..207.06 rows=217 
width=64) (actual time=0.123..2.725 rows=196 loops=1)
                     Index Cond: ((column2 = 40957) AND (column3 = 1))
                     Buffers: shared hit=53
               ->  Materialize  (cost=0.27..1.40 rows=1 width=15) (actual 
time=0.000..0.000 rows=0 loops=196)
                     Buffers: shared hit=2
                     ->  Index Only Scan using pk_xxxx on tableC c  
(cost=0.27..1.39 rows=1 width=15) (actual time=0.010..0.010 rows=0 loops=1)
                           Index Cond: (siteid = 12407262)
                           Heap Fetches: 0
                           Buffers: shared hit=2
         ->  Index Scan using idx_xxx on tableA a  (cost=0.28..0.86 rows=2 
width=25) (actual time=0.004..0.005 rows=2 loops=196)
               Index Cond: ((xxxx)::text = 
(regexp_replace((b.phonenumber)::text, '[^0-9]'::text, ''::text)))              
                                              <<< it use the index to filter a 
lot of rows here,
               Buffers: shared hit=638
Planning Time: 0.619 ms
Execution Time: 4.762 ms


Thanks,

James

Postgresql equal join on function with columns not use index

Reply via email to