Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

2024-02-29 Thread Alvaro Herrera
On 2024-Mar-01, James Pang wrote:

> one question:
>  do we need to increase all SLRU buffers together (MULTIXACT, XACT,
> Subtrans, COMMIT TS), for example doubling all of them from their
> existing sizes?

No need.

> Or only increase Subtrans, or Subtrans and MultiXact?

Just increase the sizes for the ones that are causing you pain.  You can
have a look at pg_stat_slru for some metrics that might be useful in
determining which those are.
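
For example, a query along these lines (just one way to read that view) shows
which SLRU caches are doing the most physical reads, i.e. missing their
buffers most often:

  SELECT name, blks_hit, blks_read,
         round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
  FROM pg_stat_slru
  ORDER BY blks_read DESC;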

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: FW: huge SubtransSLRU and SubtransBuffer wait_event

2024-02-29 Thread James Pang
>   Possible to increase  Subtrans SLRU buffer size ?

Not at present -- you need to recompile after changing NUM_SUBTRANS_BUFFERS
in src/include/access/subtrans.h, NUM_MULTIXACTOFFSET_BUFFERS and
NUM_MULTIXACTMEMBER_BUFFERS in src/include/access/multixact.h.

one question:
 do we need to increase all SLRU buffers together (MULTIXACT, XACT,
Subtrans, COMMIT TS), for example doubling all of them from their
existing sizes?   Or only increase Subtrans, or Subtrans and MultiXact?

Thanks,

James

James Pang (chaolpan) wrote on Friday, March 1, 2024 at 2:45 PM:

>
>
> -Original Message-
> From: Alvaro Herrera 
> Sent: Friday, February 2, 2024 4:13 PM
> To: James Pang (chaolpan) 
> Cc: Laurenz Albe ;
> pgsql-performance@lists.postgresql.org
> Subject: Re: huge SubtransSLRU and SubtransBuffer wait_event
>
> On 2024-Feb-02, James Pang (chaolpan) wrote:
>
> >   Possible to increase  Subtrans SLRU buffer size ?
>
> Not at present -- you need to recompile after changing
> NUM_SUBTRANS_BUFFERS in src/include/access/subtrans.h,
> NUM_MULTIXACTOFFSET_BUFFERS and NUM_MULTIXACTMEMBER_BUFFERS in
> src/include/access/multixact.h.
>
> There's pending work to let these be configurable in version 17.
>
> >   Our case is: 1) we use a PL/pgSQL procedure1 --> procedure2 (update
> >   table; commit);  2) an application JDBC client calls procedure1
> >   (it's a long-running job; sometimes it can last > 1 hour).
> >   During this time window, other PostgreSQL JDBC clients (100-200)
> >   come in at the same time, and then we quickly see MultiXactOffset and
> >   SubtransSLRU wait events increase very quickly.
> >   PL/pgSQL proc1 --> procedure2 (updates the table) uses a subtransaction
> >   in procedure2, right?
>
> If your functions/procedures use EXCEPTION clauses, that would create
> subtransactions also.
>
> --
> Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
> "It never ceases to be humbling for a person of wit to know that there is
> no fool who cannot teach him something."  (Jean B. Say)
>
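
For reference, a minimal sketch of that pattern (procedure and table names
here are invented): a PL/pgSQL block with an EXCEPTION clause opens a
subtransaction every time it is entered, even when no error is raised, so
many concurrent sessions running it can put pressure on the subtrans SLRU:

  CREATE OR REPLACE PROCEDURE procedure2(p_id bigint)
  LANGUAGE plpgsql AS $$
  BEGIN
      BEGIN  -- inner block with an EXCEPTION clause => one subtransaction per entry
          UPDATE some_table SET updated_at = now() WHERE id = p_id;
      EXCEPTION WHEN others THEN
          RAISE NOTICE 'update of % failed: %', p_id, SQLERRM;
      END;
      COMMIT;  -- allowed here because the outer block has no exception handler
  END;
  $$;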


Re: Optimizing count(), but Explain estimates wildly off

2024-02-29 Thread Chema
>
> > Reading around, seems many people are still using this 2005 snippet to
> obtain the
> > row count estimate from Explain:
>
> I recommend using FORMAT JSON and extracting the top row count from that.
> It is
> simpler and less error-prone.
>
Good tip, thanks Laurenz!
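
In case it's useful to others, a rough sketch of the FORMAT JSON variant
(the function name and the final cast are my own choices, not a canonical
snippet):

  CREATE OR REPLACE FUNCTION row_estimate(query text)
  RETURNS bigint
  LANGUAGE plpgsql AS $$
  DECLARE
      plan jsonb;
  BEGIN
      EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
      RETURN (plan -> 0 -> 'Plan' ->> 'Plan Rows')::bigint;
  END;
  $$;

  -- SELECT row_estimate('SELECT * FROM pricescope_tenders WHERE country = ''Colombia''');

Of course it only returns the planner's top-level row estimate, so it is only
as accurate as the statistics discussed below.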

>
> > Is this still the current best practice?  Any tips to increase precision?
> > Currently it can estimate the actual number of rows for over or under a
> million,
> > as seen on the sample queries (1,955,297 instead of 1,001,200; 162,080
> instead
> > of 1,292,010).
>
> Looking at the samples you provided, I get the impression that the
> statistics for
> the table are quite outdated.  That will affect the estimates.  Try
> running ANALYZE
> and see if that improves the estimates.
>
>
No major changes after running ANALYZE, and also VACUUM ANALYZE.  Seems
something is seriously off.  I pimped my config thanks to Alvaro's prompting:
set default_statistics_target = 500 (suggested for warehouse DBs) but raised
pertinent columns from 2,000 to 5,000 (will play with disabling JIT or raising
its costs later):

shared_buffers = 2GB            # ~0.25 * RAM, dedicated cache, hard allocation (requires restart)
effective_cache_size = 6GB      # 0.5-0.75 * RAM (free -h: free + cache + shared_buffers)
work_mem = 128MB                # RAM * 0.25 / max_connections
maintenance_work_mem = 512MB
default_statistics_target = 500 # def 100; higher to make the planner use indexes in big warehouse tables
random_page_cost = 1.1          # random reads on SSD cost almost as little as sequential ones

Analyzed again (1.5M samples instead of 600k):
"tenders": scanned 216632 of 216632 pages, containing 3815567 live rows and 0 dead rows; 1500000 rows in sample, 3815567 estimated total rows
"items": scanned 995023 of 995023 pages, containing 7865043 live rows and 0 dead rows; 1500000 rows in sample, 7865043 estimated total rows

but same deal:

-- After config pimp 1,959,657 instead of 1,001,200 45,341.654 ms

Gather  (cost=247031.70..1479393.82 rows=1959657 width=824) (actual time=8464.691..45257.435 rows=1001200 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Hash Join  (cost=246031.70..1282428.12 rows=816524 width=824) (actual time=8413.057..44614.153 rows=333733 loops=3)
        Hash Cond: (pricescope_items.tender_transaction_id = pricescope_tenders.transaction_id)
        ->  Parallel Seq Scan on pricescope_items  (cost=0.00..1027794.01 rows=3277101 width=522) (actual time=0.753..41654.507 rows=2621681 loops=3)
        ->  Parallel Hash  (cost=241080.20..241080.20 rows=396120 width=302) (actual time=995.247..995.250 rows=333733 loops=3)
              Buckets: 1048576  Batches: 1  Memory Usage: 219904kB
              ->  Parallel Bitmap Heap Scan on pricescope_tenders  (cost=17516.10..241080.20 rows=396120 width=302) (actual time=162.898..321.472 rows=333733 loops=3)
                    Recheck Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
                    Heap Blocks: exact=34722
                    ->  Bitmap Index Scan on pricescope_tenders_country_and_date_index  (cost=0.00..17278.43 rows=950688 width=0) (actual time=186.536..186.537 rows=1001200 loops=1)
                          Index Cond: ((country = 'Colombia'::text) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
Planning Time: 11.310 ms
JIT:
  Functions: 33
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 8.608 ms, Inlining 213.375 ms, Optimization 557.351 ms, Emission 417.568 ms, Total 1196.902 ms
Execution Time: 45341.654 ms


BUT if I force the planner to ignore 'country' statistics:

-- Subselect country to hide the constant from the planner, so it doesn't use
-- statistics

Explain Analyze
Select * from pricescope_tenders inner join pricescope_items on
    transaction_id = tender_transaction_id
where country = (select 'Colombia')
  and "date" >= '2023-01-01' and "date" < '2024-01-01';

Then I get the same plan as when I filter for Mexico, with a similar run
time:

-- Colombia in subselect 428,623 instead of 1,001,200 6674.860 ms

Gather  (cost=1001.00..570980.73 rows=428623 width=824) (actual time=166.785..6600.673 rows=1001200 loops=1)
  Workers Planned: 2
  Params Evaluated: $0
  Workers Launched: 2
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=166.031..166.033 rows=1 loops=1)
  ->  Nested Loop  (cost=0.99..527118.42 rows=178593 width=824) (actual time=200.511..5921.585 rows=333733 loops=3)
        ->  Parallel Index Scan using pricescope_tenders_country_and_date_index on pricescope_tenders  (cost=0.43..104391.64 rows=86641 width=302) (actual time=200.388..400.882 rows=333733 loops=3)
              Index Cond: ((country = $0) AND (date >= '2023-01-01'::date) AND (date < '2024-01-01'::date))
        ->  Index Scan using pricescope_items_tender_transaction_id_index on pricescope_items  (cost=0.56..4.83 rows=5 width=522) (actual time=0.016..0.016 rows=1 loops=1001200)
              Index Cond: (tender_transaction_id = pricescope_tenders.transaction_id)
Planning Time: 7.372 ms
JIT:
  Functions: 31


Table Partitioning and Indexes Performance Questions

2024-02-29 Thread David Kelly
I was told that partitioned table indexes must always start with the
partition key columns.

Is this always the case or does it depend on use case? When would you want
to create indexes in this way?

The documentation just mentions that it is not strictly necessary but can be
helpful. My understanding is that partitions behave like normal tables: each
gets its own index. So I'd expect the reasoning behind creating an index on a
partition to be the same as if it were just a normal table (containing the
same subset of data as that individual partition). Is this a correct
understanding?

Any other performance considerations when it comes to partitioned table
indexing? Specifically, partitioning by range where the range is a single
value.


Re: generic plan generate poor performance

2024-02-29 Thread Pavel Stehule
Hi

On Thu, 29 Feb 2024 at 15:28, James Pang 
wrote:

> Hi,
>    we created extended statistics (dependencies, ndistinct) on (cccid, sssid).
> With real bind variables it makes a good Hash Join plan, but when it switches
> to a generic plan it automatically converts to a Nested Loop and SQL
> performance becomes very poor.  Why does the generic plan change to a poor
> "nestloop" plan, and how can we fix that?
>

please send the result of EXPLAIN ANALYZE, and try to run VACUUM ANALYZE first

probably the estimates there are not good
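
Another thing that may be worth testing (assuming PostgreSQL 12 or later): you
can keep the plan cache from switching to a generic plan for these prepared
statements, so every execution is planned with the real parameter values.
This works around the bad generic-plan estimate rather than fixing it:

  SET plan_cache_mode = force_custom_plan;          -- per session
  -- or make it stick for the application role (hypothetical role name):
  -- ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;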



>
>   explain execute j2eemtgatdlistsql16(27115336789879,15818676);
>QUERY PLAN
>
> 
>  Hash Left Join  (cost=11513.05..25541.17 rows=773 width=)
>Hash Cond: ((a.sssid = b.sssid) AND (a.cccid = b.cccid) AND (a.uuid =
> b.uuid))
>->  Index Scan using idx_mtgccclist_conf_j2 on mtgccclistj2 a
>  (cost=0.43..14010.19 rows=773 width=1059)
>  Index Cond: ((cccfid = '27115336789879'::bigint) AND (sssid =
> '15818676'::bigint))
>  Filter: (jstatus = ANY ('{3,7,11,2,6,10}'::bigint[]))
>->  Hash  (cost=11330.73..11330.73 rows=10393 width=51)
>  ->  Index Scan using idx_mtgccclstext_cccsssid_j2 on
> mtglistextj2 b  (cost=0.43..11330.73 rows=10393 width=51)
>Index Cond: ((cccid = '27115336789879'::bigint) AND (siteid
> = '15818676'::bigint))
> (8 rows)
>
>  explain execute j2eemtgatdlistsql16(27115336789879,15818676);
> QUERY PLAN
>
> ---
>  Nested Loop Left Join  (cost=0.87..289.53 rows=14 width=)
>->  Index Scan using idx_mtgccclist_conf_j2 on mtgccclistj2 a
>  (cost=0.43..251.94 rows=14 width=1059)
>  Index Cond: ((cccid = $1) AND (sssid = $2))
>  Filter: (jstatus = ANY ('{3,7,11,2,6,10}'::bigint[]))
>->  Index Scan using idx_mtgccclstext_cccsssid_j2 on mtglistextj2 b
>  (cost=0.43..2.66 rows=1 width=51)
>  Index Cond: ((cccid = a.cccid) AND (cccid = $1) AND (sssid =
> a.sssid) AND (sssid = $2))
>  Filter: (a.uuid = uuid)
> (7 rows)
>
> Thanks,
>
>
Regards

Pavel


> James
>


generic plan generate poor performance

2024-02-29 Thread James Pang
Hi,
   we created extended statistics (dependencies, ndistinct) on (cccid, sssid).
With real bind variables it makes a good Hash Join plan, but when it switches
to a generic plan it automatically converts to a Nested Loop and SQL
performance becomes very poor.  Why does the generic plan change to a poor
"nestloop" plan, and how can we fix that?

  explain execute j2eemtgatdlistsql16(27115336789879,15818676);
                                        QUERY PLAN
---------------------------------------------------------------------------------------------
 Hash Left Join  (cost=11513.05..25541.17 rows=773 width=)
   Hash Cond: ((a.sssid = b.sssid) AND (a.cccid = b.cccid) AND (a.uuid = b.uuid))
   ->  Index Scan using idx_mtgccclist_conf_j2 on mtgccclistj2 a  (cost=0.43..14010.19 rows=773 width=1059)
         Index Cond: ((cccfid = '27115336789879'::bigint) AND (sssid = '15818676'::bigint))
         Filter: (jstatus = ANY ('{3,7,11,2,6,10}'::bigint[]))
   ->  Hash  (cost=11330.73..11330.73 rows=10393 width=51)
         ->  Index Scan using idx_mtgccclstext_cccsssid_j2 on mtglistextj2 b  (cost=0.43..11330.73 rows=10393 width=51)
               Index Cond: ((cccid = '27115336789879'::bigint) AND (siteid = '15818676'::bigint))
(8 rows)

 explain execute j2eemtgatdlistsql16(27115336789879,15818676);
                                        QUERY PLAN
---------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.87..289.53 rows=14 width=)
   ->  Index Scan using idx_mtgccclist_conf_j2 on mtgccclistj2 a  (cost=0.43..251.94 rows=14 width=1059)
         Index Cond: ((cccid = $1) AND (sssid = $2))
         Filter: (jstatus = ANY ('{3,7,11,2,6,10}'::bigint[]))
   ->  Index Scan using idx_mtgccclstext_cccsssid_j2 on mtglistextj2 b  (cost=0.43..2.66 rows=1 width=51)
         Index Cond: ((cccid = a.cccid) AND (cccid = $1) AND (sssid = a.sssid) AND (sssid = $2))
         Filter: (a.uuid = uuid)
(7 rows)

Thanks,

James


Re: Table Partitioning and Indexes Performance Questions

2024-02-29 Thread Laurenz Albe
On Thu, 2024-02-29 at 11:42 -0500, David Kelly wrote:
> I was told that partitioned table indexes must always start with the
> partition key columns.

That's not true.

Only unique indexes (as used by primary key and unique constraints) must
contain the partitioning key (but they don't have to start with it).
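
A minimal sketch to illustrate (table and column names invented for the
example):

  CREATE TABLE measurements (
      id        bigint  NOT NULL,
      logdate   date    NOT NULL,
      sensor_id integer NOT NULL,
      value     numeric,
      PRIMARY KEY (id, logdate)      -- a unique index must include the partition key
  ) PARTITION BY RANGE (logdate);

  CREATE TABLE measurements_2024 PARTITION OF measurements
      FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

  -- a non-unique index does not have to start with (or even include) logdate;
  -- creating it on the parent cascades to every partition
  CREATE INDEX ON measurements (sensor_id);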


> Any other performance considerations when it comes to partitioned table 
> indexing?
> Specifically, partitioning by range where the range is a single value.

Not particularly - selecting from a partitioned table is like selecting
from a UNION ALL of all partitions, except that sometimes PostgreSQL
can forgo scanning some of the partitions.
If you use very many partitions, the overhead for query planning and
execution can become noticeable.

Yours,
Laurenz Albe




Re: Table Partitioning and Indexes Performance Questions

2024-02-29 Thread Anupam b
Would eliminating triggers and stored procedures be step #1 to start seeing
gains from partitions?
We have many triggers and stored procedures, and I am trying to make sure
whether we need to deprecate them before moving to partitioning.

Many thx
Andy


From: Laurenz Albe 
Sent: Thursday, February 29, 2024 9:32:48 AM
To: David Kelly ; 
pgsql-performance@lists.postgresql.org 
Subject: Re: Table Partitioning and Indexes Performance Questions

On Thu, 2024-02-29 at 11:42 -0500, David Kelly wrote:
> I was told that partitioned table indexes must always start with the
> partition key columns.

That's not true.

Only unique indexes (as used by primary key and unique constraints) must
contain the partitioning key (but they don't have to start with it).


> Any other performance considerations when it comes to partitioned table 
> indexing?
> Specifically, partitioning by range where the range is a single value.

Not particularly - selecting from a partitioned table is like selecting
from a UNION ALL of all partitions, except that sometimes PostgreSQL
can forgo scanning some of the partitions.
If you use very many partitions, the overhead for query planning and
execution can become noticeable.

Yours,
Laurenz Albe