Re: Lock contention high

2021-12-20 Thread Yura Sokolov
On Thu, 28/10/2021 at 03:14 +0530, Ashkil Dighin wrote:
> Hi,
> Yes, lock contention is reduced with PostgreSQL v14.
> LWLockAcquire dropped from 18% to 10%:
> 10.49%  postgres  postgres  [.] LWLockAcquire
>  5.09%  postgres  postgres  [.] _bt_compare
> 
> Can lock contention be reduced to 0-3%?
> pg_stat_activity shows the LWLock waits as “BufferCounter” and “WalInsert”
> 
> 
> On Tuesday, October 26, 2021, Andres Freund  wrote:
> > Hi,
> > 
> > On 2021-10-12 13:05:12 +0530, Ashkil Dighin wrote:
> > > PostgreSQL version: 13.3
> > 
> > You could try postgres 14 - that did improve scalability in some areas.
> > 
> > 
> > 
> > > Perf data for 24vu(TPC-C)
> > > 
> > > 
> > >   18.99%  postgres  postgres  [.] LWLockAcquire
> > >    7.09%  postgres  postgres  [.] _bt_compare
> > >    8.66%  postgres  postgres  [.] LWLockRelease
> > >    2.28%  postgres  postgres  [.] GetSnapshotData
> > >    2.25%  postgres  postgres  [.] hash_search_with_hash_value
> > >    2.11%  postgres  postgres  [.] XLogInsertRecord
> > >    1.98%  postgres  postgres  [.] PinBuffer
> > 
> > To be more useful you'd need to create a profile with 'caller' information
> > using 'perf record --call-graph dwarf', and then check what the important
> > callers are.
> > 
> > 
> > > Postgres.conf used  in Baremetal
> > > 
> > > shared_buffers = 128GB (1/4th of RAM size)
> > > effective_cache_size = 392GB (1/3 or 75% of RAM size)
> > 
> > If your hot data set is actually larger than s_b, I'd recommend trying a
> > larger s_b. It's plausible that a good chunk of lock contention is from 
> > that.
> > 

Could you try the attached patch?
It reduces lock contention in the buffer manager by not acquiring
two partition locks simultaneously during buffer eviction.

v1-0001-*      - this file is for PostgreSQL 14 and the master branch
vpg13v1-0001-* - this file is for PostgreSQL 13
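
To see whether the LWLock waits actually move, a sample along these lines
against pg_stat_activity (standard wait_event columns; just a sketch) can be
taken before and after applying the patch:

select wait_event_type, wait_event, count(*) as backends
  from pg_stat_activity
 where wait_event_type = 'LWLock'
 group by wait_event_type, wait_event
 order by backends desc;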

Corresponding (not so loud) discussion:
https://postgr.es/m/flat/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru



regards,

Yura Sokolov
y.soko...@postgrespro.ru
funny.fal...@gmail.com
From efa1d36f0e6a83f4329d259f310b8da25b04bc24 Mon Sep 17 00:00:00 2001
From: Yura Sokolov 
Date: Wed, 22 Sep 2021 13:10:37 +0300
Subject: [PATCH vpg13v1] bufmgr: do not acquire two partition locks.

Acquiring two partition locks creates a complex dependency chain that hurts
at high concurrency levels.

There is no need to hold both locks simultaneously. The buffer is pinned, so
other processes cannot select it for eviction. Once the tag is cleared and the
buffer is removed from the old partition, other processes will not find it.
Therefore it is safe to release the old partition lock before acquiring the
new partition lock.

This change requires manually returning the BufferDesc to the free list.

Insertion into and deletion from dynahash are also optimized by avoiding
unnecessary free list manipulations in the common case (when a buffer is
reused).

Also, a small, never-triggered bug in hash_update_hash_key is fixed.
---
 src/backend/storage/buffer/buf_table.c |  54 +++--
 src/backend/storage/buffer/bufmgr.c    | 190 
 src/backend/utils/hash/dynahash.c      | 289 +++--
 src/include/storage/buf_internals.h    |   6 +-
 src/include/utils/hsearch.h            |  17 ++
 5 files changed, 410 insertions(+), 146 deletions(-)

diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 4953ae9f824..feab7f62f5b 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -107,36 +107,29 @@ BufTableLookup(BufferTag *tagPtr, uint32 hashcode)
 
 /*
  * BufTableInsert
- *		Insert a hashtable entry for given tag and buffer ID,
- *		unless an entry already exists for that tag
- *
- * Returns -1 on successful insertion.  If a conflicting entry exists
- * already, returns the buffer ID in that entry.
+ *		Insert a hashtable entry for given tag and buffer ID.
+ *		Caller should be sure there is no conflicting entry.
  *
  * Caller must hold exclusive lock on BufMappingLock for tag's partition
+ * and call BufTableLookup to check for conflicting entry.
+ *
+ * If oldelem is passed it is reused.
  */
-int
-BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id)
+void
+BufTableInsert(BufferTag *tagPtr, uint32 hashcode, int buf_id, void *oldelem)
 {
 	BufferLookupEnt *result;
-	bool		found;
 
 	Assert(buf_id >= 0);		/* -1 is reserved for not-in-table */
 	Assert(tagPtr->blockNum != P_NEW);	/* invalid tag */
 
 	result = (BufferLookupEnt *)
-		hash_search_with_hash_value(SharedBufHash,
-									(void *) tagPtr,
-									hashcode,
-									HASH_ENTER,
-									&found);
-
-	if (found)	/* found something already in the table */
-		return result->id;
+		hash_insert_with_hash_nocheck(SharedBufHash,
+									  (void *) tagPtr,
+									  hashcode,
+									  oldelem);
 
 	result->id = buf_id;
-
-	return -1;
 }
 
 /*
@@ -144,19 

Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
"David G. Johnston"  writes:
> On Monday, December 20, 2021, Tom Lane  wrote:
>> It would help if somebody had labeled the units of I/O Time
>> ... but I'm guessing those are microsec vs. the millisec
>> of the other times, because otherwise it's completely wrong.

> Related to my preceding observation, from the explain (buffers) docs:
> “…and the time spent reading and writing data file blocks (in milliseconds)
> if track_io_timing is enabled.”

Hmm ... the code sure looks like it's supposed to be millisec:

    appendStringInfoString(es->str, "I/O Timings:");
    if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
        appendStringInfo(es->str, " read=%0.3f",
                         INSTR_TIME_GET_MILLISEC(usage->blk_read_time));
    if (!INSTR_TIME_IS_ZERO(usage->blk_write_time))
        appendStringInfo(es->str, " write=%0.3f",
                         INSTR_TIME_GET_MILLISEC(usage->blk_write_time));

And when I try some cases here, I get I/O timing numbers that are
consistent with the overall time reported by EXPLAIN, for example

 Seq Scan on foo  (cost=0.00..843334.10 rows=1110 width=508) (actual time=0.015..1897.492 rows=1100 loops=1)
   Buffers: shared hit=15874 read=717460
   I/O Timings: read=1184.638
 Planning:
   Buffers: shared hit=5 read=2
   I/O Timings: read=0.025
 Planning Time: 0.229 ms
 Execution Time: 2151.529 ms

So now we have a real mystery about what is happening on Lars'
system.  Those numbers can't be right.

regards, tom lane




Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Tom Lane  wrote:

> Justin Pryzby  writes:
> > On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> >> Still taking 10X more I/O to read the smaller table. Very odd.
>
> > If I'm not wrong, it's even worse than that ?
> > It takes 20 or 30sec to run the query - but it says the associated I/O times
> > are ~500sec or ~6000sec ?
>
> It would help if somebody had labeled the units of I/O Time
> ... but I'm guessing those are microsec vs. the millisec
> of the other times, because otherwise it's completely wrong.
>
>
Related to my preceding observation, from the explain (buffers) docs:

“…and the time spent reading and writing data file blocks (in milliseconds)
if track_io_timing is enabled.”
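
For completeness, track_io_timing is an ordinary GUC, so a superuser can turn
it on and check it for a single session before re-running EXPLAIN (a generic
sketch, not taken from the original message):

set track_io_timing = on;
show track_io_timing;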

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Justin Pryzby  wrote:

> On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> > ok, here are results after I did:
> > set max_parallel_workers_per_gather = 0;
> >
> > HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual
> time=19908.343..19908.345 rows=5 loops=1)
> >   I/O Timings: read=532369.898
> > Execution Time: 19908.383 ms
>
> > HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual
> time=30369.758..30369.761 rows=5 loops=1)
> >   I/O Timings: read=6440851.540
> > Execution Time: 30369.796 ms
>
> > Still taking 10X more I/O to read the smaller table. Very odd.
>
> If I'm not wrong, it's even worse than that ?
> It takes 20 or 30sec to run the query - but it says the associated I/O times
> are ~500sec or ~6000sec ?
>
> What architecture and OS/version are you running ?
> How did you install postgres?  From a package or compiled from source ?
>

The docs indicate you’ll only see I/O Timing information if using EXPLAIN
BUFFERS but I’m not seeing any of the other buffer-related information in
these plans.  Thoughts?
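
For reference, per-node buffer counts normally appear when the plan comes from
an invocation along these lines (a sketch only, reusing a table name from this
thread):

explain (analyze, buffers)
select creation_user, count(*)
  from eu.royalty_with_null
 group by creation_user;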

David J.


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
Justin Pryzby  writes:
> On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
>> Still taking 10X more I/O to read the smaller table. Very odd.

> If I'm not wrong, it's even worse than that ?
> It takes 20 or 30sec to run the query - but it says the associated I/O times
> are ~500sec or ~6000sec ?

It would help if somebody had labeled the units of I/O Time
... but I'm guessing those are microsec vs. the millisec
of the other times, because otherwise it's completely wrong.

regards, tom lane




Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Justin Pryzby
On Mon, Dec 20, 2021 at 08:11:42PM -0800, Lars Bergeson wrote:
> ok, here are results after I did:
> set max_parallel_workers_per_gather = 0;
> 
> HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual 
> time=19908.343..19908.345 rows=5 loops=1)
>   I/O Timings: read=532369.898
> Execution Time: 19908.383 ms

> HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual 
> time=30369.758..30369.761 rows=5 loops=1)
>   I/O Timings: read=6440851.540
> Execution Time: 30369.796 ms

> Still taking 10X more I/O to read the smaller table. Very odd.

If I'm not wrong, it's even worse than that ?
It takes 20 or 30sec to run the query - but it says the associated I/O times
are ~500sec or ~6000sec ?

What architecture and OS/version are you running ?
How did you install postgres?  From a package or compiled from source ?

It might be interesting to know the output from something like this command,
depending on whether and where the headers like pg_config_x86_64.h are 
installed.

grep -r HAVE_CLOCK_GETTIME /usr/pgsql-13/include

-- 
Justin




Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Lars Bergeson
ok, here are results after I did:
set max_parallel_workers_per_gather = 0;

no nulls table is 11.462 GB:
QUERY PLAN
HashAggregate  (cost=1676432.13..1676432.16 rows=3 width=15) (actual
time=19908.343..19908.345 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=532369.898
  ->  Seq Scan on royalty_no_null  (cost=0.00..1583887.42 rows=18508942
width=7) (actual time=0.013..16705.734 rows=18508470 loops=1)
I/O Timings: read=532369.898
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency =
'1', max_parallel_workers_per_gather = '0', search_path = 'public, public,
"$user"'
Planning Time: 0.056 ms
Execution Time: 19908.383 ms

with nulls table is 9.120 GB:
QUERY PLAN
HashAggregate  (cost=1390580.70..1390580.72 rows=2 width=15) (actual
time=30369.758..30369.761 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=6440851.540
  ->  Seq Scan on royalty_with_null  (cost=0.00..1298048.80 rows=18506380
width=7) (actual time=0.015..25525.104 rows=18508470 loops=1)
I/O Timings: read=6440851.540
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency =
'1', max_parallel_workers_per_gather = '0', search_path = 'public, public,
"$user"'
Planning Time: 0.060 ms
Execution Time: 30369.796 ms

Still taking 10X more I/O to read the smaller table. Very odd.

Regarding the earlier comment from David Johnston: If I put null values in
just one of the columns for all rows, it should force a null bitmap to be
created for every row, with the same amount of checking of the bitmap
required. However, the query still runs faster even though the table is
larger:
with nulls table is 11.604 GB when all values are filled except 1 column
has mostly nulls. The extra 0.14 GB (11.604 GB - 11.462 GB) is probably
space consumed by null bitmaps:
QUERY PLAN
HashAggregate  (cost=1693765.03..1693765.06 rows=3 width=15) (actual
time=26452.653..26452.655 rows=5 loops=1)
  Group Key: roys_creation_user
  Batches: 1  Memory Usage: 24kB
  I/O Timings: read=2706123.209
  ->  Seq Scan on royalty_with_null_cols_filled  (cost=0.00..1601218.02
rows=18509402 width=7) (actual time=0.014..22655.366 rows=18508470 loops=1)
I/O Timings: read=2706123.209
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency =
'1', max_parallel_workers_per_gather = '0', search_path = 'public, public,
"$user"'
Planning Time: 0.068 ms
Execution Time: 26452.691 ms

It seems to be the actual presence of null values that slows things down,
even when the same sized null bitmap exists for each row.

On Mon, Dec 20, 2021 at 5:51 PM Tom Lane  wrote:

> Lars Bergeson  writes:
> > What is it about null values in the table that slows down the full table
> > scan?
>
> If a row has any nulls, then it contains a "nulls bitmap" [1] that says
> which columns are null, and that bitmap has to be consulted while
> walking through the row contents.  So the most obvious theory here
> is that that adds overhead that's significant in your case.  But there
> are some holes in that theory, mainly that the I/O timings you are
> showing don't seem very consistent:
>
> no nulls:
> >   I/O Timings: read=1884365.335
> > Execution Time: 11135.368 ms
>
> with nulls:
> >   I/O Timings: read=17141420.771
> > Execution Time: 25407.318 ms
>
> Regardless of CPU time required, it should not take 10X less I/O
> time to read a physically larger table.  So there's something
> fairly bogus going on there.  One thing you might try is disabling
> parallelism (set max_parallel_workers_per_gather = 0) to see if
> that's confusing the numbers somehow.
>
> regards, tom lane
>
> [1]
> https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT
>


Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Tom Lane
Lars Bergeson  writes:
> What is it about null values in the table that slows down the full table
> scan?

If a row has any nulls, then it contains a "nulls bitmap" [1] that says
which columns are null, and that bitmap has to be consulted while
walking through the row contents.  So the most obvious theory here
is that that adds overhead that's significant in your case.  But there
are some holes in that theory, mainly that the I/O timings you are
showing don't seem very consistent:

no nulls:
>   I/O Timings: read=1884365.335
> Execution Time: 11135.368 ms

with nulls:
>   I/O Timings: read=17141420.771
> Execution Time: 25407.318 ms

Regardless of CPU time required, it should not take 10X less I/O
time to read a physically larger table.  So there's something
fairly bogus going on there.  One thing you might try is disabling
parallelism (set max_parallel_workers_per_gather = 0) to see if
that's confusing the numbers somehow.

regards, tom lane

[1] 
https://www.postgresql.org/docs/current/storage-page-layout.html#STORAGE-TUPLE-LAYOUT




Re: Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread David G. Johnston
On Monday, December 20, 2021, Lars Bergeson  wrote:

>
> What is it about null values in the table that slows down the full table
> scan?
>
> If I populate blank/zero for all of the unused values in columns that are
> NULLable, the query is fast again. So just defining the columns as NULLable
> isn't what slows it down -- it's actually the NULL values in the rows that
> seems to degrade performance.
>

The presence or absence of the constraint has zero effect on the contents
of the page/tuple.  As soon as you have a single null in a row you are
adding a null bitmap [1] to the stored tuple.  And now for every single
column the system has to check whether a specific column’s value is null or
not.  Given the number of columns in your table, that this is noticeable is
not surprising.
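
A quick way to see the layout difference (an illustrative sketch using ad-hoc
composite values, not your table) is to compare stored sizes with and without
a null present:

select pg_column_size(row(1::int, 2::int, 3::int))    as no_nulls,
       pg_column_size(row(1::int, 2::int, null::int)) as with_null;

The second value carries a null bitmap and simply omits the null attribute,
which is the same per-tuple layout change described above.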

David J.

[1]  https://www.postgresql.org/docs/current/storage-page-layout.html


Query is slower with a large proportion of NULLs in several columns

2021-12-20 Thread Lars Bergeson
First of all, here is the version of PostgreSQL I'm using:
PostgreSQL 13.3 on x86_64-pc-linux-gnu, compiled by x86_64-pc-linux-gnu-gcc
(GCC) 7.4.0, 64-bit

I'm new to PostgreSQL, and I'm deciding if I should make columns in my
database nullable or not.

I have no need to distinguish between blank/zero and null. But I have
noticed that using NULL for unused values does save disk space, as opposed
to using zero/blank default values.

In my table I have 142 columns and 18,508,470 rows. Using NULLs instead of
zero/blank reduces the table storage space from 11.462 GB down to 9.120 GB.
That's a 20% reduction in size, and from what I know about the data it's
about right that 20% of the values in the database are unused.
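
For anyone checking similar numbers, a size comparison along these lines
(using the table names that appear later in this message) is enough:

select pg_size_pretty(pg_table_size('eu.royalty_no_null'))   as no_null,
       pg_size_pretty(pg_table_size('eu.royalty_with_null')) as with_null;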

I would think that any full table scan would run faster against the table
that has the null values, since there are fewer data pages to read. But,
actually, the opposite is true, and by quite a large margin (about 50%
slower!).

In the "Slow Query Questions" guide, it says to mention if the table

   - has a large proportion of NULLs in several columns

Yes, it does, I would estimate that about 20% of the time a column's value
is null. Why does this matter? Is this a known thing about PostgreSQL
performance? If so, where do I read about it?

The table does not contain large objects, has been freshly loaded (so not a
lot of UPDATE/DELETEs), is not growing, only has the 1 primary index,
and does not use triggers.

Anyway, below are the query results. The field being selected
(creation_user) is not in any index, which forces a full table scan:

--> 18.844 sec to execute when all columns defined NOT NULL WITH DEFAULT,
table size is 11.462 GB
select creation_user, count(*)
   from eu.royalty_no_null
 group by creation_user;

creation_user|count   |
-++
[BLANK]  |   84546|
BACOND   |  10|
BALUN|2787|
FOGGOL   | 109|
TRBATCH  |18421018|
QUERY PLAN
Finalize GroupAggregate  (cost=1515478.96..1515479.72 rows=3 width=15)
(actual time=11133.324..11135.311 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=1884365.335
  ->  Gather Merge  (cost=1515478.96..1515479.66 rows=6 width=15) (actual
time=11133.315..11135.300 rows=13 loops=1)
Workers Planned: 2
Workers Launched: 2
I/O Timings: read=1884365.335
->  Sort  (cost=1514478.94..1514478.95 rows=3 width=15) (actual
time=11127.396..11127.398 rows=4 loops=3)
  Sort Key: creation_user
  Sort Method: quicksort  Memory: 25kB
  I/O Timings: read=1884365.335
  Worker 0:  Sort Method: quicksort  Memory: 25kB
  Worker 1:  Sort Method: quicksort  Memory: 25kB
  ->  Partial HashAggregate  (cost=1514478.89..1514478.92
rows=3 width=15) (actual time=11127.370..11127.372 rows=4 loops=3)
Group Key: creation_user
Batches: 1  Memory Usage: 24kB
I/O Timings: read=1884365.335
Worker 0:  Batches: 1  Memory Usage: 40kB
Worker 1:  Batches: 1  Memory Usage: 40kB
->  Parallel Seq Scan on royalty_no_null
 (cost=0.00..1475918.59 rows=7712059 width=7) (actual time=0.006..9339.296
rows=6169490 loops=3)
  I/O Timings: read=1884365.335
Settings: effective_cache_size = '21553496kB', maintenance_io_concurrency =
'1', search_path = 'public, public, "$user"'
Planning Time: 0.098 ms
Execution Time: 11135.368 ms


--> 30.57 sec to execute when all columns are nullable instead of
defaulting to zero/blank, table size is 9.120 GB:
select creation_user, count(*)
   from eu.royalty_with_null
 group by creation_user;

creation_user|count   |
-++
BACOND   |  10|
BALUN|2787|
FOGGOL   | 109|
TRBATCH  |18421018|
[NULL]   |   84546|
QUERY PLAN
Finalize GroupAggregate  (cost=1229649.93..1229650.44 rows=2 width=15)
(actual time=25404.925..25407.262 rows=5 loops=1)
  Group Key: creation_user
  I/O Timings: read=17141420.771
  ->  Gather Merge  (cost=1229649.93..1229650.40 rows=4 width=15) (actual
time=25404.917..25407.249 rows=12 loops=1)
Workers Planned: 2
Workers Launched: 2
I/O Timings: read=17141420.771
->  Sort  (cost=1228649.91..1228649.91 rows=2 width=15) (actual
time=25398.004..25398.006 rows=4 loops=3)
  Sort Key: creation_user
  Sort Method: quicksort  Memory: 25kB
  I/O Timings: read=17141420.771
  Worker 0:  Sort Method: quicksort  Memory: 25kB
  Worker 1:  Sort Method: quicksort  Memory: 25kB
  ->  Partial HashAggregate  (cost=1228649.88..1228649.90
rows=2 width=15) (actual time=25397.918..25397.920 rows=4 loops=3)
Group Key: creation_user