Re: PG 16 draft release notes ready

2023-05-24 Thread David Rowley
On Thu, 25 May 2023 at 05:45, Bruce Momjian  wrote:
>
> On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote:
> > > * Reduce palloc() memory overhead for all memory allocations down to 8
> > > bytes on all platforms. (Andres Freund, David Rowley)
> > >
> > > This allows more efficient use of memory and is especially useful in
> > > queries which perform operations (such as sorting or hashing) that
> > > require more than work_mem.
> >
> > Well, this would go in the source code section, but it seems too
> > internal and global to mention.
>
> What was the previous memory allocation overhead?

On 64-bit builds, it was 16 bytes for AllocSet contexts, 24 bytes for
generation contexts and 16 bytes for slab contexts.

David




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 23:30:58 -0400, Jonathan S. Katz wrote:
> > Ah, OK, that's why I didn't grok it. I read through the first message
> > in[1] and definitely agree it should be in the announcement. How about:
> > 
> > "PostgreSQL 16 also shows up to a 300% improvement when concurrently
> > loading data with `COPY`"
> 
> I currently have it as the below in the release announcement. If it you send
> any suggested updates, I can try to put them in before release:
> 
> PostgreSQL 16 can also improve the performance of concurrent bulk loading of
> data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to
> a 300%.

It also speeds up concurrent loading when not using COPY, just to a lesser
degree. But I can't come up with a concise phrasing for that right now...

Greetings,

Andres Freund




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Jonathan S. Katz

On 5/24/23 11:30 PM, Jonathan S. Katz wrote:

On 5/24/23 9:20 PM, Jonathan S. Katz wrote:


I currently have it as the below in the release announcement. If it you 
send any suggested updates, I can try to put them in before release:


PostgreSQL 16 can also improve the performance of concurrent bulk 
loading of data using 
[`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%.


(without the "a 300%" typo).

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Jonathan S. Katz

On 5/24/23 9:20 PM, Jonathan S. Katz wrote:

On 5/24/23 8:04 PM, Andres Freund wrote:

Hi,

On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote:

On 5/24/23 5:28 PM, Andres Freund wrote:


I think the relation extension improvements ought to be mentioned 
here as

well? Up to 3x faster concurrent data load with COPY seems practically
relevant.


I missed that -- not sure I'm finding it in the release notes with a 
quick

grep -- which commit/thread is this?


It was split over quite a few commits, the one improving COPY most
significantly is

commit 00d1e02be24987180115e371abaeb84738257ae2
Author: Andres Freund 
Date:   2023-04-06 16:35:21 -0700

 hio: Use ExtendBufferedRelBy() to extend tables more efficiently

Relevant thread: 
https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de


It's in the release notes as:
   Allow more efficient addition of heap and index pages (Andres Freund)


Ah, OK, that's why I didn't grok it. I read through the first message 
in[1] and definitely agree it should be in the announcement. How about:


"PostgreSQL 16 also shows up to a 300% improvement when concurrently 
loading data with `COPY`"


I currently have it as the below in the release announcement. If it you 
send any suggested updates, I can try to put them in before release:


PostgreSQL 16 can also improve the performance of concurrent bulk 
loading of data using 
[`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%.


Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Thomas Munro
On Thu, May 25, 2023 at 12:34 PM Tom Lane  wrote:
> Dunno about antique MIPS.  I think there's still some interest in
> not-antique 32-bit MIPS; I have some current-production routers
> with such CPUs.  (Sadly, they don't have enough storage to do
> anything useful with, or I'd think about repurposing one for
> buildfarm.)

FWIW "development of the MIPS architecture has ceased"[1].  (Clearly
there are living ISAs that continue either its spirit or its ...
instructions, but they aren't calling themselves MIPS.)

[1] https://en.wikipedia.org/wiki/MIPS_architecture




Re: PG 16 draft release notes ready

2023-05-24 Thread Bruce Momjian
On Thu, May 25, 2023 at 08:31:29AM +0700, John Naylor wrote:
> 
> On Wed, May 24, 2023 at 8:58 PM Bruce Momjian  wrote:
> >
> > Okay, items split into sections and several merged.  I left the
> > CPU-specific parts in Source Code, and moved the rest into a merged item
> > in General Performance, but moved the JSON item to Data Types.
> 
> It looks like it got moved to Functions actually?
> 
> > > The last one refers to new internal functions, so it could stay in source
> code.
> > > (Either way, we don't want to imply that arrays of SQL types are
> accelerated
> > > this way, it's so far only for internal arrays.)
> >
> > Good point.  I called them "C arrays" but it it into the General
> > Performance item.
> 
> Looks good to me, although...
> 
> > Allow xid/subxid searches and ASCII string detection to use vector 
> > operations
> (Nathan Bossart)
> 
> Nathan wrote the former, I did the latter.
> 
> Thanks for working on this!

Ugh, I have to remember to merge authors when I merge items --- fixed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-24 Thread John Naylor
On Wed, May 24, 2023 at 8:58 PM Bruce Momjian  wrote:
>
> Okay, items split into sections and several merged.  I left the
> CPU-specific parts in Source Code, and moved the rest into a merged item
> in General Performance, but moved the JSON item to Data Types.

It looks like it got moved to Functions actually?

> > The last one refers to new internal functions, so it could stay in
source code.
> > (Either way, we don't want to imply that arrays of SQL types are
accelerated
> > this way, it's so far only for internal arrays.)
>
> Good point.  I called them "C arrays" but it it into the General
> Performance item.

Looks good to me, although...

> Allow xid/subxid searches and ASCII string detection to use vector
operations (Nathan Bossart)

Nathan wrote the former, I did the latter.

Thanks for working on this!

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 20:34:38 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > On Thu, May 25, 2023 at 11:51 AM Tom Lane  wrote:
> >> You'll no doubt be glad to hear that I'll be retiring chickadee
> >> in the very near future.
>
> > . o O { I guess chickadee might have been OK anyway, along with e.g.
> > antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that
> > any collector is likely to have still running, because they only had
> > one CPU (unlike their Vogon-spaceship-sized siblings).  As long as
> > they had 64 bit load/store instructions, those couldn't be 'divided'
> > by an interrupt, so scheduler switches shouldn't be able to tear them,
> > AFAIK? }
>
> PA-RISC can probably do tear-free 8-byte reads, but Andres also
> wanted to raise the bar enough to include 32-bit atomic instructions,
> which PA-RISC hasn't got; the one such instruction it has is
> limited enough that you can't do much beyond building a spinlock.

Yes, I looked at some point, and it didn't seem viable to do more.


> I think there's still some interest in not-antique 32-bit MIPS; I have some
> current-production routers with such CPUs.  (Sadly, they don't have enough
> storage to do anything useful with, or I'd think about repurposing one for
> buildfarm.)

After spending a bunch more time staring at various reference manuals, it
looks to me like 32bit MIPS supports 64 bit atomics these days, via LLWP /
SCWP.

https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf
documents them as having been added to MIPS32 Release 6, from 2014.

Greetings,

Andres Freund




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Jonathan S. Katz

On 5/24/23 8:04 PM, Andres Freund wrote:

Hi,

On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote:

On 5/24/23 5:28 PM, Andres Freund wrote:


I think the relation extension improvements ought to be mentioned here as
well? Up to 3x faster concurrent data load with COPY seems practically
relevant.


I missed that -- not sure I'm finding it in the release notes with a quick
grep -- which commit/thread is this?


It was split over quite a few commits, the one improving COPY most
significantly is

commit 00d1e02be24987180115e371abaeb84738257ae2
Author: Andres Freund 
Date:   2023-04-06 16:35:21 -0700

 hio: Use ExtendBufferedRelBy() to extend tables more efficiently

Relevant thread: 
https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de

It's in the release notes as:
   Allow more efficient addition of heap and index pages (Andres Freund)


Ah, OK, that's why I didn't grok it. I read through the first message 
in[1] and definitely agree it should be in the announcement. How about:


"PostgreSQL 16 also shows up to a 300% improvement when concurrently 
loading data with `COPY`"


Thanks,

Jonathan

[1] https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de



OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Tom Lane
Andres Freund  writes:
> On 2023-05-24 19:51:22 -0400, Tom Lane wrote:
>> So dropping PA-RISC altogether should probably happen for v17, maybe even
>> v16.

> Definitely for 17 - not sure if we have much to gain by doing it in 16.

I'm just thinking that we'll have no way to test it.  I wouldn't advocate
such a change in released branches; but I think it'd be within policy
still for v16, and that would give us one less year of claimed support
for an arch we can't test.

regards, tom lane




Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 19:51:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-05-24 17:44:36 -0400, Tom Lane wrote:
> >> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit
> >> platforms?  I'd be on board with this if so, but it sounds a bit
> >> optimistic.
> 
> > ...
> 
> > So it looks like the only certain problem is PA-RISC - which I personally
> > wouldn't include in "relevant" :), with some evaluation needed for 32bit 
> > mips
> > and old arms.
> 
> You'll no doubt be glad to hear that I'll be retiring chickadee
> in the very near future.

Heh, I have to admit, I am.


> So dropping PA-RISC altogether should probably happen for v17, maybe even
> v16.

Definitely for 17 - not sure if we have much to gain by doing it in 16. The
likelihood that somebody will find a PA-RISC specific bug, after we dropped
support for 15, is pretty low, I think.


> Seems like we should poke into ARM more closely, though.

Looks like the necessary support was added in armv6k and armv7a:

https://developer.arm.com/documentation/dui0489/i/arm-and-thumb-instructions/strex
  ARM STREXB, STREXD, and STREXH are available in ARMv6K and above.
  All these 32-bit Thumb instructions are available in ARMv6T2 and above, 
except that STREXD is not available in the ARMv7-M architecture.

ARMv7-M is for microcontrollers without an MMU, so it's not going to be used
for postgres.

The arm buildfarm machines (using the architecture names from there) we have
are:

armv6l: chipmunk
ARMv7: mereswine, gull, grison
arm64: dikkop, sifaka, indri

turako says it's armv7l, but it seems actually to target aarch64.

At least for type of machine available on the buildfarm, it looks like we
actually should be good. They all appear to already be supporting 64bit
atomics, without needing further compiler flags.

Greetings,

Andres Freund




Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Tom Lane
Thomas Munro  writes:
> On Thu, May 25, 2023 at 11:51 AM Tom Lane  wrote:
>> You'll no doubt be glad to hear that I'll be retiring chickadee
>> in the very near future.

> . o O { I guess chickadee might have been OK anyway, along with e.g.
> antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that
> any collector is likely to have still running, because they only had
> one CPU (unlike their Vogon-spaceship-sized siblings).  As long as
> they had 64 bit load/store instructions, those couldn't be 'divided'
> by an interrupt, so scheduler switches shouldn't be able to tear them,
> AFAIK? }

PA-RISC can probably do tear-free 8-byte reads, but Andres also
wanted to raise the bar enough to include 32-bit atomic instructions,
which PA-RISC hasn't got; the one such instruction it has is
limited enough that you can't do much beyond building a spinlock.

Dunno about antique MIPS.  I think there's still some interest in
not-antique 32-bit MIPS; I have some current-production routers
with such CPUs.  (Sadly, they don't have enough storage to do
anything useful with, or I'd think about repurposing one for
buildfarm.)

regards, tom lane




Re: Question about error message in auth.c

2023-05-24 Thread Michael Paquier
On Wed, May 24, 2023 at 02:12:23PM -0400, Dave Cramer wrote:
> The last piece of information is the encryption state. However when an SSL
> connection fails to authenticate the state is not encrypted.
>
> When would it ever be encrypted if authentication fails ?

I am not sure to follow.  Under a SSL connection things should be
encrypted.  Or perhaps that's something related to hostssl and/or
hostnossl?

Back to the point, the SSL handshake happens before any authentication
attempt and any HBA resolution, so a connection could be encrypted
before authentication gets rejected.  The error path you are pointing
at would happen after the SSL handshake is done.  For instance, take
an HBA entry like that:
# TYPE  DATABASEUSER   ADDRESS METHOD
hostall all127.0.0.1/32reject

Then, attempting a connection with sslmode=prefer, one can see the
difference:
$ psql -h 127.0.0.1 -d postgres -U postgres
psql: error: connection to server at "127.0.0.1", port 5432 failed:
FATAL:  pg_hba.conf rejects connection for host "127.0.0.1", user
"postgres", database "postgres", SSL encryption
connection to server at "127.0.0.1", port 5432 failed: FATAL:
pg_hba.conf rejects connection for host "127.0.0.1", user "postgres",
database "postgres", no encryption
--
Michael


signature.asc
Description: PGP signature


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Thomas Munro
On Thu, May 25, 2023 at 11:51 AM Tom Lane  wrote:
> Andres Freund  writes:
> > On 2023-05-24 17:44:36 -0400, Tom Lane wrote:
> > So it looks like the only certain problem is PA-RISC - which I personally
> > wouldn't include in "relevant" :), with some evaluation needed for 32bit 
> > mips
> > and old arms.
>
> You'll no doubt be glad to hear that I'll be retiring chickadee
> in the very near future.

. o O { I guess chickadee might have been OK anyway, along with e.g.
antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that
any collector is likely to have still running, because they only had
one CPU (unlike their Vogon-spaceship-sized siblings).  As long as
they had 64 bit load/store instructions, those couldn't be 'divided'
by an interrupt, so scheduler switches shouldn't be able to tear them,
AFAIK? }




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote:
> On 5/24/23 5:28 PM, Andres Freund wrote:
> >
> > I think the relation extension improvements ought to be mentioned here as
> > well? Up to 3x faster concurrent data load with COPY seems practically
> > relevant.
>
> I missed that -- not sure I'm finding it in the release notes with a quick
> grep -- which commit/thread is this?

It was split over quite a few commits, the one improving COPY most
significantly is

commit 00d1e02be24987180115e371abaeb84738257ae2
Author: Andres Freund 
Date:   2023-04-06 16:35:21 -0700

hio: Use ExtendBufferedRelBy() to extend tables more efficiently

Relevant thread: 
https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de

It's in the release notes as:
  Allow more efficient addition of heap and index pages (Andres Freund)

Greetings,

Andres Freund




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Jonathan S. Katz

On 5/24/23 5:28 PM, Andres Freund wrote:


I think the relation extension improvements ought to be mentioned here as
well? Up to 3x faster concurrent data load with COPY seems practically
relevant.


I missed that -- not sure I'm finding it in the release notes with a 
quick grep -- which commit/thread is this?


But yes this does sound like something that should be included, I just 
want to read upon it.


Thanks,

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Tom Lane
Andres Freund  writes:
> On 2023-05-24 17:44:36 -0400, Tom Lane wrote:
>> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit
>> platforms?  I'd be on board with this if so, but it sounds a bit
>> optimistic.

> ...

> So it looks like the only certain problem is PA-RISC - which I personally
> wouldn't include in "relevant" :), with some evaluation needed for 32bit mips
> and old arms.

You'll no doubt be glad to hear that I'll be retiring chickadee
in the very near future.  (I'm moving/downsizing, and that machine
isn't making the cut.)  So dropping PA-RISC altogether should probably
happen for v17, maybe even v16.  Seems like we should poke into
ARM more closely, though.

regards, tom lane




Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 17:44:36 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Dropping CPUs without native atomic operations / without a way to do 
> > tear-free
> > 8 byte reads would make several substantial performance improvements easier,
> > while not really dropping any relevant platform.
>
> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit
> platforms?  I'd be on board with this if so, but it sounds a bit
> optimistic.

Combining https://wiki.postgresql.org/wiki/Atomics with our docs:

>  In general, PostgreSQL can be expected to work on these CPU architectures:
>  x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V, and PA-RISC, including
>  big-endian, little-endian, 32-bit, and 64-bit variants where applicable. It
>  is often possible to build on an unsupported CPU type by configuring with
>  --disable-spinlocks, but performance will be poor.


On x86 8 byte atomics are supported since the 586 - released in 1993. I don't
think we need to care about i386 / i486?

PPC always had it from what I can tell (the docs don't mention a minimum
version).

Sparc has supported single copy atomicity for 8 byte values since sparc v9, so
~1995 (according to wikipedia there was one V8 chip released in 1996, there's
also "LEON", a bit later, but that's "V8E", which includes the atomicity
guarantees from v9).

On s390 sufficient instructions to make 64bit atomics work natively are
supported (just using cmpxchg).

On older arm it's supported via kernel emulation - which afaict is better than
falling back to a semaphore, our current fallback. I don't currently have
access to armv7*, so I can't run a benchmark.


So the only problematic platforms are 32 bit MIPS and PA-RISC.


I'm not entirely sure whether my determination about 32 bit MIPS from back
then is actually true - I might have read the docs too narrowly back then or
they were updated since. In a newer version of the manual I see:

> The paired instructions, Load Linked and Store Conditional, can be used to
> perform an atomic read-modify-write of word or doubleword cached memory
> locations.

The word width is referenced to be 32bit earlier on the same page. And it's
documented to be true for mips32, not just mips64.

With that one can implement a 64bit cmpxchg, and with 64bit cmpxchg one can
implement a, not too efficient, tear-free read.


What gave me pause for a moment is that both clang and gcc generate calls to
external functions on 32 bit mips - but they do provide libatomic and looking
at the disassembly, there does appear to be a some non-emulated execution. But
tbh, there are so many ABIs and options that I am not sure what is what.

Reading the https://en.wikipedia.org/wiki/MIPS_architecture history part gives
me a headache: "During the mid-1990s, many new 32-bit MIPS processors for
embedded systems were MIPS II implementations because the introduction of the
64-bit MIPS III architecture in 1991 left MIPS II as the newest 32-bit MIPS
architecture until MIPS32 was introduced in 1999.[3]: 19 "

My rough understanding is that it's always doable in 64 mode, and has been
available in 32bit mode for a long time, but that it depends on the the ABI
used.


So it looks like the only certain problem is PA-RISC - which I personally
wouldn't include in "relevant" :), with some evaluation needed for 32bit mips
and old arms.


Greetings,

Andres Freund




Re: Atomic ops for unlogged LSN

2023-05-24 Thread Michael Paquier
On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> I was a bit confused by Michael's comment as well, due to the section of code
> quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
> barrier semantics (it'd be way more expensive), and the patch does contain
> this hunk:

Thanks for the correction.  The part about _add was incorrect.

> So we indeed loose some "barrier strength" - but I think that's fine. For one,
> it's a debugging-only value. But more importantly, I don't see what reordering
> the barrier could prevent - a barrier is useful for things like sequencing two
> memory accesses to happen in the intended order - but unloggedLSN doesn't
> interact with another variable that's accessed within the ControlFileLock
> afaict.

This stuff is usually tricky enough that I am never completely sure
whether it is fine without reading again README.barrier, which is
where unloggedLSN is saved in the control file under its LWLock.
Something that I find confusing in the patch is that it does not
document the reason why this is OK.
--
Michael


signature.asc
Description: PGP signature


Re: SyncRepWaitForLSN waits for XLogFlush?

2023-05-24 Thread Tejasvi Kashi
Hi Andres,

Thanks for your reply. If I understand you correctly, you're saying that
the walsender *waits* for xlogflush, but does not *cause* it. This means
that for a sync_commit=off transaction, the xlog records won't get shipped
out to standbys until the walwriter flushes in-memory xlog contents to disk.

And furthermore, am I correct in assuming that this behaviour is different
from the buffer manager and the slru which both *cause* xlog flush up to a
certain lsn before they proceed with flushing a page to disk?

The reason I'm asking this is that I'm working on modifying the transaction
manager for my thesis project, and the pg_walinspect test is failing when I
run make check-world. So, I'm just trying to understand things to identify
the cause of this issue.

Regards,
Tej

On Wed, 24 May 2023 at 17:33, Andres Freund  wrote:

> Hi,
>
> On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote:
> > I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that
> the
> > XLOG has been flushed locally up to that location before the record is
> > shipped off to standbys?
>
> No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to
> (and
> do) call XLogFlush() separately. See e.g. the XLogFlush() call in
> RecordTransactionCommit().
>
> Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed
> would
> not lead for data to be prematurely sent out - walsender won't send data
> that
> hasn't yet been flushed. So a backend with such a spurious
> SyncRepWaitForLSN()
> would just wait until the LSN is actually flushed and *then* replicated.
>
> Any specific reason for that question?
>
> Greetings,
>
> Andres Freund
>


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Michael Paquier
On Wed, May 24, 2023 at 10:44:11AM -0400, Tom Lane wrote:
> Hans Buschmann  writes:
> > This inspired me to propose dropping 32bit support for PostgreSQL starting 
> > with PG17.
> 
> I don't think this is a great idea.  Even if Intel isn't interested,
> there'll be plenty of 32-bit left in the lower end of the market
> (think ARM, IoT, and so on).

A few examples of that are the first models of the Raspberry PIs,
which are still produced (until 2026 actually for the first model).
These rely on ARMv6 if I recall correctly, which are 32b.
--
Michael


signature.asc
Description: PGP signature


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Michael Paquier
On Wed, May 24, 2023 at 01:03:04PM +0200, Daniel Gustafsson wrote:
> When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6
> using 1.0.1, and RHEL6 goes out of ELS in late June 2024 seems possible to 
> drop
> 1.0.1 support during v17.  I haven't studied the patch yet but I'll have a 
> look
> at it.

Great, thanks for the help.
--
Michael


signature.asc
Description: PGP signature


Re: Atomic ops for unlogged LSN

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 08:22:08 -0400, Robert Haas wrote:
> On Tue, May 23, 2023 at 6:26 PM Michael Paquier  wrote:
> > Spinlocks provide a full memory barrier, which may not the case with
> > add_u64() or read_u64().  Could memory ordering be a problem in these
> > code paths?
> 
> It could be a huge problem if what you say were true, but unless I'm
> missing something, the comments in atomics.h say that it isn't. The
> comments for the 64-bit functions say to look at the 32-bit functions,
> and there it says:
> 
> /*
>  * pg_atomic_add_fetch_u32 - atomically add to variable
>  *
>  * Returns the value of ptr after the arithmetic operation.
>  *
>  * Full barrier semantics.
>  */
> 
> Which is probably a good thing, because I'm not sure what good it
> would be to have an operation that both reads and modifies an atomic
> variable but has no barrier semantics.

I was a bit confused by Michael's comment as well, due to the section of code
quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
barrier semantics (it'd be way more expensive), and the patch does contain
this hunk:

> @@ -6784,9 +6775,7 @@ CreateCheckPoint(int flags)
>* unused on non-shutdown checkpoints, but seems useful to store it 
> always
>* for debugging purposes.
>*/
> - SpinLockAcquire(>ulsn_lck);
> - ControlFile->unloggedLSN = XLogCtl->unloggedLSN;
> - SpinLockRelease(>ulsn_lck);
> + ControlFile->unloggedLSN = pg_atomic_read_u64(>unloggedLSN);
>  
>   UpdateControlFile();
>   LWLockRelease(ControlFileLock);

So we indeed loose some "barrier strength" - but I think that's fine. For one,
it's a debugging-only value. But more importantly, I don't see what reordering
the barrier could prevent - a barrier is useful for things like sequencing two
memory accesses to happen in the intended order - but unloggedLSN doesn't
interact with another variable that's accessed within the ControlFileLock
afaict.


> That's not to say that I entirely understand the point of this patch.
> It seems like a generally reasonable thing to do AFAICT but somehow I
> wonder if there's more to the story here, because it doesn't feel like
> it would be easy to measure the benefit of this patch in isolation.
> That's not a reason to reject it, though, just something that makes me
> curious.

I doubt it's a meaningful, if even measurable win. But removing atomic ops and
reducing shared memory space isn't a bad thing, even if there's no immediate
benefit...

Greetings,

Andres Freund




Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Tom Lane
Andres Freund  writes:
> Dropping CPUs without native atomic operations / without a way to do tear-free
> 8 byte reads would make several substantial performance improvements easier,
> while not really dropping any relevant platform.

Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit
platforms?  I'd be on board with this if so, but it sounds a bit
optimistic.

regards, tom lane




Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 14:33:06 +, Hans Buschmann wrote:
> I recently stumbled over the following Intel proposal for dropping 32bit 
> support in x86 processors. [1]

It's a proposal for something in the future. Which, even if implemented as is,
will affect future hardware, several years down the line. I don't think that's
a good reason for removing 32 bit support in postgres.

And postgres is used on non-x86 architectures...


> This inspired me to propose dropping 32bit support for PostgreSQL starting
> with PG17.
> ...
> Even if I am not a postgres hacker I suppose this could simplify things quite 
> a lot.

There's some simplification, but I don't think it'd be that much.

I do think there are code removals and simplifications that would be bigger
than dropping 32bit support.

Dropping support for effectively-obsolete compilers like sun studio (requires
random environment variables to be exported to not run out of memory) and
AIX's xlc (requires a lot of extra compiler flags to be passed in for a sane
build) would remove a fair bit of code.

Dropping CPUs without native atomic operations / without a way to do tear-free
8 byte reads would make several substantial performance improvements easier,
while not really dropping any relevant platform.

Etc.

Greetings,

Andres Freund




Re: testing dist tarballs

2023-05-24 Thread Tom Lane
Andres Freund  writes:
> First thing I noticed that 'make dist' doesn't work in a vpath, failing in a
> somewhat obscure way (likely because in a vpath build the the copy from the
> source dir doesn't include GNUMakefile). Do we expect it to work?

Don't see how it could possibly be useful in a vpath, because you'd have
the real source files and the generated files in different trees.

regards, tom lane




Re: SyncRepWaitForLSN waits for XLogFlush?

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote:
> I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the
> XLOG has been flushed locally up to that location before the record is
> shipped off to standbys?

No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to (and
do) call XLogFlush() separately. See e.g. the XLogFlush() call in
RecordTransactionCommit().

Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed would
not lead for data to be prematurely sent out - walsender won't send data that
hasn't yet been flushed. So a backend with such a spurious SyncRepWaitForLSN()
would just wait until the LSN is actually flushed and *then* replicated.

Any specific reason for that question?

Greetings,

Andres Freund




Re: Wrong results due to missing quals

2023-05-24 Thread Tom Lane
I wrote:
> ... Another idea is that maybe we need another
> RestrictInfo field that's directly a set of OJ relids that this clause
> can't be applied above.  That'd reduce clause_is_computable_at to
> basically a bms_intersect test which would be nice speed-wise.  The
> space consumption could be annoying, but I'm thinking that we might
> only have to populate the field in clone clauses, which would
> alleviate that issue.

I tried this and it seems to work all right: it fixes the example
you showed while not causing any new failures.  (Doesn't address
the broken join-removal logic you showed in the other thread,
though.)

While at it, I also changed make_restrictinfo to treat has_clone
and is_clone as first-class citizens, to fix the dubious coding in
equivclass.c that I mentioned at [1].

regards, tom lane

[1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 428ea3810f..c5cada55fb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6521,8 +6521,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
 	  expr,
 	  true,
 	  false,
+	  false,
+	  false,
 	  root->qual_security_level,
 	  grouped_rel->relids,
+	  NULL,
 	  NULL);
 			if (is_foreign_expr(root, grouped_rel, expr))
 fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo);
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index f1a96b8584..2ab4f3dbf3 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -539,14 +539,13 @@ set includes the A/B join relid which is not in the input.  However,
 if we form B/C after A/B, then both forms of the clause are applicable
 so far as that test can tell.  We have to look more closely to notice
 that the "Pbc" clause form refers to relation B which is no longer
-directly accessible.  While this check is straightforward, it's not
-especially cheap (see clause_is_computable_at()).  To avoid doing it
-unnecessarily, we mark the variant versions of a redundant clause as
-either "has_clone" or "is_clone".  When considering a clone clause,
-we must check clause_is_computable_at() to disentangle which version
-to apply at the current join level.  (In debug builds, we also Assert
-that non-clone clauses are validly computable at the current level;
-but that seems too expensive for production usage.)
+directly accessible.  While such a check could be performed using the
+per-relation RelOptInfo.nulling_relids data, it would be annoyingly
+expensive to do over and over as we consider different join paths.
+To make this simple and reliable, we compute an "incompatible_relids"
+set for each variant version (clone) of a redundant clause.  A clone
+clause should not be applied if any of the outer-join relids listed in
+incompatible_relids has already been computed below the current join.
 
 
 Optimizer Functions
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 2db1bf6448..7fa502d6e2 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -197,9 +197,12 @@ process_equivalence(PlannerInfo *root,
 make_restrictinfo(root,
   (Expr *) ntest,
   restrictinfo->is_pushed_down,
+  restrictinfo->has_clone,
+  restrictinfo->is_clone,
   restrictinfo->pseudoconstant,
   restrictinfo->security_level,
   NULL,
+  restrictinfo->incompatible_relids,
   restrictinfo->outer_relids);
 		}
 		return false;
@@ -1972,7 +1975,8 @@ create_join_clause(PlannerInfo *root,
  * clause into the regular processing, because otherwise the join will be
  * seen as a clauseless join and avoided during join order searching.
  * We handle this by generating a constant-TRUE clause that is marked with
- * required_relids that make it a join between the correct relations.
+ * the same required_relids etc as the removed outer-join clause, thus
+ * making it a join clause between the correct relations.
  */
 void
 reconsider_outer_join_clauses(PlannerInfo *root)
@@ -2001,10 +2005,13 @@ reconsider_outer_join_clauses(PlannerInfo *root)
 /* throw back a dummy replacement clause (see notes above) */
 rinfo = make_restrictinfo(root,
 		  (Expr *) makeBoolConst(true, false),
-		  true, /* is_pushed_down */
+		  rinfo->is_pushed_down,
+		  rinfo->has_clone,
+		  rinfo->is_clone,
 		  false,	/* pseudoconstant */
 		  0,	/* security_level */
 		  rinfo->required_relids,
+		  rinfo->incompatible_relids,
 		  rinfo->outer_relids);
 distribute_restrictinfo_to_rels(root, rinfo);
 			}
@@ -2026,10 +2033,13 @@ reconsider_outer_join_clauses(PlannerInfo *root)
 /* throw back a dummy 

Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 13:06:30 -0400, Jonathan S. Katz wrote:
> PostgreSQL 16 Feature Highlights
> 
> 
> ### Performance
> 
> PostgreSQL 16 includes performance improvements in query execution. This 
> release
> adds more query parallelism, including allowing `FULL` and `RIGHT` joins to
> execute in parallel, and parallel execution of the `string_agg` and 
> `array_agg`
> aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in
> `SELECT DISTINCT` queries. There are also several optimizations for
> [window 
> queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS),
> improvements in lookups for `RANGE` and `LIST` partitions, and support for
> "anti-joins" in `RIGHT` and `OUTER` queries.
> 
> This release also introduces support for CPU acceleration using SIMD for both
> x86 and ARM architectures, including optimizations for processing ASCII and 
> JSON
> strings, and array and subtransaction searches. Additionally, PostgreSQL 16
> introduces [load 
> balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS)
> to libpq, the client library for PostgreSQL.

I think the relation extension improvements ought to be mentioned here as
well? Up to 3x faster concurrent data load with COPY seems practically
relevant.

Greetings,

Andres Freund




testing dist tarballs

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-23 14:51:03 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I guess I need to go and check how long the "release" tarball generation
> > takes...
> 
> It's quick except for the documentation-generating steps.  Maybe
> we could test that part only once?

First thing I noticed that 'make dist' doesn't work in a vpath, failing in a
somewhat obscure way (likely because in a vpath build the the copy from the
source dir doesn't include GNUMakefile). Do we expect it to work?


Besides docs, the slowest part appears to be gzip --best and then bzip2, as
those runs serially and takes 11 and 13 seconds respectively here...

The first thing I tried was:
  make -j8 dist GZIP=pigz BZIP2=pbzip2

unfortunately that results in

pigz: abort: cannot provide files in GZIP environment variable

echo GZIP=pigz >> src/Makefile.custom
echo BZIP2=pbzip2 >> src/Makefile.custom

reduces that to

real1m6.472s
user1m28.316s
sys 0m5.340s

real0m54.811s
user1m42.078s
sys 0m6.183s

still not great...


OTOH, we currently already build the docs as part of the CompilerWarnings
test. I don't think there's a reason to test that twice?


For me make distcheck currently fails:

In file included from ../../src/include/postgres.h:46,
 from hashfn.c:24:
../../src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such 
file or directory
   79 | #include "utils/errcodes.h"
  |  ^~
compilation terminated.
make[3]: *** [: hashfn.o] Error 1

at first I thought it was due to my use of -j8 - but it doesn't even work
without that.


That's due to MAKELEVEL:

submake-generated-headers:
ifndef NO_GENERATED_HEADERS
ifeq ($(MAKELEVEL),0)
$(MAKE) -C $(top_builddir)/src/backend generated-headers
endif
endif

So the distcheck target needs to reset MAKELEVEL=0 - unless somebody has a
better idea?


Separately, it's somewhat confusing that we include errcodes.h etc in
src/backend/utils, rather than its final location, in src/include/utils. It
works, even without perl, because copying the file doesn't require perl, it's
just generating it...

Greetings,

Andres Freund




Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)

2023-05-24 Thread Justin Pryzby
I added documentation for the SQL functions in 001.
And updated to say 17

I'm planning to set this patch as ready - it has not changed
significantly in 18 months.  Not for the first time, I've implemented a
workaround at a higher layer.

-- 
Justin
>From 917e5e36d14018b6de457ba9eafe8936c0e88c3c Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 13 Jul 2021 21:25:48 -0500
Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() ..

See also: 358a897fa, 528ac10c7
---
 doc/src/sgml/func.sgml  |  39 ++
 src/backend/utils/adt/dbsize.c  | 132 
 src/include/catalog/pg_proc.dat |  19 +
 3 files changed, 190 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce43434..6cbf4e9aa56 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27762,6 +27762,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset

   
 
+  
+   
+
+ pg_namespace_size
+
+pg_namespace_size ( name )
+bigint
+   
+   
+pg_namespace_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations in the namespace (schema)
+with the specified name or OID. To use this function, you must
+have CREATE privilege on the specified namespace
+or have privileges of the pg_read_all_stats role,
+unless it is the default namespace for the current database.
+   
+  
+
+  
+   
+
+ pg_am_size
+
+pg_am_size ( name )
+bigint
+   
+   
+pg_am_size ( oid )
+bigint
+   
+   
+Computes the total disk space used by relations using the access method
+with the specified name or OID.
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index e5c0f1c45b6..af0955d1790 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -13,19 +13,25 @@
 
 #include 
 
+#include "access/genam.h"
 #include "access/htup_details.h"
 #include "access/relation.h"
+#include "access/table.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_database.h"
+#include "catalog/pg_namespace.h"
 #include "catalog/pg_tablespace.h"
 #include "commands/dbcommands.h"
+#include "commands/defrem.h"
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
 #include "utils/numeric.h"
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
@@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+/*
+ * Return the sum of size of relations for which the given attribute of
+ * pg_class matches the specified OID value.
+ */
+static int64
+calculate_size_attvalue(AttrNumber attnum, Oid attval)
+{
+	int64		totalsize = 0;
+	ScanKeyData skey;
+	Relation	pg_class;
+	SysScanDesc scan;
+	HeapTuple	tuple;
+
+	ScanKeyInit(, attnum,
+BTEqualStrategyNumber, F_OIDEQ, attval);
+
+	pg_class = table_open(RelationRelationId, AccessShareLock);
+	scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, );
+	while (HeapTupleIsValid(tuple = systable_getnext(scan)))
+	{
+		Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);
+		Relation	rel;
+
+		rel = try_relation_open(classtuple->oid, AccessShareLock);
+		if (!rel)
+			continue;
+
+		for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum);
+
+		relation_close(rel, AccessShareLock);
+	}
+
+	systable_endscan(scan);
+	table_close(pg_class, AccessShareLock);
+	return totalsize;
+}
+
+/* Compute the size of relations in a schema (namespace) */
+static int64
+calculate_namespace_size(Oid nspOid)
+{
+	/*
+	 * User must be a member of pg_read_all_stats or have CREATE privilege for
+	 * target namespace.
+	 */
+	if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		AclResult	aclresult;
+
+		aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, OBJECT_SCHEMA,
+		   get_namespace_name(nspOid));
+	}
+
+	return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid);
+}
+
+Datum
+pg_namespace_size_oid(PG_FUNCTION_ARGS)
+{
+	Oid			nspOid = PG_GETARG_OID(0);
+	int64		size;
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	PG_RETURN_INT64(size);
+}
+
+Datum
+pg_namespace_size_name(PG_FUNCTION_ARGS)
+{
+	Name		nspName = PG_GETARG_NAME(0);
+	Oid			nspOid = get_namespace_oid(NameStr(*nspName), false);
+	int64		size;
+
+	size = calculate_namespace_size(nspOid);
+
+	if (size < 0)
+		PG_RETURN_NULL();
+
+	

Re: walsender performance regression due to logical decoding on standby changes

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 05:53:51 +, Zhijie Hou (Fujitsu) wrote:
> On Tuesday, May 23, 2023 1:53 AM Andres Freund  wrote:
> > On 2023-05-22 12:15:07 +, Zhijie Hou (Fujitsu) wrote:
> > > About "a backend doing logical decoding", do you mean the case when a
> > user
> > > start a backend and invoke pg_logical_slot_get_changes() to do the logical
> > > decoding ? If so, it seems the logical decoding in a backend won't be 
> > > waked
> > up
> > > by startup process because the backend won't be registered as a walsender
> > so
> > > the backend won't be found in WalSndWakeup().
> > 
> > I meant logical decoding happening inside a walsender instance.
> > 
> > 
> > > Or do you mean the deadlock between the real logical walsender and startup
> > > process ? (I might miss something) I think the logical decoding doesn't 
> > > lock
> > > the target user relation when decoding because it normally can get the
> > needed
> > > information from WAL.
> > 
> > It does lock catalog tables briefly. There's no guarantee that such locks 
> > are
> > released immediately. I forgot the details, but IIRC there's some outfuncs
> > (enum?) that intentionally delay releasing locks till transaction commit.
> 
> Thanks for the explanation !
> 
> I understand that the startup process can take lock on the catalog(when
> replaying record) which may conflict with the lock in walsender.
> 
> But in walsender, I think we only start transaction after entering
> ReorderBufferProcessTXN(), and the transaction started here will be released
> soon after processing and outputting the decoded transaction's data(as the
> comment in ReorderBufferProcessTXN() says:" all locks acquired in here to be
> released, not reassigned to the parent and we do not want any database access
> have persistent effects.").

It's possible that there's no immediate danger - but I wouldn't want to bet on
it staying that way. Think about things like streaming out large transactions
etc.  There also are potential dangers outside of plain things like locks -
there could be a recovery conflicts on the snapshot or such (although that
normally should be prevented via hot_standby_feedback or such).


Even if there's no hard deadlock issue - not waking up a
walsender-in-logical-decoding that's currently waiting because (replay_count %
N) != 0 means that the walsender might be delayed for quite a while - there
might not be any further records from the primary in the near future.


I don't think any approach purely based on record counts has any chance to
work well.  You could combine it with something else, e.g. with always waking
up in XLogPageRead() and WaitForWALToBecomeAvailable().  But then you end up
with something relatively complicated again.

Greetings,

Andres Freund




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 21:56:22 +0200, Tomas Vondra wrote:
> >> The really hard thing was determining what causes the memory leak - the
> >> simple instrumentation doesn't help with that at all. It tells you there
> >> might be a leak, but you don't know where did the allocations came from.
> >>
> >> What I ended up doing is a simple gdb script that sets breakpoints on
> >> all palloc/pfree variants, and prints info (including the backtrace) for
> >> each call on ExecutorState. And then a script that aggregate those to
> >> identify which backtraces allocated most chunks that were not freed.
> > 
> > FWIW, for things like this I found "heaptrack" to be extremely helpful.
> > 
> > E.g. for a reproducer of the problem here, it gave me the attach "flame 
> > graph"
> > of the peak memory usage - after attaching to a running backend and running 
> > an
> > UPDATE triggering the leak..
> > 
> > Because the view I screenshotted shows the stacks contributing to peak 
> > memory
> > usage, it works nicely to find "temporary leaks", even if memory is actually
> > freed after all etc.
> > 
> 
> That's a nice visualization, but isn't that useful only once you
> determine there's a memory leak? Which I think is the hard problem.

So is your gdb approach, unless I am misunderstanding? The view I
screenshotted shows the "peak" allocated memory, if you have a potential leak,
you can see where most of the allocated memory was allocated. Which at least
provides you with a good idea of where to look for a problem in more detail.


> >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a 
> >>> shorter
> >>> lived memory context instead? Otherwise we'll just end up with the same
> >>> problem in a few years.
> >>>
> >>
> >> I agree using a shorter lived memory context would be more elegant, and
> >> more in line with how we do things. But it's not clear to me why we'd
> >> end up with the same problem in a few years with what the patch does.
> > 
> > Because it sets up the pattern of manual memory management and continues to
> > run the relevant code within a query-lifetime context.
> > 
> 
> Oh, you mean someone might add new allocations to this code (or into one
> of the functions executed from it), and that'd leak again? Yeah, true.

Yes. It's certainly not obvious this far down that we are called in a
semi-long-lived memory context.

Greetings,

Andres Freund




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 21:49:13 +0200, Tomas Vondra wrote:
> >> and then have to pass updatedCols elsewhere. It's tricky to just switch
> >> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as
> >> AfterTriggerSaveEvent allocates other bits of memory too (in a longer
> >> lived context).
> > 
> > Hm - on a quick look the allocations in trigger.c itself are done with
> > MemoryContextAlloc().
> > 
> > I did find a problematic path, namely that ExecGetChildToRootMap() ends up
> > building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext.
> > 
> > That seems like a flat out bug to me - we can't just store data in a
> > ResultRelInfo without ensuring the memory is lives long enough. Nearby 
> > places
> > like ExecGetRootToChildMap() do make sure to switch to es_query_cxt.
> > 
> > 
> > Did you see other bits of memory getting allocated in CurrentMemoryContext?
> > 
> 
> No, I simply tried to do the context switch and then gave up when it
> crashed on the ExecGetRootToChildMap info. I haven't looked much
> further, but you may be right it's the only bit.
> 
> It didn't occur to me it might be a bug - I think the code simply
> assumes it gets called with suitable memory context, just like we do in
> various other places. Maybe it should document the assumption.

I think architecturally, code storing stuff in a ResultRelInfo - which has
query lifetime - ought to be careful to allocate memory with such lifetime.
Note that the nearby ExecGetRootToChildMap() actually is careful to do so.


> >> So we'd have to do another switch again. Not sure how
> >> backpatch-friendly would that be.
> > 
> > Yea, that's a valid concern. I think it might be reasonable to use something
> > like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a
> > short-lived context for the trigger invocations in >= 16.


Hm - stepping back a bit, why are we doing the work in ExecGetAllUpdatedCols()
over and over?  Unless I am missing something, the result doesn't change
across rows. And it doesn't look that cheap to compute, leaving aside the
allocation that bms_union() does.

It's visible in profiles, not as a top entry, but still.

Perhaps the easiest to backpatch fix is to just avoid recomputing the value?
But perhaps it'd be just as problmeatic, because callers might modify
ExecGetAllUpdatedCols()'s return value in place...

Greetings,

Andres Freund




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra
On 5/24/23 20:14, Andres Freund wrote:
> Hi,
> 
> On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote:
>> On 5/23/23 19:14, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote:
 This means that for an UPDATE with triggers, we may end up calling this
 for each row, possibly multiple bitmaps. And those bitmaps are allocated
 in ExecutorState, so won't be freed until the end of the query :-(
>>>
>>> Ugh.
>>>
>>>
>>> I've wondered about some form of instrumentation to detect such issues
>>> before. It's obviously a problem that we can have fairly large leaks, like 
>>> the
>>> one you just discovered, without detecting it for a couple years. It's kinda
>>> made worse by the memory context infrastructure, because it hides such 
>>> issues.
>>>
>>> Could it help to have a mode where the executor shutdown hook checks how 
>>> much
>>> memory is allocated in ExecutorState and warns if its too much? There's 
>>> IIRC a
>>> few places that allocate large things directly in it, but most of those
>>> probably should be dedicated contexts anyway.  Something similar could be
>>> useful for some other long-lived contexts.
>>>
>>
>> Not sure such simple instrumentation would help much, unfortunately :-(
>>
>> We only discovered this because the user reported OOM crashes, which
>> means the executor didn't get to the shutdown hook at all. Yeah, maybe
>> if we had such instrumentation it'd get triggered for milder cases, but
>> I'd bet the amount of noise is going to be significant.
>>
>> For example, there's a nearby thread about hashjoin allocating buffiles
>> etc. in ExecutorState - we ended up moving that to a separate context,
>> but surely there are more such cases (not just for ExecutorState).
> 
> Yes, that's why I said that we would have to more of those into dedicated
> contexts - which is a good idea independent of this issue.
> 

Yeah, I think that's a good idea in principle.

> 
>> The really hard thing was determining what causes the memory leak - the
>> simple instrumentation doesn't help with that at all. It tells you there
>> might be a leak, but you don't know where did the allocations came from.
>>
>> What I ended up doing is a simple gdb script that sets breakpoints on
>> all palloc/pfree variants, and prints info (including the backtrace) for
>> each call on ExecutorState. And then a script that aggregate those to
>> identify which backtraces allocated most chunks that were not freed.
> 
> FWIW, for things like this I found "heaptrack" to be extremely helpful.
> 
> E.g. for a reproducer of the problem here, it gave me the attach "flame graph"
> of the peak memory usage - after attaching to a running backend and running an
> UPDATE triggering the leak..
> 
> Because the view I screenshotted shows the stacks contributing to peak memory
> usage, it works nicely to find "temporary leaks", even if memory is actually
> freed after all etc.
> 

That's a nice visualization, but isn't that useful only once you
determine there's a memory leak? Which I think is the hard problem.

> 
> 
>>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a shorter
>>> lived memory context instead? Otherwise we'll just end up with the same
>>> problem in a few years.
>>>
>>
>> I agree using a shorter lived memory context would be more elegant, and
>> more in line with how we do things. But it's not clear to me why we'd
>> end up with the same problem in a few years with what the patch does.
> 
> Because it sets up the pattern of manual memory management and continues to
> run the relevant code within a query-lifetime context.
> 

Oh, you mean someone might add new allocations to this code (or into one
of the functions executed from it), and that'd leak again? Yeah, true.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote:
> While looking for other places allocating stuff in ExecutorState (for
> the UPDATE case) and leaving it there, I found two more cases:

For a bit I thought there was a similar problem in ExecWithCheckOptions() -
but we error out immediately afterwards, so there's no danger of accumulating
memory.


I find it quite depressing that we have at least four copies of:

/*
 * If the tuple has been routed, it's been converted to the partition's
 * rowtype, which might differ from the root table's.  We must convert 
it
 * back to the root table's rowtype so that val_desc in the error 
message
 * matches the input tuple.
 */
if (resultRelInfo->ri_RootResultRelInfo)
{
ResultRelInfo *rootrel = resultRelInfo->ri_RootResultRelInfo;
TupleDesc   old_tupdesc;
AttrMap*map;

root_relid = RelationGetRelid(rootrel->ri_RelationDesc);
tupdesc = RelationGetDescr(rootrel->ri_RelationDesc);

old_tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc);
/* a reverse map */
map = build_attrmap_by_name_if_req(old_tupdesc, tupdesc, false);

/*
 * Partition-specific slot's tupdesc can't be changed, so 
allocate a
 * new one.
 */
if (map != NULL)
slot = execute_attr_map_slot(map, slot,

 MakeTupleTableSlot(tupdesc, ));
modifiedCols = bms_union(ExecGetInsertedCols(rootrel, estate),
 
ExecGetUpdatedCols(rootrel, estate));
}


Once in ExecPartitionCheckEmitError(), *twice* in ExecConstraints(),
ExecWithCheckOptions(). Some of the partitioning stuff has been added in a
really myopic way.

Greetings,

Andres Freund




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra



On 5/24/23 20:55, Andres Freund wrote:
> Hi,
> 
> On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote:
>> I looked at this again, and I think GetPerTupleMemoryContext(estate)
>> might do the trick, see the 0002 part.
> 
> Yea, that seems like the right thing here.
> 
> 
>> Unfortunately it's not much
>> smaller/simpler than just freeing the chunks, because we end up doing
>>
>> oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>> updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
>> MemoryContextSwitchTo(oldcxt);
> 
> We could add a variant of ExecGetAllUpdatedCols that switches the context.
> 

Yeah, we could do that. I was thinking about backpatching, and modifying
 ExecGetAllUpdatedCols signature would be ABI break, but adding a
variant should be fine.

> 
>> and then have to pass updatedCols elsewhere. It's tricky to just switch
>> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as
>> AfterTriggerSaveEvent allocates other bits of memory too (in a longer
>> lived context).
> 
> Hm - on a quick look the allocations in trigger.c itself are done with
> MemoryContextAlloc().
> 
> I did find a problematic path, namely that ExecGetChildToRootMap() ends up
> building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext.
> 
> That seems like a flat out bug to me - we can't just store data in a
> ResultRelInfo without ensuring the memory is lives long enough. Nearby places
> like ExecGetRootToChildMap() do make sure to switch to es_query_cxt.
> 
> 
> Did you see other bits of memory getting allocated in CurrentMemoryContext?
> 

No, I simply tried to do the context switch and then gave up when it
crashed on the ExecGetRootToChildMap info. I haven't looked much
further, but you may be right it's the only bit.

It didn't occur to me it might be a bug - I think the code simply
assumes it gets called with suitable memory context, just like we do in
various other places. Maybe it should document the assumption.

> 
>> So we'd have to do another switch again. Not sure how
>> backpatch-friendly would that be.
> 
> Yea, that's a valid concern. I think it might be reasonable to use something
> like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a
> short-lived context for the trigger invocations in >= 16.
> 

WFM


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote:
> I looked at this again, and I think GetPerTupleMemoryContext(estate)
> might do the trick, see the 0002 part.

Yea, that seems like the right thing here.


> Unfortunately it's not much
> smaller/simpler than just freeing the chunks, because we end up doing
> 
> oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
> MemoryContextSwitchTo(oldcxt);

We could add a variant of ExecGetAllUpdatedCols that switches the context.


> and then have to pass updatedCols elsewhere. It's tricky to just switch
> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as
> AfterTriggerSaveEvent allocates other bits of memory too (in a longer
> lived context).

Hm - on a quick look the allocations in trigger.c itself are done with
MemoryContextAlloc().

I did find a problematic path, namely that ExecGetChildToRootMap() ends up
building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext.

That seems like a flat out bug to me - we can't just store data in a
ResultRelInfo without ensuring the memory is lives long enough. Nearby places
like ExecGetRootToChildMap() do make sure to switch to es_query_cxt.


Did you see other bits of memory getting allocated in CurrentMemoryContext?


> So we'd have to do another switch again. Not sure how
> backpatch-friendly would that be.

Yea, that's a valid concern. I think it might be reasonable to use something
like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a
short-lived context for the trigger invocations in >= 16.

Greetings,

Andres Freund




Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Andres Freund
Hi,

On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote:
> On 5/23/23 19:14, Andres Freund wrote:
> > Hi,
> > 
> > On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote:
> >> This means that for an UPDATE with triggers, we may end up calling this
> >> for each row, possibly multiple bitmaps. And those bitmaps are allocated
> >> in ExecutorState, so won't be freed until the end of the query :-(
> > 
> > Ugh.
> > 
> > 
> > I've wondered about some form of instrumentation to detect such issues
> > before. It's obviously a problem that we can have fairly large leaks, like 
> > the
> > one you just discovered, without detecting it for a couple years. It's kinda
> > made worse by the memory context infrastructure, because it hides such 
> > issues.
> > 
> > Could it help to have a mode where the executor shutdown hook checks how 
> > much
> > memory is allocated in ExecutorState and warns if its too much? There's 
> > IIRC a
> > few places that allocate large things directly in it, but most of those
> > probably should be dedicated contexts anyway.  Something similar could be
> > useful for some other long-lived contexts.
> > 
> 
> Not sure such simple instrumentation would help much, unfortunately :-(
> 
> We only discovered this because the user reported OOM crashes, which
> means the executor didn't get to the shutdown hook at all. Yeah, maybe
> if we had such instrumentation it'd get triggered for milder cases, but
> I'd bet the amount of noise is going to be significant.
> 
> For example, there's a nearby thread about hashjoin allocating buffiles
> etc. in ExecutorState - we ended up moving that to a separate context,
> but surely there are more such cases (not just for ExecutorState).

Yes, that's why I said that we would have to more of those into dedicated
contexts - which is a good idea independent of this issue.


> The really hard thing was determining what causes the memory leak - the
> simple instrumentation doesn't help with that at all. It tells you there
> might be a leak, but you don't know where did the allocations came from.
> 
> What I ended up doing is a simple gdb script that sets breakpoints on
> all palloc/pfree variants, and prints info (including the backtrace) for
> each call on ExecutorState. And then a script that aggregate those to
> identify which backtraces allocated most chunks that were not freed.

FWIW, for things like this I found "heaptrack" to be extremely helpful.

E.g. for a reproducer of the problem here, it gave me the attach "flame graph"
of the peak memory usage - after attaching to a running backend and running an
UPDATE triggering the leak..

Because the view I screenshotted shows the stacks contributing to peak memory
usage, it works nicely to find "temporary leaks", even if memory is actually
freed after all etc.



> > Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a shorter
> > lived memory context instead? Otherwise we'll just end up with the same
> > problem in a few years.
> >
> 
> I agree using a shorter lived memory context would be more elegant, and
> more in line with how we do things. But it's not clear to me why we'd
> end up with the same problem in a few years with what the patch does.

Because it sets up the pattern of manual memory management and continues to
run the relevant code within a query-lifetime context.

Greetings,

Andres Freund


Question about error message in auth.c

2023-05-24 Thread Dave Cramer
Greetings,

In
https://github.com/postgres/postgres/blob/5c2c59ba0b5f723b067a6fa8bf8452d41fbb2125/src/backend/libpq/auth.c#L463

The last piece of information is the encryption state. However when an SSL
connection fails to authenticate the state is not encrypted.

When would it ever be encrypted if authentication fails ?

Dave Cramer


Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra



On 5/24/23 17:37, Tom Lane wrote:
> Tomas Vondra  writes:
>> While looking for other places allocating stuff in ExecutorState (for
>> the UPDATE case) and leaving it there, I found two more cases:
> 
>> 1) copy_plpgsql_datums
> 
>> 2) make_expanded_record_from_tupdesc
>>make_expanded_record_from_exprecord
> 
>> All of this is calls from plpgsql_exec_trigger.
> 
> Can you show a test case in which this happens?  I added some
> instrumentation and verified at least within our regression tests,
> copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's
> "SPI Proc" context, so I do not see how there can be a query-lifespan
> leak there, nor how your 0003 would fix it if there is.
> 

Interesting. I tried to reproduce it, but without success, and it passes
even with an assert on the context name. The only explanation I have is
that the gdb script I used might have been a bit broken - it used
conditional breakpoints like this one:

  break AllocSetAlloc if strcmp(((MemoryContext) $rdi)->name, \
"ExecutorState") == 0
  commands
  bt
  cont
  end

but I just noticed gdb sometimes complains about this:

  Error in testing breakpoint condition:
  '__strcmp_avx2' has unknown return type; cast the call to its declared
  return type

The breakpoint still fires all the commands, which is pretty surprising
behavior, but that might explain why I saw copy_plpgsql_data as another
culprit. And I suspect the make_expanded_record calls might be caused by
the same thing.

I'll check deeper tomorrow, when I get access to the original script
etc. We can ignore these cases until then.

Sorry for the confusion :-/


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PG 16 draft release notes ready

2023-05-24 Thread Bruce Momjian
On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote:
> > * Reduce palloc() memory overhead for all memory allocations down to 8
> > bytes on all platforms. (Andres Freund, David Rowley)
> > 
> > This allows more efficient use of memory and is especially useful in
> > queries which perform operations (such as sorting or hashing) that
> > require more than work_mem.
> 
> Well, this would go in the source code section, but it seems too
> internal and global to mention.

What was the previous memory allocation overhead?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-05-24 Thread Bruce Momjian
On Wed, May 24, 2023 at 08:48:56AM +1200, David Rowley wrote:
> On Tue, 23 May 2023 at 06:04, Bruce Momjian  wrote:
> >
> > On Mon, May 22, 2023 at 10:59:36AM -0700, Andres Freund wrote:
> > > And here it's not just performance, but also memory usage, including 
> > > steady
> > > state memory usage.
> >
> > Understood.  I continue to need help determining which items to include.
> > Can you suggest some text?  This?
> >
> > Improve efficiency of memory usage to allow for better scaling
> 
> Maybe something like:
> 
> * Reduce palloc() memory overhead for all memory allocations down to 8
> bytes on all platforms. (Andres Freund, David Rowley)
> 
> This allows more efficient use of memory and is especially useful in
> queries which perform operations (such as sorting or hashing) that
> require more than work_mem.

Well, this would go in the source code section, but it seems too
internal and global to mention.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PostgreSQL 16 Beta 1 release announcement draft

2023-05-24 Thread Jonathan S. Katz

On 5/22/23 3:23 PM, Erik Rijkers wrote:

Op 5/21/23 om 19:07 schreef Jonathan S. Katz:

On 5/19/23 12:17 AM, Jonathan S. Katz wrote:

Hi,

Attached is a draft of the release announcement for PostgreSQL 16 
Beta Please provide feedback no later than May 24, 0:00 AoE. This 
will give 
Thanks everyone for your feedback. Here is the updated text that 


'substransaction'  should be
'subtransaction'


Fixed.


'use thousands separators'  perhaps is better:
'use underscore as digit-separator, as in `5_432` and `1_00_000`'


I looked at how other languages document this, and they do use the term 
"thousands separators." I left that in, but explicitly called out the 
underscore.



'instrcut'  should be
'instruct'


Fixed. Attached is the (hopefully) final draft.

Thanks,

Jonathan

The PostgreSQL Global Development Group announces that the first beta release of
PostgreSQL 16 is now [available for 
download](https://www.postgresql.org/download/).
This release contains previews of all features that will be available when
PostgreSQL 16 is made generally available, though some details of the release
can change during the beta period.

You can find information about all of the features and changes found in
PostgreSQL 16 in the [release 
notes](https://www.postgresql.org/docs/16/release-16.html):

  
[https://www.postgresql.org/docs/16/release-16.html](https://www.postgresql.org/docs/16/release-16.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you
to test the new features of PostgreSQL 16 on your systems to help us eliminate
bugs or other issues that may exist. While we do not advise you to run
PostgreSQL 16 Beta 1 in production environments, we encourage you to find ways
to run your typical application workloads against this beta release.

Your testing and feedback will help the community ensure that the PostgreSQL 16
release upholds our standards of delivering a stable, reliable release of the
world's most advanced open source relational database. Please read more about
our [beta testing process](https://www.postgresql.org/developer/beta/) and how
you can contribute:

  
[https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

PostgreSQL 16 Feature Highlights


### Performance

PostgreSQL 16 includes performance improvements in query execution. This release
adds more query parallelism, including allowing `FULL` and `RIGHT` joins to
execute in parallel, and parallel execution of the `string_agg` and `array_agg`
aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in
`SELECT DISTINCT` queries. There are also several optimizations for
[window 
queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS),
improvements in lookups for `RANGE` and `LIST` partitions, and support for
"anti-joins" in `RIGHT` and `OUTER` queries.

This release also introduces support for CPU acceleration using SIMD for both
x86 and ARM architectures, including optimizations for processing ASCII and JSON
strings, and array and subtransaction searches. Additionally, PostgreSQL 16
introduces [load 
balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS)
to libpq, the client library for PostgreSQL.

### Logical Replication Enhancements

Logical replication lets PostgreSQL users stream data in real-time to other
PostgreSQL or other external systems that implement the logical protocol. Until
PostgreSQL 16, users could only create logical replication publishers on primary
instances. PostgreSQL 16 adds the ability to perform logical decoding on a
standby instance, giving users more options to distribute their workload, for
example, use a standby that's less busy than a primary to logically replicate
changes.

PostgreSQL 16 also includes several performance improvements to logical
replication. This includes allowing the subscriber to apply large transactions
in parallel, use indexes other than the `PRIMARY KEY` to perform lookups during
`UPDATE` or `DELETE` operations, and allow for tables to be copied using binary
format during initialization.

### Developer Experience

PostgreSQL 16 continues to implement the 
[SQL/JSON](https://www.postgresql.org/docs/16/functions-json.html)
standard for manipulating 
[JSON](https://www.postgresql.org/docs/16/datatype-json.html)
data, including support for SQL/JSON constructors (e.g. `JSON_ARRAY()`,
`JSON_ARRAYAGG()` et al), and identity functions (`IS JSON`). This release also
adds the SQL standard 
[`ANY_VALUE`](https://www.postgresql.org/docs/16/functions-aggregate.html#id-1.5.8.27.5.2.4.1.1.1.1)
aggregate function, which returns any arbitrary value from the aggregate set.
For convenience, PostgreSQL 16 now lets you specify non-decimal integer
literals, such as `0xff`, `0o777`, and `0b101010`, and use underscores as
thousands separators, such as `5_432`.

This release adds support for the extended query protocol to the 

Re: PG 16 draft release notes ready

2023-05-24 Thread Jonathan S. Katz

On 5/24/23 12:13 AM, David Rowley wrote:

On Wed, 24 May 2023 at 15:54, Bruce Momjian  wrote:


On Wed, May 24, 2023 at 08:37:45AM +1200, David Rowley wrote:

On Mon, 22 May 2023 at 07:05, Jonathan S. Katz  wrote:

* Parallel execution of queries that use `FULL` and `OUTER` joins


I think this should be `RIGHT` joins rather than `OUTER` joins.

LEFT joins have been parallelizable I think for a long time now.


Well, since we can swap left/right easily, why would we not have just
have swappted the tables and done the join in the past?  I think there
are two things missing in my description.

First, I need to mention parallel _hash_ join.  Second, I think this
item is saying that the _inner_ side of a parallel hash join can be an
OUTER or FULL join.  How about?

 Allow hash joins to be parallelized where the inner side is
 processed as an OUTER or FULL join (Melanie Plageman, Thomas Munro)

In this case, the inner side is the hashed side.


I think Jonathan's text is safe to swap OUTER to RIGHT as it mentions
"execution".


I made this swap in the release announcement. Thanks!

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Order changes in PG16 since ICU introduction

2023-05-24 Thread Joe Conway

On 5/24/23 11:39, Jeff Davis wrote:

On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:

In practice we're probably getting the "und" ICU locale whereas "fr"
would be appropriate.


This is a good point and illustrates that ICU is not a drop-in
replacement for libc in all cases.

I don't see a solution here that doesn't involve some rough edges,
though. "Locale" is a generic term, and if we continue to insist that
it really means a libc locale, then ICU will never be on an equal
footing with libc, let alone the preferred provider.


Huge +1

IMHO the experience should be unified to the degree possible.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: PG 16 draft release notes ready

2023-05-24 Thread Bruce Momjian
On Wed, May 24, 2023 at 04:57:59PM +0200, Erik Rijkers wrote:
> Op 5/24/23 om 15:58 schreef Bruce Momjian:
> > On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
> > > 
> > > On Wed, May 24, 2023 at 11:19 AM Bruce Momjian  wrote:
> 
> Typos:
> 
> 'from standbys servers'  should be
> 'from standby servers'
> 
> 'reindexedb'  should be
> 'reindexdb'
>   (2x: the next line mentions, erroneously,  'reindexedb --system')
> 
> 'created only created'  should be
> 'only created'
>   (I think)
> 
> 'could could'  should be
> 'could'
> 
> 'are now require the role'  should be
> 'now require the role'
> 
> 'values is'  should be
> 'value is'
> 
> 'to marked'  should be
> 'to be marked'

All good, patch attached and applied.  Updated docs are at:

https://momjian.us/pgsql_docs/release-16.html

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index bb92fe5cf9..88d6514ad7 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -27,7 +27,7 @@
 
 
  
-  Allow logical replication from standbys servers
+  Allow logical replication from standby servers
  
 
 
@@ -126,11 +126,11 @@ Author: Michael Paquier 
 
 
 
-Change REINDEX DATABASE and reindexedb to not process indexes on system catalogs (Simon Riggs)
+Change REINDEX DATABASE and reindexdb to not process indexes on system catalogs (Simon Riggs)
 
 
 
-Processing such indexes is still possible using REINDEX SYSTEM and reindexedb --system.
+Processing such indexes is still possible using REINDEX SYSTEM and reindexdb --system.
 
 
 
@@ -593,7 +593,7 @@ Create subscription statistics entries at subscription creation time so stats_re
 
 
 
-Previously entries were created only created when the first statistics were reported.
+Previously entries were created only when the first statistics were reported.
 
 
 
@@ -777,7 +777,7 @@ Simplify permissions for LOCK TABLE (Jeff Davis)
 
 
 
-Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions.  For example, UPDATE could could perform all lock levels except ACCESS SHARE, which
+Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions.  For example, UPDATE could perform all lock levels except ACCESS SHARE, which
 required SELECT permissions.  Now UPDATE can issue all lock levels.  MORE?
 
 
@@ -808,7 +808,7 @@ Restrict the privileges of CREATEROLE roles (Robert Haas)
 
 
 
-Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role.  Such changes, including adding members, are now require the role requesting the change to have ADMIN OPTION
+Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role.  Such changes, including adding members, now require the role requesting the change to have ADMIN OPTION
 permission.
 
 
@@ -946,7 +946,7 @@ Add dependency tracking of grantors for GRANT records (Robert Haas)
 
 
 
-This will guarantee that pg_auth_members.grantor values is always valid.
+This guarantees that pg_auth_members.grantor values are always valid.
 
 
 
@@ -3017,7 +3017,7 @@ Author: Tom Lane 
 
 
 
-Allow required extensions to marked as non-relocatable using "no_relocate" (Regina Obe)
+Allow required extensions to be marked as non-relocatable using "no_relocate" (Regina Obe)
 
 
 


Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tom Lane
Tomas Vondra  writes:
> On 5/23/23 23:39, Tom Lane wrote:
>> FWIW, I've had some success localizing palloc memory leaks with valgrind's
>> leak detection mode.  The trick is to ask it for a report before the
>> context gets destroyed.  Beats writing your own infrastructure ...

> I haven't tried valgrind, so can't compare.
> Would it be possible to filter which memory contexts to track? Say we
> know the leak is in ExecutorState, so we don't need to track allocations
> in other contexts. That was a huge speedup for me, maybe it'd help
> valgrind too.

I don't think valgrind has a way to do that, but this'd be something you
set up specially in any case.

> Also, how did you ask for the report before context gets destroyed?

There are several valgrind client requests that could be helpful:

/* Do a full memory leak check (like --leak-check=full) mid-execution. */
#define VALGRIND_DO_LEAK_CHECK   \
VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,   \
0, 0, 0, 0, 0)

/* Same as VALGRIND_DO_LEAK_CHECK but only showing the entries for
   which there was an increase in leaked bytes or leaked nr of blocks
   since the previous leak search. */
#define VALGRIND_DO_ADDED_LEAK_CHECK\
VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,  \
0, 1, 0, 0, 0)

/* Return number of leaked, dubious, reachable and suppressed bytes found by
   all previous leak checks.  They must be lvalues.  */
#define VALGRIND_COUNT_LEAK_BLOCKS(leaked, dubious, reachable, suppressed) \

Putting VALGRIND_DO_ADDED_LEAK_CHECK someplace in the executor loop would
help narrow things down pretty quickly, assuming you had a self-contained
example demonstrating the leak.  I don't recall exactly how I used these
but it was something along that line.

>> Yeah, it's not clear whether we could make the still-hypothetical check
>> sensitive enough to find leaks using small test cases without getting an
>> unworkable number of false positives.  Still, might be worth trying.

> I'm not against experimenting with that. Were you thinking about
> something that'd be cheap enough to just be enabled always/everywhere,
> or something we'd enable during testing?

We seem to have already paid the overhead of counting all palloc
allocations, so I don't think it'd be too expensive to occasionally check
the ExecutorState's mem_allocated and see if it's growing (especially
if we only do so in assert-enabled builds).  The trick is to define the
rules for what's worth reporting.

>> It might be an acceptable tradeoff to have stricter rules for what can
>> be allocated in ExecutorState in order to make this sort of problem
>> more easily detectable.

> Would these rules be just customary, or defined/enforced in the code
> somehow? I can't quite imagine how would that work, TBH.

If the check bleated "WARNING: possible executor memory leak" during
regression testing, people would soon become conditioned to doing
whatever they have to do to avoid it ;-)

regards, tom lane




Re: Order changes in PG16 since ICU introduction

2023-05-24 Thread Jeff Davis
On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:
> While I agree that the LOCALE option in CREATE DATABASE is
> counter-intuitive,

I think it's more than that. As Andreww Gierth pointed out:

   $ initdb --locale=fr_FR
   ...
 ICU locale:  en-US
   ...

Is more than just counter-intuitive. I don't think we can ship 16 that
way.

>  I find it questionable that blending ICU
> and libc locales into it helps that much with the user experience.

Thank you for going through some examples here. I agree that it's not
perfect, but we need some path to a reasonable ICU user experience, and
I think we'll have to accept some rough edges to avoid the worst cases,
like above.

> initdb:
> 
>   Using default ICU locale "fr".
>   Using language tag "fr" for ICU locale "fr".
> 

...

> #1
> 
> postgres=# create database test1 locale='fr_FR.UTF-8';
> NOTICE:  using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
> ERROR:  new ICU locale (fr-FR) is incompatible with the ICU locale of

I don't see a problem here. If you specify LOCALE to CREATE DATABASE,
you should either be using "TEMPLATE template0", or you should be
expecting an error if the LOCALE doesn't match exactly.

What would you like to see happen here?

> #2
> 
> postgres=# create database test2 locale='C.UTF-8'
> template='template0';
> NOTICE:  using standard form "en-US-u-va-posix" for ICU locale
> "C.UTF-8"
> CREATE DATABASE
> 
> 
> en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
> this interpretation is arguably not what a user would expect.

As you pointed out, this is not settled in libc either:

https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f%40manitou-mail.org

We really can't expect a particular order for a particular locale name,
unless we handle it specially like "C" or "POSIX". If we pass it to the
provider, we have to trust the provider to match our conceptual
expectations for that locale (and ideally version it properly).

> I would expect the ICU warning or error (icu_validation_level) to
> kick
> in instead of that transliteration.

Earlier versions of ICU (<= 63) do this transformation automatically,
and I don't see a reason to throw an error if ICU considers it valid.
The language tag en-US-u-va-posix will be stored in the catalog, and
that will be considered valid in later versions of ICU.

Later versions of ICU (>= 64) consider locales with a language name of
"C" to be obsolete and no longer valid. I added code to do the
transformation without error in these later versions, but I think we
have agreement to remove it.

If a user specifies the locale as "C.UTF-8", we can either pass it to
ICU and see whether that version accepts it or not (and if not, throw a
warning/error); or if we decide that "C.UTF-8" really means "C", we can
handle it in the memcmp() path like C and never send it to ICU.

> #3
> 
> $ grep french /etc/locale.alias
> french  fr_FR.ISO-8859-1
> 
> postgres=# create database test3 locale='french' template='template0'
> encoding='LATIN1';
> WARNING:  ICU locale "french" has unknown language "french"
> HINT:  To disable ICU locale validation, set parameter
> icu_validation_level
> to DISABLED.
> CREATE DATABASE
> 
> 
> In practice we're probably getting the "und" ICU locale whereas "fr"
> would
> be appropriate.

This is a good point and illustrates that ICU is not a drop-in
replacement for libc in all cases.

I don't see a solution here that doesn't involve some rough edges,
though. "Locale" is a generic term, and if we continue to insist that
it really means a libc locale, then ICU will never be on an equal
footing with libc, let alone the preferred provider.

Regards,
Jeff Davis






Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tom Lane
Tomas Vondra  writes:
> While looking for other places allocating stuff in ExecutorState (for
> the UPDATE case) and leaving it there, I found two more cases:

> 1) copy_plpgsql_datums

> 2) make_expanded_record_from_tupdesc
>make_expanded_record_from_exprecord

> All of this is calls from plpgsql_exec_trigger.

Can you show a test case in which this happens?  I added some
instrumentation and verified at least within our regression tests,
copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's
"SPI Proc" context, so I do not see how there can be a query-lifespan
leak there, nor how your 0003 would fix it if there is.

regards, tom lane




Re: Wrong results due to missing quals

2023-05-24 Thread Tom Lane
Richard Guo  writes:
> So the qual 't2.a = t5.a' is missing.

Ugh.

> I looked into it and found that both clones of this joinqual are
> rejected by clause_is_computable_at, because their required_relids do
> not include the outer join of t2/(t3/t4), and meanwhile include nullable
> rels of this outer join.
> I think the root cause is that, as Tom pointed out in [1], we're not
> maintaining required_relids very accurately.  In b9c755a2, we make
> clause_is_computable_at test required_relids for clone clauses.  I think
> this is how this issue sneaks in.

Yeah.  I'm starting to think that b9c755a2 took the wrong approach.
Really, required_relids is about making sure that a qual isn't
evaluated "too low", before all necessary joins have been formed.  But
clause_is_computable_at is charged with making sure we don't evaluate
it "too high", after some incompatible join has been formed.  There's
no really good reason to suppose that required_relids can serve both
purposes, even if it were computed perfectly accurately (and what is
perfect, anyway?).

So right now I'm playing with the idea of reverting the change in
clause_is_computable_at and seeing how else we can fix the previous
bug.  Don't have anything to show yet, but one thought is that maybe
deconstruct_distribute_oj_quals needs to set up clause_relids for
clone clauses differently.  Another idea is that maybe we need another
RestrictInfo field that's directly a set of OJ relids that this clause
can't be applied above.  That'd reduce clause_is_computable_at to
basically a bms_intersect test which would be nice speed-wise.  The
space consumption could be annoying, but I'm thinking that we might
only have to populate the field in clone clauses, which would
alleviate that issue.

regards, tom lane




Re: PG 16 draft release notes ready

2023-05-24 Thread Erik Rijkers

Op 5/24/23 om 15:58 schreef Bruce Momjian:

On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:


On Wed, May 24, 2023 at 11:19 AM Bruce Momjian  wrote:


Typos:

'from standbys servers'  should be
'from standby servers'

'reindexedb'  should be
'reindexdb'
  (2x: the next line mentions, erroneously,  'reindexedb --system')

'created only created'  should be
'only created'
  (I think)

'could could'  should be
'could'

'are now require the role'  should be
'now require the role'

'values is'  should be
'value is'

'to marked'  should be
'to be marked'


thanks,
Erik






SyncRepWaitForLSN waits for XLogFlush?

2023-05-24 Thread Tejasvi Kashi
Hi everyone,

I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the
XLOG has been flushed locally up to that location before the record is
shipped off to standbys?

Regards,
Tej


Re: Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Tom Lane
Hans Buschmann  writes:
> This inspired me to propose dropping 32bit support for PostgreSQL starting 
> with PG17.

I don't think this is a great idea.  Even if Intel isn't interested,
there'll be plenty of 32-bit left in the lower end of the market
(think ARM, IoT, and so on).

regards, tom lane




Proposal: Removing 32 bit support starting from PG17++

2023-05-24 Thread Hans Buschmann
I recently stumbled over the following Intel proposal for dropping 32bit 
support in x86 processors. [1]


This inspired me to propose dropping 32bit support for PostgreSQL starting with 
PG17.


It seems obvious that production systems mostly won't use newly installed 32 
bit native code in late 2024 and beyond.


A quick inspection of the buildfarm only showed a very limited number of 32bit 
systems.


This proposal follows the practice for Windows which already (practically) 
dropped 32bit support.


32bit systems may get continuing support in the backbranches until PG16 retires 
in 2028.


Even if I am not a postgres hacker I suppose this could simplify things quite a 
lot.


Any thoughts for discussion are welcome!


Hans Buschmann


[1] https://www.phoronix.com/news/Intel-X86-S-64-bit-Only




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Tom Lane
Daniel Gustafsson  writes:
> It would be nice it the OpenSSL project could grant us an LTS license for a
> buildfarm animal to ensure compatibility but I have no idea how realistic that
> is (or how much the LTS version of 1.0.2 has diverged from the last available
> public 1.0.2 version).

Surely the answer must be "not much".  The entire point of an LTS
version is to not have to change dusty calling applications.

We had definitely better have some animals still using 1.0.2, but
I don't see much reason to think that the last public release
wouldn't be good enough.

regards, tom lane




Re: Make pgbench exit on SIGINT more reliably

2023-05-24 Thread Tristan Partin
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote:
> On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote:
> > The way that pgbench handled SIGINT changed in
> > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a
> > couple of unintended consequences, at least from what I can tell[1].
> > 
> > - CTRL-C no longer stops the program unless the right point in pgbench
> >   execution is hit
> > - pgbench no longer exits with a non-zero exit code
> > 
> > An easy reproduction of these problems is to run with a large scale
> > factor like: pgbench -i -s 50. Then try to CTRL-C the program.
>
> This comes from the code path where the data is generated client-side,
> and where the current CancelRequested may not be that responsive,
> isn't it?

It comes from the fact that CancelRequested is only checked within the
generation of the pgbench_accounts table, but with a large enough scale
factor or enough latency, say cross-continent communication, generating
the other tables can be time consuming too. But, yes you are more likely
to run into this issue when generating client-side data.

-- 
Tristan Partin
Neon (https://neon.tech)




Re: PG 16 draft release notes ready

2023-05-24 Thread Bruce Momjian
On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
> 
> On Wed, May 24, 2023 at 11:19 AM Bruce Momjian  wrote:
> >
> > Second, you might be correct that the section is wrong.  I thought of
> > CPU instructions as something tied to the compiler, so part of the build
> > process or source code, but the point we should be make is that we have
> > these acceleration, not how it is implemented.  We can move the entire
> > group to the "General Performance" section, or we can split it out:
> 
> Splitting out like that seems like a good idea to me. 

Okay, items split into sections and several merged.  I left the
CPU-specific parts in Source Code, and moved the rest into a merged item
in General Performance, but moved the JSON item to Data Types.

Patch attached, and you can see the results at:

https://momjian.us/pgsql_docs/release-16.html

> The last one refers to new internal functions, so it could stay in source 
> code.
> (Either way, we don't want to imply that arrays of SQL types are accelerated
> this way, it's so far only for internal arrays.)

Good point.  I called them "C arrays" but it it into the General
Performance item.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
commit ad5406246b
Author: Bruce Momjian 
Date:   Wed May 24 09:54:34 2023 -0400

doc: PG 16 relnotes, merge and move vector items

Reported-by: John Naylor

Discussion: https://postgr.es/m/CAFBsxsEPg8L2MmGqavc8JByC=wf_mnkhn-kknfpkcqh0hyd...@mail.gmail.com

diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index c30c530065..bb92fe5cf9 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -472,6 +472,28 @@ Author: David Rowley 
 
 Improve the speed of updating the process title (David Rowley)
 
+
+
+
+
+
+
+Allow xid/subxid searches and ASCII string detection to use vector operations (Nathan Bossart)
+
+
+
+ASCII detection is particularly useful for COPY FROM.  Vector operations are also used for some C array searches.
+
+
 
 
  
@@ -1781,6 +1803,17 @@ The IS JSON checks include checks for values, arrays, objects, scalars, and uniq
 
 
 
+
+
+
+
+Allow JSON string parsing to use vector operations (John Naylor)
+
+
+
 
-
-
-
-Allow ASCII string detection to use vector operations (John Naylor)
-
-
-
-
-
-
-
-Allow JSON string parsing to use vector operations (John Naylor)
-
-
-
-ARM?
-
-
-
-
-
-
-
-Allow array searches to use vector operations (John Naylor)
-
-
-
-
-
-
-
-Allow xid/subxid searches to use vector operations (Nathan Bossart)
-
-
-
 

Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra


On 5/23/23 22:57, Tomas Vondra wrote:
> 
> 
> On 5/23/23 18:39, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> it seems there's a fairly annoying memory leak in trigger code,
>>> introduced by
>>> ...
>>> Attached is a patch, restoring the pre-12 behavior for me.
>>
>>> While looking for other places allocating stuff in ExecutorState (for
>>> the UPDATE case) and leaving it there, I found two more cases:
>>
>>> 1) copy_plpgsql_datums
>>
>>> 2) make_expanded_record_from_tupdesc
>>>make_expanded_record_from_exprecord
>>
>>> All of this is calls from plpgsql_exec_trigger.
>>
>> Not sure about the expanded-record case, but both of your other two
>> fixes feel like poor substitutes for pushing the memory into a
>> shorter-lived context.  In particular I'm quite surprised that
>> plpgsql isn't already allocating that workspace in the "procedure"
>> memory context.
>>
> 
> I don't disagree, but which memory context should this use and
> when/where should we switch to it?
> 
> I haven't seen any obvious memory context candidate in the code
> calling ExecGetAllUpdatedCols, so I guess we'd have to pass it from
> above. Is that a good idea for backbranches ...
> 

I looked at this again, and I think GetPerTupleMemoryContext(estate)
might do the trick, see the 0002 part. Unfortunately it's not much
smaller/simpler than just freeing the chunks, because we end up doing

oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
MemoryContextSwitchTo(oldcxt);

and then have to pass updatedCols elsewhere. It's tricky to just switch
to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as
AfterTriggerSaveEvent allocates other bits of memory too (in a longer
lived context). So we'd have to do another switch again. Not sure how
backpatch-friendly would that be.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom 85439e2587f035040c82e7303b96b887e650b01f Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 23 May 2023 17:45:47 +0200
Subject: [PATCH 1/3] Free updatedCols bitmaps

---
 src/backend/commands/trigger.c  | 22 --
 src/backend/executor/execMain.c |  9 +++--
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 4b295f8da5e..fb11203c473 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2916,6 +2916,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo)
 	(errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
 	 errmsg("BEFORE STATEMENT trigger cannot return a value")));
 	}
+
+	bms_free(updatedCols);
 }
 
 void
@@ -2928,12 +2930,18 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 	Assert(relinfo->ri_RootResultRelInfo == NULL);
 
 	if (trigdesc && trigdesc->trig_update_after_statement)
+	{
+		Bitmapset *updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
+
 		AfterTriggerSaveEvent(estate, relinfo, NULL, NULL,
 			  TRIGGER_EVENT_UPDATE,
 			  false, NULL, NULL, NIL,
-			  ExecGetAllUpdatedCols(relinfo, estate),
+			  updatedCols,
 			  transition_capture,
 			  false);
+
+		bms_free(updatedCols);
+	}
 }
 
 bool
@@ -3043,6 +3051,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 heap_freetuple(trigtuple);
 			if (should_free_new)
 heap_freetuple(oldtuple);
+
+			bms_free(updatedCols);
+
 			return false;		/* "do nothing" */
 		}
 		else if (newtuple != oldtuple)
@@ -3068,6 +3079,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
 	if (should_free_trig)
 		heap_freetuple(trigtuple);
 
+	bms_free(updatedCols);
+
 	return true;
 }
 
@@ -3107,6 +3120,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 		 */
 		TupleTableSlot *oldslot;
 		ResultRelInfo *tupsrc;
+		Bitmapset *updatedCols;
 
 		Assert((src_partinfo != NULL && dst_partinfo != NULL) ||
 			   !is_crosspart_update);
@@ -3129,14 +3143,18 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
 		else
 			ExecClearTuple(oldslot);
 
+		updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
+
 		AfterTriggerSaveEvent(estate, relinfo,
 			  src_partinfo, dst_partinfo,
 			  TRIGGER_EVENT_UPDATE,
 			  true,
 			  oldslot, newslot, recheckIndexes,
-			  ExecGetAllUpdatedCols(relinfo, estate),
+			  updatedCols,
 			  transition_capture,
 			  is_crosspart_update);
+
+		bms_free(updatedCols);
 	}
 }
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c76fdf59ec4..98502b956c2 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2367,6 +2367,7 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo)
 {
 	Bitmapset  *keyCols;
 	Bitmapset  *updatedCols;
+	LockTupleMode	lockMode;
 
 	/*
 	 * Compute lock mode to use.  If columns that are part of the key have not
@@ -2378,9 

Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra



On 5/24/23 10:19, Jakub Wartak wrote:
> Hi, just two cents:
> 
> On Tue, May 23, 2023 at 8:01 PM Andres Freund  wrote:
>>
>> Hi,
>>
>> On 2023-05-23 13:28:30 -0400, Tom Lane wrote:
>>> Andres Freund  writes:
 Could it help to have a mode where the executor shutdown hook checks how 
 much
 memory is allocated in ExecutorState and warns if its too much?
>>>
>>> It'd be very hard to set a limit for what's "too much", since the amount
>>> of stuff created initially will depend on the plan size.
>>
>> I was thinking of some limit that should really never be reached outside of a
>> leak or work_mem based allocations, say 2GB or so.
> 
> RE: instrumentation subthread:
> if that helps then below technique can work somewhat good on normal
> binaries for end users (given there are debug symbols installed), so
> maybe we don't need that much infrastructure added in to see the hot
> code path:
> 
> perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on
> x86_64(palloc size arg0)
> perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p
>  sleep 3 # cannot be longer, huge overhead (~3s=~2GB)
> 
> it produces:
> 50.27%  (563d0e380670) size=24
> |
> ---palloc
>bms_copy
>ExecUpdateLockMode
>ExecBRUpdateTriggers
>ExecUpdate
> [..]
> 
> 49.73%  (563d0e380670) size=16
> |
> ---palloc
>bms_copy
>RelationGetIndexAttrBitmap
>ExecUpdateLockMode
>ExecBRUpdateTriggers
>ExecUpdate
> [..]
> 
> Now we know that those small palloc() are guilty, but we didn't know
> at the time with Tomas. The problem here is that we do not know in
> palloc() - via its own arguments for which MemoryContext this is going
> to be allocated for. This is problematic for perf, because on RHEL8, I
> was not able to generate an uprobe that was capable of reaching a
> global variable (CurrentMemoryContext) at that time.
> 

I think there are a couple even bigger issues:

(a) There are other methods that allocate memory - repalloc, palloc0,
MemoryContextAlloc, ... and so on. But presumably we can trace all of
them at once? We'd have to trace reset/deletes too.

(b) This doesn't say if/when the allocated chunks get freed - even with
a fix, we'll still do exactly the same number of allocations, except
that we'll free the memory shortly after that. I wonder if we could
print a bit more info for each probe, to match the alloc/free requests.

> Additionally what was even more frustrating on diagnosing that case on
> the customer end system, was that such OOMs were crashing other
> PostgreSQL clusters on the same OS. Even knowing the exact guilty
> statement it was impossible to limit RSS memory usage of that
> particular backend. So, what you are proposing makes a lot of sense.
> Also it got me thinking of implementing safety-memory-net-GUC
> debug_query_limit_backend_memory=X MB that would inject
> setrlimit(RLIMIT_DATA) through external extension via hook(s) and
> un-set it later, but the man page states it works for mmap() only
> after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 -
> maybe that's still good enough?; Or, well maybe try to hack a palloc()
> a little, but that has probably too big overhead, right? (just
> thinking loud).
> 

Not sure about setting a hard limit - that seems pretty tricky and may
easily backfire. It's already possible to set such memory limit using
e.g. cgroups, or even better use VMs to isolate the instances.

I certainly agree it's annoying that when OOM hits, we end up with no
information about what used the memory. Maybe we could have a threshold
triggering a call to MemoryContextStats? So that we have at least some
memory usage info in the log in case the OOM killer intervenes.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Large files for relations

2023-05-24 Thread Robert Haas
On Wed, May 24, 2023 at 2:18 AM Peter Eisentraut
 wrote:
> > What I'm hearing is that something simple like this might be more 
> > acceptable:
> >
> > * initdb --rel-segsize (cf --wal-segsize), default unchanged
>
> makes sense

+1.

> > * pg_upgrade would convert if source and target don't match
>
> This would be good, but it could also be an optional or later feature.

+1. I think that would be nice to have, but not absolutely required.

IMHO it's best not to overcomplicate these projects. Not everything
needs to be part of the initial commit. If the initial commit happens
2 months from now and then stuff like this gets added over the next 8,
that's strictly better than trying to land the whole patch set next
March.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Atomic ops for unlogged LSN

2023-05-24 Thread Robert Haas
On Tue, May 23, 2023 at 6:26 PM Michael Paquier  wrote:
> Spinlocks provide a full memory barrier, which may not the case with
> add_u64() or read_u64().  Could memory ordering be a problem in these
> code paths?

It could be a huge problem if what you say were true, but unless I'm
missing something, the comments in atomics.h say that it isn't. The
comments for the 64-bit functions say to look at the 32-bit functions,
and there it says:

/*
 * pg_atomic_add_fetch_u32 - atomically add to variable
 *
 * Returns the value of ptr after the arithmetic operation.
 *
 * Full barrier semantics.
 */

Which is probably a good thing, because I'm not sure what good it
would be to have an operation that both reads and modifies an atomic
variable but has no barrier semantics.

That's not to say that I entirely understand the point of this patch.
It seems like a generally reasonable thing to do AFAICT but somehow I
wonder if there's more to the story here, because it doesn't feel like
it would be easy to measure the benefit of this patch in isolation.
That's not a reason to reject it, though, just something that makes me
curious.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Docs: Encourage strong server verification with SCRAM

2023-05-24 Thread Daniel Gustafsson
> On 23 May 2023, at 23:02, Stephen Frost  wrote:
> * Jacob Champion (jchamp...@timescale.com) wrote:

>> - low iteration counts accepted by the client make it easier than it
>> probably should be for a MITM to brute-force passwords (note that
>> PG16's scram_iterations GUC, being server-side, does not mitigate
>> this)
> 
> This would be good to improve on.

The mechanics of this are quite straighforward, the problem IMHO lies in how to
inform and educate users what a reasonable iteration count is, not to mention
what an iteration count is in the first place.

> Perhaps more succinctly- maybe we should be making adjustments to the
> current language instead of just adding a new paragraph.

+1

--
Daniel Gustafsson





Re: pgsql: TAP test for logical decoding on standby

2023-05-24 Thread Drouvot, Bertrand

Hi,

On 5/23/23 5:15 PM, Robert Haas wrote:

On Sat, Apr 8, 2023 at 5:26 AM Andres Freund  wrote:

TAP test for logical decoding on standby


Small nitpicks:

1. The test names generated by check_slots_conflicting_status() start
with a capital letter, while most other test names start with a
lower-case letter.



Yeah, not sure that would deserve a fix for its own but if we address 2.
then let's do 1. too.


2. The function is called 7 times, 6 with a true argument and 1 with a
false argument, but the test name only depends on whether the argument
is true or false, so we get the same test name 6 times. Maybe there's
not a reasonable way to do better, I'm not sure, but it's not ideal.



I agree that's not ideal (but one could still figure out which one is
failing if any by looking at the perl script).

If we want to "improve" this, what about passing a second argument that
would provide more context in the test name?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Wrong results due to missing quals

2023-05-24 Thread Richard Guo
Testing with SQLancer reports a wrong results issue on master and I
reduced it to the repro query below.

create table t (a int, b int);

explain (costs off)
select * from t t1 left join
(t t2 left join t t3 full join t t4 on false on false)
left join t t5 on t2.a = t5.a
on t2.b = 1;
QUERY PLAN
--
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Materialize
 ->  Nested Loop Left Join
   ->  Nested Loop Left Join
 Join Filter: false
 ->  Seq Scan on t t2
   Filter: (b = 1)
 ->  Result
   One-Time Filter: false
   ->  Materialize
 ->  Seq Scan on t t5
(12 rows)

So the qual 't2.a = t5.a' is missing.

I looked into it and found that both clones of this joinqual are
rejected by clause_is_computable_at, because their required_relids do
not include the outer join of t2/(t3/t4), and meanwhile include nullable
rels of this outer join.

I think the root cause is that, as Tom pointed out in [1], we're not
maintaining required_relids very accurately.  In b9c755a2, we make
clause_is_computable_at test required_relids for clone clauses.  I think
this is how this issue sneaks in.

To fix it, it seems to me that the ideal way would be to always compute
accurate required_relids.  But I'm not sure how difficult it is.

[1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us

Thanks
Richard


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Daniel Gustafsson
> On 24 May 2023, at 11:52, Michael Paquier  wrote:
> 
> On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote:
>> 1.0.2 is also an LTS version available commercially for premium support
>> customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh
>> slated for release next week.  This raises the likelyhood of Postgres
>> installations using 1.0.2 in production still, and for some time to come.
> 
> Good point.  Indeed, that makes it pretty clear that not dropping
> 1.0.2 would be the best option for the time being, so 0001 would be
> enough.

I think thats the right move re 1.0.2 support.  1.0.2 is also the version in
RHEL7 which is in ELS until 2026.

When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6
using 1.0.1, and RHEL6 goes out of ELS in late June 2024 seems possible to drop
1.0.1 support during v17.  I haven't studied the patch yet but I'll have a look
at it.

> I am wondering if we should worry about having a buildfarm member that
> could test these binaries, though, in case they have compatibility
> issues..  But it would be harder to debug without the code at hand, as
> well.

It would be nice it the OpenSSL project could grant us an LTS license for a
buildfarm animal to ensure compatibility but I have no idea how realistic that
is (or how much the LTS version of 1.0.2 has diverged from the last available
public 1.0.2 version).

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Michael Paquier
On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote:
> 1.0.2 is also an LTS version available commercially for premium support
> customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh
> slated for release next week.  This raises the likelyhood of Postgres
> installations using 1.0.2 in production still, and for some time to come.

Good point.  Indeed, that makes it pretty clear that not dropping
1.0.2 would be the best option for the time being, so 0001 would be
enough.

I am wondering if we should worry about having a buildfarm member that
could test these binaries, though, in case they have compatibility
issues..  But it would be harder to debug without the code at hand, as
well.
--
Michael


signature.asc
Description: PGP signature


Re: RFI: Extending the TOAST Pointer

2023-05-24 Thread Matthias van de Meent
On Tue, 23 May 2023 at 18:34, Robert Haas  wrote:
>
> On Thu, May 18, 2023 at 8:06 AM Matthias van de Meent
>  wrote:
> > This enum still has many options to go before it exceeds the maximum
> > of the uint8 va_tag field. Therefore, I don't think we have no disk
> > representations left, nor do I think we'll need to add another option
> > to the ToastCompressionId enum.
> > As an example, we can add another VARTAG option for dictionary-enabled
> > external toast; like what the pluggable toast patch worked on. I think
> > we can salvage some ideas from that patch, even if the main idea got
> > stuck.
>
> Adding another VARTAG option is somewhat different from adding another
> ToastCompressionId. I think that right now we have embedded in various
> places the idea that VARTAG_EXTERNAL is the only thing that shows up
> on disk, and we'd need to track down all such places and adjust them
> if we add other VARTAG types in the future. Depending on how it is to
> be used, adding a new ToastCompressionId might be less work. However,
> I don't think we can use the possibility of adding a new VARTAG value
> as a reason why it's OK to use up the last possible ToastCompressionId
> value for something non-extensible.

I think you might not have picked up on what I was arguing for, but I
agree with what you just said.

My comment on not needing to invent a new ToastCompressionId was on
the topic of adding capabilities^ to toast pointers that do things
differently than the current TOAST and need more info than just sizes,
2x 32-bit ID and a compression algorithm.

^ capabilities such as compression dictionaries (which would need to
store a dictionary ID in the pointer), TOAST IDs that are larger than
32 bits, and other such advances.

Kind regards,

Matthias van de Meent
Neon, Inc.




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Daniel Gustafsson
> On 24 May 2023, at 10:22, Michael Paquier  wrote:

> The removal of 1.0.2 would be nice, but the tweaks
> needed for LibreSSL make it less appealing.

1.0.2 is also an LTS version available commercially for premium support
customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh
slated for release next week.  This raises the likelyhood of Postgres
installations using 1.0.2 in production still, and for some time to come.

--
Daniel Gustafsson





Re: RFI: Extending the TOAST Pointer

2023-05-24 Thread Nikita Malakhov
Hi!

I've made a WIP patch that uses 64-bit TOAST value ID instead of 32-bit,
and sent it as a part of discussion, but there was no feedback on such a
solution. There was a link to that discussion at the top of this thread.

Also, I have to note that, based on our work on Pluggable TOAST - extending
TOAST pointer with additional structures would require review of the logical
replication engine, currently it is not suitable for any custom TOAST
pointers.
Currently we have no final solution for problems with logical replication
for
custom TOAST pointers.

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Tomas Vondra
On 5/23/23 23:39, Tom Lane wrote:
> Tomas Vondra  writes:
>> The really hard thing was determining what causes the memory leak - the
>> simple instrumentation doesn't help with that at all. It tells you there
>> might be a leak, but you don't know where did the allocations came from.
> 
>> What I ended up doing is a simple gdb script that sets breakpoints on
>> all palloc/pfree variants, and prints info (including the backtrace) for
>> each call on ExecutorState. And then a script that aggregate those to
>> identify which backtraces allocated most chunks that were not freed.
> 
> FWIW, I've had some success localizing palloc memory leaks with valgrind's
> leak detection mode.  The trick is to ask it for a report before the
> context gets destroyed.  Beats writing your own infrastructure ...
> 

I haven't tried valgrind, so can't compare.

Would it be possible to filter which memory contexts to track? Say we
know the leak is in ExecutorState, so we don't need to track allocations
in other contexts. That was a huge speedup for me, maybe it'd help
valgrind too.

Also, how did you ask for the report before context gets destroyed?

>> Would require testing with more data, though. I doubt we'd find much
>> with our regression tests, which have tiny data sets.
> 
> Yeah, it's not clear whether we could make the still-hypothetical check
> sensitive enough to find leaks using small test cases without getting an
> unworkable number of false positives.  Still, might be worth trying.

I'm not against experimenting with that. Were you thinking about
something that'd be cheap enough to just be enabled always/everywhere,
or something we'd enable during testing?

This reminded me a strangeloop talk [1] [2] about the Scalene memory
profiler from UMass. That's for Python, but they did some smart tricks
to reduce the cost of profiling - maybe we could do something similar,
possibly by extending the memory contexts a bit.

[1] https://youtu.be/vVUnCXKuNOg?t=1405
[2] https://youtu.be/vVUnCXKuNOg?t=1706

> It might be an acceptable tradeoff to have stricter rules for what can
> be allocated in ExecutorState in order to make this sort of problem
> more easily detectable.
> 

Would these rules be just customary, or defined/enforced in the code
somehow? I can't quite imagine how would that work, TBH.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-24 Thread Michael Paquier
Hi all,

$Subject says it all.  Based on an analysis of the code, I can note
the following when it comes to the removal of 1.0.1:
- A lot of code related to channel binding is now simplified as
X509_get_signature_nid() always exists, mostly around its CFLAGS.
- This removes the comments related to 1.0.1e introduced by 74242c2.

Then for 1.0.2, the following flags can be gone:
HAVE_ASN1_STRING_GET0_DATA
HAVE_BIO_GET_DATA
HAVE_BIO_METH_NEW
HAVE_HMAC_CTX_FREE
HAVE_HMAC_CTX_NEW

It would be nice to remove CRYPTO_lock(), but from what I can see the
function still exists in LibreSSL, meaning that libpq locking wouldn't
be thread-safe if these function's paths are removed.

Another related question is how much do we care about builds with
LibreSSL with MSVC?  This patch sets takes the simple path of assuming
that this has never been really tested, and if you look at the MSVC
scripts on HEAD we rely on a version number from OpenSSL, which is not
something LibreSSL copes nicely with already, as documented in
configure.ac.

OpenSSL 1.0.2 has been EOLd at the end of 2019, and 1.0.1 had its last
minor release in September 2019, so with Postgres 17 targetted in
September/October 2024, we would be five years behind that.

Last comes the buildfarm, and I suspect that a few animals are
building with 1.0.2.  Among what I have spotted:
- wobbegong and vulpes, on Fedora 27, though I could not find
references about the version used there.
- bonito, on Fedora 29.
- SUSE 12 SP{4,5} have 1.0.2 as their newer version.
- butterflyfish may not like that, if I recall correctly, as it should
have 1.0.2.

So, it seems to me that 1.0.1 would be a rather silent move for the
buildfarm, and 1.0.2 could lead to some noise.  Note as well that
1.1.0 support has been stopped by upstream at the same time as 1.0.1,
with a last minor release in September 2019, though that feels like a
huge jump at this stage.  On a code basis, removing 1.0.1 leads to the
most cleanup.  The removal of 1.0.2 would be nice, but the tweaks
needed for LibreSSL make it less appealing.

Attached are two patches to remove support for 1.0.1 and 1.0.2 for
now, kept separated for clarity, to be considered as something to do
at the beginning of the v17 cycle.  0001 is in a rather good shape
seen from here.

Now, for 0002 and the removal of 1.0.2, I am seeing two things once
OPENSSL_API_COMPAT is bumped from 0x10002000L to 0x1010L:
- be-secure-openssl.c requires an extra openssl/bn.h, which is not a
big deal, from what I get.
- Much more interesting: OpenSSL_add_all_algorithms() has two macros
that get removed, see include/openssl/evp.h:
#  ifdef OPENSSL_LOAD_CONF
#   define OpenSSL_add_all_algorithms() \
OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
| OPENSSL_INIT_ADD_ALL_DIGESTS \
| OPENSSL_INIT_LOAD_CONFIG, NULL)
#  else
#   define OpenSSL_add_all_algorithms() \
OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
| OPENSSL_INIT_ADD_ALL_DIGESTS, NULL)
#  endif

The part where I am not quite sure of what to do is about
OPENSSL_LOAD_CONF.  We could call directly OPENSSL_init_crypto() and
add OPENSSL_INIT_LOAD_CONFIG if building with OPENSSL_LOAD_CONF, but
it feels a bit ugly to copy-paste this code from OpenSSL itself.
Note that patch 0002 still has OPENSSL_API_COMPAT at 0x10002000L.
OPENSSL_init_crypto() looks to be in LibreSSL, and it is new in
OpenSSL 1.1.0, so switching the minimum to OpenSSL 1.1.0 should not
require a new cflags on this one.

Thoughts?
--
Michael
From 553b9eb7e24250a92d7551cb28a301d9e63ba7ad Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 24 May 2023 16:22:02 +0900
Subject: [PATCH v1 1/2] Remove support for OpenSSL 1.0.1

A few notes about that:
- This simplifies a lot of code areas related to channel binding, as
X509_get_signature_nid() always exists.
- This removes the comments related to 1.0.1e introduced by 74242c2.
- Do we need to care about the case of building the Postgres code with
LibreSSL on Windows for the MSVC scripts, leading to the removal of the
check with HAVE_SSL_CTX_SET_CERT_CB?
---
 src/include/libpq/libpq-be.h |  6 +---
 src/include/pg_config.h.in   |  3 --
 src/backend/libpq/auth-scram.c   | 20 ++--
 src/backend/libpq/be-secure-openssl.c|  4 ---
 src/interfaces/libpq/fe-auth-scram.c |  8 ++---
 src/interfaces/libpq/fe-auth.c   |  2 +-
 src/interfaces/libpq/fe-secure-openssl.c |  4 ---
 src/interfaces/libpq/libpq-int.h |  6 +---
 src/test/ssl/t/002_scram.pl  | 41 
 doc/src/sgml/installation.sgml   |  2 +-
 configure| 16 -
 configure.ac |  9 +++---
 meson.build  |  5 ++-
 src/tools/msvc/Solution.pm   | 10 +-
 14 files changed, 38 insertions(+), 98 deletions(-)

diff --git a/src/include/libpq/libpq-be.h 

Re: memory leak in trigger handling (since PG12)

2023-05-24 Thread Jakub Wartak
Hi, just two cents:

On Tue, May 23, 2023 at 8:01 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-05-23 13:28:30 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > Could it help to have a mode where the executor shutdown hook checks how 
> > > much
> > > memory is allocated in ExecutorState and warns if its too much?
> >
> > It'd be very hard to set a limit for what's "too much", since the amount
> > of stuff created initially will depend on the plan size.
>
> I was thinking of some limit that should really never be reached outside of a
> leak or work_mem based allocations, say 2GB or so.

RE: instrumentation subthread:
if that helps then below technique can work somewhat good on normal
binaries for end users (given there are debug symbols installed), so
maybe we don't need that much infrastructure added in to see the hot
code path:

perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on
x86_64(palloc size arg0)
perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p
 sleep 3 # cannot be longer, huge overhead (~3s=~2GB)

it produces:
50.27%  (563d0e380670) size=24
|
---palloc
   bms_copy
   ExecUpdateLockMode
   ExecBRUpdateTriggers
   ExecUpdate
[..]

49.73%  (563d0e380670) size=16
|
---palloc
   bms_copy
   RelationGetIndexAttrBitmap
   ExecUpdateLockMode
   ExecBRUpdateTriggers
   ExecUpdate
[..]

Now we know that those small palloc() are guilty, but we didn't know
at the time with Tomas. The problem here is that we do not know in
palloc() - via its own arguments for which MemoryContext this is going
to be allocated for. This is problematic for perf, because on RHEL8, I
was not able to generate an uprobe that was capable of reaching a
global variable (CurrentMemoryContext) at that time.

Additionally what was even more frustrating on diagnosing that case on
the customer end system, was that such OOMs were crashing other
PostgreSQL clusters on the same OS. Even knowing the exact guilty
statement it was impossible to limit RSS memory usage of that
particular backend. So, what you are proposing makes a lot of sense.
Also it got me thinking of implementing safety-memory-net-GUC
debug_query_limit_backend_memory=X MB that would inject
setrlimit(RLIMIT_DATA) through external extension via hook(s) and
un-set it later, but the man page states it works for mmap() only
after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 -
maybe that's still good enough?; Or, well maybe try to hack a palloc()
a little, but that has probably too big overhead, right? (just
thinking loud).

-Jakub Wartak.




Re: Large files for relations

2023-05-24 Thread Peter Eisentraut

On 24.05.23 02:34, Thomas Munro wrote:

Thanks all for the feedback.  It was a nice idea and it *almost*
works, but it seems like we just can't drop segmented mode.  And the
automatic transition schemes I showed don't make much sense without
that goal.

What I'm hearing is that something simple like this might be more acceptable:

* initdb --rel-segsize (cf --wal-segsize), default unchanged


makes sense


* pg_upgrade would convert if source and target don't match


This would be good, but it could also be an optional or later feature.

Maybe that should be a different mode, like 
--copy-and-adjust-as-necessary, so that users would have to opt into 
what would presumably be slower than plain --copy, rather than being 
surprised by it, if they unwittingly used incompatible initdb options.



I would probably also leave out those Windows file API changes, too.
--rel-segsize would simply refuse larger sizes until someone does the
work on that platform, to keep the initial proposal small.


Those changes from off_t to pgoff_t?  Yes, it would be good to do 
without those.  Apart of the practical problems that have been brought 
up, this was a major annoyance with the proposed patch set IMO.



I would probably leave the experimental copy_on_write() ideas out too,
for separate discussion in a separate proposal.


right





Re: Insertion Sort Improvements

2023-05-24 Thread Benjamin Coutu
> That's worth trying out. It might also then be worth trying to push both 
> unordered values -- the big one up / the small one down. I've seen other 
> implementations do that, but don't remember where, or what it's called.

It is important that we do not do 2 compares two avoid one copy (assignment to 
temporary) as you did in your patch earlier in this thread, cause compares are 
usually pretty costly (also two compares are then inlined, bloating the binary).
Assigning a sort tuple to a temporary translates to pretty simple assembly 
code, so my suggested modification should outperform. It cuts down the cost of 
the inner loop by ca. 40% comparing the assembly. And it avoids having to 
re-read memory on each comparison, as the temporary can be held in registers.

> "Unbounded" means no bounds check on the array. That's not possible in its 
> current form, so I think you misunderstood something.

Sorry for the confusion. I didn't mean unbounded in the "array bound checking" 
sense, but in the "unrestricted number of loops" sense.

> I only remember implementations tracking loop iterations, not swaps. You'd 
> need evidence that this is better.

Well, the idea was to include the presorted check somehow. Stopping after a 
certain number of iterations is surely more safe than stopping after a number 
of swaps, but we would then implicitly also stop our presort check. We could 
change that though: Count loop iterations and on bailout continue with a pure 
presort check, but from the last position of the insertion sort -- not all over 
again -- by comparing against the maximum value recorded during the insertion 
sort. Thoughts?

> An important part not mentioned yet: This might only be worth doing if the 
> previous recursion level performed no swaps during partitioning and the 
> current pivot candidates are ordered.

Agreed.

> It's currently 7, but should really be something like 10. A script that 
> repeats tests for, say, 7 through 18 should show a concave-up shape in the 
> times. The bottom of the bowl should shift to higher values, and that minimum 
> is what should be compared.

Yeah, as alluded to before, it should be closer to 10 nowadays.
In any case it should be changed at least from 7 to 8, cause then we could at 
least discard the additional check for n > 7 in the quicksort code path (see 
/src/include/lib/sort_template.h#L322). Currently we check n < 7 and a few 
lines down we check for n > 7, if we check n < 8 for insertion sort then the 
second check becomes obsolete.

Benjamin Coutu
http://www.zeyos.com