Re: Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-09 Thread David Rowley
On Wed, 10 Apr 2024 at 17:18, Tom Lane  wrote:
>
> David Rowley  writes:
> > I think a better fix is just to not apply the optimisation for
> > inheritance RTEs in add_base_clause_to_rel().
>
> Is it worth paying attention to whether the constraint is marked
> connoinherit?  If that involves an extra syscache fetch, I'd tend to
> agree that it's not worth it; but if we can get that info for free
> it seems worthwhile to not break this for inheritance cases.

I think we can still optimise everything we'd like to without checking
that.  Effectively, get_relation_info() looks at the
pg_attribute.attnotnull column for the relation in question; we never
look at the pg_constraint record to figure out the nullability of the
column.
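
For what it's worth, that nullability information is visible directly in
the catalog; a quick way to eyeball it (hypothetical table name):

    -- pg_attribute.attnotnull is what get_relation_info() consults
    SELECT attname, attnotnull
    FROM pg_attribute
    WHERE attrelid = 'parent'::regclass AND attnum > 0;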

David




Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Amit Kapila
On Wed, Apr 10, 2024 at 11:00 AM Heikki Linnakangas  wrote:
>
> On 10/04/2024 07:45, Michael Paquier wrote:
> > On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote:
> >> On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:
> >>> Wouldn't the best way forward be to revert
> >>> 5bec1d6bc5e3 and revisit the whole in v18?
> >>
> >> Also consider commits b840508644 and bcb14f4abc.
> >
> > Indeed.  These are also linked.
>
> I don't feel the urge to revert this:
>
> - It's not broken as such, we're just discussing better ways to
> implement it. We could also do nothing, and revisit this in v18. The
> only must-fix issue is some compiler warnings IIUC.
>
> - It's a pretty localized change in reorderbuffer.c, so it's not in the
> way of other patches or reverts. Nothing else depends on the binaryheap
> changes yet either.
>
> - It seems straightforward to repeat the performance tests with whatever
> alternative implementations we want to consider.
>
> My #1 choice would be to write a patch to switch the pairing heap,
> performance test that, and revert the binary heap changes.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Heikki Linnakangas

On 10/04/2024 07:45, Michael Paquier wrote:
> On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote:
>> On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:
>>> Wouldn't the best way forward be to revert
>>> 5bec1d6bc5e3 and revisit the whole in v18?
>>
>> Also consider commits b840508644 and bcb14f4abc.
>
> Indeed.  These are also linked.

I don't feel the urge to revert this:

- It's not broken as such, we're just discussing better ways to
implement it. We could also do nothing, and revisit this in v18. The
only must-fix issue is some compiler warnings IIUC.

- It's a pretty localized change in reorderbuffer.c, so it's not in the
way of other patches or reverts. Nothing else depends on the binaryheap
changes yet either.

- It seems straightforward to repeat the performance tests with whatever
alternative implementations we want to consider.

My #1 choice would be to write a patch to switch the pairing heap,
performance test that, and revert the binary heap changes.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-09 Thread Tom Lane
David Rowley  writes:
> I think a better fix is just to not apply the optimisation for
> inheritance RTEs in add_base_clause_to_rel().

Is it worth paying attention to whether the constraint is marked
connoinherit?  If that involves an extra syscache fetch, I'd tend to
agree that it's not worth it; but if we can get that info for free
it seems worthwhile to not break this for inheritance cases.

regards, tom lane




Re: Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-09 Thread David Rowley
On Tue, 9 Apr 2024 at 21:55, Richard Guo  wrote:
>
> In b262ad440e we introduced an optimization that drops IS NOT NULL quals
> on a NOT NULL column, and reduces IS NULL quals on a NOT NULL column to
> constant-FALSE.  I happened to notice that this is not working correctly
> for traditional inheritance parents.  Traditional inheritance parents
> might have NOT NULL constraints marked NO INHERIT, while their child
> tables do not have NOT NULL constraints.  In such a case, we would have
> problems when we have removed redundant IS NOT NULL restriction clauses
> of the parent rel, as this could cause NULL values from child tables to
> not be filtered out, or when we have reduced IS NULL restriction clauses
> of the parent rel to constant-FALSE, as this could cause NULL values
> from child tables to not be selected out.

hmm, yeah, inheritance tables were overlooked.
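
For anyone following along, a minimal reproduction of the reported
problem could look like this (hypothetical table names; assumes a build
with catalogued NOT NULL constraints supporting NO INHERIT):

    CREATE TABLE parent (a int NOT NULL NO INHERIT);
    CREATE TABLE child () INHERITS (parent);
    INSERT INTO child VALUES (NULL);

    -- dropping the IS NOT NULL qual based on the parent's constraint
    -- would wrongly let the child's NULL row through:
    SELECT * FROM parent WHERE a IS NOT NULL;

    -- reducing IS NULL to constant-FALSE would wrongly return no rows:
    SELECT * FROM parent WHERE a IS NULL;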

I looked at the patch, and I don't think it's a good idea to skip
recording NOT NULL constraints as a fix just because that happens to
make this particular optimisation work correctly.  That would fix this
particular case while leaving us exposed in some future optimisation
that looks at RelOptInfo.notnullattnums and assumes the lack of a
corresponding notnullattnums member means the column is NULLable.

I think a better fix is just to not apply the optimisation for
inheritance RTEs in add_base_clause_to_rel().  If we do it this way,
it's only the inh==true RTE that we skip.  Remember that there are two
RangeTblEntries for an inheritance parent.  The other one will have
inh==false, and we can still have the optimisation as that's the one
that'll be used in the final plan.  It'll be the inh==true one that we
copy the quals from in apply_child_basequals(), so we've no need to
worry about missing baserestrictinfos when applying the base quals to
the child.

For partitioned tables, there's only a single RTE with inh==true.
We're free to include the redundant quals there to be applied or
skipped in apply_child_basequals().  The corresponding RangeTblEntry
is not going to be scanned in the final plan, so the extra qual does
not matter.

The revised patch is attached.

David
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d3868b628d..1de965b3ad 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2629,32 +2629,45 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
                        RestrictInfo *restrictinfo)
 {
     RelOptInfo *rel = find_base_rel(root, relid);
+    RangeTblEntry *rte = root->simple_rte_array[relid];
 
     Assert(bms_membership(restrictinfo->required_relids) == BMS_SINGLETON);
 
-    /* Don't add the clause if it is always true */
-    if (restriction_is_always_true(root, restrictinfo))
-        return;
-
     /*
-     * Substitute the origin qual with constant-FALSE if it is provably always
-     * false.  Note that we keep the same rinfo_serial.
+     * For inheritance parent tables, we must always record the
+     * RestrictInfo in baserestrictinfo as is.  If we were to transform or
+     * skip adding it, then the original wouldn't be available in
+     * apply_child_basequals.  Since there are two RangeTblEntries for
+     * inheritance parents, one with inh==true and the other with inh==false,
+     * we're still able to apply this optimization to the inh==false one.  The
+     * inh==true one is what apply_child_basequals() sees.
      */
-    if (restriction_is_always_false(root, restrictinfo))
+    if (!rte->inh)
     {
-        int         save_rinfo_serial = restrictinfo->rinfo_serial;
-
-        restrictinfo = make_restrictinfo(root,
-                                         (Expr *) makeBoolConst(false, false),
-                                         restrictinfo->is_pushed_down,
-                                         restrictinfo->has_clone,
-                                         restrictinfo->is_clone,
-                                         restrictinfo->pseudoconstant,
-                                         0, /* security_level */
-                                         restrictinfo->required_relids,
-                                         restrictinfo->incompatible_relids,
-                                         restrictinfo->outer_relids);
-        restrictinfo->rinfo_serial = save_rinfo_serial;
+        /* Don't add the

Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Apr 10, 2024 at 11:44 AM Tom Lane  wrote:
>> ... On my Mac laptop
>> (Apple clang version 15.0.0), the compile time for preproc.o went from
>> 6.7sec to 5.5sec.

> Having seen multi-minute compile times on FreeBSD (where clang is the
> system compiler) and Debian (where I get packages from apt.llvm.org),
> I have been quietly waiting for this issue to hit Mac users too (where
> a clang with unknown proprietary changes is the system compiler), but
> it never did.  Huh.

I don't doubt that there are other clang versions where the problem
bites a lot harder.  What result do you get from the test I tried
(turning mm_strdup into a no-op macro)?

regards, tom lane




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Thomas Munro
On Wed, Apr 10, 2024 at 11:44 AM Tom Lane  wrote:
> ... On my Mac laptop
> (Apple clang version 15.0.0), the compile time for preproc.o went from
> 6.7sec to 5.5sec.

Having seen multi-minute compile times on FreeBSD (where clang is the
system compiler) and Debian (where I get packages from apt.llvm.org),
I have been quietly waiting for this issue to hit Mac users too (where
a clang with unknown proprietary changes is the system compiler), but
it never did.  Huh.

I tried to understand a bit more about Apple's version soup.  This
seems to be an up-to-date table (though I don't understand their
source of information):

https://en.wikipedia.org/wiki/Xcode#Xcode_15.0_-_(since_visionOS_support)_2

According to cc -v on my up-to-date MacBook Air, it has "Apple clang
version 15.0.0 (clang-1500.3.9.4)", which, if the table is correct,
means that it's using LLVM 16.0.0 (note, not 16.0.6, the final version
of that branch of [open] LLVM, and the version I saw the issue with on
FreeBSD and Debian).  They relabel everything to match the Xcode
version that shipped it, and they're currently off by one.

I wondered if perhaps the table just wasn't accurate in the final
digits, so I looked for clues in strings in the binary, and sure
enough it contains "LLVM 15.0.0".  My guess would be that they've
clobbered the major version, but not the rest: the Xcode version is
15.3, and I don't see a 3, so I guess this is really derived from LLVM
16.0.0.

One explanation would be that they rebase their proprietary bits and
pieces over the .0 version of each major release, and then cherry-pick
urgent fixes and stuff later, not pulling in the whole minor release;
they also presumably have to maintain it for much longer than the LLVM
project's narrow support window.  Who knows.  So now I wonder if it
could be that LLVM 16.0.6 does this, but LLVM 16.0.0 doesn't.

I installed clang-16 (16.0.6) with MacPorts, and it does show the problem.




Re: Weird test mixup

2024-04-09 Thread Michael Paquier
On Tue, Apr 09, 2024 at 12:41:57PM +0900, Michael Paquier wrote:
> Applied that after a second check.  And thanks to Bharath for the
> poke.

And now that the buildfarm is cooler, I've also applied the final
patch in the series as of 5105c9079681 to make the GIN module
concurrent-safe using injection_points_set_local().
--
Michael




Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Michael Paquier
On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote:
> On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:
>> Wouldn't the best way forward be to revert
>> 5bec1d6bc5e3 and revisit the whole in v18?
> 
> Also consider commits b840508644 and bcb14f4abc.

Indeed.  These are also linked.
--
Michael




Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Jeff Davis
On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote:

> Wouldn't the best way forward be to revert
> 5bec1d6bc5e3 and revisit the whole in v18?

That's a reasonable conclusion. Also consider commits b840508644 and
bcb14f4abc.

I had tried to come up with a narrower fix, and I think it's already
been implemented here in approach 2:

https://www.postgresql.org/message-id/CAD21AoAtf12e9Z9NLBuaO1GjHMMo16_8R-yBu9Q9jrk2QLqMEA%40mail.gmail.com

but it does feel wrong to introduce an unnecessary hash table in 17
when we know it's not the right solution.

Regards,
Jeff Davis






Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Michael Paquier
On Tue, Apr 09, 2024 at 06:24:43PM -0700, Jeff Davis wrote:
> I had suggested that the heap could track the element indexes for
> efficient update/removal, but that would be a change to the
> binaryheap.h API, which would require some discussion (and possibly not
> be acceptable post-freeze).
> 
> But I think you're right: a pairing heap already solves the problem
> without modification. (Note: our pairing heap API doesn't explicitly
> support updating a key, so I think it would need to be done with
> remove/add.) So we might as well just do that right now rather than
> trying to fix the way the hash table is being used or trying to extend
> the binaryheap API.
> 
> Of course, we should measure to be sure there aren't bad cases around
> updating/removing a key, but I don't see a fundamental reason that it
> should be worse.

This is going to require a rewrite of 5bec1d6bc5e3 with a new
performance study, which strikes me as something that we'd better not
do after feature freeze.  Wouldn't the best way forward be to revert
5bec1d6bc5e3 and revisit the whole in v18?

I have added an open item, for now.
--
Michael




Re: post-freeze damage control

2024-04-09 Thread John Naylor
On Tue, Apr 9, 2024 at 2:47 AM Robert Haas  wrote:
>
> - I'm slightly worried about the TID store work (ee1b30f12, 30e144287,
> 667e65aac35), perhaps for no reason. Actually, the results seem really
> impressive,

First, thanks for the compliment. I actually suspect that if we'd had
this years ago, it might never have occurred to anyone to go through the
trouble of adding parallel index cleanup.

In a funny way, it's almost too effective -- the claim is that m_w_m
over 1GB is perfectly usable, but I haven't been able to get anywhere
near that via vacuum (we have indirectly, via bespoke dev code,
but...). I don't have easy access to hardware that can hold a table
big enough to do so -- vacuuming 1 billion records stays under 400MB.

> and a very quick look at some of the commits seemed kind
> of reassuring. But, like the changes to pruning and freezing, this is
> making some really fundamental changes to critical code. In this case,
> it's code that has evolved very little over the years and seems to
> have now evolved quite a lot.

True. I'd say that at a high level, storage and retrieval of TIDs is a
lot simpler conceptually than other aspects of vacuuming. The
low-level guts are now much more complex, but I'm confident it won't
just output a wrong answer. That aspect has been working for a long
time, and when it has broken during development, it fails very quickly
and obviously.

The more challenging aspects are less cut-and-dried, like memory
management, delegation of responsibility, how to expose locking (which
vacuum doesn't even use), readability/maintainability. Those are more
subjective, but it seems to have finally clicked into place in a way
that feels like the right trade-offs have been made. That's hand-wavy,
I realize.

The more recent follow-up commits are pieces that were discussed and
planned for earlier, but have had less review and were left out from
the initial commits since they're not essential to the functionality.
I judged that test coverage was enough to have confidence in them.

On Tue, Apr 9, 2024 at 6:33 AM Michael Paquier  wrote:
>
> This worries me less than other patches like the ones around heap
> changes or the radix tree stuff for TID collection plugged into
> vacuum, which don't have explicit on/off switches AFAIK.

Yes, there is no switch. Interestingly enough, the previous item array
ended up with its own escape hatch to tamp down its eager allocation,
autovacuum_work_mem. Echoing what I said to Robert, if we had the new
storage years ago, I doubt this GUC would ever have been proposed.

I'll also mention the array is still in the code base, but only in a
test module as a standard to test against. Hopefully that offers some
reassurance.




Requiring LLVM 14+ in PostgreSQL 18

2024-04-09 Thread Thomas Munro
Hi

PostgreSQL 18 will ship after these vacuum horizon systems reach EOL[1]:

   animal   |  arch   | llvm_version |   os   | os_release | end_of_support
------------+---------+--------------+--------+------------+----------------
 branta     | s390x   | 10.0.0       | Ubuntu | 20.04      | 2025-04-01
 splitfin   | aarch64 | 10.0.0       | Ubuntu | 20.04      | 2025-04-01
 urutau     | s390x   | 10.0.0       | Ubuntu | 20.04      | 2025-04-01
 massasauga | aarch64 | 11.1.0       | Amazon | 2          | 2025-06-30
 snakefly   | aarch64 | 11.1.0       | Amazon | 2          | 2025-06-30

Therefore, some time after the tree re-opens for hacking, we could rip
out a bunch of support code for LLVM 10-13, and then rip out support
for pre-opaque-pointer mode.  Please see attached.

[1] 
https://www.postgresql.org/message-id/CA%2BhUKG%2B-g61yq7Ce4aoZtBDO98b4GXH8Cu3zxVk-Zn1Vh7TKpA%40mail.gmail.com
From f5de5c6535b825033b1829eaf340baacc10ed654 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 19 Oct 2023 04:45:46 +1300
Subject: [PATCH 1/2] jit: Require at least LLVM 14, if enabled.

Remove support for LLVM versions 10-13.  The default on all non-EOL'd
OSes represented in our build farm will be at least LLVM 14 when
PostgreSQL 18 ships.

Discussion: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com
---
 config/llvm.m4  |   8 +-
 configure   |   8 +-
 doc/src/sgml/installation.sgml  |   4 +-
 meson.build |   2 +-
 src/backend/jit/llvm/llvmjit.c  | 109 
 src/backend/jit/llvm/llvmjit_error.cpp  |  25 --
 src/backend/jit/llvm/llvmjit_inline.cpp |  13 ---
 src/backend/jit/llvm/llvmjit_wrap.cpp   |   4 -
 8 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/config/llvm.m4 b/config/llvm.m4
index 44769d819a0..6ca088b5a45 100644
--- a/config/llvm.m4
+++ b/config/llvm.m4
@@ -13,7 +13,7 @@ AC_DEFUN([PGAC_LLVM_SUPPORT],
   AC_REQUIRE([AC_PROG_AWK])
 
   AC_ARG_VAR(LLVM_CONFIG, [path to llvm-config command])
-  PGAC_PATH_PROGS(LLVM_CONFIG, llvm-config llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14 llvm-config-13 llvm-config-12 llvm-config-11 llvm-config-10)
+  PGAC_PATH_PROGS(LLVM_CONFIG, llvm-config llvm-config-18 llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14)
 
   # no point continuing if llvm wasn't found
   if test -z "$LLVM_CONFIG"; then
@@ -25,14 +25,14 @@ AC_DEFUN([PGAC_LLVM_SUPPORT],
 AC_MSG_ERROR([$LLVM_CONFIG does not work])
   fi
   # and whether the version is supported
-  if echo $pgac_llvm_version | $AWK -F '.' '{ if ([$]1 >= 10) exit 1; else exit 0;}';then
-    AC_MSG_ERROR([$LLVM_CONFIG version is $pgac_llvm_version but at least 10 is required])
+  if echo $pgac_llvm_version | $AWK -F '.' '{ if ([$]1 >= 14) exit 1; else exit 0;}';then
+    AC_MSG_ERROR([$LLVM_CONFIG version is $pgac_llvm_version but at least 14 is required])
   fi
   AC_MSG_NOTICE([using llvm $pgac_llvm_version])
 
   # need clang to create some bitcode files
   AC_ARG_VAR(CLANG, [path to clang compiler to generate bitcode])
-  PGAC_PATH_PROGS(CLANG, clang clang-17 clang-16 clang-15 clang-14 clang-13 clang-12 clang-11 clang-10)
+  PGAC_PATH_PROGS(CLANG, clang clang-18 clang-17 clang-16 clang-15 clang-14)
   if test -z "$CLANG"; then
 AC_MSG_ERROR([clang not found, but required when compiling --with-llvm, specify with CLANG=])
   fi
diff --git a/configure b/configure
index cfbd2a096f4..ba4ce5a5282 100755
--- a/configure
+++ b/configure
@@ -5065,7 +5065,7 @@ if test "$with_llvm" = yes; then :
 
 
   if test -z "$LLVM_CONFIG"; then
-  for ac_prog in llvm-config llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14 llvm-config-13 llvm-config-12 llvm-config-11 llvm-config-10
+  for ac_prog in llvm-config llvm-config-18 llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14
 do
   # Extract the first word of "$ac_prog", so it can be a program name with args.
 set dummy $ac_prog; ac_word=$2
@@ -5129,8 +5129,8 @@ fi
 as_fn_error $? "$LLVM_CONFIG does not work" "$LINENO" 5
   fi
   # and whether the version is supported
-  if echo $pgac_llvm_version | $AWK -F '.' '{ if ($1 >= 10) exit 1; else exit 0;}';then
-    as_fn_error $? "$LLVM_CONFIG version is $pgac_llvm_version but at least 10 is required" "$LINENO" 5
+  if echo $pgac_llvm_version | $AWK -F '.' '{ if ($1 >= 14) exit 1; else exit 0;}';then
+    as_fn_error $? "$LLVM_CONFIG version is $pgac_llvm_version but at least 14 is required" "$LINENO" 5
   fi
   { $as_echo "$as_me:${as_lineno-$LINENO}: using llvm $pgac_llvm_version" >&5
 $as_echo "$as_me: using llvm $pgac_llvm_version" >&6;}
@@ -5138,7 +5138,7 @@ $as_echo "$as_me: using llvm $pgac_llvm_version" >&6;}
   # need clang to create some bitcode files
 
   if test -z "$CLANG"; then
-  for ac_prog in clang clang-17 clang-16 clang-15 clang-14 clang-13 clang-12 clang-11 clang-10
+  for ac_prog in clang clang-18 clang-17 clang-16 clang-15 clang-14
 do
   # Extract the first word of "$ac_prog", so it can be a 

Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Jeff Davis
On Tue, 2024-04-09 at 23:49 +0300, Heikki Linnakangas wrote:
> I wonder why this doesn't use the existing pairing heap
> implementation? I would assume that to be at least as good as the
> binary heap + hash table

I agree that an additional hash table is not needed -- there's already
a hash table to do a lookup based on xid in reorderbuffer.c.

I had suggested that the heap could track the element indexes for
efficient update/removal, but that would be a change to the
binaryheap.h API, which would require some discussion (and possibly not
be acceptable post-freeze).

But I think you're right: a pairing heap already solves the problem
without modification. (Note: our pairing heap API doesn't explicitly
support updating a key, so I think it would need to be done with
remove/add.) So we might as well just do that right now rather than
trying to fix the way the hash table is being used or trying to extend
the binaryheap API.

Of course, we should measure to be sure there aren't bad cases around
updating/removing a key, but I don't see a fundamental reason that it
should be worse.
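
To illustrate, a remove/re-add key update with the existing pairingheap
API might look roughly like this (just a sketch; the entry struct and
function name are made up, this is not the actual ReorderBuffer patch):

    #include "lib/pairingheap.h"

    typedef struct TxnSizeEntry
    {
        pairingheap_node ph_node;   /* embedded heap link */
        Size        size;           /* ordering key: memory used by the txn */
    } TxnSizeEntry;

    static void
    update_txn_size(pairingheap *heap, TxnSizeEntry *entry, Size new_size)
    {
        /* no explicit update-key operation, so remove and re-insert */
        pairingheap_remove(heap, &entry->ph_node);
        entry->size = new_size;
        pairingheap_add(heap, &entry->ph_node);
    }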

Regards,
Jeff Davis





Re: Why is parula failing?

2024-04-09 Thread Robins Tharakan
On Wed, 10 Apr 2024 at 10:24, David Rowley  wrote:
> Master failed today for the first time since the compiler upgrade.
> Again reltuples == 48.

From the buildfarm members page, parula seems to be the only aarch64 +
gcc 13.2 combination today, which makes me wonder whether this is about
gcc 13.2 maturity on aarch64.

I'll try to upgrade one of the other aarch64s I have (massasauga or
snakefly) and see if this is more about gcc 13.2 maturity on this
architecture.
-
robins


Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
>> In any case, this is all moot unless we can come to a new design for
>> how ecpg does its string-mashing.  Thoughts?

> Am I missing something, or is ecpg string handling almost comically
> inefficient? Building up strings in tiny increments, which then get mashed
> together to get slightly larger pieces, just to then be mashed together again?
> It's like an intentional allocator stress test.
> It's particularly absurd because in the end we just print those strings, after
> carefully assembling them...

It is that.  Here's what I'm thinking: probably 90% of what ecpg
does is to verify that a chunk of its input represents a valid bit
of SQL (or C) syntax and then emit a representation of that chunk.
Currently, that representation tends to case-normalize tokens and
smash inter-token whitespace and comments to a single space.
I propose though that neither of those behaviors is mission-critical,
or even all that desirable.  I think few users would complain if
ecpg preserved the input's casing and spacing and comments.

Given that definition, most of ecpg's productions (certainly just
about all the auto-generated ones) would simply need to return a
pointer and length describing a part of the input string.  There are
places where ecpg wants to insert some text it generates, and I think
it might need to re-order text in a few places, so we need a
production result representation that can cope with those cases ---
but if we can make "regurgitate the input" cases efficient, I think
we'll have licked the performance problem.

With that in mind, I wonder whether we couldn't make the simple
cases depend on bison's existing support for location tracking.
In which case, the actual actions for all those cases could be
default, achieving one of the goals you mention.
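
To sketch what that might look like (hypothetical names; assumes a
custom YYLTYPE carrying byte offsets into the input buffer):

    typedef struct fragment
    {
        const char *start;      /* points into the original input text */
        size_t      len;        /* length of the matched span */
    } fragment;

    /*
     * With %locations, a "regurgitate the input" rule could then use a
     * default-style action that just describes the matched span:
     *
     *  rule: tok1 tok2 tok3
     *          {
     *              $$.start = input_buf + @1.first_offset;
     *              $$.len = @3.last_offset - @1.first_offset;
     *          }
     */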

Obviously, this is not going to be a small lift, but it kind
of seems do-able.

regards, tom lane




Re: Why is parula failing?

2024-04-09 Thread David Rowley
On Tue, 9 Apr 2024 at 15:48, David Rowley  wrote:
> Still no partition_prune failures on master since the compiler version
> change.  There has been one [1] in REL_16_STABLE. I'm thinking it
> might be worth backpatching the partition_prune debug to REL_16_STABLE
> to see if we can learn anything new from it.

Master failed today for the first time since the compiler upgrade.
Again reltuples == 48.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=parula&dt=2024-04-10%2000%3A27%3A02

David




Re: broken JIT support on Fedora 40

2024-04-09 Thread Thomas Munro
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> +   /* In assertion builds, run the LLVM verify pass. */
> +#ifdef USE_ASSERT_CHECKING
> +   LLVMPassBuilderOptionsSetVerifyEach(options, true);
> +#endif

Thanks, that seems nicer.  I think the question is whether it will
slow down build farm/CI/local meson test runs to a degree that exceeds
its value.  Another option would be to have some other opt-in macro,
like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain
JIT-related stuff to turn on.

Supposing we go with USE_ASSERT_CHECKING, I have another question:

-   const char *nm = "llvm.lifetime.end.p0i8";
+   const char *nm = "llvm.lifetime.end.p0";

Was that a mistake, or did the mangling rules change in some version?
I don't currently feel inclined to go and test this on the ancient
versions we claim to support in back-branches.  Perhaps we should just
do this in master, and then it'd be limited to worrying about LLVM
versions 10-18 (see 820b5af7), which have the distinct advantage of
being available in package repositories for testing.  Or I suppose we
could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10.  Or we
could do it unconditionally, and wait for ancient-LLVM build farm
animals to break if they're going to.

I pushed the illegal attribute fix though.  Thanks for the detective work!

(It crossed my mind that perhaps deform functions should have their
own template function, but if someone figures out that that's a good
idea, I think we'll *still* need that change just pushed.)




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 20:12:48 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > FWIW, just redefining mm_strdup() that way doesn't help much here either,
> > even with an affected compiler. The gain increases substantially after
> > simplifying unreserved_keyword etc to just use the default action.
> 
> Hm.
> 
> In any case, this is all moot unless we can come to a new design for
> how ecpg does its string-mashing.  Thoughts?

There might be a quick-n-dirty way: We could make pgc.l return dynamically
allocated keywords.


Am I missing something, or is ecpg string handling almost comically
inefficient? Building up strings in tiny increments, which then get mashed
together to get slightly larger pieces, just to then be mashed together again?
It's like an intentional allocator stress test.

It's particularly absurd because in the end we just print those strings, after
carefully assembling them...


> I thought for a bit about not allocating strings as such, but just
> passing around pointers into the source text plus lengths, and
> reassembling the string data only at the end when we need to output it.
> Not sure how well that would work, but it could be a starting point.

I was wondering about something vaguely similar: Instead of concatenating
individual strings, append them to a stringinfo. The stringinfo can be reset
individually...
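
A minimal sketch of that idea with the existing StringInfo API
(illustrative only, not a worked-out design):

    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfoString(&buf, "select ");
    appendStringInfoString(&buf, "1 ");
    /* ... write buf.data to the output file ... */
    resetStringInfo(&buf);      /* reuse the allocation for the next chunk */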

Greetings,

Andres Freund




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
>> As you did, I wasn't trying to get to a working result, so I didn't do
>> anything about removing all the free's or fixing the cast-away-const
>> warnings.  The result was disappointing though.  On my Mac laptop
>> (Apple clang version 15.0.0), the compile time for preproc.o went from
>> 6.7sec to 5.5sec.  Which is better, but not enough better to persuade
>> me to do all the janitorial work of restructuring ecpg's
>> string-slinging.  I think we haven't really identified the problem.

> With what level of optimization was that?  It kinda looks like their version
> might be from before the worst of the issue...

Just the autoconf-default -O2.

> FWIW, just redefining mm_strdup() that way doesn't help much here either,
> even with an affected compiler. The gain increases substantially after
> simplifying unreserved_keyword etc to just use the default action.

Hm.

In any case, this is all moot unless we can come to a new design for
how ecpg does its string-mashing.  Thoughts?

I thought for a bit about not allocating strings as such, but just
passing around pointers into the source text plus lengths, and
reassembling the string data only at the end when we need to output it.
Not sure how well that would work, but it could be a starting point.

regards, tom lane




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 19:44:03 -0400, Tom Lane wrote:
> I experimented with replacing mm_strdup() with
>
> #define mm_strdup(x) (x)
>
> As you did, I wasn't trying to get to a working result, so I didn't do
> anything about removing all the free's or fixing the cast-away-const
> warnings.  The result was disappointing though.  On my Mac laptop
> (Apple clang version 15.0.0), the compile time for preproc.o went from
> 6.7sec to 5.5sec.  Which is better, but not enough better to persuade
> me to do all the janitorial work of restructuring ecpg's
> string-slinging.  I think we haven't really identified the problem.

With what level of optimization was that?  It kinda looks like their version
might be from before the worst of the issue...


FWIW, just redefining mm_strdup() that way doesn't help much here either,
even with an affected compiler. The gain increases substantially after
simplifying unreserved_keyword etc to just use the default action.

I think having the non-default actions for those branches leaves you with a
similar issue, each of the actions just set a register, storing that and going
to the loop iteration is the same.

FWIW:
   clang-19 -O2
"plain"0m24.354s
mm_strdup redefined0m23.741s
+use default action0m14.218s

Greetings,

Andres Freund




Re: "an SQL" vs. "a SQL"

2024-04-09 Thread David Rowley
On Tue, 9 Apr 2024 at 16:18, David Rowley  wrote:
> There's just 1 instance of "a SQL" that crept into PG16 after
> d866f0374.  This probably means I'd be better off doing this in June a
> few weeks before branching...
>
> Patch attached.

Pushed.

David




Re: Fixup some StringInfo usages

2024-04-09 Thread David Rowley
On Tue, 9 Apr 2024 at 14:27, Tom Lane  wrote:
>
> David Rowley  writes:
> > Similar to f736e188c, I've attached a patch that fixes up a few
> > misusages of the StringInfo functions. These just swap one function
> > call for another function that is more suited to the use case.
>
> > I feel like it's a good idea to fix these soon while they're new
> > rather than wait N years and make backpatching bug fixes possibly
> > harder.
>
> +1.  Looks good in a quick eyeball scan.

Thank you to both of you for having a look. Pushed.

David




Re: post-freeze damage control

2024-04-09 Thread Michael Paquier
On Wed, Apr 10, 2024 at 09:29:38AM +1000, David Steele wrote:
> Even so, only keeping WAL for the last backup is a dangerous move in any
> case. Lots of things can happen to a backup (other than bugs in the
> software) so keeping WAL back to the last full (or for all backups) is
> always an excellent idea.

Yeah, that's an excellent practice, and it is why I'm less worried for
this feature.  The docs at [1] caution about "not to remove earlier
backups if they might be needed when restoring later incremental
backups".  Like Alvaro said, should we insist a bit more about the WAL
retention part in this section of the docs, down to the last full
backup?

[1]: 
https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-INCREMENTAL-BACKUP
--
Michael




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Tom Lane
Andres Freund  writes:
> On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> Why are strduping all of these?

>> IIRC, the issue is that the mechanism for concatenating the tokens
>> back together frees the input strings

> Ah, that explains it - but also seems somewhat unnecessary.

I experimented with replacing mm_strdup() with

#define mm_strdup(x) (x)

As you did, I wasn't trying to get to a working result, so I didn't do
anything about removing all the free's or fixing the cast-away-const
warnings.  The result was disappointing though.  On my Mac laptop
(Apple clang version 15.0.0), the compile time for preproc.o went from
6.7sec to 5.5sec.  Which is better, but not enough better to persuade
me to do all the janitorial work of restructuring ecpg's
string-slinging.  I think we haven't really identified the problem.

As a comparison point, compiling gram.o on the same machine
takes 1.3sec.  So I am seeing a problem here, sure enough,
although not as bad as it is in some other clang versions.

regards, tom lane




Re: wal_consistemcy_checking clean on HEAD

2024-04-09 Thread Peter Geoghegan
On Tue, Apr 9, 2024 at 7:35 PM Michael Paquier  wrote:
> It's been on my TODO list to automate that in one of my buildfarm
> animals, and never got down to do it.  I've looked at the current
> animal fleet, and it looks that we don't have one yet.  Perhaps I've
> just missed something?

wal_consistency_checking is very useful in general. I find myself
using it fairly regularly.

That's probably why it's not finding anything now: most people working
on something that touches WAL already know that testing their patch
with wal_consistency_checking early is a good idea. Of course it also
wouldn't be a bad idea to have a BF animal for that, especially
because we already have BF animals that test things far more niche
than this.

-- 
Peter Geoghegan




wal_consistemcy_checking clean on HEAD

2024-04-09 Thread Michael Paquier
Hi all,

I have been doing some checks with page masking and WAL consistency
now that we are in feature freeze, and there are no failures on HEAD:
cd src/test/recovery/ && \
PG_TEST_EXTRA=wal_consistency_checking \
PROVE_TESTS=t/027_stream_regress.pl make check

It's been on my TODO list to automate that in one of my buildfarm
animals, and never got down to do it.  I've looked at the current
animal fleet, and it looks that we don't have one yet.  Perhaps I've
just missed something?

Thanks,
--
Michael




Re: post-freeze damage control

2024-04-09 Thread David Steele

On 4/10/24 01:59, Andrey M. Borodin wrote:
>> On 9 Apr 2024, at 18:45, Alvaro Herrera  wrote:
>>
>> Maybe we should explicitly advise users to not delete that WAL from
>> their archives, until pg_combinebackup is hammered a bit more.
>
> As a backup tool maintainer, I always refer to out-of-the-box Postgres
> tools as a bulletproof alternative.
> I really would like to stick to this reputation and not discredit these tools.


+1.

Even so, only keeping WAL for the last backup is a dangerous move in any 
case. Lots of things can happen to a backup (other than bugs in the 
software) so keeping WAL back to the last full (or for all backups) is 
always an excellent idea.


Regards,
-David




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 19:00:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think we need to do something about the compile time of this file, even 
> > with
> > gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
> > top makes it considerably worse.
>
> Seems reasonable, if we can.
>
> > Why are strduping all of these?
>
> IIRC, the issue is that the mechanism for concatenating the tokens
> back together frees the input strings

Ah, that explains it - but also seems somewhat unnecessary.


> So that ought to dump core if you don't make all the productions
> return malloc'd strings.  How did you work around that?

I just tried to get to the point of understanding the reasons for slow
compilation, not to actually keep it working :). I.e. I didn't.


> (Maybe it'd be okay to just leak all the strings?)

Hm. The input to ecpg can be fairly large, I guess. And we have fun code like
cat_str(), which afaict is O(arguments^2) in its memory usage if we wouldn't
free?

Not immediately sure what the right path is.

Greetings,

Andres Freund




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 15:33:10 -0700, Andres Freund wrote:
> Which leads to a control flow graph with *many* incoming edges to the basic
> block containing the function call to mm_strdup(), triggering a normally
> harmless O(N^2) or such.

With clang-16 -O2 there is a basic block with 3904 incoming basic blocks. With
the hacked up preproc.y it's 2968. A 30% increase leading to a doubling of
runtime imo seems consistent with my theory of there being some ~quadratic
behaviour.


I suspect that this is also what's causing gram.c compilation to be fairly
slow, with both clang and gcc. There aren't as many pstrdup()s in gram.y as
the are mm_strdup() in preproc.y, but there still are many.


ISTM that there are many pstrdup()s that really make very little sense. I
think it's largely because there are many rules declared %type <str>, which
prevents us from returning a string constant without a warning.

There may be (?) some rules that modify strings returned by subsidiary rules,
but if so, it can't be many.

Greetings,

Andres




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-09 Thread Alexander Korotkov
On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval  wrote:
> Attached fix for the problems found by Alexander Lakhin.
>
> About grammar errors.
> Unfortunately, I don't know English well.
> Therefore, I plan (in the coming days) to show the text to specialists
> who perform technical translation of documentation.

Thank you.  I've pushed this fix with minor corrections from me.

--
Regards,
Alexander Korotkov




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Tom Lane
Andres Freund  writes:
> I think we need to do something about the compile time of this file, even with
> gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
> top makes it considerably worse.

Seems reasonable, if we can.

> Why are strduping all of these?

IIRC, the issue is that the mechanism for concatenating the tokens
back together frees the input strings

static char *
cat2_str(char *str1, char *str2)
{
char * res_str= (char *)mm_alloc(strlen(str1) + strlen(str2) + 2);

strcpy(res_str, str1);
if (strlen(str1) != 0 && strlen(str2) != 0)
strcat(res_str, " ");
strcat(res_str, str2);
free(str1);  <--
free(str2);  <--
return res_str;
}

So that ought to dump core if you don't make all the productions
return malloc'd strings.  How did you work around that?

(Maybe it'd be okay to just leak all the strings?)

regards, tom lane




Re: Allow non-superuser to cancel superuser tasks.

2024-04-09 Thread Michael Paquier
On Wed, Apr 10, 2024 at 12:52:19AM +0300, Kirill Reshke wrote:
> On Tue, 9 Apr 2024 at 08:53, Michael Paquier  wrote:
>> The thing is that you cannot rely on a lookup of the backend type for
>> the error information, or you open yourself to letting the caller of
>> pg_cancel_backend or pg_terminate_backend know if a backend is
>> controlled by a superuser or if a backend is an autovacuum worker.
> 
> Good catch. Thanks.  I think we need to update the error message to not
> leak backend type info.

Yep, that's necessary I am afraid.

>> The choice of pg_signal_autovacuum is a bit inconsistent, as well,
>> because autovacuum workers operate like regular backends.  This name
>> can also be confused with the autovacuum launcher.
> 
> Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
> enough?

Sounds fine to me.  Perhaps others have an opinion about that?
--
Michael




Re: Speed up clean meson builds by ~25%

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 17:13:52 +1200, Thomas Munro wrote:
> On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier  wrote:
> > On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote:
> > > On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin  wrote:
> > >> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it
> > >> is fixed already.
> > >> Perhaps it's worth rechecking...
> > >>
> > >> [1] 
> > >> https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com
> > >
> > > I had this problem on my local computer. My build times are:
> > >
> > > gcc: 20s
> > > clang-15: 24s
> > > clang-16: 105s
> > > clang-17: 111s
> > > clang-18: 25s
> >
> > Interesting.  A parallel build of ecpg shows similar numbers here:
> > clang-16: 101s
> > clang-17: 112s
> > clang-18: 14s
> > gcc: 10s
>
> I don't expect it to get fixed BTW, because it's present in 16.0.6,
> and .6 is the terminal release, if I understand their system
> correctly.  They're currently only doing bug fixes for 18, and even
> there not for much longer. Interesting that not everyone saw this at
> first, perhaps the bug arrived in a minor release that some people
> didn't have yet?  Or perhaps there is something special required to
> trigger it?

I think we need to do something about the compile time of this file, even with
gcc. Our main grammar already is an issue and stacking all the ecpg stuff on
top makes it considerably worse.

ISTM there's a bunch of pretty pointless stuff in the generated preproc.y,
which do seem to have some impact on compile time. E.g. a good bit of the file
is just stuff like

    reserved_keyword:
            ALL
            {
                $$ = mm_strdup("all");
            }
    ...


Why are strduping all of these? We could instead just use the value of the
token, instead of forcing the compiler to generate branches for all individual
keywords etc.

I don't know off-hand if the keyword lookup machinery ends up with an
uppercase keyword, but if so, that'd be easy enough to change.


It actually looks to me like the many calls to mm_strdup() might be
what's driving clang nuts. I hacked up preproc.y to not need those calls for
  unreserved_keyword
  col_name_keyword
  type_func_name_keyword
  reserved_keyword
  bare_label_keyword
by removing the actions and defining those tokens to be of type str. There are
many more such calls that could be dealt with similarly.
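
In grammar terms, the hack amounts to something like this (a sketch of
the idea, not the actual change; assumes the scanner hands back a usable
string value for each keyword token):

    /* before: one malloc'ing action per keyword */
    reserved_keyword:
            ALL     { $$ = mm_strdup("all"); }
            | ...

    /* after: string-typed rules relying on bison's default action ($$ = $1) */
    %type <str> unreserved_keyword col_name_keyword type_func_name_keyword
                reserved_keyword bare_label_keyword

    reserved_keyword: ALL | ANALYSE | ANALYZE | ...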

That alone reduced compile times with
clang-16 -O1 from  18.268s to  12.516s
clang-16 -O2 from 345.188s to 158.084s
clang-19 -O2 from  26.018s to  15.200s


I suspect what is happening is that clang tries to optimize the number of
calls to mm_strdup(), by separating the argument setup from the function
call. Which leads to a control flow graph with *many* incoming edges to the
basic block containing the function call to mm_strdup(), triggering a normally
harmless O(N^2) or such.


Greetings,

Andres Freund




Re: Allow non-superuser to cancel superuser tasks.

2024-04-09 Thread Kirill Reshke
Hi, thanks for looking into this.

On Tue, 9 Apr 2024 at 08:53, Michael Paquier  wrote:

> On Mon, Apr 08, 2024 at 05:42:05PM +, Leung, Anthony wrote:
> > Are you suggesting that we check if the backend is B_AUTOVAC in
> > pg_cancel/ terminate_backend? That seems a bit unclean to me since
> > pg_cancel_backend & pg_cancel_backend does not access to the
> > procNumber to check the type of the backend.
> >
> > IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the
> > errmsg / errdetail to not expose that the backend is an AV
> > worker. It'll also be helpful if you can suggest what errdetail we
> > should use here.
>
> The thing is that you cannot rely on a lookup of the backend type for
> the error information, or you open yourself to letting the caller of
> pg_cancel_backend or pg_terminate_backend know if a backend is
> controlled by a superuser or if a backend is an autovacuum worker.
>

Good catch. Thanks.  I think we need to update the error message to not
leak backend type info.

> The choice of pg_signal_autovacuum is a bit inconsistent, as well,
> because autovacuum workers operate like regular backends.  This name
> can also be confused with the autovacuum launcher.

Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good
enough?


Re: PostgreSQL 17 Release Management Team & Feature Freeze

2024-04-09 Thread Bruce Momjian
On Mon, Apr  8, 2024 at 10:41:17AM -0400, Robert Treat wrote:
> Unfortunately many humans are hardwired towards procrastination and
> last minute heroics (it's one reason why deadline driven development
> works even though in the long run it tends to be bad practice), and
> historically was one of the driving factors in why we started doing
> commitfests in the first place (you should have seen the mad dash of
> commits before we had that process), so ISTM that it's not completely
> avoidable...
> 
> That said, are you suggesting that the feature freeze deadline be
> random, and also held in secret by the RMT, only to be announced after
> the freeze time has passed? This feels weird, but might apply enough
> deadline related pressure while avoiding last minute shenanigans.

Committing code is a hard job, no question.  However, committers have to
give up the idea that they should wait for brilliant ideas before
finalizing patches.  If you come up with a better idea later, great, but
don't wait to finalize patches.

I used to write college papers much too late because I expected some
brilliant idea to come to me, and it rarely happened.  I learned to
write the paper with the ideas I had, and if I come up with a better
idea later, I can add it to the end.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PostgreSQL 17 Release Management Team & Feature Freeze

2024-04-09 Thread Bruce Momjian
On Mon, Apr  8, 2024 at 09:32:14PM +0200, Jelte Fennema-Nio wrote:
> I'll sketch a situation: There's a big patch that some non-committer
> submitted that has been sitting on the mailinglist for 6 months or
> more, only being reviewed by other non-committers, which the submitter
> quickly addresses. Then in the final commit fest it is finally
> reviewed by a committer, and they say it requires significant changes.
> Right now, the submitter can decide to quickly address those changes,
> and hope to keep the attention of this committer, to hopefully get it
> merged before the deadline after probably a few more back and forths.
> But this new rule would mean that those significant changes would be a
> reason not to put it in the upcoming release. Which I expect would
> first of all really feel like a slap in the face to the submitter,
> because it's not their fault that those changes were only requested in
> the last commit fest. This would then likely result in the submitter
> not following up quickly (why make time right at this moment, if
> you're suddenly going to have another full year), which would then
> cause the committer to lose context of the patch and thus interest in
> the patch. And then finally getting into the exact same situation next
> year in the final commit fest, when some other committer didn't agree
> with the redesign of the first one and request a new one pushing it
> another year.

Yes, I can see this happening.  :-(

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Issue with the PRNG used by Postgres

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-08 22:52:09 -0700, Parag Paul wrote:
>  We have an interesting problem, where PG went to PANIC due to stuck
> spinlock case.
> On careful analysis and hours of trying to reproduce this(something that
> showed up in production after almost 2 weeks of stress run), I did some
> statistical analysis on the RNG generator that PG uses to create the
> backoff for the spin locks.

ISTM that the fix here is to not use a spinlock for whatever the contention is
on, rather than improve the RNG.

Greetings,

Andres Freund




Re: post-freeze damage control

2024-04-09 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Apr 9, 2024 at 11:37 PM Tom Lane  wrote:
>> What exactly is the point of having a NodeTag in the struct though?
>> If you don't need it to be a valid Node, that seems pointless and
>> confusing.  We certainly have plenty of other lists that contain
>> plain structs without tags, so I don't buy that the List
>> infrastructure is making you do that.

> This code mixes Expr's and hash entries in the single list.  The point
> of having a NodeTag in the struct is the ability to distinguish them
> later.

If you're doing that, it really really ought to be a proper Node.
If nothing else, that would aid debugging by allowing the list
to be pprint'ed.

regards, tom lane




Re: removal of '{' from WORD_BREAKS

2024-04-09 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Apr 9, 2024 at 6:18 PM Robert Haas  wrote:
>> I too felt uneasy about that commit, for the same reason. However,
>> there is a justification for the change in the commit message which is
>> not obviously wrong, namely that ":{?name} is the only psql syntax
>> using the '{' sign". And in fact, SQL basically doesn't use '{' for
>> anything, either.

True.

> FWIW, the default value of rl_basic_word_break_characters [1] has '{'
> but doesn't have '}'.  The documentation says that this should "break
> words for completion in Bash".  But I failed to find an explanation
why this should be so for Bash.  As you correctly note, my idea was
that our SQL isn't heavily using '{', unlike Bash.

Yeah, there's no doubt that what we are currently using for
WORD_BREAKS has mostly been cargo-culted from Bash rather than having
any solid foundation in SQL syntax.  It works all right for us today
because we don't really try to complete anything in general SQL
expression contexts, so as long as we have whitespace and parens in
there we're more or less fine.

I wonder a bit why comma isn't in there, though.  As an example,
vacuum (full,fre
fails to complete "freeze", though it works fine with a space
after the comma.  I've not experimented, but it seems certain
that it'd behave better with comma in WORD_BREAKS.  Whether
that'd pessimize other behaviors, I dunno.

The longer-range concern that I have is that if we ever want to
complete stuff within expressions, I think we're going to need
all the valid operator characters to be in WORD_BREAKS.  And that
will be a problem for this patch, not because of '{' but because
of '?'.  So I'd be happier if the parsing were done in a way that
did not rely on '{' and '?' being treated as word characters.
But I've not looked into how hard that'd be.  In any case, it's
likely that expanding WORD_BREAKS like that would cause some other
problems that'd have to be fixed, so it's not very reasonable to
expect this patch to avoid a hypothetical future problem.

regards, tom lane




Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Heikki Linnakangas

On 09/04/2024 21:04, Jeff Davis wrote:
> On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote:
>> I have some further comments and I believe changes are required for
>> v17.
>
> I also noticed that the simplehash is declared in binaryheap.h with
> "SH_SCOPE extern", which seems wrong. Attached is a rough patch to
> bring the declarations into binaryheap.c.
>
> Note that the attached patch uses "SH_SCOPE static", which makes sense
> to me in this case, but causes a bunch of warnings in gcc. I will post
> separately about silencing that warning, but for now you can either
> use:
>
> SH_SCOPE static inline
>
> which is probably fine, but will encourage the compiler to inline more
> code, when not all callers even use the hash table. Alternatively, you
> can do:
>
> SH_SCOPE static pg_attribute_unused()
>
> which looks a bit wrong to me, but seems to silence the warnings, and
> lets the compiler decide whether or not to inline.
>
> Also probably needs comment updates, etc.


Sorry I'm late to the party; I didn't pay attention to this thread
earlier. But I wonder why this doesn't use the existing pairing heap
implementation? I would assume that to be at least as good as the binary
heap + hash table. And it's cheap to insert into (O(1)), so we could
probably remove the MAX_HEAP_TXN_COUNT_THRESHOLD and always keep the
heap up-to-date.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 11:42 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Tue, Apr 9, 2024 at 7:27 PM Tom Lane  wrote:
> >> Yeah, that's one of the reasons I'm dubious that the committed
> >> patch was ready.
>
> > While inventing this GUC, I was thinking more about avoiding
> > regressions rather than about unleashing the full power of this
> > optimization.  But now I see that that wasn't good enough.  And it was
> > definitely hasty to commit to this shape.  I apologize for this.
>
> > Tom, I think you are way more experienced in this codebase than me.
> > And, probably more importantly, more experienced in making decisions
> > for planner development.  If you see some way forward to polish this
> > post-commit, Andrei and I are ready to work hard on this with you.  If
> > you don't see (or don't think that's good), let's revert this.
>
> It wasn't ready to commit, and I think trying to fix it up post
> feature freeze isn't appropriate project management.  Let's revert
> it and work on it more in the v18 time frame.

Ok, let's do this.  I'd like to hear from you some directions for
further development of this patch if possible.

--
Regards,
Alexander Korotkov




Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 11:37 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Tue, Apr 9, 2024 at 5:12 AM Tom Lane  wrote:
> >> * OrClauseGroupKey is not a Node type, so why does it have
> >> a NodeTag?  I wonder what value will appear in that field,
> >> and what will happen if the struct is passed to any code
> >> that expects real Nodes.
>
> > I used that to put both not-subject-of-transform nodes together with
> > hash entries into the same list.  This is used to save the order of
> > clauses.  I think this is an important property, and I have already
> > expressed it in [1].
>
> What exactly is the point of having a NodeTag in the struct though?
> If you don't need it to be a valid Node, that seems pointless and
> confusing.  We certainly have plenty of other lists that contain
> plain structs without tags, so I don't buy that the List
> infrastructure is making you do that.

This code mixes Expr's and hash entries in the single list.  The point
of having a NodeTag in the struct is the ability to distinguish them
later.

--
Regards,
Alexander Korotkov




Re: post-freeze damage control

2024-04-09 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Apr 9, 2024 at 7:27 PM Tom Lane  wrote:
>> Yeah, that's one of the reasons I'm dubious that the committed
>> patch was ready.

> While inventing this GUC, I was thinking more about avoiding
> regressions rather than about unleashing the full power of this
> optimization.  But now I see that that wasn't good enough.  And it was
> definitely hasty to commit to this shape.  I apologize for this.

> Tom, I think you are way more experienced in this codebase than me.
> And, probably more importantly, more experienced in making decisions
> for planner development.  If you see some way forward to polish this
> post-commit, Andrei and I are ready to work hard on this with you.  If
> you don't see (or don't think that's good), let's revert this.

It wasn't ready to commit, and I think trying to fix it up post
feature freeze isn't appropriate project management.  Let's revert
it and work on it more in the v18 time frame.

regards, tom lane




Re: post-freeze damage control

2024-04-09 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Apr 9, 2024 at 5:12 AM Tom Lane  wrote:
>> * OrClauseGroupKey is not a Node type, so why does it have
>> a NodeTag?  I wonder what value will appear in that field,
>> and what will happen if the struct is passed to any code
>> that expects real Nodes.

> I used that to put both not-subject-of-transform nodes together with
> hash entries into the same list.  This is used to save the order of
> clauses.  I think this is an important property, and I have already
> expressed it in [1].

What exactly is the point of having a NodeTag in the struct though?
If you don't need it to be a valid Node, that seems pointless and
confusing.  We certainly have plenty of other lists that contain
plain structs without tags, so I don't buy that the List
infrastructure is making you do that.

regards, tom lane




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 9, 2024 at 3:44 PM Tom Lane  wrote:
>> The usual gotchas apply: you need to have started the postmaster
>> under "ulimit -c unlimited", and the /cores directory has to be
>> writable by whatever user the postmaster is running as.  I have
>> occasionally seen system updates reduce the privileges on /cores,
>> although not recently.

> Interesting. I'm on Ventura 13.6.2. I think I've checked all of the
> stuff you mention, but I'll do some more investigation.

Huh, that's odd.  One idea that comes to mind is that the core files
are frickin' large, for me typically around 3.5GB-4GB even for a
postmaster running with default shared memory sizes.  (I don't know
why, although it seems to me they were less ridiculous a few years
ago.)  Is it possible you're running out of space, or hitting a
quota of some sort?

regards, tom lane




Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 7:27 PM Tom Lane  wrote:
> Peter Geoghegan  writes:
> > On Mon, Apr 8, 2024 at 10:12 PM Tom Lane  wrote:
> >> I don't know that I'd call it scary exactly, but I do think it
> >> was premature.  A week ago there was no consensus that it was
> >> ready to commit, but Alexander pushed it (or half of it, anyway)
> >> despite that.
>
> > Some of the most compelling cases for the transformation will involve
> > path keys. If the transformation enables the optimizer to build a
> > plain index scan (or index-only scan) with useful path keys, then that
> > might well lead to a far superior plan compared to what's possible
> > with BitmapOrs.
>
> I did not say it isn't a useful thing to have.  I said the patch
> did not appear ready to go in.
>
> > I understand that it'll still be possible to use OR expression
> > evaluation in such cases, without applying the transformation (via
> > filter quals), so in principle you don't need the transformation to
> > get an index scan that can (say) terminate the scan early due to the
> > presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
> > work out much of the time, because the planner will believe (rightly
> > or wrongly) that the filter quals will incur too many heap page
> > accesses.
>
> That's probably related to the fact that we don't have a mechanism
> for evaluating non-indexed quals against columns that are retrievable
> from the index.  We really oughta work on getting that done.  But
> I've been keeping a side eye on the work to unify plain and index-only
> scans, because that's going to touch a lot of the same code so it
> doesn't seem profitable to pursue those goals in parallel.
>
> > Another problem (at least as I see it) with the new
> > or_to_any_transform_limit GUC is that its design seems to have nothing
> > to say about the importance of these sorts of cases. Most of these
> > cases will only have 2 or 3 constants, just because that's what's most
> > common in general.
>
> Yeah, that's one of the reasons I'm dubious that the committed
> patch was ready.

While inventing this GUC, I was thinking more about avoiding
regressions rather than about unleashing the full power of this
optimization.  But now I see that that wasn't good enough.  And it was
definitely hasty to commit to this shape.  I apologize for this.

Tom, I think you are way more experienced in this codebase than me.
And, probably more importantly, more experienced in making decisions
for planner development.  If you see some way forward to polish this
post-commit, Andrei and I are ready to work hard on this with you.  If
you don't see (or don't think that's good), let's revert this.

--
Regards,
Alexander Korotkov




Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 8:37 AM Andrei Lepikhov
 wrote:
> On 9/4/2024 09:12, Tom Lane wrote:
> > I have another one that I'm not terribly happy about:
> >
> >  Author: Alexander Korotkov 
> >  Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
> >
> >  Transform OR clauses to ANY expression
> Because I'm primary author of the idea, let me answer.
> >
> > I don't know that I'd call it scary exactly, but I do think it
> > was premature.  A week ago there was no consensus that it was
> > ready to commit, but Alexander pushed it (or half of it, anyway)
> > despite that.  A few concrete concerns:
> >
> > * Yet another planner GUC.  Do we really need or want that?
> It is the most interesting question here. Looking around at planner
> features that were designed but not applied for the same reason (they can
> produce suboptimal plans in corner cases), I am thinking about inventing
> flag-type parameters and hiding some features that work better for
> particular load types under such flags.

Yes, I spotted that this transformation could cause bitmap scan plan
regressions in [1] and [2].  Fixing that required treating ANY the
same as OR for bitmap scans.  Andrei implemented that in [3], but that
increases planning complexity and eliminates a significant part of the
advantages of the OR-to-ANY transformation.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/6d27d752-db0b-4cac-9843-6ba3dd7a1e94%40postgrespro.ru

--
Regards,
Alexander Korotkov




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 3:44 PM Tom Lane  wrote:
> Works for me on Sonoma 14.4.1 and Ventura 13.6.6, and has done
> in many versions before those.
>
> The usual gotchas apply: you need to have started the postmaster
> under "ulimit -c unlimited", and the /cores directory has to be
> writable by whatever user the postmaster is running as.  I have
> occasionally seen system updates reduce the privileges on /cores,
> although not recently.

Interesting. I'm on Ventura 13.6.2. I think I've checked all of the
stuff you mention, but I'll do some more investigation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 3:33 PM Jeff Davis  wrote:
> Should I go ahead and commit something like that now, or hold it until
> the other thread concludes, or hold it until the July CF?

I think it's fine to commit it now if it makes it usefully easier to
fix an open item, and otherwise it should wait until next cycle.

But I also wonder if we shouldn't just keep using static inline
everywhere. I'm guessing if we do this people are just going to make
random decisions about which one to use every time this comes up.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 3:30 PM Jeff Davis  wrote:
> Pages of warnings is not ideal, though. We should either support
> "SH_SCOPE static", or have some kind of useful #error that makes it
> clear that we don't support it (and/or don't think it's a good idea).

Fair.

> >  I'm not sure that I like the idea of just ignoring the
> > warnings, for fear that the compiler might not actually remove the
> > code for the unused functions from the resulting binary. But I'm not
> > an expert in this area either, so maybe I'm wrong.
>
> In a simple "hello world" test with an unreferenced static function, it
> doesn't seem to be a problem at -O2. I suppose it could be with some
> compiler somewhere, or perhaps in a more complex scenario, but it would
> seem strange to me.

OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 5:12 AM Tom Lane  wrote:
> * OrClauseGroupKey is not a Node type, so why does it have
> a NodeTag?  I wonder what value will appear in that field,
> and what will happen if the struct is passed to any code
> that expects real Nodes.

I used that to put both not-subject-of-transform nodes together with
hash entries into the same list.  This is used to save the order of
clauses.  I think this is an important property, and I have already
expressed it in [1].  That could be achieved without adding NodeTag to
hash entries, but that would require a level of indirection.  It's not
passed to code that expects real Nodes, it doesn't go to anything
except lists.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfdutHt31sdt2rfU%3D4fsDMWxf6tvtnHARgCzLY2Tf21%2Bfgw%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 9, 2024 at 2:37 PM Tom Lane  wrote:
>> Still works for me, at least on machines where I have SIP turned
>> off.  Admittedly, Apple's busy making that a less and less desirable
>> choice.

> What exact version are you running?

Works for me on Sonoma 14.4.1 and Ventura 13.6.6, and has done
in many versions before those.

The usual gotchas apply: you need to have started the postmaster
under "ulimit -c unlimited", and the /cores directory has to be
writable by whatever user the postmaster is running as.  I have
occasionally seen system updates reduce the privileges on /cores,
although not recently.

regards, tom lane




Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Jeff Davis
Hi,

On Tue, 2024-04-09 at 11:56 -0700, Andres Freund wrote:
> FWIW, with just about any modern-ish compiler just using "inline"
> doesn't
> actually force inlining, it just changes the cost model to make it
> more
> likely.

OK.

In the linked thread, I didn't see a good reason to encourage the
compiler to inline the code. Only one caller uses the hash table, so my
instinct would be that the code for manipulating it should not be
inlined. But "extern" (which is the scope now) is certainly not right,
so "static" made the most sense to me.

> 
> I'm not opposed. I'd however at least add a comment explaining why
> this is
> being used. Arguably it doesn't make sense to add it to *_create(),
> as without
> that there really isn't a point in having a simplehash instantiation.
> Might
> make it slightly easier to notice obsoleted uses of simplehash.

That's a good idea that preserves some utility for the warnings.

Should I go ahead and commit something like that now, or hold it until
the other thread concludes, or hold it until the July CF?

Regards,
Jeff Davis





Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Jeff Davis
On Tue, 2024-04-09 at 14:49 -0400, Robert Haas wrote:
> Hmm. I'm pretty sure that I've run into this problem, but I concluded
> that I should use either "static inline" or "extern" and didn't think
> any more of it.

Pages of warnings is not ideal, though. We should either support
"SH_SCOPE static", or have some kind of useful #error that makes it
clear that we don't support it (and/or don't think it's a good idea).

>  I'm not sure that I like the idea of just ignoring the
> warnings, for fear that the compiler might not actually remove the
> code for the unused functions from the resulting binary. But I'm not
> an expert in this area either, so maybe I'm wrong.

In a simple "hello world" test with an unreferenced static function, it
doesn't seem to be a problem at -O2. I suppose it could be with some
compiler somewhere, or perhaps in a more complex scenario, but it would
seem strange to me.
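
The test referred to is about as small as it sounds; a sketch, verified by
inspecting the optimized binary's symbol table:

    #include <stdio.h>

    static int
    never_called(int x)
    {
        /* at -O2, gcc and clang emit no code for this function */
        return x * 2;
    }

    int
    main(void)
    {
        printf("hello world\n");
        return 0;
    }

    /* cc -O2 -Wall dead.c && nm ./a.out | grep never_called  -> no symbol */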

Regards,
Jeff Davis





Re: post-freeze damage control

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 8:55 AM Tom Lane  wrote:
> Andrei Lepikhov  writes:
> > On 9/4/2024 09:12, Tom Lane wrote:
> >> I have another one that I'm not terribly happy about:
> >> Author: Alexander Korotkov 
> >> Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
> >> Transform OR clauses to ANY expression
>
> >> * What the medical community would call off-label usage of
> >> query jumbling.  I'm not sure this is even correct as-used,
> >> and for sure it's using that code for something never intended.
> >> Nor is the added code adequately (as in, at all) documented.
>
> > I agree with documentation and disagree with critics on the expression
> > jumbling. It was introduced in the core. Why don't we allow it to be
> > used to speed up machinery with some hashing?
>
> I would back up from that a good deal: why do we need to hash here in
> the first place?  There's no evidence I'm aware of that it's needful
> from a performance standpoint.

I think the feature is aimed at dealing with large OR lists.  I've seen
a significant degradation on 1 or-clause-groups.  That might seem
like an awful lot, but it's actually not unachievable in generated
queries.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 11:10:15 -0700, Jeff Davis wrote:
> The reason I'm suggesting it there is because the hash table is used
> only for the indexed binary heap, not an ordinary binary heap, so I'd
> like to leave it up to the compiler whether to do any inlining or not.

FWIW, with just about any modern-ish compiler just using "inline" doesn't
actually force inlining, it just changes the cost model to make it more
likely.


> If someone thinks the attached patch is a good change to commit now,
> please let me know. Otherwise, I'll recommend "static inline" in the
> above thread and leave the attached patch to be considered for v18.

I'm not opposed. I'd however at least add a comment explaining why this is
being used. Arguably it doesn't make sense to add it to *_create(), as without
that there really isn't a point in having a simplehash instantiation. Might
make it slightly easier to notice obsoleted uses of simplehash.

Greetings,

Andres Freund




Re: simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 2:10 PM Jeff Davis  wrote:
> If using "SH_SCOPE static" with simplehash.h, it causes a bunch of
> warnings about functions that are defined but not used. It's simple
> enough to fix by appending pg_attribute_unused() to the declarations
> (attached).

Hmm. I'm pretty sure that I've run into this problem, but I concluded
that I should use either "static inline" or "extern" and didn't think
any more of it. I'm not sure that I like the idea of just ignoring the
warnings, for fear that the compiler might not actually remove the
code for the unused functions from the resulting binary. But I'm not
an expert in this area either, so maybe I'm wrong.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 2:37 PM Tom Lane  wrote:
> Robert Haas  writes:
> > At least not for me. According to various things I found on the
> > Internet, it's now required that you codesign your binaries and give
> > them an entitlement in order to generate core dumps:
>
> Still works for me, at least on machines where I have SIP turned
> off.  Admittedly, Apple's busy making that a less and less desirable
> choice.

What exact version are you running?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Tom Lane
Robert Haas  writes:
> At least not for me. According to various things I found on the
> Internet, it's now required that you codesign your binaries and give
> them an entitlement in order to generate core dumps:

Still works for me, at least on machines where I have SIP turned
off.  Admittedly, Apple's busy making that a less and less desirable
choice.

> Unless I'm missing something, this is positively rage-inducing.

I don't disagree.

regards, tom lane




Re: removal of '{' from WORD_BREAKS

2024-04-09 Thread Alexander Korotkov
On Tue, Apr 9, 2024 at 6:18 PM Robert Haas  wrote:
> On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier  wrote:
> > > It kind of looks like a libedit bug, but maybe we should dig more
> > > deeply.  I felt itchy about 927332b95e77 removing '{' from the
> > > WORD_BREAKS set, and wondered exactly how that would change readline's
> > > behavior.  But even if that somehow accounts for the extra backslash
> > > before '{', it's not clear how it could lead to '?' and '}' also
> > > getting backslashed.
> >
> > I don't have a clear idea, either.  I also feel uneasy about
> > 927332b95e77 and its change of WORD_BREAKS, but this has the smell
> > of a bug from an outdated libedit version.
>
> I too felt uneasy about that commit, for the same reason. However,
> there is a justification for the change in the commit message which is
> not obviously wrong, namely that ":{?name} is the only psql syntax
> using the '{' sign". And in fact, SQL basically doesn't use '{' for
> anything, either. We do see { showing up inside of quoted strings, for
> arrays or JSON, but I would guess that the word-break characters
> aren't going to affect behavior within a quoted string. So it still
> seems like it should be OK? Another thing that makes me think that my
> unease may be unfounded is that the matching character '}' isn't in
> WORD_BREAKS either, and I would have thought that if we needed one
> we'd need both.

FWIW, the default value of rl_basic_word_break_characters [1] has '{'
but doesn't have '}'.  The documentation says that this should "break
words for completion in Bash".  But I failed to find an explanation of
why this should be so for Bash.  As you correctly note, my idea was
that our SQL doesn't make heavy use of '{', unlike Bash.
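
For context, the mechanism in question is psql assigning its own set to
readline, roughly as in this sketch; the exact character set shown is
illustrative, see psql's tab-complete.c for the committed value:

    #include <readline/readline.h>

    /*
     * Hypothetical post-927332b95e77 set: no '{', and note that '}' was
     * never present either, matching the observation above.
     */
    #define WORD_BREAKS "\t\n@><=;|&() "

    static void
    initialize_completion(void)
    {
        rl_basic_word_break_characters = WORD_BREAKS;
    }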

> But does anyone else have a more specific reason for thinking that
> this might be a problem?

I don't particularly object to reverting this commit, but I think
we should get to the bottom of this first.  Otherwise there is no
guarantee that we won't run into the same problem again.

Links.
1. 
https://tiswww.case.edu/php/chet/readline/readline.html#index-rl_005fbasic_005fword_005fbreak_005fcharacters

--
Regards,
Alexander Korotkov




simplehash.h: "SH_SCOPE static" causes warnings

2024-04-09 Thread Jeff Davis
If using "SH_SCOPE static" with simplehash.h, it causes a bunch of
warnings about functions that are defined but not used. It's simple
enough to fix by appending pg_attribute_unused() to the declarations
(attached).

There are currently no callers that use "SH_SCOPE static", but I'm
suggesting its use in the thread below as a cleanup to a recently-
committed feature:

https://www.postgresql.org/message-id/791d98f474e518387d09eb390b8a12f265d130cc.ca...@j-davis.com

The reason I'm suggesting it there is because the hash table is used
only for the indexed binary heap, not an ordinary binary heap, so I'd
like to leave it up to the compiler whether to do any inlining or not.

If someone thinks the attached patch is a good change to commit now,
please let me know. Otherwise, I'll recommend "static inline" in the
above thread and leave the attached patch to be considered for v18.

Regards,
Jeff Davis

From 198235448a009a46812eac4854d760af76c85139 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 9 Apr 2024 10:42:05 -0700
Subject: [PATCH v1 1/2] simplehash.h: declare with pg_attribute_unused().

If SH_SCOPE is static, avoid warnings for unused functions.
---
 src/include/lib/simplehash.h | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 3e1b1f9461..7d93d68e93 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -188,62 +188,62 @@ typedef struct SH_ITERATOR
 /* externally visible function prototypes */
 #ifdef SH_RAW_ALLOCATOR
 /* <prefix>_hash <prefix>_create(uint32 nelements, void *private_data) */
-SH_SCOPE	SH_TYPE *SH_CREATE(uint32 nelements, void *private_data);
+SH_SCOPE	SH_TYPE *SH_CREATE(uint32 nelements, void *private_data) pg_attribute_unused();
 #else
 /*
  * <prefix>_hash <prefix>_create(MemoryContext ctx, uint32 nelements,
  *                               void *private_data)
  */
 SH_SCOPE	SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements,
-			   void *private_data);
+			   void *private_data) pg_attribute_unused();
 #endif
 
 /* void <prefix>_destroy(<prefix>_hash *tb) */
-SH_SCOPE void SH_DESTROY(SH_TYPE * tb);
+SH_SCOPE void SH_DESTROY(SH_TYPE * tb) pg_attribute_unused();
 
 /* void <prefix>_reset(<prefix>_hash *tb) */
-SH_SCOPE void SH_RESET(SH_TYPE * tb);
+SH_SCOPE void SH_RESET(SH_TYPE * tb) pg_attribute_unused();
 
 /* void <prefix>_grow(<prefix>_hash *tb, uint64 newsize) */
-SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize);
+SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize) pg_attribute_unused();
 
 /* <element> *<prefix>_insert(<prefix>_hash *tb, <key> key, bool *found) */
-SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found);
+SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found) pg_attribute_unused();
 
 /*
  * <element> *<prefix>_insert_hash(<prefix>_hash *tb, <key> key, uint32 hash,
  *                                 bool *found)
  */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
-			uint32 hash, bool *found);
+			uint32 hash, bool *found) pg_attribute_unused();
 
 /* <element> *<prefix>_lookup(<prefix>_hash *tb, <key> key) */
-SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key);
+SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key) pg_attribute_unused();
 
 /* <element> *<prefix>_lookup_hash(<prefix>_hash *tb, <key> key, uint32 hash) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key,
-			uint32 hash);
+			uint32 hash) pg_attribute_unused();
 
 /* void <prefix>_delete_item(<prefix>_hash *tb, <element> *entry) */
-SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry);
+SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry) pg_attribute_unused();
 
 /* bool <prefix>_delete(<prefix>_hash *tb, <key> key) */
-SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key);
+SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key) pg_attribute_unused();
 
 /* void <prefix>_start_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
-SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter) pg_attribute_unused();
 
 /*
  * void <prefix>_start_iterate_at(<prefix>_hash *tb, <prefix>_iterator *iter,
  *                                uint32 at)
  */
-SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
+SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at) pg_attribute_unused();
 
 /* <element> *<prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
-SH_SCOPE	SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
+SH_SCOPE	SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter) pg_attribute_unused();
 
 /* void <prefix>_stat(<prefix>_hash *tb */
-SH_SCOPE void SH_STAT(SH_TYPE * tb);
+SH_SCOPE void SH_STAT(SH_TYPE * tb) pg_attribute_unused();
 
 #endif			/* SH_DECLARE */
 
-- 
2.34.1



Re: Improve eviction algorithm in ReorderBuffer

2024-04-09 Thread Jeff Davis
On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote:
> > I have some further comments and I believe changes are required for
> > v17.

I also noticed that the simplehash is declared in binaryheap.h with
"SH_SCOPE extern", which seems wrong. Attached is a rough patch to
bring the declarations into binaryheap.c.

Note that the attached patch uses "SH_SCOPE static", which makes sense
to me in this case, but causes a bunch of warnings in gcc. I will post
separately about silencing that warning, but for now you can either
use:

   SH_SCOPE static inline

which is probably fine, but will encourage the compiler to inline more
code, when not all callers even use the hash table. Alternatively, you
can do:

   SH_SCOPE static pg_attribute_unused()

which looks a bit wrong to me, but seems to silence the warnings, and
lets the compiler decide whether or not to inline.

Also probably needs comment updates, etc.

Regards,
Jeff Davis

From 08dcf21646e4ded22b10fd0ed536d3bbf6fc1328 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 9 Apr 2024 10:45:00 -0700
Subject: [PATCH v1] binaryheap: move hash table out of header into
 binaryheap.c.

Commit b840508644 declared the internal hash table in the header with
scope "extern", which was unnecessary.
---
 src/common/binaryheap.c  | 15 ++-
 src/include/lib/binaryheap.h | 25 +
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/src/common/binaryheap.c b/src/common/binaryheap.c
index c20ed50acc..2501dad6f2 100644
--- a/src/common/binaryheap.c
+++ b/src/common/binaryheap.c
@@ -25,6 +25,18 @@
 #include "common/hashfn.h"
 #include "lib/binaryheap.h"
 
+/*
+ * Struct for a hash table element to store the node's index in the bh_nodes
+ * array.
+ */
+typedef struct bh_nodeidx_entry
+{
+	bh_node_type key;
+	int			index;			/* entry's index within the node array */
+	char		status;			/* hash status */
+	uint32		hash;			/* hash values (cached) */
+} bh_nodeidx_entry;
+
 /*
  * Define parameters for hash table code generation. The interface is *also*
  * declared in binaryheap.h (to generate the types, which are externally
@@ -37,12 +49,13 @@
 #define SH_HASH_KEY(tb, key) \
 	hash_bytes((const unsigned char *) &key, sizeof(bh_node_type))
 #define SH_EQUAL(tb, a, b) (memcmp(&a, &b, sizeof(bh_node_type)) == 0)
-#define SH_SCOPE extern
+#define SH_SCOPE static
 #ifdef FRONTEND
 #define SH_RAW_ALLOCATOR pg_malloc0
 #endif
 #define SH_STORE_HASH
 #define SH_GET_HASH(tb, a) a->hash
+#define SH_DECLARE
 #define SH_DEFINE
 #include "lib/simplehash.h"
 
diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h
index 4c1a1bb274..8b47132fc3 100644
--- a/src/include/lib/binaryheap.h
+++ b/src/include/lib/binaryheap.h
@@ -29,29 +29,6 @@ typedef Datum bh_node_type;
  */
 typedef int (*binaryheap_comparator) (bh_node_type a, bh_node_type b, void *arg);
 
-/*
- * Struct for a hash table element to store the node's index in the bh_nodes
- * array.
- */
-typedef struct bh_nodeidx_entry
-{
-	bh_node_type key;
-	int			index;			/* entry's index within the node array */
-	char		status;			/* hash status */
-	uint32		hash;			/* hash values (cached) */
-} bh_nodeidx_entry;
-
-/* Define parameters necessary to generate the hash table interface. */
-#define SH_PREFIX bh_nodeidx
-#define SH_ELEMENT_TYPE bh_nodeidx_entry
-#define SH_KEY_TYPE bh_node_type
-#define SH_SCOPE extern
-#ifdef FRONTEND
-#define SH_RAW_ALLOCATOR pg_malloc0
-#endif
-#define SH_DECLARE
-#include "lib/simplehash.h"
-
 /*
  * binaryheap
  *
@@ -76,7 +53,7 @@ typedef struct binaryheap
 	 * node's index in bh_nodes. This enables the caller to perform
 	 * binaryheap_remove_node_ptr(), binaryheap_update_up/down in O(log n).
 	 */
-	bh_nodeidx_hash *bh_nodeidx;
+	struct bh_nodeidx_hash *bh_nodeidx;
 } binaryheap;
 
 extern binaryheap *binaryheap_allocate(int num_nodes,
-- 
2.34.1



Re: Annoying build warnings from latest Apple toolchain

2024-04-09 Thread Andres Freund
Hi,

Looks like I forgot to update the thread to note that I finally committed the
remaining warning fixes. I had already fixed a bunch of others in upstream
meson.

commit a3da95deee38ee067b0bead639c830eacbe894d5
Author: Andres Freund 
Date:   2024-03-13 01:40:53 -0700

meson: macos: Avoid warnings on Sonoma

Greetings,

Andres Freund




Re: macOS Ventura won't generate core dumps

2024-04-09 Thread Andres Freund
Hi,

On 2024-04-09 13:35:51 -0400, Robert Haas wrote:
> Now, if DYLD_* is ignored, then our regression tests won't work
> properly. But if core dumps are not enabled, then how am I supposed to
> debug things that can only be debugged with a core dump?

FWIW, I posted a patch a while back to make meson builds support relative
rpaths on some platforms, including macos. I.e. each library/binary finds the
location of the needed libraries relative to its own location. That gets rid
of the need to use DYLD_ at all, because the temporary install for the tests
can find the library location without a problem.

Perhaps it's worth picking that up again?

https://github.com/anarazel/postgres/tree/meson-rpath
https://github.com/anarazel/postgres/commit/46f1963fee7525c3cc3837ef8423cbf6cb08d10a

Greetings,

Andres Freund




macOS Ventura won't generate core dumps

2024-04-09 Thread Robert Haas
At least not for me. According to various things I found on the
Internet, it's now required that you codesign your binaries and give
them an entitlement in order to generate core dumps:

https://nasa.github.io/trick/howto_guides/How-to-dump-core-file-on-MacOS.html

But according to previous on-list discussion, code-signing a binary
means that DYLD_* will be ignored:

http://postgr.es/m/920451.1634265...@sss.pgh.pa.us

Now, if DYLD_* is ignored, then our regression tests won't work
properly. But if core dumps are not enabled, then how am I supposed to
debug things that can only be debugged with a core dump?

Unless I'm missing something, this is positively rage-inducing. It's
reasonable, on Apple's part, to want to install more secure defaults,
but having no practical way of overriding the defaults is not
reasonable. And by "practical," I mean ideally (a) doesn't require a
macOS-specific patch to the PostgreSQL source repository but at least
(b) can be done somehow without breaking other important things that
also need to work.

Anyone have any ideas?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Can't find not null constraint, but \d+ shows that

2024-04-09 Thread Alvaro Herrera
On 2024-Mar-29, Tender Wang wrote:

> I think the above case can explain the meaning of the comments in
> dropconstraint_internal.
> But here, in RemoveConstraintById(), we only care about the primary key
> case, so NOT NULL is better removed from the comments.

Actually, I think it's better if all the resets of attnotnull occur in
RemoveConstraintById, for both types of constraints; we would keep that
block in dropconstraint_internal only to raise errors in the cases where
the constraint is protecting a replica identity or a generated column.
Something like the attached, perhaps; it may need more polish.

I'm not really sure about the business of adding a new pass value
-- it's clear that things don't work if we don't do *something* -- I'm
just not sure if this is the something that we want to do.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Tiene valor aquel que admite que es un cobarde" (Fernandel)
>From 397d424cd64020bcb4aff219deefca7fc7b6f88d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 9 Apr 2024 19:18:57 +0200
Subject: [PATCH] Correctly reset attnotnull when constraints dropped
 indirectly

---
 src/backend/catalog/pg_constraint.c   | 105 +-
 src/backend/commands/tablecmds.c  |  67 ++
 src/test/regress/expected/constraints.out |  11 +++
 src/test/regress/sql/constraints.sql  |   7 ++
 4 files changed, 150 insertions(+), 40 deletions(-)

diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 604280d322..17c7a2d0f3 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -19,6 +19,7 @@
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "access/table.h"
+#include "access/xact.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
@@ -916,6 +917,8 @@ RemoveConstraintById(Oid conId)
 	Relation	conDesc;
 	HeapTuple	tup;
 	Form_pg_constraint con;
+	bool		dropping_pk = false;
+	List	   *unconstrained_cols = NIL;
 
 	conDesc = table_open(ConstraintRelationId, RowExclusiveLock);
 
@@ -940,7 +943,9 @@ RemoveConstraintById(Oid conId)
 		/*
 		 * We need to update the relchecks count if it is a check constraint
 		 * being dropped.  This update will force backends to rebuild relcache
-		 * entries when we commit.
+		 * entries when we commit.  For not-null and primary key constraints,
+		 * obtain the list of columns affected, so that we can reset their
+		 * attnotnull flags below.
 		 */
 		if (con->contype == CONSTRAINT_CHECK)
 		{
@@ -967,6 +972,36 @@ RemoveConstraintById(Oid conId)
 
 			table_close(pgrel, RowExclusiveLock);
 		}
+		else if (con->contype == CONSTRAINT_NOTNULL)
+		{
+			unconstrained_cols = list_make1_int(extractNotNullColumn(tup));
+		}
+		else if (con->contype == CONSTRAINT_PRIMARY)
+		{
+			Datum		adatum;
+			ArrayType  *arr;
+			int			numkeys;
+			bool		isNull;
+			int16	   *attnums;
+
+			dropping_pk = true;
+
+			adatum = heap_getattr(tup, Anum_pg_constraint_conkey,
+								  RelationGetDescr(conDesc), &isNull);
+			if (isNull)
+				elog(ERROR, "null conkey for constraint %u", con->oid);
+			arr = DatumGetArrayTypeP(adatum);	/* ensure not toasted */
+			numkeys = ARR_DIMS(arr)[0];
+			if (ARR_NDIM(arr) != 1 ||
+				numkeys < 0 ||
+				ARR_HASNULL(arr) ||
+				ARR_ELEMTYPE(arr) != INT2OID)
+				elog(ERROR, "conkey is not a 1-D smallint array");
+			attnums = (int16 *) ARR_DATA_PTR(arr);
+
+			for (int i = 0; i < numkeys; i++)
+				unconstrained_cols = lappend_int(unconstrained_cols, attnums[i]);
+		}
 
 		/* Keep lock on constraint's rel until end of xact */
 		table_close(rel, NoLock);
@@ -986,6 +1021,74 @@ RemoveConstraintById(Oid conId)
 	/* Fry the constraint itself */
 	CatalogTupleDelete(conDesc, >t_self);
 
+	/*
+	 * If this was a NOT NULL or the primary key, the constrained columns must
+	 * have had pg_attribute.attnotnull set.  See if we need to reset it, and
+	 * do so.
+	 */
+	if (unconstrained_cols)
+	{
+		Relation	tablerel;
+		Relation	attrel;
+		Bitmapset  *pkcols;
+		ListCell   *lc;
+
+		/* Make the above deletion visible */
+		CommandCounterIncrement();
+
+		tablerel = table_open(con->conrelid, NoLock);	/* already have lock */
+		attrel = table_open(AttributeRelationId, RowExclusiveLock);
+
+		/*
+		 * We want to test columns for their presence in the primary key, but
+		 * only if we're not dropping it.
+		 */
+		pkcols = dropping_pk ? NULL :
+			RelationGetIndexAttrBitmap(tablerel,
+									   INDEX_ATTR_BITMAP_PRIMARY_KEY);
+
+		foreach(lc, unconstrained_cols)
+		{
+			AttrNumber	attnum = lfirst_int(lc);
+			HeapTuple	atttup;
+			HeapTuple	contup;
+			Form_pg_attribute attForm;
+
+			/*
+			 * Obtain pg_attribute tuple and verify conditions on it.  We use
+			 * a copy we can scribble on.
+			 */
+			atttup = SearchSysCacheCopyAttNum(con->conrelid, attnum);
+			if (!HeapTupleIsValid(atttup))
+elog(ERROR, "cache lookup failed for attribute %d of 

Re: IPC::Run::time[r|out] vs our TAP tests

2024-04-09 Thread Daniel Gustafsson


> On 4 Apr 2024, at 23:46, Daniel Gustafsson  wrote:
> 
>> On 4 Apr 2024, at 23:24, Tom Lane  wrote:
> 
>> A minimum fix that seems to make this work better is as attached,
>> but I feel like somebody ought to examine all the IPC::Run::timer
>> and IPC::Run::timeout calls and see which ones are mistaken.
>> It's a little scary to convert a timeout to a timer because of
>> the hazard that someplace that would need to be checking for
>> is_expired isn't.
> 
> Skimming this and a few callsites it seems reasonable to use a timer instead 
> of
> a timeout, but careful study is needed to make sure we're not introducing
> anything subtly wrong in the other direction.

Sharing a few preliminary results from looking at this: the attached passes
check-world but needs more study/testing.

It seems wrong to me that we die() in query and query_until rather than giving
the caller the power to decide how to proceed.  We have even documented that we
do just this:

"Dies on failure to invoke psql, or if psql fails to connect.  Errors
occurring later are the caller's problem"

Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it).  Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.

A few other comments on related things that stood out while reviewing:

The tab completion test can use the API call for automatically restarting the
timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
not really necessary).

Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:

$back_q->query_until(
qr/logical_slot_get_changes/, q(
   \echo logical_slot_get_changes
   SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));

...  ...

# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;

There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point.  This might still work as intended, but the comment is
slightly misleading IMO.


Since 06c418e163e, recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait()
in a background session which it then never terminates.  Calling ->quit is
mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
process makes the test fail because the process has already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
this.

--
Daniel Gustafsson



v1-0001-Configure-interactive-instance-to-restart-timer.patch
Description: Binary data


v1-0002-Report-test-failure-rather-than-aborting-in-case-.patch
Description: Binary data


v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patch
Description: Binary data



Re: WIP Incremental JSON Parser

2024-04-09 Thread Jacob Champion
On Mon, Apr 8, 2024 at 10:24 PM Michael Paquier  wrote:
> At the end, having a way to generate JSON blobs randomly to test this
> stuff would be more appealing

For the record, I'm working on an LLVM fuzzer target for the JSON
parser. I think that would be a lot more useful than anything we can
hand-code.

But I want it to cover both the recursive and incremental code paths,
and we'd need to talk about where it would live. libfuzzer is seeded
with a bunch of huge incomprehensible blobs, which is something we're
now trying to avoid checking in. There's also the security aspect of
"what do we do when it finds something", and at that point maybe we
need to look into a service like oss-fuzz.
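
For reference, the entry point libFuzzer needs is tiny; a sketch, where
json_parse_bytes is a hypothetical wrapper over the two parser entry points
rather than an existing function:

    #include <stddef.h>
    #include <stdint.h>

    /* hypothetical wrapper: run the input through the JSON parser,
     * either all at once or fed incrementally in small chunks */
    extern int json_parse_bytes(const uint8_t *buf, size_t len,
                                int incremental);

    int
    LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        /* exercise both code paths; crashes and sanitizer hits are findings */
        (void) json_parse_bytes(data, size, 0);
        (void) json_parse_bytes(data, size, 1);
        return 0;               /* nonzero return values are reserved */
    }

    /* clang -g -fsanitize=fuzzer,address fuzz_json.c parser.o */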

Thanks,
--Jacob




Re: post-freeze damage control

2024-04-09 Thread Peter Geoghegan
On Tue, Apr 9, 2024 at 12:27 PM Tom Lane  wrote:
> > Some of the most compelling cases for the transformation will involve
> > path keys. If the transformation enables the optimizer to build a
> > plain index scan (or index-only scan) with useful path keys, then that
> > might well lead to a far superior plan compared to what's possible
> > with BitmapOrs.
>
> I did not say it isn't a useful thing to have.  I said the patch
> did not appear ready to go in.

Didn't mean to suggest otherwise. I just wanted to hear your thoughts
on this aspect of these sorts of transformations.

> > I understand that it'll still be possible to use OR expression
> > evaluation in such cases, without applying the transformation (via
> > filter quals), so in principle you don't need the transformation to
> > get an index scan that can (say) terminate the scan early due to the
> > presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
> > work out much of the time, because the planner will believe (rightly
> > or wrongly) that the filter quals will incur too many heap page
> > accesses.
>
> That's probably related to the fact that we don't have a mechanism
> for evaluating non-indexed quals against columns that are retrievable
> from the index.  We really oughta work on getting that done.

I agree that that is very important work, but I'm not sure that it
makes all that much difference here. Even if we had that improved
mechanism already, today, using index quals would still be strictly
better than expression evaluation. Index quals can allow nbtree to
skip over irrelevant parts of the index as the need arises, which is a
significant advantage in its own right.

ISTM that the planner should always prefer index quals over expression
evaluation, on general principle, even when there's no reason to think
it'll work out. At worst the executor has essentially the same
physical access patterns as the expression evaluation case. On the
other hand, providing nbtree with that context might end up being a
great deal faster.

-- 
Peter Geoghegan




Re: post-freeze damage control

2024-04-09 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Apr 8, 2024 at 10:12 PM Tom Lane  wrote:
>> I don't know that I'd call it scary exactly, but I do think it
>> was premature.  A week ago there was no consensus that it was
>> ready to commit, but Alexander pushed it (or half of it, anyway)
>> despite that.

> Some of the most compelling cases for the transformation will involve
> path keys. If the transformation enables the optimizer to build a
> plain index scan (or index-only scan) with useful path keys, then that
> might well lead to a far superior plan compared to what's possible
> with BitmapOrs.

I did not say it isn't a useful thing to have.  I said the patch
did not appear ready to go in.

> I understand that it'll still be possible to use OR expression
> evaluation in such cases, without applying the transformation (via
> filter quals), so in principle you don't need the transformation to
> get an index scan that can (say) terminate the scan early due to the
> presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
> work out much of the time, because the planner will believe (rightly
> or wrongly) that the filter quals will incur too many heap page
> accesses.

That's probably related to the fact that we don't have a mechanism
for evaluating non-indexed quals against columns that are retrievable
from the index.  We really oughta work on getting that done.  But
I've been keeping a side eye on the work to unify plain and index-only
scans, because that's going to touch a lot of the same code so it
doesn't seem profitable to pursue those goals in parallel.

> Another problem (at least as I see it) with the new
> or_to_any_transform_limit GUC is that its design seems to have nothing
> to say about the importance of these sorts of cases. Most of these
> cases will only have 2 or 3 constants, just because that's what's most
> common in general.

Yeah, that's one of the reasons I'm dubious that the committed
patch was ready.

regards, tom lane




Re: WIP Incremental JSON Parser

2024-04-09 Thread Jacob Champion
On Tue, Apr 9, 2024 at 7:30 AM Andrew Dunstan  wrote:
> I think Michael's point was that if we carry the code we should test that
> we can run it. The other possibility would be just to remove it. I can see
> arguments for both.

Hm. If it's not acceptable to carry this (as a worse-is-better smoke
test) without also running it during tests, then my personal vote
would be to tear it out and just have people write/contribute targeted
benchmarks when they end up working on performance. I don't think the
cost/benefit makes sense at that point.

--Jacob




Re: post-freeze damage control

2024-04-09 Thread Peter Geoghegan
On Mon, Apr 8, 2024 at 10:12 PM Tom Lane  wrote:
> I don't know that I'd call it scary exactly, but I do think it
> was premature.  A week ago there was no consensus that it was
> ready to commit, but Alexander pushed it (or half of it, anyway)
> despite that.

Some of the most compelling cases for the transformation will involve
path keys. If the transformation enables the optimizer to build a
plain index scan (or index-only scan) with useful path keys, then that
might well lead to a far superior plan compared to what's possible
with BitmapOrs.

I understand that it'll still be possible to use OR expression
evaluation in such cases, without applying the transformation (via
filter quals), so in principle you don't need the transformation to
get an index scan that can (say) terminate the scan early due to the
presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
work out much of the time, because the planner will believe (rightly
or wrongly) that the filter quals will incur too many heap page
accesses.

Another problem (at least as I see it) with the new
or_to_any_transform_limit GUC is that its design seems to have nothing
to say about the importance of these sorts of cases. Most of these
cases will only have 2 or 3 constants, just because that's what's most
common in general.

-- 
Peter Geoghegan




Re: post-freeze damage control

2024-04-09 Thread Andrey M. Borodin



> On 9 Apr 2024, at 18:45, Alvaro Herrera  wrote:
> 
> Maybe we should explicitly advise users to not delete that WAL from
> their archives, until pg_combinebackup is hammered a bit more.

As a backup tool maintainer, I always refer to out-of-the-box Postgres
tools as a bulletproof alternative.
I would really like to uphold this reputation and not discredit these tools.


Best regards, Andrey Borodin.



Re: post-freeze damage control

2024-04-09 Thread Alvaro Herrera
On 2024-Apr-09, Stefan Fercot wrote:

> At some point, the only way to really validate a backup is to actually try to 
> restore it.
> And if people get encouraged to do that faster thanks to incremental backups, 
> they could detect potential issues sooner.
> Ultimately, users will still need their full backups and WAL archives.
> If pg_combinebackup fails for any reason, the fix will be to perform the 
> recovery from the full backup directly.
> They still should be able to recover, just slower.

I completely agree that people should be testing the feature so that we
can fix any bugs as soon as possible.  However, if my understanding is
correct, restoring a full backup plus an incremental no longer needs the
intervening WAL up to the incremental.  Users wishing to save some disk
space might be tempted to delete that WAL.  If they do, and later it
turns out that the full+incremental cannot be restored for whatever
reason, they are in danger.

But you're right that if they don't delete that WAL, then the full is
restorable on its own.

Maybe we should explicitly advise users to not delete that WAL from
their archives, until pg_combinebackup is hammered a bit more.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code.  Too bad I can't do this at work (Oracle 8/9)."   (Tom Allison)
   http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php




removal of '{' from WORD_BREAKS

2024-04-09 Thread Robert Haas
On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier  wrote:
> > It kind of looks like a libedit bug, but maybe we should dig more
> > deeply.  I felt itchy about 927332b95e77 removing '{' from the
> > WORD_BREAKS set, and wondered exactly how that would change readline's
> > behavior.  But even if that somehow accounts for the extra backslash
> > before '{', it's not clear how it could lead to '?' and '}' also
> > getting backslashed.
>
> I don't have a clear idea, either.  I also feel uneasy about
> 927332b95e77 and its change of WORD_BREAKS, but this has the smell
> of a bug from an outdated libedit version.

I too felt uneasy about that commit, for the same reason. However,
there is a justification for the change in the commit message which is
not obviously wrong, namely that ":{?name} is the only psql syntax
using the '{' sign". And in fact, SQL basically doesn't use '{' for
anything, either. We do see { showing up inside of quoted strings, for
arrays or JSON, but I would guess that the word-break characters
aren't going to affect behavior within a quoted string. So it still
seems like it should be OK? Another thing that makes me think that my
unease may be unfounded is that the matching character '}' isn't in
WORD_BREAKS either, and I would have thought that if we needed one
we'd need both.

But does anyone else have a more specific reason for thinking that
this might be a problem?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Streaming relation data out of order

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 12:19 AM Thomas Munro  wrote:
> This idea is due to Robert Haas, who complained that he feared that
> the streaming I/O API already worked like this.  It doesn't, but it
> could!  Here is a concept patch to try it out.

Oh, no, it's all my fault!

My favorite part of this patch is the comment added to read_stream.h,
which IMHO makes things a whole lot clearer.

I have no opinion on the rest of it; I don't understand this code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WIP Incremental JSON Parser

2024-04-09 Thread Andrew Dunstan



On 2024-04-09 Tu 09:45, Jacob Champion wrote:

> On Tue, Apr 9, 2024 at 4:54 AM Andrew Dunstan  wrote:
>> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>>> There is no direct check on test_json_parser_perf.c, either, only a
>>> custom rule in the Makefile without specifying something for meson.
>>> So it looks like you could do short execution check in a TAP test, at
>>> least.
>>
>> Not adding a test for that was deliberate - any sane test takes a while,
>> and I didn't want to spend that much time on it every time someone runs
>> "make check-world" or equivalent. However, adding a test to run it with a
>> trivial number of iterations seems reasonable, so I'll add that. I'll
>> also add a meson target for the binary.
>
> Okay, but for what purpose? My understanding during review was that
> this was a convenience utility for people who were actively hacking on
> the code (and I used it for exactly that purpose a few months back, so
> I didn't question that any further). Why does the farm need to spend
> any time running it at all?

I think Michael's point was that if we carry the code we should test that
we can run it. The other possibility would be just to remove it. I can see
arguments for both.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: IPC::Run::time[r|out] vs our TAP tests

2024-04-09 Thread Andrew Dunstan



On 2024-04-09 Tu 09:46, Tom Lane wrote:

> Michael Paquier  writes:
>> By the way, are you planning to do something like [1]?  I've not
>> looked in details at the callers of IPC::Run::timeout, still the extra
>> debug output would be nice.
>
> It needs more review I think - I didn't check every call site to see
> if anything would be broken.  I believe Andrew has undertaken a
> survey of all the timeout/timer calls, but if he doesn't produce
> anything I might have a go at it after awhile.

What I looked at so far was the use of is_expired, but when you look
into that you see that you need to delve further, to where timeout/timer
objects are created and passed around. I'll take a closer look when I
have done some incremental json housekeeping.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: IPC::Run::time[r|out] vs our TAP tests

2024-04-09 Thread Tom Lane
Michael Paquier  writes:
> By the way, are you planning to do something like [1]?  I've not
> looked in details at the callers of IPC::Run::timeout, still the extra
> debug output would be nice.

It needs more review I think - I didn't check every call site to see
if anything would be broken.  I believe Andrew has undertaken a
survey of all the timeout/timer calls, but if he doesn't produce
anything I might have a go at it after awhile.

regards, tom lane




Re: WIP Incremental JSON Parser

2024-04-09 Thread Jacob Champion
On Tue, Apr 9, 2024 at 4:54 AM Andrew Dunstan  wrote:
> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>> There is no direct check on test_json_parser_perf.c, either, only a
>> custom rule in the Makefile without specifying something for meson.
>> So it looks like you could do short execution check in a TAP test, at
>> least.
>
> Not adding a test for that was deliberate - any sane test takes a while, and 
> I didn't want to spend that much time on it every time someone runs "make 
> check-world" or equivalent. However, adding a test to run it with a trivial 
> number of iterations seems reasonable, so I'll add that. I'll also add a 
> meson target for the binary.

Okay, but for what purpose? My understanding during review was that
this was a convenience utility for people who were actively hacking on
the code (and I used it for exactly that purpose a few months back, so
I didn't question that any further). Why does the farm need to spend
any time running it at all?

--Jacob




Re: post-freeze damage control

2024-04-09 Thread Stefan Fercot
Hi,

On Tuesday, April 9th, 2024 at 2:46 PM, Robert Haas  
wrote:
> In all sincerity, I appreciate the endorsement. Basically what's been
> scaring me about this feature is the possibility that there's some
> incurable design flaw that I've managed to completely miss. If it has
> some more garden-variety bugs, that's still pretty bad: people will
> potentially lose data and be unable to get it back. But, as long as
> we're able to find the bugs and fix them, the situation should improve
> over time until, hopefully, everybody trusts it roughly as much as we
> trust, say, crash recovery. Perhaps even a bit more: I think this code
> is much better-written than our crash recovery code, which has grown
> into a giant snarl that nobody seems able to untangle, despite
> multiple refactoring attempts. However, if there's some reason why the
> approach is fundamentally unsound which I and others have failed to
> detect, then we're at risk of shipping a feature that is irretrievably
> broken. That would really suck.

IMHO it is totally worth shipping such a long-awaited feature sooner rather
than later.
Yes, it is a complex one, but you have been advertising it since last January
already, so people should be able to play with it in the beta.

And as you mentioned in your blog about the evergreen backup:

> But if you're anything like me, you'll already see that this arrangement
> has two serious weaknesses. First, if there are any data-corrupting bugs
> in pg_combinebackup or any of the server-side code that supports
> incremental backup, this approach could get you into big trouble.

At some point, the only way to really validate a backup is to actually try to 
restore it.
And if people get encouraged to do that faster thanks to incremental backups, 
they could detect potential issues sooner.
Ultimately, users will still need their full backups and WAL archives.
If pg_combinebackup fails for any reason, the fix will be to perform the 
recovery from the full backup directly.
They should still be able to recover, just more slowly.

--
Stefan FERCOT
Data Egret (https://dataegret.com)




Re: post-freeze damage control

2024-04-09 Thread Andrei Lepikhov

On 9/4/2024 12:55, Tom Lane wrote:

Andrei Lepikhov  writes:

* I really, really dislike jamming this logic into prepqual.c,
where it has no business being.  I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:



Yeah, I preferred to do it in parse_expr.c with the assumption of some
'minimal' or 'canonical' tree form.


That seems quite the wrong direction to me.  AFAICS, the argument
for making this transformation depends on being able to convert
to an indexscan condition, so I would try to apply it even later,
when we have a set of restriction conditions to apply to a particular
baserel.  (This would weaken the argument that we need hashing
rather than naive equal() tests even further, I think.)  Applying
the transform to join quals seems unlikely to be a win.
Our first prototype did this job right at the stage of index path 
creation. Unfortunately, this approach was too narrow and expensive.
The most problematic cases we encountered came from BitmapOr paths: if 
an incoming query has a significant number of OR clauses, the optimizer 
spends a lot of time generating these, in most cases, senseless paths 
(remember also the memory allocated for that purpose). Imagine how much 
worse the situation becomes when we scale it with partitions.
Another issue we resolved with this transformation: a shorter list of 
clauses speeds up planning and sometimes makes cardinality estimation 
more accurate.
Moreover, it helps even a SeqScan: attempting to find a value in the 
hashed array is much faster than cycling through a long expression on 
each incoming tuple.
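
To make that concrete, here is a hypothetical illustration of the 
transformation (table and values invented for the example):

-- a long chain of OR clauses over the same column ...
SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 3 OR x = 4;

-- ... gets folded into a single array test, so the executor can probe
-- a hashed array once per tuple instead of evaluating every OR arm:
SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 3, 4]);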


One more idea that I have set aside here is that the planner could 
utilize quick clause hashing:
From time to time on the mailing list, I see disputes over different 
approaches to expression transformation/simplification/grouping, and 
most of the time it ends up with the problem of search complexity. 
A clause hash could be a way to solve this, couldn't it?


--
regards,
Andrei Lepikhov
Postgres Professional





RE: Synchronizing slots from primary to standby

2024-04-09 Thread Zhijie Hou (Fujitsu)
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada  
wrote:

Hi,

> On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila wrote:
> >
> > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote:
> > >
> > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy wrote:
> > >
> > > > I quickly looked at v8, and have a nit, rest all looks good.
> > > >
> > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot)
> > > > +*found_consistent_snapshot = true;
> > > >
> > > > Can the found_consistent_snapshot be checked first to help avoid
> > > > the function call DecodingContextReady() for
> > > > pg_replication_slot_advance callers?
> > > >
> > >
> > > Okay, changed. Additionally, I have updated the comments and commit
> > > message. I'll push this patch after some more testing.
> > >
> >
> > Pushed!
> 
> While testing this change, I realized that it could happen that the
> server logs are flooded with the following logical decoding logs that
> are written every 200 ms:

Thanks for reporting!

> 
> 2024-04-04 16:15:19.270 JST [3838739] LOG:  starting logical decoding for slot
> "test_sub"
> 2024-04-04 16:15:19.270 JST [3838739] DETAIL:  Streaming transactions
> committing after 0/50006F48, reading WAL from 0/50006F10.
> 2024-04-04 16:15:19.270 JST [3838739] LOG:  logical decoding found
> consistent point at 0/50006F10
> 2024-04-04 16:15:19.270 JST [3838739] DETAIL:  There are no running
> transactions.
> 2024-04-04 16:15:19.477 JST [3838739] LOG:  starting logical decoding for slot
> "test_sub"
> 2024-04-04 16:15:19.477 JST [3838739] DETAIL:  Streaming transactions
> committing after 0/50006F48, reading WAL from 0/50006F10.
> 2024-04-04 16:15:19.477 JST [3838739] LOG:  logical decoding found
> consistent point at 0/50006F10
> 2024-04-04 16:15:19.477 JST [3838739] DETAIL:  There are no running
> transactions.
> 
> For example, I could reproduce it with the following steps:
> 
> 1. create the primary and start.
> 2. run "pgbench -i -s 100" on the primary.
> 3. run pg_basebackup to create the standby.
> 4. configure slotsync setup on the standby and start.
> 5. create a publication for all tables on the primary.
> 6. create the subscriber and start.
> 7. run "pgbench -i -Idtpf" on the subscriber.
> 8. create a subscription on the subscriber (initial data copy will start).
> 
> The logical decoding logs were written every 200 ms during the initial data
> synchronization.
> 
> Looking at the new changes for update_local_synced_slot():
...
> We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn,
> restart_lsn, and catalog_xmin is different between the remote slot and
> the local slot. In my test case, while the initial sync was being
> performed, only catalog_xmin was different and there was no serialized
> snapshot at restart_lsn, and the slotsync worker called
> LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were
> changed even after the function and it set slot_updated = true.
> So it starts the next slot synchronization after 200ms.

I was trying to reproduce this and check why the catalog_xmin differs
between the synced slot and the remote slot, but I was not able to reproduce
the case where there are lots of logical decoding logs. The script I used is
attached.

Would it be possible for you to share the script you used to reproduce this
issue? Alternatively, could you please share the log files from both the
primary and standby servers after reproducing the problem (it would be very
helpful if you could set the log level to DEBUG2).

Best Regards,
Hou zj


test.sh
Description: test.sh


Re: post-freeze damage control

2024-04-09 Thread Robert Haas
On Tue, Apr 9, 2024 at 7:24 AM Tomas Vondra
 wrote:
> I think it's a bit more nuanced, because it's about backups/restore. The
> bug might be subtle, and you won't learn about it until the moment when
> you need to restore (or perhaps even long after that). At which point
> "You might have taken the backup in some other way." is not really a
> viable exit route.
>
> Anyway, I'm still not worried about this particular feature, and I'll
> keep doing the stress testing.

In all sincerity, I appreciate the endorsement. Basically what's been
scaring me about this feature is the possibility that there's some
incurable design flaw that I've managed to completely miss. If it has
some more garden-variety bugs, that's still pretty bad: people will
potentially lose data and be unable to get it back. But, as long as
we're able to find the bugs and fix them, the situation should improve
over time until, hopefully, everybody trusts it roughly as much as we
trust, say, crash recovery. Perhaps even a bit more: I think this code
is much better-written than our crash recovery code, which has grown
into a giant snarl that nobody seems able to untangle, despite
multiple refactoring attempts. However, if there's some reason why the
approach is fundamentally unsound which I and others have failed to
detect, then we're at risk of shipping a feature that is irretrievably
broken. That would really suck.

I'm fairly hopeful that there is no such design defect: I certainly
can't think of one. But, it's much easier to imagine an incurable
problem here than with, say, the recent pruning+freezing changes.
Those changes might have bugs, and those bugs might be hard to find,
but if they do exist and are found, they can be fixed. Here, it's a
little less obvious that that's true. We're relying on our ability, at
incremental backup time, to sort out from the manifest and the WAL
summaries, what needs to be included in the backup in order for a
subsequent pg_combinebackup operation to produce correct results. The
basic idea is simple enough, but the details are complicated, and it
feels like a subtle defect in the algorithm could potentially scuttle
the whole thing. I'd certainly appreciate having more smart people try
to think of things that I might have overlooked.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Security lessons from liblzma

2024-04-09 Thread Joe Conway

On 4/8/24 22:57, Bruce Momjian wrote:

On Sat, Mar 30, 2024 at 04:50:26PM -0400, Robert Haas wrote:

An awful lot of what we do operates on the principle that we know the
people who are involved and trust them, and I'm glad we do trust them,
but the world is full of people who trusted somebody too much and
regretted it afterwards. The fact that we have many committers rather
than a single maintainer probably reduces risk at least as far as the
source repository is concerned, because there are more people paying
attention to potentially notice something that isn't as it should be.


One unwritten requirement for committers is that we are able to
communicate with them securely.  If we cannot do that, they potentially
could be forced by others, e.g., governments, to add code to our
repositories.

Unfortunately, there is no good way for them to communicate with us
securely once they are unable to communicate with us securely.  I
suppose some special word could be used to communicate that status ---
that is how it was done in non-electronic communication in the past.


I don't know how that really helps. If one of our committers is under 
duress, they probably cannot risk outing themselves anyway.


The best defense, IMHO, is the fact that our source code is open and can 
be reviewed freely.


The trick is to get folks to do the review.

I know, for example, at $past_employer we had a requirement to get 
someone on our staff to review every single commit in order to maintain 
certain certifications. Of course there is no guarantee that such 
reviews would catch everything, but maybe we could establish post commit 
reviews by contributors in a more rigorous way? Granted, getting more 
qualified volunteers is not a trivial problem...


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Parallel Recovery in PostgreSQL

2024-04-09 Thread Matthias van de Meent
On Tue, 9 Apr 2024 at 13:12, 范润泽  wrote:
>
> Hi, Hackers
> I have observed that there has been a paucity of discussion concerning the 
> parallel replay of WAL logs.
> The most recent discourse on this subject was during the PGCon event in 2023, 
> where it was noted that PostgreSQL utilizes a single process for WAL replay.
> However, when configuring primary-secondary replication, there exists a 
> tangible scenario where the primary accumulates an excessive backlog that the 
> secondary cannot replay promptly.
> This situation prompts me to question whether it is pertinent to consider 
> integrating a parallel replay feature.
> Such a functionality could potentially mitigate the risk of delayed WAL 
> application on replicas and enhance overall system resilience and performance.
> I am keen to hear your thoughts on this issue and whether you share the view 
> that parallel WAL replay is a necessity that we should collaboratively 
> explore further.

I think we should definitely explore this further, yes.

Note that parallel WAL replay is not the only thing we can improve
here: A good part of WAL redo time is spent in reading and validating
the WAL records. If we can move that part to a parallel worker/thread,
that could probably improve the throughput by a good margin.

Then there is another issue with parallel recovery that I also called
out at PGCon 2023: You can't just reorder WAL records in a simple
page- or relation-based parallelization approach.

Indexes often contain transient information, whose replay isn't
easily parallelizable with other relations' redo due to
dependencies on other relations' pages, while still guaranteeing correct
query results.

E.g., the index insertion of page 2 in one backend's operations order
{ 1) Page 1: heap insertion, 2) Page 2: index insert, 3) commit }
cannot be reordered to before the heap page insertion at 1, because
that'd allow a concurrent index access after replay of 2), but before
replay of 1), to see an "all-visible" page 1 in its index scan, while
the heap tuple of the index entry wasn't even inserted yet. Index-only
scans could thus return invalid results.

See also the wiki page [0] on parallel recovery, and Koichi-san's
repository [1] with his code for parallel replay based on PG 14.6.

Kind regards,

Matthias van de Meent

[0] https://wiki.postgresql.org/wiki/Parallel_Recovery
[1] https://github.com/koichi-szk/postgres/commits/parallel_replay_14_6/




Re: PostgreSQL 17 Release Management Team & Feature Freeze

2024-04-09 Thread Andrew Dunstan



On 2024-04-08 Mo 19:26, Tom Lane wrote:

Andrew Dunstan  writes:

I quite like the triage idea. But I think there's also a case for being
more a bit more flexible with those patches we don't throw out. A case
close to my heart: I'd have been very sad if the NESTED piece of
JSON_TABLE hadn't made the cut, which it did with a few hours to spare,
and I would not have been alone, far from it. I'd have been happy to
give Amit a few more days or a week if he needed it, for a significant
headline feature.
I know there will be those who say it's the thin end of the wedge and
rulez is rulez, but this is my view.

You've certainly been around the project long enough to remember the
times in the past when we let the schedule slip for "one more big
patch".  It just about never worked out well, so I'm definitely in
favor of a hard deadline.  The trick is to control the tendency to
push in patches that are only almost-ready in order to nominally
meet the deadline.  (I don't pretend to be immune from that
temptation myself, but I think I resisted it better than some
this year.)





If we want to change how things are working I suspect we probably need 
something a lot more radical than any of the suggestions I've seen 
floating around. I don't know what that might be, but ISTM we're not 
thinking boldly enough.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Flushing large data immediately in pqcomm

2024-04-09 Thread Ranier Vilela
On Mon, 8 Apr 2024 at 09:27, Jelte Fennema-Nio wrote:

> On Sun, 7 Apr 2024 at 11:34, David Rowley  wrote:
> > That seems to require modifying the following function signatures:
> > secure_write(), be_tls_write(), be_gssapi_write().  That's not an area
> > I'm familiar with, however.
>
> Attached is a new patchset where 0003 does exactly that. The only
> place where we need to cast to non-const is for GSS, but that seems
> fine (commit message has more details).
>
+1.
Looks ok to me.
The GSS pointer *ptr is already cast to char * where it is needed,
so the code is already correct.

best regards,
Ranier Vilela


Re: WIP Incremental JSON Parser

2024-04-09 Thread Andrew Dunstan


On 2024-04-09 Tu 01:23, Michael Paquier wrote:

On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:

There is no direct check on test_json_parser_perf.c, either, only a
custom rule in the Makefile without specifying something for meson.
So it looks like you could do short execution check in a TAP test, at
least.



Not adding a test for that was deliberate - any sane test takes a while, 
and I didn't want to spend that much time on it every time someone runs 
"make check-world" or equivalent. However, adding a test to run it with 
a trivial number of iterations seems reasonable, so I'll add that. I'll 
also add a meson target for the binary.




While reading the code, I've noticed a few things, as well:

+   /* max delicious line length is less than this */
+   char    buff[6001];

Delicious applies to the contents, nothing to do with this code even
if I'd like to think that these lines of code are edible and good.
Using a hardcoded limit for some JSON input sounds like a bad idea to
me even if this is a test module.  Comment that applies to both the
perf and the incremental tools.  You could use a #define'd buffer size
for readability rather than assuming this value in many places.



The comment is a remnant of when I hadn't yet added support for 
incomplete tokens, and so I had to parse the input line by line. I agree 
it can go, and we can use a manifest constant for the buffer size.





+++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
+ * This progam tests incremental parsing of json. The input is fed into
+ * full range of incement handling, especially in the lexer, is exercised.
+++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
+ *Performancet est program for both flavors of the JSON parser
+ * This progam tests either the standard (recursive descent) JSON parser
+++ b/src/test/modules/test_json_parser/README
+  reads in a file and pases it in very small chunks (60 bytes at a time) to

Collection of typos across various files.



Will fix. (The older I get the more typos I seem to make and the harder 
it is to notice them. It's very annoying.)





+   appendStringInfoString(, "1+23 trailing junk");
What's the purpose here?  Perhaps the intention should be documented
in a comment?



The purpose is to ensure that if there is not a trailing '\0' on the 
json chunk the parser will still do the right thing. I'll add a comment 
to that effect.





At the end, having a way to generate JSON blobs randomly to test this
stuff would be more appealing than what you have currently, with
perhaps a few factors like:
- Array and simple object density.
- Max Level of nesting.
- Limitation to ASCII characters that can be escaped.
- Perhaps more things I cannot think about?



No, I disagree. Maybe we need those things as well, but we do need a 
static test where we can test the output against known results. I have 
no objection to changing the input and output files.


It's worth noting that t/002_inline.pl does generate some input and tests, 
e.g., the maximum nesting level among other errors. Perhaps you missed 
that. If you think we need more tests there, adding them would be 
extremely simple.





So the current state of things is kind of disappointing, and the size
of the data set added to the tree is not that portable either if you
want to test various scenarios depending on the data set.  It seems to
me that this has been committed too hastily and that this is not ready
for integration, even if that's just a test module.  Tom also has
shared some concerns upthread, as far as I can see.



I have posted a patch already that addresses the issue Tom raised.


cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: NumericShort vs NumericLong format

2024-04-09 Thread Ole Peder Brandtzæg
On Tue, Mar 07, 2023 at 12:15:27AM -0500, Tom Lane wrote:
> "David G. Johnston"  writes:
> > As an aside, for anyone more fluent than I who reads this, is the use of
> > the word "dynamic scale" in this code comment supposed to be "display
> > scale"?
> > https://github.com/postgres/postgres/blob/cf96907aadca454c4094819c2ecddee07eafe203/src/backend/utils/adt/numeric.c#L121
> 
> Yeah, I think you're right.

Familiarizing myself with numeric.c today, I too was confused by this.
AFAICT, it's meant to say display scale as used elsewhere in the file;
for instance, the comment for NumericShort's n_header reads "Sign +
display scale + weight". Would it be appropriate if I submitted a patch
for this? It's admittedly trivial, but I figured I should say hi before
submitting one.
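
For context, "display scale" is simply the number of fractional digits
retained for output; a quick illustration (mine, not from the thread):

-- numeric remembers the display scale, so trailing zeros survive:
SELECT 1.5::numeric AS a, 1.50::numeric AS b;
--   a   |  b
--  -----+------
--   1.5 | 1.50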

All the best,
Ole

-- 
Ole Peder Brandtzæg | En KLST/ITK-hybrid
Please don't look at me with those eyes
Please don't hint that you're capable of lies




Re: sql/json remaining issue

2024-04-09 Thread Amit Langote
Hi,

On Tue, Apr 9, 2024 at 4:47 PM jian he  wrote:
>
> hi.
> `
>  | NESTED [ PATH ] json_path_specification [ AS json_path_name ]
> COLUMNS ( json_table_column [, ...] )
> NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS
> ( json_table_column [, ...] )
> `
>  "json_path_specification" should be "path_expression"?
> --
> drop table s1;
> create or replace function random_text_1000() returns text
> as $$select string_agg(md5(random()::text),'') from
> generate_Series(1,1000) s $$ LANGUAGE SQL;
>
> create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb);
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(1_000_000, 1_000_000) g;
> insert into s1(js)
> select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g
> || ']},{"z22": [32, 204,145]}]},"c": ' || g
> || ',"id": "' || random_text_1000() || '"}')
> from generate_series(235, 235 + 20,1) g;
>
> select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1;
>  count  | pg_size_pretty
> +
>  22 | 6398 MB
> --
>
> explain(analyze, costs off,buffers, timing)
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 76))' default -1000 ON EMPTY))
> )) sub;
>
> for one jsonb, it can expand to 7 rows, the above query will return
> around 1.4 million rows.
> i use the above query, and pg_log_backend_memory_contexts in another
> session to check the memory usage.
> didn't find memory over consumed issue.
>
> but I am not sure this is the right way to check the memory consumption.
> --
> begin;
> SELECT sub.*, s.a as s_a FROM s,
> (values(23)) x(x),
> generate_series(13, 13) y,
> JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y
> COLUMNS (
> xx1 int PATH '$.c',
> NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS
> (c int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'),
> NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x"
> -2))' as z1 COLUMNS (a int PATH '$')),
> NESTED PATH '$.a.za[1]' COLUMNS
> (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int
> PATH '$ ? (@ <= ($"x" + 76))' error ON EMPTY))
> )) sub;
> rollback;
>
> only the last row will fail, because of "error ON EMPTY", (1_000_000
> <= 23 + 76) is false.
> I remember the very previous patch, because of error cleanup, it took
> a lot of resources.
> does our current implementation, only the very last row fail, will it
> be easy to clean up the transaction?

I am not sure I understand your concern.  Could you please rephrase
it?  Which previous patch are you referring to and what problem did it
cause with respect to error cleanup?

Per-row memory allocated for each successful output row JSON_TABLE()
doesn't pile up, because it's allocated in a context that is reset
after evaluating each row; see tfuncLoadRows().  But again I may be
misunderstanding your concern.

> the last query error message is:
> `
> ERROR:  no SQL/JSON item
> `
>
> we are in ExecEvalJsonExprPath, can we output it to be:
> `
> ERROR:  after applying json_path "5s", no SQL/JSON item found
> `
> in a json_table query, we can have multiple path_expressions, like the
> above query.
> it's not easy to know applying which path_expression failed.

Hmm, I'm not so sure about mentioning the details of the path, because
path names are optional and printing the path expression itself is not a
good idea.  Perhaps we could mention the column name, which would
always be there, but we'd then need to add a new field column_name
that's optionally set in JsonFuncExpr and JsonExpr, that is, when they
are being set up for JSON_TABLE() columns.  As shown in the attached.
With the patch you'll get:

ERROR:  no SQL/JSON item found for column "b"

--
Thanks, Amit Langote


v1-0001-JSON_TABLE-mention-column-name-in-the-ON-EMPTY-er.patch
Description: Binary data


Re: post-freeze damage control

2024-04-09 Thread Tomas Vondra
On 4/9/24 01:33, Michael Paquier wrote:
> On Tue, Apr 09, 2024 at 01:16:02AM +0200, Tomas Vondra wrote:
>> I don't feel too particularly worried about this. Yes, backups are super
>> important because it's often the only thing you have left when things go
>> wrong, and the incremental aspect is all new. The code I've seen while
>> doing the CoW-related patches seemed very precise and careful, and the
>> one bug we found & fixed does not make it bad.
>>
>> Sure, I can't rule there being more bugs, but I've been doing some
>> pretty extensive stress testing of this (doing incremental backups +
>> combinebackup, and comparing the results against the source, and that
>> sort of stuff). And so far only that single bug this way. I'm still
>> doing this randomized stress testing, with more and more complex
>> workloads etc. and I'll let keep doing that for a while.
>>
>> Maybe I'm a bit too happy-go-lucky, but IMO the risk here is limited.
> 
> Even if there's a critical bug, there are still other ways to take
> backups, so there is an exit route even if a problem is found and even
> if this problem requires a complex solution to be able to work
> correctly.
> 

I think it's a bit more nuanced, because it's about backups/restore. The
bug might be subtle, and you won't learn about it until the moment when
you need to restore (or perhaps even long after that). At which point
"You might have taken the backup in some other way." is not really a
viable exit route.

Anyway, I'm still not worried about this particular feature, and I'll
keep doing the stress testing.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Parallel Recovery in PostgreSQL

2024-04-09 Thread 范润泽
Hi, Hackers
I have observed that there has been a paucity of discussion concerning the 
parallel replay of WAL logs. 
The most recent discourse on this subject was during the PGCon event in 2023, 
where it was noted that PostgreSQL utilizes a single process for WAL replay. 
However, when configuring primary-secondary replication, there exists a 
tangible scenario where the primary accumulates an excessive backlog that the 
secondary cannot replay promptly.
This situation prompts me to question whether it is pertinent to consider 
integrating a parallel replay feature. 
Such a functionality could potentially mitigate the risk of delayed WAL 
application on replicas and enhance overall system resilience and performance.
I am keen to hear your thoughts on this issue and whether you share the view 
that parallel WAL replay is a necessity that we should collaboratively explore 
further.
Thank you for your attention to this matter.
Best regards,
David Fan

Re: Add trim_trailing_whitespace to editorconfig file

2024-04-09 Thread Jelte Fennema-Nio
On Thu, 4 Apr 2024 at 16:58, Jelte Fennema-Nio  wrote:
> > ISTM that with a small shell script, .editorconfig could be generated
> > from .gitattributes?
>
> Honestly, I don't think building such automation is worth the effort.

Okay, I spent the time to add a script to generate the editorconfig
based on .gitattributes after all. So attached is a patch that adds
that.


v5-0001-Add-script-to-keep-.editorconfig-in-sync-with-.gi.patch
Description: Binary data


RE: Is this a problem in GenericXLogFinish()?

2024-04-09 Thread Hayato Kuroda (Fujitsu)
Dear Michael,

> On Fri, Apr 05, 2024 at 06:22:58AM +, Hayato Kuroda (Fujitsu) wrote:
> > Thanks for pointing out. Yes, we fixed a behavior by the commit aa5edbe37,
> > but we missed the redo case. I made a fix patch based on the suggestion [1].
> 
> +   boolmod_buf = false;
> 
> Perhaps you could name that mod_wbuf, to be consistent with the WAL
> insert path.

Right, fixed.

> I'm slightly annoyed by the fact that there is no check on
> is_prim_bucket_same_wrt for buffer 1 in the BLK_NEEDS_REDO case to
> show the symmetry between the insert and replay paths.  Shouldn't
> there be at least an assert for that in the branch when there are no
> tuples (imagine an else to cover xldata->ntups == 0)?  I mean between
> just before updating the hash page's opaque area when
> is_prev_bucket_same_wrt.

Indeed, I added an Assert() in the else part. Is that what you expected?

> I've been thinking about ways to make such cases detectable in an
> automated fashion.  The best choice would be
> verifyBackupPageConsistency(), just after RestoreBlockImage() on the
> copy of the block image stored in WAL before applying the page masking
> which would mask the LSN.  A problem, unfortunately, is that we would
> need to transfer the knowledge of REGBUF_NO_CHANGE as a new BKPIMAGE_*
> flag so we would be able to track if the block rebuilt from the record
> has the *same* LSN as the copy used for the consistency check.  So
> this edge consistency case would come at a cost, I am afraid, and the
> 8 bits of bimg_info are precious :/

I could not convince myself that the change has more benefit than cost, so I
have done nothing about it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



v2_add_modbuf_flag.diff
Description: v2_add_modbuf_flag.diff


Re: PostgreSQL 17 Release Management Team & Feature Freeze

2024-04-09 Thread Tomas Vondra
On 4/9/24 11:25, Matthias van de Meent wrote:
> On Mon, 8 Apr 2024 at 20:15, Tomas Vondra wrote:
>>
>>
>> On 4/8/24 17:48, Matthias van de Meent wrote:
>>> On Mon, 8 Apr 2024 at 17:21, Tomas Vondra  
>>> wrote:

 Maybe it'd be better to start by expanding the existing rule about not
 committing patches introduced for the first time in the last CF.
>>>
>>> I don't think adding more hurdles about what to include into the next
>>> release is a good solution. Why the March CF, and not earlier? Or
>>> later? How about unregistered patches? Changes to the docs? Bug fixes?
>>> The March CF already has a submission deadline of "before march", so
>>> that already puts a soft limit on the patches considered for the april
>>> deadline.
>>>
 What if
 we said that patches in the last CF must not go through significant
 changes, and if they do it'd mean the patch is moved to the next CF?
>>>
>>> I also think there is already a big issue with a lack of interest in
>>> getting existing patches from non-committers committed, reducing the
>>> set of patches that could be considered is just cheating the numbers
>>> and discouraging people from contributing. For one, I know I have
>>> motivation issues keeping up with reviewing other people's patches
>>> when none (well, few, as of this CF) of my patches get reviewed
>>> materially and committed. I don't see how shrinking the window of
>>> opportunity for significant review from 9 to 7 months is going to help
>>> there.
>>>
>>
>> I 100% understand how frustrating the lack of progress can be, and I
>> agree we need to do better. I tried to move a number of stuck patches
>> this CF, and I hope (and plan) to do more of this in the future.
> 
> Thanks in advance.
> 
>> But I don't quite see how would this rule modification change the
>> situation for non-committers. AFAIK we already have the rule that
>> (complex) patches should not appear in the last CF for the first time,
>> and I'd argue that a substantial rework of a complex patch is not that
>> far from a completely new patch. Sure, the reworks may be based on a
>> thorough review, so there's a lot of nuance. But still, is it possible
>> to properly review if it gets reworked at the very end of the CF?
> 
> Possible? Probably, if there is a good shared understanding of why the
> previous patch version's approach didn't work well, and if the next
> approach is well understood as well. But it's not likely, that I'll
> agree with.
> 
> But the main issue I have with your suggestion is that the March
> commitfest effectively contains all new patches starting from January,
> and the leftovers not committed by February. If we start banning all
> new patches and those with significant reworks from the March
> commitfest, then combined with the lack of CF in May there wouldn't be
> any chance for new patches in the first half of the year, and an
> effective block on rewrites for 6 months- the next CF is only in July.
> Sure, there is a bit of leeway there, as some patches get committed
> before the commitfest they're registered in is marked active, but our
> development workflow is already quite hostile to incidental
> contributor's patches [^0], and increasing the periods in which
> authors shouldn't expect their patches to be reviewed due to a major
> release that's planned in >5 months is probably not going to help the
> case.
> 

But I don't think I suggested banning such patches from the March CF
entirely. Surely those patches can still be submitted, reviewed, and
even reworked in the last CF. All I said is that it might be better to treat
those patches as not committable by default. Sorry if that wasn't clear
enough ...

Would this be an additional restriction? I'm not quite sure. Surely if
the intent is to only commit patches that we agree are in "sufficiently
good" shape, and getting them into such shape requires time & reviews,
this would not be a significant change.

FWIW I'm not a huge fan of hard unbreakable rules, so there should be
some leeway when justified, but the bar would be somewhat higher (clear
consensus, RMT having a chance to say no, ...).

>>> So, I think that'd be counter-productive, as this would get the
>>> perverse incentive to band-aid over (architectural) issues to limit
>>> churn inside the patch, rather than fix those issues in a way that's
>>> appropriate for the project as a whole.
>>>
>>
>> Surely those architectural shortcomings should be identified in a review
>> - which however requires time to do properly, and thus is an argument
>> for ensuring there's enough time for such review (which is in direct
>> conflict with the last-minute crush, IMO).
>>
>> Once again, I 100% agree we need to do better in handling patches from
>> non-committers, absolutely no argument there. But does it require this
>> last-minute crush? I don't think so, it can't be at the expense of
>> proper review and getting it right.
> 
> Agreed on this, 100%, but as mentioned above, the March commitfest

Re: broken JIT support on Fedora 40

2024-04-09 Thread Dmitry Dolgov
> On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote:
> On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > > Yep, I think this is it. I've spent some hours trying to understand why
> > > > suddenly deform function has noundef ret attribute, when it shouldn't --
> > > > this explains it and the proposed change fixes the crash. One thing that
> > > > is still not clear to me though is why this copied attribute doesn't
> > > > show up in the bitcode dumped right before running inline pass (I've
> > > > added this to troubleshoot the issue).
> > >
> > > One thing to consider in this context is indeed adding "verify" pass as
> > > suggested in the PR, at least for the debugging configuration. Without 
> > > the fix
> > > it immediately returns:
> > >
> > >   Running analysis: VerifierAnalysis on deform_0_1
> > >   Attribute 'noundef' applied to incompatible type!
> > >
> > >   llvm error: Broken function found, compilation aborted!
> >
> > Here is what I have in mind. Interestingly enough, it also shows few
> > more errors besides "noundef":
> >
> > Intrinsic name not mangled correctly for type arguments! Should be: 
> > llvm.lifetime.end.p0
> > ptr @llvm.lifetime.end.p0i8
> >
> > It refers to the function from create_LifetimeEnd, not sure how
> > important is this.
>
> Would it be too slow to run the verify pass always, in assertion
> builds?  Here's a patch for the original issue, and a patch to try
> that idea + a fix for that other complaint it spits out.  The latter
> would only run for LLVM 17+, but that seems OK.

Sounds like a good idea. As for the overhead, I've done a quick test on the
reproducer at hand, doing explain analyze in a tight loop and fetching the
"optimization" timings. It shows a quite visible difference: 96ms p99 with
verify vs 46ms p99 without (and a rather low stddev, ~1.5ms). But I can
imagine it's acceptable for a build with assertions.
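
For reference, a minimal way to collect such timings (assuming an
assert-enabled build configured --with-llvm; the query itself is just an
example):

SET jit = on;
SET jit_above_cost = 0;
SET jit_optimize_above_cost = 0;
EXPLAIN (ANALYZE) SELECT sum(g) FROM generate_series(1, 1000000) g;

The JIT footer of the EXPLAIN output carries the "Optimization" number
quoted above, along the lines of:

JIT:
  Functions: 3
  Timing: Generation 0.6 ms, Inlining 2.0 ms, Optimization 46.0 ms, ...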

Btw, I've found there is a C API exposed for this, which produces the same
warnings for me. Maybe it would be even better this way:

/**
 * Toggle adding the VerifierPass for the PassBuilder, ensuring all
 * functions inside the module is valid.
 */
void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef Options,
                                         LLVMBool VerifyEach);


+   /* In assertion builds, run the LLVM verify pass. */
+#ifdef USE_ASSERT_CHECKING
+   LLVMPassBuilderOptionsSetVerifyEach(options, true);
+#endif





Incorrect handling of IS [NOT] NULL quals on inheritance parents

2024-04-09 Thread Richard Guo
In b262ad440e we introduced an optimization that drops IS NOT NULL quals
on a NOT NULL column, and reduces IS NULL quals on a NOT NULL column to
constant-FALSE.  I happened to notice that this is not working correctly
for traditional inheritance parents.  Traditional inheritance parents
might have NOT NULL constraints marked NO INHERIT, while their child
tables do not have NOT NULL constraints.  In such a case, we would have
problems when we have removed redundant IS NOT NULL restriction clauses
of the parent rel, as this could cause NULL values from child tables to
not be filtered out, or when we have reduced IS NULL restriction clauses
of the parent rel to constant-FALSE, as this could cause NULL values
from child tables to not be selected.  As an example, consider

create table p (a int);
create table c () inherits (p);

alter table only p alter a set not null;

insert into c values (null);

-- The IS NOT NULL qual is dropped, causing the NULL value from 'c' to
-- not be filtered out
explain (costs off) select * from p where a is not null;
   QUERY PLAN
-------------------------
 Append
   ->  Seq Scan on p p_1
   ->  Seq Scan on c p_2
(3 rows)

select * from p where a is not null;
 a
---

(1 row)

-- The IS NULL qual is reduced to constant-FALSE, causing the NULL value
-- from 'c' to not be selected out
explain (costs off) select * from p where a is null;
QUERY PLAN
--------------------------
 Result
   One-Time Filter: false
(2 rows)

select * from p where a is null;
 a
---
(0 rows)


To fix this issue, I think we can avoid calculating notnullattnums for
inheritance parents in get_relation_info().  Meanwhile, when we populate
childrel's base restriction quals from parent rel's quals, we check if
each qual can be proven always false/true, to apply the optimization we
have in b262ad440e to each child.  Something like attached.

This can also be beneficial to partitioned tables in cases where the
parent table does not have NOT NULL constraints, while some of its child
tables do.  Previously, the optimization introduced in b262ad440e was
not applicable in this case.  With this change, the optimization can now
be applied to each child table that has the right NOT NULL constraints.
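
For instance, a sketch of that partitioned-table case (table names
invented for the example):

create table pp (a int) partition by range (a);
create table pp1 partition of pp for values from (0) to (100);
create table pp2 partition of pp for values from (100) to (200);
alter table pp1 alter a set not null;

-- With the patch, the IS NOT NULL qual can be dropped for the scan of
-- pp1, which has the constraint, while still being applied to pp2.
explain (costs off) select * from pp where a is not null;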

Thoughts?

Thanks
Richard


v1-0001-Fix-handling-of-IS-NOT-NULL-quals-on-inheritance-parents.patch
Description: Binary data


Re: Add notes to pg_combinebackup docs

2024-04-09 Thread Tomas Vondra



On 4/9/24 09:59, Martín Marqués wrote:
> Hello,
> 
> While doing some work/research on the new incremental backup feature
> some limitations were not listed in the docs. Mainly the fact that
> pg_combinebackup works with plain format and not tar.
> 

Right. The docs mostly imply this by talking about output directory and
backup directories, but making it more explicit would not hurt.

FWIW it'd be great if we could make incremental backups work with the tar
format in the future too. People probably don't want to keep the expanded
data directory around, and extracting everything before combining the
backups is not very convenient. Reading and writing the tar directly would
make this simpler.

> Around the same time, Tomas Vondra tested incremental backups with a
> cluster where he enabled checksums after taking the previous full
> backup. After combining the backups the synthetic backup had pages
> with checksums and other pages without checksums which ended in
> checksum errors.
> 

I'm not sure just documenting this limitation is sufficient. We can't
make the incremental backups work in this case (it's as if someone
messed with the cluster without writing anything into WAL), but I think we
should do better than silently producing (seemingly) corrupted backups.

I say seemingly, because the backup is actually fine, the only problem
is it has checksums enabled in the controlfile, but the pages from the
full backup (and the early incremental backups) have no checksums.

What we could do is detect this in pg_combinebackup, and either disable
checksums with a warning and a hint that they may be enabled again, or
just print that the user needs to disable them.
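
For instance (an assumed sequence, purely to illustrate the mismatch):

-- inside the restored cluster, the setting reflects the control file,
-- not the actual page contents:
SHOW data_checksums;   -- reports "on"

-- running "pg_checksums --disable" offline (or "--enable", which
-- rewrites every page) would bring the control file and the pages back
-- in line.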

I was thinking maybe we could detect this while taking the backups, and
force taking a full backup if checksums got enabled since the last
backup. But we can't do that because we only have the manifest from the
last backup, and the manifest does not include info about checksums.

It's a bit unfortunate we don't have a way to enable checksums online.
That'd not have this problem IIRC, because it writes proper WAL. Maybe
it's time to revive that idea ... I recall there were some concerns
about tracking progress to allow resuming, but not having the feature at
all just because in some (rare?) cases it'd need to do more work does
not seem like a great trade-off.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



