Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Pavel Borisov
Hi, Tom!

On Fri, 10 May 2024, 04:43 Tom Lane,  wrote:

> Alexander Korotkov  writes:
> > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > going to push it if no objections.
>
> Is this really suitable material to be pushing post-feature-freeze?
> It doesn't look like it's fixing any new-in-v17 issues.
>
> regards, tom lane
>

I think these patches are nice-to-have optimizations and refactorings that
make the code look better. They are not necessary for the main feature, and
they don't fix any bugs. But they were requested in the thread, and they
make sense in my opinion.

I really don't know what the policy is for applying code improvements other
than bugfixes post feature-freeze. IMO they are safe to apply to v17,
but they could also be added later.

Regards,
Pavel Borisov
Supabase

>


Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Andy Fan


Tomas Vondra  writes:

>> I guess both of you are talking about worker process, if here are
>> something in my mind:
>> 
>> *btbuild* also let the WORKER dump the tuples into Sharedsort struct
>> and let the LEADER merge them directly. I think this aim of this design
>> is it is potential to save a mergeruns. In the current patch, worker dump
>> to local tuplesort and mergeruns it and then leader run the merges
>> again. I admit the goal of this patch is reasonable, but I'm feeling we
>> need to adapt this way conditionally somehow. and if we find the way, we
>> can apply it to btbuild as well. 
>> 
>
> I'm a bit confused about what you're proposing here, or how is that
> related to what this patch is doing and/or to the what Matthias
> mentioned in his e-mail from last week.
>
> Let me explain the relevant part of the patch, and how I understand the
> improvement suggested by Matthias. The patch does the work in three phases:

What's in my mind is:

1. WORKER-1

Tempfile 1:

key1:  1
key3:  2
...

Tempfile 2:

key5:  3
key7:  4
...

2. WORKER-2

Tempfile 1:

Key2: 1
Key6: 2
...

Tempfile 2:
Key3: 3
Key6: 4
..

In the above example, if we do the merge in the LEADER, only 1 merge run
is needed: reading 4 tempfiles (8 tuples in total) and writing 8 tuples.

If we add another merge run in the WORKER, the result will be:

WORKER1:  reads 2 tempfiles (4 tuples) and writes 1 tempfile (called X) of
4 tuples.
WORKER2:  reads 2 tempfiles (4 tuples) and writes 1 tempfile (called Y) of
4 tuples.

LEADER:  reads 2 tempfiles (X & Y) containing 8 tuples and writes them
into the final tempfile.

So the intermediate results X & Y require some extra effort. Hence I think
the "extra merge run in worker" is *not always* a win, and my proposal is
that we need to distinguish the cases in which we should add the
"extra merge run in worker" step.

> The trouble with (2) is that it "just copies" data from one tuplesort
> into another, increasing the disk space requirements. In an extreme
> case, when nothing can be combined, it pretty much doubles the amount of
> disk space, and makes the build longer.

This sounds like the same question as I talked about above. However, my
proposal is to work out which cost is bigger: "the cost saved by merging
TIDs in the WORKERS" or "the cost paid for the extra copy", and then do
it only when we are sure we can benefit from it; but I know that is hard
and I'm not sure it is doable.

> What I think Matthias is suggesting, is that this "TID list merging"
> could be done directly as part of the tuplesort in step (1). So instead
> of just moving the "sort tuples" from the appropriate runs, it could
> also do an optional step of combining the tuples and writing this
> combined tuple into the tuplesort result (for that worker).

OK, I get it now. So we have talked about lots of merges so far, at
different stages and for different sets of tuples:

1. The "GIN deform buffer" merges the TIDs for the same key among the
tuples in one "deform buffer" batch, as the current master does.

2. At the "in-memory buffer sort" stage there is currently no TID merge,
and Matthias suggests adding one.

3. Merging the TIDs for the same keys in the LEADER vs. in the WORKER
first and then in the LEADER: this is what your 0002 commit does now,
and I raised some concerns about it above.

> Matthias also mentioned this might be useful when building btree indexes
> with key deduplication.

> AFAICS this might work, although it probably requires for the "combined"
> tuple to be smaller than the sum of the combined tuples (in order to fit
> into the space). But at least in the GIN build in the workers this is
> likely true, because the TID lists do not overlap (and thus not hurting
> the compressibility).
>
> That being said, I still see this more as an optimization than something
> required for the patch,

If the GIN deform buffer is big enough (say, greater than the in-memory
buffer sort), would we see any gain from this, since the scope is the
tuples in the in-memory buffer sort?

> and I don't think I'll have time to work on this
> anytime soon. The patch is not extremely complex, but it's not trivial
> either. But if someone wants to take a stab at extending tuplesort to
> allow this, I won't object ...

Agreed. I am more interested in understanding the whole design and the
scope of fixes in this patch, and then I can do some code review and
testing; for now, I am still in the "understanding design and scope"
stage. If I'm too slow with this patch, please feel free to commit it
any time, and I don't expect to find any valuable improvements or bugs.
I probably need another 1 ~ 2 weeks to study this patch.

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-10 Thread Bharath Rupireddy
On Thu, May 9, 2024 at 9:34 AM Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>
> It will be improved until the final release.  The item count is 188,
> which is similar to recent releases:
>
> release-10:  189
> release-11:  170
> release-12:  180
> release-13:  178
> release-14:  220
> release-15:  184
> release-16:  206
> release-17:  188
>
> I welcome feedback.  For some reason it was an easier job than usual.

Thanks a lot for this work, Bruce! It looks like commit
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=91f2cae7a4e664e9c0472b364c7db29d755ab151
is missing from the draft release notes. Just curious whether that's
intentional or an oversight.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Alexander Korotkov
On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > going to push it if no objections.
>
> Is this really suitable material to be pushing post-feature-freeze?
> It doesn't look like it's fixing any new-in-v17 issues.

These are code improvements to commit 5ae2087202, which address criticism
raised in the thread.  0001 comprises an optimization, but it's rather
small and simple.  0002 and 0003 contain refactoring.  0004 contains
better error reporting.  To me this looks pretty similar to what others
commit post-FF, doesn't it?

--
Regards,
Alexander Korotkov
Supabase




Use pgBufferUsage for block reporting in analyze

2024-05-10 Thread Anthonin Bonnefoy
Hi,

The ANALYZE logging within autovacuum uses dedicated variables,
VacuumPage{Hit,Miss,Dirty}, to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more cases
(temporary tables, parallel workers, ...).

Those variables were only used in two places: block usage reporting in
verbose VACUUM and ANALYZE. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the VACUUM command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in ANALYZE. This makes VacuumPage{Hit,Miss,Dirty} unused and removable;
the patch removes both their updates in bufmgr and their declarations.

Regards,
Anthonin


v1-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


Re: gcc 12.1.0 warning

2024-05-10 Thread Nazir Bilal Yavuz
Hi,

On Tue, 23 Apr 2024 at 19:59, Andres Freund  wrote:
>
>
> Which seems entirely legitimate. ISTM that guc_var_compare() ought to only
> cast the pointers to the key type, i.e. char *.  And incidentally that does
> prevent the warning.
>
> The reason it doesn't happen in newer versions of postgres is that we aren't
> using guc_var_compare() in the relevant places anymore...

The fix is attached. It applies cleanly from REL_15_STABLE down to
REL_12_STABLE, fixes the warnings, and the tests pass.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
From 588c99f5c402fc41414702133636e5a51f9e3470 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 10 May 2024 10:43:45 +0300
Subject: [PATCH v1] Fix -Warray-bounds warning in guc_var_compare() function

There are a couple of places where the guc_var_compare() function takes a
'const char ***' argument and then casts it to the
'const struct config_generic *' type. This triggers a '-Warray-bounds'
warning. So, instead cast the arguments to the 'const char *' key type.

Author: Nazir Bilal Yavuz 
Reported-by: Erik Rijkers 
Suggested-by: Andres Freund 
Discussion: https://postgr.es/m/a74a1a0d-0fd2-3649-5224-4f754e8f91aa%40xs4all.nl
---
 src/backend/utils/misc/guc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c410ba532d2..eec97dea659 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -5721,10 +5721,10 @@ find_option(const char *name, bool create_placeholders, bool skip_errors,
 static int
 guc_var_compare(const void *a, const void *b)
 {
-	const struct config_generic *confa = *(struct config_generic *const *) a;
-	const struct config_generic *confb = *(struct config_generic *const *) b;
+	const char *confa_name = **(char **const *) a;
+	const char *confb_name = **(char **const *) b;
 
-	return guc_name_compare(confa->name, confb->name);
+	return guc_name_compare(confa_name, confb_name);
 }
 
 /*
-- 
2.43.0



Re: consider -Wmissing-variable-declarations

2024-05-10 Thread Heikki Linnakangas

On 09/05/2024 12:23, Peter Eisentraut wrote:

In [0] I had noticed that we have no automated verification that global
variables are declared in header files.  (For global functions, we have
this through -Wmissing-prototypes.)  As I mentioned there, I discovered
the Clang compiler option -Wmissing-variable-declarations, which does
exactly that.  Clang has supported this for quite some time, and GCC 14,
which was released a few days ago, now also supports it.  I went and
installed this option into the standard build flags and cleaned up the
warnings it found, which revealed a number of interesting things.


Nice! More checks like this are good in general.


Attached are patches organized by sub-topic.  The most dubious stuff is
in patches 0006 and 0007.  A bunch of GUC-related variables are not in
header files but are pulled in via ad-hoc extern declarations.  I can't
recognize an intentional scheme there, probably just done for
convenience or copied from previous practice.  These should be organized
into appropriate header files.


+1 for moving all these to header files. Also all the "other stuff" in 
patch 0007.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Nazir Bilal Yavuz
Hi,

Thank you for working on this!

On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:
>
> Thank you for further testing! I've pushed the patch.

I noticed a behaviour change while looking at the 'Use pgBufferUsage for
block reporting in analyze' thread [1]. Since that change applies here
as well, I thought it better to mention it here.

Before this commit, VacuumPageMiss did not count blocks whose read had
already been completed by other backends [2]. Now,
'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
counts every block we attempt to read, possibly double counting if
someone else has already completed the read. I do not know which
behaviour is correct, but I wanted to mention it.

[1] 
https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com

[2] In the WaitReadBuffers() function, see comment:
/*
 * Skip this block if someone else has already completed it.  If an
 * I/O is already in progress in another backend, this will wait for
 * the outcome: either done, or something went wrong and we will
 * retry.
 */

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Use pgBufferUsage for block reporting in analyze

2024-05-10 Thread Michael Paquier
On Fri, May 10, 2024 at 10:54:07AM +0200, Anthonin Bonnefoy wrote:
> This patch replaces those Vacuum specific variables by pgBufferUsage
> in analyze. This makes VacuumPage{Hit,Miss,Dirty} unused and removable.
> This commit removes both their calls in bufmgr and their declarations.

Hmm, yeah, it looks like you're right.  I can track all the blocks
read, hit and dirtied for VACUUM and ANALYZE in all the code paths
where these removed variables were incremented.  This needs some
runtime checks to make sure that the calculations are consistent before
and after the change (cannot do that now).

 appendStringInfo(&buf, _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
- (long long) AnalyzePageHit,
- (long long) AnalyzePageMiss,
- (long long) AnalyzePageDirty);
+ (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+ (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
+ (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));

Perhaps this should say "read" rather than "miss" in the logs, since the
two read variables for the shared and local blocks are used?  For
consistency, at least.

That's not material for v17, only for v18.
--
Michael


signature.asc
Description: PGP signature


Re: open items

2024-05-10 Thread Alvaro Herrera
On 2024-May-09, Robert Haas wrote:

> * not null constraints break dump/restore. I asked whether all of the
> issues had been addressed here and Justin Pryzby opined that the only
> thing that was still relevant for this release was a possible test
> case change, which I would personally consider a good enough reason to
> at least consider calling this done. But it's not clear to me whether
> Justin's opinion is the consensus position, or perhaps more
> relevantly, whether it's Álvaro's position.

I have fixed the reported issues, so as far as these specific items go,
we could close the reported open item.

However, in doing so I realized that some code is more complex than it
needs to be, and exposes users to ugliness that they don't need to see,
so I posted additional patches.  I intend to get these committed today.

A possible complaint is that the upgrade mechanics which are mostly in
pg_dump with some pieces in pg_upgrade are not very explicitly
documented.  There are already comments in all relevant places, but
perhaps an overall picture is necessary.  I'll see about this, probably
as a long comment somewhere.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)




Re: Parallel CREATE INDEX for GIN indexes

2024-05-10 Thread Tomas Vondra
On 5/10/24 07:53, Andy Fan wrote:
> 
> Tomas Vondra  writes:
> 
>>> I guess both of you are talking about worker process, if here are
>>> something in my mind:
>>>
>>> *btbuild* also let the WORKER dump the tuples into Sharedsort struct
>>> and let the LEADER merge them directly. I think this aim of this design
>>> is it is potential to save a mergeruns. In the current patch, worker dump
>>> to local tuplesort and mergeruns it and then leader run the merges
>>> again. I admit the goal of this patch is reasonable, but I'm feeling we
>>> need to adapt this way conditionally somehow. and if we find the way, we
>>> can apply it to btbuild as well. 
>>>
>>
>> I'm a bit confused about what you're proposing here, or how is that
>> related to what this patch is doing and/or to the what Matthias
>> mentioned in his e-mail from last week.
>>
>> Let me explain the relevant part of the patch, and how I understand the
>> improvement suggested by Matthias. The patch does the work in three phases:
> 
> What's in my mind is:
> 
> 1. WORKER-1
> 
> Tempfile 1:
> 
> key1:  1
> key3:  2
> ...
> 
> Tempfile 2:
> 
> key5:  3
> key7:  4
> ...
> 
> 2. WORKER-2
> 
> Tempfile 1:
> 
> Key2: 1
> Key6: 2
> ...
> 
> Tempfile 2:
> Key3: 3
> Key6: 4
> ..
> 
> In the above example:  if we do the the merge in LEADER, only 1 mergerun
> is needed. reading 4 tempfile 8 tuples in total and write 8 tuples.
> 
> If we adds another mergerun into WORKER, the result will be:
> 
> WORKER1:  reading 2 tempfile 4 tuples and write 1 tempfile (called X) 4
> tuples. 
> WORKER2:  reading 2 tempfile 4 tuples and write 1 tempfile (called Y) 4
> tuples. 
> 
> LEADER:  reading 2 tempfiles (X & Y) including 8 tuples and write it
> into final tempfile.
> 
> So the intermedia result X & Y requires some extra effort.  so I think
> the "extra mergerun in worker" is *not always* a win, and my proposal is
> if we need to distinguish the cases in which one we should add the
> "extra mergerun in worker" step.
> 

The thing you're forgetting is that the mergesort in the worker is
*always* a simple append, because the lists are guaranteed to be
non-overlapping, so it's very cheap. The lists from different workers
are however very likely to overlap, and hence a "full" mergesort is
needed, which is way more expensive.

And not only that - without the intermediate merge, there will be very
many of those lists the leader would have to merge.

If we do the append-only merges in the workers first, we still need to
merge them in the leader, of course, but we have few lists to merge
(only about one per worker).

Of course, this means extra I/O on the intermediate tuplesort, and it's
not difficult to imagine cases with no benefit, or perhaps even a
regression. For example, if the keys are unique, the in-worker merge
step can't really do anything. But that seems quite unlikely IMHO.

Also, if this overhead was really significant, we would not see the nice
speedups I measured during testing.

>> The trouble with (2) is that it "just copies" data from one tuplesort
>> into another, increasing the disk space requirements. In an extreme
>> case, when nothing can be combined, it pretty much doubles the amount of
>> disk space, and makes the build longer.
> 
> This sounds like the same question as I talk above, However my proposal
> is to distinguish which cost is bigger between "the cost saving from
> merging TIDs in WORKERS" and "cost paid because of the extra copy",
> then we do that only when we are sure we can benefits from it, but I
> know it is hard and not sure if it is doable. 
> 

Yeah. I'm not against picking the right execution strategy during the
index build, but it's going to be difficult, because we really don't
have the information to make a reliable decision.

We can't even use the per-column stats, because they don't say much about
the keys extracted by GIN, I think. And we need to make the decision at
the very beginning, before we write the first batch of data either to
the local or shared tuplesort.

But maybe we could wait until we need to flush the first batch of data
(in the callback), and make the decision then? In principle, if we only
flush once at the end, the intermediate sort is not needed at all (fairly
unlikely for large data sets, though).

Well, in principle, maybe we could even start writing into the local
tuplesort, and then "rethink" after a while and switch to the shared
one. We'd still need to copy data we've already written to the local
tuplesort, but hopefully that'd be just a fraction compared to doing
that for the whole table.


>> What I think Matthias is suggesting, is that this "TID list merging"
>> could be done directly as part of the tuplesort in step (1). So instead
>> of just moving the "sort tuples" from the appropriate runs, it could
>> also do an optional step of combining the tuples and writing this
>> combined tuple into the tuplesort result (for that worker).
> 
> OK, I get it now. So we talked about lots of merge so far at different

Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Alena Rybakina
Hi! I could try to check it with a test, but I want to ask you about the
details, because I'm not sure that I completely understand the test case.

You mean that we need two backends, and on one of them we delete the
tuples before vacuum is called by the other, right?


On 10.05.2024 13:25, Nazir Bilal Yavuz wrote:

Hi,

Thank you for working on this!

On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:

Thank you for further testing! I've pushed the patch.

I realized a behaviour change while looking at 'Use pgBufferUsage for
block reporting in analyze' thread [1]. Since that change applies here
as well, I thought it is better to mention it here.

Before this commit, VacuumPageMiss did not count the blocks if its
read was already completed by other backends [2]. Now,
'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
counts every block attempted to be read; possibly double counting if
someone else has already completed the read. I do not know which
behaviour is correct but I wanted to mention this.

[1] 
https://postgr.es/m/CAO6_Xqr__kTTCLkftqS0qSCm-J7_xbRG3Ge2rWhucxQJMJhcRA%40mail.gmail.com

[2] In the WaitReadBuffers() function, see comment:
 /*
  * Skip this block if someone else has already completed it.  If an
  * I/O is already in progress in another backend, this will wait for
  * the outcome: either done, or something went wrong and we will
  * retry.
  */


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

2024-05-10 Thread Alvaro Herrera
On 2024-May-09, Michael Paquier wrote:

> Fix overread in JSON parsing errors for incomplete byte sequences

I'm getting this error in the new test:

t/002_inline.pl  1/? 
#   Failed test 'incomplete UTF-8 sequence, chunk size 3: correct error output'
#   at t/002_inline.pl line 134.
#   'Escape sequence "\�1+2" is invalid.'
# doesn't match '(?^:(Token|Escape sequence) ""?\\\x{F5}" is invalid)'
# Looks like you failed 1 test of 850.

Not sure what's going on here, or why it fails for me while the
buildfarm is all happy.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Pavel Borisov
Hi, Alexander!

On Fri, 10 May 2024 at 12:39, Alexander Korotkov 
wrote:

> On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > > going to push it if no objections.
> >
> > Is this really suitable material to be pushing post-feature-freeze?
> > It doesn't look like it's fixing any new-in-v17 issues.
>
> These are code improvements to the 5ae2087202, which answer critics in
> the thread.  0001 comprises an optimization, but it's rather small and
> simple.  0002 and 0003 contain refactoring.  0004 contains better
> error reporting.  For me this looks like pretty similar to what others
> commit post-FF, isn't it?
>
I've re-checked the v2 patches. The differences from v1 are improved
naming, pgindent runs, and commit messages.
In 0002 the order of variables in struct BtreeLastVisibleEntry changed;
this doesn't change code behavior.

Patch v2-0003 doesn't contain credits and a discussion link; all the other
patches do.

Overall, the patches contain a small performance optimization (0001), code
refactoring, and error reporting changes. IMO they could be pushed post-FF.

Regards,
Pavel.


Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

2024-05-10 Thread Alvaro Herrera
On 2024-May-10, Alvaro Herrera wrote:

> Not sure what's going on here, or why it fails for me while the
> buildfarm is all happy.

Ah, I ran 'git clean -dfx' and now it works correctly.  I must have had
an incomplete rebuild.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"(Andrew Morton)




Re: open items

2024-05-10 Thread Melanie Plageman
On Thu, May 9, 2024 at 3:28 PM Robert Haas  wrote:
>
> Just a few reminders about the open items list at
> https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items --
>
> * Incorrect Assert in heap_end/rescan for BHS. Either the description
> of this item is inaccurate, or we've been unable to fix an incorrect
> assert after more than a month. I interpret
> https://www.postgresql.org/message-id/54858BA1-084E-4F7D-B2D1-D15505E512FF%40yesql.se
> as a vote in favor of committing some patch by Melanie to fix this.
> Either Tomas should commit that patch, or Melanie should commit that
> patch, or somebody should say why that patch shouldn't be committed,
> or someone should request more help determining whether that patch is
> indeed the correct fix, or something. But let's not just sit on this.

Sorry, yes, the trivial fix has been done for a while. There is one piece
of outstanding feedback on the patch: an update to one of the comments,
suggested by Tomas. I got distracted by trying to reproduce and fix a bug
from the section "live issues affecting stable branches". I will update
this BHS patch by tonight and commit it once Tomas has a chance to +1.

Thanks,
Melanie




WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

Hello hackers,

I've investigated a recent buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-02%2006%3A40%3A36

where the test failed due to a CRC error:
2024-05-02 17:08:18.401 ACST [3406:7] LOG:  incorrect resource manager data 
checksum in record at 0/F14D7A60

(Chipmunk produced similar errors as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2022-08-25%2019%3A40%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2024-03-22%2003%3A20%3A39
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-08-19%2006%3A43%3A20
)

and discovered that XLogRecordAssemble() calculates the CRC over a buffer
that might be modified by another process.

With the attached patch applied, the following test run:
echo "
autovacuum_naptime = 1
autovacuum_vacuum_threshold = 1

wal_consistency_checking = all
" >/tmp/temp.config

for ((i=1;i<=100;i++)); do echo "iteration $i"; TEMP_CONFIG=/tmp/temp.config TESTS="test_setup hash_index" make check-tests -s || break; done


fails for me on iterations 7, 10, 17:
ok 1 - test_setup   2557 ms
not ok 2 - hash_index  24719 ms
# (test process exited with exit code 2)

postmaster.log contains:
2024-05-10 12:46:44.320 UTC checkpointer[1881151] LOG:  checkpoint starting: 
immediate force wait
2024-05-10 12:46:44.365 UTC checkpointer[1881151] LOG:  checkpoint complete: wrote 41 buffers (0.3%); 0 WAL file(s) 
added, 0 removed, 26 recycled; write=0.001 s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, average=0.000 
s; distance=439134 kB, estimate=527137 kB; lsn=0/3CE131F0, redo lsn=0/3CE13198

TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: 
"xloginsert.c", Line: 949, PID: 1881271
ExceptionalCondition at assert.c:52:13
XLogRecordAssemble at xloginsert.c:953:1
XLogInsert at xloginsert.c:520:9
hashbucketcleanup at hash.c:844:14
hashbulkdelete at hash.c:558:3
index_bulk_delete at indexam.c:760:1
vac_bulkdel_one_index at vacuum.c:2498:10
lazy_vacuum_one_index at vacuumlazy.c:2443:10
lazy_vacuum_all_indexes at vacuumlazy.c:2026:26
lazy_vacuum at vacuumlazy.c:1944:10
lazy_scan_heap at vacuumlazy.c:1050:3
heap_vacuum_rel at vacuumlazy.c:503:2
vacuum_rel at vacuum.c:2214:2
vacuum at vacuum.c:622:8
autovacuum_do_vac_analyze at autovacuum.c:3102:2
do_autovacuum at autovacuum.c:2425:23
AutoVacWorkerMain at autovacuum.c:1569:3
pgarch_die at pgarch.c:846:1
StartChildProcess at postmaster.c:3929:5
StartAutovacuumWorker at postmaster.c:3997:12
process_pm_pmsignal at postmaster.c:3809:3
ServerLoop at postmaster.c:1667:5
BackgroundWorkerInitializeConnection at postmaster.c:4156:1
main at main.c:184:3
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc71a8d7e40]
postgres: autovacuum worker regression(_start+0x25)[0x556a8631a5e5]
2024-05-10 12:46:45.038 UTC checkpointer[1881151] LOG:  checkpoint starting: 
immediate force wait
2024-05-10 12:46:45.965 UTC autovacuum worker[1881275] LOG: automatic analyze of table 
"regression.pg_catalog.pg_attribute"
    avg read rate: 0.000 MB/s, avg write rate: 5.409 MB/s
    buffer usage: 1094 hits, 0 misses, 27 dirtied
    system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.03 s
2024-05-10 12:46:46.892 UTC postmaster[1881150] LOG:  server process (PID 
1881271) was terminated by signal 6: Aborted
2024-05-10 12:46:46.892 UTC postmaster[1881150] DETAIL:  Failed process was running: autovacuum: VACUUM ANALYZE 
public.hash_cleanup_heap


(This can be reproduced with 027_stream_regress, of course, but it would
take more time.)

Best regards,
Alexander

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 9047601534..aeb462c137 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -902,8 +902,21 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	 */
 	INIT_CRC32C(rdata_crc);
 	COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
+
+char block1_copy[BLCKSZ];
+char *block1_ptr = NULL;
+int block1_len = 0;
+
 	for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
+{
+	if (!block1_ptr)
+	{
+		memcpy(block1_copy, rdt->data, rdt->len);
+		block1_ptr = rdt->data;
+		block1_len = rdt->len;
+	}
 		COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
+}
 
 	/*
 	 * Ensure that the XLogRecord is not too large.
@@ -930,6 +943,12 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 	rechdr->xl_prev = InvalidXLogRecPtr;
 	rechdr->xl_crc = rdata_crc;
 
+if (block1_ptr)
+{
+pg_usleep(100);
+Assert(memcmp(block1_ptr, block1_copy, block1_len) == 0);
+}
+
 	return &hdr_rdt;
 }
 


Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Nazir Bilal Yavuz
Hi,

On Fri, 10 May 2024 at 14:49, Alena Rybakina  wrote:
>
> Hi! I could try to check it with the test, but I want to ask you about
> details, because I'm not sure that I completely understand the test case.
>
> You mean that we need to have two backends and on one of them we deleted
> the tuples before vacuum called the other, do you?
>

I think triggering a parallel vacuum is enough; I am able to see the
difference with the following.

You can apply the attached diff file to see the differences between
the previous version and the patched version, then run this query:

CREATE TABLE vacuum_fix (aid int, bid int, cid int) with
(autovacuum_enabled=false);
INSERT INTO vacuum_fix SELECT *, *, * FROM generate_series(1, 100);
CREATE INDEX a_idx on vacuum_fix (aid);
CREATE INDEX b_idx on vacuum_fix (bid);
CREATE INDEX c_idx on vacuum_fix (cid);
VACUUM vacuum_fix;
UPDATE vacuum_fix SET aid = aid + 1;
VACUUM (VERBOSE, PARALLEL 2) vacuum_fix ;

After that I saw:

INFO:  vacuuming "test.public.vacuum_fix"
INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
INFO:  finished vacuuming "test.public.vacuum_fix": index scans: 1
...
...
buffer usage: 29343 hits, 9580 misses in the previous version, 14165
misses in the patched version, 14262 dirtied

Patched version counts 14165 misses but the previous version counts
9580 misses in this specific example.

--
Regards,
Nazir Bilal Yavuz
Microsoft
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 84cc983b6e6..582973d575b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -309,6 +309,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0,
 startwritetime = 0;
 	WalUsage	startwalusage = pgWalUsage;
+	int64		StartPageMiss = VacuumPageMiss;
 	BufferUsage startbufferusage = pgBufferUsage;
 	ErrorContextCallback errcallback;
 	char	  **indnames = NULL;
@@ -606,6 +607,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			StringInfoData buf;
 			char	   *msgfmt;
 			int32		diff;
+			int64		PageMissOp = VacuumPageMiss - StartPageMiss;
 			double		read_rate = 0,
 		write_rate = 0;
 
@@ -748,8 +750,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 			 read_rate, write_rate);
 			appendStringInfo(&buf,
-			 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
+			 _("buffer usage: %lld hits, %lld misses in the previous version, %lld misses in the patched version, %lld dirtied\n"),
 			 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+			 (long long) (PageMissOp),
 			 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
 			 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
 			appendStringInfo(&buf,


Re: SQL:2011 application time

2024-05-10 Thread Peter Eisentraut
I have committed the 
v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch from 
this (confusingly, there was also a v2 earlier in this thread), and I'll 
continue working on the remaining items.



On 09.05.24 06:24, Paul Jungwirth wrote:
Here are a couple new patches, rebased to e305f715, addressing Peter's 
feedback. I'm still working on integrating jian he's suggestions for the 
last patch, so I've omitted that one here.


On 5/8/24 06:51, Peter Eisentraut wrote:
About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, 
I think the
ideas are right, but I wonder if we can fine-tune the new conditionals 
a bit.


--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
  * If the indexes are to be used for speculative insertion, add extra
  * information required by unique index entries.
  */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
        BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion 
instead.  So we

wouldn't need ii_HasWithoutOverlaps.


Okay.

Or we could push this into BuildSpeculativeIndexInfo(); it could just 
skip the rest
if an exclusion constraint is passed, on the theory that all the 
speculative index

info is already present in that case.


I like how BuildSpeculativeIndexInfo starts with an Assert that it's 
given a unique index, so I've left the check outside the function. This 
seems cleaner anyway: the function stays more focused.



--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
  */
 if (indexOidFromConstraint == idxForm->indexrelid)
 {
-   if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == ONCONFLICT_UPDATE)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));


Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == ONCONFLICT_UPDATE)


That matches what the error message is reporting afterwards.


Agreed.


  * constraints), so index under consideration can be immediately
  * skipped if it's not unique
  */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
 goto next;

Maybe here we need a comment.  Or make that a separate statement, like


Yes, that is nice. Done.

Yours,







Re: open items

2024-05-10 Thread Daniel Gustafsson
> On 10 May 2024, at 14:48, Melanie Plageman  wrote:
> 
> On Thu, May 9, 2024 at 3:28 PM Robert Haas  wrote:
>> 
>> Just a few reminders about the open items list at
>> https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items --
>> 
>> * Incorrect Assert in heap_end/rescan for BHS. Either the description
>> of this item is inaccurate, or we've been unable to fix an incorrect
>> assert after more than a month. I interpret
>> https://www.postgresql.org/message-id/54858BA1-084E-4F7D-B2D1-D15505E512FF%40yesql.se
>> as a vote in favor of committing some patch by Melanie to fix this.

It's indeed a vote for that.

>> Either Tomas should commit that patch, or Melanie should commit that
>> patch, or somebody should say why that patch shouldn't be committed,
>> or someone should request more help determining whether that patch is
>> indeed the correct fix, or something. But let's not just sit on this.
> 
> Sorry, yes, the trivial fix has been done for a while. There is one
> outstanding feedback on the patch: an update to one of the comments
> suggested by Tomas. I got distracted by trying to repro and fix a bug
> from the section "live issues affecting stable branches". I will
> update this BHS patch by tonight and commit it once Tomas has a chance
> to +1.

+1

--
Daniel Gustafsson





Re: First draft of PG 17 release notes

2024-05-10 Thread Bruce Momjian
On Fri, May 10, 2024 at 01:54:30PM +0530, Bharath Rupireddy wrote:
> On Thu, May 9, 2024 at 9:34 AM Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> >
> > It will be improved until the final release.  The item count is 188,
> > which is similar to recent releases:
> >
> > release-10:  189
> > release-11:  170
> > release-12:  180
> > release-13:  178
> > release-14:  220
> > release-15:  184
> > release-16:  206
> > release-17:  188
> >
> > I welcome feedback.  For some reason it was an easier job than usual.
> 
> Thanks a lot for this work Bruce! It looks like commit
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=91f2cae7a4e664e9c0472b364c7db29d755ab151
> is missing from daft release notes. Just curious to know if it's
> intentional or a miss out.

I did not mention it because the commit didn't mention any performance
benefit and it seemed more like an internal change than something people
needed to know about.  I could reword and merge it into this item, if
you think I should:

 Improve performance of heavily-contended WAL writes (Bharath 
Rupireddy) 

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Direct SSL connection with ALPN and HBA rules

2024-05-10 Thread Heikki Linnakangas

On 29/04/2024 22:32, Jacob Champion wrote:

On Mon, Apr 29, 2024 at 12:06 PM Heikki Linnakangas  wrote:

There is a small benefit with sslmode=prefer if you connect to a server
that doesn't support SSL, though. With sslnegotiation=direct, if the
server rejects the direct SSL connection, the client will reconnect and
try SSL with SSLRequest. The server will respond with 'N', and the
client will proceed without encryption. sslnegotiation=directonly
removes that SSLRequest attempt, eliminating one roundtrip.


Okay, agreed that in this case, there is a performance benefit. It's
not enough to convince me, honestly, but are there any other cases I
missed as well?


I realized one case that hasn't been discussed so far: If you use the 
combination of "sslmode=prefer sslnegotiation=requiredirect" to connect 
to a pre-v17 server that has SSL enabled but does not support direct SSL 
connections, you will fall back to a plaintext connection instead. 
That's almost certainly not what you wanted. I'm coming around to your 
opinion that we should not allow that combination.


Stepping back to summarize my thoughts, there are now three things I 
don't like about the status quo:


1. As noted above, the sslmode=prefer and sslnegotiation=requiredirect 
combination is somewhat dangerous, as you might unintentionally fall 
back to plaintext authentication when connecting to a pre-v17 server.


2. There is an asymmetry between "postgres" and "direct"
option names. "postgres" means "try only traditional negotiation", while
"direct" means "try direct first, and fall back to traditional
negotiation if it fails". That is apparent only if you know that the
"requiredirect" mode also exists.

3. The "require" word in "requiredirect" suggests that it's somehow
more strict or more secure, similar to sslmode=require. However, I don't 
consider direct SSL connections to be a security feature.



New proposal:

- Remove the "try both" mode completely, and rename "requiredirect" to 
just "direct". So there would be just two modes: "postgres" and 
"direct". On reflection, the automatic fallback mode doesn't seem very 
useful. It would make sense as the default, because then you would get 
the benefits automatically in most cases but still be compatible with 
old servers. But if it's not the default, you have to fiddle with libpq 
settings anyway to enable it, and then you might as well use the 
"requiredirect" mode when you know the server supports it. There isn't 
anything wrong with it as such, but given how much confusion there's 
been on how this all works, I'd prefer to cut this back to the bare 
minimum now. We can add it back in the future, and perhaps make it the 
default at the same time. This addresses points 2. and 3. above.


and:

- Only allow sslnegotiation=direct with sslmode=require or higher. This 
is what you, Jacob, wanted to do all along, and addresses point 1.
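To make the proposal concrete, here are hypothetical libpq connection strings under the reduced two-mode scheme (the exact spellings assume the rename above is adopted):

```
# Direct TLS handshake, no fallback; only valid with sslmode=require or higher:
host=db.example.com dbname=app sslmode=require sslnegotiation=direct

# Traditional SSLRequest negotiation (works with pre-v17 servers):
host=db.example.com dbname=app sslmode=require sslnegotiation=postgres
```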


Thoughts?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Nazir Bilal Yavuz
Hi,

On Fri, 10 May 2024 at 16:21, Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> On Fri, 10 May 2024 at 14:49, Alena Rybakina  wrote:
> >
> > Hi! I could try to check it with the test, but I want to ask you about
> > details, because I'm not sure that I completely understand the test case.
> >
> > You mean that we need to have two backends and on one of them we deleted
> > the tuples before vacuum called the other, do you?
> >
>
> I think triggering a parallel vacuum is enough. I am able to see the
> differences with the following:
>
> You can apply the attached diff file to see the differences between
> the previous version and the patched version. Then, run this query:
>
> CREATE TABLE vacuum_fix (aid int, bid int, cid int) with
> (autovacuum_enabled=false);
> INSERT INTO vacuum_fix SELECT *, *, * FROM generate_series(1, 100);
> CREATE INDEX a_idx on vacuum_fix (aid);
> CREATE INDEX b_idx on vacuum_fix (bid);
> CREATE INDEX c_idx on vacuum_fix (cid);
> VACUUM vacuum_fix;
> UPDATE vacuum_fix SET aid = aid + 1;
> VACUUM (VERBOSE, PARALLEL 2) vacuum_fix ;
>
> After that I saw:
>
> INFO:  vacuuming "test.public.vacuum_fix"
> INFO:  launched 2 parallel vacuum workers for index vacuuming (planned: 2)
> INFO:  finished vacuuming "test.public.vacuum_fix": index scans: 1
> ...
> ...
> buffer usage: 29343 hits, 9580 misses in the previous version, 14165
> misses in the patched version, 14262 dirtied
>
> Patched version counts 14165 misses but the previous version counts
> 9580 misses in this specific example.

I am sorry that I showed the wrong thing; this is exactly what is
fixed in this patch. Actually, I do not know how to trigger it;
currently I am looking for it. I will share if anything comes to my
mind.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 49637284f91..a6f1df11066 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1355,6 +1355,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 	IOContext	io_context;
 	IOObject	io_object;
 	char		persistence;
+	static int  double_counts = 0;
 
 	/*
 	 * Currently operations are only allowed to include a read of some range,
@@ -1426,6 +1427,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			  operation->smgr->smgr_rlocator.locator.relNumber,
 			  operation->smgr->smgr_rlocator.backend,
 			  true);
+			double_counts++;
 			continue;
 		}
 
@@ -1523,6 +1525,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageMiss * io_buffers_len;
 	}
+	elog(LOG, "Double counts = %d", double_counts);
 }
 
 /*


Serverside SNI support in libpq

2024-05-10 Thread Daniel Gustafsson
SNI was brought up in the discussions around the ALPN work, and I have had asks
for it off-list, so I decided to dust off an old patch I started around the
time we got client-side SNI support but never finished (until now).  Since
there is discussion and thinking around how we handle SSL right now, I wanted to
share this early even though it will be parked in the July CF for now.  There
are a few use cases for serverside SNI; allowing completely disjoint CAs for
different hostnames is one that has come up.  Using strict SNI mode (elaborated
on below) as a cross-host attack mitigation was mentioned in [0].

The attached patch adds serverside SNI support to libpq; it is still a bit
rough around the edges, but I'm sharing it early to make sure I'm not designing
it in a direction that the community doesn't like.  A new config file,
$datadir/pg_hosts.conf, is used for configuring which certificate and key should
be used for which hostname.  The file is parsed in the same way as pg_ident
et al., so it allows for the usual include-type statements we support.  A new
GUC, ssl_snimode, is added which controls how the hostname TLS extension is
handled.  The possible values are off, default and strict:

  - off: pg_hosts.conf is not parsed and the hostname TLS extension is
not inspected at all. The normal SSL GUCs for certificates and keys
are used.
  - default: pg_hosts.conf is loaded as well as the normal GUCs. If no
match for the TLS extension hostname is found in pg_hosts the cert
and key from the postgresql.conf GUCs is used as the default (used
as a wildcard host).
  - strict: only pg_hosts.conf is loaded and the TLS extension hostname
MUST be passed and MUST have a match in the configuration, else the
connection is refused.

As of now the patch uses default as the initial value for the GUC.
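For illustration, a hypothetical pg_hosts.conf mapping two hostnames to disjoint certificate/key pairs; the exact column layout here is my assumption, not necessarily what the patch implements:

```
# hostname               ssl_cert                 ssl_key
postgres.example.com     example-server.crt       example-server.key
api.example.net          api-server.crt           api-server.key
```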

The way multiple certificates are handled is that libpq creates one SSL_CTX for
each at startup, and switches to the appropriate one when the connection is
inspected.  Configuration handling is done in secure-common to not tie it to a
specific TLS backend (should we ever support more), but the validation of the
config values is left for the TLS backend.

There are a few known open items with this patch:

* There are two OpenSSL callbacks which can be used to inspect the hostname TLS
extension: SSL_CTX_set_tlsext_servername_callback and
SSL_CTX_set_client_hello_cb.  The documentation for the latter says you
shouldn't use the former, and the docs for the former say you need it even if
you use the latter.  For now I'm using SSL_CTX_set_tlsext_servername_callback
mainly because the OpenSSL tools themselves use that for SNI.

* The documentation is not polished at all and will require more work to make
it passable, I think.  There is also lots more testing that can be done; so
far it's pretty basic.

* I've so far only tested with OpenSSL and haven't yet verified how LibreSSL
handles this.

--
Daniel Gustafsson

[0] 
https://www.postgresql.org/message-id/e782e9f4-a0cd-49f5-800b-5e32a1b29183%40eisentraut.org



v1-0001-POC-serverside-SNI-support-for-libpq.patch
Description: Binary data


Re: Bug: PGTYPEStimestamp_from_asc() in ECPG pgtypelib

2024-05-10 Thread Tom Lane
"Ryo Matsumura (Fujitsu)"  writes:
>> But how come we haven't noticed that
>> before?  Have you added a setlocale() call somewhere?

> I didn't notice to this point.
> I added setlocale() to ECPG in my local branch.
> I will test again after removing it.
> It looks to me like existing ECPG code does not include setlocale().

> So Please ignore about behavior of PGTYPEStimestamp_fmt_asc().

If that's the only ecpg test case that fails under a non-C locale,
I think it'd be good to change it to use a non-locale-dependent
format string.  Surely there's no compelling reason why it has to
use %x/%X rather than something more stable.

> I want to continue to discuss about PGTYPEStimestamp_from_asc.
> Current PGTYPEStimestamp_from_asc() returns 0, but should we return -1?

No, I think changing its behavior now after so many years would be a
bad idea.  In any case, what's better about -1?  They are both legal
timestamp values.  I think we'd better fix the docs to match what the
code actually does, and to tell people that they *must* check errno
to detect errors.

>> I'm a bit inclined to just drop "block 3".

> Are you concerned at the above point?

Given your point that no other dt_test case is checking error
behavior, I'm not too concerned about dropping this one.  If
we wanted to take the issue seriously, we'd need to add a bunch
of new tests not just tweak this one.  (It's far from obvious that
this was even meant to be a test of an error case --- it looks like
just sloppily-verified testing to me.  "block 3" must have been
dropped in after the tests before and after it were written,
without noticing that it changed their results.)

regards, tom lane




End-of-cycle code beautification tasks

2024-05-10 Thread Tom Lane
Per src/tools/RELEASE_CHANGES, we still have some routine
tasks to finish before beta1:

* Run mechanical code beautification tools:
  pgindent, pgperltidy, and "make reformat-dat-files"
  (complete steps from src/tools/pgindent/README)

* Renumber any manually-assigned OIDs between 8000 and 
  to lower numbers, using renumber_oids.pl (see notes in bki.sgml)

(Although I expect the pgindent changes to be minimal, there will
be some since src/tools/pgindent/typedefs.list hasn't been
maintained entirely accurately.)

I've been holding off doing this so as not to joggle the elbows
of people trying to complete open items or revert failed patches,
but it's getting to be time.  Any objections to doing these
things on Tuesday May 14th?

regards, tom lane




Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Nazir Bilal Yavuz
Hi,

On Fri, 10 May 2024 at 16:55, Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> On Fri, 10 May 2024 at 16:21, Nazir Bilal Yavuz  wrote:
> >
> > Hi,
> >
> > On Fri, 10 May 2024 at 14:49, Alena Rybakina  
> > wrote:
> > >
> > > Hi! I could try to check it with the test, but I want to ask you about
> > > details, because I'm not sure that I completely understand the test case.
> > >
> > > You mean that we need to have two backends and on one of them we deleted
> > > the tuples before vacuum called the other, do you?
> > >

There should be some other backend(s) trying to read the same
buffers as the ongoing VACUUM operation. I think it works now, but the
reproduction steps are a bit racy. See:

1- Build Postgres with attached diff, it is the same
see_previous_output.diff that I shared two mails ago.

2- Run Postgres, all settings are default.

3- Use two client backends, let's name them as A and B client backends.

4- On A client backend, run:

CREATE TABLE vacuum_fix (aid int, bid int) with (autovacuum_enabled=false);
INSERT INTO vacuum_fix SELECT *, * FROM generate_series(1, 2000);
VACUUM vacuum_fix;
UPDATE vacuum_fix SET aid = aid + 1, bid = bid + 1;

5- Now it will be a bit racy: the SQL commands below need to be run at the
same time. The aim is for VACUUM on A client backend and SELECT on B
client backend to read the same buffers at the same time. So, some of
the buffers will be double counted.

Firstly, run VACUUM on A client backend; immediately after running
VACUUM, run SELECT on B backend.

A client backend:
VACUUM VERBOSE vacuum_fix;

B client backend:
SELECT * from vacuum_fix WHERE aid = -1;

This is the output of the VACUUM VERBOSE on my end:

INFO:  vacuuming "test.public.vacuum_fix"
INFO:  finished vacuuming "test.public.vacuum_fix": index scans: 0
pages: 0 removed, 176992 remain, 176992 scanned (100.00% of total)
...
...
buffer usage: 254181 hits, 99030 misses in the previous version, 99865
misses in the patched version, 106830 dirtied
...
VACUUM
Time: 2578.217 ms (00:02.578)

VACUUM does not run in parallel, so this test case does not trigger what
is fixed in this thread. As can be seen, there is a difference of about
1000 buffers.

I am not sure if there is an easier way to reproduce this but I hope this helps.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 84cc983b6e6..582973d575b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -309,6 +309,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0,
 startwritetime = 0;
 	WalUsage	startwalusage = pgWalUsage;
+	int64		StartPageMiss = VacuumPageMiss;
 	BufferUsage startbufferusage = pgBufferUsage;
 	ErrorContextCallback errcallback;
 	char	  **indnames = NULL;
@@ -606,6 +607,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			StringInfoData buf;
 			char	   *msgfmt;
 			int32		diff;
+			int64		PageMissOp = VacuumPageMiss - StartPageMiss;
 			double		read_rate = 0,
 		write_rate = 0;
 
@@ -748,8 +750,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 			appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 			 read_rate, write_rate);
 			appendStringInfo(&buf,
-			 _("buffer usage: %lld hits, %lld misses, %lld dirtied\n"),
+			 _("buffer usage: %lld hits, %lld misses in the previous version, %lld misses in the patched version, %lld dirtied\n"),
 			 (long long) (bufferusage.shared_blks_hit + bufferusage.local_blks_hit),
+			 (long long) (PageMissOp),
 			 (long long) (bufferusage.shared_blks_read + bufferusage.local_blks_read),
 			 (long long) (bufferusage.shared_blks_dirtied + bufferusage.local_blks_dirtied));
 			appendStringInfo(&buf,


Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Andres Freund
Hi,

On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote:
> and discovered that XLogRecordAssemble() calculates CRC over a buffer,
> that might be modified by another process.

If, with "might", you mean that it's legitimate for that buffer to be
modified, I don't think so.  The bug is that something is modifying the buffer
despite it being exclusively locked.

I.e. what we likely have here is a bug somewhere in the hash index code.

Greetings,

Andres Freund




Re: Fix parallel vacuum buffer usage reporting

2024-05-10 Thread Masahiko Sawada
On Fri, May 10, 2024 at 7:26 PM Nazir Bilal Yavuz  wrote:
>
> Hi,
>
> Thank you for working on this!
>
> On Wed, 1 May 2024 at 06:37, Masahiko Sawada  wrote:
> >
> > Thank you for further testing! I've pushed the patch.
>
> I realized a behaviour change while looking at 'Use pgBufferUsage for
> block reporting in analyze' thread [1]. Since that change applies here
> as well, I thought it is better to mention it here.
>
> Before this commit, VacuumPageMiss did not count the blocks if its
> read was already completed by other backends [2]. Now,
> 'pgBufferUsage.local_blks_read + pgBufferUsage.shared_blks_read'
> counts every block attempted to be read; possibly double counting if
> someone else has already completed the read.

True. IIUC there is such a difference only in HEAD but not in v15 and
v16. The following comment in WaitReadBuffers() says that it's a
traditional behavior that we count blocks as read even if someone else
has already completed its I/O:

/*
 * We count all these blocks as read by this backend.  This is traditional
 * behavior, but might turn out to be not true if we find that someone
 * else has beaten us and completed the read of some of these blocks.  In
 * that case the system globally double-counts, but we traditionally don't
 * count this as a "hit", and we don't have a separate counter for "miss,
 * but another backend completed the read".
 */

So I think using pgBufferUsage for (parallel) vacuum is a legitimate
usage and makes it more consistent with other parallel operations.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: First draft of PG 17 release notes

2024-05-10 Thread Daniel Verite
Bruce Momjian wrote:

>  have committed the first draft of the PG 17 release notes;  you can
> see the results here:
> 
> https://momjian.us/pgsql_docs/release-17.html

In the psql items, I'd suggest mentioning

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=90f5178

For the short description, maybe something like that:

- Improve FETCH_COUNT to work with all queries (Daniel Vérité)
Previously, results would be fetched in chunks only for queries
that start with the SELECT keyword.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Use generation memory context for tuplestore.c

2024-05-10 Thread Dmitry Dolgov
> On Sat, May 04, 2024 at 01:55:22AM +1200, David Rowley wrote:
> (40af10b57 did this for tuplesort.c, this is the same, but for tuplestore.c)

An interesting idea, thanks. I was able to reproduce the results of your
benchmark and get similar conclusions from the results.

> Using generation has the following advantages:
>
> [...]
>
> 2. Allocation patterns in tuplestore.c are FIFO, which is exactly what
> generation was designed to handle best.

Do I understand correctly, that the efficiency of generation memory
context could be measured directly via counting number of malloc/free
calls? In those experiments I've conducted the numbers were indeed
visibly lower for the patched version (~30%), but I would like to
confirm my interpretation of this difference.

> 5. Higher likelihood of neighbouring tuples being stored consecutively
> in memory, resulting in better CPU memory prefetching.

I guess this roughly translates into better CPU cache utilization.
Measuring cache hit ratio for unmodified vs patched versions in my case
indeed shows about 10% less cache misses.

> The attached bench.sh.txt tests the performance of this change and
> result_chart.png shows the results I got when running on an AMD 3990x
> master @ 8f0a97dff vs patched.

The query you use in the benchmark, is it something like a "best-case"
scenario for generation memory context? I was experimenting with
different sizes of the b column; lower values seem to produce less
difference between generation and aset (although the generation
context is still distinctly faster in final query latencies; see the
attached PDF estimation, run for 8192 rows). I'm curious what could be a
"worst-case" workload type for this patch?

I've also noticed the first patch disables materialization in some tests.

--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql

+set enable_material = 0;
+
 -- UPDATE on a partition subtree has been seen to have problems.
 insert into ab values (1,2);
 explain (analyze, costs off, summary off, timing off)

Is it due to the volatility of Maximum Storage values? If yes, could it
be covered with something similar to COSTS OFF instead?


Re: Use generation memory context for tuplestore.c

2024-05-10 Thread Matthias van de Meent
On Sat, 4 May 2024 at 04:02, David Rowley  wrote:
>
> On Sat, 4 May 2024 at 03:51, Matthias van de Meent
>  wrote:
> > Was a bump context considered? If so, why didn't it make the cut?
> > If tuplestore_trim is the only reason why the type of context in patch
> > 2 is a generation context, then couldn't we make the type of context
> > conditional on state->eflags & EXEC_FLAG_REWIND, and use a bump
> > context if we require rewind capabilities (i.e. where _trim is never
> > effectively executed)?
>
> I didn't really want to raise all this here, but to answer why I
> didn't use bump...
>
> There's a bit more that would need to be done to allow bump to work in
> use-cases where no trim support is needed. Namely, if you look at
> writetup_heap(), you'll see a heap_free_minimal_tuple(), which is
> pfreeing the memory that was allocated for the tuple in either
> tuplestore_puttupleslot(), tuplestore_puttuple() or
> tuplestore_putvalues().   So basically, what happens if we're still
> loading the tuplestore and we've spilled to disk, once the
> tuplestore_put* function is called, we allocate memory for the tuple
> that might get stored in RAM (we don't know yet), but then call
> tuplestore_puttuple_common() which decides if the tuple goes to RAM or
> disk, then because we're spilling to disk, the write function pfree's
> the memory we allocate in the tuplestore_put function after the tuple
> is safely written to the disk buffer.

Thanks, that's exactly the non-obvious issue I was looking for, but
couldn't find immediately.

> This is a fairly inefficient design.  While, we do need to still form
> a tuple and store it somewhere for tuplestore_putvalues(), we don't
> need to do that for a heap tuple. I think it should be possible to
> write directly from the table's page.

[...]

> I'd rather tackle these problems independently and I believe there are
> much bigger wins to moving from aset to generation than generation to
> bump, so that's where I've started.

That's entirely reasonable, and I wouldn't ask otherwise.

Thanks,

Matthias van de Meent




Re: First draft of PG 17 release notes

2024-05-10 Thread Jelte Fennema-Nio
On Thu, 9 May 2024 at 06:04, Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html

Great work!

There are two commits that I think would benefit from being listed
(but maybe they are already listed and I somehow missed them, or they
are left out on purpose for some reason):

- c4ab7da60617f020e8d75b1584d0754005d71830
- cafe1056558fe07cdc52b95205588fcd80870362




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Mark Dilger



> On May 10, 2024, at 5:10 AM, Pavel Borisov  wrote:
> 
> Hi, Alexander!
> 
> On Fri, 10 May 2024 at 12:39, Alexander Korotkov  wrote:
> On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > > going to push it if no objections.
> >
> > Is this really suitable material to be pushing post-feature-freeze?
> > It doesn't look like it's fixing any new-in-v17 issues.
> 
> These are code improvements to 5ae2087202, which answer criticism in
> the thread.  0001 comprises an optimization, but it's rather small and
> simple.  0002 and 0003 contain refactoring.  0004 contains better
> error reporting.  For me this looks like pretty similar to what others
> commit post-FF, isn't it?
> I've re-checked patches v2. Differences from v1 are in improving 
> naming/pgindent's/commit messages.
> In 0002 order of variables in struct BtreeLastVisibleEntry changed. This 
> doesn't change code behavior.
> 
> Patch v2-0003 doesn't contain credits and a discussion link. All other 
> patches do. 
> 
> Overall, patches contain small performance optimization (0001), code 
> refactoring and error reporting changes. IMO they could be pushed post-FF.

v2-0001's commit message itself says, "This commit implements skipping keys". I 
take no position on the correctness or value of the improvement, but it seems 
out of scope post feature freeze.  The patch seems to postpone uniqueness 
checking until later in the scan than what the prior version did, and that kind 
of change could require more analysis than we have time for at this point in 
the release cycle.


v2-0002 does appear to just be refactoring.  I don't care for a small portion 
of that patch, but I doubt it violates the post feature freeze rules.  In 
particular:

  +   BtreeLastVisibleEntry lVis = {InvalidBlockNumber, InvalidOffsetNumber, -1, NULL};


v2-0003 may be an improvement in some way, but it compounds some preexisting 
confusion also.  There is already a member of the BtreeCheckState called 
"target" and a memory context in that struct called "targetcontext".  That 
context is used to allocate pages "state->target", "rightpage", "child" and 
"page", but not "metapage".  Perhaps "targetcontext" is a poor choice of name?  
"notmetacontext" is a terrible name, but closer to describing the purpose of 
the memory context.  Care to propose something sensible?

Prior to applying v2-0003, the rightpage was stored in state->target, and 
continued to be in state->target later when entering

/*
 * * Downlink check *
 *  
 * Additional check of child items iff this is an internal page and
 * caller holds a ShareLock.  This happens for every downlink (item)
 * in target excluding the negative-infinity downlink (again, this is
 * because it has no useful value to compare).
 */ 
if (!P_ISLEAF(topaque) && state->readonly)
bt_child_check(state, skey, offset);

and thereafter.  Now, the rightpage of state->target is created, checked, and 
free'd, and then the old state->target gets processed in the downlink check and 
thereafter.  This is either introducing a bug, or fixing one, but the commit 
message is totally ambiguous about whether this is a bugfix or a code cleanup 
or something else?  I think this kind of patch should have a super clear commit 
message about what it thinks it is doing.


v2-0004 guards against a real threat, and is reasonable post feature freeze


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: wrong comment in libpq.h

2024-05-10 Thread David Zhang



It looks like this wording "prototypes for functions in" is used many 
times in src/include/, in many cases equally inaccurately, so I would 
suggest creating a more comprehensive patch for this.


I noticed this "prototypes for functions in" in many header files and 
briefly checked them. It all kind of makes sense, except that bufmgr.h has 
something like the below.


/* in buf_init.c */
extern void InitBufferPool(void);
extern Size BufferShmemSize(void);

/* in localbuf.c */
extern void AtProcExit_LocalBuffers(void);

/* in freelist.c */

which doesn't say "prototypes for functions xxx", but it still makes 
sense to me.



The main confusion part is in libpq.h.

/*
 * prototypes for functions in be-secure.c
 */
extern PGDLLIMPORT char *ssl_library;
extern PGDLLIMPORT char *ssl_cert_file;
extern PGDLLIMPORT char *ssl_key_file;
extern PGDLLIMPORT char *ssl_ca_file;
extern PGDLLIMPORT char *ssl_crl_file;
extern PGDLLIMPORT char *ssl_crl_dir;
extern PGDLLIMPORT char *ssl_dh_params_file;
extern PGDLLIMPORT char *ssl_passphrase_command;
extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
#ifdef USE_SSL
extern PGDLLIMPORT bool ssl_loaded_verify_locations;
#endif

If we can delete the comment and move the variables declarations to /* 
GUCs */ section, then it should be more consistent.


/* GUCs */
extern PGDLLIMPORT char *SSLCipherSuites;
extern PGDLLIMPORT char *SSLECDHCurve;
extern PGDLLIMPORT bool SSLPreferServerCiphers;
extern PGDLLIMPORT int ssl_min_protocol_version;
extern PGDLLIMPORT int ssl_max_protocol_version;

One more argument for my previous patch is that with this minor change 
it can better align with the parameters in postgresql.conf.


# - SSL -

#ssl = off
#ssl_ca_file = ''
#ssl_cert_file = 'server.crt'
#ssl_crl_file = ''
#ssl_crl_dir = ''
#ssl_key_file = 'server.key'
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'    # allowed SSL ciphers
#ssl_prefer_server_ciphers = on
#ssl_ecdh_curve = 'prime256v1'
#ssl_min_protocol_version = 'TLSv1.2'
#ssl_max_protocol_version = ''
#ssl_dh_params_file = ''
#ssl_passphrase_command = ''
#ssl_passphrase_command_supports_reload = off
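As a sketch of the proposal (the exact ordering within the section is a judgment call), the merged /* GUCs */ block in libpq.h could read:

```c
/* GUCs */
extern PGDLLIMPORT char *ssl_library;
extern PGDLLIMPORT char *ssl_cert_file;
extern PGDLLIMPORT char *ssl_key_file;
extern PGDLLIMPORT char *ssl_ca_file;
extern PGDLLIMPORT char *ssl_crl_file;
extern PGDLLIMPORT char *ssl_crl_dir;
extern PGDLLIMPORT char *ssl_dh_params_file;
extern PGDLLIMPORT char *ssl_passphrase_command;
extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload;
extern PGDLLIMPORT char *SSLCipherSuites;
extern PGDLLIMPORT char *SSLECDHCurve;
extern PGDLLIMPORT bool SSLPreferServerCiphers;
extern PGDLLIMPORT int ssl_min_protocol_version;
extern PGDLLIMPORT int ssl_max_protocol_version;

#ifdef USE_SSL
extern PGDLLIMPORT bool ssl_loaded_verify_locations;
#endif
```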


best regards,

David






Augmenting the deadlock message with application_name

2024-05-10 Thread Karoline Pauls
As we know, the deadlock error message isn't the most friendly one. All the 
client gets back is process PIDs, transaction IDs, and lock types. You have to 
check the server log to retrieve lock details. This is tedious.

In one of my apps I even added a deadlock exception handler on the app side to 
query pg_stat_activity for processes involved in the deadlock and include their 
application names and queries in the exception message. It is a little racy but 
works well enough.

Ideally I'd like to see that data coming from Postgres upon detecting the 
deadlock. That's why I made this small change.

The change makes the deadlock error look as follows - the new element is the 
application name or "&lt;insufficient privilege&gt;" in its place if the activity 
user doesn't match the current user (and the current user isn't a superuser):

postgres=*> SELECT * FROM q WHERE id = 2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 194520 (application_name: &lt;insufficient privilege&gt;) waits for 
ShareLock on transaction 776; blocked by process 194521.
Process 194521 (application_name: woof) waits for ShareLock on transaction 775; 
blocked by process 194520.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,2) in relation "q"

I added a new LocalPgBackendCurrentActivity struct combining application name 
and query string pointers and a sameProcess boolean. It is returned by value, 
since it's small. Performance-wise, this is part of the deadlock handler; 
if the DB hits it frequently, there are much more serious problems going on.

I could extend it by sending the queries back to the client, with an identical 
security check, but this is a potential information exposure of whatever's in 
the query plaintext. Another extension is to replace "(application_name: 
)" with something better like "(unknown 
application_name)", or even nothing.

Attached patch is for master, 2fb7560c. It doesn't contain any tests.

Let me know if you approve of the patch and if it makes sense to continue 
working on it.

Best,
KarolineFrom f7d3b3854ece45b2ba3b79643e99c1acbb09abae Mon Sep 17 00:00:00 2001
From: Karoline Pauls 
Date: Wed, 8 May 2024 23:49:46 +0100
Subject: [PATCH] call pgstat_get_backend_current_activity once

---
 src/backend/storage/lmgr/deadlock.c | 31 +
 src/backend/utils/activity/backend_status.c | 10 +++
 src/include/utils/backend_status.h  |  4 ++-
 src/tools/pgindent/typedefs.list|  1 +
 4 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index d94f65df..c08f66af 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -1075,17 +1075,22 @@ DeadLockReport(void)
 	StringInfoData logbuf;		/* errdetail for server log */
 	StringInfoData locktagbuf;
 	int			i;
+	bool		isSuperuser = superuser();
+	LocalPgBackendCurrentActivity *activities = \
+		(LocalPgBackendCurrentActivity *) palloc(sizeof(LocalPgBackendCurrentActivity) * nDeadlockDetails);
 
 	initStringInfo(&clientbuf);
 	initStringInfo(&logbuf);
 	initStringInfo(&locktagbuf);
 
-	/* Generate the "waits for" lines sent to the client */
+	/* Generate the "waits for" lines sent to the client and the log */
 	for (i = 0; i < nDeadlockDetails; i++)
 	{
 		DEADLOCK_INFO *info = &deadlockDetails[i];
 		int			nextpid;
-		LocalPgBackendCurrentActivity activity = pgstat_get_backend_current_activity(info->pid, true);
+		const char *lockModeName = GetLockmodeName(info->locktag.locktag_lockmethodid, info->lockmode);
+
+		activities[i] = pgstat_get_backend_current_activity(info->pid);
 
 		/* The last proc waits for the first one... */
 		if (i < nDeadlockDetails - 1)
@@ -1104,29 +1109,33 @@ DeadLockReport(void)
 		appendStringInfo(&clientbuf,
 		 _("Process %d (application_name: %s) waits for %s on %s; blocked by process %d."),
 		 info->pid,
-		 activity.st_appname,
-		 GetLockmodeName(info->locktag.locktag_lockmethodid,
-		 info->lockmode),
+		 (activities[i].sameUser || isSuperuser) ? activities[i].st_appname : "&lt;insufficient privilege&gt;",
+		 lockModeName,
 		 locktagbuf.data,
 		 nextpid);
-	}
 
-	/* Duplicate all the above for the server ... */
-	appendBinaryStringInfo(&logbuf, clientbuf.data, clientbuf.len);
+		/* the difference is that we don't do user checking for the server log */
+		appendStringInfo(&logbuf,
+		 _("Process %d (application_name: %s) waits for %s on %s; blocked by process %d."),
+		 info->pid,
+		 activities[i].st_appname,
+		 lockModeName,
+		 locktagbuf.data,
+		 nextpid);
+	}
 
 	/* ... and add info about query strings */
 	for (i = 0; i < nDeadlockDetails; i++)
 	{
 		DEADLOCK_INFO *info = &deadlockDetails[i];
-		LocalPgBackendCurrentActivity activity = pgstat_get_backend_current_activity(info->pid, false);
 
 		appendStringInfoChar(&logbuf, '\n');
 
 		appendStringInfo(&logbuf,
 		 _("Process %d, (application_name: %s): %s"),
 		 info->pid,
-		 activity.

Re: Bug: PGTYPEStimestamp_from_asc() in ECPG pgtypelib

2024-05-10 Thread Ryo Matsumura (Fujitsu)
Hi Tom,
Thank you for comment.

>> At current implementation, PGTYPEStimestamp_from_asc returns -1.
> It looks to me like it returns 0 ("noresult").  Where are you seeing -1?

I made a mistake. Sorry.
PGTYPEStimestamp_from_asc returns 0 (noresult).
PGTYPEStimestamp_fmt_asc given 'noresult' returns -1.


> But how come we haven't noticed that
> before?  Have you added a setlocale() call somewhere?

I hadn't noticed this point.
I added setlocale() to ECPG in my local branch.
I will test again after removing it.
It looks to me like the existing ECPG code does not call setlocale().

So please ignore the behavior of PGTYPEStimestamp_fmt_asc().
I want to continue discussing PGTYPEStimestamp_from_asc.


Currently PGTYPEStimestamp_from_asc() returns 0, but should it return -1?
The documentation says of the return value: "It is set to 1899-12-31 23:59:59."

I wonder.
It would be an incompatibility, but it may be acceptable,
because I think most users check with errno.
Of course, that reasoning is weak.
Some users may check for 0 (noresult) from experience.


> That documentation has more problems too: it claims that "endptr"
> is unimplemented, which looks like a lie to me: the code is there
> to do it, and there are several callers that depend on it.

I think so too. The following functions have the same problem.
PGTYPESdate_from_asc(ParseDate)
PGTYPESinterval_from_asc(ParseDate)
PGTYPEStimestamp_from_asc(ParseDate)
PGTYPESnumeric_from_asc


> which is consistent with that, but not very much like what the
> comment is expecting.  I'm a bit inclined to just drop "block 3".
> If we want to keep some kind of test of the error behavior,
> it doesn't belong right there, and it should be showing what errno
> gets set to.

It is easy to change block 3 to errno checking.
However, if we fix only that spot, it would look odd, because there is
no other errno checking in dt_tests.

> I'm a bit inclined to just drop "block 3".
Are you concerned about the above point?

Best Regards
Ryo Matsumura




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Pavel Borisov
Hi, Mark!


On Fri, 10 May 2024, 21:35 Mark Dilger, 
wrote:

>
>
> > On May 10, 2024, at 5:10 AM, Pavel Borisov 
> wrote:
> >
> > Hi, Alexander!
> >
> > On Fri, 10 May 2024 at 12:39, Alexander Korotkov 
> wrote:
> > On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
> > > Alexander Korotkov  writes:
> > > > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > > > going to push it if no objections.
> > >
> > > Is this really suitable material to be pushing post-feature-freeze?
> > > It doesn't look like it's fixing any new-in-v17 issues.
> >
> > These are code improvements to the 5ae2087202, which answer critics in
> > the thread.  0001 comprises an optimization, but it's rather small and
> > simple.  0002 and 0003 contain refactoring.  0004 contains better
> > error reporting.  For me this looks like pretty similar to what others
> > commit post-FF, isn't it?
> > I've re-checked patches v2. Differences from v1 are in improving
> naming/pgindent's/commit messages.
> > In 0002 order of variables in struct BtreeLastVisibleEntry changed. This
> doesn't change code behavior.
> >
> > Patch v2-0003 doesn't contain credits and a discussion link. All other
> patches do.
> >
> > Overall, patches contain small performance optimization (0001), code
> refactoring and error reporting changes. IMO they could be pushed post-FF.
>
> v2-0001's commit message itself says, "This commit implements skipping
> keys". I take no position on the correctness or value of the improvement,
> but it seems out of scope post feature freeze.  The patch seems to postpone
> uniqueness checking until later in the scan than what the prior version
> did, and that kind of change could require more analysis than we have time
> for at this point in the release cycle.
>
>
> v2-0002 does appear to just be refactoring.  I don't care for a small
> portion of that patch, but I doubt it violates the post feature freeze
> rules.  In particular:
>
>   +   BtreeLastVisibleEntry lVis = {InvalidBlockNumber,
> InvalidOffsetNumber, -1, NULL};
>
>
> v2-0003 may be an improvement in some way, but it compounds some
> preexisting confusion also.  There is already a member of the
> BtreeCheckState called "target" and a memory context in that struct called
> "targetcontext".  That context is used to allocate pages "state->target",
> "rightpage", "child" and "page", but not "metapage".  Perhaps
> "targetcontext" is a poor choice of name?  "notmetacontext" is a terrible
> name, but closer to describing the purpose of the memory context.  Care to
> propose something sensible?
>
> Prior to applying v2-0003, the rightpage was stored in state->target, and
> continued to be in state->target later when entering
>
> /*
>  * * Downlink check *
>  *
>  * Additional check of child items iff this is an internal page and
>  * caller holds a ShareLock.  This happens for every downlink
> (item)
>  * in target excluding the negative-infinity downlink (again, this
> is
>  * because it has no useful value to compare).
>  */
> if (!P_ISLEAF(topaque) && state->readonly)
> bt_child_check(state, skey, offset);
>
> and thereafter.  Now, the rightpage of state->target is created, checked,
> and free'd, and then the old state->target gets processed in the downlink
> check and thereafter.  This is either introducing a bug, or fixing one, but
> the commit message is totally ambiguous about whether this is a bugfix or a
> code cleanup or something else?  I think this kind of patch should have a
> super clear commit message about what it thinks it is doing.
>
>
> v2-0004 guards against a real threat, and is reasonable post feature freeze
>
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

IMO 0003 neither introduces nor fixes a bug. It loads rightpage into a local
variable, rather than into BtreeCheckState, which could have other users of
state->target after the uniqueness check in the future, but doesn't now. So
the original patch is correct, and the goal of this refactoring is to untie
rightpage from the state structure, as it's used only transiently for the
cross-page unique check. It's the same style as the already-used
bt_right_page_check_scankey(), which loads rightpage into a local variable.

For 0002 I'm not sure I understand your actual position. Could you explain
what it violates or doesn't violate?

Best regards,
Pavel.


Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Pavel Borisov
On Fri, 10 May 2024, 22:42 Pavel Borisov,  wrote:

> Hi, Mark!
>
>
> On Fri, 10 May 2024, 21:35 Mark Dilger, 
> wrote:
>
>>
>>
>> > On May 10, 2024, at 5:10 AM, Pavel Borisov 
>> wrote:
>> >
>> > Hi, Alexander!
>> >
>> > On Fri, 10 May 2024 at 12:39, Alexander Korotkov 
>> wrote:
>> > On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
>> > > Alexander Korotkov  writes:
>> > > > The revised patchset is attached.  I applied cosmetical changes.
>> I'm
>> > > > going to push it if no objections.
>> > >
>> > > Is this really suitable material to be pushing post-feature-freeze?
>> > > It doesn't look like it's fixing any new-in-v17 issues.
>> >
>> > These are code improvements to the 5ae2087202, which answer critics in
>> > the thread.  0001 comprises an optimization, but it's rather small and
>> > simple.  0002 and 0003 contain refactoring.  0004 contains better
>> > error reporting.  For me this looks like pretty similar to what others
>> > commit post-FF, isn't it?
>> > I've re-checked patches v2. Differences from v1 are in improving
>> naming/pgindent's/commit messages.
>> > In 0002 order of variables in struct BtreeLastVisibleEntry changed.
>> This doesn't change code behavior.
>> >
>> > Patch v2-0003 doesn't contain credits and a discussion link. All other
>> patches do.
>> >
>> > Overall, patches contain small performance optimization (0001), code
>> refactoring and error reporting changes. IMO they could be pushed post-FF.
>>
>> v2-0001's commit message itself says, "This commit implements skipping
>> keys". I take no position on the correctness or value of the improvement,
>> but it seems out of scope post feature freeze.  The patch seems to postpone
>> uniqueness checking until later in the scan than what the prior version
>> did, and that kind of change could require more analysis than we have time
>> for at this point in the release cycle.
>>
>>
>> v2-0002 does appear to just be refactoring.  I don't care for a small
>> portion of that patch, but I doubt it violates the post feature freeze
>> rules.  In particular:
>>
>>   +   BtreeLastVisibleEntry lVis = {InvalidBlockNumber,
>> InvalidOffsetNumber, -1, NULL};
>>
>>
>> v2-0003 may be an improvement in some way, but it compounds some
>> preexisting confusion also.  There is already a member of the
>> BtreeCheckState called "target" and a memory context in that struct called
>> "targetcontext".  That context is used to allocate pages "state->target",
>> "rightpage", "child" and "page", but not "metapage".  Perhaps
>> "targetcontext" is a poor choice of name?  "notmetacontext" is a terrible
>> name, but closer to describing the purpose of the memory context.  Care to
>> propose something sensible?
>>
>> Prior to applying v2-0003, the rightpage was stored in state->target, and
>> continued to be in state->target later when entering
>>
>> /*
>>  * * Downlink check *
>>  *
>>  * Additional check of child items iff this is an internal page
>> and
>>  * caller holds a ShareLock.  This happens for every downlink
>> (item)
>>  * in target excluding the negative-infinity downlink (again,
>> this is
>>  * because it has no useful value to compare).
>>  */
>> if (!P_ISLEAF(topaque) && state->readonly)
>> bt_child_check(state, skey, offset);
>>
>> and thereafter.  Now, the rightpage of state->target is created, checked,
>> and free'd, and then the old state->target gets processed in the downlink
>> check and thereafter.  This is either introducing a bug, or fixing one, but
>> the commit message is totally ambiguous about whether this is a bugfix or a
>> code cleanup or something else?  I think this kind of patch should have a
>> super clear commit message about what it thinks it is doing.
>>
>>
>> v2-0004 guards against a real threat, and is reasonable post feature
>> freeze
>>
>>
>> —
>> Mark Dilger
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
> IMO 0003 neither introduces nor fixes a bug. It loads rightpage into a
> local variable, rather than into BtreeCheckState, which could have other
> users of state->target after the uniqueness check in the future, but
> doesn't now. So the original patch is correct, and the goal of this
> refactoring is to untie rightpage from the state structure, as it's used
> only transiently for the cross-page unique check. It's the same style as
> the already-used bt_right_page_check_scankey(), which loads rightpage into
> a local variable.
>
> For 0002 I'm not sure I understand your actual position. Could you explain
> what it violates or doesn't violate?
>
Please forgive many typos in the previous message, I wrote from phone.

Pavel.

>


Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Alexander Korotkov
On Fri, May 10, 2024 at 8:35 PM Mark Dilger
 wrote:
> > On May 10, 2024, at 5:10 AM, Pavel Borisov  wrote:
> > On Fri, 10 May 2024 at 12:39, Alexander Korotkov  
> > wrote:
> > On Fri, May 10, 2024 at 3:43 AM Tom Lane  wrote:
> > > Alexander Korotkov  writes:
> > > > The revised patchset is attached.  I applied cosmetical changes.  I'm
> > > > going to push it if no objections.
> > >
> > > Is this really suitable material to be pushing post-feature-freeze?
> > > It doesn't look like it's fixing any new-in-v17 issues.
> >
> > These are code improvements to the 5ae2087202, which answer critics in
> > the thread.  0001 comprises an optimization, but it's rather small and
> > simple.  0002 and 0003 contain refactoring.  0004 contains better
> > error reporting.  For me this looks like pretty similar to what others
> > commit post-FF, isn't it?
> > I've re-checked patches v2. Differences from v1 are in improving 
> > naming/pgindent's/commit messages.
> > In 0002 order of variables in struct BtreeLastVisibleEntry changed. This 
> > doesn't change code behavior.
> >
> > Patch v2-0003 doesn't contain credits and a discussion link. All other 
> > patches do.
> >
> > Overall, patches contain small performance optimization (0001), code 
> > refactoring and error reporting changes. IMO they could be pushed post-FF.
>
> v2-0001's commit message itself says, "This commit implements skipping keys". 
> I take no position on the correctness or value of the improvement, but it 
> seems out of scope post feature freeze.  The patch seems to postpone 
> uniqueness checking until later in the scan than what the prior version did, 
> and that kind of change could require more analysis than we have time for at 
> this point in the release cycle.

Formally this could be classified as an algorithmic change and probably
should be postponed to the next release.  But it's quite a local
optimization, which just postpones a function call within the same
iteration of the loop.  And the effect is huge.  Probably we could allow
this post-FF for the sake of release quality, given it's a very local
change with a huge effect.

> v2-0002 does appear to just be refactoring.  I don't care for a small portion 
> of that patch, but I doubt it violates the post feature freeze rules.  In 
> particular:
>
>   +   BtreeLastVisibleEntry lVis = {InvalidBlockNumber, InvalidOffsetNumber, 
> -1, NULL};

I don't understand what is the problem with this line and post feature
freeze rules.  Please, explain it more.

> v2-0003 may be an improvement in some way, but it compounds some preexisting 
> confusion also.  There is already a member of the BtreeCheckState called 
> "target" and a memory context in that struct called "targetcontext".  That 
> context is used to allocate pages "state->target", "rightpage", "child" and 
> "page", but not "metapage".  Perhaps "targetcontext" is a poor choice of 
> name?  "notmetacontext" is a terrible name, but closer to describing the 
> purpose of the memory context.  Care to propose something sensible?
>
> Prior to applying v2-0003, the rightpage was stored in state->target, and 
> continued to be in state->target later when entering
>
> /*
>  * * Downlink check *
>  *
>  * Additional check of child items iff this is an internal page and
>  * caller holds a ShareLock.  This happens for every downlink (item)
>  * in target excluding the negative-infinity downlink (again, this is
>  * because it has no useful value to compare).
>  */
> if (!P_ISLEAF(topaque) && state->readonly)
> bt_child_check(state, skey, offset);
>
> and thereafter.  Now, the rightpage of state->target is created, checked, and 
> free'd, and then the old state->target gets processed in the downlink check 
> and thereafter.  This is either introducing a bug, or fixing one, but the 
> commit message is totally ambiguous about whether this is a bugfix or a code 
> cleanup or something else?  I think this kind of patch should have a super 
> clear commit message about what it thinks it is doing.

The only bt_target_page_check() caller is
bt_check_level_from_leftmost(), which overwrites state->target in the
next iteration anyway.  I think the patch is just refactoring to
eliminate the confusion pointed out by Peter Geoghegan upthread.

0002 and 0003 don't address any bugs, but it would be very nice to
accept them, because it would simplify future backpatching in this
area.

> v2-0004 guards against a real threat, and is reasonable post feature freeze

Ok.

--
Regards,
Alexander Korotkov
Supabase




Re: Augmenting the deadlock message with application_name

2024-05-10 Thread Bruce Momjian
On Thu, May  9, 2024 at 11:44:03PM +, Karoline Pauls wrote:
> As we know, the deadlock error message isn't the most friendly one. All the
> client gets back is process PIDs, transaction IDs, and lock types. You have to
> check the server log to retrieve lock details. This is tedious.
> 
> In one of my apps I even added a deadlock exception handler on the app side to
> query pg_stat_activity for processes involved in the deadlock and include 
> their
> application names and queries in the exception message. It is a little racy 
> but
> works well enough.
> 
> Ideally I'd like to see that data coming from Postgres upon detecting the
> deadlock. That's why I made this small change.
> 
> The change makes the deadlock error look as follows - the new element is the
> application name or "&lt;insufficient privilege&gt;" in its place if the activity
> user doesn't match the current user (and the current user isn't a superuser):
> 
> postgres=*> SELECT * FROM q WHERE id = 2 FOR UPDATE;
> ERROR:  deadlock detected
> DETAIL:  Process 194520 (application_name: &lt;insufficient privilege&gt;) waits for
> ShareLock on transaction 776; blocked by process 194521.
> Process 194521 (application_name: woof) waits for ShareLock on transaction 
> 775;
> blocked by process 194520.
> HINT:  See server log for query details.
> CONTEXT:  while locking tuple (0,2) in relation "q"

log_line_prefix supports application name --- why would you not use
that?
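
For example, a postgresql.conf sketch of Bruce's suggestion (%m is the timestamp, %p the PID, and %a the application name):

```
# Prefix every log line with timestamp, PID, and application_name,
# so the deadlock detail in the server log already carries the app names.
log_line_prefix = '%m [%p] %a '
```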

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Mark Dilger



> On May 10, 2024, at 11:42 AM, Pavel Borisov  wrote:
> 
> IMO 0003 neither introduces nor fixes a bug. It loads rightpage into a local 
> variable, rather than into BtreeCheckState, which could have other users of 
> state->target after the uniqueness check in the future, but doesn't now. So 
> the original patch is correct, and the goal of this refactoring is to untie 
> rightpage from the state structure, as it's used only transiently for the 
> cross-page unique check. It's the same style as the already-used 
> bt_right_page_check_scankey(), which loads rightpage into a local variable.

Well, you can put an Assert(false) dead in the middle of the code we're 
discussing and all the regression tests still pass, so I'd argue the change is 
getting zero test coverage.

This patch introduces a change that stores a new page into variable "rightpage" 
rather than overwriting "state->target", which the old implementation most 
certainly did.  That means that after returning from bt_target_page_check() 
into the calling function bt_check_level_from_leftmost() the value in 
state->target is not what it would have been prior to this patch.  Now, that'd 
be irrelevant if nobody goes on to consult that value, but just 44 lines 
further down in bt_check_level_from_leftmost() state->target is clearly used.  
So the behavior at that point is changing between the old and new versions of 
the code, and I think I'm within reason to ask if it was wrong before the 
patch, wrong after the patch, or something else?  Is this a bug being 
introduced, being fixed, or ... ?

Having a regression test that actually touches this code would go a fair way 
towards helping the analysis.

> For 0002 I doubt I understand your actual positiob. Could you explain what it 
> violates or doesn't violate?

v2-0002 does not violate the post feature freeze restriction on new features 
so far as I can tell, but I just don't care for the variable initialization 
because it doesn't name the fields.  If anybody refactored the struct they 
might not notice the need to reorder this initialization, and depending on 
various factors including compiler flags they might not get an error.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: open items

2024-05-10 Thread Robert Haas
On Fri, May 10, 2024 at 8:48 AM Melanie Plageman
 wrote:
> Sorry, yes, the trivial fix has been done for a while. There is one
> outstanding feedback on the patch: an update to one of the comments
> suggested by Tomas. I got distracted by trying to repro and fix a bug
> from the section "live issues affecting stable branches". I will
> update this BHS patch by tonight and commit it once Tomas has a chance
> to +1.

Great, thanks!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: open items

2024-05-10 Thread Robert Haas
On Fri, May 10, 2024 at 7:14 AM Alvaro Herrera  wrote:
> A possible complaint is that the upgrade mechanics which are mostly in
> pg_dump with some pieces in pg_upgrade are not very explicitly
> documented.  There are already comments in all relevant places, but
> perhaps an overall picture is necessary.  I'll see about this, probably
> as a long comment somewhere.

I think that would be really helpful.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: First draft of PG 17 release notes

2024-05-10 Thread Bruce Momjian
On Fri, May 10, 2024 at 06:29:11PM +0200, Daniel Verite wrote:
>   Bruce Momjian wrote:
> 
> >  have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> > 
> > https://momjian.us/pgsql_docs/release-17.html
> 
> In the psql items, I'd suggest mentioning
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=90f5178
> 
> For the short description, maybe something like that:
> 
> - Improve FETCH_COUNT to work with all queries (Daniel Vérité)
> Previously, results would be fetched in chunks only for queries
> that start with the SELECT keyword.

Agreed, patch attached and applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index e4b34d827d1..08238be9cb7 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -1893,8 +1893,6 @@ This is similar to PQpipelineSync() but it does not flush to the server unless t
 
 
 
@@ -1937,6 +1935,17 @@ This is enabled with the client-side option sslnegotation=direct, requires ALPN,
 
  
 
+
+
+
+
+Allow FETCH_COUNT to work with non-SELECT queries (Daniel Vérité)
+
+
+
 

Re: BitmapHeapScan streaming read user and prelim refactoring

2024-05-10 Thread Melanie Plageman
On Thu, May 2, 2024 at 5:37 PM Tomas Vondra
 wrote:
>
>
>
> On 4/24/24 22:46, Melanie Plageman wrote:
> > On Tue, Apr 23, 2024 at 6:43 PM Tomas Vondra
> >  wrote:
> >>
> >> On 4/23/24 18:05, Melanie Plageman wrote:
> >>> The patch with a fix is attached. I put the test in
> >>> src/test/regress/sql/join.sql. It isn't the perfect location because
> >>> it is testing something exercisable with a join but not directly
> >>> related to the fact that it is a join. I also considered
> >>> src/test/regress/sql/select.sql, but it also isn't directly related to
> >>> the query being a SELECT query. If there is a better place for a test
> >>> of a bitmap heap scan edge case, let me know.
> >>
> >> I don't see a problem with adding this to join.sql - why wouldn't this
> >> count as something related to a join? Sure, it's not like this code
> >> matters only for joins, but if you look at join.sql that applies to a
> >> number of other tests (e.g. there are a couple btree tests).
> >
> > I suppose it's true that other tests in this file use joins to test
> > other code. I guess if we limited join.sql to containing tests of join
> > implementation, it would be rather small. I just imagined it would be
> > nice if tests were grouped by what they were testing -- not how they
> > were testing it.
> >
> >> That being said, it'd be good to explain in the comment why we're
> >> testing this particular plan, not just what the plan looks like.
> >
> > You mean I should explain in the test comment why I included the
> > EXPLAIN plan output? (because it needs to contain a bitmapheapscan to
> > actually be testing anything)
> >
>
> No, I meant that the comment before the test describes a couple
> requirements the plan needs to meet (no need to process all inner
> tuples, bitmapscan eligible for skip_fetch on outer side, ...), but it
> does not explain why we're testing that plan.
>
> I could get to that by doing git-blame to see what commit added this
> code, and then read the linked discussion. Perhaps that's enough, but
> maybe the comment could say something like "verify we properly discard
> tuples on rescans" or something like that?

Attached is v3. I didn't use your exact language because the test
wouldn't actually verify that we properly discard the tuples. Whether
or not the empty tuples are all emitted, it just resets the counter to
0. I decided to go with "exercise" instead.

- Melanie
From 0f10d304fc7dcce99622f348183f36a8062e70c6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 10 May 2024 14:52:34 -0400
Subject: [PATCH v3] BitmapHeapScan: Remove incorrect assert

04e72ed617be pushed the skip fetch optimization (allowing bitmap heap
scans to operate like index-only scans if none of the underlying data is
needed) into heap AM-specific bitmap heap scan code.

04e72ed617be added an assert that all tuples in blocks eligible for the
optimization had been NULL-filled and emitted by the end of the scan.
This assert is incorrect when not all tuples need be scanned to execute
the query; for example: a join in which not all inner tuples need to be
scanned before skipping to the next outer tuple.

Author: Melanie Plageman
Reviewed-by: Richard Guo, Tomas Vondra
Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com
---
 src/backend/access/heap/heapam.c   |  4 ++--
 src/test/regress/expected/join.out | 38 ++
 src/test/regress/sql/join.sql  | 26 
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee4de..8600c22515 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1184,7 +1184,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_vmbuffer = InvalidBuffer;
 	}
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * The read stream is reset on rescan. This must be done before
@@ -1216,7 +1216,7 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_vmbuffer))
 		ReleaseBuffer(scan->rs_vmbuffer);
 
-	Assert(scan->rs_empty_tuples_pending == 0);
+	scan->rs_empty_tuples_pending = 0;
 
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 0246d56aea..8829bd76e7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -7924,3 +7924,41 @@ where exists (select 1 from j3
 (13 rows)
 
 drop table j3;
+-- Exercise the "skip fetch" Bitmap Heap Scan optimization when candidate
+-- tuples are discarded. This may occur when:
+--   1. A join doesn't require all inner tuples to be scanned for each outer
+--  tuple, and
+--   2. The inner side is scanned using a bitmap heap scan, and
+--   3. The bitmap heap scan is eligible for the "skip fetch" optimization.
+--  This optimization is us

Re: Augmenting the deadlock message with application_name

2024-05-10 Thread Karoline Pauls
On Friday, 10 May 2024 at 20:17, Bruce Momjian  wrote:
>
> log_line_prefix supports application name --- why would you not use
> that?
>

log_line_prefix is effective in the server log. This change is mostly about 
improving the message sent back to the client. While the server log is also 
changed to reflect the client message, it doesn't need to be.

Additionally, with `%a` added to log_line_prefix, the server log would only 
contain the application name of the client affected by the deadlock, not the 
application names of all other clients involved in it.

Example server log with application names (here: a and b) added to the log 
prefix:

2024-05-10 20:39:58.459 BST [197591] (a)ERROR:  deadlock detected
2024-05-10 20:39:58.459 BST [197591] (a)DETAIL:  Process 197591 
(application_name: a) waits for ShareLock on transaction 782; blocked by 
process 197586.
Process 197586 (application_name: b) waits for ShareLock on transaction 
781; blocked by process 197591.
Process 197591, (application_name: a): SELECT * FROM q WHERE id = 2 FOR 
UPDATE;
Process 197586, (application_name: b): SELECT * FROM q WHERE id = 1 FOR 
UPDATE;
2024-05-10 20:39:58.459 BST [197591] (a)HINT:  See server log for query details.
2024-05-10 20:39:58.459 BST [197591] (a)CONTEXT:  while locking tuple (0,2) in 
relation "q"
2024-05-10 20:39:58.459 BST [197591] (a)STATEMENT:  SELECT * FROM q WHERE id = 
2 FOR UPDATE;

All log line prefixes refer to the application a. The message has both a and b.

Anyway, the server log is not the important part here. The crucial UX feature 
is the client getting application names back, so  browsing through server logs 
can be avoided.

Best,
Karoline




Re: First draft of PG 17 release notes

2024-05-10 Thread Maiquel Grassi
Mhd

Enviado desde Outlook para Android

From: Bruce Momjian 
Sent: Friday, May 10, 2024 4:47:04 PM
To: Daniel Verite 
Cc: PostgreSQL-development 
Subject: Re: First draft of PG 17 release notes

On Fri, May 10, 2024 at 06:29:11PM +0200, Daniel Verite wrote:
>Bruce Momjian wrote:
>
> >  have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
>
> In the psql items, I'd suggest mentioning
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=90f5178
>
> For the short description, maybe something like that:
>
> - Improve FETCH_COUNT to work with all queries (Daniel Vérité)
> Previously, results would be fetched in chunks only for queries
> that start with the SELECT keyword.

Agreed, patch attached and applied.

--
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.


Re: pg_sequence_last_value() for unlogged sequences on standbys

2024-05-10 Thread Nathan Bossart
On Wed, May 08, 2024 at 11:01:01AM +0900, Michael Paquier wrote:
> On Tue, May 07, 2024 at 02:39:42PM -0500, Nathan Bossart wrote:
>> Okay, phew.  We can still do something like v3-0002 for v18.  I'll give
>> Michael a chance to comment on 0001 before committing/back-patching that
>> one.
> 
> What you are doing in 0001, and 0002 for v18 sounds fine to me.

Great.  Rather than commit this on a Friday afternoon, I'll just post what
I have staged for commit early next week.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 19d9a1dd88385664e6991121e4751aba85a45639 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 10 May 2024 15:55:24 -0500
Subject: [PATCH v4 1/1] Fix pg_sequence_last_value() for unlogged sequences on
 standbys.

Presently, when this function is called for an unlogged sequence on
a standby server, it will error out with a message like

ERROR:  could not open file "base/5/16388": No such file or directory

Since the pg_sequences system view uses pg_sequence_last_value(),
it can error similarly.  To fix, modify the function to return NULL
for unlogged sequences on standby servers.  Since this bug is
present on all versions since v15, this approach is preferable to
making the ERROR nicer because we need to repair the pg_sequences
view without modifying its definition on released versions.  For
consistency, this commit also modifies the function to return NULL
for other sessions' temporary sequences.  The pg_sequences view
already appropriately filters out such sequences, so there's no bug
there, but we might as well offer some defense in case someone
invokes this function directly.

Unlogged sequences were first introduced in v15, but temporary
sequences are much older, so while the fix for unlogged sequences
is only back-patched to v15, the temporary sequence portion is
back-patched to all supported versions.

We could also remove the privilege check in the pg_sequences view
definition in v18 if we modify this function to return NULL for
sequences for which the current user lacks privileges, but that is
left as a future exercise for when v18 development begins.

Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/20240501005730.GA594666%40nathanxps13
Backpatch-through: 12
---
 doc/src/sgml/system-views.sgml| 34 +++
 src/backend/commands/sequence.c   | 31 +---
 src/test/recovery/t/001_stream_rep.pl |  8 +++
 3 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index a54f4a4743..9842ee276e 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3091,15 +3091,41 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
The last sequence value written to disk.  If caching is used,
this value can be greater than the last value handed out from the
-   sequence.  Null if the sequence has not been read from yet.  Also, if
-   the current user does not have USAGE
-   or SELECT privilege on the sequence, the value is
-   null.
+   sequence.
   
  
 

   
+
+  
+   The last_value column will read as null if any of
+   the following are true:
+   
+
+ 
+  The sequence has not been read from yet.
+ 
+
+
+ 
+  The current user does not have USAGE or
+  SELECT privilege on the sequence.
+ 
+
+
+ 
+  The sequence is a temporary sequence created by another session.
+ 
+
+
+ 
+  The sequence is unlogged and the server is a standby.
+ 
+
+   
+  
+
  
 
  
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 46103561c3..28f8522264 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1777,11 +1777,8 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
Oid relid = PG_GETARG_OID(0);
SeqTableelm;
Relationseqrel;
-   Buffer  buf;
-   HeapTupleData seqtuple;
-   Form_pg_sequence_data seq;
-   boolis_called;
-   int64   result;
+   boolis_called = false;
+   int64   result = 0;
 
/* open and lock sequence */
init_sequence(relid, &elm, &seqrel);
@@ -1792,12 +1789,28 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 errmsg("permission denied for sequence %s",
						RelationGetRelationName(seqrel))));
 
-   seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+   /*
+* We return NULL for other sessions' temporary sequences.  The
+* pg_sequences system view already filters those out, but this offers a
+	 * defense against ERRORs in case someone invokes this function directly.
+*
+* Also, for the benefit of the pg_sequences view, we return NULL for
+ 

Re: Augmenting the deadlock message with application_name

2024-05-10 Thread Tom Lane
Karoline Pauls  writes:
> On Friday, 10 May 2024 at 20:17, Bruce Momjian  wrote:
>> log_line_prefix supports application name --- why would you not use
>> that?

> log_line_prefix is effective in the server log. This change is mostly about 
> improving the message sent back to the client. While the server log is also 
> changed to reflect the client message, it doesn't need to be.

It's normally necessary to look at the server log anyway if you want
to figure out what caused the deadlock, since the client message
intentionally doesn't provide query texts.  I think this proposal
doesn't move the goalposts noticeably: it seems likely to me that
in many installations the sessions would mostly all have the same
application_name, or at best not-too-informative names like "psql"
versus "PostgreSQL JDBC Driver".  (If we thought these names *were*
really informative about what other sessions are doing, we'd probably
have to hide them from unprivileged users in pg_stat_activity, and
then there'd also be a security concern here.)

On the whole I'd reject this proposal as causing churn in
application-visible behavior for very little gain.

regards, tom lane




Re: Augmenting the deadlock message with application_name

2024-05-10 Thread Bruce Momjian
On Fri, May 10, 2024 at 08:10:58PM +, Karoline Pauls wrote:
> On Friday, 10 May 2024 at 20:17, Bruce Momjian 
> wrote:
> >
> > log_line_prefix supports application name --- why would you not use
> > that?
> >
>
> log_line_prefix is effective in the server log. This change is mostly
> about improving the message sent back to the client. While the server
> log is also changed to reflect the client message, it doesn't need to
> be.

I was hoping client_min_messages would show the application name, but it
doesn't, but your bigger point is below.

> Additionally, with `%a` added to log_line_prefix, the server log
> would only contain the application name of the client affected by the
> deadlock, not the application names of all other clients involved in
> it.

Yeah, getting the application names of the pids reported in the log
requires looking backward in the logs to find out what the most recent
log line was for the pids involved.

Frankly, I think it would be more useful to show the session_id in the
deadlock so you could then use that to look back to any details you want
in the logs, not only the application name.

> Anyway, the server log is not the important part here. The crucial
> UX feature is the client getting application names back, so browsing
> through server logs can be avoided.

Well, we don't want to report too much information because it gets
confusing.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-10 Thread Bruce Momjian
On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
> On Thu, 9 May 2024 at 06:04, Bruce Momjian  wrote:
> >
> > I have committed the first draft of the PG 17 release notes;  you can
> > see the results here:
> >
> > https://momjian.us/pgsql_docs/release-17.html
> 
> Great work!
> 
> There are two commits that I think would benefit from being listed
> (but maybe they are already listed and I somehow missed them, or they
> are left out on purpose for some reason):

I looked at both of these.   In both cases I didn't see why the user
would need to know these changes were made:

---

> - c4ab7da60617f020e8d75b1584d0754005d71830

commit c4ab7da6061
Author: David Rowley 
Date:   Sun Apr 7 21:20:18 2024 +1200

Avoid needless large memcpys in libpq socket writing

Until now, when calling pq_putmessage to write new data to a libpq
socket, all writes are copied into a buffer and that buffer gets 
flushed
when full to avoid having to perform small writes to the socket.

There are cases where we must write large amounts of data to the 
socket,
sometimes larger than the size of the buffer.  In this case, it's
wasteful to memcpy this data into the buffer and flush it out, 
instead,
we can send it directly from the memory location that the data is 
already
stored in.

Here we adjust internal_putbytes() so that after having just 
flushed the
buffer to the socket, if the remaining bytes to send is as big or 
bigger
than the buffer size, we just send directly rather than needlessly
copying into the PqSendBuffer buffer first.

Examples of operations that write large amounts of data in one 
message
are; outputting large tuples with SELECT or COPY TO STDOUT and
pg_basebackup.

Author: Melih Mutlu
Reviewed-by: Heikki Linnakangas
Reviewed-by: Jelte Fennema-Nio
Reviewed-by: David Rowley
Reviewed-by: Ranier Vilela
Reviewed-by: Andres Freund
Discussion: 
https://postgr.es/m/cagpvpcr15nosj0f6xe-c2h477zfr88q12e6wjeoezc8zykt...@mail.gmail.com

> - cafe1056558fe07cdc52b95205588fcd80870362

commit cafe1056558
Author: Robert Haas 
Date:   Tue Apr 2 10:26:10 2024 -0400

Allow SIGINT to cancel psql database reconnections.

After installing the SIGINT handler in psql, SIGINT can no longer 
cancel
database reconnections. For instance, if the user starts a 
reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Use PQconnectStartParams() in order to insert a cancel_pressed check
into the polling loop.

Tristan Partin, reviewed by Gurjeet Singh, Heikki Linnakangas, Jelte
Fennema-Nio, and me.

Discussion: http://postgr.es/m/d08wwcpvhkhn.3qelikzj2d...@neon.tech

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-10 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
>> There are two commits that I think would benefit from being listed
>> (but maybe they are already listed and I somehow missed them, or they
>> are left out on purpose for some reason):

> I looked at both of these.   In both cases I didn't see why the user
> would need to know these changes were made:

I agree that the buffering change is not likely interesting, but
the fact that you can now control-C out of a psql "\c" command
is user-visible.  People might have internalized the fact that
it didn't work, or created complicated workarounds.

regards, tom lane




Re: First draft of PG 17 release notes

2024-05-10 Thread Bruce Momjian
On Fri, May 10, 2024 at 05:31:33PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, May 10, 2024 at 06:50:54PM +0200, Jelte Fennema-Nio wrote:
> >> There are two commits that I think would benefit from being listed
> >> (but maybe they are already listed and I somehow missed them, or they
> >> are left out on purpose for some reason):
> 
> > I looked at both of these.   In both cases I didn't see why the user
> > would need to know these changes were made:
> 
> I agree that the buffering change is not likely interesting, but
> the fact that you can now control-C out of a psql "\c" command
> is user-visible.  People might have internalized the fact that
> it didn't work, or created complicated workarounds.

It was not clear to me what the user-visible behavior was with the
SIGINT control.  Yes, based on your details, it should be mentioned.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Use generation memory context for tuplestore.c

2024-05-10 Thread David Rowley
Thanks for having a look at this.

On Sat, 11 May 2024 at 04:34, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> Do I understand correctly, that the efficiency of generation memory
> context could be measured directly via counting number of malloc/free
> calls? In those experiments I've conducted the numbers were indeed
> visibly lower for the patched version (~30%), but I would like to
> confirm my interpretation of this difference.

For the test case I had in the script, I imagine the reduction in
malloc/free is just because of the elimination of the power-of-2
roundup.  If the test case had resulted in tuplestore_trim() being
used, e.g if Material was used to allow mark and restore for a Merge
Join or WindowAgg, then the tuplestore_trim() would result in
pfree()ing of tuples and further reduce malloc of new blocks.  This is
true because AllocSet records pfree'd non-large chunks in a freelist
element and makes no attempt to free() blocks.

In the tuplestore_trim() scenario with the patched version, the
pfree'ing of chunks is in a first-in-first-out order which allows an
entire block to become empty of palloc'd chunks which allows it to
become the freeblock and be reused rather than malloc'ing another
block when there's not enough space on the current block.  If the
scenario is that tuplestore_trim() always manages to keep the number
of tuples down to something that can fit on one GenerationBlock, then
we'll only malloc() 2 blocks and the generation code will just
alternate between the two, reusing the empty one when it needs a new
block rather than calling malloc.

> > 5. Higher likelihood of neighbouring tuples being stored consecutively
> > in memory, resulting in better CPU memory prefetching.
>
> I guess this roughly translates into better CPU cache utilization.
> Measuring cache hit ratio for unmodified vs patched versions in my case
> indeed shows about 10% less cache misses.

Thanks for testing that.  In simple test cases that's likely to come
from removing the power-of-2 round-up behaviour of AllocSet allowing
the allocations to be more dense.  In more complex cases when
allocations are making occasional use of chunks from AllocSet's
freelist[], the access pattern will be more predictable and allow the
CPU to prefetch memory more efficiently, resulting in a better CPU
cache hit ratio.

> The query you use in the benchmark, is it something like a "best-case"
> scenario for generation memory context? I was experimenting with
> different size of the b column, lower values seems to produce less
> difference between generation and aset (although still generation
> context is distinctly faster regarding final query latencies, see the
> attached PDF estimation, ran for 8192 rows). I'm curious what could be a
> "worst-case" workload type for this patch?

It's not purposefully "best-case".  Likely there'd be more improvement
if I'd scanned the Material node more than twice.  However, the tuple
size I picked is close to the best case as it's just over 1024 bytes.
Generation will just round the size up to the next 8-byte alignment
boundary, whereas AllocSet will round that up to 2048 bytes.

A case where the patched version would be even better would be where
the unpatched version spilled to disk but the patched version did not.
I imagine there's room for hundreds of percent speedup for that case.

> I've also noticed the first patch disables materialization in some tests.
>
> --- a/src/test/regress/sql/partition_prune.sql
> +++ b/src/test/regress/sql/partition_prune.sql
>
> +set enable_material = 0;
> +
>  -- UPDATE on a partition subtree has been seen to have problems.
>  insert into ab values (1,2);
>  explain (analyze, costs off, summary off, timing off)
>
> Is it due to the volatility of Maximum Storage values? If yes, could it
> be covered with something similar to COSTS OFF instead?

I don't think not showing this with COSTS OFF is very consistent with
Sort and Memoize's memory stats output.  I opted to disable the
Material node for that plan as it seemed like the easiest way to make
the output stable.  There are other ways that could be done. See
explain_memoize() in memoize.sql.  I thought about doing that, but it
seemed like overkill when there was a much simpler way to get what
I wanted. I'd certainly live to regret that if disabling Material put
the Nested Loop costs dangerously close to the costs of some other
join method and the plan became unstable on the buildfarm.

David




Re: Support tid range scan in parallel?

2024-05-10 Thread David Rowley
On Fri, 10 May 2024 at 05:16, Cary Huang  wrote:
> I understand that the regression tests for parallel ctid range scan is a
> bit lacking now. It only has a few EXPLAIN clauses to ensure parallel
> workers are used when tid ranges are specified. They are added as
> part of select_parallel.sql test. I am not sure if it is more appropriate
> to have them as part of tidrangescan.sql test instead. So basically
> re-run the same test cases in tidrangescan.sql but in parallel?

I think tidrangescan.sql is a more suitable location than
select_parallel.sql.  I don't think you need to repeat all the tests, as
many of them are testing the tid qual processing which is the same
code as it is in the parallel version.

You should add a test that creates a table with a very low fillfactor,
low enough so only 1 tuple can fit on each page and insert a few dozen
tuples. The test would do SELECT COUNT(*) to ensure you find the
correct subset of tuples. You'd maybe want MIN(ctid) and MAX(ctid) in
there too for extra coverage to ensure that the correct tuples are
being counted.  Just make sure and EXPLAIN (COSTS OFF) the query first
in the test to ensure that it's always testing the plan you're
expecting to test.

David




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Mark Dilger



> On May 10, 2024, at 12:05 PM, Alexander Korotkov  wrote:
> 
> The only bt_target_page_check() caller is
> bt_check_level_from_leftmost(), which overrides state->target in the
> next iteration anyway.  I think the patch is just refactoring to
> eliminate the confusion pointer by Peter Geoghegan upthread.

I find your argument unconvincing.

After bt_target_page_check() returns at line 919, and before 
bt_check_level_from_leftmost() overrides state->target in the next iteration, 
bt_check_level_from_leftmost() conditionally fetches an item from the page 
referenced by state->target.  See line 963.

I'm left with four possibilities:


1)  bt_target_page_check() never gets to the code that uses "rightpage" rather 
than "state->target" in the same iteration where bt_check_level_from_leftmost() 
conditionally fetches an item from state->target, so the change you're making 
doesn't matter.

2)  The code prior to v2-0003 was wrong, having changed state->target in an 
inappropriate way, causing the wrong thing to happen at what is now line 963.  
The patch fixes the bug, because state->target no longer gets overwritten where 
you are now using "rightpage" for the value.

3)  The code used to work, having set up state->target correctly in the place 
where you are now using "rightpage", but v2-0003 has broken that. 

4)  It's been broken all along and your patch just changes from wrong to wrong.


If you believe (1) is true, then I'm complaining that you are relying far too 
much on action at a distance, and that you are not documenting it.  Even with 
documentation of this interrelationship, I'd be unhappy with how brittle the 
code is.  I cannot easily discern that the two don't ever happen in the same 
iteration, and I'm not at all convinced one way or the other.  I tried to set 
up some Asserts about that, but none of the test cases actually reach the new 
code, so adding Asserts doesn't help to investigate the question.

If (2) is true, then I'm complaining that the commit message doesn't mention 
the fact that this is a bug fix.  Bug fixes should be clearly documented as 
such, otherwise future work might assume the commit can be reverted with only 
stylistic consequences.

If (3) is true, then I'm complaining that the patch is flat busted.

If (4) is true, then maybe we should revert the entire feature, or have a 
discussion of mitigation efforts that are needed.

Regardless of which of 1..4 you pick, I think it could all do with more 
regression test coverage.


For reference, I said something similar earlier today in another email to this 
thread:

This patch introduces a change that stores a new page into variable "rightpage" 
rather than overwriting "state->target", which the old implementation most 
certainly did.  That means that after returning from bt_target_page_check() 
into the calling function bt_check_level_from_leftmost() the value in 
state->target is not what it would have been prior to this patch.  Now, that'd 
be irrelevant if nobody goes on to consult that value, but just 44 lines 
further down in bt_check_level_from_leftmost() state->target is clearly used.  
So the behavior at that point is changing between the old and new versions of 
the code, and I think I'm within reason to ask if it was wrong before the 
patch, wrong after the patch, or something else?  Is this a bug being 
introduced, being fixed, or ... ?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Why is citext/regress failing on hamerkop?

2024-05-10 Thread Thomas Munro
For example, 'i'::citext = 'İ'::citext fails to be true.

It must now be using UTF-8 (unlike, say, Drongo) and non-C ctype,
given that the test isn't skipped.  This isn't the first time that
we've noticed that Windows doesn't seem to know about İ→i (see [1]),
but I don't think anyone has explained exactly why, yet.  It could be
that it just doesn't know about that in any locale, or that it is
locale-dependent and would only do that for Turkish, the same reason
we skip the test for ICU, or ...

Either way, it seems like we'll need to skip that test on Windows if
we want hamerkop to be green.  That can probably be cribbed from
collate.windows.win1252.sql into contrib/citext/sql/citext_utf8.sql's
prelude... I just don't know how to explain it in the comment 'cause I
don't know why.

One new development in Windows-land is that the system now does
actually support UTF-8 in the runtime libraries[2].  You can set it at
a system level, or for an application at build time, or by adding
".UTF-8" to a locale name when opening the locale (apparently much
more like Unix systems, but I don't know what exactly it does).  I
wonder why we see this change now... did hamerkop have that ACP=UTF-8
setting applied on purpose, or if computers in Japan started doing
that by default instead of using Shift-JIS, or if it only started
picking UTF-8 around the time of the Meson change somehow, or the
initdb-template change.  It's a little hard to tell from the logs.

[1] 
https://www.postgresql.org/message-id/CAC%2BAXB10p%2BmnJ6wrAEm6jb51%2B8%3DBfYzD%3Dw6ftHRbMjMuSFN3kQ%40mail.gmail.com
[2] 
https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-05-10 Thread Tom Lane
Mark Dilger  writes:
> Regardless of which of 1..4 you pick, I think it could all do with more 
> regression test coverage.

Indeed.  If we have no regression tests that reach this code, it's
folly to touch it at all, but most especially so post-feature-freeze.

I think the *first* order of business ought to be to create some
test cases that reach this area.  Perhaps they'll be too expensive
to incorporate in our regular regression tests, but we could still
use them to investigate Mark's concerns.

regards, tom lane




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Thomas Munro
On Sat, May 11, 2024 at 3:57 AM Andres Freund  wrote:
> On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote:
> > and discovered that XLogRecordAssemble() calculates CRC over a buffer,
> > that might be modified by another process.
>
> If, with "might", you mean that it's legitimate for that buffer to be
> modified, I don't think so.  The bug is that something is modifying the buffer
> despite it being exclusively locked.
>
> I.e. what we likely have here is a bug somewhere in the hash index code.

I don't have a good grip on the higher level locking protocols of
hash.c, but one microscopic thing jumps out:

/*
 * bucket buffer was not changed, but still needs to be
 * registered to ensure that we can acquire a cleanup lock on
 * it during replay.
 */
if (!xlrec.is_primary_bucket_page)
{
uint8		flags = REGBUF_STANDARD |
			REGBUF_NO_IMAGE | REGBUF_NO_CHANGE;

XLogRegisterBuffer(0, bucket_buf, flags);
}

That registers a buffer that is pinned but not content-locked, and we
tell xloginsert.c not to copy its image into the WAL, but it does it
anyway because:

/*
 * If needs_backup is true or WAL checking is enabled for current
 * resource manager, log a full-page write for the current block.
 */
include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;

So I guess it copies the image on dodo, which has:

'PG_TEST_EXTRA' => 'ssl ldap
kerberos wal_consistency_checking libpq_encryption xid_wraparound'

Perhaps a no-image, no-change registered buffer should not be
including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
useless for consistency checking too I guess, this issue aside,
because it doesn't change anything so there is nothing to check.




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

Hello Thomas and Andres,

Thank you for looking at this!

11.05.2024 06:26, Thomas Munro wrote:

On Sat, May 11, 2024 at 3:57 AM Andres Freund  wrote:

On 2024-05-10 16:00:01 +0300, Alexander Lakhin wrote:

and discovered that XLogRecordAssemble() calculates CRC over a buffer,
that might be modified by another process.

If, with "might", you mean that it's legitimate for that buffer to be
modified, I don't think so.  The bug is that something is modifying the buffer
despite it being exclusively locked.

I.e. what we likely have here is a bug somewhere in the hash index code.

I don't have a good grip on the higher level locking protocols of
hash.c, but one microscopic thing jumps out:

 /*
  * bucket buffer was not changed, but still needs to be
  * registered to ensure that we can acquire a cleanup lock on
  * it during replay.
  */
 if (!xlrec.is_primary_bucket_page)
 {
 uint8		flags = REGBUF_STANDARD |
 			REGBUF_NO_IMAGE | REGBUF_NO_CHANGE;

 XLogRegisterBuffer(0, bucket_buf, flags);
 }

That registers a buffer that is pinned but not content-locked, and we
tell xloginsert.c not to copy its image into the WAL, but it does it
anyway because:

 /*
  * If needs_backup is true or WAL checking is enabled for current
  * resource manager, log a full-page write for the current block.
  */
 include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;

So I guess it copies the image on dodo, which has:

 'PG_TEST_EXTRA' => 'ssl ldap
kerberos wal_consistency_checking libpq_encryption xid_wraparound'

Perhaps a no-image, no-change registered buffer should not be
including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
useless for consistency checking too I guess, this issue aside,
because it doesn't change anything so there is nothing to check.


Yes, I think something is wrong here. I've reduced the reproducer to:
cat << 'EOF' | psql
CREATE TABLE hash_cleanup_heap(keycol INT);
CREATE INDEX hash_cleanup_index on hash_cleanup_heap USING HASH (keycol);

BEGIN;
INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;
ROLLBACK;
EOF

cat << 'EOF' | psql &
INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;

DROP TABLE hash_cleanup_heap;
EOF

cat << 'EOF' | psql &
SELECT pg_sleep(random() / 20);
VACUUM hash_cleanup_heap;
EOF
wait
grep 'TRAP:' server.log

("wal_consistency_checking = all" and the xloginsert patch are still required)

and with additional logging I see:
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| scan page buf: 1832
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap"
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT:  VACUUM hash_cleanup_heap;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_getbucketbuf_from_hashkey: 1822
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT:  INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|LOG: !!!hashbucketcleanup| xlrec.is_primary_bucket_page: 0, buf: 1832, bucket_buf: 1822
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|CONTEXT: while vacuuming index "hash_cleanup_index" of relation "public.hash_cleanup_heap"
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96dd|STATEMENT:  VACUUM hash_cleanup_heap;
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|LOG: !!!_hash_doinsert| _hash_relbuf(rel, 1822)
2024-05-11 03:45:23.424 UTC|law|regression|663ee9d3.1f96de|STATEMENT:  INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;

TRAP: failed Assert("memcmp(block1_ptr, block1_copy, block1_len) == 0"), File: "xloginsert.c", Line: 949, PID: 2070237

Best regards,
Alexander




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Thomas Munro
On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin  wrote:
> 11.05.2024 06:26, Thomas Munro wrote:
> > Perhaps a no-image, no-change registered buffer should not be
> > including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
> > useless for consistency checking too I guess, this issue aside,
> > because it doesn't change anything so there is nothing to check.
>
> Yes, I think something is wrong here. I've reduced the reproducer to:

Does it reproduce if you do this?

-   include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
+   include_image = needs_backup ||
+       ((info & XLR_CHECK_CONSISTENCY) != 0 &&
+        (regbuf->flags & REGBUF_NO_CHANGE) == 0);

Unfortunately the back branches don't have that new flag from 00d7fb5e
so, even if this is the right direction (not sure, I don't understand
this clean registered buffer trick) then ... but wait, why are there
no failures like this in the back branches (yet at least)?  Does
your reproducer work for 16?  I wonder if something relevant changed
recently, like f56a9def.  CC'ing Michael and Amit K for info.




Re: WAL record CRC calculated incorrectly because of underlying buffer modification

2024-05-10 Thread Alexander Lakhin

11.05.2024 07:25, Thomas Munro wrote:

On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin  wrote:

11.05.2024 06:26, Thomas Munro wrote:

Perhaps a no-image, no-change registered buffer should not be
including an image, even for XLR_CHECK_CONSISTENCY?  It's actually
useless for consistency checking too I guess, this issue aside,
because it doesn't change anything so there is nothing to check.

Yes, I think something is wrong here. I've reduced the reproducer to:

Does it reproduce if you do this?

-   include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
+   include_image = needs_backup ||
+       ((info & XLR_CHECK_CONSISTENCY) != 0 &&
+        (regbuf->flags & REGBUF_NO_CHANGE) == 0);


No, it doesn't (at least with the latter, more targeted reproducer).


Unfortunately the back branches don't have that new flag from 00d7fb5e
so, even if this is the right direction (not sure, I don't understand
this clean registered buffer trick) then ... but wait, why are there
no failures like this in the back branches (yet at least)?  Does
your reproducer work for 16?  I wonder if something relevant changed
recently, like f56a9def.  CC'ing Michael and Amit K for info.


Maybe it's hard to hit (autovacuum needs to process the index page in a
narrow time frame), but locally I could reproduce the issue even on
ac27c74de (and ac27c74de~1 too) from 2018-09-06 (I tried the last several
commits touching hash indexes, didn't dig deeper).

Best regards,
Alexander




Re: First draft of PG 17 release notes

2024-05-10 Thread Andy Fan


Hello Bruce,

> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
>   https://momjian.us/pgsql_docs/release-17.html

Thank you for working on this!

> I welcome feedback.  For some reason it was an easier job than usual.

Do you think we need to add the following 2 items?

- 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
  transform improvement.

- a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
  improvement.

Both of them can generate a better plan in some cases.

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-10 Thread David Rowley
On Sat, 11 May 2024 at 17:32, Andy Fan  wrote:
> Do you think we need to add the following 2 items?
>
> - 9f133763961e280d8ba692bcad0b061b861e9138 this is an optimizer
>   transform improvement.

I think this should be in the release notes.

Suggest:

* Allow correlated IN subqueries to be transformed into joins (Andy
Fan, Tom Lane)

> - a8a968a8212ee3ef7f22795c834b33d871fac262 this is an optimizer costing
>   improvement.
>
> Both of them can generate a better plan in some cases.

I think this should be present too.

Suggest:

* Improve optimizer's ability to use cheap startup plans when querying
partitioned tables, inheritance parents and for UNION ALL (Andy Fan,
David Rowley)

Both under "E.1.3.1.1. Optimizer"

David




Re: Weird test mixup

2024-05-10 Thread Andrey M. Borodin



> On 10 May 2024, at 06:04, Michael Paquier  wrote:
> 
> Attached is an updated patch for now

Can you please add some more comments regarding the purpose of private data?
I somewhat lost track of the discussion for a week or so, and I hoped to grasp
the idea of private_data from the resulting code, but I cannot do so from the
current patch version...

I see that you store the condition in private_data. So "private" means that
this is data specific to the extension, do I understand it right?

As long as I started anyway, I also want to ask some more stupid questions:
1. Where is the border between the responsibility of an extension and the core
part? I mean, can we define in simple words what functionality must live in
the extension?
2. If we have some concurrency issues, why can't we just protect everything
with one giant LWLock/SpinLock, serializing access from entry until exit,
instead of the current locking model?

Most probably, this was discussed somewhere, but I could not find it.

Thanks!


Best regards, Andrey Borodin.