Re: Track in pg_replication_slots the reason why slots conflict?

2024-01-02 Thread Bertrand Drouvot
Hi,

On Wed, Jan 03, 2024 at 08:53:44AM +0530, Amit Kapila wrote:
> On Wed, Jan 3, 2024 at 7:10 AM Michael Paquier  wrote:
> >
> > On Tue, Jan 02, 2024 at 02:07:58PM +, Bertrand Drouvot wrote:
> > > +   wal_level_insufficient means that the
> > > +   wal_level is insufficient on the primary
> > > +   server.
> > >
> > > I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I 
> > > think it's
> > > better to directly mention it is linked to the primary (without the need 
> > > to refer
> > > to the documentation) and that the fact that it is "insufficient" is more 
> > > or less
> > > implicit.
> > >
> > > Basically I think that with "primary_wal_level" one would need to refer 
> > > to the doc
> > > less frequently than with "wal_level_insufficient".
> >
> > I can see your point, but wal_level_insufficient speaks a bit more to
> > me because of its relationship with the GUC setting.   Something like
> > wal_level_insufficient_on_primary may speak better, but that's also
> > quite long.  I'm OK with what the patch does.
> >
> 
> Thanks, I also prefer "wal_level_insufficient". To me
> "primary_wal_level" sounds more along the lines of a GUC name than the
> conflict_reason. The other names that come to mind are
> "wal_level_lower_than_required", "wal_level_lower",
> "wal_level_lesser_than_required", "wal_level_lesser" but I feel
> "wal_level_insufficient" sounds better than these. Having said that, I
> am open to any of these or better options for this conflict_reason.
> 

Thank you both for giving your thoughts on it, I got your points and I'm OK with
"wal_level_insufficient".

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: speed up a logical replica setup

2024-01-02 Thread vignesh C
On Wed, 1 Nov 2023 at 19:28, Ashutosh Bapat wrote:
>
> At this stage the standby would have various replication objects like
> publications, subscriptions, origins inherited from the upstream
> server and possibly very much active. With failover slots, it might
> inherit replication slots. Is it intended that the new subscriber also
> acts as publisher for source's subscribers OR that the new subscriber
> should subscribe to the upstreams of the source? Some use cases like
> logical standby might require that but a multi-master multi-node setup
> may not. The behaviour should be user configurable.

How about we do it like this:
a) Start the server in binary upgrade mode (so that the existing
subscriptions will not try to connect to the publishers)
b) Disable the subscriptions
c) Drop the replication slots
d) Drop the publications
e) Restart the server in normal (non-upgrade) mode
f) Do the rest of the pg_subscriber work:
create_all_logical_replication_slots, create_subscription,
set_replication_progress, enable_subscription, etc.
This will be done by default. An option
--clean-logical-replication-info will be provided so that the DBA can
choose not to clean these objects.
I felt cleaning the replication information is better because: a)
Node-1 will replicate to Node-2 all the data that Node-1 is
subscribing to from other nodes after the pg_subscriber setup is done;
b) all the data that Node-1 is publishing need not be published again
by Node-2. There is an option to override this if the user does not
want the logical replication objects removed.
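The proposed default flow can be sketched as an ordered plan. Step names are taken from the message above; the orchestration function itself is hypothetical, just to make the two code paths explicit:

```python
def pg_subscriber_plan(clean_inherited_objects=True):
    """Ordered sketch of the proposal above; step names follow the message."""
    plan = []
    if clean_inherited_objects:  # the default; skippable via the proposed option
        plan += [
            "start server in binary upgrade mode",  # inherited subscriptions won't connect
            "disable inherited subscriptions",
            "drop inherited replication slots",
            "drop inherited publications",
            "restart server in normal (non-upgrade) mode",
        ]
    plan += [
        "create_all_logical_replication_slots",
        "create_subscription",
        "set_replication_progress",
        "enable_subscription",
    ]
    return plan
```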

Regards,
Vignesh




Re: Transaction timeout

2024-01-02 Thread Andrey M. Borodin
On 1 Jan 2024, at 19:28, Andrey M. Borodin wrote:
> 3. Check that timeout is not rescheduled by new queries (Nik's case)

The test of Nik's case was not stable enough together with COMMIT AND
CHAIN, so I've separated these cases into different permutations.
Looking through CI logs, it seems variation in sleeps and actual
timeouts easily reaches 30+ms. I'm not entirely sure we can reach 100%
stable tests without overly large timeouts.

Best regards, Andrey Borodin.

v21-0001-Introduce-transaction_timeout.patch
Description: Binary data


v21-0002-Add-better-tests-for-transaction_timeout.patch
Description: Binary data


v21-0003-Try-to-enable-transaction_timeout-before-next-co.patch
Description: Binary data


v21-0004-fix-reschedule-timeout-for-each-commmand.patch
Description: Binary data


Re: pg_upgrade and logical replication

2024-01-02 Thread Michael Paquier
On Wed, Jan 03, 2024 at 11:24:50AM +0530, Amit Kapila wrote:
> I think the next possible step here is to document how to upgrade the
> logical replication nodes as previously discussed in this thread [1].
> IIRC, there were a few issues with the steps mentioned but if we want
> to document those we can start a separate thread for it as that
> involves both publishers and subscribers.
> 
> [1] - 
> https://www.postgresql.org/message-id/CALDaNm2pe7SoOGtRkrTNsnZPnaaY%2B2iHC40HBYCSLYmyRg0wSw%40mail.gmail.com

Yep.  A second thing is whether it makes sense to have more automated
test coverage when it comes to the interferences between subscribers
and publishers with more complex node structures.
--
Michael


signature.asc
Description: PGP signature


Re: Assorted typo fixes

2024-01-02 Thread Tom Lane
Michael Paquier  writes:
> 0010 is indeed the "correct" plural form for vertex as far as I've
> known, but "vertexes" is not wrong either.  Perhaps that's worth
> changing on consistency grounds?

Yeah.  A quick grep shows that we have 16 uses of "vertices" and
only this one of "vertexes".  It's not really wrong, but +1 for
making it match the others.

> 8), must be identical.  It doesn't matter which representation
> you choose to be the canonical one, so long as two equivalent values with
> -   different formattings are always mapped to the same value with the same
> +   different formatting are always mapped to the same value with the same
> formatting.

> I am not sure about this one in 0011 though..  It also feels like this
> could be reworded completely.

I'd leave this alone, it's not wrong either.  If you want to propose
a complete rewording, do so; but that's not "misspelling" territory.

regards, tom lane




Re: pg_upgrade and logical replication

2024-01-02 Thread Amit Kapila
On Wed, Jan 3, 2024 at 6:21 AM Michael Paquier  wrote:
>
> On Tue, Jan 02, 2024 at 03:58:25PM +0530, Amit Kapila wrote:
> > On Fri, Dec 29, 2023 at 2:26 PM vignesh C  wrote:
> >> Thanks, the changes look good.
> >
> > Pushed.
>
> Yeah!  Thanks Amit and everybody involved here!  Thanks also to Julien
> for raising the thread and the problem, to start with.
>

I think the next possible step here is to document how to upgrade the
logical replication nodes as previously discussed in this thread [1].
IIRC, there were a few issues with the steps mentioned but if we want
to document those we can start a separate thread for it as that
involves both publishers and subscribers.

[1] - 
https://www.postgresql.org/message-id/CALDaNm2pe7SoOGtRkrTNsnZPnaaY%2B2iHC40HBYCSLYmyRg0wSw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Assorted typo fixes

2024-01-02 Thread Michael Paquier
On Tue, Jan 02, 2024 at 12:34:20PM -0500, Robert Haas wrote:
> 0002-0005 look OK to me, so I committed those as well.

Cool, thanks.

> With regard to 0006, we typically use indexes rather than indices as
> the plural of "index", although exceptions exist.
> 
> I haven't looked at the rest.

I had that on a local branch, then drifted away.  I have grouped
0007~0009 together (0007 was on me), and applied them on HEAD.

0010 is indeed the "correct" plural form for vertex as far as I've
known, but "vertexes" is not wrong either.  Perhaps that's worth
changing on consistency grounds?

8), must be identical.  It doesn't matter which representation
you choose to be the canonical one, so long as two equivalent values with
-   different formattings are always mapped to the same value with the same
+   different formatting are always mapped to the same value with the same
formatting.

I am not sure about this one in 0011 though..  It also feels like this
could be reworded completely.
--
Michael


signature.asc
Description: PGP signature


Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-02 Thread Nathan Bossart
I'd like to spend some more time reviewing this one, but here are a couple
of comments.

On Tue, Dec 12, 2023 at 11:44:57AM +0100, Michael Paquier wrote:
> +/*
> + * Allocate shmem space for dynamic shared hash.
> + */
> +void
> +InjectionPointShmemInit(void)
> +{
> +#ifdef USE_INJECTION_POINTS
> + HASHCTL info;
> +
> + /* key is a NULL-terminated string */
> + info.keysize = sizeof(char[INJ_NAME_MAXLEN]);
> + info.entrysize = sizeof(InjectionPointEntry);
> +	InjectionPointHash = ShmemInitHash("InjectionPoint hash",
> +									   INJECTION_POINT_HASH_INIT_SIZE,
> +									   INJECTION_POINT_HASH_MAX_SIZE,
> +									   &info,
> +									   HASH_ELEM | HASH_STRINGS);
> +#endif
> +}

Should we specify HASH_FIXED_SIZE, too?  This hash table will be in the
main shared memory segment and therefore won't be able to expand too far
beyond the declared maximum size.

> + /*
> +  * Check if the callback exists in the local cache, to avoid unnecessary
> +  * external loads.
> +  */
> + injection_callback = injection_point_cache_get(name);
> + if (injection_callback == NULL)
> + {
> + char		path[MAXPGPATH];
> +
> + /* Found, so just run the callback registered */
> + snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path,
> +  entry_by_name->library, DLSUFFIX);
> +
> + if (!file_exists(path))
> + elog(ERROR, "could not find injection library \"%s\"", 
> path);
> +
> + injection_callback = (InjectionPointCallback)
> + load_external_function(path, entry_by_name->function, 
> true, NULL);
> +
> + /* add it to the local cache when found */
> + injection_point_cache_add(name, injection_callback);
> + }

I'm wondering how important it is to cache the callbacks locally.
load_external_function() won't reload an already-loaded library, so AFAICT
this is ultimately just saving a call to dlsym().
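For illustration, the caching pattern under discussion looks roughly like this — a Python/ctypes sketch, not the actual C implementation. As noted, the library load itself is already idempotent (dlopen is a no-op for an already-loaded library), so the cache mainly skips the repeated symbol lookup:

```python
import ctypes
import ctypes.util

# name -> resolved function, mirroring injection_point_cache_get/add
_callback_cache = {}

def lookup_callback(libname, funcname):
    key = (libname, funcname)
    if key not in _callback_cache:
        # dlopen equivalent: cheap if the library is already loaded,
        # so the cache is really only saving the dlsym-style getattr below
        lib = ctypes.CDLL(ctypes.util.find_library(libname))
        _callback_cache[key] = getattr(lib, funcname)
    return _callback_cache[key]
```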

> + name is the name of the injection point, that
> + will execute the function loaded from
> + library.
> + Injection points are saved in a hash table in shared memory, and
> + last until the server is shut down.
> +

I think  is supposed to be  here.

> +++ 
> b/src/test/modules/test_injection_points/t/002_invalid_checkpoint_after_promote.pl

0003 and 0004 add tests to the test_injection_points module.  Is the idea
that we'd add any tests that required injection points here?  I think it'd
be better if we could move the tests closer to the logic they're testing,
but perhaps that is difficult because you also need to define the callback
functions somewhere.  Hm...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-02 Thread Dilip Kumar
On Tue, Jan 2, 2024 at 7:53 PM Robert Haas  wrote:
>
> On Fri, Dec 22, 2023 at 8:14 AM Andrey M. Borodin wrote:
> > Just a side node.
> > It seems like commit log is kind of antipattern of data contention. Even 
> > when we build a super-optimized SLRU. Nearby **bits** are written by 
> > different CPUs.
> > I think that banks and locks are good thing. But also we could reorganize 
> > log so that
> > status of transaction 0 is on a page 0 at bit offset 0
> > status of transaction 1 is on a page 1 at bit offset 0
> > status of transaction 2 is on a page 2 at bit offset 0
> > status of transaction 3 is on a page 3 at bit offset 0
> > status of transaction 4 is on a page 0 at bit offset 2
> > status of transaction 5 is on a page 1 at bit offset 2
> > status of transaction 6 is on a page 2 at bit offset 2
> > status of transaction 7 is on a page 3 at bit offset 2
> > etc...
>
> This is an interesting idea. A variant would be to stripe across
> cachelines within the same page rather than across pages. If we do
> stripe across pages as proposed here, we'd probably want to rethink
> the way the SLRU is extended -- page at a time wouldn't really make
> sense, but preallocating an entire file might.

Yeah, this is indeed an interesting idea.  So I think if we are
interested in working in this direction maybe this can be submitted in
a different thread, IMHO.
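The proposed striped layout is easy to state as arithmetic; a small sketch, with the bank count and bits-per-transaction taken from the example above:

```python
BANKS = 4          # pages the status bits are striped across, per the example
BITS_PER_XACT = 2  # two status bits per transaction, as in the commit log

def clog_slot(xid):
    """Map a transaction id to (page, bit offset) under the striped layout."""
    return xid % BANKS, (xid // BANKS) * BITS_PER_XACT
```

Consecutive transactions land on different pages (and hence different cachelines), which is exactly the contention-spreading property being discussed.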

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




LLVM 18

2024-01-02 Thread Thomas Munro
LLVM 16 provided a new function name[1], and LLVM 18 (not shipped yet)
has started complaining[2] about the old spelling.

Here's a patch.

[1] 
https://github.com/llvm/llvm-project/commit/1b97645e56bf321b06d1353024339958b64fd242
[2] 
https://github.com/llvm/llvm-project/commit/5ac12951b4e9bbfcc5791282d0961ec2b65575e9
From bc2a07e0012aa58af2cf97a202d181f473a4d7bc Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 3 Jan 2024 17:45:30 +1300
Subject: [PATCH] Track LLVM 18 changes.

https://github.com/llvm/llvm-project/commit/1b97645e56bf321b06d1353024339958b64fd242
https://github.com/llvm/llvm-project/commit/5ac12951b4e9bbfcc5791282d0961ec2b65575e9

diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index d92d7f3c88..17c0aa427a 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -821,7 +821,10 @@ static void
 add_module_to_inline_search_path(InlineSearchPath& searchpath, llvm::StringRef modpath)
 {
 	/* only extension in libdir are candidates for inlining for now */
-	if (!modpath.startswith("$libdir/"))
+#if LLVM_VERSION_MAJOR < 16
+#define starts_with startswith
+#endif
+	if (!modpath.starts_with("$libdir/"))
 		return;
 
 	/* if there's no match, attempt to load */
-- 
2.39.2



Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 4:45 PM Nathan Bossart  wrote:
> That seems to date back to commit 14a9101.  I can agree that the suffix is
> somewhat redundant since these are already marked as type "LWLock", but
> I'll admit I've been surprised by this before, too.  IMHO it makes this
> proposed test more important because you can't just grep for a different
> lock to find all the places you need to update.

I agree. I am pretty sure that the reason this happened in the first
place is that I grepped for the name of some other LWLock and adjusted
things for the new lock at every place where that found a hit.

> > - Check in both directions instead of just one?
> >
> > - Verify ordering?
>
> To do those things, I'd probably move the test to one of the scripts that
> generates the documentation or header file (pg_wait_events doesn't tell us
> whether a lock is predefined or what order it's listed in).  That'd cause
> failures at build time instead of during testing, which might be kind of
> nice, too.

Yeah, I think that would be better.
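A sketch of what a generation-time cross-check in both directions, plus ordering, could verify. The entries shown are illustrative samples, not the full lists from lwlocknames.txt and wait_event_names.txt:

```python
def cross_check(predefined, documented):
    """Compare predefined LWLock names against wait_event_names entries both ways."""
    # wait_event_names.txt entries drop the "Lock" suffix from LWLock names
    expected = [n.removesuffix("Lock") for n in predefined]
    missing = [n for n in expected if n not in documented]
    extra = [n for n in documented if n not in expected]
    same_order = ([n for n in documented if n in expected]
                  == [n for n in expected if n in documented])
    return missing, extra, same_order

# illustrative entries only; the real files are much longer
predefined = ["ShmemIndexLock", "OidGenLock", "XidGenLock"]
documented = ["ShmemIndex", "XidGen", "OidGen"]
```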

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Track in pg_replication_slots the reason why slots conflict?

2024-01-02 Thread Amit Kapila
On Wed, Jan 3, 2024 at 7:10 AM Michael Paquier  wrote:
>
> On Tue, Jan 02, 2024 at 02:07:58PM +, Bertrand Drouvot wrote:
> > +   wal_level_insufficient means that the
> > +   wal_level is insufficient on the primary
> > +   server.
> >
> > I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think 
> > it's
> > better to directly mention it is linked to the primary (without the need to 
> > refer
> > to the documentation) and that the fact that it is "insufficient" is more 
> > or less
> > implicit.
> >
> > Basically I think that with "primary_wal_level" one would need to refer to 
> > the doc
> > less frequently than with "wal_level_insufficient".
>
> I can see your point, but wal_level_insufficient speaks a bit more to
> me because of its relationship with the GUC setting.   Something like
> wal_level_insufficient_on_primary may speak better, but that's also
> quite long.  I'm OK with what the patch does.
>

Thanks, I also prefer "wal_level_insufficient". To me
"primary_wal_level" sounds more along the lines of a GUC name than the
conflict_reason. The other names that come to mind are
"wal_level_lower_than_required", "wal_level_lower",
"wal_level_lesser_than_required", "wal_level_lesser" but I feel
"wal_level_insufficient" sounds better than these. Having said that, I
am open to any of these or better options for this conflict_reason.

> +   as invalidated. Possible values are:
> +
> Higher-level nit: indentation seems to be one space off here.
>

Thanks, fixed in the attached patch.

-- 
With Regards,
Amit Kapila.


v6-0001-Track-conflict_reason-in-pg_replication_slots.patch
Description: Binary data


Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-02 Thread Michael Paquier
On Tue, Jan 02, 2024 at 11:31:20AM -0600, Nathan Bossart wrote:
> +# Find the location of lwlocknames.h.
> +my $include_dir = $node->config_data('--includedir');
> +my $lwlocknames_file = "$include_dir/server/storage/lwlocknames.h";

I am afraid that this is incorrect because an installation could
decide to install server-side headers in a different path than
$include/server/.  Using --includedir-server would be the correct
answer, appending "storage/lwlocknames.h" to the path retrieved from
pg_config.
--
Michael


signature.asc
Description: PGP signature


Re: Reducing output size of nodeToString

2024-01-02 Thread David Rowley
On Thu, 14 Dec 2023 at 19:21, Matthias van de Meent wrote:
>
> On Thu, 7 Dec 2023 at 13:09, David Rowley  wrote:
> > We could also easily serialize plans to binary format for copying to
> > parallel workers rather than converting them to a text-based
> > serialized format. It would also allow us to do things like serialize
> > PREPAREd plans into a nicely compact single allocation that we could
> > just pfree in a single pfree call on DEALLOCATE.
>
> I'm not sure what benefit you're referring to. If you mean "it's more
> compact than the current format" then sure; but the other points can
> already be covered by either the current nodeToString format, or by
> nodeCopy-ing the prepared plan into its own MemoryContext, which would
> allow us to do essentially the same thing.

There's significantly less memory involved in just having a plan
serialised into a single chunk of memory vs a plan stored in its own
MemoryContext.  With the serialised plan, you don't have any power of
2 rounding up wastage that aset.c does and don't need extra space for
all the MemoryChunks that would exist for every single palloc'd chunk
in the MemoryContext version.

I think it would be nice if one day in the future if a PREPAREd plan
could have multiple different plans cached. We could then select which
one to use by looking at statistics for the given parameters and
choose the plan that's most suitable for the given parameters.   Of
course, this is a whole entirely different project. I mention it just
because being able to serialise a plan would make the memory
management and overhead for such a feature much more manageable.
There'd likely need to be some eviction logic in such a feature as the
number of possible plans for some complex query is quite likely to be
much more than we'd care to cache.
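A rough back-of-the-envelope comparison of the two layouts. The per-chunk header size and node sizes below are made up for illustration, and aset.c's real behavior has more detail (e.g. dedicated blocks for large chunks), but the power-of-2 rounding effect is the point:

```python
def next_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

CHUNK_HEADER = 16  # assumed per-palloc MemoryChunk overhead, for illustration

def aset_footprint(node_sizes):
    # each node palloc'd separately: header added, then rounded up to a power of 2
    return sum(next_pow2(size + CHUNK_HEADER) for size in node_sizes)

def serialized_footprint(node_sizes):
    # one flat allocation holding every node back to back
    return sum(node_sizes)

nodes = [40, 72, 200, 513]  # made-up node sizes for a small plan tree
```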

David




Re: Track in pg_replication_slots the reason why slots conflict?

2024-01-02 Thread Michael Paquier
On Tue, Jan 02, 2024 at 02:07:58PM +, Bertrand Drouvot wrote:
> +   wal_level_insufficient means that the
> +   wal_level is insufficient on the primary
> +   server.
> 
> I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think 
> it's
> better to directly mention it is linked to the primary (without the need to 
> refer
> to the documentation) and that the fact that it is "insufficient" is more or 
> less
> implicit.
> 
> Basically I think that with "primary_wal_level" one would need to refer to 
> the doc
> less frequently than with "wal_level_insufficient".

I can see your point, but wal_level_insufficient speaks a bit more to
me because of its relationship with the GUC setting.   Something like
wal_level_insufficient_on_primary may speak better, but that's also
quite long.  I'm OK with what the patch does.

+   as invalidated. Possible values are:
+
Higher-level nit: indentation seems to be one space off here.
--
Michael


signature.asc
Description: PGP signature


Re: Remove unneeded PGDATABASE setting from TAP tests

2024-01-02 Thread Michael Paquier
On Mon, Jan 01, 2024 at 01:52:10PM +0530, Bharath Rupireddy wrote:
> Oh, yeah. We can remove that too, after all, PGDATABASE is being set
> to postgres the default database which PostgreSQL::Test::Cluster does
> set for us. I was earlier swayed by the comment atop the PGDATABASE
> setting in 011_clusterdb_all.pl. Attached is v2 patch with this
> change.

Thanks.  Applied after keeping the comment in 011_clusterdb_all.pl,
rewording it a bit.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade and logical replication

2024-01-02 Thread Michael Paquier
On Tue, Jan 02, 2024 at 03:58:25PM +0530, Amit Kapila wrote:
> On Fri, Dec 29, 2023 at 2:26 PM vignesh C  wrote:
>> Thanks, the changes look good.
> 
> Pushed.

Yeah!  Thanks Amit and everybody involved here!  Thanks also to Julien
for raising the thread and the problem, to start with.
--
Michael


signature.asc
Description: PGP signature


Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-02 Thread Jacob Burroughs
What if we created a new guc flag `GUC_PROTOCOL_EXTENSION` (or a
better name), used that for any of the GUCs options that should *only*
be set via a `ParameterSet` protocol message, and then prevent
changing those through SET/RESET/RESET ALL (but I don't see a reason
to prevent reading them through SHOW). (I would imagine most/all
`GUC_PROTOCOL_EXTENSION` GUCs would also set `GUC_NOT_IN_SAMPLE`,
`GUC_DISALLOW_IN_FILE`, and maybe `GUC_NO_SHOW_ALL`.). Looking back at
use cases in the original message of this thread, I would imagine at
least the "configurable GUC report" and "protocol compression" would
likely want to be flagged with `GUC_PROTOCOL_EXTENSION` since both
would seem like things that would require low-level client
involvement/support when they change.

I was already drafting this email before the last one in the thread
came through, but I think this proposal addresses a few things from
it:
> - Since we currently don't have any protocol parameters. How do I test
> it? Should I add a debug protocol parameter specifically for this
> purpose? Or should my tests  just always error at the moment?
`protocol_managed_params` would serve as the example/test parameter
in this patch series
> - If I create a debug protocol parameter, what designs should it
> inherit from GUCs? e.g. parsing and input validation sounds useful.
> And where should it be stored e.g. protocol_params.c?
I'm thinking using flag(s) on GUCs would get useful mechanics without
needing to implement an entirely separate param system.  It appears
there are existing flags that cover almost everything we would want
except for actually marking a GUC as a protocol extension.
> - How do you get the value of a protocol parameter status? Do we
> expect the client to keep track of what it has sent? Do we always send
> a ParameterStatus message whenever it is changed? Or should we add a
> ParamaterGet protocol message too?
I would think it would be reasonable for a client to track its own
ParameterSets if it has a reason to care about them, since presumably
it needs to do something differently after setting them anyways?
Though I could see an argument for sending `ParameterStatus` if we
want that to serve as an "ack" so a client could wait until it had
active confirmation from a server that a new parameter value was
applied when necessary.

> To clarify, the main thing that I want to do is take the value from
> ParameterStatus and somehow, without having to escape it, set that
> value for that GUC for a different session. As explained above, I now
> think that this newly proposed protocol message is a bad fit for this.
> But I still think that that is not a weird thing to want.

Now, for the possibly nutty part of my proposal, what if we added a
GUC string list named `protocol_managed_params` that had the
`GUC_PROTOCOL_EXTENSION` flag set, which would be a list of GUC names
to treat as if `GUC_PROTOCOL_EXTENSION` is set on them within the
context of the session.   If a client wanted to use `ParameterSet` to
manage a non-`GUC_PROTOCOL_EXTENSION` managed parameter, it would
first issue a `ParameterSet` to add the parameter to the
`protocol_managed_params` list, and then issue a `ParameterSet` to
actually set the parameter.  If for some reason (e.g. some pgbouncer
use cases) the client then wanted the parameter to be settable
"normally" it would issue a third `ParameterSet` to remove the
parameter from `protocol_managed_params`.  Regarding transactional
semantics, I *think* it would be reasonable to specify that changes
made through `ParameterSet` are not transactional and that
`protocol_managed_params` itself cannot be changed while a
transaction is active.  That way within a given transaction a given
parameter has *only* transactional semantics or has *only*
nontransactional semantics, which I think avoids the potential
consistency problems?  I think this would both address the pgbouncer
use case where it wants to be able to reflect a ParameterStatus
message back to the server while being agnostic to its contents while
also addressing the "SET ROLE"/"SET SESSION AUTHORIZATION" issue: a
pooler would just add `session_authorization` to the
`protocol_managed_params` list and then it would have full control
over the user by not passing along `ParameterSet` messages setting the
user from the client. (I think it would generally be reasonable to
still allow setting the role within the restricted
session_authorization role, but that would be a pooler decision not a
PG one.)
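The mechanics of the proposal can be modeled in a few lines. This is a toy sketch outside the server, not PostgreSQL code; the names follow the proposal, and the transactional restriction is omitted for brevity:

```python
class Session:
    """Toy model of the flagging scheme sketched above."""
    def __init__(self):
        self.gucs = {"search_path": '"$user", public',
                     "session_authorization": "app_user"}
        self.protocol_managed = set()  # the proposed session-level list

    def parameter_set(self, name, value):
        """Protocol-level ParameterSet message."""
        if name == "protocol_managed_params":
            self.protocol_managed = {v.strip() for v in value.split(",") if v.strip()}
        else:
            self.gucs[name] = value

    def sql_set(self, name, value):
        """SET statement issued by the (possibly untrusted) client."""
        if name in self.protocol_managed:
            raise PermissionError(f'parameter "{name}" may only be set via ParameterSet')
        self.gucs[name] = value
```

In the pooler scenario, the pooler would issue `parameter_set("protocol_managed_params", "session_authorization")` once, after which the end client can no longer change the effective user with plain SQL.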




Re: Re: UUID v7

2024-01-02 Thread Przemysław Sztoch

Dear Andrey,

1. Is it possible to add a function that returns the version of the 
generated uuid?

It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID 
that inform about the version.
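For what it's worth, the version bits are indeed directly readable from the value; e.g. with Python's standard uuid module:

```python
import uuid

u = uuid.uuid4()
assert u.version == 4          # the library already decodes the version bits
assert (u.bytes[6] >> 4) == 4  # the version lives in the high nibble of byte 6
```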


2. If there is any doubt about adding the function to the main sources 
(standard development in progress), in my opinion you can definitely add 
this function to the uuid-ossp extension.


3. Wouldn't it be worth including UUID version 6 as well?

4. Sometimes you will need to generate a uuid for historical time. There 
should be an additional function gen_uuid_v7(timestamp).
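A sketch of such a function, following the draft UUIDv7 layout (48-bit millisecond Unix timestamp, then version and variant bits). This is illustrative only, not the proposed C implementation:

```python
import os

def gen_uuid_v7(ts_ms):
    """Build a version-7 UUID for an arbitrary (historical) millisecond timestamp."""
    b = bytearray(os.urandom(16))
    b[0:6] = ts_ms.to_bytes(6, "big")  # 48-bit unix_ts_ms field
    b[6] = 0x70 | (b[6] & 0x0F)        # set the version nibble to 7
    b[8] = 0x80 | (b[8] & 0x3F)        # set the RFC 4122 variant bits
    return bytes(b)
```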


Nevertheless, the need for UUID v6/7/8 is very high and I'm glad it's
coming to PostgreSQL. It should make it into PG17.

--
Przemysław Sztoch | Mobile +48 509 99 00 66


Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Jim Nasby

On 1/2/24 1:38 PM, Robert Haas wrote:
> But to try to apply that concept
> here means that we suppose the user knows whether the default is
> INHERIT or NOINHERIT, whether the default is BYPASSRLS or NOBYPASSRLS,
> etc. And I'm just a little bit skeptical of that assumption. Perhaps
> it's just that I've spent less time doing user management than table
> administration and so I'm the only one who finds this fuzzier than
> some other kinds of SQL objects, but I'm not sure it's just that.
> Roles are pretty weird.

In my consulting experience, it's extremely rare for users to do
anything remotely sophisticated with roles (I was always happy just to
see apps weren't connecting as a superuser...).

Like you, I view \du and friends as more of a "helping hand" for
seeing the state of things, without the expectation that every tiny
nuance will always be visible, because I don't think it's practical to
do that in psql. While that behavior might surprise some users, the
good news is that once they start exploring non-default options the
behavior becomes self-evident.

Some attributes are arguably important enough to warrant their own
column. The most obvious is NOLOGIN, since those roles are generally
used for a very different purpose than LOGIN roles. SUPERUSER might be
another candidate (though I much prefer a dedicated "sudo role" to
explicit SU on roles).

I'm on the fence when it comes to SQL syntax vs. what we have now.
What we currently have is more readable, but off-hand I think the
other places we list attributes use SQL syntax. It might be worth
changing just for consistency's sake.

--
Jim Nasby, Data Architect, Austin TX





Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-02 Thread Jelte Fennema-Nio
On Tue, 2 Jan 2024 at 18:51, Robert Haas  wrote:
> > On the whole, this feels like you are trying to force some things
> > into the GUC model that should not be there.  I do perceive that
> > there are things that could be protocol-level variables, but
> > trying to say they are a kind of GUC seems like it will create
> > more problems than it solves.
>
> This is not a bad thought. If we made the things that were settable
> with this mechanism distinct from the set of things that are settable
> as GUCs, that might work out better. For example, it completely the
> objection to (3). But I'm not 100% sure, either.

It seems like the general sentiment in the thread is that protocol
parameters are different enough from GUCs that they should be handled
separately. And I think I agree. Partially because of the
transactional reasons mentioned upthread, but also because allowing to
change defaults of protocol parameters in postgresql.conf seems like a
really bad idea. If the client does not specify the protocol
parameter, then they almost certainly want whatever value the default
has been for ages. Giving a DBA the ability to change that seems like
a recipe to accidentally break many clients.

It does cause some new questions for this patchset though:
- Since we currently don't have any protocol parameters. How do I test
it? Should I add a debug protocol parameter specifically for this
purpose? Or should my tests  just always error at the moment?
- If I create a debug protocol parameter, what designs should it
inherit from GUCs? e.g. parsing and input validation sounds useful.
And where should it be stored e.g. protocol_params.c?
- How do you get the value of a protocol parameter status? Do we
expect the client to keep track of what it has sent? Do we always send
a ParameterStatus message whenever it is changed? Or should we add a
ParamaterGet protocol message too?

> > > 4. Have an easy way to use the value contained in a ParameterStatus
> > > message as the value for a GUC
> >
> > I agree that GUC_LIST_QUOTE is a mess, but "I'm too lazy to deal
> > with that" seems like a rather poor argument for instead expending
> > the amount of labor involved in a protocol change.
>
> Not sure about this one. It seems unwarranted to introduce an
> accusation of laziness when someone is finally making the effort to
> address what is IMV a pretty serious deficiency in our current
> implementation, but I have no educated opinion about what if anything
> ought to be done about GUC_LIST_QUOTE or how that relates to the
> present effort.

To clarify, the main thing that I want to do is take the value from
ParameterStatus and somehow, without having to escape it, set that
value for that GUC for a different session. As explained above, I now
think that this newly proposed protocol message is a bad fit for this.
But I still think that that is not a weird thing to want.

The current situation is that you get a value in ParameterStatus, but
before it's useful you first need to do some escaping. And to know how
to escape it requires you to maintain a hardcoded list of GUCs that
are GUC_LIST_QUOTE (which might change in the next Postgres release).
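A sketch of why replaying a ParameterStatus value requires that client-side hardcoded list today. The helper is hypothetical, and the set of GUC_LIST_QUOTE GUCs shown is deliberately incomplete — which is exactly the maintenance burden being described:

```python
# a client has to hardcode something like this, and keep it in sync per release
LIST_QUOTE_GUCS = {"search_path", "temp_tablespaces"}

def quote_literal(value):
    return "'" + value.replace("'", "''") + "'"

def replay_set(name, status_value):
    """Turn a ParameterStatus value back into a SET statement."""
    if name in LIST_QUOTE_GUCS:
        # list GUCs arrive pre-formatted as a comma-separated list of
        # possibly-quoted identifiers; re-send verbatim, unquoted
        return f"SET {name} = {status_value}"
    return f"SET {name} = {quote_literal(status_value)}"
```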
I see two options to address this:

1. Add another protocol message that sets GUCs instead of protocol
parameters (which would behave just like SET, i.e. it would be
transactional)
2. Support preparing "SET search_path = $1" and then bind a single
string to it. i.e. have this PSQL command not fail with a syntax
error:
> SET search_path = $1; \bind '"user", public';
ERROR:  42601: syntax error at or near "$1"
LINE 1: SET search_path = $1;

I'll take a look at implementing option 2, since I have a feeling
that's less likely to be controversial.
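For what it's worth, the escaping burden described above can be sketched in a few lines. This is a rough client-side model, not a real driver API: the function names are illustrative, and the point is that the client must hardcode which GUCs are GUC_LIST_QUOTE and re-quote each element itself.

```python
# The client has to maintain this set by hand and keep it in sync with
# the server version -- exactly the problem being described.
LIST_QUOTE_GUCS = {"search_path", "temp_tablespaces"}

def quote_ident(ident: str) -> str:
    """Quote one list element the way SQL identifiers are quoted:
    double any embedded double quotes and wrap in double quotes."""
    return '"' + ident.replace('"', '""') + '"'

def set_search_path(elements) -> str:
    """Build a SET statement from individual path elements.
    Without this per-element quoting, a value taken verbatim from a
    ParameterStatus message cannot be safely round-tripped."""
    return ("SET search_path = "
            + ", ".join(quote_ident(e) for e in elements) + ";")
```

The awkward part is the reverse direction: splitting a reported value back into elements requires knowing the quoting rules too, which is what options 1 and 2 above would avoid.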




Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid

2024-01-02 Thread Jim Nasby


  
  
On 1/2/24 1:58 PM, Robert Haas wrote:
> Maybe this analysis I've just given isn't quite right, but my point is
> that we should try to think hard about where in the system 32-bit XIDs
> suck and for what reason, and use that as a guide to what to change
> first.

Very much this. The biggest reason 2B XIDs are such an issue is because
it's incredibly expensive to move the window forward, which is governed
by on-disk stuff.





Re: SET ROLE x NO RESET

2024-01-02 Thread Jim Nasby


  
  
On 12/31/23 1:19 PM, Joe Conway wrote:
> On 12/30/23 17:19, Michał Kłeczek wrote:
>>> On 30 Dec 2023, at 17:16, Eric Hanson  wrote:
>>>
>>> What do you think of adding a NO RESET option to the SET ROLE
>>> command?
>>
>> What I proposed some time ago is SET ROLE … GUARDED BY
>> ‘password’, so that you could later: RESET ROLE WITH ‘password'
>
> I like that too, but see it as a separate feature. FWIW that is
> also supported by the set_user extension referenced elsewhere on
> this thread.

That means you'd need to manage that password. ISTM a better
mechanism to do this in SQL would be a method of changing roles that
returns a nonce that you'd then use to reset your role. I believe
that'd also make it practical to do this server-side in the various
PLs.
  



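The nonce idea above can be sketched as a toy model. Nothing here is actual server behavior or proposed syntax; it just illustrates the intended semantics: SET ROLE hands back a secret, and only a RESET ROLE presenting that secret restores the previous role.

```python
import secrets

class Session:
    """Toy model of nonce-guarded role switching (illustrative only)."""

    def __init__(self, role):
        self.role = role
        self._stack = []          # (previous_role, nonce) pairs

    def set_role(self, new_role):
        nonce = secrets.token_hex(16)
        self._stack.append((self.role, nonce))
        self.role = new_role
        return nonce              # caller must retain this to reset

    def reset_role(self, nonce):
        if not self._stack or self._stack[-1][1] != nonce:
            raise PermissionError("RESET ROLE refused: bad or missing nonce")
        self.role, _ = self._stack.pop()
```

Since the nonce is generated per switch rather than configured, there is no long-lived password to manage, which is the advantage over the GUARDED BY variant.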


Re: introduce dynamic shared memory registry

2024-01-02 Thread Nathan Bossart
Here's a new version of the patch set with Bharath's feedback addressed.

On Tue, Jan 02, 2024 at 11:31:14AM -0500, Robert Haas wrote:
> On Tue, Jan 2, 2024 at 11:21 AM Nathan Bossart  
> wrote:
>> > Are we expecting, for instance, a 128-bit UUID being used as a key and
>> > hence limiting it to a higher value 256 instead of just NAMEDATALEN?
>> > My thoughts were around saving a few bytes of shared memory space that
>> > can get higher when multiple modules using a DSM registry with
>> > multiple DSM segments.
>>
>> I'm not really expecting folks to use more than, say, 16 characters for the
>> key, but I intentionally set it much higher in case someone did have a
>> reason to use longer keys.  I'll lower it to 64 in the next revision unless
>> anyone else objects.
> 
> This surely doesn't matter either way. We're not expecting this hash
> table to have more than a handful of entries; the difference between
> 256, 64, and NAMEDATALEN won't even add up to kilobytes in any
realistic scenario, let alone MB or GB.

Right.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 01343422efdd4bb0d3ccc4c45aa7e964ca5d04d0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 11 Oct 2023 22:07:26 -0500
Subject: [PATCH v4 1/2] add dsm registry

---
 src/backend/storage/ipc/Makefile  |   1 +
 src/backend/storage/ipc/dsm_registry.c| 176 ++
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/ipc/meson.build   |   1 +
 src/backend/storage/lmgr/lwlock.c |   4 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 .../utils/activity/wait_event_names.txt   |   3 +
 src/include/storage/dsm_registry.h|  23 +++
 src/include/storage/lwlock.h  |   2 +
 src/test/modules/Makefile |   1 +
 src/test/modules/meson.build  |   1 +
 src/test/modules/test_dsm_registry/.gitignore |   4 +
 src/test/modules/test_dsm_registry/Makefile   |  23 +++
 .../expected/test_dsm_registry.out|  14 ++
 .../modules/test_dsm_registry/meson.build |  33 
 .../sql/test_dsm_registry.sql |   4 +
 .../test_dsm_registry--1.0.sql|  10 +
 .../test_dsm_registry/test_dsm_registry.c |  75 
 .../test_dsm_registry.control |   4 +
 src/tools/pgindent/typedefs.list  |   3 +
 20 files changed, 386 insertions(+)
 create mode 100644 src/backend/storage/ipc/dsm_registry.c
 create mode 100644 src/include/storage/dsm_registry.h
 create mode 100644 src/test/modules/test_dsm_registry/.gitignore
 create mode 100644 src/test/modules/test_dsm_registry/Makefile
 create mode 100644 src/test/modules/test_dsm_registry/expected/test_dsm_registry.out
 create mode 100644 src/test/modules/test_dsm_registry/meson.build
 create mode 100644 src/test/modules/test_dsm_registry/sql/test_dsm_registry.sql
 create mode 100644 src/test/modules/test_dsm_registry/test_dsm_registry--1.0.sql
 create mode 100644 src/test/modules/test_dsm_registry/test_dsm_registry.c
 create mode 100644 src/test/modules/test_dsm_registry/test_dsm_registry.control

diff --git a/src/backend/storage/ipc/Makefile b/src/backend/storage/ipc/Makefile
index 6d5b921038..d8a1653eb6 100644
--- a/src/backend/storage/ipc/Makefile
+++ b/src/backend/storage/ipc/Makefile
@@ -12,6 +12,7 @@ OBJS = \
 	barrier.o \
 	dsm.o \
 	dsm_impl.o \
+	dsm_registry.o \
 	ipc.o \
 	ipci.o \
 	latch.o \
diff --git a/src/backend/storage/ipc/dsm_registry.c b/src/backend/storage/ipc/dsm_registry.c
new file mode 100644
index 00..2b2be4bb99
--- /dev/null
+++ b/src/backend/storage/ipc/dsm_registry.c
@@ -0,0 +1,176 @@
+/*-
+ *
+ * dsm_registry.c
+ *
+ * Functions for interfacing with the dynamic shared memory registry.  This
+ * provides a way for libraries to use shared memory without needing to
+ * request it at startup time via a shmem_request_hook.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *	  src/backend/storage/ipc/dsm_registry.c
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "lib/dshash.h"
+#include "storage/dsm_registry.h"
+#include "storage/lwlock.h"
+#include "storage/shmem.h"
+#include "utils/memutils.h"
+
+typedef struct DSMRegistryCtxStruct
+{
+	dsa_handle	dsah;
+	dshash_table_handle dshh;
+} DSMRegistryCtxStruct;
+
+static DSMRegistryCtxStruct *DSMRegistryCtx;
+
+typedef struct DSMRegistryEntry
+{
+	char		key[64];
+	dsm_handle	handle;
+} DSMRegistryEntry;
+
+static const dshash_parameters dsh_params = {
+	offsetof(DSMRegistryEntry, handle),
+	sizeof(DSMRegistryEntry),
+	dshash_memcmp,
+	dshash_memhash,
+	LWTRANCHE_DSM_REGISTRY_HASH
+};
+
+static dsa_area *dsm_registry_dsa;
+static dshash_table 
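A rough model of the create-or-attach behavior the registry is for: the first caller for a given key creates the segment and initializes it; later callers (from any backend) get the same segment back by name, with no shmem_request_hook at startup. This is purely illustrative — the real API and locking live in dsm_registry.c, and the names below are stand-ins.

```python
import threading

class DsmRegistry:
    """Toy create-or-attach registry keyed by name (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._segments = {}       # key -> bytearray standing in for a DSM segment

    def get_named_segment(self, key, size, init_cb):
        with self._lock:
            seg = self._segments.get(key)
            if seg is None:
                seg = bytearray(size)
                init_cb(seg)      # runs exactly once, under the registry lock
                self._segments[key] = seg
            return seg
```

The fixed-size key array discussed above corresponds to `key` here; since the table is expected to hold only a handful of entries, the difference between 64 and 256 bytes per key is negligible, as Robert notes.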

Re: SET ROLE x NO RESET

2024-01-02 Thread Michał Kłeczek



> On 2 Jan 2024, at 18:36, Robert Haas  wrote:
> 
> On Sun, Dec 31, 2023 at 2:20 PM Joe Conway  wrote:
>> On 12/30/23 17:19, Michał Kłeczek wrote:
>>>> On 30 Dec 2023, at 17:16, Eric Hanson  wrote:
>>>>
>>>> What do you think of adding a NO RESET option to the SET ROLE command?
>>> 
>>> What I proposed some time ago is SET ROLE … GUARDED BY ‘password’, so
>>> that you could later: RESET ROLE WITH ‘password'
>> 
>> I like that too, but see it as a separate feature. FWIW that is also
>> supported by the set_user extension referenced elsewhere on this thread.
> 
> IMHO, the best solution here would be a protocol message to change the
> session user. The pooler could use that repeatedly on the same
> session, but refuse to propagate such messages from client
> connections.

I think that is a different use case and both are needed.

In my case I have scripts that I want to execute with limited privileges
and make sure the scripts cannot escape the sandbox via RESET ROLE.

Thanks,
Michal 



Re: add AVX2 support to simd.h

2024-01-02 Thread Nathan Bossart
On Tue, Jan 02, 2024 at 12:50:04PM -0500, Tom Lane wrote:
> The patch needs better comments (as in, more than "none whatsoever").

Yes, will do.

> Also, do you really want to structure the header so that USE_SSE2
> doesn't get defined?  In that case you are committing to provide
> an AVX2 replacement every single place that there's USE_SSE2, which
> doesn't seem like a great thing to require.  OTOH, maybe there's
> no choice given that we need a different definition for Vector8 and
> Vector32?

Yeah, the precedent is to use these abstracted types elsewhere so that any
SIMD-related improvements aren't limited to one architecture.  There are a
couple of places that do explicitly check for USE_NO_SIMD, though.  Maybe
there's an eventual use-case for using SSE2 intrinsics even when you have
AVX2 support, but for now, ensuring we have an AVX2 replacement for
everything doesn't seem particularly burdensome.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-02 Thread Nathan Bossart
On Tue, Jan 02, 2024 at 01:13:16PM -0500, Robert Haas wrote:
> On Tue, Jan 2, 2024 at 12:31 PM Nathan Bossart  
> wrote:
>> I think we're supposed to omit the "Lock" suffix in wait_event_names.txt.
> 
> Ugh, sorry. But also, why in the world?

That seems to date back to commit 14a9101.  I can agree that the suffix is
somewhat redundant since these are already marked as type "LWLock", but
I'll admit I've been surprised by this before, too.  IMHO it makes this
proposed test more important because you can't just grep for a different
lock to find all the places you need to update.

> - Check in both directions instead of just one?
> 
> - Verify ordering?

To do those things, I'd probably move the test to one of the scripts that
generates the documentation or header file (pg_wait_events doesn't tell us
whether a lock is predefined or what order it's listed in).  That'd cause
failures at build time instead of during testing, which might be kind of
nice, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
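Both of Robert's suggestions amount to a set-and-order comparison between the two files' name lists. A rough sketch of what such a cross-check would verify, accounting for the stripped "Lock" suffix (function and argument names are illustrative, not the actual test script):

```python
def cross_check(lwlock_names, wait_event_names):
    """Compare lwlocknames.txt entries (with their 'Lock' suffix) against
    wait_event_names.txt entries, in both directions, and verify that the
    entries present in both appear in the same relative order."""
    stripped = [n[:-4] if n.endswith("Lock") else n for n in lwlock_names]
    missing = [n for n in stripped if n not in wait_event_names]
    extra = [n for n in wait_event_names if n not in stripped]
    pos = {n: i for i, n in enumerate(wait_event_names)}
    present = [n for n in stripped if n in pos]
    ordered = all(pos[a] < pos[b] for a, b in zip(present, present[1:]))
    return missing, extra, ordered
```

As noted above, doing this at build time in the generation scripts would catch mistakes earlier than a regression test can.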




Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid

2024-01-02 Thread Robert Haas
On Mon, Jan 1, 2024 at 1:15 AM Maxim Orlov  wrote:
> Yeah, obviously, this patch makes the WAL bigger.  I'll try to look into the 
> idea of fxid calculation, as mentioned above.
> It might in part be a "chicken and the egg" situation.  It is very hard to 
> split the overall 64xid patch into smaller pieces
> with each part being meaningful and beneficial for current 32xid Postgres.

I agree, but I think that's what we need to try to do. I mean, we
don't want to commit patches that individually make things *worse* on
the theory that when we do that enough times the result will be
overall better. And even if the individual patches seem to be entirely
neutral, that still doesn't offer great hope that the final result
will be an improvement.

Personally, I'm not convinced that "use 64-bit XIDs everywhere" is a
desirable goal. The space consumption issue here is just the tip of
the iceberg, and really exists in many places where we store XIDs. We
don't want to make tuple headers wider any more than we want to bloat
the WAL. We don't want to make the ProcArray bigger, either, not
because of memory consumption but because of the speed of memory
access. What we really want to do is fix the practical problems that
32-bit XIDs cause. AFAIK there are basically three such problems:

1. SLRUs that are indexed by XID can wrap around, or try to, leading
to confusion and bugs in the code and maybe errors when something
fills up.

2. If a table's relfrozenxid becomes older than ~2B, we have to stop
XID assignment until the problem is cured.

3. If a running transaction's XID age exceeds ~2B, we can't start new
transactions until the problem is cured.

There's already a patch for (1), I believe. I think we can cure that
by indexing SLRUs by 64-bit integers without changing how XIDs are
stored outside of SLRUs. Various people have attempted (2), for
example with an XID-per-page approach, but nothing's been committed
yet. We can't attempt to cure (3) until (1) or (2) are fixed, and I
think that would require using 64-bit XIDs in the ProcArray, but note
that (3) is less serious: (1) and (2) constrain the size of the XID
range that can legally appear on disk, whereas (3) only constraints
the size of the XID range that can legally occur in memory. That makes
a big difference, because reducing the size of the XID range that can
legally appear on disk requires vacuuming all tables with older
relfrozenxids which in the worst case takes time proportional to the
disk utilization of the instance, whereas reducing the size of the XID
range that can legally appear in memory just requires ending
transactions (including possibly prepared transactions) and removing
replication slots, which can be done in O(1) time.

Maybe this analysis I've just given isn't quite right, but my point is
that we should try to think hard about where in the system 32-bit XIDs
suck and for what reason, and use that as a guide to what to change
first.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
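For background, the ~2B limit Robert refers to comes from the circular comparison of 32-bit XIDs. A sketch in the spirit of TransactionIdPrecedes() (not the actual C implementation): one XID precedes another when their difference, interpreted as a signed 32-bit value, is negative, so at most half the 32-bit space can legally be in use at once.

```python
UINT32 = 1 << 32

def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b under modulo-2^32
    comparison: the wrapped difference is negative as a signed int32."""
    diff = (a - b) % UINT32
    return diff >= UINT32 // 2    # top bit set == negative as int32
```

This is why shrinking the on-disk XID range requires vacuuming everything with an old relfrozenxid, while the in-memory range can be shrunk just by ending transactions.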




Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 1:46 PM Tom Lane  wrote:
> True, although if you aren't happy with the current state then what
> you actually need to construct is a SQL command to set a *different*
> state from what \du is saying.  Going from LOGIN to NOLOGIN or vice
> versa can also be non-obvious.  So you're likely to end up consulting
> "\h alter user" no matter what, if you don't have it memorized.

That could be true in some cases, but I don't think it's true in all
cases. If you're casually familiar with ALTER USER you probably
remember that many of the properties are negated by sticking NO on the
front; you're less likely to forget that than you are to forget the
name of some specific property. And certainly if you see CONNECTION
LIMIT 24 and you want to change it to CONNECTION LIMIT 42 it's going
to be pretty clear what to adjust.

> I think your argument does have relevance for the other issue about
> whether it's good to be silent about the defaults.  If \du says
> nothing at all about a particular property, that certainly isn't
> helping you to decide whether you want to change it and if so to what.
> I'm not convinced that point is dispositive, but it's something
> to consider.

I agree with 100% of what you say here.

To add to that a bit, I would probably never ask a user to give me the
output of \du to troubleshoot some issue. I would either ask them for
pg_dumpall -g output, or I'd ask them to give me the raw contents of
pg_authid. That's because I know that either of those things are going
to tell me about ALL the properties of the role, or at least all of
the properties of the role that are stored in pg_authid, without
omitting anything that some hacker thought was uninteresting. I don't
know that \du is going to do that, and I'm not going to want to read
the code to figure out which cases it thinks are uninteresting,
*especially* if it behaves differently by version.

The counterargument is that if you don't omit anything, the output
gets very long, which is a problem, because either you go wide, and
then you get wrapping, or you use multiple lines, and then if there
are 500 users the output goes on forever.

I think a key consideration here is how easy it will be for somebody
to guess the value of a property that is not mentioned. Personally,
I'd assume that if CONNECTION LIMIT isn't mentioned, it's unlimited.
But a lot of the other options are less clear. Probably NOSUPERUSER is
the default and SUPERUSER is the exception, but it's very unclear
whether LOGIN or NOLOGIN should be treated as the "normal" case,
given that the feature encompasses users and groups and non-login
roles that people access via SET ROLE and things that look like groups
but are also used as login roles.

And with some of the other options, it's just harder to remember
whether there's a default and what it is exactly than for other object
types. With something like a table column, it feels intuitive that if
you just ask for a table column, you get a "normal" table column ...
and then if you add a fillfactor or a CHECK constraint it will show up
in the \d output, and otherwise not. But to try to apply that concept
here means that we suppose the user knows whether the default is
INHERIT or NOINHERIT, whether the default is BYPASSRLS or NOBYPASSRLS,
etc. And I'm just a little bit skeptical of that assumption. Perhaps
it's just that I've spent less time doing user management than table
administration and so I'm the only one who finds this fuzzier than
some other kinds of SQL objects, but I'm not sure it's just that.
Roles are pretty weird.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
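The concern about suppressed defaults can be made concrete: a display that omits "default" properties forces the reader to memorize the default for every attribute, which, as argued above, is fuzzier for roles than for most object types. A toy renderer (illustrative; defaults follow CREATE ROLE's documented defaults for these four attributes):

```python
# The reader of a terse \du-style display implicitly needs this table.
ROLE_DEFAULTS = {
    "SUPERUSER": False,
    "LOGIN": False,
    "INHERIT": True,
    "BYPASSRLS": False,
}

def terse_description(props: dict) -> str:
    """Render only the non-default properties, SQL-keyword style.
    Anything omitted is ambiguous unless you know ROLE_DEFAULTS."""
    out = []
    for kw, default in ROLE_DEFAULTS.items():
        val = props.get(kw, default)
        if val != default:
            out.append(kw if val else "NO" + kw)
    return " ".join(out)
```

Note that `terse_description({})` renders an empty string: whether that means NOLOGIN INHERIT or something else is exactly the knowledge the display silently assumes.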




Re: Minor cleanup for search path cache

2024-01-02 Thread Tom Lane
Jeff Davis  writes:
> Looks good to me.

Thanks for reviewing, will push shortly.

regards, tom lane




Re: Minor cleanup for search path cache

2024-01-02 Thread Jeff Davis
On Mon, 2024-01-01 at 16:38 -0500, Tom Lane wrote:
> I happened to notice that there is a not-quite-theoretical crash
> hazard in spcache_init().  If we see that SPCACHE_RESET_THRESHOLD
> is exceeded and decide to reset the cache, but then nsphash_create
> fails for some reason (perhaps OOM), an error will be thrown
> leaving the SearchPathCache pointer pointing at already-freed
> memory.

Good catch, thank you. I tried to avoid OOM hazards (e.g. b282fa88df,
8efa301532), but I missed this one.

> I also observed that the code seems to have been run through
> pgindent without fixing typedefs.list, making various places
> uglier than they should be.
> 
> The attached proposed cleanup patch fixes those things and in
> passing improves (IMO anyway) some comments.  I assume it wasn't
> intentional to leave two copies of the same comment block in
> check_search_path().

Looks good to me.

Regards,
Jeff Davis





Re: Eager page freeze criteria clarification

2024-01-02 Thread Melanie Plageman
On Thu, Dec 21, 2023 at 3:58 PM Melanie Plageman
 wrote:
>
> On Wed, Dec 13, 2023 at 12:24 PM Robert Haas  wrote:
> > On Sat, Dec 9, 2023 at 5:12 AM Melanie Plageman
> >  wrote:
> > > The goal is to keep pages frozen for at least target_freeze_duration.
> > > target_freeze_duration is in seconds and pages only have a last
> > > modification LSN, so target_freeze_duration must be converted to LSNs.
> > > To accomplish this, I've added an LSNTimeline data structure, containing
> > > XLogRecPtr, TimestampTz pairs stored with decreasing precision as they
> > > age. When we need to translate the guc value to LSNs, we linearly
> > > interpolate it on this timeline. For the time being, the global
> > > LSNTimeline is in PgStat_WalStats and is only updated by vacuum. There
> > > is no reason it can't be updated with some other cadence and/or by some
> > > other process (nothing about it is inherently tied to vacuum). The
> > > cached translated value of target_freeze_duration is stored in each
> > > table's stats. This is arbitrary as it is not a table-level stat.
> > > However, it needs to be located somewhere that is accessible on
> > > update/delete. We may want to recalculate it more often than once per
> > > table vacuum, especially in case of long-running vacuums.
> >
> > This part sounds like it isn't quite baked yet. The idea of the data
> > structure seems fine, but updating it once per vacuum sounds fairly
> > unprincipled to me? Don't we want the updates to happen on a somewhat
> > regular wall clock cadence?
>
> Yes, this part was not fully baked. I actually discussed this with
> Andres at PGConf EU last week and he suggested that background writer
> update the LSNTimeline. He also suggested I propose the LSNTimeline in
> a new thread. I could add a pageinspect function returning the
> estimated time of last page modification given the page LSN (so it is
> proposed with a user).

I've rebased this over top of the revised LSNTimeline functionality
proposed separately in [1].
It is also registered for the current commitfest.

I plan to add the decay logic and benchmark it this week.

- Melanie

[1] 
https://www.postgresql.org/message-id/CAAKRu_Z7tR7D1=DR=xwqdefyk_nu_gxgw88x0htxn6ask-8...@mail.gmail.com
From 6fe9f26ebe6bd773e631bcdf0b7ea928a4ca8e27 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 27 Dec 2023 16:40:27 -0500
Subject: [PATCH v3 02/12] Add LSNTimeline for converting LSN <-> time

Add a new structure, LSNTimeline, consisting of LSNTimes -- each an LSN,
time pair. Each LSNTime can represent multiple logical LSN, time pairs,
referred to as members. LSN <-> time conversions can be done using
linear interpolation with two LSNTimes on the LSNTimeline.

This commit does not add a global instance of LSNTimeline. It adds the
structures and functions needed to maintain and access such a timeline.
---
 src/backend/utils/activity/pgstat_wal.c | 199 
 src/include/pgstat.h|  34 
 src/tools/pgindent/typedefs.list|   2 +
 3 files changed, 235 insertions(+)

diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 6a81b781357..ba40aad258a 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -17,8 +17,11 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "utils/pgstat_internal.h"
 #include "executor/instrument.h"
+#include "utils/builtins.h"
+#include "utils/timestamp.h"
 
 
 PgStat_PendingWalStats PendingWalStats = {0};
@@ -32,6 +35,12 @@ PgStat_PendingWalStats PendingWalStats = {0};
 static WalUsage prevWalUsage;
 
 
+static void lsntime_absorb(LSNTime *a, const LSNTime *b);
+void lsntime_insert(LSNTimeline *timeline, TimestampTz time, XLogRecPtr lsn);
+
+XLogRecPtr estimate_lsn_at_time(const LSNTimeline *timeline, TimestampTz time);
+TimestampTz estimate_time_at_lsn(const LSNTimeline *timeline, XLogRecPtr lsn);
+
 /*
  * Calculate how much WAL usage counters have increased and update
  * shared WAL and IO statistics.
@@ -184,3 +193,193 @@ pgstat_wal_snapshot_cb(void)
 		   sizeof(pgStatLocal.snapshot.wal));
 	LWLockRelease(&stats_shmem->lock);
 }
+
+/*
+ * Set *a to be the earlier of *a or *b.
+ */
+static void
+lsntime_absorb(LSNTime *a, const LSNTime *b)
+{
+	LSNTime		result;
+	int			new_members = a->members + b->members;
+
+	if (a->time < b->time)
+		result = *a;
+	else if (b->time < a->time)
+		result = *b;
+	else if (a->lsn < b->lsn)
+		result = *a;
+	else if (b->lsn < a->lsn)
+		result = *b;
+	else
+		result = *a;
+
+	*a = result;
+	a->members = new_members;
+}
+
+/*
+ * Insert a new LSNTime into the LSNTimeline in the first element with spare
+ * capacity.
+ */
+void
+lsntime_insert(LSNTimeline *timeline, TimestampTz time,
+			   XLogRecPtr lsn)
+{
+	LSNTime		temp;
+	LSNTime		carry = {.lsn = lsn,.time = time,.members = 1};
+
+	for (int i = 0; i < timeline->length; i++)
+	{
+		bool		full;
+		LSNTime    *cur = &timeline->data[i];
+
+		/*
+		 * 

Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Tom Lane
Robert Haas  writes:
> My thought was that such people probably need to interpret LOGIN and
> NOLOGIN into their preferred language either way, but if \du displays
> something else, then they also need to mentally construct a reverse
> mapping, from whatever string is showing up there to the corresponding
> SQL syntax. The current display has that problem even for English
> speakers -- you have to know that "Cannot login" corresponds to
> "NOLOGIN" and that "No connections" corresponds to "CONNECTION LIMIT
> 0" and so forth.

True, although if you aren't happy with the current state then what
you actually need to construct is a SQL command to set a *different*
state from what \du is saying.  Going from LOGIN to NOLOGIN or vice
versa can also be non-obvious.  So you're likely to end up consulting
"\h alter user" no matter what, if you don't have it memorized.

I think your argument does have relevance for the other issue about
whether it's good to be silent about the defaults.  If \du says
nothing at all about a particular property, that certainly isn't
helping you to decide whether you want to change it and if so to what.
I'm not convinced that point is dispositive, but it's something
to consider.

regards, tom lane




Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 1:17 PM Tom Lane  wrote:
> Mmm ... maybe.  I think those of us who are native English speakers
> may overrate the intelligibility of SQL keywords to those who aren't.
> So I'm inclined to feel that preserving translatability of the
> role property descriptions is a good thing.  But it'd be good to
> hear comments on that point from people who actually use it.

+1 for comments from people who use it.

My thought was that such people probably need to interpret LOGIN and
NOLOGIN into their preferred language either way, but if \du displays
something else, then they also need to mentally construct a reverse
mapping, from whatever string is showing up there to the corresponding
SQL syntax. The current display has that problem even for English
speakers -- you have to know that "Cannot login" corresponds to
"NOLOGIN" and that "No connections" corresponds to "CONNECTION LIMIT
0" and so forth. No matter what we put there, if it's English or
Turkish or Hindi rather than SQL, you have to try to figure out what
the displayed text corresponds to at the SQL level in order to fix
anything that isn't the way you want it to be, or to recreate the
configuration on another instance.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 1:10 PM Andrey M. Borodin  wrote:
> > On 2 Jan 2024, at 19:23, Robert Haas  wrote:
> >> And it would be even better if page for transaction statuses would be 
> >> determined by backend id somehow. Or at least cache line. Can we allocate 
> >> a range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each 
> >> backend?
> >
> > I don't understand how this could work. We need to be able to look up
> > transaction status by XID, not backend ID.
>
> > When GetNewTransactionId() is called we can reserve 256 xids in backend local 
> > memory. These values will be reused by transactions or subtransactions of this 
> > backend. Here 256 == sizeof(CacheLine).
> This would ensure that different backends touch different cache lines.
>
> But this approach would dramatically increase xid consumption speed on 
> patterns where client reconnects after several transactions. So we can keep 
> unused xids in procarray for future reuse.
>
> I doubt we can find significant performance improvement here, because false 
> cache line sharing cannot be _that_ bad.

Yeah, this seems way too complicated for what we'd potentially gain
from it. An additional problem is that the xmin horizon computation
assumes that XIDs are assigned in monotonically increasing fashion;
breaking that would be messy. And even an occasional leak of XIDs
could precipitate enough additional vacuuming to completely outweigh
any gains we could hope to achieve here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: UUID v7

2024-01-02 Thread Andrey M. Borodin
> On 2 Jan 2024, at 14:17, Andrey M. Borodin  wrote:
>
> Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than
> 1ms from now(). Maybe we should allow more tolerant values for slow test
> machines.

Indeed, CFbot complained about flaky tests. I've increased test tolerance
to 100ms. (this does not affect test time)

Best regards, Andrey Borodin.

v8-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


v8-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch
Description: Binary data


v8-0002-Buffer-random-numbers.patch
Description: Binary data
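For reference, the timestamp a UUIDv7 embeds is simply its top 48 bits, in milliseconds since the Unix epoch, per the IETF draft. The round trip that the flaky test exercises can be sketched as follows (get_uuid_v7_time here is a stand-in, not the patch's SQL function):

```python
import uuid
import datetime

def uuid_v7_time(u: uuid.UUID) -> datetime.datetime:
    """Extract the embedded millisecond timestamp from a UUIDv7:
    the unix_ts_ms field occupies the most significant 48 bits."""
    ms = u.int >> 80
    return datetime.datetime.fromtimestamp(ms / 1000,
                                           tz=datetime.timezone.utc)
```

Since the generator fills the timestamp at creation, comparing this value against now() can only drift by scheduling delay, hence the tolerance bump from 1ms to 100ms.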


Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Tom Lane
Robert Haas  writes:
> I wonder if we shouldn't try to display the roles's properties using
> SQL keywords rather than narrating. Someone can be confused by "No
> connections" but "CONNECTION LIMIT 0" is pretty hard to mistake;
> likewise "LOGIN" or "NOLOGIN" seems clear enough.

Mmm ... maybe.  I think those of us who are native English speakers
may overrate the intelligibility of SQL keywords to those who aren't.
So I'm inclined to feel that preserving translatability of the
role property descriptions is a good thing.  But it'd be good to
hear comments on that point from people who actually use it.

regards, tom lane




Re: verify predefined LWLocks have entries in wait_event_names.txt

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 12:31 PM Nathan Bossart  wrote:
> I think we're supposed to omit the "Lock" suffix in wait_event_names.txt.

Ugh, sorry. But also, why in the world?

> > It seems like it would be good if there were an automated cross-check
> > between lwlocknames.txt and wait_event_names.txt.
>
> +1.  Here's a hastily-thrown-together patch for that.  I basically copied
> 003_check_guc.pl and adjusted it for this purpose.  This test only checks
> that everything in lwlocknames.txt has a matching entry in
> wait_event_names.txt.  It doesn't check that everything in the predefined
> LWLock section of wait_event_names.txt has an entry in lwlocknames.txt.
> AFAICT that would be a little more difficult because you can't distinguish
> between the two in pg_wait_events.
>
> Even with this test, I worry that we could easily forget to add entries in
> wait_event_names.txt for the non-predefined locks, but I don't presently
> have a proposal for how to prevent that.

It certainly seems better to check what we can than to check nothing.

Suggestions:

- Check in both directions instead of just one?

- Verify ordering?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-02 Thread Andrey M. Borodin



> On 2 Jan 2024, at 19:23, Robert Haas  wrote:
> 
>> 
>> And it would be even better if page for transaction statuses would be 
>> determined by backend id somehow. Or at least cache line. Can we allocate a 
>> range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each 
>> backend?
> 
> I don't understand how this could work. We need to be able to look up
> transaction status by XID, not backend ID.

When GetNewTransactionId() is called we can reserve 256 xids in backend local 
memory. These values will be reused by transactions or subtransactions of this 
backend. Here 256 == sizeof(CacheLine).
This would ensure that different backends touch different cache lines.

But this approach would dramatically increase xid consumption speed on patterns 
where a client reconnects after several transactions. So we can keep unused xids 
in procarray for future reuse.

I doubt we can find significant performance improvement here, because false 
cache line sharing cannot be _that_ bad.


Best regards, Andrey Borodin.



Re: pg_upgrade failing for 200+ million Large Objects

2024-01-02 Thread Tom Lane
"Kumar, Sachin"  writes:
>> On 11/12/2023, 01:43, "Tom Lane"  wrote:
>> ... Maybe that
>> could be improved in future, but it seems like it'd add a
>> lot more complexity, and it wouldn't make life any better for
>> pg_upgrade (which doesn't use parallel pg_restore, and seems
>> unlikely to want to in future).

> I was not able to find email thread which details why we are not using
> parallel pg_restore for pg_upgrade.

Well, it's pretty obvious isn't it?  The parallelism is being applied
at the per-database level instead.

> IMHO most of the customer will have single large
> database, and not using parallel restore will cause slow pg_upgrade.

You've offered no justification for that opinion ...

> I am attaching a patch which enables parallel pg_restore for DATA and 
> POST-DATA part
> of dump. It will push down --jobs value to pg_restore and will restore
> database sequentially.

I don't think I trust this patch one bit.  It makes way too many
assumptions about how the --section options work, or even that they
will work at all in a binary-upgrade situation.  I've spent enough
time with that code to know that --section is pretty close to being
a fiction.  One point in particular is that this would change the
order of ACL restore relative to other steps, which almost certainly
will cause problems for somebody.

regards, tom lane




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-02 Thread Robert Haas
On Fri, Dec 29, 2023 at 1:38 PM Tom Lane  wrote:
> Jelte Fennema-Nio  writes:
> > 1. Protocol messages are much easier to inspect for connection poolers
> > than queries
>
> Unless you somehow forbid clients from setting GUCs in the old way,
> exactly how will that help a pooler?

I agree that for this to work out we need the things that you can set
this way to be able to be set in only this way. But I'm also a huge
fan of the idea -- if done correctly, it would solve the problem of an
end client sneaking SET ROLE or SET SESSION AUTHORIZATION past the
pooler, which is a huge issue that we really ought to address.

> > 2. It paves the way for GUCs that can only be set using a protocol
> > message (and not using SET).
>
> This is assuming facts not in evidence.  Why would we want such a thing?

See above.

> > 3. Being able to change GUCs while in an aborted transaction.
>
> I'm really dubious that that's sane.  How will it interact with, for
> example, changes to the same GUC done in the now-failed transaction?
> Or for that matter, changes that happen later in the current
> transaction?  It seems like you can't even think about this unless
> you deem GUC changes made this way to be non-transactional, which
> seems very wart-y and full of consistency problems.

I agree with these complaints.

> > 4. Have an easy way to use the value contained in a ParameterStatus
> > message as the value for a GUC
>
> I agree that GUC_LIST_QUOTE is a mess, but "I'm too lazy to deal
> with that" seems like a rather poor argument for instead expending
> the amount of labor involved in a protocol change.

Not sure about this one. It seems unwarranted to introduce an
accusation of laziness when someone is finally making the effort to
address what is IMV a pretty serious deficiency in our current
implementation, but I have no educated opinion about what if anything
ought to be done about GUC_LIST_QUOTE or how that relates to the
present effort.

> On the whole, this feels like you are trying to force some things
> into the GUC model that should not be there.  I do perceive that
> there are things that could be protocol-level variables, but
> trying to say they are a kind of GUC seems like it will create
> more problems than it solves.

This is not a bad thought. If we made the things that were settable
with this mechanism distinct from the set of things that are settable
as GUCs, that might work out better. For example, it completely addresses the
objection to (3). But I'm not 100% sure, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add AVX2 support to simd.h

2024-01-02 Thread Tom Lane
Nathan Bossart  writes:
> I'm tempted to propose that we move forward with this patch as-is after
> adding a buildfarm machine that compiles with -mavx2 or -march=x86-64-v3.
> There is likely still follow-up work to make these improvements more
> accessible, but I'm not sure that is a strict prerequisite here.

The patch needs better comments (as in, more than "none whatsoever").
It doesn't need to be much though, perhaps like

+#if defined(__AVX2__)
+
+/*
+ * When compiled with -mavx2 or allied options, we prefer AVX2 instructions.
+ */
+#include <immintrin.h>
+#define USE_AVX2
+typedef __m256i Vector8;
+typedef __m256i Vector32;

Also, do you really want to structure the header so that USE_SSE2
doesn't get defined?  In that case you are committing to provide
an AVX2 replacement every single place that there's USE_SSE2, which
doesn't seem like a great thing to require.  OTOH, maybe there's
no choice, given that we need a different definition for Vector8 and
Vector32?

regards, tom lane




Re: Build versionless .so for Android

2024-01-02 Thread Robert Haas
On Sun, Dec 31, 2023 at 1:24 AM Michael Paquier  wrote:
> FWIW, I have mixed feelings about patching the code to treat
> android-linux as an exception while changing nothing in the
> documentation to explain why this is required.  A custom patch may
> serve you better here, and note that you did not even touch the meson
> paths.  See libpq_c_args in src/interfaces/libpq/meson.build as one
> example.  That's just my opinion, of course, still there are no
> buildfarm members that would cover your patch.

It's reasonable to want good comments -- the one in the patch (1)
doesn't explain why this is required and (2) suggests that it is only
needed when cross-compiling which seems surprising and (3) has a typo.
But if it's true that this is needed, I tentatively think we might do
better to take the patch than force people to carry it out of tree.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SET ROLE x NO RESET

2024-01-02 Thread Robert Haas
On Sun, Dec 31, 2023 at 2:20 PM Joe Conway  wrote:
> On 12/30/23 17:19, Michał Kłeczek wrote:
> >> On 30 Dec 2023, at 17:16, Eric Hanson  wrote:
> >>
> >> What do you think of adding a NO RESET option to the SET ROLE command?
> >
> > What I proposed some time ago is SET ROLE … GUARDED BY ‘password’, so
> > that you could later: RESET ROLE WITH ‘password'
>
> I like that too, but see it as a separate feature. FWIW that is also
> supported by the set_user extension referenced elsewhere on this thread.

IMHO, the best solution here would be a protocol message to change the
session user. The pooler could use that repeatedly on the same
session, but refuse to propagate such messages from client
connections.
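As a sketch of why a protocol message helps here: a pooler that can see message types on the wire only needs to inspect one byte to refuse such a message arriving from a client, while still being free to emit it itself. The 'U' message type below is purely hypothetical, not part of the actual protocol:

```c
#include <assert.h>
#include <stdbool.h>

#define MSG_QUERY 'Q'               /* existing simple-query message */
#define MSG_SET_SESSION_USER 'U'    /* hypothetical new message type */

/*
 * Decide whether a frontend message may be forwarded to the server.
 * The pooler itself may send MSG_SET_SESSION_USER on the session, but it
 * refuses to propagate one that originates from a client connection.
 */
static bool
pooler_may_forward(char msgtype)
{
    return msgtype != MSG_SET_SESSION_USER;
}
```

Contrast this with SET ROLE hidden inside query text, which would require the pooler to parse SQL to catch.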

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Confine vacuum skip logic to lazy_scan_skip

2024-01-02 Thread Melanie Plageman
On Sun, Dec 31, 2023 at 1:28 PM Melanie Plageman
 wrote:
>
> There are a few comments that still need to be updated. I also noticed I
> needed to reorder and combine a couple of the commits. I wanted to
> register this for the january commitfest, so I didn't quite have time
> for the finishing touches.

I've updated this patch set to remove a commit that didn't make sense
on its own and do various other cleanup.

- Melanie
From 335faad5948b2bec3b83c2db809bb9161d373dcb Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Sat, 30 Dec 2023 16:59:27 -0500
Subject: [PATCH v2 4/6] Confine vacuum skip logic to lazy_scan_skip

In preparation for vacuum to use the streaming read interface (and eventually
AIO), refactor vacuum's logic for skipping blocks such that it is entirely
confined to lazy_scan_skip(). This turns lazy_scan_skip() and the VacSkipState
it uses into an iterator which yields blocks to lazy_scan_heap(). Such a
structure is conducive to an async interface.

By always calling lazy_scan_skip() -- instead of only when we have reached the
next unskippable block -- we no longer need the skipping_current_range variable.
lazy_scan_heap() no longer needs to manage the skipped range -- checking if we
reached the end in order to then call lazy_scan_skip(). And lazy_scan_skip()
can derive the visibility status of a block from whether or not we are in a
skippable range -- that is, whether or not the next_block is equal to the next
unskippable block.
---
 src/backend/access/heap/vacuumlazy.c | 233 ++-
 1 file changed, 120 insertions(+), 113 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e3827a5e4d3..42da4ac64f8 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -250,14 +250,13 @@ typedef struct VacSkipState
 	Buffer		vmbuffer;
 	/* Next unskippable block's visibility status */
 	bool		next_unskippable_allvis;
-	/* Whether or not skippable blocks should be skipped */
-	bool		skipping_current_range;
 } VacSkipState;
 
 /* non-export function prototypes */
 static void lazy_scan_heap(LVRelState *vacrel);
-static void lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
-		   BlockNumber next_block);
+static BlockNumber lazy_scan_skip(LVRelState *vacrel, VacSkipState *vacskip,
+  BlockNumber blkno,
+  bool *all_visible_according_to_vm);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
    BlockNumber blkno, Page page,
    bool sharelock, Buffer vmbuffer);
@@ -838,9 +837,15 @@ static void
 lazy_scan_heap(LVRelState *vacrel)
 {
 	BlockNumber rel_pages = vacrel->rel_pages,
-blkno,
 next_fsm_block_to_vacuum = 0;
-	VacSkipState vacskip = {.vmbuffer = InvalidBuffer};
+	bool		all_visible_according_to_vm;
+
+	/* relies on InvalidBlockNumber overflowing to 0 */
+	BlockNumber blkno = InvalidBlockNumber;
+	VacSkipState vacskip = {
+		.next_unskippable_block = InvalidBlockNumber,
+		.vmbuffer = InvalidBuffer
+	};
 	VacDeadItems *dead_items = vacrel->dead_items;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
@@ -855,37 +860,17 @@ lazy_scan_heap(LVRelState *vacrel)
 	initprog_val[2] = dead_items->max_items;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/* Set up an initial range of skippable blocks using the visibility map */
-	lazy_scan_skip(vacrel, &vacskip, 0);
-	for (blkno = 0; blkno < rel_pages; blkno++)
+	while (true)
 	{
 		Buffer		buf;
 		Page		page;
-		bool		all_visible_according_to_vm;
 		LVPagePruneState prunestate;
 
-		if (blkno == vacskip.next_unskippable_block)
-		{
-			/*
-			 * Can't skip this page safely.  Must scan the page.  But
-			 * determine the next skippable range after the page first.
-			 */
-			all_visible_according_to_vm = vacskip.next_unskippable_allvis;
-			lazy_scan_skip(vacrel, &vacskip, blkno + 1);
-
-			Assert(vacskip.next_unskippable_block >= blkno + 1);
-		}
-		else
-		{
-			/* Last page always scanned (may need to set nonempty_pages) */
-			Assert(blkno < rel_pages - 1);
-
-			if (vacskip.skipping_current_range)
-continue;
+		blkno = lazy_scan_skip(vacrel, &vacskip, blkno + 1,
+			   &all_visible_according_to_vm);
 
-			/* Current range is too small to skip -- just scan the page */
-			all_visible_according_to_vm = true;
-		}
+		if (blkno == InvalidBlockNumber)
+			break;
 
 		vacrel->scanned_pages++;
 
@@ -1287,20 +1272,13 @@ lazy_scan_heap(LVRelState *vacrel)
 }
 
 /*
- *	lazy_scan_skip() -- set up range of skippable blocks using visibility map.
- *
- * lazy_scan_heap() calls here every time it needs to set up a new range of
- * blocks to skip via the visibility map.  Caller passes next_block, the next
- * block in line. The parameters of the skipped range are recorded in vacskip.
- * vacrel is an in/out parameter here; vacuum options and information about the
- * relation are read and vacrel->skippedallvis is set to ensure we don't
- * advance relfrozenxid when we have 

Re: Assorted typo fixes

2024-01-02 Thread Robert Haas
On Mon, Jan 1, 2024 at 6:05 PM Dagfinn Ilmari Mannsåker
 wrote:
> Thanks. I've fixed the commit message (and elaborated it a bit more why
> I think it's a valid and safe fix).

Regarding 0001:

- AIUI, check_decls.m4 is copied from an upstream project, so I don't
think we should tinker with it.
- I'm not convinced by encrypter->encryptor
- I'm not convinced by multidimension-aware->multidimensional-aware
- I'm not convinced by cachable->cacheable
- You corrected restorting to restarting, but I'm wondering if Andres
intended restoring?

Committed the rest of 0001.

0002-0005 look OK to me, so I committed those as well.

With regard to 0006, we typically use indexes rather than indices as
the plural of "index", although exceptions exist.

I haven't looked at the rest.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade failing for 200+ million Large Objects

2024-01-02 Thread Kumar, Sachin
> On 11/12/2023, 01:43, "Tom Lane" wrote:

> I had initially supposed that in a parallel restore we could
> have child workers also commit after every N TOC items, but was
> soon disabused of that idea. After a worker processes a TOC
> item, any dependent items (such as index builds) might get
> dispatched to some other worker, which had better be able to
> see the results of the first worker's step. So at least in
> this implementation, we disable the multi-command-per-COMMIT
> behavior during the parallel part of the restore. Maybe that
> could be improved in future, but it seems like it'd add a
> lot more complexity, and it wouldn't make life any better for
> pg_upgrade (which doesn't use parallel pg_restore, and seems
> unlikely to want to in future).

I was not able to find an email thread detailing why we are not using
parallel pg_restore for pg_upgrade. IMHO most customers will have a single
large database, and not using parallel restore will make pg_upgrade slow.

I am attaching a patch which enables parallel pg_restore for the DATA and
POST-DATA parts of the dump. It will push down the --jobs value to pg_restore
and will restore databases sequentially.

Benchmarks

{5 million LOs 1 large DB}
Patched {v9}
time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub --jobs=20
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
17.51s user 65.80s system 35% cpu 3:56.64 total


time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
17.51s user 65.85s system 34% cpu 3:58.39 total


HEAD
time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r --jobs=20
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
53.95s user 82.44s system 41% cpu 5:25.23 total

time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
54.94s user 81.26s system 41% cpu 5:24.86 total



Fix with --jobs propagation to pg_restore {on top of v9}
time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r --jobs=20
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
29.12s user 69.85s system 275% cpu 35.930 total 


Although parallel restore does show a small regression in the ideal case of
pg_upgrade --jobs.


Multiple DBs {4 DBs each having 2 million LOs}

Fix with --jobs scheduling
time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r --jobs=4
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
31.80s user 109.52s system 120% cpu 1:57.35 total


Patched {v9}
time pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir 
~/upgrade/data/pub --new-datadir ~/data/sub -r --jobs=4
pg_upgrade --old-bindir ~/15/bin --new-bindir ~/install/bin --old-datadir  
30.88s user 110.05s system 135% cpu 1:43.97 total


Regards
Sachin



v9-005-parallel_pg_restore.patch


verify predefined LWLocks have entries in wait_event_names.txt

2024-01-02 Thread Nathan Bossart
(new thread)

On Tue, Jan 02, 2024 at 10:34:11AM -0500, Robert Haas wrote:
> On Wed, Dec 27, 2023 at 10:36 AM Nathan Bossart
>  wrote:
>> Thanks!  I also noticed that WALSummarizerLock probably needs a mention in
>> wait_event_names.txt.
> 
> Fixed.

I think we're supposed to omit the "Lock" suffix in wait_event_names.txt.

> It seems like it would be good if there were an automated cross-check
> between lwlocknames.txt and wait_event_names.txt.

+1.  Here's a hastily-thrown-together patch for that.  I basically copied
003_check_guc.pl and adjusted it for this purpose.  This test only checks
that everything in lwlocknames.txt has a matching entry in
wait_event_names.txt.  It doesn't check that everything in the predefined
LWLock section of wait_event_names.txt has an entry in lwlocknames.txt.
AFAICT that would be a little more difficult because you can't distinguish
between the two in pg_wait_events.

Even with this test, I worry that we could easily forget to add entries in
wait_event_names.txt for the non-predefined locks, but I don't presently
have a proposal for how to prevent that.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 5031b55d8e1095ca919fe9a088ec0539b28984fc Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 2 Jan 2024 11:19:20 -0600
Subject: [PATCH v1 1/1] verify wait events for all lwlocks

---
 .../utils/activity/wait_event_names.txt   |  2 +-
 src/test/modules/test_misc/meson.build|  1 +
 .../modules/test_misc/t/005_check_lwlock.pl   | 51 +++
 3 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/test_misc/t/005_check_lwlock.pl

diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index d876f8a667..f61ec3e59d 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -324,7 +324,7 @@ XactTruncation	"Waiting to execute pg_xact_status or update
 WrapLimitsVacuum	"Waiting to update limits on transaction id and multixact consumption."
 NotifyQueueTail	"Waiting to update limit on NOTIFY message storage."
 WaitEventExtension	"Waiting to read or update custom wait events information for extensions."
-WALSummarizerLock	"Waiting to read or update WAL summarization state."
+WALSummarizer	"Waiting to read or update WAL summarization state."
 
 XactBuffer	"Waiting for I/O on a transaction status SLRU buffer."
 CommitTsBuffer	"Waiting for I/O on a commit timestamp SLRU buffer."
diff --git a/src/test/modules/test_misc/meson.build b/src/test/modules/test_misc/meson.build
index 911084ac0f..13bc6cb423 100644
--- a/src/test/modules/test_misc/meson.build
+++ b/src/test/modules/test_misc/meson.build
@@ -10,6 +10,7 @@ tests += {
   't/002_tablespace.pl',
   't/003_check_guc.pl',
   't/004_io_direct.pl',
+  't/005_check_lwlock.pl',
 ],
   },
 }
diff --git a/src/test/modules/test_misc/t/005_check_lwlock.pl b/src/test/modules/test_misc/t/005_check_lwlock.pl
new file mode 100644
index 00..aff83e879e
--- /dev/null
+++ b/src/test/modules/test_misc/t/005_check_lwlock.pl
@@ -0,0 +1,51 @@
+# Tests to cross-check the consistency of pg_wait_events with lwlocknames.h.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+# Grab the names of all the LWLocks from pg_wait_events.
+my $all_lwlocks = $node->safe_psql(
+	'postgres',
+	"SELECT name FROM pg_wait_events WHERE type = 'LWLock'");
+my @all_lwlocks_array = split("\n", $all_lwlocks);
+
+# Find the location of lwlocknames.h.
+my $include_dir = $node->config_data('--includedir');
+my $lwlocknames_file = "$include_dir/server/storage/lwlocknames.h";
+
+# Read the list of locks from lwlocknames.h.
+my $num_tests = 0;
+my @lwlocks_found;
+open(my $contents, '<', $lwlocknames_file)
+  || die "Could not open $lwlocknames_file: $!";
+while (my $line = <$contents>)
+{
+	if ($line =~ m/^#define ([a-zA-Z]+)Lock .*/)
+	{
+		push @lwlocks_found, $1;
+	}
+}
+close $contents;
+
+# Cross-check that all the locks found in lwlocknames.h have matching entries
+# in pg_wait_events.
+my %all_lwlocks_hash = map { $_ => 1 } @all_lwlocks_array;
+my @missing_from_wait_events = grep(!$all_lwlocks_hash{$_}, @lwlocks_found);
+is(scalar(@missing_from_wait_events),
+	0, "no LWLocks missing from pg_wait_events");
+
+foreach my $lwlock (@missing_from_wait_events)
+{
+	print(
+		"found $lwlock in lwlocknames.h, missing from wait_event_names.txt\n"
+	);
+}
+
+done_testing();
-- 
2.25.1



Re: The presence of a NULL "defaclacl" value in pg_default_acl prevents the dropping of a role.

2024-01-02 Thread Tom Lane
"=?UTF-8?B?5p2o5Lyv5a6HKOmVv+Wggik=?="  writes:
> postgres=# create user adminuser;
> CREATE ROLE
> postgres=# create user normaluser;
> CREATE ROLE
> postgres=# alter default privileges for role adminuser grant all on tables to 
> normaluser;
> ALTER DEFAULT PRIVILEGES
> postgres=# alter default privileges for role adminuser revoke all ON tables 
> from adminuser;
> ALTER DEFAULT PRIVILEGES
> postgres=# alter default privileges for role adminuser revoke all ON tables 
> from normaluser;
> ALTER DEFAULT PRIVILEGES
> postgres=# select * from pg_default_acl where pg_get_userbyid(defaclrole) = 
> 'adminuser';
>   oid  | defaclrole | defaclnamespace | defaclobjtype | defaclacl
> ---++-+---+---
>  16396 |  16394 |   0 | r | {}
> (1 row)
> postgres=# drop user adminuser ;
> ERROR:  role "adminuser" cannot be dropped because some objects depend on it
> DETAIL:  owner of default privileges on new relations belonging to role 
> adminuser

This looks perfectly normal to me: the privileges for 'adminuser'
itself are not at the default state.  If you then do

regression=# alter default privileges for role adminuser grant all on tables to 
adminuser ;
ALTER DEFAULT PRIVILEGES

then things are back to normal, and the pg_default_acl entry goes away:

regression=# select * from pg_default_acl;
 oid | defaclrole | defaclnamespace | defaclobjtype | defaclacl 
-++-+---+---
(0 rows)

and you can drop the user:

regression=# drop user adminuser ;
DROP ROLE

You could argue that there's no need to be picky about an entry that
only controls privileges for the user-to-be-dropped, but it is working
as designed and documented.

I fear your proposed patch is likely to break more things than it fixes.
In particular it looks like it would forget the existence of the
user's self-revocation altogether, even before the drop of the user.

regards, tom lane




Re: introduce dynamic shared memory registry

2024-01-02 Thread Robert Haas
On Tue, Jan 2, 2024 at 11:21 AM Nathan Bossart  wrote:
> > Are we expecting, for instance, a 128-bit UUID being used as a key and
> > hence limiting it to a higher value 256 instead of just NAMEDATALEN?
> > My thoughts were around saving a few bytes of shared memory space that
> > can get higher when multiple modules using a DSM registry with
> > multiple DSM segments.
>
> I'm not really expecting folks to use more than, say, 16 characters for the
> key, but I intentionally set it much higher in case someone did have a
> reason to use longer keys.  I'll lower it to 64 in the next revision unless
> anyone else objects.

This surely doesn't matter either way. We're not expecting this hash
table to have more than a handful of entries; the difference between
256, 64, and NAMEDATALEN won't even add up to kilobytes in any
realistic scenario, let alone MB or GB.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: introduce dynamic shared memory registry

2024-01-02 Thread Nathan Bossart
On Fri, Dec 29, 2023 at 08:53:54PM +0530, Bharath Rupireddy wrote:
> With the use of dsm registry for pg_prewarm, do we need this
> test_dsm_registry module at all? Because 0002 patch pretty much
> demonstrates how to use the DSM registry. With this comment and my
> earlier comment on incorporating the use of dsm registry in
> worker_spi, I'm trying to make a point to reduce the code for this
> feature. However, if others have different opinions, I'm okay with
> having a separate test module.

I don't have a strong opinion here, but I lean towards still having a
dedicated test suite, if for no other reason that to guarantee some
coverage even if the other in-tree uses disappear.

> I've tried with a shared memory structure size of 10GB on my
> development machine and I have seen an intermittent crash (I haven't
> got a chance to dive deep into it). I think the shared memory that a
> named-DSM segment can get via this DSM registry feature depends on the
> total shared memory area that a typical DSM client can allocate today.
> I'm not sure it's worth it to limit the shared memory for this feature
> given that we don't limit the shared memory via startup hook.

I would be interested to see more details about the crashes you are seeing.
Is this unique to the registry or an existing problem with DSM segments?

> Are we expecting, for instance, a 128-bit UUID being used as a key and
> hence limiting it to a higher value 256 instead of just NAMEDATALEN?
> My thoughts were around saving a few bytes of shared memory space that
> can get higher when multiple modules using a DSM registry with
> multiple DSM segments.

I'm not really expecting folks to use more than, say, 16 characters for the
key, but I intentionally set it much higher in case someone did have a
reason to use longer keys.  I'll lower it to 64 in the next revision unless
anyone else objects.

> 2. Do you see any immediate uses of dsm_registry_find()? Especially
> given that dsm_registry_init_or_attach() does the necessary work
> (returns the DSM address if DSM already exists for a given key) for a
> postgres process. If there is no immediate use (the argument to remove
> this function goes similar to detach_cb above), I'm sure we can
> introduce it when there's one.

See [0].  FWIW I tend to agree that we could leave this one out for now.

> 3. I think we don't need miscadmin.h inclusion in autoprewarm.c, do we?

I'll take a look at this one.

[0] https://postgr.es/m/ZYIu_JL-usgd3E1q%40paquier.xyz

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: add AVX2 support to simd.h

2024-01-02 Thread Nathan Bossart
On Mon, Jan 01, 2024 at 07:12:26PM +0700, John Naylor wrote:
> On Thu, Nov 30, 2023 at 12:15 AM Nathan Bossart
>  wrote:
>> I don't intend for this patch to be
>> seriously considered until we have better support for detecting/compiling
>> AVX2 instructions and a buildfarm machine that uses them.
> 
> That's completely understandable, yet I'm confused why there is a
> commitfest entry for it marked "needs review".

Perhaps I was too optimistic about adding support for newer instructions...

I'm tempted to propose that we move forward with this patch as-is after
adding a buildfarm machine that compiles with -mavx2 or -march=x86-64-v3.
There is likely still follow-up work to make these improvements more
accessible, but I'm not sure that is a strict prerequisite here.

(In case it isn't clear, I'm volunteering to set up such a buildfarm
machine.)

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: trying again to get incremental backup

2024-01-02 Thread Robert Haas
On Wed, Dec 27, 2023 at 10:36 AM Nathan Bossart
 wrote:
> On Wed, Dec 27, 2023 at 09:11:02AM -0500, Robert Haas wrote:
> > Thanks. I don't think there's a real bug, but I pushed a fix, same as
> > what you had.
>
> Thanks!  I also noticed that WALSummarizerLock probably needs a mention in
> wait_event_names.txt.

Fixed.

It seems like it would be good if there were an automated cross-check
between lwlocknames.txt and wait_event_names.txt.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WIP Incremental JSON Parser

2024-01-02 Thread Robert Haas
On Tue, Dec 26, 2023 at 11:49 AM Andrew Dunstan  wrote:
> Quite a long time ago Robert asked me about the possibility of an
> incremental JSON parser. I wrote one, and I've tweaked it a bit, but the
> performance is significantly worse that that of the current Recursive
> Descent parser. Nevertheless, I'm attaching my current WIP state for it,
> and I'll add it to the next CF to keep the conversation going.

Thanks for doing this. I think it's useful even if it's slower than
the current parser, although that probably necessitates keeping both,
which isn't great, but I don't see a better alternative.

> One possible use would be in parsing large manifest files for
> incremental backup. However, it struck me a few days ago that this might
> not work all that well. The current parser and the new parser both
> palloc() space for each field name and scalar token in the JSON (unless
> they aren't used, which is normally not the case), and they don't free
> it, so that particularly if done in frontend code this amounts to a
> possible memory leak, unless the semantic routines do the freeing
> themselves. So while we can save some memory by not having to slurp in
> the whole JSON in one hit, we aren't saving any of that other allocation
> of memory, which amounts to almost as much space as the raw JSON.

It seems like a pretty significant savings no matter what. Suppose the
backup_manifest file is 2GB, and instead of creating a 2GB buffer, you
create a 1MB buffer and feed the data to the parser in 1MB chunks.
Well, that saves 2GB less 1MB, full stop. Now if we address the issue
you raise here in some way, we can potentially save even more memory,
which is great, but even if we don't, we still saved a bunch of memory
that could not have been saved in any other way.
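The memory argument above can be sketched in a few lines: feed the file to the parser in fixed-size chunks rather than slurping it whole. Here parse_chunk() is only a stand-in for an incremental parser's entry point, not its real API:

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE (1024 * 1024)    /* 1MB buffer instead of whole-file */

typedef struct ParseState
{
    size_t bytes_seen;
} ParseState;

/* A real incremental parser would consume tokens here. */
static void
parse_chunk(ParseState *ps, const char *buf, size_t len, int is_last)
{
    (void) buf;
    (void) is_last;
    ps->bytes_seen += len;
}

/* Stream the file through the parser one chunk at a time. */
static int
parse_file(ParseState *ps, FILE *fp)
{
    char        buf[CHUNK_SIZE];
    size_t      n;

    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        parse_chunk(ps, buf, n, feof(fp));
    return ferror(fp) ? -1 : 0;
}
```

Peak buffer memory is bounded by CHUNK_SIZE regardless of file size, which is the "2GB less 1MB" savings described above.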

As far as addressing that other issue, we could address the issue
either by having the semantic routines free the memory if they don't
need it, or alternatively by having the parser itself free the memory
after invoking any callbacks to which it might be passed. The latter
approach feels more conceptually pure, but the former might be the
more practical approach. I think what really matters here is that we
document who must or may do which things. When a callback gets passed
a pointer, we can document either that (1) it's a palloc'd chunk that
the callback can free if they want or (2) that it's a palloc'd chunk
that the caller must not free or (3) that it's not a palloc'd chunk.
We can further document the memory context in which the chunk will be
allocated, if applicable, and when/if the parser will free it.
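Convention (2) could look roughly like this in a frontend context (malloc stands in for palloc, and all names are illustrative, not the parser's actual API):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef void (*ofield_action) (void *semstate, const char *fname);

/*
 * The parser allocates the token, invokes the callback, then frees the
 * chunk itself, so the callback must copy anything it wants to keep.
 */
static void
invoke_ofield_action(ofield_action action, void *semstate, const char *token)
{
    size_t      len = strlen(token) + 1;
    char       *fname = malloc(len);

    memcpy(fname, token, len);
    action(semstate, fname);    /* callback may inspect, must not free */
    free(fname);                /* parser reclaims the memory itself */
}

/* Example callback: it copies the name because it cannot retain the pointer. */
static char last_name[64];
static int  calls;

static void
count_field(void *semstate, const char *fname)
{
    (void) semstate;
    calls++;
    snprintf(last_name, sizeof(last_name), "%s", fname);
}
```

With this convention the memory documentation is simple: the pointer is valid only for the duration of the callback.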

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2024-01-02 Thread Daniel Verite
  Hi,

PFA a rebased version.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
From cd0fe1d517a0e31e031fbbea1e603a715c77ea97 Mon Sep 17 00:00:00 2001
From: Daniel Vérité 
Date: Tue, 2 Jan 2024 14:15:48 +0100
Subject: [PATCH v5 1/2] Implement retrieval of results in chunks with libpq.

This mode is similar to the single-row mode except that chunks
of results contain up to N rows instead of a single row.
It is meant to reduce the overhead of the row-by-row allocations
for large result sets.
The mode is selected with PQsetChunkedRowsMode(int maxRows) and results
have the new status code PGRES_TUPLES_CHUNK.
---
 doc/src/sgml/libpq.sgml   |  96 ++
 .../libpqwalreceiver/libpqwalreceiver.c   |   1 +
 src/bin/pg_amcheck/pg_amcheck.c   |   1 +
 src/interfaces/libpq/exports.txt  |   1 +
 src/interfaces/libpq/fe-exec.c| 117 +++---
 src/interfaces/libpq/libpq-fe.h   |   4 +-
 src/interfaces/libpq/libpq-int.h  |   7 +-
 7 files changed, 184 insertions(+), 43 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index ed88ac001a..8007bf67d8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3537,7 +3537,20 @@ ExecStatusType PQresultStatus(const PGresult *res);
The PGresult contains a single result tuple
 from the current command.  This status occurs only when
 single-row mode has been selected for the query
-(see ).
+(see ).
+   
+  
+ 
+
+ 
+  PGRES_TUPLES_CHUNK
+  
+   
+The PGresult contains several tuples
+from the current command. The count of tuples cannot exceed
+the maximum passed to .
+This status occurs only when the chunked mode has been selected
+for the query (see ).

   
  
@@ -5187,8 +5200,8 @@ PGresult *PQgetResult(PGconn *conn);
   
Another frequently-desired feature that can be obtained with
 and 
-   is retrieving large query results a row at a time.  This is discussed
-   in .
+   is retrieving large query results a limited number of rows at a time.  This is discussed
+   in .
   
 
   
@@ -5551,12 +5564,13 @@ int PQflush(PGconn *conn);
 
 
 
- To enter single-row mode, call PQsetSingleRowMode
- before retrieving results with PQgetResult.
- This mode selection is effective only for the query currently
- being processed. For more information on the use of
- PQsetSingleRowMode,
- refer to .
+ To enter single-row or chunked modes, call
+ respectively PQsetSingleRowMode
+ or PQsetChunkedRowsMode before retrieving results
+ with PQgetResult.  This mode selection is effective
+ only for the query currently being processed. For more information on the
+ use of these functions refer
+ to .
 
 
 
@@ -5895,10 +5909,10 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
  
 
- 
-  Retrieving Query Results Row-by-Row
+ 
+  Retrieving Query Results by chunks
 
-  
+  
libpq
single-row mode
   
@@ -5909,13 +5923,15 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
PGresult.  This can be unworkable for commands
that return a large number of rows.  For such cases, applications can use
 and  
in
-   single-row mode.  In this mode, the result row(s) are
-   returned to the application one at a time, as they are received from the
-   server.
+   single-row mode or chunked 
mode.
+   In these modes, the result row(s) are returned to the application one at a
+   time for the single-row mode and by chunks for the chunked mode, as they
+   are received from the server.
   
 
   
-   To enter single-row mode, call PQsetSingleRowMode
+   To enter these modes, call PQsetSingleRowMode
+   or PQsetChunkedRowsMode
immediately after a successful call of 
(or a sibling function).  This mode selection is effective only for the
currently executing query.  Then call 
@@ -5923,7 +5939,8 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
linkend="libpq-async"/>.  If the query returns any rows, they are returned
as individual PGresult objects, which look like
normal query results except for having status code
-   PGRES_SINGLE_TUPLE instead of
+   PGRES_SINGLE_TUPLE for the single-row mode and
+   PGRES_TUPLES_CHUNK for the chunked mode, instead of
PGRES_TUPLES_OK.  After the last row, or immediately if
the query returns zero rows, a zero-row object with status
PGRES_TUPLES_OK is returned; this is the signal that no
@@ -5936,9 +5953,9 @@ UPDATE mytable SET x = x + 1 WHERE id = 42;
   
 
   
-   When using pipeline mode, single-row mode needs to be activated for each
-   query in the pipeline before retrieving results for that query
-   with PQgetResult.
+   When using pipeline mode, the single-row or chunked mode need to be
+   activated for each query 

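For illustration, a client loop using the proposed API might look like the
following. This is only a sketch against the names introduced by this patch
(PQsetChunkedRowsMode, PGRES_TUPLES_CHUNK); the exact signature of
PQsetChunkedRowsMode is an assumption, and exit_nicely/process_row/report_error
are hypothetical application helpers, so it is not compilable against any
released libpq:

```
/* Sketch: retrieve a large result set in chunks of up to 1000 rows.
 * Assumes the proposed PQsetChunkedRowsMode(PGconn *, int) and the
 * PGRES_TUPLES_CHUNK status code from this patch. */
if (!PQsendQuery(conn, "SELECT * FROM big_table"))
    exit_nicely(conn);          /* hypothetical error-exit helper */
if (!PQsetChunkedRowsMode(conn, 1000))  /* must follow PQsendQuery */
    exit_nicely(conn);

for (;;)
{
    PGresult   *res = PQgetResult(conn);

    if (res == NULL)
        break;                  /* query fully processed */
    if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
    {
        /* a chunk holds up to 1000 rows; memory is freed per chunk */
        for (int i = 0; i < PQntuples(res); i++)
            process_row(res, i);    /* hypothetical per-row handler */
    }
    else if (PQresultStatus(res) != PGRES_TUPLES_OK)
        report_error(res);      /* hypothetical error handler */
    PQclear(res);
}
```

The point of the mode is visible in the loop: allocation overhead is paid once
per chunk rather than once per row, while memory use stays bounded by the
chunk size instead of the full result set.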
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-01-02 Thread Jelte Fennema-Nio
On Sat, 30 Dec 2023 at 00:07, Jelte Fennema-Nio  wrote:
>
> On Fri, 29 Dec 2023 at 19:32, Jeff Davis  wrote:
> > That is my biggest concern right now: what will new clients connecting
> > to old servers do?
>
> This is not that big of a deal. Since it's only an addition of a new
> message type, it's completely backwards compatible with the current
> protocol version. i.e. as long as a client just doesn't send it when
> the server reports an older protocol version everything works fine.
> The protocol already has version negotiation built in and the server
> implements it in a reasonable way. The only problem is that no-one
> bothered to implement the client side of it in libpq, because it
> wasn't necessary yet since 3.x only had a single minor version.
>
> Patch v20230911-0003[5] from thread [3] implements client side
> handling in (imho) a sane way.

Okay I updated this patchset to start with better handling of the
NegotiateProtocolVersion packet. The implementation is quite different
from [5] after all, but the core idea is the same. It also allows the
connection to continue to work in case of a missing protocol
extension, which is necessary for the libpq compression patchset[6].
In case the protocol extension is a requirement, the client can still
choose to disconnect by checking the return value of the newly added
PQunsupportedProtocolExtensions function.

I also fixed the tests of my final patch, but haven't changed the
behaviour of it in any of the suggested ways.

[3]: 
https://www.postgresql.org/message-id/flat/AB607155-8FED-4C8C-B702-205B33884CBB%40yandex-team.ru#961c695d190cdccb3975a157b22ce9d8
From ad000807b9f5491ed616b167297f1fdade76f4ee Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Tue, 2 Jan 2024 11:19:54 +0100
Subject: [PATCH v2 3/4] Bump protocol version to 3.1

In preparation of new additions to the protocol in a follow up commit
this bumps the minor protocol version number.
---
 src/include/libpq/pqcomm.h| 3 +--
 src/interfaces/libpq/fe-connect.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
index 999ab94b9c8..a6a203d4c4f 100644
--- a/src/include/libpq/pqcomm.h
+++ b/src/include/libpq/pqcomm.h
@@ -91,11 +91,10 @@ is_unixsock_path(const char *path)
 
 /*
  * The earliest and latest frontend/backend protocol version supported.
- * (Only protocol version 3 is currently supported)
  */
 
 #define PG_PROTOCOL_EARLIEST	PG_PROTOCOL(3,0)
-#define PG_PROTOCOL_LATEST		PG_PROTOCOL(3,0)
+#define PG_PROTOCOL_LATEST		PG_PROTOCOL(3,1)
 
 typedef uint32 ProtocolVersion; /* FE/BE protocol version number */
 
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index aa01443cf03..8a1ca07ab3b 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2779,7 +2779,7 @@ keep_going:		/* We will come back to here until there is
 		 * must persist across individual connection attempts, but we must
 		 * reset them when we start to consider a new server.
 		 */
-		conn->pversion = PG_PROTOCOL(3, 0);
+		conn->pversion = PG_PROTOCOL_LATEST;
 		conn->send_appname = true;
 #ifdef USE_SSL
 		/* initialize these values based on SSL mode */
-- 
2.34.1

From 7e32f3d58c2c58d51f1cd5297a5b2c05b578daf1 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Tue, 2 Jan 2024 11:16:19 +0100
Subject: [PATCH v2 1/4] libpq: Handle NegotiateProtocolVersion message more
 leniently

Currently libpq would always error when the server returned a
NegotiateProtocolVersion message. This was fine because libpq only
supports a single protocol version and did not support any protocol
extensions. But we now need to change that to be able to add support for
future protocol changes, with a working fallback when connecting to an
older server.

This patch modifies the client side checks to allow a range of supported
protocol versions, instead of only allowing the exact version that was
requested. In addition it now allows connecting when the server does not
support some of the requested protocol extensions.

This patch also adds a new PQunsupportedProtocolExtensions API to libpq,
since a user might want to take some action in case a protocol extension
is not supported.
---
 doc/src/sgml/libpq.sgml | 19 
 src/interfaces/libpq/exports.txt|  1 +
 src/interfaces/libpq/fe-connect.c   | 46 -
 src/interfaces/libpq/fe-protocol3.c | 46 ++---
 src/interfaces/libpq/libpq-fe.h |  1 +
 src/interfaces/libpq/libpq-int.h|  2 ++
 6 files changed, 83 insertions(+), 32 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index ed88ac001a1..2e9ae41e389 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -2576,6 +2576,25 @@ int PQprotocolVersion(const PGconn *conn);
  
 
 
+
+ PQprotocolVersionPQprotocolVersion
+
+ 
+  
+   Returns a 

Re: Things I don't like about \du's "Attributes" column

2024-01-02 Thread Robert Haas
On Thu, Jun 22, 2023 at 8:50 PM Tom Lane  wrote:
> * It seems weird that some attributes are described in the negative
> ("Cannot login", "No inheritance").  I realize that this corresponds
> to the defaults, so that a user created by CREATE USER with no options
> shows nothing in the Attributes column; but I wonder how much that's
> worth.  As far as "Cannot login" goes, you could argue that the silent
> default ought to be for the properties assigned by CREATE ROLE, since
> the table describes itself as "List of roles".  I'm not dead set on
> changing this, but it seems like a topic that deserves a fresh look.

I wonder if we shouldn't try to display the roles's properties using
SQL keywords rather than narrating. Someone can be confused by "No
connections" but "CONNECTION LIMIT 0" is pretty hard to mistake;
likewise "LOGIN" or "NOLOGIN" seems clear enough. If we took this
approach, there would still be a question in my mind about whether to
show values where the configured value of the property matches the
default, and maybe we would want to do that in some cases and skip it
in others, or maybe we would end up with a uniform rule, but that
issue could be considered somewhat separately from how to print the
properties we choose to display.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2024-01-02 Thread Robert Haas
On Fri, Dec 22, 2023 at 8:14 AM Andrey M. Borodin  wrote:
> Just a side node.
> It seems like commit log is kind of antipattern of data contention. Even when 
> we build a super-optimized SLRU. Nearby **bits** are written by different 
> CPUs.
> I think that banks and locks are good thing. But also we could reorganize log 
> so that
> status of transaction 0 is on a page 0 at bit offset 0
> status of transaction 1 is on a page 1 at bit offset 0
> status of transaction 2 is on a page 2 at bit offset 0
> status of transaction 3 is on a page 3 at bit offset 0
> status of transaction 4 is on a page 0 at bit offset 2
> status of transaction 5 is on a page 1 at bit offset 2
> status of transaction 6 is on a page 2 at bit offset 2
> status of transaction 7 is on a page 3 at bit offset 2
> etc...

This is an interesting idea. A variant would be to stripe across
cachelines within the same page rather than across pages. If we do
stripe across pages as proposed here, we'd probably want to rethink
the way the SLRU is extended -- page at a time wouldn't really make
sense, but preallocating an entire file might.

> And it would be even better if page for transaction statuses would be 
> determined by backend id somehow. Or at least cache line. Can we allocate a 
> range (sizeof(cacheline)) of xids\subxids\multixacts\whatever for each 
> backend?

I don't understand how this could work. We need to be able to look up
transaction status by XID, not backend ID.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Track in pg_replication_slots the reason why slots conflict?

2024-01-02 Thread Bertrand Drouvot
Hi,

On Tue, Jan 02, 2024 at 10:35:59AM +0530, Amit Kapila wrote:
> On Mon, Jan 1, 2024 at 5:24 PM shveta malik  wrote:
> >
> > Please ignore the previous patch and PFA new v4 (v4_2). The only
> > change from the earlier v4 is the subject correction in commit msg.
> >

Thanks for the patch!

> The patch looks good to me. I have slightly changed one of the
> descriptions in the docs and also modified the commit message a bit. I
> will push this after two days unless there are any more
> comments/suggestions.
>

The patch LGTM, I just have a Nit comment:

+   wal_level_insufficient means that the
+is insufficient on the primary
+   server.

I'd prefer "primary_wal_level" instead of "wal_level_insufficient". I think it's
better to directly mention it is linked to the primary (without the need to refer
to the documentation) and that the fact that it is "insufficient" is more or less
implicit.

Basically I think that with "primary_wal_level" one would need to refer to the
doc less frequently than with "wal_level_insufficient".

But again, that's a Nit so feel free to ignore.

Regards,
 
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-02 Thread Masahiko Sawada
On Wed, Dec 27, 2023 at 12:08 PM John Naylor  wrote:
>
> On Tue, Dec 26, 2023 at 12:43 PM Masahiko Sawada  
> wrote:
> >
> > On Thu, Dec 21, 2023 at 4:41 PM John Naylor  wrote:
>
> > > +TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber 
> > > *offsets,
> > > + int num_offsets)
> > > +{
> > > + char buf[MaxBlocktableEntrySize];
> > > + BlocktableEntry *page = (BlocktableEntry *) buf;
> > >
> > > I'm not sure this is safe with alignment. Maybe rather than plain
> > > "char", it needs to be a union with BlocktableEntry, or something.
> >
> > I tried it in the new patch set but could you explain why it could not
> > be safe with alignment?
>
> I was thinking because "buf" is just an array of bytes. But, since the
> next declaration is a cast to a pointer to the actual type, maybe we
> can rely on the compiler to do the right thing. (It seems to on my
> machine in any case)

Okay, I kept it.

>
> > > About separation of responsibilities for locking: The only thing
> > > currently where the tid store is not locked is tree iteration. That's
> > > a strange exception. Also, we've recently made RT_FIND return a
> > > pointer, so the caller must somehow hold a share lock, but I think we
> > > haven't exposed callers the ability to do that, and we rely on the tid
> > > store lock for that. We have a mix of tree locking and tid store
> > > locking. We will need to consider carefully how to make this more
> > > clear, maintainable, and understandable.
> >
> > Yes, tidstore should be locked during the iteration.
> >
> > One simple direction about locking is that the radix tree has the lock
> > but no APIs hold/release it. It's the caller's responsibility. If a
> > data structure using a radix tree for its storage has its own lock
> > (like tidstore), it can use it instead of the radix tree's one. A
>
> It looks like the only reason tidstore has its own lock is because it
> has no way to delegate locking to the tree's lock. Instead of working
> around the limitations of the thing we've designed, let's make it work
> for the one use case we have. I think we need to expose RT_LOCK_*
> functions to the outside, and have tid store use them. That would
> allow us to simplify all those "if (TidStoreIsShared(ts))
> LWLockAcquire(..., ...)" calls, which are complex and often redundant.

I agree that we expose RT_LOCK_* functions and have tidstore use them,
but am not sure about the "if (TidStoreIsShared(ts))
LWLockAcquire(..., ...)" calls part. I think that even if we expose them,
we will still need to do something like "if (TidStoreIsShared(ts))
shared_rt_lock_share(ts->tree.shared)", no?

>
> At some point, we'll probably want to keep locking inside, at least to
> smooth the way for fine-grained locking you mentioned.
>
> > > In a green field, it'd be fine to replace these with an expression of
> > > "num_offsets", but it adds a bit of noise for reviewers and the git
> > > log. Is it really necessary?
> >
> > I see your point. I think we can live with having both
> > has_lpdead_items and num_offsets. But we will have to check if these
> > values are consistent, which could be less maintainable.
>
> It would be clearer if that removal was split out into a separate patch.

Agreed.

>
> > >  I'm also not quite sure why "deadoffsets" and "lpdead_items" got
> > > moved to the PruneState. The latter was renamed in a way that makes
> > > more sense, but I don't see why the churn is necessary.
> ...
> > > I guess it was added here, 800 lines away? If so, why?
> >
> > The above changes are related. The idea is not to use tidstore in a
> > one-pass strategy. If the table doesn't have any indexes, in
> > lazy_scan_prune() we collect offset numbers of dead tuples on the page
> > and vacuum the page using them. In this case, we don't need to use
> > tidstore so we pass the offsets array to lazy_vacuum_heap_page(). The
> > LVPagePruneState is a convenient place to store collected offset
> > numbers.
>
> Okay, that makes sense, but if it was ever explained, I don't
> remember, and there is nothing in the commit message either.
>
> I'm not sure this can be split up easily, but if so it might help reviewing.

Agreed.

>
> This change also leads to a weird-looking control flow:
>
> if (vacrel->nindexes == 0)
> {
>   if (prunestate.num_offsets > 0)
>   {
> ...
>   }
> }
> else if (prunestate.num_offsets > 0)
> {
>   ...
> }

Fixed.

I've attached a new patch set. From v47 patch, I've merged your
changes for radix tree, and split the vacuum integration patch into 3
patches: simply replaces VacDeadItems with TidsTore (0007 patch), and
use a simple TID array for one-pass strategy (0008 patch), and replace
has_lpdead_items with "num_offsets > 0" (0009 patch), while
incorporating your review comments on the vacuum integration patch
(sorry for making it difficult to see the changes from v47 patch).
0013 to 0015 patches are also updates from v47 patch.

I'm thinking that we should change the order of the patches so that
tidstore patch 

Re: Password leakage avoidance

2024-01-02 Thread Sehrope Sarkuni
Having worked on and just about wrapped up the JDBC driver patch for this,
couple thoughts:

1. There's two sets of defaults, the client program's default and the
server's default. Need to pick one for each implemented function. They
don't need to be the same across the board.
2. Password encoding should be split out and made available as its own
functions. Not just as part of a wider "create _or_ alter a user's
password" function attached to a connection. We went a step further and
added an intermediate function that generates the "ALTER USER ... PASSWORD"
SQL.
3. We only added an "ALTER USER ... PASSWORD" function, not anything to
create a user. There's way too many options for that and keeping this
targeted at just assigning passwords makes it much easier to test.
4. RE:defaults, the PGJDBC approach is that the encoding-only function uses
the driver's default (SCRAM). The "alterUserPassword(...)" uses the
server's default (again usually SCRAM for modern installs but could be
something else). It's kind of odd that they're different but the use cases
are different as well.
5. Our SCRAM specific function allows for customizing the algo iteration
and salt parameters. That topic came up on hackers previously[1]. Our high
level "alterUserPassword(...)" function does not have those options but it
is available as part of our PasswordUtil SCRAM API for advanced users who
want to leverage it. The higher level functions have defaults for iteration
counts (4096) and salt size (16-bytes).
6. Getting the verbiage right for the PGJDBC version was kind of annoying
as we wanted to match the server's wording, e.g. "password_encryption", but
it's clearly hashing, not encryption. We settled on "password encoding" for
describing the overall task with the comments referencing the server's
usage of the term "password_encryption". Found a similar topic[2] on
changing that recently as well but looks like that's not going anywhere.

[1]:
https://www.postgresql.org/message-id/1d669d97-86b3-a5dc-9f02-c368bca911f6%40iki.fi
[2]:
https://www.postgresql.org/message-id/flat/ZV149Fd2JG_OF7CM%40momjian.us#cc97d20ff357a9e9264d4ae14e96e566

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: POC: Extension for adding distributed tracing - pg_tracing

2024-01-02 Thread Aleksander Alekseev
Hi,

> Overall solution looks good for me except SQL Commenter - query
> instrumentation with SQL comments is often not possible on production
> systems. Instead the very often requested feature is to enable tracing
> for a given single query ID, or several (very limited number of) queries
> by IDs. It could be done by adding SQL function to add and remove query
> ID into a list (even array would do) stored in top tracing context.

Not 100% sure if I follow.

By queries you mean particular queries, not transactions? And since
they have an assigned ID it means that the query is already executing
and we want to enable the tracking in another session, right? If this
is the case I would argue that enabling/disabling tracing for an
_already_ running query (or transaction) would be way too complicated.

I wouldn't mind enabling/disabling tracing for the current session
and/or a given session ID. In the latter case it can have an effect
only for the new transactions. This however could be implemented
separately from the proposed patchset. I suggest keeping the scope
small.

-- 
Best regards,
Aleksander Alekseev




Re: Proposal to add page headers to SLRU pages

2024-01-02 Thread Aleksander Alekseev
Hi,

> I have also added this thread to the current Commitfest and hope this patch
> will be  part of the 17 release.
>
> The commitfest link:
> https://commitfest.postgresql.org/46/4709/

Thanks for the updated patch.

cfbot seems to have some complaints regarding compiler warnings and
also building the patch on Windows:

http://cfbot.cputube.org/


-- 
Best regards,
Aleksander Alekseev




Re: Reducing output size of nodeToString

2024-01-02 Thread Peter Eisentraut

On 06.12.23 22:08, Matthias van de Meent wrote:

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size signficantly.

The basic techniques used are
  - Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
  - Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
  - Reset more fields to their default values before storing the values.
  - Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.


Based on our discussions, my understanding is that you wanted to produce
an updated patch set that is split up a bit.

My suggestion is to make incremental patches along these lines:

- Omit from output all fields that have value zero.

- Omit location fields that have value -1.

- Omit trailing zeroes for scalar values.

- Reset location fields before storing in pg_rewrite (or possibly
catalogs in general?)

- And then whatever is left, including the "default" value system that
you have proposed.

The last one I have some doubts about, as previously expressed, but the
first few seem sensible to me.  By splitting it up we can consider these
incrementally.






Re: pg_upgrade and logical replication

2024-01-02 Thread Amit Kapila
On Fri, Dec 29, 2023 at 2:26 PM vignesh C  wrote:
>
> On Thu, 28 Dec 2023 at 15:59, Amit Kapila  wrote:
> >
> > On Wed, Dec 13, 2023 at 12:09 PM vignesh C  wrote:
> > >
> > > Thanks for the comments, the attached v25 version patch has the
> > > changes for the same.
> > >
> >
> > I have looked at it again and made some cosmetic changes like changing
> > some comments and a minor change in one of the error messages. See, if
> > the changes look okay to you.
>
> Thanks, the changes look good.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Commitfest manager January 2024

2024-01-02 Thread vignesh C
On Tue, 2 Jan 2024 at 15:43, Magnus Hagander  wrote:
>
> On Tue, Jan 2, 2024 at 3:45 AM vignesh C  wrote:
> >
> > On Mon, 1 Jan 2024 at 21:01, Magnus Hagander  wrote:
> > >
> > > On Mon, Jan 1, 2024 at 4:35 AM vignesh C  wrote:
> > > >
> > > > On Sun, 24 Dec 2023 at 18:40, vignesh C  wrote:
> > > > >
> > > > > On Sun, 24 Dec 2023 at 07:16, Michael Paquier  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Dec 23, 2023 at 08:52:38AM +0530, vignesh C wrote:
> > > > > > > I didn't see anyone volunteering for the January Commitfest, so 
> > > > > > > I'll
> > > > > > > volunteer to be CF manager for January 2024 Commitfest.
> > > > > >
> > > > > > (Adding Magnus in CC.)
> > > > > >
> > > > > > That would be really helpful.  Thanks for helping!  Do you have the
> > > > > > admin rights on the CF app?  You are going to require them in order 
> > > > > > to
> > > > > > mark the CF as in-process, and you would also need to switch the CF
> > > > > > after that from "Future" to "Open" so as people can still post
> > > > > > patches once January one begins.
> > > > >
> > > > > I don't have admin rights for the CF app. Please provide admin rights.
> > > >
> > > > I have not yet got the admin rights, Kindly provide admin rights for 
> > > > the CF app.
> > >
> > > It's been christmas holidays...
> > >
> > >
> > > What's your community username?
> >
> > My username is vignesh.postgres
>
> CF admin permissions granted!

Thanks, I have updated the commitfest 2024-01 to In Progress and
commitfest 2024-03 to Open.

Regards,
Vignesh




The presence of a NULL "defaclacl" value in pg_default_acl prevents the dropping of a role.

2024-01-02 Thread 杨伯宇(长堂)
Hello postgres hackers:
I recently came across a scenario involving system catalog "pg_default_acl"
where a tuple contains a NULL value for the "defaclacl" attribute. This can
cause confusion while dropping a role whose default ACL has been changed.
Here is a way to reproduce that:

``` example
postgres=# create user adminuser;
CREATE ROLE
postgres=# create user normaluser;
CREATE ROLE
postgres=# alter default privileges for role adminuser grant all on tables to normaluser;
ALTER DEFAULT PRIVILEGES
postgres=# alter default privileges for role adminuser revoke all ON tables from adminuser;
ALTER DEFAULT PRIVILEGES
postgres=# alter default privileges for role adminuser revoke all ON tables from normaluser;
ALTER DEFAULT PRIVILEGES
postgres=# select * from pg_default_acl where pg_get_userbyid(defaclrole) = 'adminuser';
  oid  | defaclrole | defaclnamespace | defaclobjtype | defaclacl
-------+------------+-----------------+---------------+-----------
 16396 |      16394 |               0 | r             | {}
(1 row)
postgres=# drop user adminuser ;
ERROR:  role "adminuser" cannot be dropped because some objects depend on it
DETAIL:  owner of default privileges on new relations belonging to role adminuser
```

I believe this is a bug since the tuple can be deleted if we revoke from
"normaluser" first. Besides, according to the document:
https://www.postgresql.org/docs/current/sql-alterdefaultprivileges.html
> If you wish to drop a role for which the default privileges have been altered,
> it is necessary to reverse the changes in its default privileges or use
> DROP OWNED BY to get rid of the default privileges entry for the role.
There must be a way to "reverse the changes", but NULL value of "defaclacl"
prevents it. Luckily, "DROP OWNED BY" works well.

The code-level reason could be that the function "SetDefaultACL" doesn't handle
the situation where "new_acl" is NULL. So I present a simple patch here.

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 01ff575b093..0e313526b28 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1335,7 +1335,7 @@ SetDefaultACL(InternalDefaultACL *iacls)
   */
  aclitemsort(new_acl);
  aclitemsort(def_acl);
- if (aclequal(new_acl, def_acl))
+ if (aclequal(new_acl, def_acl) || ACL_NUM(new_acl) == 0)
  {
   /* delete old entry, if indeed there is one */
   if (!isNew)

Best regards,
Boyu Yang

Re: Commitfest manager January 2024

2024-01-02 Thread Magnus Hagander
On Tue, Jan 2, 2024 at 3:45 AM vignesh C  wrote:
>
> On Mon, 1 Jan 2024 at 21:01, Magnus Hagander  wrote:
> >
> > On Mon, Jan 1, 2024 at 4:35 AM vignesh C  wrote:
> > >
> > > On Sun, 24 Dec 2023 at 18:40, vignesh C  wrote:
> > > >
> > > > On Sun, 24 Dec 2023 at 07:16, Michael Paquier  
> > > > wrote:
> > > > >
> > > > > On Sat, Dec 23, 2023 at 08:52:38AM +0530, vignesh C wrote:
> > > > > > I didn't see anyone volunteering for the January Commitfest, so I'll
> > > > > > volunteer to be CF manager for January 2024 Commitfest.
> > > > >
> > > > > (Adding Magnus in CC.)
> > > > >
> > > > > That would be really helpful.  Thanks for helping!  Do you have the
> > > > > admin rights on the CF app?  You are going to require them in order to
> > > > > mark the CF as in-process, and you would also need to switch the CF
> > > > > after that from "Future" to "Open" so as people can still post
> > > > > patches once January one begins.
> > > >
> > > > I don't have admin rights for the CF app. Please provide admin rights.
> > >
> > > I have not yet got the admin rights, Kindly provide admin rights for the 
> > > CF app.
> >
> > It's been christmas holidays...
> >
> >
> > What's your community username?
>
> My username is vignesh.postgres

CF admin permissions granted!

//Magnus




Re: Adding facility for injection points (or probe points?) for more advanced tests

2024-01-02 Thread Ashutosh Bapat
On Tue, Dec 12, 2023 at 4:15 PM Michael Paquier  wrote:
>
> On Tue, Dec 12, 2023 at 10:27:09AM +0530, Dilip Kumar wrote:
> > Oops, I only included the code changes where I am adding injection
> > points and some comments to verify that, but missed the actual test
> > file. Attaching it here.
>
> I see.  Interesting that this requires persistent connections to work.
> That's something I've found clunky to rely on when the scenarios a
> test needs to deal with are rather complex.  That's an area that could
> be made easier to use outside of this patch..  Something got proposed
> by Andrew Dunstan to make the libpq routines usable through a perl
> module, for example.
>
> > Note:  I think the latest patches are conflicting with the head, can you 
> > rebase?
>
> Indeed, as per the recent manipulations in ipci.c for the shmem
> initialization areas.  Here goes a v6.

Sorry for replying late here. Another minor conflict has risen again.
It's minor enough to be ignored for a review.

On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier  wrote:
>
> On Mon, Nov 20, 2023 at 04:53:45PM +0530, Ashutosh Bapat wrote:
> > On Wed, Oct 25, 2023 at 9:43 AM Michael Paquier  wrote:
> >> I have added some documentation to explain that, as well.  I am not
> >> wedded to the name proposed in the patch, so if you feel there is
> >> better, feel free to propose ideas.
> >
> > Actually with Attach and Detach terminology, INJECTION_POINT becomes
> > the place where we "declare" the injection point. So the documentation
> > needs to first explain INJECTION_POINT and then explain the other
> > operations.
>
> Sure.

This discussion has not been addressed in v6. I think the interface
needs to be documented in the order below
INJECTION_POINT - this declares an injection point - i.e. a place in
code where an external code can be injected (and run).
InjectionPointAttach() - this is used to associate ("attach") external
code to an injection point.
InjectionPointDetach() - this is used to disassociate ("detach")
external code from an injection point.

Specifying that InjectionPointAttach() "defines" an injection point
gives an impression that the injection point will be "somehow" added
to the code by calling InjectionPointAttach() which is not true. For
InjectionPointAttach() to be useful, the first argument to it should
be something already "declared" in the code using INJECTION_POINT().
Hence INJECTION_POINT needs to be mentioned in the documentation
first, followed by Attach and detach. The documentation needs to be
rephrased to use terms "declare", "attach" and "detach" instead of
"define", "run". The first set is aligned with the functionality
whereas the second set is aligned with the implementation.

Even if an INJECTION_POINT is not "declared", attach would succeed but
wouldn't do anything. I think this needs to be made clear in the
documentation. Better if we could somehow make Attach() fail if the
specified injection point is not "declared" using INJECTION_POINT. Of
course we don't want to bloat the hash table with all "declared"
injection points even if they aren't being attached to and hence not
used. I think, exposing the current injection point strings as
#defines and encouraging users to use these macros instead of string
literals will be a good start.

With the current implementation it's possible to "declare" injection
point with same name at multiple places. It's useful but is it
intended?

/* Field sizes */
#define INJ_NAME_MAXLEN 64
#define INJ_LIB_MAXLEN 128
#define INJ_FUNC_MAXLEN 128
I think these limits should be either documented or specified in the
error messages for users to fix their code in case of
errors/unexpected behaviour.

Here are some code level comments on 0001

+typedef struct InjectionPointArrayEntry

This is not an array entry anymore. I think we should rename
InjectionPointEntry as SharedInjectionPointEntry and InjectionPointArrayEntry
as LocalInjectionPointEntry.

+/* utilities to handle the local array cache */
+static void
+injection_point_cache_add(const char *name,
+ InjectionPointCallback callback)
+{
... snip ...
+
+ entry = (InjectionPointCacheEntry *)
+ hash_search(InjectionPointCache, name, HASH_ENTER, &found);
+
+ if (!found)

The function is called only when the injection point is not found in the local
cache. So this condition will always be true. An Assert will help to make it
clear and also prevent an unintended callback replacement.
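A toy standalone version of that hardened path, with assert() standing in for the server's Assert() and a linear scan over a small array standing in for hash_search() (names and sizes here are illustrative only):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define CACHE_SIZE 8

typedef void (*InjectionPointCallback) (const char *name);

static struct
{
	char		name[64];
	InjectionPointCallback callback;
}			cache[CACHE_SIZE];

static void
injection_point_cache_add(const char *name, InjectionPointCallback callback)
{
	int			found = 0;
	int			slot = -1;

	for (int i = 0; i < CACHE_SIZE; i++)
	{
		if (strcmp(cache[i].name, name) == 0)
			found = 1;
		else if (slot < 0 && cache[i].name[0] == '\0')
			slot = i;
	}

	/* the suggested Assert: callers must never add a cached name twice */
	assert(!found);

	if (slot < 0)
		return;					/* cache full; toy sketch only */

	snprintf(cache[slot].name, sizeof(cache[slot].name), "%s", name);
	cache[slot].callback = callback;
}
```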

+#ifdef USE_INJECTION_POINTS
+static bool
+file_exists(const char *name)

There's similar function in jit.c and dfmgr.c. Can we not reuse that code?

+ /* Save the entry */
+ memcpy(entry_by_name->name, name, sizeof(entry_by_name->name));
+ entry_by_name->name[INJ_NAME_MAXLEN - 1] = '\0';
+ memcpy(entry_by_name->library, library, sizeof(entry_by_name->library));
+ entry_by_name->library[INJ_LIB_MAXLEN - 1] = '\0';
+ memcpy(entry_by_name->function, function, sizeof(entry_by_name->function));
+ entry_by_name->function[INJ_FUNC_MAXLEN - 1] = '\0';

Most of the code is 

Re: [17] CREATE SUBSCRIPTION ... SERVER

2024-01-02 Thread Bharath Rupireddy
On Mon, Jan 1, 2024 at 12:29 AM Jeff Davis  wrote:
>
> On Fri, 2023-12-29 at 15:22 -0800, Jeff Davis wrote:
> > On Tue, 2023-09-05 at 12:08 -0700, Jeff Davis wrote:
> > > OK, so we could have a built-in FDW called pg_connection that would
> > > do
> > > the right kinds of validation; and then also allow other FDWs but
> > > the
> > > subscription would have to do its own validation.
> >
> > Attached a rough rebased version.
>
> Attached a slightly better version which fixes a pg_dump issue and
> improves the documentation.

Hi, I spent some time today reviewing the v4 patch, and below are my
comments. BTW, the patch needs a rebase due to commit 9a17be1e2.

1.
+/*
+ * We don't want to allow unprivileged users to be able to trigger
+ * attempts to access arbitrary network destinations, so require the
+ * user to have been specifically authorized to create connections.
+ */
+if (!has_privs_of_role(owner, ROLE_PG_CREATE_CONNECTION))

Can the pg_create_connection predefined role related code be put into
a separate 0001 patch? I think this can go in a separate commit.

2. Can one use {FDW, user_mapping, foreign_server} combo other than
the built-in pg_connection_fdw? If yes, why to allow say oracle_fdw
foreign server and user mapping with logical replication? Isn't this a
security concern?

3. I'd like to understand how the permission model works with this
feature amidst various users a) subscription owner b) table owner c)
FDW owner d) user mapping owner e) foreign server owner f) superuser
g) user with which logical replication bg workers (table sync,
{parallel} apply workers) are started up and running.
What if the foreign server owner doesn't have permissions on the table
to which the logical replication bg workers are applying changes?
What if the foreign server owner is changed with ALTER SERVER ... OWNER TO
while logical replication is in progress?
What if the owner of {FDW, user_mapping, foreign_server} is different
from a subscription owner with USAGE privilege granted? Can the
subscription still use the foreign server?

4. How does the invalidation of {FDW, user_mapping, foreign_server}
affect associated subscription and vice-versa?

5. What if the password is changed in user mapping with ALTER USER
MAPPING? Will it refresh the subscription so that all the logical
replication workers get restarted with new connection info?

6. How does this feature fit if a subscription is created with
run_as_owner? Will it check if the table owner has permissions to use
{FDW, user_mapping, foreign_server} combo?

7.
+if (strcmp(d->defname, "user") == 0 ||
+strcmp(d->defname, "password") == 0 ||
+strcmp(d->defname, "sslpassword") == 0 ||
+strcmp(d->defname, "password_required") == 0)
+ereport(ERROR,
+(errmsg("invalid option \"%s\" for pg_connection_fdw",

+ereport(ERROR,
+(errmsg("invalid user mapping option \"%s\" for pg_connection_fdw",
+d->defname)));

Can we emit an informative error message and hint using
initClosestMatch, updateClosestMatch, getClosestMatch similar to other
FDWs elsewhere in the code?
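For illustration, here is a self-contained sketch of the "closest match" hint idea. The real code would use PostgreSQL's initClosestMatch()/updateClosestMatch()/getClosestMatch() helpers rather than this hand-rolled Levenshtein distance; the helper names and the cutoff below are made up for the sketch:

```c
#include <string.h>
#include <stddef.h>

static int
edit_distance(const char *a, const char *b)
{
	size_t		la = strlen(a);
	size_t		lb = strlen(b);
	int			d[64][64];

	if (la >= 64 || lb >= 64)
		return (int) (la + lb);	/* too long to be a near-miss anyway */

	for (size_t i = 0; i <= la; i++)
		d[i][0] = (int) i;
	for (size_t j = 0; j <= lb; j++)
		d[0][j] = (int) j;

	for (size_t i = 1; i <= la; i++)
		for (size_t j = 1; j <= lb; j++)
		{
			int			cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
			int			best = d[i - 1][j] + 1;

			if (d[i][j - 1] + 1 < best)
				best = d[i][j - 1] + 1;
			if (d[i - 1][j - 1] + cost < best)
				best = d[i - 1][j - 1] + cost;
			d[i][j] = best;
		}

	return d[la][lb];
}

/*
 * Return the known option closest to "given", or NULL if nothing is
 * close enough to be a plausible typo; an errhint("Perhaps you meant
 * \"%s\".") could then be emitted for non-NULL results.
 */
static const char *
closest_option(const char *given, const char *const *known, size_t n)
{
	const char *best = NULL;
	int			best_distance = 3;	/* crude "plausible typo" cutoff */

	for (size_t i = 0; i < n; i++)
	{
		int			dist = edit_distance(given, known[i]);

		if (dist < best_distance)
		{
			best_distance = dist;
			best = known[i];
		}
	}
	return best;
}
```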

8.
+ errmsg("password is required"),
+ errdetail("Non-superusers must provide a password in the connection string.")));

The error message and detail look generic, can it be improved to
include something about pg_connection_fdw?

9.
+{ oid => '6015', oid_symbol => 'PG_CONNECTION_FDW',
+  descr => 'Pseudo FDW for connections to Postgres',
+  fdwname => 'pg_connection_fdw', fdwowner => 'POSTGRES',

What if the database cluster is initialized with an owner different
than 'POSTGRES' at the time of initdb? Will the fdwowner be correct in
that case?

10.
+# src/include/catalog/pg_foreign_data_wrapper.dat
+{ oid => '6015', oid_symbol => 'PG_CONNECTION_FDW',

Do we want to REVOKE USAGE ON FOREIGN DATA WRAPPER pg_connection_fdw
FROM PUBLIC and REVOKE EXECUTE ON its handler functions? With this,
the permissions are granted explicitly to the foreign server/user
mapping creators.

11. How about splitting patches in the following manner for better
manageability (all of which can go as separate commits) of this
feature?
0001 for pg_create_connection predefined role per comment #1.
0002 for introducing in-built FDW pg_connection_fdw.
0003 utilizing in-built FDW for logical replication to provide CREATE
SUBSCRIPTION ... SERVER.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: UUID v7

2024-01-02 Thread Andrey M. Borodin


> On 9 Oct 2023, at 23:46, Andrey Borodin  wrote:

Here's the next iteration of the patch. I've added get_uuid_v7_time().
This function extracts the timestamp from a UUID, iff it is v7. Timestamp
correctness is only guaranteed if the timestamp was generated by the same
implementation (6 bytes of milliseconds obtained by gettimeofday()).
Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs by no more than
1ms from now(). Maybe we should allow more tolerant values for slow test
machines.
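For reference, the extraction itself is simple under the draft's layout: the first 48 bits of a v7 UUID are a big-endian Unix timestamp in milliseconds, and the high nibble of byte 6 holds the version. This standalone sketch (helper name made up) is an illustration, not the patch's code:

```c
#include <stdint.h>

/*
 * Return the 48-bit millisecond timestamp embedded in a v7 UUID,
 * or -1 if the UUID is not version 7.
 */
static int64_t
uuid_v7_extract_ms(const unsigned char uuid[16])
{
	int64_t		ms = 0;

	if ((uuid[6] >> 4) != 7)	/* version nibble must be 7 */
		return -1;

	/* bytes 0..5 are the big-endian unix_ts_ms field */
	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	return ms;
}
```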


Best regards, Andrey Borodin.




v7-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


v7-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch
Description: Binary data


v7-0002-Buffer-random-numbers.patch
Description: Binary data


Re: Extract numeric filed in JSONB more effectively

2024-01-02 Thread jian he
hi.
You don't need to change src/include/catalog/catversion.h,
as mentioned in https://wiki.postgresql.org/wiki/Committing_checklist.
Otherwise, cfbot will fail many times.

+typedef enum JsonbValueTarget
+{
+ JsonbValue_AsJsonbValue,
+ JsonbValue_AsJsonb,
+ JsonbValue_AsText
+} JsonbValueTarget;

change to

+typedef enum JsonbValueTarget
+{
+ JsonbValue_AsJsonbValue,
+ JsonbValue_AsJsonb,
+ JsonbValue_AsText,
+} JsonbValueTarget;

Currently the patch cannot be applied with `git apply`.