Re: Extend compatibility of PostgreSQL::Test::Cluster

2022-03-29 Thread Michael Paquier
On Tue, Mar 29, 2022 at 05:56:02PM -0400, Andrew Dunstan wrote:
> I'm not sure why this item has been moved to the next CF without any
> discussion I could see on the mailing list. It was always my intention
> to commit it this time, and I propose to do so tomorrow with the comment
> Michael has requested above. The cfbot is still happy with it.

Thanks for taking care of it!
--
Michael




Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Michael Paquier
On Tue, Mar 29, 2022 at 09:46:27AM +, gkokola...@pm.me wrote:
> On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier 
>  wrote:
>> On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
>> Wow.  This stuff is old enough to vote (c3e18804), dead since its
>> introduction.  There is indeed an argument for removing it; it is
>> not good to keep around code that has never been stressed and/or
>> used.  Upon review, the cleanup done looks correct, as we have never
>> been able to generate .dat.gz files for a dump in the tar format.
> 
> Correct. My driving force behind it was to ease up the cleanup/refactoring
> work that follows, by eliminating the callers of the GZ*() macros.

Makes sense to me.

>> + command_fails_like(
>>
>> + [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
>> This addition depending on HAVE_LIBZ is a good thing as a reminder of
>> any work that could be done in 0002.  Now, that's been waiting for 20
>> years, so I would not hold my breath on this support.  I think that this
>> could be just applied first, with 0002 on top of it, as a first
>> improvement.
> 
> Excellent, thank you.

I have applied the test for --compress and --format=tar, separating it
from the rest.

While moving on with 0002, I have noticed the following in
_StartBlob():
    if (AH->compression != 0)
        sfx = ".gz";
    else
        sfx = "";

Shouldn't this bit also be simplified, adding a fatal() like the other
code paths, for safety?
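For illustration, a minimal sketch of that simplification (not a committed
change; fatal() and the message text mirror the existing pg_backup_tar.c
style):

    /*
     * The tar format has never supported compressed members, so reject
     * compression outright instead of silently picking a ".gz" suffix.
     */
    if (AH->compression != 0)
        fatal("compression is not supported by tar archive format");
    sfx = "";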

>> + compress_cmd => [
>> + $ENV{'GZIP_PROGRAM'},
>> + '-f',
>> [...]
>> + $ENV{'GZIP_PROGRAM'},
>> + '-k', '-d',
>> -f and -d are available everywhere I looked, but is -k/--keep a
>> portable choice with a gzip command? I don't see this option in
>> OpenBSD, for one. So this test is going to cause problems on those
>> buildfarm machines, at least. Couldn't this part be replaced by a
>> simple --test to check that what has been compressed is in correct
>> shape? We know that this works, based on our recent experiences with
>> the other tests.
> 
> I would argue that the simple '--test' will not do in this case, as the
> TAP tests do need a file named .sql to compare the contents with.
> This file is generated either directly by pg_dump itself, or by running
> pg_restore on pg_dump's output. In the case of compression, pg_dump will
> generate a .sql.gz file which cannot be
> used in the comparison tests. So the intention of this block is not
> simply to test for validity, but to also decompress pg_dump's output so
> that it can be used.

Ahh, I see, thanks.  I would add a comment about that in the area of
compression_gzip_plain_format.

+   my $supports_compression = check_pg_config("#define HAVE_LIBZ 1");

This part could be moved within the if block a couple of lines down.

+   my $compress_program = $ENV{GZIP_PROGRAM};

It seems to me that it is enough to rely on {compress_cmd}, hence
there should be no need for $compress_program, no?

It seems to me that we should have a description for compress_cmd at
the top of 002_pg_dump.pl (close to "Definition of the pg_dump runs to
make").  There is an order dependency with restore_cmd.

> I updated the patch to simply remove the '-k' flag.

Okay.
--
Michael




Re: Identify missing publications from publisher while create/alter subscription.

2022-03-29 Thread Amit Kapila
On Tue, Mar 29, 2022 at 8:11 PM vignesh C  wrote:
>
> On Tue, Mar 29, 2022 at 11:02 AM Amit Kapila  wrote:
> >
>
> Thanks for the suggestion, I have changed the patch as suggested.
> Attached v16 patch has the changes for the same.
>

Thanks, I have one more comment.

postgres=# Alter subscription sub1 add publication pub4;
WARNING: publications "pub2", "pub4" do not exist in the publisher
ALTER SUBSCRIPTION

This includes an additional publication in the WARNING message which was
not part of the current command but was present from earlier.

postgres=# Alter Subscription sub1 set publication pub5;
WARNING: publication "pub5" does not exist in the publisher
ALTER SUBSCRIPTION

The SET variant doesn't have this problem.

I feel we should be consistent here.
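To sketch the consistent behavior being suggested (hypothetical output,
assuming ADD only reports the publications named in the command):

    ALTER SUBSCRIPTION sub1 ADD PUBLICATION pub4;
    WARNING:  publication "pub4" does not exist in the publisher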

-- 
With Regards,
Amit Kapila.




Re: Granting SET and ALTER SYSTEM privileges for GUCs

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 9:00 PM Mark Dilger 
wrote:

>  A grant or revoke on an unrecognized custom parameter will create a SUSET
> placeholder,

[...]

> which cleans up the problem, with one exception:  if the user executes a
> "revoke set on parameter some.such from public" prior to loading the module
> which defines parameter some.such, that revoke won't be retained.  That
> doesn't seem entirely wrong to me, since no privilege to set the parameter
> existed when the revoke was performed, but rather was granted along with
> the creation of the parameter, but it also doesn't seem entirely right.
> Maybe revoke commands (but not grant commands) should error on unrecognized
> custom parameters?


The only revoke role target that makes sense here is the default grant
given to public.  Aside from that technicality, the grant system is purely
additive, and so only the GRANTs end up being retained as far as perpetual
state is concerned.

For a revoke that doesn't target public we should remove the corresponding
unrecognized setting grant if one is present.  We don't generally/always
raise a notice if a revoke doesn't actually cause something to be ungranted
-  though I'm partial to being so informed.  I suppose we could distinguish
the cases where the not-yet-loaded setting name is unrecognized by the
system from the one where it is recognized but the grant is actually on a
different role (or a revoke entry from public for the setting name is
present).

For a revoke of an unknown setting from public we should keep an entry
somewhere that tells the system that the default grant to public for that
setting has been revoked.  Maybe there isn't the same timing concern here
as there is for GRANT, but if only for symmetry it seems like a good thing
to implement.
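To sketch the timing problem (some.such and the module name are
hypothetical):

    -- Runs before the module defining the parameter is loaded, so there
    -- is no privilege yet for it to remove:
    REVOKE SET ON PARAMETER some.such FROM PUBLIC;

    -- Loading the module later creates the parameter with the default
    -- grant to PUBLIC, silently undoing the intent of the REVOKE unless
    -- the system records it somewhere:
    LOAD 'some_module';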

I have the impression I'm missing something in what I wrote above but
cannot quite figure out what.  In any case, as a first pass at this, the
behavior described is kinda what I'm expecting.

David J.

P.S.

Skimming the patch, we are (and I agree with this) not making the ALTER DEFAULT
PRIVILEGES command work with this feature.  Should the omission be noted
explicitly?  At least in the commit message, I would think.  Though the
sentence in [1] "Also, these default privilege settings can be overridden
using the ALTER DEFAULT PRIVILEGES command." is rendered only partially
correct.

[1] https://www.postgresql.org/docs/current/ddl-priv.html

P.P.S.

+   The default privileges for a user parameter allow
+   PUBLIC to SET and
+   RESET the assigned value.  By default,
+   PUBLIC has no privileges on
+   postmaster, superuser-backend,
+   internal, backend,
+   sighup, and superuser parameters.

Can we rephrase this to something like:

By default, PUBLIC has no privileges on parameters in the postmaster, ...,
and superuser contexts.

pg_settings.context exists and those are the values found there.  My
initial interpretation of the wording was that postmaster, etc., were
themselves parameters, not containers for many parameters.


Re: Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?

2022-03-29 Thread Julien Rouhaud
Hi,

On Sun, Mar 27, 2022 at 09:07:12AM +0530, Bharath Rupireddy wrote:
> On Fri, Mar 25, 2022 at 10:20 AM Greg Stark  wrote:
> >
> > This doesn't seem to be getting any further attention. It sounds like
> > Julien didn't agree with the scope of the text. Bharath do you think
> > Julien's comments make sense? Will you have a chance to look at this?
> 
> Thanks Greg. I was busy with other features.
> 
> Thanks Julien for the off-list discussion. I tried to address review
> comments in the v3 patch attached. Now, I've added the notes in
> high-availability.sgml which seems a better and closer fit for physical
> replication than contrib.sgml or extend.sgml.

+[...] Firstly, the module's shared library must be present on
+ both primary and standbys.

I'm a bit confused by this.  It looks like it means that the .so must be
physically present on both servers, but I'm assuming that you're talking
about shared_preload_libraries?

If yes, I still think it's worth documenting that it *needs* to be present on
the standbys *if* you want it to be enabled on the standby, including if it can
be promoted to a primary node.  And that any related GUC also has to be
properly configured on all nodes (so maybe moving the last paragraph just after
this one?).

If no, maybe just saying that the module has to be installed and configured on
all nodes?
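As a sketch of that (the library name is only an example), the same line
would need to be present in postgresql.conf on the primary and on every
standby that might be promoted:

    shared_preload_libraries = 'pg_stat_statements'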

+ [...] If the module exposes SQL functions, running the
+ CREATE EXTENSION
+ command on primary is sufficient as standbys will receive it via physical
+ replication.

I think it's better to phrase it with something like "CREATE EXTENSION is
replicated in physical replication similarly to other DDL commands".

+ [...] The
+ module's shared library gets loaded upon first usage of any of its
+ functions on primary and standbys.

Is it worth documenting that?  Note that this is only true if the lib isn't in
shared_preload_libraries and if it's a wrapper on top of a C function.

nitpicking: there's a trailing whitespace after "standbys."

+ If the module doesn't expose SQL functions, the shared library has to be
+ loaded separately on primary and standbys, either by the
+ LOAD command or by
+ setting the parameter shared_preload_libraries or
+ session_preload_libraries or
+ local_preload_libraries, depending on the module's need.

I think this is also confusing.  The need for preloading is entirely orthogonal
to SQL functions in the extension, especially since this is implying SQL
function over C-code.  This should be reworded to go with the first paragraph I
think.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Dilip Kumar
On Wed, Mar 30, 2022 at 6:47 AM Andres Freund  wrote:
>
>
> du -s /tmp/initdb/
> WAL_LOG: 35112
> FILE_COPY: 29288
>
> So it seems we should specify a strategy in initdb? It kind of makes sense -
> we're not going to read anything from those databases. And because of the
> ringbuffer of 256kB, we'll not even reduce IO meaningfully.

I think this makes sense. Do you mean that with initdb we will always use
file_copy, or do we want to give a command line option to initdb?
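For reference, the committed patch exposes the choice at the SQL level, so
initdb presumably just needs to pass the strategy through for its internal
CREATE DATABASE calls:

    CREATE DATABASE mydb TEMPLATE template1 STRATEGY = FILE_COPY;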

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Assert in pageinspect with NULL pages

2022-03-29 Thread Julien Rouhaud
Hi,

On Mon, Mar 28, 2022 at 08:29:29PM +0300, Maxim Orlov wrote:
> I've suddenly found that the test in this patch is based on a fact that
> heap pages don't have PageSpecial or it is of different size with btree
> pages Special area:
>
> CREATE TABLE test1 (a int8, b int4range);
>
> SELECT bt_page_items(get_raw_page('test1', 0));
>
>
> In the current state this is so, but it is not guaranteed. For example, in
> the proposed 64xid patch [1] we introduce PageSpecial into heap pages, and
> its size can on occasion equal the size of the btree page special area, so
> the check above is false even if we pass a heap page into pageinspect
> pretending it is a btree page. So the test will fail.
>
>
> Generally it seems not only a wrong test but the consequence of a bigger
> scale fact that we can not always distinguish different AMs' pages. So I'd
> propose at least that we don't conclude that a page is valid based on the
> PageSpecial size alone.

We don't assume that the page is valid, or of the correct AM, just based on the
special area size.  We do reject it if the size is invalid, but we also have
other tests based on what's actually possible to test (checking flag sanity,
magic values and so on).

Note that Peter G. suggested adding more checks for btree based on
palloc_btree_page(); did you check whether those were enough to fix this
test with the 64bits xid patchset applied?




Re: Jumble Query with COERCE_SQL_SYNTAX

2022-03-29 Thread Julien Rouhaud
Hi,

On Tue, Mar 29, 2022 at 03:52:57PM +0300, Yura Sokolov wrote:
> 
> v14 introduced the way to get original text for some kind of expressions
> using new 'funcformat' - COERCE_SQL_SYNTAX:
> - EXTRACT(part from timestamp)
> - (text IS [form] NORMALIZED)
> and others.
> 
> The mentioned EXTRACT and NORMALIZED statements have parts that are not
> usual arguments but some kind of syntax. At least, there is no way to:
> 
>   PREPARE a(text) as select extract($1 from now());
> 
> But JumbleExpr doesn't distinguish it and marks this argument as a
> variable constant, i.e. remembers it in 'clocations'.
> 
> I believe such a "non-variable constant" should not be jumbled as a
> replaceable thing.

Yeah, the problem is really that those are some form of sublanguage inside SQL,
which is always a mess :(

It's probably an implementation detail that we treat those as syntactic sugar
for plain function calls, but since that's what we're doing I don't think it's
really sensible to change that.  For instance, for the query jumbler this
query and "select pg_catalog.extract($1, now())" are identical, and that form
can be prepared.  Maybe it would make sense to allow a parameter for
EXTRACT(x FROM y), since we're already allowing a non-standard form with a
plain string literal?  The story is a bit different for NORMALIZED though.
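For instance, the explicit form mentioned above already works today, while
the EXTRACT sugar with $1 does not:

    PREPARE a(text) AS SELECT pg_catalog.extract($1, now());
    EXECUTE a('epoch');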




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2022 at 11:58 AM Peter Geoghegan  wrote:
> > I think I understand what the first paragraph of the header comment
> > for heap_tuple_needs_freeze() is trying to say, but the second one is
> > quite confusing. I think this is again because it veers into talking
> > about what the caller should do rather than explaining what the
> > function itself does.
>
> I wouldn't have done it that way if the function wasn't called
> heap_tuple_needs_freeze().
>
> I would be okay with removing this paragraph if the function was
> renamed to reflect the fact it now tells the caller something about
> the tuple having an old XID/MXID relative to the caller's own XID/MXID
> cutoffs. Maybe the function name should be heap_tuple_would_freeze(),
> making it clear that the function merely tells caller what
> heap_prepare_freeze_tuple() *would* do, without presuming to tell the
> vacuumlazy.c caller what it *should* do about any of the information
> it is provided.

Attached is v13, which does it that way. This does seem like a real
increase in clarity, albeit one that comes at the cost of renaming
heap_tuple_needs_freeze().
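For reference, the renamed function's contract, roughly as v13 has it
(signature paraphrased from the description above, so treat it as
illustrative):

    /*
     * Report whether heap_prepare_freeze_tuple() *would* freeze this tuple,
     * and pass back the oldest extant XID/MXID seen; the vacuumlazy.c
     * caller decides what to do with that information.
     */
    bool
    heap_tuple_would_freeze(HeapTupleHeader tuple,
                            TransactionId cutoff_xid,
                            MultiXactId cutoff_multi,
                            TransactionId *relfrozenxid_out,
                            MultiXactId *relminmxid_out);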

v13 also addresses all of the other items from Robert's most recent
round of feedback.

I would like to commit something close to v13 on Friday or Saturday.

Thanks
-- 
Peter Geoghegan


v13-0003-vacuumlazy.c-Move-resource-allocation-to-heap_va.patch
Description: Binary data


v13-0002-Generalize-how-VACUUM-skips-all-frozen-pages.patch
Description: Binary data


v13-0001-Set-relfrozenxid-to-oldest-extant-XID-seen-by-VA.patch
Description: Binary data


Re: Column Filtering in Logical Replication

2022-03-29 Thread Amit Kapila
On Tue, Mar 29, 2022 at 6:09 PM Tomas Vondra
 wrote:
>
> On 3/29/22 13:47, Amit Kapila wrote:
> > On Tue, Mar 29, 2022 at 4:33 PM Tomas Vondra
> >  wrote:
> >>
> >> On 3/29/22 12:00, Amit Kapila wrote:
> 
>  Thanks, I'll take a look later.
> 
> >>>
> >>> This is still failing [1][2].
> >>>
> >>> [1] - 
> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-28%2005%3A16%3A53
> >>> [2] - 
> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-03-24%2013%3A13%3A08
> >>>
> >>
> >> AFAICS we've concluded this is a pre-existing issue, not something
> >> introduced by a recently committed patch, and I don't think there's any
> >> proposal how to fix that.
> >>
> >
> > I have suggested in email [1] that increasing the values of
> > max_sync_workers_per_subscription/max_logical_replication_workers
> > should solve this issue. Now, whether this is a pre-existing issue or
> > behavior can be debated, but I think it happens for the new test case
> > added by commit c91f71b9dc.
> >
>
> IMHO that'd be just hiding the actual issue, which is the failure to
> sync the subscription in some circumstances. We should fix that, not
> just make sure the tests don't trigger it.
>

I am in favor of fixing/changing some existing behavior to make it
better and would be ready to help in that investigation as well, but I
was just not sure if it is a good idea to let some buildfarm
member(s) fail for a number of days. Anyway, I leave this judgment to
you.

-- 
With Regards,
Amit Kapila.




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-03-29 Thread Kyotaro Horiguchi
At Wed, 30 Mar 2022 10:06:17 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Tue, 29 Mar 2022 13:29:15 +0200, Erik Rijkers  wrote in 
> > Op 29-03-2022 om 12:50 schreef Matthias van de Meent:
> > > A shorter (?) reproducer might be the following, which forces any
> > > value for 'a' to be toasted and thus triggering the check in
> > > init_toast_snapshot regardless of value length:
> > > CREATE TABLE t (a text);
> > > ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
> > > INSERT INTO t VALUES ('toast');
> > > BEGIN;
> > > DECLARE c CURSOR FOR SELECT * FROM t;
> > > FETCH ALL IN c;
> 
> Yeah, unfortunately I tried that first and saw it didn't work. And it
> still doesn't for me.  With such a short text, pg_detoast_datum_packed
> doesn't call detoast_attr.  Actually it is VARATT_IS_1B. (@master)
> 
> I think I'm missing something here. I'm going to examine around.

Hmm. Strange. My memory tells me that I did the same thing before.. I
thought that it was somewhat related to compression since repeat('x',
4096) didn't seem to work at that time, but it worked this time.
Maybe I was confused between extended and external..
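(For the record, forcing out-of-line storage in the quoted reproducer just
needs a value comfortably over the toast threshold, e.g.

    INSERT INTO t VALUES (repeat('x', 100000));

given the SET STORAGE EXTERNAL above.)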

But, in the first place the *fix* has been found to be wrong.  I'm
going to search for the right fix..


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread David Rowley
On Wed, 30 Mar 2022 at 14:48, Andres Freund  wrote:
>
> On 2022-03-30 14:30:32 +1300, David Rowley wrote:
> > Maybe nodes below an Append/MergeAppend with run-time pruning could compile
> > on-demand and other nodes up-front.  Or maybe there's no problem with making
> > everything on-demand.
>
> Yea, that could work. The expressions for one "partition query" would still
> have to be emitted at once. For each such subtree we should make a separate
> costing decision. But I think an additional "will be executed" sub-node is a
> different story, the threshold shouldn't be done on a per-node basis.  That
> partitioning of the plan tree is kind of what I was trying to get at...

Maybe this point is moot if we get something like [1]. It might mean
that run-time pruning would happen early enough that we could just JIT
compile non-pruned subnodes.

David

[1] https://commitfest.postgresql.org/37/3478/




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-30 14:30:32 +1300, David Rowley wrote:
> On Wed, 30 Mar 2022 at 13:20, Andres Freund  wrote:
> > I wonder whether it'd make sense to combine that with awareness of a few
> > plan types that can lead to large portions of child nodes never being
> > executed. The case where the current behaviour is the worst is runtime
> > partition pruning in append - we compile expressions for whole subtrees
> > that will never be executed. We should be much more hesitant to compile
> > there compared to a cheap-ish node that we know will be executed as part
> > of a large expensive part of the plan tree.
> 
> I think that's also a problem but I think that might be better fixed
> another way.
> 
> There is a patch [1] around that seems to change things to compile JIT
> on-demand.

That's a bad idea to do on a per-function basis. Emitting functions one
by one is considerably slower than doing so in larger chunks. Of course that
can be addressed to some degree by something like what you suggest below.


> I've not looked at the patch but imagine the overhead might be kept minimal
> by initially setting the evalfunc to compile and run, then setting it to
> just run the compiled Expr for subsequent executions.

That part I'm not worried about, there's such an indirection on the first call
either way IIRC.


> Maybe nodes below an Append/MergeAppend with run-time pruning could compile
> on-demand and other nodes up-front.  Or maybe there's no problem with making
> everything on-demand.

Yea, that could work. The expressions for one "partition query" would still
have to be emitted at once. For each such subtree we should make a separate
costing decision. But I think an additional "will be executed" sub-node is a
different story, the threshold shouldn't be done on a per-node basis.  That
partitioning of the plan tree is kind of what I was trying to get at...

Greetings,

Andres Freund




Re: logical replication empty transactions

2022-03-29 Thread Masahiko Sawada
On Tue, Mar 29, 2022 at 6:15 PM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, March 29, 2022 5:12 PM Amit Kapila  
> wrote:
> >
> > On Tue, Mar 29, 2022 at 2:05 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > Attach the new version patch which addressed the above comments and
> > > slightly adjusted some code comments.
> > >
> >
> > The patch looks good to me. One minor suggestion is to change the function
> > name ProcessPendingWritesAndTimeOut() to ProcessPendingWrites().
>
> Thanks for the comment.
> Attach the new version patch with this change.
>

Thank you for updating the patch. Looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-29 Thread David G. Johnston
On Tue, Mar 22, 2022 at 6:52 AM Michail Nikolaev 
wrote:

> Hello, Andres.
>
> > Fails to apply at the moment: http://cfbot.cputube.org/patch_37_2947.log
>
> Thanks for notifying me. BTW, some kind of automatic email in case of
> status change could be very helpful.
>
> > Marked as waiting for author.
>
> New version is attached, build is passing
> (https://cirrus-ci.com/build/5599876384817152), so I'm moving it back to
> "ready for committer".
>
>
This may be a naive comment but I'm curious: The entire new second
paragraph of the README scares me:

+There are restrictions on setting LP_DEAD bits by the standby related to the
+minRecoveryPoint value. In case of crash recovery, the standby will start to
+process queries after replaying WAL to the minRecoveryPoint position (some
+kind of rewind to the previous state). At the same time, the setting of
+LP_DEAD bits is not protected by WAL in any way. So, to mark a tuple as dead
+we must be sure it was "killed" before minRecoveryPoint (comparing the LSN
+of the commit record). Another valid option is to compare the "killer" LSN
+with the index page LSN, because minRecoveryPoint would be moved forward
+when the index page is flushed. Also, in some cases the xid of the "killer"
+is unknown - for example, tuples were cleared by XLOG_HEAP2_PRUNE.
+In that case, we compare the LSN of the heap page to the index page LSN.

In terms of having room for bugs this description seems like a lot of logic
to have to get correct.

Could we just do this first pass as:

Enable recovery mode LP_DEAD hint bit updates after the first streamed
CHECKPOINT record comes over from the primary.

?

Now, maybe there aren't any real concerns here but even then breaking up
the patches into enabling the general feature in a limited way and then
ensuring that it behaves sanely during the standby crash recovery window
would likely increase the appeal and ease the burden on the potential
committer.

The proposed theory here seems sound to my inexperienced ears.  I have no
idea whether there are other bits, and/or assumptions, lurking around that
interfere with this though.

David J.


Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread David Rowley
On Wed, 30 Mar 2022 at 13:20, Andres Freund  wrote:
> I wonder whether it'd make sense to combine that with awareness of a few plan
> types that can lead to large portions of child nodes never being executed.
> The case where the current behaviour is the worst is runtime partition pruning
> in append - we compile expressions for whole subtrees that will never be
> executed. We should be much more hesitant to compile there compared to a
> cheap-ish node that we know will be executed as part of a large expensive part
> of the plan tree.

I think that's also a problem but I think that might be better fixed
another way.

There is a patch [1] around that seems to change things to compile JIT
on-demand.  I've not looked at the patch but imagine the overhead
might be kept minimal by initially setting the evalfunc to compile and
run, then setting it to just run the compiled Expr for subsequent
executions. Maybe nodes below an Append/MergeAppend with run-time
pruning could compile on-demand and other nodes up-front.  Or maybe
there's no problem with making everything on-demand.
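To sketch the trampoline idea (the helper name is made up; ExprState,
jit_compile_expr() and ExecReadyInterpretedExpr() are the real executor
pieces, but this is an illustration rather than the patch in [1]):

    static Datum
    ExecCompileExprAndEval(ExprState *state, ExprContext *econtext,
                           bool *isnull)
    {
        /*
         * First execution only: try to JIT; on success jit_compile_expr()
         * swaps the compiled function into state->evalfunc, otherwise we
         * fall back to the interpreter.  Later calls bypass this entirely.
         */
        if (!jit_compile_expr(state))
            ExecReadyInterpretedExpr(state);
        return state->evalfunc(state, econtext, isnull);
    }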

David

[1] https://commitfest.postgresql.org/37/3071/




Re: remove reset_shared()

2022-03-29 Thread Julien Rouhaud
Hi,

On Tue, Mar 29, 2022 at 03:17:02PM -0700, Nathan Bossart wrote:
> Hi hackers,
>
> Is there any reason to keep reset_shared() around anymore?  It is now just
> a wrapper function for CreateSharedMemoryAndSemaphores(), and AFAICT the
> information in the comments is already covered by comments in the shared
> memory code.  I think it's arguable that the name of the function makes it
> clear that it might recreate the shared memory, but if that is a concern,
> perhaps we could rename the function to something like
> CreateOrRecreateSharedMemoryAndSemaphores().
>
> I've attached a patch that simply removes this wrapper function.  This is
> admittedly just nitpicking, so I don't intend to carry this patch further
> if anyone is opposed.

I'm +0.5 for it; the wrapper doesn't bring much and makes things a bit
harder to understand, as you need to go through an extra function.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 11:55:05 -0400, Robert Haas wrote:
> I committed v6 instead.

Just noticed that it makes initdb a bit slower / the cluster a bit bigger,
because now there's WAL traffic from creating the databases.  There's an
optimization (albeit insufficient) to reduce WAL traffic in bootstrap mode,
but not for single user mode when the CREATE DATABASEs happen.

In an optimized build, with wal-segsize 1 (the most extreme case) using
FILE_COPY vs WAL_LOG:

perf stat ~/build/postgres/dev-optimize/install/bin/initdb /tmp/initdb/ 
--wal-segsize=1
WAL_LOG:

          487.58 msec task-clock           #    0.848 CPUs utilized
           2,874      context-switches     #    5.894 K/sec
               0      cpu-migrations       #    0.000 /sec
          10,209      page-faults          #   20.938 K/sec
   1,550,483,095      cycles               #    3.180 GHz
   2,537,618,094      instructions         #    1.64  insn per cycle
     492,780,121      branches             #    1.011 G/sec
       7,384,884      branch-misses        #    1.50% of all branches

   0.575213800 seconds time elapsed

   0.349812000 seconds user
   0.133225000 seconds sys

FILE_COPY:

          476.54 msec task-clock           #    0.854 CPUs utilized
           3,005      context-switches     #    6.306 K/sec
               0      cpu-migrations       #    0.000 /sec
          10,050      page-faults          #   21.090 K/sec
   1,516,058,200      cycles               #    3.181 GHz
   2,504,126,907      instructions         #    1.65  insn per cycle
     488,042,856      branches             #    1.024 G/sec
       7,327,364      branch-misses        #    1.50% of all branches

   0.557934976 seconds time elapsed

   0.360473000 seconds user
   0.112109000 seconds sys


the numbers are similar if repeated.

du -s /tmp/initdb/
WAL_LOG: 35112
FILE_COPY: 29288

So it seems we should specify a strategy in initdb? It kind of makes sense -
we're not going to read anything from those databases. And because of the
ringbuffer of 256kB, we'll not even reduce IO meaningfully.

- Andres




Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403)

2022-03-29 Thread Kyotaro Horiguchi
At Tue, 29 Mar 2022 13:29:15 +0200, Erik Rijkers  wrote in 
> Op 29-03-2022 om 12:50 schreef Matthias van de Meent:
> > A shorter (?) reproducer might be the following, which forces any
> > value for 'a' to be toasted and thus triggering the check in
> > init_toast_snapshot regardless of value length:
> > CREATE TABLE t (a text);
> > ALTER TABLE t ALTER COLUMN a SET STORAGE EXTERNAL;
> > INSERT INTO t VALUES ('toast');
> > BEGIN;
> > DECLARE c CURSOR FOR SELECT * FROM t;
> > FETCH ALL IN c;

Yeah, unfortunately I tried that first and saw it didn't work. And it
still doesn't for me.  With such a short text, pg_detoast_datum_packed
doesn't call detoast_attr.  Actually it is VARATT_IS_1B. (@master)

I think I'm missing something here. I'm going to examine around.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 5:56 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Tue, Mar 29, 2022 at 5:50 PM Andres Freund  wrote:
>
>> Hi,
>>
>> On 2022-03-29 17:06:24 -0700, David G. Johnston wrote:
>> > On Tue, Mar 29, 2022 at 4:43 PM Andres Freund 
>> wrote:
>> > > But more importantly, a per-relation/function reset field wouldn't
>> > > address Tomas's concern: He wants a single thing to check to see if
>> > > any stats have been reset - and that's imo a quite reasonable desire.
>> > >
>> >
>> > Per the original email:
>> >
>> > "Starting with the below commit, pg_stat_reset_single_function_counters,
>> > pg_stat_reset_single_table_counters don't just reset the stats for the
>> > individual function, but also set pg_stat_database.stats_reset."
>> >
>> > Thus we already have the desired behavior, it is just poorly documented.
>>
>> The problem is that it also makes stats_reset useless for other purposes -
>> which I do consider a problem. Hence this thread.  My concern would be
>> mollified if there were a separate reset timestamp counting the last
>> "database wide" reset time. Your comment about that was something about
>> relation/function level timestamps, which doesn't seem relevant.
>>
>
> I can't figure out whether you agree that as of today stats_reset is the
> "database wide" reset time.  The first sentence makes it sound like you do,
> the first one makes it sound like you don't.
>

OK, I meant the third one seems contrary, but re-reading this all again I
think I see what you are saying.

You want to add a field that only changes when "reset all stats" is
executed for a given database.  Leaving stats_reset to mean "the last time
any individual stat record changed".  I can get behind that.
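Something like this, where all_stats_reset is a hypothetical column name:

    SELECT stats_reset,       -- last time any individual stat was reset
           all_stats_reset    -- last database-wide reset only
    FROM pg_stat_database
    WHERE datname = current_database();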

David J.


Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 5:50 PM Andres Freund  wrote:

> Hi,
>
> On 2022-03-29 17:06:24 -0700, David G. Johnston wrote:
> > On Tue, Mar 29, 2022 at 4:43 PM Andres Freund 
> wrote:
> > > But more importantly, a per-relation/function reset field wouldn't
> > > address Tomas's concern: He wants a single thing to check to see if
> > > any stats have been reset - and that's imo a quite reasonable desire.
> > >
> >
> > Per the original email:
> >
> > "Starting with the below commit, pg_stat_reset_single_function_counters,
> > pg_stat_reset_single_table_counters don't just reset the stats for the
> > individual function, but also set pg_stat_database.stats_reset."
> >
> > Thus we already have the desired behavior, it is just poorly documented.
>
> The problem is that it also makes stats_reset useless for other purposes -
> which I do consider a problem. Hence this thread.  My concern would be
> mollified if there were a separate reset timestamp counting the last
> "database wide" reset time. Your comment about that was something about
> relation/function level timestamps, which doesn't seem relevant.
>

I can't figure out whether you agree that as of today stats_reset is the
"database wide" reset time.  The first sentence makes it sound like you do,
the first one makes it sound like you don't.

David J.


Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 5:20 PM Peter Geoghegan  wrote:

> On Tue, Mar 29, 2022 at 4:55 AM Michail Nikolaev
>  wrote:
>
> I think that you could do a better job of explaining and promoting the
> problem that you're trying to solve here. Emphasis on the problem, not
> so much the solution.


As a specific recommendation here - submit patches with a complete commit
message.  Tweak it for each new version so that any prior discussion that
informed the general design of the patch is reflected in the commit message.

This doesn't solve the "experience" issue by itself but does allow someone
with interest to jump in without having to read an entire thread,
including false-starts and half-ideas, to understand what the patch is
doing, and why.  At the end of the day the patch should largely speak for
itself, and depend minimally on the discussion thread, to be understood.

David J.


Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 17:06:24 -0700, David G. Johnston wrote:
> On Tue, Mar 29, 2022 at 4:43 PM Andres Freund  wrote:
> > But more importantly, a per-relation/function reset field wouldn't
> > address Tomas's concern: He wants a single thing to check to see if any
> > stats have been reset - and that's imo a quite reasonable desire.
> >
> 
> Per the original email:
> 
> "Starting with the below commit, pg_stat_reset_single_function_counters,
> pg_stat_reset_single_table_counters don't just reset the stats for the
> individual function, but also set pg_stat_database.stats_reset."
> 
> Thus we already have the desired behavior, it is just poorly documented.

The problem is that it also makes stats_reset useless for other purposes -
which I do consider a problem. Hence this thread.  My concern would be
mollified if there were a separate reset timestamp counting the last
"database wide" reset time. Your comment about that was something about
relation/function level timestamps, which doesn't seem relevant.


> Now, maybe other functions aren't doing this?  If so, given these functions
> do, we probably should just change any outliers to match.

Don't think there are other functions.

Greetings,

Andres Freund




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Thomas Munro
On Wed, Mar 30, 2022 at 11:25 AM Andres Freund  wrote:
> Didn't immediately find a reference to a cat equivalent. Maybe just gzip the
> file? That can read from stdin across platforms afaict.

. o O ( gzip | gzip -d )




Re: Temporary tables versus wraparound... again

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 19:51:26 -0400, Greg Stark wrote:
> On Mon, 28 Mar 2022 at 16:30, Andres Freund  wrote:
> >
> > > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > > like normal truncate. Otherwise even typical short-lived transactions
> > > using temporary tables can easily cause them to reach relfrozenxid.
> >
> > Might be worth mentioning that ON COMMIT DELETE is implemented as truncating
> > tables. If we actually implemented it as deleting rows, it'd not at all be
> > correct to reset relfrozenxmin.
> 
> In the commit message or are you saying this needs documentation or a comment?

In the commit message.


> > > + pgcform->relminmxid = GetOldestMultiXactId();
> >
> > Ugh. That's pretty expensive for something now done at a much higher
> > rate than before.
> 
> This I'm really not sure about. I really don't know much about
> multixacts. I've been reading a bit but I'm not sure what to do yet.
> I'm wondering if there's something cheaper we can use. We don't need
> the oldest mxid that might be visible in a table somewhere, just the
> oldest that has a real live uncommitted transaction in it that could
> yet create new tuples in the truncated table.

> In the case of temporary tables I think we could just set it to the
> next mxid since there are no live transactions capable of inserting
> into the temporary table. But in the case of a table created in this
> transaction then that wouldn't be good enough. I think? I'm not clear
> whether existing mxids get reused for new updates if they happen to
> have the same set of locks in them as some existing mxid.

Yes, that can happen. But of course the current xid is always part of the
multixact, so it can't be a multixact from before the transaction started.

There's already a record of the oldest mxid a backend considers live, computed
on the first use of multixacts in a transaction; see
MultiXactIdSetOldestVisible(). That value, I think, might serve as a suitable
relminmxid for a temporary table in an already running transaction?
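A sketch of what that could look like in the truncate path
(OldestVisibleMXactId[] is currently static to multixact.c, so a real patch
would need an accessor; this is only the shape of the idea):

    /*
     * Make sure the backend's oldest-visible multixact is computed, then
     * use it instead of the expensive GetOldestMultiXactId().
     */
    MultiXactIdSetOldestVisible();
    pgcform->relminmxid = OldestVisibleMXactId[MyBackendId];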

Greetings,

Andres Freund




Re: Temporary tables versus wraparound... again

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 4:52 PM Greg Stark  wrote:

> On Mon, 28 Mar 2022 at 16:30, Andres Freund  wrote:
> >
> > > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > > like normal truncate. Otherwise even typical short-lived transactions
> > > using temporary tables can easily cause them to reach relfrozenxid.
> >
> > Might be worth mentioning that ON COMMIT DELETE is implemented as
> > truncating tables. If we actually implemented it as deleting rows, it'd
> > not at all be correct to reset relfrozenxmin.
>
> In the commit message or are you saying this needs documentation or a
> comment?
>

Just flying by here but...

The user-facing documentation already covers this:

https://www.postgresql.org/docs/current/sql-createtable.html

"All rows in the temporary table will be deleted at the end of each
transaction block. Essentially, an automatic TRUNCATE is done at each
commit. When used on a partitioned table, this is not cascaded to its
partitions."

I'm not sure why we felt the need to add "essentially" here - but maybe
it's because we didn't "reset relfrozenxmin and other table stats like
normal truncate"?  Or maybe just natural word flow.

Either way, maybe word it like this to avoid the need for essentially
altogether:

The temporary table will be automatically truncated at the end of each
transaction block.  However, unlike the TRUNCATE command, descendant tables
will not be cascaded to. (I'm changing partitions to descendant tables to
make a point here - the TRUNCATE command only references descendant tables,
not mentioning partitioning by name at all.  Is this desirable?)

I don't have any substantive insight into the commit message or code
comments; but it doesn't seem immediately wrong to assume the reader
understands that ON COMMIT DELETE ROWS uses something more akin to TRUNCATE
rather than DELETE since that is what the feature is documented to do.  The
commit message in particular seems like it doesn't need to teach that
point; but can do so if it makes understanding the changes easier.

David J.


Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2022 at 4:55 AM Michail Nikolaev
 wrote:
> > Overall, I think that this patch has serious design flaws, and that
> > this issue is really just a symptom of a bigger problem.
>
> Could you please advise me on something? The ways I see:
> * give up :)

I would never tell anybody to give up on something like this, because
I don't really have the right to do so. And because it really isn't my
style.

> * try to fix this concept
> * go back to concept with LP_DEAD horizon WAL and optional cancellation
> * try to limit scope on “allow standby to use LP_DEAD set on primary

The simple answer is: I don't know. I could probably come up with a
better answer than that, but it would take real effort, and time.

> in some cases” (by marking something in btree page probably)
> * look for some new way

You seem like a smart guy, and I respect the effort that you have put
in already -- I really do. But I think that you have unrealistic ideas
about how to be successful with a project like this.

The reality is that the Postgres development process gives authority
to a relatively small number of committers. This is not a perfect
system, at all, but it's the best system that we have. Only a minority
of the committers are truly experienced with the areas of the code
that your patch touches -- so the number of people that are ever
likely to commit a patch like that is very small (even if the patch
was perfect). You need to convince at least one of them to do so, or
else your patch doesn't get into PostgreSQL, no matter what else may
be true. I hope that my remarks don't seem disdainful or belittling --
that is not my intention. These are just facts.

I think that you could do a better job of explaining and promoting the
problem that you're trying to solve here. Emphasis on the problem, not
so much the solution. Only a very small number of patches don't need
to be promoted. Of course I can see that the general idea has merit,
but that isn't enough. Why do *you* care about this problem so much?
The answer isn't self-evident. You have to tell us why it matters so
much.

You must understand that this whole area is *scary*. The potential for
serious data corruption bugs is very real. And because the whole area
is so complicated (it is at the intersection of 2-3 complicated
areas), we can expect those bugs to be hidden for a long time. We
might never be 100% sure that we've fixed all of them if the initial
design is not generally robust. Most patches are not like that.

--
Peter Geoghegan




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-30 12:41:41 +1300, David Rowley wrote:
> On Wed, 30 Mar 2022 at 12:16, Andres Freund  wrote:
> > > I did propose a patch to address this in [1]. It does need more work
> > > and I do plan to come back to it for v16.
> >
> > FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
> > side of a nested loop, just because it's cheap, even though that's where the
> > bulk of the benefits comes from?
>
> Yeah, I think the total cost would need to be multiplied by the number
> of times we expect to call that part of the plan.  I've not yet sat
> down to figure out if that's easy/cheap or hard/costly information to
> obtain.

I wonder whether it'd make sense to combine that with awareness of a few plan
types that can lead to large portions of child nodes never being executed.
The case where the current behaviour is the worst is runtime partition pruning
in append - we compile expressions for whole subtrees that will never be
executed. We should be much more hesitant to compile there compared to a
cheap-ish node that we know will be executed as part of a large expensive part
of the plan tree.

- Andres




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 4:43 PM Andres Freund  wrote:


> But more importantly, a
> per-relation/function reset field wouldn't address Tomas's concern: He
> wants a
> single thing to check to see if any stats have been reset - and that's imo
> a
> quite reasonable desire.
>

Per the original email:

"Starting with the below commit, pg_stat_reset_single_function_counters,
pg_stat_reset_single_table_counters don't just reset the stats for the
individual function, but also set pg_stat_database.stats_reset."

Thus we already have the desired behavior, it is just poorly documented.

Now, maybe other functions aren't doing this?  If so, given these functions
do, we probably should just change any outliers to match.

I'm reading Tomas's comments as basically a defense of the status quo, at
least so far as the field goes.  He didn't comment on the idea of "drop the
reset_[relation|function]_counters(...)" functions.  Combined, I take that
as supporting the entire status quo: leaving the function and fields
as-is.  I'm inclined to do the same.  I don't see any real benefit to
changing here, as there is no user demand for it, and the field change
proposal is to change only one of the at least three locations that should
be changed if we want a consistent design.  And we aren't getting user
reports saying the presence of the functions is a problem (confusion or
otherwise) either, so unless there is a technical reason writing these
functions in the new system is undesirable we have no justification that I
can see for removing the long-standing feature.

David J.


Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 23:38:29 +, Jacob Champion wrote:
> On Sat, 2022-03-26 at 11:36 -0700, Andres Freund wrote:
> > > I also note that exposing it as a GUC means we have zero control over
> > > who/what can read it.  Maybe that's not a problem, but it needs to be
> > > thought about before we go down that path.
> > 
> > Yes, I think that's a fair concern.
> 
> I like that there's no builtin way, today, for a superuser to modify
> the internal value; it strengthens the use as an auditing tool. Moving
> this to a PGC_SU_BACKEND GUC seems to weaken that. And it looks like
> PGC_INTERNAL is skipped during the serialization, so if we chose that
> option, we'd need to write new code anyway?

It'd be pretty simple to change can_skip_gucvar()'s selection of what to
sync. E.g. an additional flag like GUC_PARALLEL_SYNCHRONIZE.

I'm not convinced that a GUC is the answer, to be clear.


> We'd also need to guess whether the GUC system's serialization of NULL
> as an empty string is likely to cause problems for any future auth
> methods.

You can't represent a NULL in a postgres 'text' datum, independent of
parallelism. So the current definition of pg_session_authn_id() already
precludes that (and set_authn_id() and ...).  Honestly, I can't see a reason
why we should ever allow authn_id to contain a NULL byte.

Greetings,

Andres Freund




Re: Temporary tables versus wraparound... again

2022-03-29 Thread Greg Stark
On Mon, 28 Mar 2022 at 16:30, Andres Freund  wrote:
>
> > Make ON COMMIT DELETE ROWS reset relfrozenxmin and other table stats
> > like normal truncate. Otherwise even typical short-lived transactions
> > using temporary tables can easily cause them to reach relfrozenxid.
>
> Might be worth mentioning that ON COMMIT DELETE is implemented as truncating
> tables. If we actually implemented it as deleting rows, it'd not at all be
> correct to reset relfrozenxmin.

In the commit message or are you saying this needs documentation or a comment?

> > Also add warnings when old temporary tables are found to still be in
> > use during autovacuum. Long lived sessions using temporary tables are
> > required to vacuum them themselves.
>
> I'd do that in a separate patch.

Hm, seems a bit small but sure no problem, I'll split it out.

> > + pgcform->relfrozenxid = RecentXmin;
>
> Hm. Is RecentXmin guaranteed to be valid at this point?

I mentioned the same worry. But ok, I just looked into it and it's
definitely not a problem. We only do truncates after either a user
issued TRUNCATE when the table was created in the same transaction or
at commit iff a flag is set indicating temporary tables have been
used. Either way a snapshot has been taken. I've added some comments
and an assertion and I think if assertions are disabled and this
impossible condition is hit we can just skip the stats reset.

> > + pgcform->relminmxid = GetOldestMultiXactId();
>
> Ugh. That's pretty expensive for something now done at a much higher rate than
> before.

This I'm really not sure about. I really don't know much about
multixacts. I've been reading a bit but I'm not sure what to do yet.
I'm wondering if there's something cheaper we can use. We don't need
the oldest mxid that might be visible in a table somewhere, just the
oldest that has a real live uncommitted transaction in it that could
yet create new tuples in the truncated table.

In the case of temporary tables I think we could just set it to the
next mxid since there are no live transactions capable of inserting
into the temporary table. But in the case of a table created in this
transaction then that wouldn't be good enough. I think? I'm not clear
whether existing mxids get reused for new updates if they happen to
have the same set of locks in them as some existing mxid.

> we put else if on a separate line from }. And { also is always on a separate
> line.

Sorry, old habits...


Incidentally in doing the above I noticed an actual bug :( The
toast reset had the wrong relid in it. I'll add the toast table to the
test too.



--
greg




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Tom Lane
David Rowley  writes:
> On Wed, 30 Mar 2022 at 12:16, Andres Freund  wrote:
>> FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
>> side of a nested loop, just because it's cheap, even though that's where the
>> bulk of the benefits comes from?

> Yeah, I think the total cost would need to be multiplied by the number
> of times we expect to call that part of the plan.  I've not yet sat
> down to figure out if that's easy/cheap or hard/costly information to
> obtain.

As long as you don't need the info till the end of planning, it should be
reasonably simple to determine.  I'm not sure if we actually record the
expected number of loops in the plan tree today, but the costing
mechanisms certainly estimate that along the way, so we could store it
if need be.

regards, tom lane




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 14:14:05 -0700, David G. Johnston wrote:
> On Tue, Mar 29, 2022 at 1:37 PM Andres Freund  wrote:
> 
> > > Secondly, to do anything really meaningful you need to calculate deltas,
> > > and be able to detect if some of the stats were reset for the particular
> > > interval. And the stat_reset timestamp was designed to be a simple way
> > > to detect that (instead of having to inspect all individual timestamps).
> >
> > I wonder if we should just split that per-database timestamp into two. One
> > about the pg_stat_database contents, one about per-database stats? That
> > doesn't have the same memory-usage-increase concerns as adding
> > per-table/function reset stats.
> >
> >
> That seems like only half a solution.  The reasoning for doing such a split
> for pg_stat_database is identical to the reason that new fields should be
> added to pg_stat_all_tables and pg_stat_user_functions (and possibly
> others).

Not really IMO. There's obviously the space usage aspect - there's always
fewer pg_stat_database stats than relation stats. But more importantly, a
per-relation/function reset field wouldn't address Tomas's concern: He wants a
single thing to check to see if any stats have been reset - and that's imo a
quite reasonable desire.


> pg_stat_all_tables already has 16 bigint fields, 4 timestamptz fields, 2
> names and an oid.  Seems like one more timestamptz field is a marginal
> increase whose presence lets us keep the already implemented per-table
> reset mechanism.  We should at least measure the impact that adding the
> field has before deciding its presence is too costly.

Because of the desire for a single place to check whether there has been a
reset within a database, that's imo an orthogonal debate.

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread David Rowley
On Wed, 30 Mar 2022 at 12:16, Andres Freund  wrote:
> > I did propose a patch to address this in [1]. It does need more work
> > and I do plan to come back to it for v16.
>
> FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
> side of a nested loop, just because it's cheap, even though that's where the
> bulk of the benefits comes from?

Yeah, I think the total cost would need to be multiplied by the number
of times we expect to call that part of the plan.  I've not yet sat
down to figure out if that's easy/cheap or hard/costly information to
obtain.
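Roughly this shape, where est_calls is hypothetical - per this discussion,
nothing like it is stored in the plan tree today:

    /*
     * Weigh each node's own cost by how often the node is expected to be
     * executed, instead of comparing the total plan cost once.
     */
    if (plan->total_cost * est_calls >= jit_above_cost)
        jitFlags |= PGJIT_PERFORM;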

David




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-29 Thread Jacob Champion
On Sat, 2022-03-26 at 11:36 -0700, Andres Freund wrote:
> > I also note that exposing it as a GUC means we have zero control over
> > who/what can read it.  Maybe that's not a problem, but it needs to be
> > thought about before we go down that path.
> 
> Yes, I think that's a fair concern.

I like that there's no builtin way, today, for a superuser to modify
the internal value; it strengthens the use as an auditing tool. Moving
this to a PGC_SU_BACKEND GUC seems to weaken that. And it looks like
PGC_INTERNAL is skipped during the serialization, so if we chose that
option, we'd need to write new code anyway?

We'd also need to guess whether the GUC system's serialization of NULL
as an empty string is likely to cause problems for any future auth
methods. My guess is "no", to be honest, but I do like maintaining the
distinction -- it feels safer.

v8 rebases over the recent SSL changes to get the cfbot green again.

Thanks,
--Jacob
commit bd02c608e3053217056464a31dff49344ca3a5f3
Author: Jacob Champion 
Date:   Tue Mar 29 16:26:52 2022 -0700

fixup! Add API to retrieve authn_id from SQL

diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index ac2848b931..2b6947fee5 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -428,7 +428,7 @@ $node->connect_ok(
 # Sanity-check pg_session_authn_id() for long ID strings
 my $res = $node->safe_psql('postgres',
	"SELECT pg_session_authn_id();",
-	connstr => "$dn_connstr user=ssltestuser sslcert=ssl/client-dn.crt sslkey=$key{'client-dn.key'}",
+	connstr => "$dn_connstr user=ssltestuser sslcert=ssl/client-dn.crt " . sslkey('client-dn.key'),
 );
 is($res, "CN=ssltestuser-dn,OU=Testing,OU=Engineering,O=PGDG", "users with cert authentication have entire DN as authn_id");
 
From 34e590e13240bde6a5daa0e3f866b56ad7a4c2f9 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 14 Feb 2022 08:10:53 -0800
Subject: [PATCH v8 1/3] Add API to retrieve authn_id from SQL

The authn_id field in MyProcPort is currently only accessible to the
backend itself.  Add a SQL function, pg_session_authn_id(), to expose
the field to triggers that may want to make use of it.
---
 doc/src/sgml/func.sgml| 26 +++
 src/backend/utils/adt/name.c  | 12 ++-
 src/include/catalog/pg_proc.dat   |  3 +++
 src/test/authentication/t/001_password.pl | 11 ++
 src/test/ssl/t/001_ssltests.pl|  7 ++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a9d62b408..454c15fde4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22280,6 +22280,32 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);

   
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_session_authn_id</primary>
+        </indexterm>
+        <function>pg_session_authn_id</function> ()
+        <returnvalue>text</returnvalue>
+       </para>
+       <para>
+        Returns the authenticated identity for the current connection, or
+        <literal>NULL</literal> if the user has not been authenticated.
+       </para>
+       <para>
+        The authenticated identity is an immutable identifier for the user
+        presented during the connection handshake; the exact format depends on
+        the authentication method in use. (For example, when using the
+        <literal>scram-sha-256</literal> auth method, the authenticated identity
+        is simply the username. When using the <literal>cert</literal> auth
+        method, the authenticated identity is the Distinguished Name of the
+        client certificate.) Even for auth methods which use the username as
+        the authenticated identity, this function differs from
+        <function>session_user</function> in that its return value cannot be
+        changed after login.
+       </para></entry>
+      </row>
+
   

 
diff --git a/src/backend/utils/adt/name.c b/src/backend/utils/adt/name.c
index e8bba3670c..662a7943ed 100644
--- a/src/backend/utils/adt/name.c
+++ b/src/backend/utils/adt/name.c
@@ -23,6 +23,7 @@
 #include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "libpq/libpq-be.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -257,7 +258,7 @@ namestrcmp(Name name, const char *str)
 
 
 /*
- * SQL-functions CURRENT_USER, SESSION_USER
+ * SQL-functions CURRENT_USER, SESSION_USER, PG_SESSION_AUTHN_ID
  */
 Datum
 current_user(PG_FUNCTION_ARGS)
@@ -271,6 +272,15 @@ session_user(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(DirectFunctionCall1(namein, CStringGetDatum(GetUserNameFromId(GetSessionUserId(), false;
 }
 
+Datum
+pg_session_authn_id(PG_FUNCTION_ARGS)
+{
+	if (!MyProcPort || !MyProcPort->authn_id)
+		PG_RETURN_NULL();
+
+	PG_RETURN_TEXT_P(cstring_to_text(MyProcPort->authn_id));
+}
+
 
 /*
  * SQL-functions CURRENT_SCHEMA, CURRENT_SCHEMAS
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index deb00307f6..27ab913402 100644
--- a/src/include/catalog/pg_proc.dat
+++ 

Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-30 09:05:44 +1300, David Rowley wrote:
> I really believe that the main problem here is that JIT only enables
> when the *total* plan cost reaches a certain threshold.

Yes, that is/was a clear design mistake. It wasn't quite as bad back when it
was written - partitioning blows the problem up by an order of magnitude or
three. The costing really needs to at least multiply the number of
to-be-compiled expressions with some cost to decide whether the cost of JITing
is worth it, rather than making a "flat" decision.
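
Roughly, a minimal sketch of that shape of costing -- every name here is
invented for illustration, nothing like it exists in the tree today:

	/*
	 * Weigh an estimated per-expression compilation cost against the
	 * total plan cost, instead of the current flat total-cost threshold.
	 */
	static bool
	jit_is_worthwhile(int num_compilable_exprs, double total_plan_cost)
	{
		/* hypothetical GUC: estimated cost of compiling one expression */
		double		compile_cost = num_compilable_exprs * jit_expr_compile_cost;

		/* JIT only when compilation is cheap relative to execution */
		return compile_cost * jit_payoff_factor < total_plan_cost;
	}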


> I did propose a patch to address this in [1]. It does need more work
> and I do plan to come back to it for v16.

FWIW, that doesn't seem quite right - won't it stop JITing e.g. on the inner
side of a nested loop, just because it's cheap, even though that's where the
bulk of the benefits comes from?

Greetings,

Andres Freund




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-03-29 Thread Nathan Bossart
Thanks for taking a look!

On Thu, Mar 24, 2022 at 01:17:01PM +1300, Thomas Munro wrote:
>  /* we're only handling directories here, skip if it's not ours */
> -if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
> +if (lstat(path, &statbuf) != 0)
> +ereport(ERROR,
> +(errcode_for_file_access(),
> + errmsg("could not stat file \"%s\": %m", path)));
> +else if (!S_ISDIR(statbuf.st_mode))
>  return;
> 
> Why is this a good place to silently ignore non-directories?
> StartupReorderBuffer() is already in charge of skipping random
> detritus found in the directory, so would it be better to do "if
> (get_dirent_type(...) != PGFILETYPE_DIR) continue" there, and then
> drop the lstat() stanza from ReorderBufferCleanupSerializedTXNs()
> completely?  Then perhaps its ReadDirExtended() should be using ERROR
> instead of INFO, so that missing/non-dir/b0rked directories raise an
> error.

My guess is that this was done because ReorderBufferCleanupSerializedTXNs()
is also called from ReorderBufferAllocate() and ReorderBufferFree().
However, it is odd that we just silently return if the slot path isn't a
directory in those cases.  I think we could use get_dirent_type() in
StartupReorderBuffer() as you suggested, and then we could let ReadDir()
ERROR for non-directories for the other callers of
ReorderBufferCleanupSerializedTXNs().  WDYT?
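
For concreteness, a rough sketch of that variant of StartupReorderBuffer()'s
loop (simplified, so details may not match the tree exactly):

	while ((logical_de = ReadDirExtended(logical_dir, "pg_replslot",
										 LOG)) != NULL)
	{
		char		path[MAXPGPATH];

		if (strcmp(logical_de->d_name, ".") == 0 ||
			strcmp(logical_de->d_name, "..") == 0)
			continue;

		snprintf(path, sizeof(path), "pg_replslot/%s", logical_de->d_name);

		/* we're only handling directories here, skip anything else */
		if (get_dirent_type(path, logical_de, false, LOG) != PGFILETYPE_DIR)
			continue;

		ReorderBufferCleanupSerializedTXNs(logical_de->d_name);
	}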

> I don't understand why it's reporting readdir() errors at INFO
> but unlink() errors at ERROR, and as far as I can see the other paths
> that reach this code shouldn't be sending in paths to non-directories
> here unless something is seriously busted and that's ERROR-worthy.

I agree.  I'll switch it to ReadDir() in the next revision so that we ERROR
instead of INFO.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Frontend error logging style

2022-03-29 Thread Tom Lane
Daniel Gustafsson  writes:
> On 29 Mar 2022, at 16:38, Peter Eisentraut 
>  wrote:
>> Most of these should probably be addressed separately from Tom's patch.

> Yeah, I think so too.

Agreed.  I tried to confine my patch to mechanical changes, except
for changing places where the detail/hint features could be used.
(I don't say that I yielded to temptation nowhere, because I don't
recall that for certain; but editing the message texts was not the
point of my patch.)

Feel free to work on a followup editing patch though.

Another thing that occurred to me as I looked at your list is that
I think a lot of these are nearly-can't-happen cases that we didn't
put a lot of effort into devising great error messages for.  Maybe
we should continue that approach, and in particular not put effort
into translating such messages.  That would suggest inventing
"pg_fatal_internal" and so forth, with the same functionality
except for not being gettext triggers, comparable to
errmsg_internal.  However, then somebody would have to make
judgment calls about which messages not to bother translating.
Do we want to go down that path?
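
FWIW, the mechanical side would presumably be trivial -- something like the
sketch below, mirroring how pg_fatal() itself is built, with the real work
being to keep the new name out of the GETTEXT_TRIGGERS lists in nls.mk:

	/* hypothetical: same behaviour as pg_fatal(), but not a gettext trigger */
	#define pg_fatal_internal(...) \
		do { \
			pg_log_error(__VA_ARGS__); \
			exit(1); \
		} while (0)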

regards, tom lane




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Andrew Dunstan


On 3/29/22 17:19, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 4:36 PM Thomas Munro  wrote:
>> It failed:
>>
>> https://cirrus-ci.com/task/5567070686412800
>> https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/001_basic_primary.log
>> https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/regress_log_001_basic
> Hmm. The log here is not very informative. It just says that the first
> time we tried to use the 'shell' target, it timed out. I suppose the
> most obvious explanation for that is that the shell command we
> executed timed out:
>
> qq{type con > "$escaped_backup_path%f"}
>
> But why should that be so? Does 'type con' not respond to EOF? I don't
> see how that can be the case. Is our implementation of pclose broken?
> If so, then I think COPY TO/FROM PROGRAM would be broken on Windows.
>

AIUI 'type con' is not the equivalent of Unix cat, especially w.r.t.
stdin. It's going to try to read from the console, not from stdin. It's
more the equivalent of 'cat /dev/tty'. So it's not at all surprising
that it hangs. I don't know of a Windows builtin that is equivalent to cat.
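
(For illustration, all the shell command actually needs is a trivial
stdin-to-stdout copier; this standalone sketch is the behaviour 'cat'
provides here:)

	#include <stdio.h>

	int
	main(void)
	{
		int			c;

		/* copy stdin to stdout until EOF, which 'type con' never sees */
		while ((c = getchar()) != EOF)
			putchar(c);
		return 0;
	}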


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 17:19:44 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 4:36 PM Thomas Munro  wrote:
> > It failed:
> >
> > https://cirrus-ci.com/task/5567070686412800
> > https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/001_basic_primary.log
> > https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/regress_log_001_basic
> 
> Hmm. The log here is not very informative. It just says that the first
> time we tried to use the 'shell' target, it timed out. I suppose the
> most obvious explanation for that is that the shell command we
> executed timed out:
> 
> qq{type con > "$escaped_backup_path%f"}
> 
> But why should that be so? Does 'type con' not respond to EOF?

This is trying to write stdin into a file? I think the problem may be that con
doesn't represent stdin, it's console input. I think consoles are a separate
thing from stdin on Windows - you can have console input even while stdin is
coming from a file or such.

Didn't immediately find a reference to a cat equivalent. Maybe just gzip the
file? That can read from stdin across platforms afaict.

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2022 at 3:04 PM Tom Lane  wrote:
> I think David's questions are sufficiently cogent and difficult
> that we should not add jit_warn_above_fraction at this time.

+1

-- 
Peter Geoghegan




Re: Frontend error logging style

2022-03-29 Thread Daniel Gustafsson
> On 29 Mar 2022, at 16:38, Peter Eisentraut 
>  wrote:
> 
> On 29.03.22 12:13, Daniel Gustafsson wrote:
> 
> Most of these should probably be addressed separately from Tom's patch.

Yeah, I think so too.  I'll await input from Tom on how he wants to do, but I'm
happy to take these to a new thread.  The main point of the review was to find
logic errors in the logging changes, these came as a bonus from reading them
all in one place.

>>> @@ -508,24 +502,15 @@ writefile(char *path, char **lines)
>>> if (fclose(out_file))
>>> -   {
>>> -   pg_log_error("could not write file \"%s\": %m", path);
>>> -   exit(1);
>>> -   }
>>> +   pg_fatal("could not write file \"%s\": %m", path);
>>> }
>> Should we update this message to differentiate it from the fwrite error case?
>> Something like "an error occurred during writing.."
> 
> Should be "could not close ...", no?

Yes, it should, not sure what I was thinking.

>>> @@ -2057,10 +2004,7 @@ check_locale_name(int category, const char *locale, 
>>> char **canonname)
>>> 
>>> save = setlocale(category, NULL);
>>> if (!save)
>>> -   {
>>> -   pg_log_error("setlocale() failed");
>>> -   exit(1);
>>> -   }
>>> +   pg_fatal("setlocale() failed");
>> Should this gain a hint message for those users who have no idea what this
>> really means?
> 
> My setlocale() man page says:
> 
> ERRORS
> No errors are defined.
> 
> So uh ... ;-)

Even more reason to be confused then =) We have a mixed bag in the tree on how
we handle errors from setlocale, in this case we could reword it to say
something like "failed to retrieve locale for %s, category" which IMO would be
slightly more informative. Might be academic though since I guess it rarely, if
ever, happens.

>>> --- a/src/bin/pg_basebackup/receivelog.c
>>> +++ b/src/bin/pg_basebackup/receivelog.c
>>> @@ -140,7 +140,7 @@ open_walfile(StreamCtl *stream, XLogRecPtr startpoint)
>>> /* fsync file in case of a previous crash */
>>> if (stream->walmethod->sync(f) != 0)
>>> {
>>> -   pg_log_fatal("could not fsync existing 
>>> write-ahead log file \"%s\": %s",
>>> +   pg_log_error("could not fsync existing 
>>> write-ahead log file \"%s\": %s",
>>>  fn, 
>>> stream->walmethod->getlasterror());
>>> stream->walmethod->close(f, CLOSE_UNLINK);
>>> exit(1);
>> In the case where we already have IO related errors, couldn't the close() 
>> call
>> cause an additional error message which potentially could be helpful for
>> debugging?
> 
> Yeah, I think in general we have been sloppy with reporting file-closing 
> errors properly.  Those presumably happen very rarely, but when they do, 
> things are probably very bad.

Agreed.  I'm not sure if Tom wants to add net new loggings in this patchset,
else we could do this separately.
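
If someone picks that up later, the shape would presumably be something like
this (a sketch only, not matching any committed code):

	pg_log_error("could not fsync existing write-ahead log file \"%s\": %s",
				 fn, stream->walmethod->getlasterror());
	/* also surface a close() failure instead of silently discarding it */
	if (stream->walmethod->close(f, CLOSE_UNLINK) != 0)
		pg_log_error("could not close write-ahead log file \"%s\": %s",
					 fn, stream->walmethod->getlasterror());
	exit(1);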

>> The ngettext() call seems a bit out of place here since we already know that
>> second form will be taken given the check on PQntuples(res).
> 
> See 
> 
>  for a similar case that explains why this is still correct and necessary.

Doh, I knew that, sorry.

>>> }
>>> +   pg_fatal("cannot cluster specific table(s) in all 
>>> databases");
>> An ngettext() candidate perhaps? There are more like this in main() hunks 
>> further down omitted for brevity here.
> 
> I would just rephrase this as
> 
>"cannot cluster specific tables in all databases"
> 
> This is still correct and sensible if the user specified just one table.

That's one way yes.  Mostly I'm just a bit allergic to the "foo(s)" pattern,
which my unscientifically based gut feeling says doesn't necessarily always
work well for translations.

--
Daniel Gustafsson   https://vmware.com/





remove reset_shared()

2022-03-29 Thread Nathan Bossart
Hi hackers,

Is there any reason to keep reset_shared() around anymore?  It is now just
a wrapper function for CreateSharedMemoryAndSemaphores(), and AFAICT the
information in the comments is already covered by comments in the shared
memory code.  I think it's arguable that the name of the function makes it
clear that it might recreate the shared memory, but if that is a concern,
perhaps we could rename the function to something like
CreateOrRecreateSharedMemoryAndSemaphores().

I've attached a patch that simply removes this wrapper function.  This is
admittedly just nitpicking, so I don't intend to carry this patch further
if anyone is opposed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 295e1e93a2ff4e655e85040d931c0d332d14b5bd Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 29 Mar 2022 14:56:47 -0700
Subject: [PATCH v1 1/1] Remove reset_shared().

This is now just a wrapper for CreateSharedMemoryAndSemaphores(),
and the information in the comments is already covered by comments
in the shared memory code.
---
 src/backend/postmaster/postmaster.c | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 80bb269599..f62d12e74a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -399,7 +399,6 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void reset_shared(void);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
@@ -1072,7 +1071,7 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Set up shared memory and semaphores.
 	 */
-	reset_shared();
+	CreateSharedMemoryAndSemaphores();
 
 	/*
 	 * Estimate number of openable files.  This must happen after setting up
@@ -2736,23 +2735,6 @@ InitProcessGlobals(void)
 }
 
 
-/*
- * reset_shared -- reset shared memory and semaphores
- */
-static void
-reset_shared(void)
-{
-	/*
-	 * Create or re-create shared memory and semaphores.
-	 *
-	 * Note: in each "cycle of life" we will normally assign the same IPC keys
-	 * (if using SysV shmem and/or semas).  This helps ensure that we will
-	 * clean up dead IPC objects if the postmaster crashes and is restarted.
-	 */
-	CreateSharedMemoryAndSemaphores();
-}
-
-
 /*
  * SIGHUP -- reread config files, and tell children to do same
  */
@@ -4109,7 +4091,7 @@ PostmasterStateMachine(void)
 		/* re-read control file into local memory */
 		LocalProcessControlFile(true);
 
-		reset_shared();
+		CreateSharedMemoryAndSemaphores();
 
 		StartupPID = StartupDataBase();
 		Assert(StartupPID != 0);
-- 
2.25.1



Re: Window Function "Run Conditions"

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 15:11:52 +1300, David Rowley wrote:
> One thing which I'm not sure about with the patch is how I'm handling
> the evaluation of the runcondition in nodeWindowAgg.c.  Instead of
> having ExecQual() evaluate an OpExpr such as "row_number() over (...)
> <= 10", I'm replacing the WindowFunc with the Var in the targetlist
> that corresponds to the given WindowFunc.  This saves having to double
> evaluate the WindowFunc. Instead, the value of the Var can be taken
> directly from the slot.  I don't know of anywhere else we do things
> quite like that.  The runcondition is slightly similar to HAVING
> clauses, but HAVING clauses don't work this way.

Don't HAVING clauses actually work pretty similar? Yes, they don't have a Var,
but for expression evaluation purposes an Aggref is nearly the same as a plain
Var:

EEO_CASE(EEOP_INNER_VAR)
{
int attnum = op->d.var.attnum;

/*
 * Since we already extracted all referenced columns from the
 * tuple with a FETCHSOME step, we can just grab the value
 * directly out of the slot's decomposed-data arrays.  But let's
 * have an Assert to check that that did happen.
 */
Assert(attnum >= 0 && attnum < innerslot->tts_nvalid);
*op->resvalue = innerslot->tts_values[attnum];
*op->resnull = innerslot->tts_isnull[attnum];

EEO_NEXT();
}
vs
EEO_CASE(EEOP_AGGREF)
{
/*
 * Returns a Datum whose value is the precomputed aggregate value
 * found in the given expression context.
 */
int aggno = op->d.aggref.aggno;

Assert(econtext->ecxt_aggvalues != NULL);

*op->resvalue = econtext->ecxt_aggvalues[aggno];
*op->resnull = econtext->ecxt_aggnulls[aggno];

EEO_NEXT();
}

specifically we don't re-evaluate expressions?

This is afaics slightly cheaper than referencing a variable in a slot.
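
(For comparison, the existing window-function step already has the same shape
as EEOP_AGGREF -- quoted from memory, so details may differ slightly:)

	EEO_CASE(EEOP_WINDOW_FUNC)
	{
		/*
		 * Like Aggref, just return a precomputed value from the
		 * expression context.
		 */
		WindowFuncExprState *wfunc = op->d.window_func.wfstate;

		Assert(econtext->ecxt_aggvalues != NULL);

		*op->resvalue = econtext->ecxt_aggvalues[wfunc->wfuncno];
		*op->resnull = econtext->ecxt_aggnulls[wfunc->wfuncno];

		EEO_NEXT();
	}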

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Mar 29, 2022 at 1:06 PM David Rowley  wrote:
>> That means that even if the total execution time of a plan was a true
>> reflection of the total estimated plan cost, then the fraction of time
>> spent (as is measured by jit_warn_above_fraction) doing JIT would
>> entirely depend on the number of expressions to compile.   Of course,
>> the planner's not that good, but does that not indicate that the JIT
>> costing should really account for the number of expressions and not
>> just the total plan cost?

> That's a good point.

I think David's questions are sufficiently cogent and difficult
that we should not add jit_warn_above_fraction at this time.
Maybe we'll eventually decide it's the best we can do; but I do
not think we have a problem there that's so pressing that we need
to rush out a partially-baked bandaid.  Especially not at the
tail end of the development cycle, with not much time to gain
experience with it before we ship.

regards, tom lane




Re: Extend compatibility of PostgreSQL::Test::Cluster

2022-03-29 Thread Andrew Dunstan


On 1/21/22 09:59, Andrew Dunstan wrote:
> On 1/21/22 02:47, Michael Paquier wrote:
>> On Tue, Jan 18, 2022 at 06:35:39PM -0500, Andrew Dunstan wrote:
>>> Here's a version that does that and removes some recent bitrot.
>> I have been looking at the full set of features of Cluster.pm and the
>> requirements behind v10 as minimal version supported, and nothing
>> really stands out.
>>
>> +   # old versions of walreceiver just set the application name to
>> +   # `walreceiver'
>>
>> Perhaps this should mention to which older versions this sentence
>> applies?
>
>
> Will do in the next version. FTR it's versions older than 12.
>
>

I'm not sure why this item has been moved to the next CF without any
discussion I could see on the mailing list. It was always my intention
to commit it this time, and I propose to do so tomorrow with the comment
Michael has requested above. The cfbot is still happy with it.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 4:36 PM Thomas Munro  wrote:
> It failed:
>
> https://cirrus-ci.com/task/5567070686412800
> https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/001_basic_primary.log
> https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/regress_log_001_basic

Hmm. The log here is not very informative. It just says that the first
time we tried to use the 'shell' target, it timed out. I suppose the
most obvious explanation for that is that the shell command we
executed timed out:

qq{type con > "$escaped_backup_path%f"}

But why should that be so? Does 'type con' not respond to EOF? I don't
see how that can be the case. Is our implementation of pclose broken?
If so, then I think COPY TO/FROM PROGRAM would be broken on Windows.

Any ideas?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread David G. Johnston
On Tue, Mar 29, 2022 at 1:37 PM Andres Freund  wrote:

> > Secondly, to do anything really meaningful you need to calculate deltas,
> > and be able to detect if some of the stats were reset for the particular
> > interval. And the stat_reset timestamp was designed to be a simple way
> > to detect that (instead of having to inspect all individual timestamps).
>
> I wonder if we should just split that per-database timestamp into two. One
> about the pg_stat_database contents, one about per-database stats? That
> doesn't have the same memory-usage-increase concerns as adding
> per-table/function reset stats.
>
>
That seems like only half a solution.  The reasoning for doing such a split
for pg_stat_database is identical to the reason that new fields should be
added to pg_stat_all_tables and pg_stat_user_functions (and possibly
others).

pg_stat_all_tables already has 16 bigint fields, 4 timestamptz fields, 2
names and an oid.  Seems like one more timestamptz field is a marginal
increase whose presence lets us keep the already implemented per-table
reset mechanism.  We should at least measure the impact that adding the
field has before deciding its presence is too costly.

But then, I'm going on design theory here, I don't presently have a horse
in this race.  And the fact no one has called us on this deficiency (not
that I've really been looking) does suggest the status quo is at least
realistic to maintain.  But on that basis I would just leave
pg_stat_database alone with its single field.  And then explain that the
single field covers everything, including the database statistics.  And so
while it is possible to reset a subset of the statistics the field really
loses its usefulness when that is done because the reset timestamp only
applies to a subset.  It only regains its meaning if/when one performs a
full stats reset.

David J.


Re: Commitfest Update

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 4:47 PM Justin Pryzby  wrote:
> If it were me, I'd move these out of the way somehow; WOA/RWF or move to June:
>
> https://commitfest.postgresql.org/37/3291/ Add PGDLLIMPORT to all direct or 
> indirect GUC variables

I plan to do this yet, but it seemed best to leave it until the end to
avoid creating as many merge conflicts.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Higher level questions around shared memory stats

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-29 16:24:02 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2022 at 3:17 PM Andres Freund  wrote:
> > 1) How to react to corrupted statsfiles?
> >
> > IMO it'd make more sense to just throw away all stats
> > in that case.
>
> That seems reasonable to me. I think there's some downside, in that
> stats are important, and having some of them might be better than
> having none of them. On the other hand, stats file corruption should
> be very rare, and if it's not, we need to fix that.

I think it's reasonably rare because in the cases where there'd be corruption
we'd typically not even have written the stats out / would have thrown them
away explicitly - we only read stats when starting without crash recovery.

So the "expected" case of corruption afaicts solely is a OS crash just after
the shutdown checkpoint completed?


> I think what's actually most important here is the error reporting. We need
> to make it clear, at least via log messages, that something bad has
> happened.

The message currently (on HEAD, but similarly with the patch) is:
	ereport(pgStatRunningInCollector ? LOG : WARNING,
			(errmsg("corrupted statistics file \"%s\"",
					statfile)));


> And maybe we should have, inside the stats system, something that
> keeps track of when the stats file was last recreated from scratch because
> of a corruption event, separately from when it was last intentionally reset.

That would be easy to add. Don't think we have a good place to show the
information right now - perhaps just new functions not part of any view?

I can think of these different times:

- Last time stats were removed due to starting up in crash recovery
- Last time stats were created from scratch, because no stats data file was
  present at startup
- Last time stats were thrown away due to corruption
- Last time a subset of stats were reset using one of the pg_reset* functions

Makes sense?
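
Purely to make that concrete, one hypothetical shape -- the struct and field
names are invented here:

	typedef struct PgStat_ResetTimestamps
	{
		TimestampTz crash_recovery_reset;	/* removed when starting up in
											 * crash recovery */
		TimestampTz created_from_scratch;	/* no stats file at startup */
		TimestampTz corruption_reset;		/* thrown away due to corruption */
		TimestampTz manual_reset;			/* last pg_stat_reset*() call */
	} PgStat_ResetTimestamps;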


> > 2) What do we want to do with stats when starting up in recovery?
> >
> > Today we throw out stats whenever crash recovery is needed. Which includes
> > starting up a replica DB_SHUTDOWNED_IN_RECOVERY.
> >
> > The shared memory stats patch loads the stats in the startup process 
> > (whether
> > there's recovery or not). It's obviously no problem to just mirror the 
> > current
> > behaviour, we just need to decide what we want?
> >
> > The reason we throw stats away before recovery seem to originate in concerns
> > around old stats being confused for new stats. Of course, "now" that we have
> > streaming replication, throwing away stats *before* recovery, doesn't 
> > provide
> > any sort of protection against that. In fact, currently there's no automatic
> > mechanism whatsoever to get rid of stats for dropped objects on a standby.
>
> Does redo update the stats?

With "update" do you mean generate new stats? In the shared memory stats patch
it triggers stats to be dropped, on HEAD it just resets all stats at startup.

Redo itself doesn't generate stats, but bgwriter, checkpointer, backends do.


Thanks,

Andres




Re: Commitfest Update

2022-03-29 Thread Justin Pryzby
On Thu, Mar 17, 2022 at 02:07:16PM -0400, Greg Stark wrote:
> So far this commitfest these 10 patches have been marked committed.
> That leaves us with 175 "Needs Review" and 28 "Ready for Comitter" so
> quite a ways to go ...

If it were me, I'd move these out of the way somehow; WOA/RWF or move to June:

https://commitfest.postgresql.org/37/3291/ Add PGDLLIMPORT to all direct or 
indirect GUC variables
https://commitfest.postgresql.org/37/3568/ PQexecParams binary handling example 
for REAL data type

Is anyone claiming these ?

https://commitfest.postgresql.org/37/3142/ Logging plan of the currently 
running query
https://commitfest.postgresql.org/37/3298/ Showing I/O timings spent 
reading/writing temp buffers in EXPLAIN
https://commitfest.postgresql.org/37/3050/ Extended statistics in EXPLAIN
https://commitfest.postgresql.org/37/3508/ Avoid smgrimmedsync() during index 
build and add unbuffered IO API

-- 
Justin




Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-29 Thread Andres Freund
Hi,

On 2022-03-24 13:12:24 +0100, Tomas Vondra wrote:
> I agree it should have been documented, but I still stand behind the
> current behavior. I'm not willing to die on this hill, but I think the
> reasoning was/is sound.
> 
> Firstly, calculating transactions/second, reads/second just from by
> looking at pg_stat_database data (counter and stat_reset) is nonsense.
> It might work for short time periods, but for anything longer it's bound
> to give you bogus results - you don't even know if the system was
> running at all, and so on.

It's not that you'd use it as the sole means of determining the time
delta. But using it to see if stats were reset between two samples of
pg_stat_database imo makes plenty sense.


> Secondly, to do anything really meaningful you need to calculate deltas,
> and be able to detect if some of the stats were reset for the particular
> interval. And the stat_reset timestamp was designed to be a simple way
> to detect that (instead of having to inspect all individual timestamps).

I wonder if we should just split that per-database timestamp into two. One
about the pg_stat_database contents, one about per-database stats? That
doesn't have the same memory-usage-increase concerns as adding
per-table/function reset stats.

Greetings,

Andres Freund




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2022 at 1:06 PM David Rowley  wrote:
> That means that even if the total execution time of a plan was a true
> reflection of the total estimated plan cost, then the fraction of time
> spent (as is measured by jit_warn_above_fraction) doing JIT would
> entirely depend on the number of expressions to compile.   Of course,
> the planner's not that good, but does that not indicate that the JIT
> costing should really account for the number of expressions and not
> just the total plan cost?

That's a good point. The difference between the actual cost of
executing the query with and without JIT'ing is what we care about,
for the most part. Maybe we could do a lot better just by inventing a
fairly crude model that captures the benefits of JIT'ing -- that's
what "counting the number of expressions" sounds like to me. This
model could probably assume that JIT'ing itself was free -- maybe
something this simple would work well.

The planner has traditionally used the cost units to determine the
cheapest plan; it compared total plan cost for plans that were taken
from the universe of possible plans for *one specific query*. That's
completely different to any model that expects plan costs to be
meaningful in an absolute sense. I'm not completely sure how much that
difference matters, but I suspect that the answer is: "it depends, but
often it matters a great deal".

-- 
Peter Geoghegan




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Thomas Munro
On Wed, Mar 30, 2022 at 8:30 AM Thomas Munro  wrote:
> On Wed, Mar 30, 2022 at 3:08 AM Robert Haas  wrote:
> > Here's a new version, hopefully rectifying that deficiency. I also add
> > a second patch here, documenting basebackup_to_shell.required_role,
> > because Joe Conway observed elsewhere that I forgot to do that.
>
> Here are your patches again, plus that kludge to make the CI run your
> TAP test on Windows.

It failed:

https://cirrus-ci.com/task/5567070686412800
https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/001_basic_primary.log
https://api.cirrus-ci.com/v1/artifact/task/5567070686412800/log/contrib/basebackup_to_shell/tmp_check/log/regress_log_001_basic




Re: CREATE TABLE ( .. STORAGE ..)

2022-03-29 Thread Matthias van de Meent
On Wed, 2 Feb 2022 at 11:13, Teodor Sigaev  wrote:
>
> Hi!
>
> > Are they both set to name or ColId? Although they are the same.
> >
>
> Thank you, fixed, that was just an oversight.
>
> > 2 For ColumnDef new member storage_name, did you miss the function 
> > _copyColumnDef()  _equalColumnDef()?
>
> Thank you, fixed

I noticed this and tried it out after needing it in a different
thread, so this is quite the useful addition.

I see that COMPRESSION and STORAGE now are handled slightly
differently in the grammar. Maybe we could standardize that a bit
more; so that we have only one `STORAGE [kind]` definition in the
grammar?

As I'm new to the grammar files; would you know the difference between
`name` and `ColId`, and why you would change from one to the other in
ALTER COLUMN STORAGE?

Thanks!

-Matthias




Re: Higher level questions around shared memory stats

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 3:17 PM Andres Freund  wrote:
> 1) How to react to corrupted statsfiles?
>
> IMO it'd make more sense to just throw away all stats
> in that case.

That seems reasonable to me. I think there's some downside, in that
stats are important, and having some of them might be better than
having none of them. On the other hand, stats file corruption should
be very rare, and if it's not, we need to fix that. I think what's
actually most important here is the error reporting. We need to make
it clear, at least via log messages, that something bad has happened.
And maybe we should have, inside the stats system, something that
keeps track of when the stats file was last recreated from scratch
because of a corruption event, separately from when it was last
intentionally reset.

> 2) What do we want to do with stats when starting up in recovery?
>
> Today we throw out stats whenever crash recovery is needed. Which includes
> starting up a replica DB_SHUTDOWNED_IN_RECOVERY.
>
> The shared memory stats patch loads the stats in the startup process (whether
> there's recovery or not). It's obviously no problem to just mirror the current
> behaviour, we just need to decide what we want?
>
> The reason we throw stats away before recovery seem to originate in concerns
> around old stats being confused for new stats. Of course, "now" that we have
> streaming replication, throwing away stats *before* recovery, doesn't provide
> any sort of protection against that. In fact, currently there's no automatic
> mechanism whatsoever to get rid of stats for dropped objects on a standby.

Does redo update the stats?

> 3) Does anybody care about preserving the mishmash style of function comments
> present in pgstat? There's at least:

I can't speak for everyone in the universe, but I think it should be
fine to clean that up.

> I'm inclined to just do a pass through the files and normalize the comment
> styles, in a separate commit.  Personally I'd go for no ---, no copy of the
> function name, no tabs - but I don't really care as long as it's consistent.

+1 for that style.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread David Rowley
On Wed, 30 Mar 2022 at 02:38, Robert Haas  wrote:
> I think WARNING is fine. After all, the parameter is called
> "jit_warn_above_fraction".

I had a think about this patch.  I guess it's a little similar to
checkpoint_warning. The good thing about the checkpoint_warning is
that in the LOG message we give instructions about how the DBA can fix
the issue, i.e increase max_wal_size.

With the proposed patch I see there is no hint about what might be
done to remove/reduce the warnings.  I imagine that's because it's not
all that clear which GUC should be changed. In my view, likely
jit_above_cost is the most relevant but there is also
jit_inline_above_cost, jit_optimize_above_cost, jit_tuple_deforming
and jit_expressions which are relevant too.

If we go with this patch,  the problem I see here is that the amount
of work the JIT compiler must do for a given query depends mostly on
the number of expressions that must be compiled in the query (also to
a lesser extent jit_inline_above_cost, jit_optimize_above_cost,
jit_tuple_deforming and jit_expressions). The DBA does not really have
much control over the number of expressions in the query.  All he or
she can do to get rid of the warning is something like increase
jit_above_cost.  After a few iterations of that, the end result is
that jit_above_cost is now high enough that JIT no longer triggers
for, say, that query to that table with 1000 partitions where no
plan-time pruning takes place.  Is that really a good thing? It likely
means that we just rarely JIT anything at all!

I really believe that the main problem here is that JIT only enables
when the *total* plan cost reaches a certain threshold.  The number of
expressions to be compiled is not a factor in the decision at all.
That means that even if the total execution time of a plan was a true
reflection of the total estimated plan cost, then the fraction of time
spent (as is measured by jit_warn_above_fraction) doing JIT would
entirely depend on the number of expressions to compile.   Of course,
the planner's not that good, but does that not indicate that the JIT
costing should really account for the number of expressions and not
just the total plan cost?

Anyway, what I'm trying to indicate here is that JIT is pretty much
impossible to tune properly and I don't really see why adding a
warning about it not being tuned correctly would help anyone.  I think
it would be better to focus on making improvements to how the JIT
costing works.

I did propose a patch to address this in [1]. It does need more work
and I do plan to come back to it for v16.

I'd much rather see us address the costing problem before adding some
warning, especially a warning where it's not clear how to make go
away.

David

[1] 
https://www.postgresql.org/message-id/CAApHDvpQJqLrNOSi8P1JLM8YE2C+ksKFpSdZg=q6sTbtQ-v=a...@mail.gmail.com




Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Thomas Munro
On Wed, Mar 30, 2022 at 3:08 AM Robert Haas  wrote:
> On Fri, Mar 25, 2022 at 5:52 PM Thomas Munro  wrote:
> > On Sat, Mar 26, 2022 at 9:55 AM Thomas Munro  wrote:
> > This line doesn't look too healthy:
> >
> > pg_basebackup: error: backup failed: ERROR:  shell command "type con >
> > "C:cirruscontrib asebackup_to_shell mp_check mp_test_tch3\base.tar""
> > failed
> >
> > I guess it's an escaping problem around \ characters.
>
> Oh, right. I didn't copy the usual incantation as completely as I
> should have done.
>
> Here's a new version, hopefully rectifying that deficiency. I also add
> a second patch here, documenting basebackup_to_shell.required_role,
> because Joe Conway observed elsewhere that I forgot to do that.

Here are your patches again, plus that kludge to make the CI run your
TAP test on Windows.
From de0eca033142d1fde643e503e5409887f38a628b Mon Sep 17 00:00:00 2001
From: Robert Haas 
Date: Tue, 29 Mar 2022 10:05:45 -0400
Subject: [PATCH v2 1/2] basebackup_to_shell: Add TAP test.

Per gripe from Andres Freund.
---
 contrib/basebackup_to_shell/Makefile       |   4 +
 contrib/basebackup_to_shell/t/001_basic.pl | 114 +
 2 files changed, 118 insertions(+)
 create mode 100644 contrib/basebackup_to_shell/t/001_basic.pl

diff --git a/contrib/basebackup_to_shell/Makefile b/contrib/basebackup_to_shell/Makefile
index f31dfaae9c..bbfebcc9a1 100644
--- a/contrib/basebackup_to_shell/Makefile
+++ b/contrib/basebackup_to_shell/Makefile
@@ -7,6 +7,10 @@ OBJS = \
 
 PGFILEDESC = "basebackup_to_shell - target basebackup to shell command"
 
+TAP_TESTS = 1
+
+export TAR
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/basebackup_to_shell/t/001_basic.pl b/contrib/basebackup_to_shell/t/001_basic.pl
new file mode 100644
index 00..df0f6d9926
--- /dev/null
+++ b/contrib/basebackup_to_shell/t/001_basic.pl
@@ -0,0 +1,114 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init('allows_streaming' => 1);
+$node->append_conf('postgresql.conf',
+   "shared_preload_libraries = 'basebackup_to_shell'");
+$node->start;
+$node->safe_psql('postgres', 'CREATE USER backupuser REPLICATION');
+$node->safe_psql('postgres', 'CREATE ROLE trustworthy');
+
+# For nearly all pg_basebackup invocations some options should be specified,
+# to keep test times reasonable. Using @pg_basebackup_defs as the first
+# element of the array passed to IPC::Run interpolate the array (as it is
+# not a reference to an array)...
+my @pg_basebackup_defs = ('pg_basebackup', '--no-sync', '-cfast');
+
+# This particular test module generally wants to run with -Xfetch, because
+# -Xstream is not supported with a backup target, and with -U backupuser.
+my @pg_basebackup_cmd = (@pg_basebackup_defs, '-U', 'backupuser', '-Xfetch');
+
+# Can't use this module without setting basebackup_to_shell.command.
+$node->command_fails_like(
+[ @pg_basebackup_cmd, '--target', 'shell' ],
+	qr/shell command for backup is not configured/,
+	'fails if basebackup_to_shell.command is not set');
+
+# Configure basebackup_to_shell.command and reload the configuration file.
+my $backup_path = PostgreSQL::Test::Utils::tempdir;
+my $escaped_backup_path = $backup_path;
+$escaped_backup_path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os);
+my $shell_command =
+	$PostgreSQL::Test::Utils::windows_os
+	? qq{type con > "$escaped_backup_path%f"}
+: qq{cat > "$escaped_backup_path/%f"};
+$node->append_conf('postgresql.conf',
+   "basebackup_to_shell.command='$shell_command'");
+$node->reload();
+
+# Should work now.
+$node->command_ok(
+[ @pg_basebackup_cmd, '--target', 'shell' ],
+	'backup with no detail: pg_basebackup');
+verify_backup('', $backup_path, "backup with no detail");
+
+# Should fail with a detail.
+$node->command_fails_like(
+[ @pg_basebackup_cmd, '--target', 'shell:foo' ],
+	qr/a target detail is not permitted because the configured command does not include %d/,
+	'fails if detail provided without %d');
+
+# Reconfigure to restrict access and require a detail.
+$shell_command =
+	$PostgreSQL::Test::Utils::windows_os
+	? qq{type con > "$escaped_backup_path%d.%f"}
+: qq{cat > "$escaped_backup_path/%d.%f"};
+$node->append_conf('postgresql.conf',
+   "basebackup_to_shell.command='$shell_command'");
+$node->append_conf('postgresql.conf',
+   "basebackup_to_shell.required_role='trustworthy'");
+$node->reload();
+
+# Should fail due to lack of permission.
+$node->command_fails_like(
+[ @pg_basebackup_cmd, '--target', 'shell' ],
+	qr/permission denied to use basebackup_to_shell/,
+	'fails if required_role not granted');
+
+# Should fail due to lack of a detail.
+$node->safe_psql('postgres', 'GRANT trustworthy TO backupuser');
+$node->command_fails_like(
+[ 

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 2:37 PM Tom Lane  wrote:
> Robert Haas  writes:
> > On Tue, Mar 29, 2022 at 2:17 PM Tom Lane  wrote:
> >> it's some random algorithm that they probably feel at liberty
> >> to change.
>
> > I guess that characterization surprises me. The man page for
> > getopt_long() says this, and has for a long time at least on systems
> > I've used:
>
> Yeah, they say they follow the POSIX spec when you set POSIXLY_CORRECT.
> What they don't spell out in any detail is what they do when you don't.
> We know that it involves rearranging the argv[] array behind the
> application's back, but not what the rules are for doing that.  In
> particular, they must have some undocumented and probably not very safe
> method for deciding which arguments are neither switches nor switch
> arguments.

I mean, I think of an option as something that starts with '-'. The
documentation contains a caveat that says: "The special argument ‘--’
forces in all cases the end of option scanning." So I think I would
expect it just looks for arguments starting with '-' that do not
follow an argument that is exactly "--".



https://github.com/gcc-mirror/gcc/blob/master/libiberty/getopt.c

   If an element of ARGV starts with '-', and is not exactly "-" or "--",
   then it is an option element.  The characters of this element
   (aside from the initial '-') are option characters.  If `getopt'
   is called repeatedly, it returns successively each of the option characters
   from each of the option elements.

OK - so I was off slightly. Either "-" or "--" terminates the options
list. Apart from that anything starting with "-" is an option.
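
A tiny standalone program makes the difference visible (illustration only):
run it as "./demo -T x name -S y".  With glibc, argv is permuted and both
switches are reported; with a POSIX-style getopt (or with POSIXLY_CORRECT
set), scanning stops at "name" and "-S y" is left over -- which is exactly
the createdb failure discussed in this thread.

	#include <stdio.h>
	#include <unistd.h>

	int
	main(int argc, char **argv)
	{
		int			c;

		while ((c = getopt(argc, argv, "T:S:")) != -1)
			printf("option -%c = %s\n", c, optarg ? optarg : "");

		/* anything getopt() did not consume */
		for (; optind < argc; optind++)
			printf("non-option: %s\n", argv[optind]);
		return 0;
	}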

I think you're overestimating the level of mystery that's present
here, as well as the likelihood that the rules could ever be changed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Higher level questions around shared memory stats

2022-03-29 Thread Andres Freund
Hi,

Separate from the minutia in [1] I'd like to discuss a few questions of more
general interest. I'll post another question or two later.


1) How to react to corrupted statsfiles?

In HEAD we stop reading stats at the point we detect the stats file to be
corrupted. The contents of the currently read stats "entry" are zeroed out,
but prior entries stay loaded.

This means that we can get an entirely inconsistent state of the stats, without
really knowing:

E.g. if a per-db stats file fails to load halfway through, we'll have already
loaded the pg_stat_database entry. Thus pg_stat_database.stats_reset will not
be reset, but at the same time we'll only have part of the data in
pg_stat_database.


That seems like a mess? IMO it'd make more sense to just throw away all stats
in that case.  Either works for the shmstats patch, it's just a few lines to
change.

Unless there's support for throwing away stats, I'll change the shmstats patch
to match the current behaviour. Currently it resets all "global" stats like
bgwriter, checkpointer etc whenever there's a failure, but keeps already
loaded database / table / function stats, which doesn't make much sense.


2) What do we want to do with stats when starting up in recovery?

Today we throw out stats whenever crash recovery is needed. Which includes
starting up a replica DB_SHUTDOWNED_IN_RECOVERY.

The shared memory stats patch loads the stats in the startup process (whether
there's recovery or not). It's obviously no problem to just mirror the current
behaviour, we just need to decide what we want?

The reason we throw stats away before recovery seem to originate in concerns
around old stats being confused for new stats. Of course, "now" that we have
streaming replication, throwing away stats *before* recovery, doesn't provide
any sort of protection against that. In fact, currently there's no automatic
mechanism whatsoever to get rid of stats for dropped objects on a standby.


In the shared memory stats patch, dropped catalog objects cause the stats to
also be dropped on the standby. So that whole concern is largely gone.

I'm inclined to, for now, either leave the behaviour exactly as it currently
is, or to continuing throwing away stats during normal crash recovery, but to
stop doing so for DB_SHUTDOWNED_IN_RECOVERY.

I think it'd now be feasible to just never throw stats away during crash
recovery, by writing out stats during checkpoint/restartpoints, but that's
clearly work for later. The alternatives in the prior paragraph in contrast is
just a question of when to call the "restore" and when the "throw away"
function, a call that has to be made anyway.
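
In code form either policy is just a question of which branch startup takes --
roughly (a sketch; function names as in the patch, give or take):

	/* after reading the control file, before deciding on WAL replay */
	if (ControlFile->state == DB_SHUTDOWNED ||
		ControlFile->state == DB_SHUTDOWNED_IN_RECOVERY)
		pgstat_restore_stats();	/* keep the stats written at shutdown */
	else
		pgstat_discard_stats();	/* crash recovery: throw stats away */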


3) Does anybody care about preserving the mishmash style of function comments
present in pgstat? There's at least:

/* --
 * pgstat_write_db_statsfile() -
 *  Write the stat file for a single database.
 *
 *  If writing to the permanent file (happens when the collector is

/* --
 * pgstat_get_replslot_entry
 *
 * Return the entry of replication slot stats with the given name. Return
 * NULL if not found and the caller didn't request to create it.

/*
 * Lookup the hash table entry for the specified table. If no hash
 * table entry exists, initialize it, if the create parameter is true.
 * Else, return NULL.
 */

/* --
 * pgstat_send() -
 *
 *  Send out one statistics message to the collector
 * --
 */

/*
 * PostPrepare_PgStat
 *  Clean up after successful PREPARE.
 *
 * Note: AtEOXact_PgStat is not called during PREPARE.
 */


Separator "--" lines or not. Summary indented with two tabs. Longer comment
indented with a tab. The function name in the comment or not. Function parens
or not. And quite a few more differences.

I find these a *pain* to maintain. Most function comments have to be touched
to remove references to the stats collector and invariably such changes end up
changing formatting as well. Whenever adding a new function I have an internal
debate about which comment style to follow.

I've already spent a considerable amount of time going through the patch to
reduce incidental "format" changes, but there's quite a few more instances
that need to be cleaned up.

I'm inclined to just do a pass through the files and normalize the comment
styles, in a separate commit.  Personally I'd go for no ---, no copy of the
function name, no tabs - but I don't really care as long as it's consistent.

Greetings,

Andres Freund

[1] https://postgr.es/m/20220329072417.er26q5pxc4pbldn7%40alap3.anarazel.de




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2022 at 10:03 AM Robert Haas  wrote:
> I can understand this version of the commit message. Woohoo! I like
> understanding things.

That's good news.

> I think the header comments for FreezeMultiXactId() focus way too much
> on what the caller is supposed to do and not nearly enough on what
> FreezeMultiXactId() itself does. I think to some extent this also
> applies to the comments within the function body.

To some extent this is a legitimate difference in style. I myself
don't think that it's intrinsically good to have these sorts of
comments. I just think that it can be the least worst thing when a
function is intrinsically written with one caller and one very
specific set of requirements in mind. That is pretty much a matter of
taste, though.

> I think I understand what the first paragraph of the header comment
> for heap_tuple_needs_freeze() is trying to say, but the second one is
> quite confusing. I think this is again because it veers into talking
> about what the caller should do rather than explaining what the
> function itself does.

I wouldn't have done it that way if the function wasn't called
heap_tuple_needs_freeze().

I would be okay with removing this paragraph if the function was
renamed to reflect the fact it now tells the caller something about
the tuple having an old XID/MXID relative to the caller's own XID/MXID
cutoffs. Maybe the function name should be heap_tuple_would_freeze(),
making it clear that the function merely tells caller what
heap_prepare_freeze_tuple() *would* do, without presuming to tell the
vacuumlazy.c caller what it *should* do about any of the information
it is provided.

Then it becomes natural to see the boolean return value and the
changes the function makes to caller's relfrozenxid/relminmxid tracker
variables as independent.
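
Concretely, the signature might then look something like this (a sketch of
the proposed rename, not committed code):

	/*
	 * Return true if heap_prepare_freeze_tuple() would freeze any of the
	 * tuple's XIDs/MXIDs given these cutoffs, while also ratcheting back
	 * the caller's relfrozenxid/relminmxid trackers as needed.
	 */
	bool
	heap_tuple_would_freeze(HeapTupleHeader tuple,
							TransactionId cutoff_xid,
							MultiXactId cutoff_multi,
							TransactionId *relfrozenxid_out,
							MultiXactId *relminmxid_out);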

> I don't like the statement-free else block in lazy_scan_noprune(). I
> think you could delete the else{} and just put that same comment there
> with one less level of indentation. There's a clear "return false"
> just above so it shouldn't be confusing what's happening.

Okay, will fix.

> The comment hunk at the end of lazy_scan_noprune() would probably be
> better if it said something more specific than "caller can tolerate
> reduced processing." My guess is that it would be something like
> "caller does not need to do something or other."

I meant "caller can tolerate not pruning or freezing this particular
page". Will fix.

> I have my doubts about whether the overwrite-a-future-relfrozenxid
> behavior is any good, but that's a topic for another day. I suggest
> keeping the words "it seems best to", though, because they convey a
> level of tentativeness, which seems appropriate.

I agree that it's best to keep a tentative tone here. That code was
written following a very specific bug in pg_upgrade several years
back. There was a very recent bug fixed only last year, by commit
74cf7d46.

FWIW I tend to think that we'd have a much better chance of catching
that sort of thing if we'd had better relfrozenxid instrumentation
before now. Now you'd see a negative value in the "new relfrozenxid:
%u, which is %d xids ahead of previous value" part of the autovacuum
log message in the event of such a bug. That's weird enough that I bet
somebody would notice and report it.

> I am surprised to see you write in maintenance.sgml that the VACUUM
> which most recently advanced relfrozenxid will typically be the most
> recent aggressive VACUUM. I would have expected something like "(often
> the most recent VACUUM)".

That's always been true, and will only be slightly less true in
Postgres 15 -- the fact is that we only need to skip one all-visible
page to lose out, and that's not unlikely with tables that aren't
quite small with all the patches from v12 applied (we're still much
too naive). The work that I'll get into Postgres 15 on VACUUM is very
valuable as a basis for future improvements, but not all that valuable
to users (improved instrumentation might be the biggest benefit in 15,
or maybe relminmxid advancement for certain types of applications).

I still think that we need to do more proactive page-level freezing to
make relfrozenxid advancement happen in almost every VACUUM, but even
that won't quite be enough. There are still cases where we need to
make a choice about giving up on relfrozenxid advancement in a
non-aggressive VACUUM -- all-visible pages won't completely go away
with page-level freezing. At a minimum we'll still have edge cases
like the case where heap_lock_tuple() unsets the all-frozen bit. And
pg_upgrade'd databases, too.

0002 structures the logic for skipping using the VM in a way that will
make the choice to skip or not skip all-visible pages in
non-aggressive VACUUMs quite natural. I suspect that
SKIP_PAGES_THRESHOLD was always mostly just about relfrozenxid
advancement in non-aggressive VACUUM, all along. We can do much better
than SKIP_PAGES_THRESHOLD, especially if we preprocess the entire

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 29, 2022 at 2:17 PM Tom Lane  wrote:
>> it's some random algorithm that they probably feel at liberty
>> to change.

> I guess that characterization surprises me. The man page for
> getopt_long() says this, and has for a long time at least on systems
> I've used:

Yeah, they say they follow the POSIX spec when you set POSIXLY_CORRECT.
What they don't spell out in any detail is what they do when you don't.
We know that it involves rearranging the argv[] array behind the
application's back, but not what the rules are for doing that.  In
particular, they must have some undocumented and probably not very safe
method for deciding which arguments are neither switches nor switch
arguments.

(Actually, if I recall previous discussions properly, another stumbling
block to doing anything here is that we'd also have to change all our
documentation to explain it.  Fixing the command line synopses would
be a mess already, and explaining the rules would be worse.)

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 2:17 PM Tom Lane  wrote:
> Robert Haas  writes:
> > On Tue, Mar 29, 2022 at 1:53 PM Tom Lane  wrote:
> >> That test script is expecting glibc-like laxness of switch
> >> parsing.  Put the switches before the non-switch arguments.
>
> > I just did that. :-)
>
> Yup, you pushed while I was typing.
>
> FWIW, I don't think it's "Windows" enforcing this, it's our own
> src/port/getopt[_long].c.  If there were a well-defined spec
> for what glibc does with such cases, it might be interesting to
> try to make our version bug-compatible with theirs.  But AFAIK
> it's some random algorithm that they probably feel at liberty
> to change.

I guess that characterization surprises me. The man page for
getopt_long() says this, and has for a long time at least on systems
I've used:

ENVIRONMENT
 POSIXLY_CORRECT  If set, option processing stops when the first non-
  option is found and a leading `-' or `+' in the
  optstring is ignored.

And also this:

BUGS
 The argv argument is not really const as its elements may be permuted
 (unless POSIXLY_CORRECT is set).

Doesn't that make it pretty clear what the GNU version is doing?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 29, 2022 at 1:53 PM Tom Lane  wrote:
>> That test script is expecting glibc-like laxness of switch
>> parsing.  Put the switches before the non-switch arguments.

> I just did that. :-)

Yup, you pushed while I was typing.

FWIW, I don't think it's "Windows" enforcing this, it's our own
src/port/getopt[_long].c.  If there were a well-defined spec
for what glibc does with such cases, it might be interesting to
try to make our version bug-compatible with theirs.  But AFAIK
it's some random algorithm that they probably feel at liberty
to change.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 1:53 PM Tom Lane  wrote:
> Andres Freund  writes:
> > Looks like there's some problem with commandline parsing?
>
> That test script is expecting glibc-like laxness of switch
> parsing.  Put the switches before the non-switch arguments.

I just did that. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Tom Lane
Andres Freund  writes:
> Looks like there's some problem with commandline parsing?

That test script is expecting glibc-like laxness of switch
parsing.  Put the switches before the non-switch arguments.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 1:35 PM Andres Freund  wrote:
> # Running: createdb -T foobar2 foobar6 -S wal_log
> createdb: error: too many command-line arguments (first is "wal_log")
> Try "createdb --help" for more information.
> not ok 31 - createdb -T foobar2 foobar6 -S wal_log exit code 0
>
> Looks like there's some problem with commandline parsing?

Apparently getopt_long() is fussier on Windows. I have committed a fix.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Andres Freund
On 2022-03-29 11:55:05 -0400, Robert Haas wrote:
> That commit having been reverted, I committed v6 instead. Let's see
> what breaks...

It fails in CI (for the mirror of the postgres repo on github):
https://cirrus-ci.com/task/6279465603956736?logs=test_bin#L121
tap test log: 
https://api.cirrus-ci.com/v1/artifact/task/6279465603956736/log/src/bin/scripts/tmp_check/log/regress_log_020_createdb
postmaster log: 
https://api.cirrus-ci.com/v1/artifact/task/6279465603956736/log/src/bin/scripts/tmp_check/log/020_createdb_main.log

recent versions failed similarly on cfbot:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/37/3192
https://cirrus-ci.com/task/5217140407009280?logs=test_bin#L121

# Running: createdb -T foobar2 foobar6 -S wal_log
createdb: error: too many command-line arguments (first is "wal_log")
Try "createdb --help" for more information.
not ok 31 - createdb -T foobar2 foobar6 -S wal_log exit code 0

#   Failed test 'createdb -T foobar2 foobar6 -S wal_log exit code 0'
#   at t/020_createdb.pl line 117.
not ok 32 - create database with WAL_LOG strategy: SQL found in server log

#   Failed test 'create database with WAL_LOG strategy: SQL found in server log'
#   at t/020_createdb.pl line 117.
#   ''
# doesn't match '(?^:statement: CREATE DATABASE foobar6 STRATEGY wal_log 
TEMPLATE foobar2)'
# Running: createdb -T foobar2 foobar7 -S file_copy
createdb: error: too many command-line arguments (first is "file_copy")
Try "createdb --help" for more information.
not ok 33 - createdb -T foobar2 foobar7 -S file_copy exit code 0

#   Failed test 'createdb -T foobar2 foobar7 -S file_copy exit code 0'
#   at t/020_createdb.pl line 122.
not ok 34 - create database with FILE_COPY strategy: SQL found in server log

#   Failed test 'create database with FILE_COPY strategy: SQL found in server 
log'
#   at t/020_createdb.pl line 122.
#   ''
# doesn't match '(?^:statement: CREATE DATABASE foobar7 STRATEGY file_copy 
TEMPLATE foobar2)'

Looks like there's some problem with commandline parsing?

Greetings,

Andres Freund




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-29 Thread Peter Eisentraut

On 29.03.22 15:09, Aleksander Alekseev wrote:

Hi hackers,


I took another look at the patchset. Personally I don't think it will get much
better than it is now. I'm inclined to change the status of the CF entry to
"Ready for Committer" unless anyone disagrees.



About v25-0002-Use-64-bit-format-to-output-XIDs.patch:
I don't see the point of applying this now. It doesn't make PG15 any
better. It's just a patch part of which we might need later.


AFAICT, this patch provides no actual functionality change, so holding
it a bit for early in the PG16 cycle wouldn't lose anything.


OK. As I understand we still have a consensus that v30-0001 (SLRU refactoring,
not the XID formatting) is targeting PG15, so I changed the CF entry to
"Ready for Committer" for this single patch. I rechecked it again on the
current `master` branch without the other patches and it is OK. cfbot is happy
with the patchset as well.


That isn't really what I meant.  When I said

> At this point, I'm more concerned that code review is still finding
> bugs, and that we have no test coverage for any of this

I meant especially the SLRU refactoring patch.





Re: GSoC: Improve PostgreSQL Regression Test Coverage

2022-03-29 Thread Stephen Frost
Greetings,

* Christos Maris (christos.c.ma...@gmail.com) wrote:
> This is Christos Maris , and
> I'd like to declare my interest in the GSoC project mentioned above.
> I'm an experienced Fullstack Developer, but still an open-source beginner.
> That, combined with the fact that I'd really like to contribute to Postgres
> and learn Perl, makes this particular project ideal for me.

Glad to hear that you're interested!

> I want to ask if the mentor of this project (Stephen Frost) could help me
> with my application (if that's allowed, of course)?

You're welcome to send it to me off-list to take a look at and I can
provide comments on it, but note that I can't provide excessive help in
crafting it.  This is to be your proposal after all, not mine.

Be sure to review this:

https://google.github.io/gsocguides/student/writing-a-proposal

When it comes to working on your proposal.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Corruption during WAL replay

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 12:34 PM Stephen Frost  wrote:
> Anyhow, this whole thread has struck me as a good reason to polish those
> patches off and add on top of them an extended checksum ability, first,
> independent of TDE, and remove the dependency of those patches from the
> TDE effort and instead allow it to just leverage that ability.  I still
> suspect we'll have some folks who will want TDE w/o a per-page nonce and
> that could be an option but we'd be able to support TDE w/ integrity
> pretty easily, which would be fantastic.

Yes, I like that idea. Once we get beyond feature freeze, perhaps we
can try to coordinate to avoid duplication of effort -- or absence of
effort.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-29 Thread Robert Haas
On Sun, Mar 27, 2022 at 11:24 PM Peter Geoghegan  wrote:
> Attached is v12. My current goal is to commit all 3 patches before
> feature freeze. Note that this does not include the more complicated
> patch including with previous revisions of the patch series (the
> page-level freezing work that appeared in versions before v11).

Reviewing 0001, focusing on the words in the patch file much more than the code:

I can understand this version of the commit message. Woohoo! I like
understanding things.

I think the header comments for FreezeMultiXactId() focus way too much
on what the caller is supposed to do and not nearly enough on what
FreezeMultiXactId() itself does. I think to some extent this also
applies to the comments within the function body.

On the other hand, the header comments for heap_prepare_freeze_tuple()
seem good to me. If I were thinking of calling this function, I would
know how to use the new arguments. If I were looking for bugs in it, I
could compare the logic in the function to what these comments say it
should be doing. Yay.

I think I understand what the first paragraph of the header comment
for heap_tuple_needs_freeze() is trying to say, but the second one is
quite confusing. I think this is again because it veers into talking
about what the caller should do rather than explaining what the
function itself does.

I don't like the statement-free else block in lazy_scan_noprune(). I
think you could delete the else{} and just put that same comment there
with one less level of indentation. There's a clear "return false"
just above so it shouldn't be confusing what's happening.

The comment hunk at the end of lazy_scan_noprune() would probably be
better if it said something more specific than "caller can tolerate
reduced processing." My guess is that it would be something like
"caller does not need to do something or other."

I have my doubts about whether the overwrite-a-future-relfrozenxid
behavior is any good, but that's a topic for another day. I suggest
keeping the words "it seems best to", though, because they convey a
level of tentativeness, which seems appropriate.

I am surprised to see you write in maintenance.sgml that the VACUUM
which most recently advanced relfrozenxid will typically be the most
recent aggressive VACUUM. I would have expected something like "(often
the most recent VACUUM)".

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Full support for index LP_DEAD hint bits on standby

2022-03-29 Thread Michail Nikolaev
UPD:

> I was thinking it is safe to have additional hint bits
> on primary, but it seems like no.

Oh, sorry for the mistake, it is about standby of course.

> BTW I am wondering if it is possible
> to achieve the same situation by pg_rewind and standby promotion…

Looks like it is impossible, because wal_log_hints is required in
order to use pg_rewind.
It is possible to achieve a situation with some additional LP_DEAD on
standby compared to the primary, but any change on primary would cause
FPI, so LP_DEAD will be cleared.

Thanks,
Michail.




Re: Corruption during WAL replay

2022-03-29 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Mar 25, 2022 at 10:34 AM Tom Lane  wrote:
> > I dunno.  Compatibility and speed concerns aside, that seems like an awful
> > lot of bits to be expending on every page compared to the value.
> 
> I dunno either, but over on the TDE thread people seemed quite willing
> to expend like 16-32 *bytes* for page verifiers and nonces and things.

Absolutely.

> For compatibility and speed reasons, I doubt we could ever get by with
> doing that in every cluster, but I do have some hope of introducing
> something like that someday at least as an optional feature. It's not
> like a 16-bit checksum was state-of-the-art even when we introduced
> it. We just did it because we had 2 bytes that we could repurpose
> relatively painlessly, and not any larger number. And that's still the
> case today, so at least in the short term we will have to choose some
> other solution to this problem.

I agree that this would be great as an optional feature.  Those patches
to allow the system to be built with reserved space for $whatever would
allow us to have a larger checksum for those who want it and perhaps a
nonce with TDE for those who wish that in the future.  I mentioned
before that I thought it might be a good way to introduce page-level
epochs for 64bit xids too though it never seemed to get much traction.

Anyhow, this whole thread has struck me as a good reason to polish those
patches off and add on top of them an extended checksum ability, first,
independent of TDE, and remove the dependency of those patches from the
TDE effort and instead allow it to just leverage that ability.  I still
suspect we'll have some folks who will want TDE w/o a per-page nonce and
that could be an option but we'd be able to support TDE w/ integrity
pretty easily, which would be fantastic.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: make MaxBackends available in _PG_init

2022-03-29 Thread Robert Haas
On Sat, Mar 26, 2022 at 1:23 PM Andres Freund  wrote:
> > And indeed, any third party code that previously needed to access what
> > MaxBackends is supposed to store should already be using that formula, and
> > the new GetMaxBackends() doesn't do anything about it.
>
> It couldn't rely on MaxBackends before. It can't rely on GetMaxBackends()
> now. You can see why I think that what you want is unrelated to the
> introduction of GetMaxBackends().

It's not, though, because the original proposal was to change things
around so that the value of MaxBackends would have been reliable in
_PG_init(). If we'd done that, then extensions that are using it in
_PG_init() would have gone from being buggy to being not-buggy. But
since you advocated against that change, we instead committed
something that caused them to go from being buggy to failing outright.
That's pretty painful for people with such extensions. And IMHO, it's
*much* more legitimate to want to size a data structure based on the
value of MaxBackends than it is for extensions to override GUC values.
If we can make the latter use case work in a sane way, OK, although I
have my doubts about how sane it really is, but it can't be at the
expense of telling extensions that have been (incorrectly) using
MaxBackends in _PG_init() that we're throwing them under the bus.

IMHO, the proper thing to do if certain GUC values are required for an
extension to work is to put that information in the documentation and
error out at an appropriate point if the user does not follow the
directions. Then this issue does not arise. But there's no reasonable
workaround for being unable to size data structures based on
MaxBackends.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-29 Thread Robert Haas
On Mon, Mar 28, 2022 at 3:08 PM Robert Haas  wrote:
> smgrcreate() as we would for most WAL records or whether it should be
> adopting the new system introduced by
> 49d9cfc68bf4e0d32a948fe72d5a0ef7f464944e. I wrote about this concern
> over here:
>
> http://postgr.es/m/CA+TgmoYcUPL+WOJL2ZzhH=zmrhj0iOQ=icfm0suyqbbqzea...@mail.gmail.com
>
> But apart from that question your adaptations here look reasonable to me.

That commit having been reverted, I committed v6 instead. Let's see
what breaks...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-29 Thread Matthias van de Meent
On Tue, 29 Mar 2022 at 16:04, James Coleman  wrote:
>
> Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for
> no-rewrite ALTER TABLE
> .. ALTER TYPE." However the docs still claim that "a table rewrite is
> not needed; but any indexes on the affected columns must still be
> rebuilt."

Although indexes might indeed not need a rebuild, in many cases they
still do (e.g. when the type changes between text and a domain of text
with a different collation).

I think that the current state of the docs is better in that regard,
as it explicitly warns about index rebuilds even when the letter of the
docs is incorrect: there are indeed cases where we don't need to rebuild
the indexes, but spelling those out would require more elaboration.
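
To illustrate both sides, a minimal sketch (my understanding of the
current behavior, not taken from the patch):

    CREATE TABLE t (v varchar);
    CREATE INDEX ON t (v);
    -- Binary-coercible change: no table rewrite and, since 367bc42,
    -- no index rebuild either.
    ALTER TABLE t ALTER COLUMN v TYPE text;
    -- Unconstrained domain over text with a different collation: still
    -- no table rewrite, but the index must be rebuilt because its
    -- ordering semantics change.
    CREATE DOMAIN text_c AS text COLLATE "C";
    ALTER TABLE t ALTER COLUMN v TYPE text_c;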

Kind regards,

Matthias van de Meent




Re: [PATCH] Enable SSL library detection via PQsslAttribute

2022-03-29 Thread Jacob Champion
On Tue, 2022-03-29 at 14:08 +0200, Daniel Gustafsson wrote:
> Pushed with a few small tweaks to make it match project style, thanks!

Thank you!

--Jacob


Re: pg14 psql broke \d datname.nspname.relname

2022-03-29 Thread Robert Haas
On Fri, Mar 25, 2022 at 3:42 PM Mark Dilger
 wrote:
> I think your change is fine, so I've rolled it into this next patch.

OK, cool. Here are some more comments.

In describe.c, why are the various describeWhatever functions
returning true when validateSQLNamePattern returns false? It seems to
me that they should return false. That would cause exec_command_d() to
set status = PSQL_CMD_ERROR, which seems appropriate. I wondered
whether we should return PSQL_CMD_ERROR only for database errors, but
that doesn't seem to be the case. For example, exec_command_a() sets
PSQL_CMD_ERROR for a failure in do_pset().

pg_dump's prohibit_crossdb_refs() has a special case for when you are not
connected to a database, but psql's validateSQLNamePattern() treats it
as an invalid cross-database reference. Maybe that should be
consistent, or just the other way around. After all, I would expect
pg_dump to just bail out if we lose the database connection, but psql
may continue, because we can reconnect. Putting more code into the
tool where reconnecting doesn't really make sense seems odd.

processSQLNamePattern() documents that dotcnt can be NULL, and then
asserts that it isn't.

processSQLNamePattern() introduces new local variables schema and
name, which account for most of the notational churn in that function.
I can't see a reason why those changes are needed. You do test whether
the new variables are NULL in a couple of places, but you could
equally well test schemavar/namevar/altnamevar directly. Actually, I
don't really understand why this function needs any changes other than
passing dbnamebuf and dotcnt through to patternToSQLRegex(). Is there
a reason?

patternToSQLRegex() restructures the system of buffers as well, and I
don't understand the purpose of that either. It sort of looks like the
idea might be to relax the rule against dbname.relname patterns, but
why would we want to do that? If we don't want to do that, why remove
the assertion?

It is not very nice that patternToSQLRegex() ends up repeating the
locution "if (left && want_literal_dbname)
appendPQExpBufferChar(_literal, '"')" a whole bunch of times.
Suppose we remove all that. Then, in the if (!inquotes && ch == '.')
case, if left = true, we copy "cp - pattern" bytes starting at
"pattern" into the buffer. Wouldn't that accomplish the same thing
with less code?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add header support to text format and matching feature

2022-03-29 Thread Peter Eisentraut

On 25.03.22 05:21, Greg Stark wrote:

Great to see the first of the two patches committed.

It looks like the second patch got some feedback from Peter in
February and has been marked "Waiting on author" ever since.

Remi, will you have a chance to look at this this month?

Peter, are these comments blocking? If Remi doesn't have a chance to
work on it, should I move it to the next release, or could it be fixed
up and committed?


I will try to finish up this patch.




Re: Identify missing publications from publisher while create/alter subscription.

2022-03-29 Thread vignesh C
On Tue, Mar 29, 2022 at 4:12 PM Amit Kapila  wrote:
>
> On Tue, Mar 29, 2022 at 11:01 AM Amit Kapila  wrote:
> >
> > On Sat, Mar 26, 2022 at 7:53 PM vignesh C  wrote:
> > >
> > > The patch was not applying on HEAD, attached patch which is rebased on
> > > top of HEAD.
> > >
> >
> > IIUC, this patch provides an option that allows us to give an error if
> > while creating/altering a subscription, the user gives non-existent
> > publications. I am not sure how useful it is to add such behavior via
> > an option especially when we know that it can occur in some other ways
> > like after creating the subscription, users can independently drop
> > publication from publisher. I think it could be useful to provide
> > additional information here but it would be better if we can follow
> > Euler's suggestion [1] in the thread where he suggested issuing a
> > WARNING if the publications don't exist and document that the
> > subscription catalog can have non-existent publications.
> >
> > I think we should avoid adding new options unless they are really
> > required and useful.
> >
>
> *
> +connect_and_check_pubs(Subscription *sub, List *publications)
> +{
> + char*err;
> + WalReceiverConn *wrconn;
> +
> + /* Load the library providing us libpq calls. */
> + load_file("libpqwalreceiver", false);
> +
> + /* Try to connect to the publisher. */
> + wrconn = walrcv_connect(sub->conninfo, true, sub->name, &err);
> + if (!wrconn)
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not connect to the publisher: %s", err));
>
> I think it won't be a good idea to add new failure modes in existing
> commands especially if we decide to make it non-optional. I think we
> can do this check only in case we are already connecting to the
> publisher.

I have modified it to do the check only in the CREATE SUBSCRIPTION,
ALTER SUBSCRIPTION ... ADD PUBLICATION, and ALTER SUBSCRIPTION ... SET
PUBLICATION cases, where we are connecting to the publisher.
The changes for the same are present at v16 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm2zHd9FAn%2BMAQ3x-2%2BRnu8%3DRu%2BBeQXokfNBKo6sNAVb3A%40mail.gmail.com
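
As a quick illustration of the intended behavior with the v16 patch (the
names here are made up, and the warning text is approximate):

    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION missing_pub;
    -- WARNING:  publication "missing_pub" does not exist in the publisher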

Regards,
Vignesh




Re: Identify missing publications from publisher while create/alter subscription.

2022-03-29 Thread vignesh C
On Tue, Mar 29, 2022 at 11:02 AM Amit Kapila  wrote:
>
> On Sat, Mar 26, 2022 at 7:53 PM vignesh C  wrote:
> >
> > The patch was not applying on HEAD, attached patch which is rebased on
> > top of HEAD.
> >
>
> IIUC, this patch provides an option that allows us to give an error if
> while creating/altering a subscription, the user gives non-existent
> publications. I am not sure how useful it is to add such behavior via
> an option especially when we know that it can occur in some other ways
> like after creating the subscription, users can independently drop
> publication from publisher. I think it could be useful to provide
> additional information here but it would be better if we can follow
> Euler's suggestion [1] in the thread where he suggested issuing a
> WARNING if the publications don't exist and document that the
> subscription catalog can have non-existent publications.
>
> I think we should avoid adding new options unless they are really
> required and useful.
>
> [1] - 
> https://www.postgresql.org/message-id/a2f2fba6-40dd-44cc-b40e-58196bb77f1c%40www.fastmail.com

Thanks for the suggestion, I have changed the patch as suggested.
Attached v16 patch has the changes for the same.

Regards,
Vignesh
From 4232956ff31192e903736e921b92aac06243c171 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sat, 26 Mar 2022 19:43:27 +0530
Subject: [PATCH v16] Identify missing publications from publisher while
 create/alter subscription.

Creating/altering a subscription succeeds without throwing any warning when
we specify a publication which does not exist in the publisher. This patch
checks whether the specified publications are present in the publisher and
throws a warning if any of them is missing.

Author: Vignesh C
Reviewed-by: Amit Kapila, Bharath Rupireddy, Japin Li, Dilip Kumar, Euler Taveira, Ashutosh Sharma
Discussion: https://www.postgresql.org/message-id/flat/20220321235957.i4jtjn4wyjucex6b%40alap3.anarazel.de#b846fd4ef657cfaa8c9890f044e4
---
 doc/src/sgml/ref/create_subscription.sgml |   7 ++
 src/backend/commands/subscriptioncmds.c   | 132 ++
 src/test/subscription/t/007_ddl.pl|  61 ++
 3 files changed, 178 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index b701752fc9..761d34116c 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -356,6 +356,13 @@ CREATE SUBSCRIPTION subscription_name
 
+  
+   pg_subscription
+   can have non-existent publications if a non-existent publication was
+   specified during CREATE SUBSCRIPTION or if an existing
+   publication was removed.
+  
+
  
 
  
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index abebffdf3b..6db88f8443 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -375,6 +375,103 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 	}
 }
 
+/*
+ * Add publication names from the list to a string.
+ */
+static void
+get_publications_str(List *publications, StringInfo dest, bool quote_literal)
+{
+	ListCell   *lc;
+	bool		first = true;
+
+	Assert(list_length(publications) > 0);
+
+	foreach(lc, publications)
+	{
+		char	   *pubname = strVal(lfirst(lc));
+
+		if (first)
+			first = false;
+		else
+			appendStringInfoString(dest, ", ");
+
+		if (quote_literal)
+			appendStringInfoString(dest, quote_literal_cstr(pubname));
+		else
+		{
+			appendStringInfoChar(dest, '"');
+			appendStringInfoString(dest, pubname);
+			appendStringInfoChar(dest, '"');
+		}
+	}
+}
+
+/*
+ * Check that the specified publication(s) are present in the publisher.
+ */
+static void
+check_publications(WalReceiverConn *wrconn, List *publications)
+{
+	WalRcvExecResult *res;
+	StringInfo 		cmd;
+	TupleTableSlot *slot;
+	List	   *publicationsCopy = NIL;
+	Oid			tableRow[1] = {TEXTOID};
+
+	cmd = makeStringInfo();
+	appendStringInfoString(cmd, "SELECT t.pubname FROM\n"
+" pg_catalog.pg_publication t WHERE\n"
+" t.pubname IN (");
+	get_publications_str(publications, cmd, true);
+	appendStringInfoChar(cmd, ')');
+
+	res = walrcv_exec(wrconn, cmd->data, 1, tableRow);
+	pfree(cmd->data);
+	pfree(cmd);
+
+	if (res->status != WALRCV_OK_TUPLES)
+		ereport(ERROR,
+errmsg_plural("could not receive publication from the publisher: %s",
+			  "could not receive list of publications from the publisher: %s",
+			  list_length(publications),
+			  res->err));
+
+	publicationsCopy = list_copy(publications);
+
+	/* Process publication(s). */
+	slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple);
+	while (tuplestore_gettupleslot(res->tuplestore, true, false, slot))
+	{
+		char	   *pubname;
+		bool		isnull;
+
+		pubname = TextDatumGetCString(slot_getattr(slot, 1, &isnull));
+		Assert(!isnull);
+
+		/* Delete the publication present in publisher from the list. */
+		

Re: Frontend error logging style

2022-03-29 Thread Peter Eisentraut

On 29.03.22 12:13, Daniel Gustafsson wrote:

Most of these should probably be addressed separately from Tom's patch.


@@ -508,24 +502,15 @@ writefile(char *path, char **lines)



if (fclose(out_file))
-   {
-   pg_log_error("could not write file \"%s\": %m", path);
-   exit(1);
-   }
+   pg_fatal("could not write file \"%s\": %m", path);
}


Should we update this message to differentiate it from the fwrite error case?
Something like "an error occurred during writing.."


Should be "could not close ...", no?


@@ -2057,10 +2004,7 @@ check_locale_name(int category, const char *locale, char 
**canonname)

save = setlocale(category, NULL);
if (!save)
-   {
-   pg_log_error("setlocale() failed");
-   exit(1);
-   }
+   pg_fatal("setlocale() failed");


Should this gain a hint message for those users who have no idea what this
really means?


My setlocale() man page says:

ERRORS
 No errors are defined.

So uh ... ;-)


@@ -3089,18 +2979,14 @@ main(int argc, char *argv[])
else if (strcmp(optarg, "libc") == 0)
locale_provider = COLLPROVIDER_LIBC;
else
-   {
-   pg_log_error("unrecognized locale provider: 
%s", optarg);
-   exit(1);
-   }
+   pg_fatal("unrecognized locale provider: 
%s", optarg);


Should this %s be within quotes to match how we usually emit user-input?


Usually not done after colon, but could be.


@@ -1123,9 +1097,9 @@ verify_btree_slot_handler(PGresult *res, PGconn *conn, 
void *context)
pg_log_warning("btree index \"%s.%s.%s\": btree checking 
function returned unexpected number of rows: %d",
   rel->datinfo->datname, 
rel->nspname, rel->relname, ntups);
if (opts.verbose)
-   pg_log_info("query was: %s", rel->sql);
-   pg_log_warning("Are %s's and amcheck's versions 
compatible?",
-  progname);
+   pg_log_warning_detail("Query was: %s", 
rel->sql);
+   pg_log_warning_hint("Are %s's and amcheck's versions 
compatible?",
+   progname);


Should "amcheck's" be a %s parameter to make translation reusable (which it
miht never be) and possibly avoid translation mistake?


We don't have translations set up for contrib modules.  Otherwise, this 
kind of thing would probably be something to look into.



--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -140,7 +140,7 @@ open_walfile(StreamCtl *stream, XLogRecPtr startpoint)
/* fsync file in case of a previous crash */
if (stream->walmethod->sync(f) != 0)
{
-   pg_log_fatal("could not fsync existing write-ahead log file 
\"%s\": %s",
+   pg_log_error("could not fsync existing write-ahead log file 
\"%s\": %s",
 fn, 
stream->walmethod->getlasterror());
stream->walmethod->close(f, CLOSE_UNLINK);
exit(1);


In the case where we already have IO related errors, couldn't the close() call
cause an additional error message which potentially could be helpful for
debugging?


Yeah, I think in general we have been sloppy with reporting file-closing 
errors properly.  Those presumably happen very rarely, but when they do, 
things are probably very bad.



@@ -597,31 +570,19 @@ main(int argc, char *argv[])



if (ControlFile->data_checksum_version == 0 &&
mode == PG_MODE_CHECK)
-   {
-   pg_log_error("data checksums are not enabled in cluster");
-   exit(1);
-   }
+   pg_fatal("data checksums are not enabled in cluster");



Fatal seems sort of out of place here; it's not really a case of "something
terrible happened" but rather "the preconditions weren't met".  Couldn't these
be a single pg_log_error erroring out with the reason in a pg_log_detail?


"fatal" means error plus exit, so this seems ok.  There is no separate 
log level for really-bad-error.



@@ -721,7 +721,7 @@ setFilePath(ArchiveHandle *AH, char *buf, const char 
*relativeFilename)
dname = ctx->directory;

if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)


Unrelated, but shouldn't that be >= MAXPGPATH?


Seems correct to me as is: the computed length already includes the
terminating zero byte, so a total exactly equal to MAXPGPATH still fits
the buffer.


@@ -14951,18 +14942,18 @@ createViewAsClause(Archive *fout, const TableInfo 
*tbinfo)



-   

Re: Frontend error logging style

2022-03-29 Thread Peter Eisentraut

On 27.03.22 22:19, Tom Lane wrote:

Here's a rebase up to today's HEAD.  I've fixed the merge problems,
but there may be some stray new error calls that could be converted
to use pg_fatal() and haven't been.  I don't want to do a full
fresh scan of the code until we're about ready to commit this.


This looks like a good improvement to me.

I think I would want the program name/location also in front of the 
detail and hint lines.  I need to think about this a bit more.  This 
shouldn't hold up this patch; it would be a quick localized change. 
(I'm also thinking about providing a separate color code for the 
secondary messages.  Again, this could be a quick follow-up patch.)


The one change I didn't like was

-   pg_log_error("The program \"%s\" is needed by %s 
but was not found in the\n"
-"same directory as 
\"%s\".\n"

-"Check your installation.",
+   pg_log_error("the program \"%s\" is needed by %s 
but was not found in the same directory as \"%s\"",
 "postgres", progname, 
full_path);


This appears to prioritize the guideline "don't punctuate error message 
as full sentence" over what should be the actual guideline "don't make 
the error message a full sentence".


There are other occurrences of a similar message that were not changed 
in the same way by the patch.  Maybe we should leave this one alone in 
this patch and consider rewriting the message instead.





Restructure ALTER TABLE notes to clarify table rewrites and verification scans

2022-03-29 Thread James Coleman
Over in the "Document atthasmissing default optimization avoids
verification table scan" thread David Johnston (who I've cc'd)
suggested that my goals might be better implemented with a simple
restructuring of the Notes section of the ALTER TABLE docs. I think
this is also along the lines of Tom Lane's suggestion of a "unified
discussion", but I've chosen for now (and simplicity's sake) not to
break this into an entirely new page. If reviewers feel that is
warranted at this stage, I can do that, but it seems to me that for
now this improves the structure and sets us up for such a future page
but falls short of sufficient content to move into its own page.

One question on the changes: the current docs say "when attaching a
new partition it may be scanned to verify that existing rows meet the
partition constraint". The word "may" there seems to suggest there may
also be occasions where scans are not needed, but no examples of such
cases are present. I'm not immediately aware of such a case. Are these
constraints always validated? If not, in which cases can such a scan
be skipped?

I've also incorporated the slight correction in "Correct docs re:
rewriting indexes when table rewrite is skipped" [2] here, and will
rebase this patch if that gets committed.

Thanks,
James Coleman

1: 
https://www.postgresql.org/message-id/CAKFQuwZyBaJjNepdTM3kO8PLaCpRdRd8%2BmtLT8QdE73oAsGv8Q%40mail.gmail.com
2: 
https://www.postgresql.org/message-id/CAAaqYe90Ea3RG%3DA7H-ONvTcx549-oQhp07BrHErwM%3DAyH2ximg%40mail.gmail.com


v1-0001-Restructure-ALTER-TABLE-notes-to-clarify-table-re.patch
Description: Binary data


Re: pgsql: Add 'basebackup_to_shell' contrib module.

2022-03-29 Thread Robert Haas
On Fri, Mar 25, 2022 at 5:52 PM Thomas Munro  wrote:
> On Sat, Mar 26, 2022 at 9:55 AM Thomas Munro  wrote:
> > https://api.cirrus-ci.com/v1/artifact/task/5637156969381888/log/contrib/basebackup_to_shell/tmp_check/log/regress_log_001_basic
>
> This line doesn't look too healthy:
>
> pg_basebackup: error: backup failed: ERROR:  shell command "type con >
> "C:cirruscontrib asebackup_to_shell mp_check mp_test_tch3\base.tar""
> failed
>
> I guess it's an escaping problem around \ characters.

Oh, right. I didn't copy the usual incantation as completely as I
should have done.

Here's a new version, hopefully rectifying that deficiency. I also add
a second patch here, documenting basebackup_to_shell.required_role,
because Joe Conway observed elsewhere that I forgot to do that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0002-Document-basebackup_to_shell.required_role.patch
Description: Binary data


v2-0001-basebackup_to_shell-Add-TAP-test.patch
Description: Binary data


Correct docs re: rewriting indexes when table rewrite is skipped

2022-03-29 Thread James Coleman
Back in 367bc42 (for 9.2!) we "avoid[ed] index rebuild[ing] for
no-rewrite ALTER TABLE
.. ALTER TYPE." However the docs still claim that "a table rewrite is
not needed; but any indexes on the affected columns must still be
rebuilt."

I've attached a simple patch to update the docs to match the current behavior.

Thanks,
James Coleman
From f6515a5f5f39d728b4cad837480c3ca953ed4623 Mon Sep 17 00:00:00 2001
From: jcoleman 
Date: Tue, 29 Mar 2022 13:56:39 +
Subject: [PATCH v1] Docs: When table rewriting is skipped indexes are not
 rebuilt

In 367bc42 (for 9.2!) we avoid index rebuild for no-rewrite ALTER TABLE
.. ALTER TYPE. Update the docs to match.
---
 doc/src/sgml/ref/alter_table.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 5c0735e08a..39931c97c8 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1366,7 +1366,7 @@ WITH ( MODULUS numeric_literal, REM
 existing column, if the USING clause does not change
 the column contents and the old type is either binary coercible to the new
 type or an unconstrained domain over the new type, a table rewrite is not
-needed; but any indexes on the affected columns must still be rebuilt.
+needed.
 Table and/or index rebuilds may take a
 significant amount of time for a large table; and will temporarily require
 as much as double the disk space.
-- 
2.20.1



Re: Document atthasmissing default optimization avoids verification table scan

2022-03-29 Thread James Coleman
On Sun, Mar 27, 2022 at 11:12 PM David G. Johnston
 wrote:
>
> On Sun, Mar 27, 2022 at 11:17 AM James Coleman  wrote:
>>
>> Hmm,  I didn't realize that was project policy,
>
>
> Guideline/Rule of Thumb is probably a better concept.

Ah, OK, thanks.

>>
>> but I'm a bit
>> surprised given that the sentence which 0001 replaces seems like a
>> direct violation of that also: "In neither case is a rewrite of the
>> table required."
>>
>
> IMO mostly due to the structuring of the paragraphs; something like the 
> following makes it less problematic (and as shown below may be sufficient to 
> address the purpose of this patch)
>
> """
> [...]
> The following alterations of the table require the entire table, and/or its 
> indexes, to be rewritten; which may take a significant amount of time for a 
> large table, and will temporarily require as much as double the disk space.
>
> Changing the type of an existing column will require the entire table and its 
> indexes to be rewritten. As an exception, if the USING clause does not change 
> the column contents and the old type is either binary coercible to the new 
> type, or an unconstrained domain over the new type, a table rewrite is not 
> needed; but any indexes on the affected columns must still be rewritten.
>
> Adding a column with a volatile DEFAULT also requires the entire table and 
> its indexes to be rewritten.
>
> The reason a non-volatile (or absent) DEFAULT does not require a rewrite of 
> the table is because the DEFAULT expression (or NULL) is evaluated at the 
> time of the statement and the result is stored in the table's metadata.
>
> The following alterations of the table require that it be scanned in its 
> entirety to ensure that no existing values are contrary to the new 
> constraints placed on the table.  Constraints backed by indexes will scan the 
> table as a side-effect of populating the new index with data.
>
> Adding a CHECK constraint requires scanning the table to verify that existing 
> rows meet the constraint.  The same goes for adding a NOT NULL constraint to 
> an existing column.
>
> A newly attached partition requires scanning the table to verify that 
> existing rows meet the partition constraint.
>
> A foreign key constraint requires scanning the table to verify that all 
> existing values exist on the referenced table.
>
> The main reason for providing the option to specify multiple changes in a 
> single ALTER TABLE is that multiple table scans or rewrites can thereby be 
> combined into a single pass over the table.
>
> Scanning a large table to verify a new constraint can take a long time, and 
> other updates to the table are locked out until the ALTER TABLE ADD 
> CONSTRAINT command is committed. For CHECK and FOREIGN KEY constraints there 
> is an option, NOT VALID, that reduces the impact of adding a constraint on 
> concurrent updates. With NOT VALID, the ADD CONSTRAINT command does not scan 
> the table and can be committed immediately. After that, a VALIDATE CONSTRAINT 
> command can be issued to verify that existing rows satisfy the constraint. 
> The validation step does not need to lock out concurrent updates, since it 
> knows that other transactions will be enforcing the constraint for rows that 
> they insert or update; only pre-existing rows need to be checked. Hence, 
> validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being 
> altered. (If the constraint is a foreign key then a ROW SHARE lock is also 
> required on the table referenced by the constraint.) In addition to improving 
> concurrency, it can be useful to use NOT VALID and VALIDATE CONSTRAINT in 
> cases where the table is known to contain pre-existing violations. Once the 
> constraint is in place, no new violations can be inserted, and the existing 
> problems can be corrected at leisure until VALIDATE CONSTRAINT finally 
> succeeds.
>
> The DROP COLUMN form does not physically remove the column, but simply makes 
> it invisible to SQL operations. Subsequent insert and update operations in 
> the table will store a null value for the column. Thus, dropping a column is 
> quick but it will not immediately reduce the on-disk size of your table, as 
> the space occupied by the dropped column is not reclaimed. The space will be 
> reclaimed over time as existing rows are updated.
>
> To force immediate reclamation of space occupied by a dropped column, you can 
> execute one of the forms of ALTER TABLE that performs a rewrite of the whole 
> table. This results in reconstructing each row with the dropped column 
> replaced by a null value.
>
> The rewriting forms of ALTER TABLE are not MVCC-safe. After a table rewrite, 
> the table will appear empty to concurrent transactions, if they are using a 
> snapshot taken before the rewrite occurred. See Section 13.5 for more details.
> [...]
> """
>
> I'm liking the idea of breaking out multiple features into their own 
> sentences or paragraphs instead of saying:
>
> "Adding a column 

Re: Add parameter jit_warn_above_fraction

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 6:09 AM Julien Rouhaud  wrote:
> The last remaining thing is whether logging at WARNING level is the correct
> choice.  I'm personally fine with it, because the people who are going to use
> it will probably use the same approach as for log_min_duration_statements:
> enable it first with a high value, and check if that just leads to massive
> log spam.  If not, see if there's anything that needs attention and fix it,
> otherwise keep lowering it and keep going.

I think WARNING is fine. After all, the parameter is called
"jit_warn_above_fraction". Yeah, we could also rename the parameter,
but I think "jit_notice_above_fraction" would be harder to understand.
It feels very intuitive to just say "warn me if X thing happens" and I
don't see a reason why we shouldn't just do that.
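
Usage would then presumably be as simple as this sketch (the threshold
value is arbitrary):

    -- Emit a WARNING whenever JIT accounts for more than half of a
    -- query's total execution time.
    SET jit_warn_above_fraction = 0.50;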

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 9:28 AM Alvaro Herrera  wrote:
> OK, this is a bug that's been open for years.   A fix can be committed
> after the feature freeze anyway.

+1

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Accommodate startup process in a separate ProcState array slot instead of in MaxBackends slots.

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 3:21 AM Bharath Rupireddy
 wrote:
> Thanks. LGTM.

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-29 Thread Alvaro Herrera
On 2022-Mar-29, Robert Haas wrote:

> I'm fine with that approach, but I'd like to ask that we proceed
> expeditiously, because I have another patch that I want to commit that
> touches this area. I can commit to helping with whatever we decide to
> do here, but I don't want to keep that patch on ice while we figure it
> out and then have it miss the release.

OK, this is a bug that's been open for years.   A fix can be committed
after the feature freeze anyway.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Add LZ4 compression in pg_dump

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 1:03 AM Michael Paquier  wrote:
> > Then, if people are willing to adopt the syntax that the
> > backup_compression.c/h stuff supports as a project standard (+1 from
> > me) we can go the other way and rename that stuff to be more generic,
> > taking backup out of the name.
>
> I am not sure about the specification part which is only used by base
> backups that has no client-server requirements, so option values would
> still require their own grammar.

I don't know what you mean by this. I think the specification stuff
could be reused in a lot of places. If you can ask for a base backup
with zstd:level=3,long=1,fancystuff=yes or whatever we end up with,
why not enable exactly the same for every other place that uses
compression? I don't know what "client-server requirements" is or what
that has to do with this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-29 Thread Aleksander Alekseev
Hi hackers,

> I took another look at the patchset. Personally I don't think it will get much
> better than it is now. I'm inclined to change the status of the CF entry to
> "Ready for Committer" unless anyone disagrees.

> > About v25-0002-Use-64-bit-format-to-output-XIDs.patch:
> > I don't see the point of applying this now. It doesn't make PG15 any
> > better. It's just a patch part of which we might need later.
>
> AFAICT, this patch provides no actual functionality change, so holding
> it a bit for early in the PG16 cycle wouldn't lose anything.

OK. As I understand we still have a consensus that v30-0001 (SLRU refactoring,
not the XID formatting) is targeting PG15, so I changed the CF entry to
"Ready for Committer" for this single patch. I rechecked it again on the
current `master` branch without the other patches and it is OK. cfbot is happy
with the patchset as well.

-- 
Best regards,
Aleksander Alekseev




Jumble Query with COERCE_SQL_SYNTAX

2022-03-29 Thread Yura Sokolov
Good day.

v14 introduced a way to get the original text for some kinds of expressions
using the new 'funcformat' value COERCE_SQL_SYNTAX:
- EXTRACT(part from timestamp)
- (text IS [form] NORMALIZED)
and others.

The EXTRACT and NORMALIZED constructs mentioned above have parts that are
not ordinary arguments but pure syntax. For instance, there is no way to:

PREPARE a(text) as select extract($1 from now());
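
Trying it fails outright (a sketch from my testing; the exact wording may
differ):

    ERROR:  syntax error at or near "$1"

while a syntactic keyword in that position is of course accepted:

    SELECT extract(epoch FROM now());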

But JumbleExpr doesn't distinguish this and marks the argument as a
variable constant, i.e. remembers it in 'clocations'.

I believe such a "non-variable constant" should not be jumbled as a
replaceable thing.

In our case (extended pg_stat_statements), we attempt to generalize the
plan and then explain the generalized plan. But using the constant list
from JumbleState, we mistakenly replace the first argument of the EXTRACT
expression with a parameter. Then 'get_func_sql_syntax' fails on the
assertion "first argument is text constant".

Sure, we could work around this in our plan mutator by skipping such a
first argument. But I wonder: is it correct at all not to count it as a
non-modifiable syntax part in JumbleExpr?

--

regards,

Sokolov Yura
y.soko...@postgrespro.ru
funny.fal...@gmail.coma





Re: refactoring basebackup.c (zstd workers)

2022-03-29 Thread Robert Haas
On Mon, Mar 28, 2022 at 8:11 PM Tom Lane  wrote:
> I suspect Robert wrote it that way intentionally --- but if so,
> I agree it could do with more than zero commentary.

Well, the point is, we stop advancing kwend when we get to the end of
the keyword, and *vend when we get to the end of the value. If there's
a value, the end of the keyword can't have been the end of the string,
but the end of the value might have been. If there's no value, the end
of the keyword could be the end of the string.

Maybe if I just put that last sentence into the comment it's clear enough?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-03-29 Thread Robert Haas
On Tue, Mar 29, 2022 at 7:37 AM Alvaro Herrera  wrote:
> I think we should revert this patch and do it again using the other
> approach: create a stub directory during recovery that can be deleted
> later.

I'm fine with that approach, but I'd like to ask that we proceed
expeditiously, because I have another patch that I want to commit that
touches this area. I can commit to helping with whatever we decide to
do here, but I don't want to keep that patch on ice while we figure it
out and then have it miss the release.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Column Filtering in Logical Replication

2022-03-29 Thread Tomas Vondra



On 3/29/22 13:47, Amit Kapila wrote:
> On Tue, Mar 29, 2022 at 4:33 PM Tomas Vondra
>  wrote:
>>
>> On 3/29/22 12:00, Amit Kapila wrote:

 Thanks, I'll take a look later.

>>>
>>> This is still failing [1][2].
>>>
>>> [1] - 
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican=2022-03-28%2005%3A16%3A53
>>> [2] - 
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris=2022-03-24%2013%3A13%3A08
>>>
>>
>> AFAICS we've concluded this is a pre-existing issue, not something
>> introduced by a recently committed patch, and I don't think there's any
>> proposal how to fix that.
>>
> 
> I have suggested in email [1] that increasing values
> max_sync_workers_per_subscription/max_logical_replication_workers
> should solve this issue. Now, whether this is a previous issue or
> behavior can be debatable but I think it happens for the new test case
> added by commit c91f71b9dc.
> 

IMHO that'd be just hiding the actual issue, which is the failure to
sync the subscription in some circumstances. We should fix that, not
just make sure the tests don't trigger it.

>> So I've put that on the back burner until
>> after the current CF.
>>
> 
> Okay, last time you didn't mention that you wanted to look at it after
> the CF. I just assumed that you wanted to take a look after pushing the main
> column list patch, so I thought of sending a reminder, but I am fine if
> you want to look at it after the CF.
> 

OK, sorry for not being clearer in my response.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: speed up a logical replica setup

2022-03-29 Thread Peter Eisentraut

On 22.03.22 02:25, Andres Freund wrote:

On 2022-02-21 09:09:12 -0300, Euler Taveira wrote:

A new tool called pg_subscriber does this conversion and is tightly integrated
with Postgres.


Given that this has been submitted just before the last CF, is a patch of
nontrivial size, and hasn't made significant progress, ISTM it should be
moved to the next CF?


done




Re: unlogged sequences

2022-03-29 Thread Peter Eisentraut

Patch rebased over some conflicts, and some tests simplified.

On 24.03.22 14:10, Peter Eisentraut wrote:
Here is an updated patch that now also includes SET LOGGED/UNLOGGED 
support.  So this version addresses all known issues and open problems.



On 28.02.22 10:56, Peter Eisentraut wrote:

rebased patch, no functional changes

On 11.02.22 10:12, Peter Eisentraut wrote:

On 25.06.19 20:37, Andres Freund wrote:

I.e. I think it'd be better if we just added a fork argument to
fill_seq_with_data(), and then do something like

smgrcreate(srel, INIT_FORKNUM, false);
log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
fill_seq_with_data(rel, tuple, INIT_FORKNUM);

and add a FlushBuffer() to the end of fill_seq_with_data() if writing
INIT_FORKNUM. The if (RelationNeedsWAL(rel)) would need an || 
forkNum ==

INIT_FORKNUM.


Now that logical replication of sequences is nearing completion, I 
figured it would be suitable to dust off this old discussion on 
unlogged sequences, mainly so that sequences attached to unlogged 
tables can be excluded from replication.


Attached is a new patch that incorporates the above suggestions, with 
some slight refactoring.  The only thing I didn't/couldn't do was to 
call FlushBuffers(), since that is not an exported function.  So this 
still calls FlushRelationBuffers(), which was previously not liked. 
Ideas welcome.


I have also re-tested the crash reported by Michael Paquier in the 
old discussion and added test cases that catch them.


The rest of the patch is just documentation, DDL support, client 
support, etc.


What is not done yet is support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED.  This is a bit of a problem because:


1. The new behavior is that a serial/identity sequence of a new 
unlogged table is now also unlogged.
2. There is also a new restriction that changing a table to logged is 
not allowed if it is linked to an unlogged sequence.  (This is IMO 
similar to the existing restriction on linking mixed logged/unlogged 
tables via foreign keys.)
3. Thus, currently, you can't create an unlogged table with a 
serial/identity column and then change it to logged.  This is 
reflected in some of the test changes I had to make in 
alter_table.sql to work around this.  These should eventually go away.


Interestingly, there is grammar support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED because there is this:


 |   ALTER SEQUENCE qualified_name alter_table_cmds
 {
 AlterTableStmt *n = makeNode(AlterTableStmt);
 n->relation = $3;
 n->cmds = $4;
 n->objtype = OBJECT_SEQUENCE;
 n->missing_ok = false;
 $$ = (Node *)n;
 }

But it is rejected later in tablecmds.c.  In fact, it appears that 
this piece of grammar is currently useless because there are no 
alter_table_cmds that actually work for sequences.  (This used to be 
different because things like OWNER TO also went through here.)


I tried to make tablecmds.c handle sequences as well, but that became 
messy.  So I'm thinking about making ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED an entirely separate code path and rip out the above 
grammar, but that needs some further pondering.


But all that is a bit of a separate effort, so in the meantime some 
review of the changes in and around fill_seq_with_data() would be 
useful.
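
For quick reference, the user-visible surface per the commit message below
is roughly (a sketch):

    CREATE UNLOGGED SEQUENCE s;
    ALTER SEQUENCE s SET LOGGED;
    -- An identity/serial sequence now follows the persistence of its
    -- owning table:
    CREATE UNLOGGED TABLE t (id int GENERATED ALWAYS AS IDENTITY);
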
From 24fb190a9cf56fff1c1296b02d179ef33d049794 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 29 Mar 2022 12:53:53 +0200
Subject: [PATCH v5] Unlogged sequences

Add support for unlogged sequences.  Unlike for unlogged tables, this
is not a performance feature.  It allows sequences associated with
unlogged tables to be excluded from replication.

A new subcommand ALTER SEQUENCE ... SET LOGGED/UNLOGGED is added.

An identity/serial sequence now automatically gets and follows the
persistence level (logged/unlogged) of its owning table.  (The
sequences owned by temporary tables were already temporary through the
separate mechanism in RangeVarAdjustRelationPersistence().)

Discussion: 
https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com
---
 doc/src/sgml/ref/alter_sequence.sgml   | 12 +
 doc/src/sgml/ref/create_sequence.sgml  | 23 +++-
 doc/src/sgml/ref/pg_dump.sgml  |  7 +--
 src/backend/commands/sequence.c| 58 +---
 src/backend/commands/tablecmds.c   | 32 ++--
 src/backend/parser/parse_utilcmd.c |  1 +
 src/bin/pg_dump/pg_dump.c  |  4 +-
 src/bin/psql/describe.c|  8 ++-
 src/bin/psql/tab-complete.c|  5 +-
 src/include/commands/sequence.h|  1 +
 src/test/recovery/t/014_unlogged_reinit.pl | 61 +++---
 src/test/regress/expected/alter_table.out  |  4 +-
 src/test/regress/expected/sequence.out | 20 ++-
 src/test/regress/sql/sequence.sql
