Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 12:17 PM Peter Smith  wrote:
>
> On Tue, Mar 8, 2022 at 4:21 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 8, 2022 at 10:31 AM Peter Smith  wrote:
> > >
> > > IIUC the new option may be implemented subscriber-side and/or
> > > publisher-side and/or both, and the subscriber-side option may be
> > > "enhanced" in future to prevent cycles. And probably there are more
> > > features I don't know about or that have not yet been thought of.
> > >
> > > ~~
> > >
> > > Even if the plan is only to implement just one part now and then add
> > > more later, I think there still should be some consideration for what
> > > you expect all possible future options to look like, because that may
> > > affect current implementation choices.
> > >
> > > The point is:
> > >
> > > - we should take care so we don't accidentally end up with an option
> > > that turns out to be inconsistent-looking on the subscriber side /
> > > publisher side.
> > >
> > > - we should try to avoid accidentally painting ourselves into a corner
> > > (e.g. stuck with a boolean option that cannot be enhanced later on)
> > >
> >
> > Agreed. I think it is important to ensure we don't do something that
> > is not extendable in the future or paints us into a corner. But, as of
> > now, with the current proposal, the main thing we should consider is
> > whether exposing the boolean option is okay or whether we should do
> > something different so that we can extend it later to specific origin ids?
> >
>
> I was wondering, assuming later there is an enhancement to detect and
> prevent cycles using origin ids, then what really is the purpose of
> the option?
>

We can't use origin ids during the initial sync as we can't
differentiate between data from the local node and a remote node. Also,
how will we distinguish between different origins? AFAIU, all the
changes applied on any particular node use the same origin.

> I can imagine if a cycle happens we should probably log some one-time
> WARNING (just to bring it to the user's attention in case it was
> caused by some accidental misconfiguration), but apart from that why
> would this need to be an option at all - i.e. is there a use-case
> where a user would NOT want to prevent a recursive error?
>

I think there will be plenty of setups where there won't be a need for
any such option, e.g., where users don't need bi-directional
replication for the same table.
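
To make the cycle concrete, here is a toy Python model (not PostgreSQL
code; all names are invented) of why forwarding only locally-originated
changes breaks the replication loop:

```python
# Toy model of two nodes in bi-directional logical replication.
# Each change carries an "origin" id; forwarding only changes that
# originated locally breaks the cycle.

class Node:
    def __init__(self, name):
        self.name = name
        self.rows = []          # applied changes
        self.peers = []         # subscribed peers

    def apply(self, change, filter_foreign_origin):
        origin, value = change
        self.rows.append(value)
        # Re-publish to peers, as logical replication would.
        for peer in self.peers:
            if filter_foreign_origin and origin != self.name:
                continue        # change came from elsewhere: don't forward
            if len(peer.rows) > 10:
                continue        # artificial guard so the demo terminates
            peer.apply((origin, value), filter_foreign_origin)

    def insert(self, value, filter_foreign_origin=True):
        self.apply((self.name, value), filter_foreign_origin)

# With origin filtering: each node applies the change exactly once.
a, b = Node("A"), Node("B")
a.peers, b.peers = [b], [a]
a.insert("row1")

# Without filtering: the same change ping-pongs between the nodes.
a2, b2 = Node("A"), Node("B")
a2.peers, b2.peers = [b2], [a2]
a2.insert("row1", filter_foreign_origin=False)
```

This is only an illustration of the cycle problem discussed above, not
of how origin ids are actually tracked in the server.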

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-03-07 Thread Masahiko Sawada
Hi,

On Tue, Mar 8, 2022 at 10:25 AM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Mar 4, 2022 at 4:26 PM Kuroda, Hayato/黒田 隼人 
>  wrote:
> >
> Thanks for your test and comments.
>
> > Some code was added in ReorderBufferProcessTXN() for treating DDL,
> > but I doubt that updating accept_writes is needed.
> > IMU, the parameter is read by OutputPluginPrepareWrite() in order to
> > align messages: they should have a header - like 'w' - before their
> > body. But here only a keepalive message is sent, with no meaningful
> > changes, so I think it might not be needed.
> > I commented out the line and tested like you did [1], and no timeout
> > or errors were found.
> > Do you have any reasons for that?
> >
> > https://www.postgresql.org/message-
> > id/OS3PR01MB6275A95FD44DC6C46058EA389E3B9%40OS3PR01MB6275.jpnprd0
> > 1.prod.outlook.com
> Yes, you are right. We should not set accept_writes to true here.
> And I looked into the function WalSndUpdateProgress. I found that
> WalSndUpdateProgress tries to record the time some messages (via
> LagTrackerWrite) were sent to the subscriber, such as in function
> pgoutput_commit_txn. Then, when the publisher receives the reply
> message from the subscriber (function ProcessStandbyReplyMessage), the
> publisher invokes LagTrackerRead to calculate the delay time (refer to
> view pg_stat_replication). Referring to the purpose of LagTrackerWrite,
> I think there is no need to log the time when sending keepalive
> messages here. So when the parameter send_keep_alive of function
> WalSndUpdateProgress is true, skip recording the time.
>
> > I'm also happy if you give the version number :-).
> Introduce version information, starting from version 1.
>
> Attach the new patch.
> 1. Fix wrong variable setting and skip unnecessary time records.
> [Suggestion by Kuroda-San and me.]
> 2. Introduce version information. [Suggestion by Peter, Kuroda-San]

I've looked at the patch and have a question:

+void
+SendKeepaliveIfNecessary(LogicalDecodingContext *ctx, bool skipped)
+{
+    static int skipped_changes_count = 0;
+
+    /*
+     * skipped_changes_count is reset when processing changes that do not
+     * need to be skipped.
+     */
+    if (!skipped)
+    {
+        skipped_changes_count = 0;
+        return;
+    }
+
+/*
+ * After continuously skipping SKIPPED_CHANGES_THRESHOLD changes, try to
+ * send a keepalive message.
+ */
+#define SKIPPED_CHANGES_THRESHOLD 1
+
+    if (++skipped_changes_count >= SKIPPED_CHANGES_THRESHOLD)
+    {
+        /* Try to send a keepalive message. */
+        OutputPluginUpdateProgress(ctx, true);
+
+        /* After trying to send a keepalive message, reset the flag. */
+        skipped_changes_count = 0;
+    }
+}

Since we send a keepalive only after continuously skipping
SKIPPED_CHANGES_THRESHOLD changes, the originally reported issue can
still occur if skipping that many changes takes longer than the timeout
and the walsender doesn't send any change during that time, is that
right?
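
The concern above can be stated as a simple invariant; here is a hedged
sketch (illustrative numbers only, not from the patch):

```python
# A count-based keepalive threshold bounds the number of changes between
# keepalives, not the elapsed time. If a single skipped change can take
# arbitrarily long to process, the silent stretch between keepalives is
# unbounded, so a receiver timeout can still occur.

def max_keepalive_gap(per_change_seconds, threshold):
    # Longest silence = time to process `threshold` skipped changes.
    return per_change_seconds * threshold

wal_receiver_timeout = 60.0   # illustrative value

# Fast changes: the gap stays well under the timeout.
fast_gap = max_keepalive_gap(0.001, 10000)
# Slow changes: the same threshold exceeds the timeout.
slow_gap = max_keepalive_gap(0.05, 10000)
```

This is only a model of the timing argument, not of the walsender code.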

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Peter Smith
On Tue, Mar 8, 2022 at 4:21 PM Amit Kapila  wrote:
>
> On Tue, Mar 8, 2022 at 10:31 AM Peter Smith  wrote:
> >
> > IIUC the new option may be implemented subscriber-side and/or
> > publisher-side and/or both, and the subscriber-side option may be
> > "enhanced" in future to prevent cycles. And probably there are more
> > features I don't know about or that have not yet been thought of.
> >
> > ~~
> >
> > Even if the plan is only to implement just one part now and then add
> > more later, I think there still should be some consideration for what
> > you expect all possible future options to look like, because that may
> > affect current implementation choices.
> >
> > The point is:
> >
> > - we should take care so we don't accidentally end up with an option
> > that turns out to be inconsistent-looking on the subscriber side /
> > publisher side.
> >
> > - we should try to avoid accidentally painting ourselves into a corner
> > (e.g. stuck with a boolean option that cannot be enhanced later on)
> >
>
> Agreed. I think it is important to ensure we don't do something that
> is not extendable in the future or paints us into a corner. But, as of
> now, with the current proposal, the main thing we should consider is
> whether exposing the boolean option is okay or whether we should do
> something different so that we can extend it later to specific origin ids?
>

I was wondering, assuming later there is an enhancement to detect and
prevent cycles using origin ids, then what really is the purpose of
the option?

I can imagine if a cycle happens we should probably log some one-time
WARNING (just to bring it to the user's attention in case it was caused
by some accidental misconfiguration), but apart from that why would this
need to be an option at all - i.e. is there a use-case where a user
would NOT want to prevent a recursive error?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Adding CI to our tree (ccache)

2022-03-07 Thread Justin Pryzby
On Mon, Mar 07, 2022 at 11:10:54AM -0800, Andres Freund wrote:
> On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> > On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > > I tried to use it, but saw that no caching was happening, and debugged
> > > it. Which yielded that it can't be used due to the way output files are
> > > specified (and due to multiple files, but that can be prevented with an
> > > msbuild parameter).
> > 
> > Could you give a hint about the msbuild param to avoid processing
> > multiple files with cl.exe?  I'm not able to find it...
> 
> /p:UseMultiToolTask=true

Wow - thanks ;)

> > I don't know about the issue with output filenames ..
> 
> I don't remember the precise details anymore, but it boils down to ccache
> requiring the output filename to be specified, but we only specify the output
> directory. Or very similar.

There's already a problem report and PR for this.
I didn't test it, but I hope it'll be fixed in their next minor release.

https://github.com/ccache/ccache/issues/1018

-- 
Justin




Re: Add index scan progress to pg_stat_progress_vacuum

2022-03-07 Thread Masahiko Sawada
On Thu, Mar 3, 2022 at 2:08 PM Imseih (AWS), Sami  wrote:
>
> >>> If the failsafe kicks in midway through a vacuum, the number
> >>> indexes_total will not be reset to 0. If INDEX_CLEANUP is turned
> >>> off, then the value will be 0 at the start of the vacuum.
> >>
> >> The way that this works with num_index_scans is that we "round up"
> >> when there has been non-zero work in lazy_vacuum_all_indexes(), but
> >> not if the precheck in lazy_vacuum_all_indexes() fails. That seems
> >> like a good model to generalize from here. Note that this makes
> >> INDEX_CLEANUP=off affect num_index_scans in much the same way as a
> >> VACUUM where the failsafe kicks in very early, during the initial heap
> >> pass. That is, if the failsafe kicks in before we reach lazy_vacuum()
> >> for the first time (which is not unlikely), or even in the
> >> lazy_vacuum_all_indexes() precheck, then num_index_scans will remain
> >> at 0, just like INDEX_CLEANUP=off.
> >>
> >> The actual failsafe WARNING shows num_index_scans, possibly before it
> >> gets incremented one last time (by "rounding up"). So it's reasonably
> >> clear how this all works from that context (assuming that the
> >> autovacuum logging stuff, which reports num_index_scans, outputs a
> >> report for a table where the failsafe kicked in).
>
> > I am confused.  If failsafe kicks in during the middle of a vacuum, I
> > (perhaps naively) would expect indexes_total and indexes_processed to
> > go to zero, and I'd expect to no longer see the "vacuuming indexes"
> > and "cleaning up indexes" phases.  Otherwise, how would I know that we
> > are now skipping indexes?  Of course, you won't have any historical
> > context about the index work done before failsafe kicked in, but IMO
> > it is misleading to still include it in the progress view.
>
> After speaking with Nathan offline, the best way forward is to reset
> indexes_total and indexes_processed to 0 after the start of the
> "vacuuming indexes" or "cleaning up indexes" phase.

+1

+/*
+ * vacuum_worker_init --- initialize this module's shared memory hash
+ * to track the progress of a vacuum worker
+ */
+void
+vacuum_worker_init(void)
+{
+    HASHCTL     info;
+    long        max_table_size = GetMaxBackends();
+
+    VacuumWorkerProgressHash = NULL;
+
+    info.keysize = sizeof(pid_t);
+    info.entrysize = sizeof(VacProgressEntry);
+
+    VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
+                                             max_table_size,
+                                             max_table_size,
+                                             &info,
+                                             HASH_ELEM | HASH_BLOBS);
+}

It seems to me that creating a shmem hash with max_table_size entries
for parallel vacuum process tracking is too much. IIRC an old patch had
parallel vacuum workers advertise their progress and changed the
pg_stat_progress_vacuum view so that it aggregates the results,
including the workers' stats. I think that is better than the current
approach. Why did you change that?
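
For illustration, the aggregation approach can be sketched as follows
(a toy Python model; the dictionaries and field names are invented, not
the view's real columns):

```python
# Sketch of the aggregation design: each parallel vacuum worker
# advertises its own counter, and the progress view sums the workers'
# entries for their leader, instead of sizing a shared hash for
# max_backends.

workers = [
    {"leader_pid": 100, "indexes_processed": 2},
    {"leader_pid": 100, "indexes_processed": 3},
    {"leader_pid": 200, "indexes_processed": 1},
]

def progress_view(leader_pid):
    # Aggregate all workers belonging to this leader into one view row.
    return sum(w["indexes_processed"] for w in workers
               if w["leader_pid"] == leader_pid)
```

The point is only that the view-side aggregation needs no shared table
proportional to the total number of backends.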

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 9:37 AM Peter Smith  wrote:
>
> Please find below some review comments for v29.
>
> ==
>
> 1. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> +/*
> + * Abort and cleanup the current transaction, then do post-error processing.
> + * This function must be called in a PG_CATCH() block.
> + */
> +static void
> +worker_post_error_processing(void)
>
> The function comment and function name are too vague/generic. I guess
> this is a hang-over from Sawada-san's proposed patch, but now, since
> this is only called when disabling the subscription, both the comment
> and the function name should say that's what it is doing...
>
> e.g. rename to DisableSubscriptionOnError() or something similar.
>
> ~~~
>
> 2. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> + /* Notify the subscription has been disabled */
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" has been be disabled
> due to an error",
> +MySubscription->name));
>
>   proc_exit(0);
>  }
>
> I know this is common code, but IMO it would be better to do the
> proc_exit(0) from the caller in the PG_CATCH. Then I think the
> throw/exit logic will be much easier to read, rather than now where
> it is just calling some function that never returns...
>
> Alternatively, if you want the code how it is, then the function name
> should give some hint that it is never going to return - e.g.
> DisableSubscriptionOnErrorAndExit)
>

I think we are already in error so maybe it is better to name it as
DisableSubscriptionAndExit.

Few other comments:
=
1.
DisableSubscription()
{
..
+
+ LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);

Why do we need AccessExclusiveLock here? The Alter/Drop Subscription
commands take AccessExclusiveLock, so AccessShareLock should be
sufficient unless we have a reason to use AccessExclusiveLock. The
other similar usages in this file (pg_subscription.c) also take
AccessShareLock.

2. Shall we mention this feature in conflict handling docs [1]:
Now:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first.

After:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first or alternatively,
the subscription can be used with the disable_on_error option.

Feel free to use something on the above lines, if you agree.

-- 
With Regards,
Amit Kapila.




Re: Comment typo in CheckCmdReplicaIdentity

2022-03-07 Thread Peter Smith
On Tue, Mar 8, 2022 at 4:31 PM Michael Paquier  wrote:
>
> On Mon, Mar 07, 2022 at 10:28:08AM +0800, Julien Rouhaud wrote:
> > +1
>
> And done.
> --
> Michael

Thanks!

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Comment typo in CheckCmdReplicaIdentity

2022-03-07 Thread Michael Paquier
On Mon, Mar 07, 2022 at 10:28:08AM +0800, Julien Rouhaud wrote:
> +1

And done.
--
Michael




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 10:31 AM Peter Smith  wrote:
>
> IIUC the new option may be implemented subscriber-side and/or
> publisher-side and/or both, and the subscriber-side option may be
> "enhanced" in future to prevent cycles. And probably there are more
> features I don't know about or that have not yet been thought of.
>
> ~~
>
> Even if the plan is only to implement just one part now and then add
> more later, I think there still should be some consideration for what
> you expect all possible future options to look like, because that may
> affect current implementation choices.
>
> The point is:
>
> - we should take care so we don't accidentally end up with an option
> that turns out to be inconsistent-looking on the subscriber side /
> publisher side.
>
> - we should try to avoid accidentally painting ourselves into a corner
> (e.g. stuck with a boolean option that cannot be enhanced later on)
>

Agreed. I think it is important to ensure we don't do something that is
not extendable in the future or paints us into a corner. But, as of now,
with the current proposal, the main thing we should consider is whether
exposing the boolean option is okay or whether we should do something
different so that we can extend it later to specific origin ids?

-- 
With Regards,
Amit Kapila.




Re: Make relfile tombstone files conditional on WAL level

2022-03-07 Thread Dilip Kumar
On Tue, Mar 8, 2022 at 1:27 AM Robert Haas  wrote:

> > The only part I do not like in the patch is that before this patch we
> > could directly access buftag->rnode.  But since now we don't have the
> > relfilenode directly as part of the buffer tag, and instead keep the
> > individual fields (i.e. dbOid, tbsOid and relNode) in the buffer tag,
> > if we have to get the relfilenode directly we need to generate it.
> > However those changes are limited to just 1 or 2 files, so maybe not
> > that bad.
>
> You're talking here about just needing to introduce BufTagGetFileNode
> and BufTagSetFileNode, or something else? I don't find those macros to
> be problematic.

Yeah, I was talking about the BufTagGetFileNode macro only.  The reason
I did not like it is that earlier we could directly use buftag->rnode,
but now whenever we want to use the rnode we first need to prepare it in
a separate variable using the BufTagGetFileNode macro.  But these
changes are very localized and in very few places, so I don't have much
of a problem with them.
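
As an illustration of the access-pattern change (a toy Python model;
PostgreSQL's real definitions are C structs, and these field names are
approximations):

```python
# Toy model of the BufTagGetFileNode change: the buffer tag now stores
# the relfilenode's fields individually, so callers that previously used
# buftag.rnode must reassemble an rnode value via an accessor.

from typing import NamedTuple

class RelFileNode(NamedTuple):
    spc_oid: int
    db_oid: int
    rel_node: int

class BufferTag(NamedTuple):
    spc_oid: int
    db_oid: int
    rel_node: int
    fork_num: int
    block_num: int

def buftag_get_filenode(tag: BufferTag) -> RelFileNode:
    # The rnode is no longer stored as one struct; rebuild it on demand.
    return RelFileNode(tag.spc_oid, tag.db_oid, tag.rel_node)

tag = BufferTag(1663, 5, 16384, 0, 0)
rnode = buftag_get_filenode(tag)
```

The extra step is exactly the "separate variable" inconvenience
described above; the trade-off buys a smaller, field-wise buffer tag.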

>
> BufTagSetFileNode could maybe assert that the OID isn't too big,
> though. We should ereport() before we get to this point if we somehow
> run out of values, but it might be nice to have a check here as a
> backup.

Yeah, we could do that, I will do that in the next version.  Thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Peter Smith
IIUC the new option may be implemented subscriber-side and/or
publisher-side and/or both, and the subscriber-side option may be
"enhanced" in future to prevent cycles. And probably there are more
features I don't know about or that have not yet been thought of.

~~

Even if the plan is only to implement just one part now and then add
more later, I think there still should be some consideration for what
you expect all possible future options to look like, because that may
affect current implementation choices.

The point is:

- we should take care so we don't accidentally end up with an option
that turns out to be inconsistent-looking on the subscriber side /
publisher side.

- we should try to avoid accidentally painting ourselves into a corner
(e.g. stuck with a boolean option that cannot be enhanced later on)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Time to drop plpython2?

2022-03-07 Thread Andres Freund
Hi,

On 2022-03-07 23:39:39 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-03-07 20:11:52 -0500, Tom Lane wrote:
> >> This version seems ready-to-go to me, or at least ready to see
> >> what the buildfarm makes of it.
> 
> > Pushed. Let's see...
> 
> wrasse says you were too quick to drop plpython_error_5.out.

Does look like it. I'll try to find a distribution with an old python...


> Otherwise looks pretty good so far.

crake also failed. Looks like plpy_plpymodule.h needs to include plpython.h. A
pre-existing issue that just didn't happen to cause problems...

Greetings,

Andres Freund




Re: Time to drop plpython2?

2022-03-07 Thread Tom Lane
Andres Freund  writes:
> On 2022-03-07 20:11:52 -0500, Tom Lane wrote:
>> This version seems ready-to-go to me, or at least ready to see
>> what the buildfarm makes of it.

> Pushed. Let's see...

wrasse says you were too quick to drop plpython_error_5.out.
Otherwise looks pretty good so far.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Mark Dilger


> On Mar 7, 2022, at 12:16 PM, Tom Lane  wrote:
> 
> What would be
> lost if we drop it?

I looked into this a bit.  Removing that bit of code, the only regression test 
changes for "check-world" are the expected ones, with nothing else breaking.  
Running installcheck+pg_upgrade to the patched version of HEAD from each of 
versions 11, 12, 13 and 14 doesn't turn up anything untoward.  The change I 
used (for reference) is attached:



v1-0001-WIP-Removing-role-self-administration.patch.WIP
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Peter Smith
Please find below some review comments for v29.

==

1. src/backend/replication/logical/worker.c - worker_post_error_processing

+/*
+ * Abort and cleanup the current transaction, then do post-error processing.
+ * This function must be called in a PG_CATCH() block.
+ */
+static void
+worker_post_error_processing(void)

The function comment and function name are too vague/generic. I guess
this is a hang-over from Sawada-san's proposed patch, but now, since
this is only called when disabling the subscription, both the comment
and the function name should say that's what it is doing...

e.g. rename to DisableSubscriptionOnError() or something similar.

~~~

2. src/backend/replication/logical/worker.c - worker_post_error_processing

+ /* Notify the subscription has been disabled */
+ ereport(LOG,
+ errmsg("logical replication subscription \"%s\" has been be disabled
due to an error",
+MySubscription->name));

  proc_exit(0);
 }

I know this is common code, but IMO it would be better to do the
proc_exit(0) from the caller in the PG_CATCH. Then I think the
throw/exit logic will be much easier to read, rather than now where it
is just calling some function that never returns...

Alternatively, if you want the code how it is, then the function name
should give some hint that it is never going to return - e.g.
DisableSubscriptionOnErrorAndExit)

~~~

3. src/backend/replication/logical/worker.c - start_table_sync

+ {
+ /*
+ * Abort the current transaction so that we send the stats message
+ * in an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, false);
+
+ PG_RE_THROW();
+ }

(This is a repeat of a previous comment from [1] comment #2)

I felt the separation of those 2 statements and comments makes the
code less clean than it could/should be. IMO they should be grouped
together.

SUGGESTED

/*
* Report the worker failed during table synchronization. Abort the
* current transaction so that the stats message is sent in an idle
* state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

~~~

4. src/backend/replication/logical/worker.c - start_apply

+ {
+ /*
+ * Abort the current transaction so that we send the stats message
+ * in an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed while applying changes */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+
+ PG_RE_THROW();
+ }

(same as #3 but comment says "while applying changes")

SUGGESTED

/*
* Report the worker failed while applying changes. Abort the current
* transaction so that the stats message is sent in an idle state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPucrizJpqhSyD7dKj1yRkNMskqmiekD_RRqYpdDdusMRQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Logical replication timeout problem

2022-03-07 Thread Ajin Cherian
On Tue, Mar 8, 2022 at 12:25 PM wangw.f...@fujitsu.com
 wrote:
> Attach the new patch.
> 1. Fix wrong variable setting and skip unnecessary time records.[suggestion 
> by Kuroda-San and me.]
> 2. Introduce version information.[suggestion by Peter, Kuroda-San]
>
> Regards,
> Wang wei

Some comments.

1. The comment on top of SendKeepaliveIfNecessary

 Try to send a keepalive message if too many changes was skipped.

change to

Try to send a keepalive message if too many changes were skipped.

2. In pgoutput_change:

+ /* Reset the counter for skipped changes. */
+ SendKeepaliveIfNecessary(ctx, false);
+

This reset is called too early; this function might go on to skip the
change because of the row filter, so the reset fits better once we know
for sure that a change was sent out. You will also need to send a
keepalive when the change is skipped due to the row filter.
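
The point about counter placement can be sketched like this (toy
Python with invented names; the real code would call
OutputPluginUpdateProgress()):

```python
# Sketch: reset the skipped-change counter only after a change is
# actually sent; a change skipped by the row filter must count as
# skipped, or keepalives can be suppressed indefinitely.

skipped = 0
keepalives = 0
KEEPALIVE_EVERY = 100   # illustrative threshold

def process_change(passes_row_filter):
    global skipped, keepalives
    if not passes_row_filter:
        skipped += 1                 # row-filtered changes are skipped too
        if skipped >= KEEPALIVE_EVERY:
            keepalives += 1          # stand-in for sending a keepalive
            skipped = 0
        return
    # Change survives the filter and is sent: only now reset the counter.
    skipped = 0

# 250 consecutively filtered-out changes still produce keepalives.
for _ in range(250):
    process_change(passes_row_filter=False)
```

Resetting the counter before the row filter runs would instead treat
every filtered change as "sent", defeating the threshold.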

regards,
Ajin Cherian
Fujitsu Australia




Re: Expose JIT counters/timing in pg_stat_statements

2022-03-07 Thread Julien Rouhaud
On Mon, Mar 07, 2022 at 01:40:34PM +0100, Magnus Hagander wrote:
>
> I wonder if there might be an interesting middle ground, or if that is
> making it too much. That is, we could have an
> Option 3:
> jit_count
> total_jit_time - for sum of functions+inlining+optimization+emission time
> min_jit_time - for sum of functions+inlining+optimization+emission time
> max_jit_time - for sum of functions+inlining+optimization+emission time
> mean_jit_time - for sum of functions+inlining+optimization+emission time
> stddev_jit_time - for sum of functions+inlining+optimization+emission time
> jit_functions
> jit_generation_time
> jit_inlining_count
> jit_inlining_time
> jit_optimization_count
> jit_optimization_time
> jit_emission_count
> jit_emission_time
>
> That is, we'd get the more detailed timings across the total time, but
> not on the details. But that might be overkill.

I also thought about it but it seems overkill.  The pg_stat_statements
view is already very big, and I think that the JIT time should be
somewhat stable, at least compared to how much a query's execution time
can vary depending on the parameters.  This approach would also be a bit
useless if you change the costing of the underlying JIT operations.
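
For reference, per-entry min/max/mean/stddev columns of the kind quoted
above are typically maintained incrementally; a minimal Python sketch of
a Welford-style accumulator (class and field names invented, not the
extension's actual code):

```python
# Online accumulation of count/min/max/mean/stddev for a timing column,
# as a statistics view would update it one measurement at a time.

import math

class TimeStats:
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the mean
        self.min = math.inf
        self.max = 0.0

    def record(self, t):
        # Welford's online update: numerically stable, O(1) per sample.
        self.count += 1
        delta = t - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (t - self.mean)
        self.min = min(self.min, t)
        self.max = max(self.max, t)

    def stddev(self):
        # Population standard deviation.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

s = TimeStats()
for t in (1.0, 2.0, 3.0):
    s.record(t)
```

Each extra tracked stage would add one such accumulator per entry, which
is the "view is already very big" cost weighed above.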

> But -- here's an updated patch based on Option 2.

Thanks!

Code-wise, the patch looks good.  For the doc, it seems that you
documented jit_inlining_count three times rather than documenting
jit_optimization_count and jit_emission_count.

I don't think we can add tests there, and having a test for every new counter
being >= 0 seems entirely useless, however there should be a new test added for
the "oldextversions" test to make sure that there's no issue with old SQL / new
shlib compatibility.  And looking at it I see that it was already missed for
version 1.9 :(




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Kyotaro Horiguchi
At Tue, 8 Mar 2022 10:28:46 +0900, Michael Paquier  wrote 
in 
> On Tue, Mar 08, 2022 at 10:06:50AM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 8 Mar 2022 10:39:06 +1300, Thomas Munro  
> > wrote in 
> >> Thanks, you're right.  Test on a Win10 VM.  Here's a new version.
> 
> Looks fine to me.
> 
> > FYI, on Windows11, pg_basebackup didn't work correctly without the
> > patch.  So this looks like fixing an undiscovered bug as well.
> 
> Well, that's not really a long-time bug but just a side effect of
> in-place tablespaces because we don't use them in many test cases 
> yet, is it?

No, we don't. So just FYI.

> >> pg_basebackup -D copy
> > WARNING:  could not read symbolic link "pg_tblspc/16384": Invalid argument
> > pg_basebackup: error: tar member has empty name
> > 
> >1 File(s) 0 bytes
> >3 Dir(s)  171,920,613,376 bytes free
> 
> That's a lot of free space.

The laptop has a 512GB storage, so 160GB is pretty normal, maybe.
129GB of the storage is used by some VMs..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [RFC] building postgres with meson

2022-03-07 Thread Andres Freund
On 2022-03-07 09:58:41 -0800, Andres Freund wrote:
> On 2022-03-07 14:56:24 +0100, Peter Eisentraut wrote:
> > Is there a current patch set to review in this thread at the moment?
> 
> I've been regularly rebasing and improving the patchset, but didn't post to
> the thread about it most of the time.
> 
> I've just pushed another rebase, will work to squash it into a reasonable
> number of patches and then repost that.

Now done, see 
https://postgr.es/m/20220308025629.3xh2yo4sau74oafo%40alap3.anarazel.de




Re: [RFC] building postgres with meson -v6

2022-03-07 Thread Andres Freund
Hi,

Attached is v6 of the meson patchset. There are lots of changes since
the last version posted. These include:
- python2 removal is now committed, so it is not needed in here anymore
- CI changed to be based on the CI now merged into postgres
- CI also tests SUSE, RHEL, Fedora (Nazir Bilal Yavuz). This found
  several bugs. I don't think we'd merge all of those, but while working
  on the meson branch, it's really useful.
- all dependencies handled, except for pl/tcl (should be done soon)
- several missing options added (segsize, extra_{lib,include}_dirs,
  enable-tap-tests)
- several portability fixes; builds on NetBSD/OpenBSD without changes now
- improvements to a number of "configure" tests
- lots of ongoing rebasing changes
- ...

Greetings,

Andres Freund


v6-0001-meson-prereq-output-and-depencency-tracking-work.patch.gz
Description: application/patch-gzip


binYrOujnYBVk.bin
Description: application/patch-gzip


bin7y9rjP6Z4L.bin
Description: application/patch-gzip


bin38gaSD4mp3.bin
Description: application/patch-gzip


bingE4TPJtpcZ.bin
Description: application/patch-gzip


bin3OlM816jmk.bin
Description: application/patch-gzip


bin_rZ2lQyQ4V.bin
Description: application/patch-gzip


binwpPHbnLZHv.bin
Description: application/patch-gzip


v6-0009-prereq-make-unicode-targets-work-in-vpath-builds.patch.gz
Description: application/patch-gzip


binvv9YVTDcsT.bin
Description: application/patch-gzip


v6-0011-ldap-tests-Add-paths-for-openbsd.patch.gz
Description: application/patch-gzip


v6-0012-wip-split-TESTDIR-into-two.patch.gz
Description: application/patch-gzip


v6-0013-meson-Add-meson-based-buildsystem.patch.gz
Description: application/patch-gzip


v6-0014-meson-ci-Build-both-with-meson-and-as-before.patch.gz
Description: application/patch-gzip


Re: Time to drop plpython2?

2022-03-07 Thread Andres Freund
On 2022-03-07 20:11:52 -0500, Tom Lane wrote:
> This version seems ready-to-go to me, or at least ready to see
> what the buildfarm makes of it.

Pushed. Let's see...




Re: refreshing query id for pg_stat_statements based on comment in sql

2022-03-07 Thread Julien Rouhaud
Hi,

On Mon, Mar 07, 2022 at 09:42:26AM -0800, Zhihong Yu wrote:
> Hi,
> Currently the query id for pg_stat_statements gets calculated based on the
> parse nodes specifics.
> This means that the user cannot add a comment to a SQL query to test
> something. (though some other RDBMS allows this practice).
>
> Consider this use case: for query q, admin looks at stats and performs some
> optimization (without changing the query). Admin adds / modifies the
> comment for q - now the query becomes q'. If query id doesn't change, there
> still would be one row in pg_stat_statements which makes it difficult to
> gauge the effectiveness of the tuning.
>
> I want to get opinion from the community whether adding / changing comment
> in SQL query should result in new query id for pg_stat_statements.

Are you talking about an optimizer hint with something like
pg_hint_plan, or just a random comment like "/* we now added index
blabla */ SELECT ..."?

If the former, then such an extension can already provide its own
queryid generator, which can choose whether to ignore part or all of the
comments.

If the latter, then it seems shortsighted to me.  At the very least, not
all applications can be modified to have a specific comment attached to
a query.

Also, if you want to check how a query is performing after doing some
modifications, you should start with some EXPLAIN ANALYZE first (or even
a simple EXPLAIN if you want to validate some new index using
hypothetical indexes).  If this is some more general change (e.g.
shared_buffers, work_mem...) then the whole system is going to perform
differently, and you certainly won't add a new comment to every single
query executed.

So again, it seems to me that taking pg_stat_statements snapshots and
comparing the diffs to see how the whole workload, or specific queries, is
behaving is still the best answer here.
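To make the snapshot-diff workflow concrete, here is a minimal sketch. The snapshot contents and query ids below are hypothetical; in practice each snapshot would come from something like `SELECT queryid, calls, total_exec_time FROM pg_stat_statements` taken before and after the tuning change:

```python
# Diff two pg_stat_statements snapshots to see how each query behaved
# in the interval between them (data here is made up for illustration).

def diff_snapshots(before, after):
    """Each snapshot maps queryid -> (calls, total_exec_time_ms).

    Returns per-query deltas for queries executed in the interval."""
    deltas = {}
    for qid, (calls, total) in after.items():
        prev_calls, prev_total = before.get(qid, (0, 0.0))
        dcalls = calls - prev_calls
        dtotal = total - prev_total
        if dcalls > 0:
            deltas[qid] = {
                "calls": dcalls,
                "total_ms": dtotal,
                "mean_ms": dtotal / dcalls,  # mean over this interval only
            }
    return deltas

before = {101: (1000, 5000.0), 202: (10, 900.0)}
after = {101: (1500, 6000.0), 202: (30, 1100.0), 303: (5, 50.0)}

for qid, d in sorted(diff_snapshots(before, after).items()):
    print(qid, d["calls"], round(d["mean_ms"], 1))
```

The point is that per-interval means computed this way are meaningful regardless of whether the query text (or a comment in it) changed, since the comparison keys on queryid.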




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Michael Paquier
On Tue, Mar 08, 2022 at 10:06:50AM +0900, Kyotaro Horiguchi wrote:
> At Tue, 8 Mar 2022 10:39:06 +1300, Thomas Munro  
> wrote in 
>> Thanks, you're right.  Test on a Win10 VM.  Here's a new version.

Looks fine to me.

> FYI, on Windows11, pg_basebackup didn't work correctly without the
> patch.  So this looks like fixing an undiscovered bug as well.

Well, that's not really a long-standing bug, but just a side effect of
in-place tablespaces, because we don't use them in many test cases
yet, is it?

>> pg_basebackup -D copy
> WARNING:  could not read symbolic link "pg_tblspc/16384": Invalid argument
> pg_basebackup: error: tar member has empty name
> 
>1 File(s) 0 bytes
>  3 Dir(s)  171,920,613,376 bytes free

That's a lot of free space.
--
Michael


signature.asc
Description: PGP signature


RE: Logical replication timeout problem

2022-03-07 Thread wangw.f...@fujitsu.com
On Fri, Mar 4, 2022 at 4:26 PM Kuroda, Hayato/黒田 隼人  
wrote:
>
Thanks for your test and comments.

> Some codes were added in ReorderBufferProcessTXN() for treating DDL,
> but I doubted updating accept_writes is needed.
> IMU, the parameter is read by OutputPluginPrepareWrite() in order to
> align messages.
> They should have a header - like 'w' - before their body. But here only a
> keepalive message is sent,
> no meaningful changes, so I think it might be not needed.
> I commented out the line and tested like you did [1], and no timeout and 
> errors
> were found.
> Do you have any reasons for that?
> 
> https://www.postgresql.org/message-
> id/OS3PR01MB6275A95FD44DC6C46058EA389E3B9%40OS3PR01MB6275.jpnprd0
> 1.prod.outlook.com
Yes, you are right. We should not set accept_writes to true here.
I also looked into the function WalSndUpdateProgress. It records the send
time of certain messages (via LagTrackerWrite), for example when invoked
from pgoutput_commit_txn. When the publisher later receives the reply
message from the subscriber (in ProcessStandbyReplyMessage), it invokes
LagTrackerRead to calculate the delay time (reported in the
pg_stat_replication view).
Given the purpose of LagTrackerWrite, there is no need to record the time
when sending keepalive messages here. So when the send_keep_alive
parameter of WalSndUpdateProgress is true, we skip recording the time.
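The lag-tracking idea described above can be sketched roughly as follows: the sender records (lsn, send_time) pairs for meaningful messages, and when the subscriber reports progress up to some LSN, the elapsed time for the newest recorded entry at or before that LSN approximates the lag. Keepalives carry no changes, so recording them would distort the measurement. All names here are illustrative, not the actual PostgreSQL structures:

```python
from collections import deque

class LagTracker:
    def __init__(self):
        # Entries of (lsn, send_time), appended in LSN order.
        self.buffer = deque()

    def write(self, lsn, send_time):
        """Record when a meaningful message up to `lsn` was sent."""
        self.buffer.append((lsn, send_time))

    def read(self, reply_lsn, now):
        """On a subscriber reply covering `reply_lsn`, return the lag
        for the newest sent position it confirms (None if nothing new)."""
        lag = None
        while self.buffer and self.buffer[0][0] <= reply_lsn:
            _, send_time = self.buffer.popleft()
            lag = now - send_time
        return lag

t = LagTracker()
t.write(lsn=100, send_time=1.0)   # e.g. a commit record was sent
t.write(lsn=200, send_time=2.0)
print(t.read(reply_lsn=150, now=3.5))  # confirms lsn 100 -> lag 2.5
```

If keepalives were also written into the buffer, replies would match entries that correspond to no real change, skewing the reported lag downward.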

> I'm also happy if you give the version number :-).
Introduce version information, starting from version 1.

Attached is the new patch.
1. Fix the wrong variable setting and skip unnecessary time records.
[Suggested by Kuroda-san and me.]
2. Introduce version information. [Suggested by Peter and Kuroda-san.]

Regards,
Wang wei


v1-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch
Description:  v1-0001-Fix-the-timeout-of-subscriber-in-long-transaction.patch


Re: make tuplestore helper function

2022-03-07 Thread Michael Paquier
On Mon, Mar 07, 2022 at 05:09:19PM -0500, Melanie Plageman wrote:
> I've attached a patch using the helper in most locations in contrib
> modules that seemed useful.

Thanks for the patch.  I was also looking at that yesterday, and this
pretty much maps to what I had finished with, except for dblink and
xml2, where it was not looking that obvious to me at a quick glance.

In xml2, my eyes hurt a bit on the meaning of doing the "Switch out of
SPI context" in xpath_table() after doing the two SPI calls, but I
agree that this extra switch back to the old context should not be
necessary.  For dblink, I did not notice that the change for
dblink_get_notify() would be this straight-forward, good catch.  It
would be nice to see prepTuplestoreResult() gone at the end, though I
am not sure how invasive/worthwhile that would be for dblink.

> - pg_logdir_ls() in contrib/adminpack has return type TYPEFUNC_RECORD
>   and expectedDesc is not already created, so the helper can't be used.
>   But basically, since it doesn't specify OUT argument names, it has to
>   do TupleDescInitEntry() itself anyway, I think.

Here, actually, I thought first that a simplification should be
possible, noticing that the regression tests passed with the function
called until I saw what it was testing :)

I am fine to not poke at the beast, and it looks like we may run into
trouble if we wish to maintain compatibility down to adminpack 1.0 at
runtime.  This function has a very limited usage, anyway.

> - contrib/tablefunc.c was also not simple to refactor. the various parts
>   of SetSingleFuncCall are spread throughout different functions in the
>   file.
> 
> - contrib/dblink has one function which returns a tuplestore that was
>   simple to change (dblink_get_notify()) and I've done so.

This matches my impression, so applied.
--
Michael


signature.asc
Description: PGP signature


Re: Time to drop plpython2?

2022-03-07 Thread Tom Lane
Andres Freund  writes:
> A related question is whether we want to remove $(python_majorversion)
> references in the makefiles?

I wouldn't.  I'm doubtful of your theory that there will never be
a Python 4.

This version seems ready-to-go to me, or at least ready to see
what the buildfarm makes of it.

regards, tom lane




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Kyotaro Horiguchi
At Tue, 8 Mar 2022 10:39:06 +1300, Thomas Munro  wrote 
in 
> On Tue, Mar 8, 2022 at 12:58 AM Michael Paquier  wrote:
> > On Fri, Mar 04, 2022 at 11:26:43PM +1300, Thomas Munro wrote:
> > > + /* Skip in-place tablespaces (testing use only) */
> > > + if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR)
> > > + continue;
> >
> > I saw the warning when testing base backups with in-place tablespaces
> > and it did not annoy me much, but, yes, that can be confusing.
> >
> > Junction points are directories, no?  Are you sure that this works
> > correctly on WIN32?  It seems to me that we'd better use readlink()
> > only for entries in pg_tlbspc/ that are PGFILETYPE_LNK on non-WIN32
> > and pgwin32_is_junction() on WIN32.
> 
> Thanks, you're right.  Test on a Win10 VM.  Here's a new version.

Thanks!  It works for me on CentOS8 and Windows11.

FYI, on Windows11, pg_basebackup didn't work correctly without the
patch.  So this looks like fixing an undiscovered bug as well.

===
> pg_basebackup -D copy
WARNING:  could not read symbolic link "pg_tblspc/16384": Invalid argument
pg_basebackup: error: tar member has empty name

> dir copy
 Volume in drive C has no label.
 Volume serial number: 10C6-4BA6

 Directory of c:\..\copy

2022/03/08  09:53  .
2022/03/08  09:53  ..
2022/03/08  09:53 0 nbase.tar
2022/03/08  09:53  pg_wal
   1 File(s) 0 bytes
   3 Dir(s)  171,920,613,376 bytes free

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: refreshing query id for pg_stat_statements based on comment in sql

2022-03-07 Thread Bruce Momjian
On Mon, Mar  7, 2022 at 09:42:26AM -0800, Zhihong Yu wrote:
> Hi,
> Currently the query id for pg_stat_statements gets calculated based on the
> parse nodes specifics. 
> This means that the user cannot add a comment to a SQL query to test 
> something.
> (though some other RDBMS allows this practice).
> 
> Consider this use case: for query q, admin looks at stats and performs some
> optimization (without changing the query). Admin adds / modifies the comment
> for q - now the query becomes q'. If query id doesn't change, there still 
> would
> be one row in pg_stat_statements which makes it difficult to gauge the
> effectiveness of the tuning.
> 
> I want to get opinion from the community whether adding / changing comment in
> SQL query should result in new query id for pg_stat_statements.

Uh, we don't have a parse node for comments, and I didn't think comments
were part of the query id, and my testing confirms that:

psql -c "SET log_statement = 'all'" -c "select pg_sleep(1) -- test1;" test
psql -c "SET log_statement = 'all'" -c "select pg_sleep(1) -- test2;" test

shows the comment in the logs:


2022-03-07 19:02:19.509 EST [1075860] LOG:  statement: select pg_sleep(1) -- test1;
2022-03-07 19:02:24.389 EST [1075860] ERROR:  canceling statement due to user request
2022-03-07 19:02:24.389 EST [1075860] STATEMENT:  select pg_sleep(1) -- test1;
2022-03-07 19:02:27.029 EST [1075893] LOG:  statement: select pg_sleep(1) -- test2;
2022-03-07 19:02:47.915 EST [1075893] ERROR:  canceling statement due to user request
2022-03-07 19:02:47.915 EST [1075893] STATEMENT:  select pg_sleep(1) -- test2;

and I see the same query_id for both:

test=> select query, query_id from pg_stat_activity;
 query |   query_id
---+--
   |
   |
-->  select pg_sleep(1) -- test1;  |  2920433178127795318
 select query, query_id from pg_stat_activity; | -8032661921273433383
   |
   |
   |
(7 rows)

test=> select query, query_id from pg_stat_activity;
 query |   query_id
---+--
   |
   |
-->  select pg_sleep(1) -- test2;  |  2920433178127795318
 select query, query_id from pg_stat_activity; | -8032661921273433383

I think you need to show us the problem you are having.
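The reason the two statements above share a query_id is that the fingerprint is computed from the parse tree, which never contains comments. The toy normalizer below illustrates the effect by stripping comments before hashing; note it is not PostgreSQL's real jumbling, which hashes parse-node contents and also replaces constants with placeholders:

```python
import hashlib
import re

def toy_query_id(sql):
    """Illustrative query fingerprint that ignores SQL comments."""
    sql = re.sub(r"--[^\n]*", "", sql)                # strip line comments
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.S)   # strip block comments
    sql = " ".join(sql.split())                       # normalize whitespace
    return hashlib.sha256(sql.encode()).hexdigest()[:16]

a = toy_query_id("select pg_sleep(1) -- test1")
b = toy_query_id("select pg_sleep(1) -- test2")
c = toy_query_id("select pg_sleep(2)")
print(a == b, a == c)  # True False
```

Changing only the comment leaves the fingerprint untouched, which is exactly the behavior the logs and pg_stat_activity output above demonstrate.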

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Fix overflow in DecodeInterval

2022-03-07 Thread Joseph Koshakow
I just realized another issue today. It may have been obvious from one
of Tom's earlier messages, but I'm just now putting the pieces
together.
On Fri, Feb 18, 2022 at 11:44 PM Tom Lane  wrote:
> Also, I notice that there's an overflow hazard upstream of here,
> in interval2tm:
>
> regression=# select interval '214748364 hours' * 11;
> ERROR: interval out of range
> regression=# \errverbose
> ERROR: 22008: interval out of range
> LOCATION: interval2tm, timestamp.c:1982
>
> There's no good excuse for not being able to print a value that
> we computed successfully.

In scenarios like this, the interval is decoded properly but errors out
when it is later encoded. As a consequence, you can insert the value
successfully into a table, but any attempt to query the table that
includes the "bad interval" value in the result will cause an error.
Below I've demonstrated an example:

postgres=# CREATE TABLE tbl (i INTERVAL);
CREATE TABLE
postgres=# INSERT INTO tbl VALUES ('1 day'), ('3 months'), ('2 years');
INSERT 0 3
postgres=# SELECT * FROM tbl;
i
-
1 day
3 mons
2 years
(3 rows)

postgres=# INSERT INTO tbl VALUES ('2147483647 hours 60 minutes');
INSERT 0 1
postgres=# SELECT * FROM tbl;
ERROR: interval out of range

This would seriously reduce the usability of any table that contains one
of these "bad interval" values.

My patch actually fixes this issue, but I just wanted to call it out
because it might be relevant when reviewing.
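The asymmetry can be sketched as follows: the interval is stored as a 64-bit count of microseconds, so '2147483647 hours 60 minutes' fits on input, but breaking the stored value back into display fields overflows a 32-bit hours field on output. The constants mirror the C limits; the function names are illustrative, not PostgreSQL's:

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
USECS_PER_HOUR = 3600 * 1_000_000
USECS_PER_MINUTE = 60 * 1_000_000

def decode_interval(hours, minutes):
    """Parse input into the stored representation (int64 microseconds)."""
    usecs = hours * USECS_PER_HOUR + minutes * USECS_PER_MINUTE
    if usecs > INT64_MAX:
        raise OverflowError("interval out of range")
    return usecs

def encode_interval(usecs):
    """Break stored microseconds into display fields (hours is int32 in C)."""
    hours = usecs // USECS_PER_HOUR
    if hours > INT32_MAX:
        # The analogue of the error interval2tm raises at output time.
        raise OverflowError("interval out of range")
    return hours, (usecs % USECS_PER_HOUR) // USECS_PER_MINUTE

stored = decode_interval(2147483647, 60)  # the INSERT succeeds
try:
    encode_interval(stored)               # the SELECT fails
except OverflowError as e:
    print(e)
```

Fixing decode alone would not help rows already stored; rejecting (or correctly carrying) values that cannot be re-encoded is what removes the trap.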




Recovering stat file from crash

2022-03-07 Thread Marek Kulik
Hi,

I created a patch that allows recovering the stats files after a
potential crash.
Depending on the traffic on the server, some records might be lost
(roughly 0.5 seconds' worth, more or less).
From what I read, that is still better than no stats files at all.

I restricted it to the default recovery scenario only
(RECOVERY_TARGET_TIMELINE_LATEST) to avoid having invalid stats files with
other recovery options.

Am I missing something? File integrity should be fine because of the rename.
--- a/src/include/pgstat.h	2022-02-22 22:22:22.2 +0200
+++ b/src/include/pgstat.h	2022-02-22 22:22:22.2 +0200
@@ -29,6 +29,7 @@
 #define PGSTAT_STAT_PERMANENT_DIRECTORY		"pg_stat"
 #define PGSTAT_STAT_PERMANENT_FILENAME		"pg_stat/global.stat"
 #define PGSTAT_STAT_PERMANENT_TMPFILE		"pg_stat/global.tmp"
+#define PGSTAT_STAT_RECOVERY_FILENAME		"pg_stat/recovery"
 
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
@@ -1091,6 +1092,7 @@
 extern void pgstat_init(void);
 extern int	pgstat_start(void);
 extern void pgstat_reset_all(void);
+extern void pgstat_create_recovery_file(void);
 extern void allow_immediate_pgstat_restart(void);
 
 #ifdef EXEC_BACKEND
--- a/src/backend/access/transam/xlog.c	2022-02-22 22:22:22.2 +0200
+++ b/src/backend/access/transam/xlog.c	2022-02-22 22:22:22.2 +0200
@@ -5195,7 +5195,16 @@
 		/*
 		 * Reset pgstat data, because it may be invalid after recovery.
 		 */
-		pgstat_reset_all();
+		if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
+		{
+			elog(WARNING, "Doing recovery");
+			pgstat_create_recovery_file();
+		}
+		else
+		{
+			elog(WARNING, "Resetting stats files");
+			pgstat_reset_all();
+		}
 
 		/*
 		 * If there was a backup label file, it's done its job and the info
--- a/src/backend/postmaster/pgstat.c	2022-02-22 22:22:22.2 +0200
+++ b/src/backend/postmaster/pgstat.c	2022-02-22 22:22:22.2 +0200
@@ -739,6 +739,54 @@
 	pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
 }
 
+static bool
+pgstat_check_recovery_file_exists(void)
+{
+	const char *stat_rec_file = PGSTAT_STAT_RECOVERY_FILENAME;
+	return (unlink(stat_rec_file) == 0);
+}
+
+void
+pgstat_create_recovery_file(void)
+{
+	FILE	   *fpout;
+	const char *stat_rec_file = PGSTAT_STAT_RECOVERY_FILENAME;
+
+	elog(WARNING, "writing stats recovery file \"%s\"", stat_rec_file);
+
+	/*
+	 * Open the statistics recovery file to touch it.
+	 */
+	fpout = AllocateFile(stat_rec_file, PG_BINARY_W);
+	if (fpout == NULL)
+	{
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not open stats recovery file \"%s\": %m",
+		stat_rec_file)));
+		return;
+	}
+
+	if (ferror(fpout))
+	{
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not write stats recovery file \"%s\": %m",
+		stat_rec_file)));
+		FreeFile(fpout);
+		unlink(stat_rec_file);
+	}
+	else if (FreeFile(fpout) < 0)
+	{
+		ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not close stats recovery file \"%s\": %m",
+		stat_rec_file)));
+		unlink(stat_rec_file);
+	}
+
+}
+
 #ifdef EXEC_BACKEND
 
 /*
@@ -3525,7 +3573,8 @@
 	 * Read in existing stats files or initialize the stats to zero.
 	 */
 	pgStatRunningInCollector = true;
-	pgStatDBHash = pgstat_read_statsfiles(InvalidOid, true, true);
+	bool recFile = pgstat_check_recovery_file_exists();
+	pgStatDBHash = pgstat_read_statsfiles(InvalidOid, !recFile, true);
 
 	/* Prepare to wait for our latch or data in our socket. */
 	wes = CreateWaitEventSet(CurrentMemoryContext, 3);
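The marker-file protocol in the patch above can be sketched in a few lines: recovery touches a marker file instead of wiping the stats, and at collector startup the marker's existence decides whether the permanent stats files are read or reset. Checking via unlink() both tests for and consumes the marker in one step. Paths and names here are illustrative, not the patch's actual identifiers:

```python
import os
import tempfile

MARKER = os.path.join(tempfile.gettempdir(), "pgstat_recovery_marker")

def create_recovery_marker():
    """Touch the marker (the patch's pgstat_create_recovery_file analogue)."""
    with open(MARKER, "wb"):
        pass

def consume_recovery_marker():
    """Return True if a marker existed; unlinking it is also the check,
    so a stale marker cannot affect a second startup."""
    try:
        os.unlink(MARKER)
        return True
    except FileNotFoundError:
        return False

create_recovery_marker()
print(consume_recovery_marker())  # True: keep the existing stats files
print(consume_recovery_marker())  # False: no marker, reset the stats
```

One consequence of this design is that the decision is one-shot: if the collector crashes after consuming the marker but before reading the files, the next start falls back to the reset path.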


Re: ltree_gist indexes broken after pg_upgrade from 12 to 13

2022-03-07 Thread Alexander Korotkov
On Sun, Mar 6, 2022 at 8:28 PM Tomas Vondra
 wrote:
> On 3/6/22 08:09, Alexander Korotkov wrote:
> > Sorry for this terrible oversight by me.
> >
> > On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
> >  wrote:
> >> On 3/4/22 23:09, Nikita Glukhov wrote:
> >>> On 04.03.2022 23:28, Tom Lane wrote:
> >>>
>  Tomas Vondra  writes:
> > On 3/4/22 20:29, Nikita Glukhov wrote:
> >> So, we probably have corrupted indexes that were updated since such
> >> "incomplete" upgrade of ltree.
> > IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
> > installed version of the extension, and that's intentional.
>  Yeah, exactly.  But this opens up an additional consideration we
>  have to account for: whatever we do needs to work with either 1.1
>  or 1.2 SQL-level versions of the extension.
> 
>   regards, tom lane
> >>>
> >>> It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
> >>> is not so much related to PG12 => PG13+ upgrades.
> >
> > So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
> > PG12 and ltree 1.2 on PG13+.  And there are many scenarios involved.
> >
> > It seems too difficult to identify all the broken cases in the release
> > notes.  What about applying a patch and asking all ltree users to
> > reindex their indexes?
> >
>
> Yeah. I think this is getting so complicated that there's little chance
> we'd be able to clearly explain when to reindex.

Good.  The revised patch is attached.  Instead of adding an argument to
LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
LTREE_GET_ASIGLEN() macros.

--
Regards,
Alexander Korotkov


0001-ltree-gist-siglen-fix-v2.patch
Description: Binary data


Re: New developer papercut - Makefile references INSTALL

2022-03-07 Thread Tom Lane
Magnus Hagander  writes:
> But taking a step back, who is the actual audience for this? Do we
> *need* a link pointing directly there, or is it enough to just point
> to "use the docs on the web"? We can't link to the incorrect version,
> but can we just link to /docs/ and leave it at that?

Well, it's people compiling from source, so I guess we can assume some
amount of cluefulness?  I think perhaps it'd be okay to say "go here
and then navigate to the proper sub-page for your version".

> If not, could we make the change of URL a part of the branching step?
> Branching to a stable release would then include modifying the README,
> and be made a step of version_stamp.pl?

Doesn't really help people working from git, I think, because the
master branch is always going to claim to be "devel" even when you
rewind it to some old state.  Maybe we can assume people doing
such a thing have even more clue ... but on the whole I'd rather
not add the additional complication.

regards, tom lane




Re: support for MERGE

2022-03-07 Thread Zhihong Yu
On Mon, Mar 7, 2022 at 12:04 PM Álvaro Herrera 
wrote:

> On Mon, Mar 7, 2022, at 4:59 PM, Álvaro Herrera wrote:
>
> I attach v13 here.  This version includes a 0002 that does the split of
> nodeModifyTable.c routines, then 0003 implements MERGE on top of that.
> 0001 is as before.
>
>
> In 0002, I've opted to have two separate structs.  One is the
> ModifyTableContext, as before, but I've removed 'tupleid' and 'oldtuple'
> (the specification of the tuple to delete/update) because it makes
> ExecCrossPartitionUpdate cleaner if we pass them separately.  The second
> struct is UpdateContext, which is used inside ExecUpdate as output data
> from its subroutines.  This is also for the benefit of cross-partition
> updates.
>
> I'm pretty happy with how this turned out; even without considering MERGE,
> it seems to me that ExecUpdate benefits from being shorter.
>
Hi,
For v13-0003-MERGE-SQL-Command-following-SQL-2016.patch :

+* Reset per-tuple memory context to free any expression evaluation
+* storage allocated in the previous cycle.
+*/
+   ResetExprContext(econtext);

Why is the memory cleanup done in the next cycle? Can the cleanup be done
at the end of the current cycle?

+* XXX Should this explain why MERGE has the same logic as UPDATE?

I think an explanation should be given.

Cheers


Re: make tuplestore helper function

2022-03-07 Thread Melanie Plageman
On Sun, Mar 6, 2022 at 9:29 PM Michael Paquier  wrote:
>
> On Wed, Mar 02, 2022 at 03:43:17PM +0900, Michael Paquier wrote:
> > This is actually setting up a function in the context of a single call
> > where we fill the tuplestore with all its values, so instead I have
> > settled down to name that SetSingleFuncCall(), to make a parallel with
> > the existing MultiFuncCall*().  funcapi.c is the right place for
> > that, and I have added more documentation in the fmgr's README as well
> > as funcapi.h.
>
> I have tortured all those code paths for the last couple of days, and
> the new function name, as well as its options, still seemed fine to
> me, so I have applied the patch.  The buildfarm is cool with it.  It
> is worth noting that there are more code paths in contrib/ that could
> be simplified with this new routine.

Wow! Thanks so much for taking the time to review, refine, and commit
this work.

I've attached a patch using the helper in most locations in contrib
modules that seemed useful.

For the following, I don't think we can use the helper, or it is not worth it:

- pg_logdir_ls() in contrib/adminpack has return type TYPEFUNC_RECORD
  and expectedDesc is not already created, so the helper can't be used.

  But basically, since it doesn't specify OUT argument names, it has to
  do TupleDescInitEntry() itself anyway, I think.

- contrib/tablefunc.c was also not simple to refactor. the various parts
  of SetSingleFuncCall are spread throughout different functions in the
  file.

- contrib/dblink has one function which returns a tuplestore that was
  simple to change (dblink_get_notify()) and I've done so.

  However, most of the other creation of tuplestore and tupledescriptors
  is in helper functions (like materializeResult()) which separate the
  tuplestore creation from the tuple descriptor initialization in a way
  that made it hard to do a drop-in replacement with the helper function.

- Melanie
From a734a4dbf33230242f53bdc4ab6d23a98d6def8c Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 7 Mar 2022 14:22:22 -0500
Subject: [PATCH] contrib SRFs use helper

9e98583898c introduced a helper to centralize building their needed
state (tuplestore, tuple descriptors, etc) and checking for any errors.
Use this helper in contrib modules in which it is a simple drop-in
replacement.
---
 contrib/amcheck/verify_heapam.c   | 48 ++---
 contrib/dblink/dblink.c   | 26 +--
 contrib/pageinspect/brinfuncs.c   | 31 +---
 contrib/pageinspect/gistfuncs.c   | 60 ++--
 .../pg_stat_statements/pg_stat_statements.c   | 33 +
 contrib/pgrowlocks/pgrowlocks.c   | 34 ++---
 contrib/postgres_fdw/connection.c | 31 +---
 contrib/xml2/xpath.c  | 72 +++
 8 files changed, 34 insertions(+), 301 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index f996f9a572..60e2f36183 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -165,7 +165,6 @@ static bool check_tuple_visibility(HeapCheckContext *ctx);
 static void report_corruption(HeapCheckContext *ctx, char *msg);
 static void report_toast_corruption(HeapCheckContext *ctx,
 	ToastedAttribute *ta, char *msg);
-static TupleDesc verify_heapam_tupdesc(void);
 static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid,
 		const HeapCheckContext *ctx);
 static void update_cached_xid_range(HeapCheckContext *ctx);
@@ -214,8 +213,6 @@ Datum
 verify_heapam(PG_FUNCTION_ARGS)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-	MemoryContext old_context;
-	bool		random_access;
 	HeapCheckContext ctx;
 	Buffer		vmbuffer = InvalidBuffer;
 	Oid			relid;
@@ -227,16 +224,6 @@ verify_heapam(PG_FUNCTION_ARGS)
 	BlockNumber nblocks;
 	const char *skip;
 
-	/* Check to see if caller supports us returning a tuplestore */
-	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
-		ereport(ERROR,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("set-valued function called in context that cannot accept a set")));
-	if (!(rsinfo->allowedModes & SFRM_Materialize))
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("materialize mode required, but it is not allowed in this context")));
-
 	/* Check supplied arguments */
 	if (PG_ARGISNULL(0))
 		ereport(ERROR,
@@ -290,15 +277,12 @@ verify_heapam(PG_FUNCTION_ARGS)
 	 */
 	ctx.attnum = -1;
 
-	/* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
-	old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
-	random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
-	ctx.tupdesc = verify_heapam_tupdesc();
-	ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
-	rsinfo->returnMode = SFRM_Materialize;
-	rsinfo->setResult = ctx.tupstore;
-	rsinfo->setDesc = ctx.tupdesc;
-	

Re: Doc about how to set max_wal_senders when setting minimal wal_level

2022-03-07 Thread David G. Johnston
On Fri, Mar 4, 2022 at 2:49 AM Japin Li  wrote:

> Thanks for your review.  Modified.
>

Works for me.  I have some additional sparks of ideas, but nothing that
needs to hold this up.

David J.


Re: Doc about how to set max_wal_senders when setting minimal wal_level

2022-03-07 Thread David G. Johnston
On Thu, Mar 3, 2022 at 11:05 PM Kyotaro Horiguchi 
wrote:

> But,
> anyway, IMHO, it is merely a performance tip that is not necessarily
> required in this section, or even in this documentation.  In addition to
> that, if we write this for max_wal_senders, archive_mode will deserve
> similar tips, but I think it is too verbose.  In short, I think I
> would not add that description at all.
>
>
I wrote it as a performance tip, but it also documents that when
max_wal_senders is set to 0, no feature of the server requires more
information than is captured in minimal WAL.  That fact seems worth
noting, even at the cost of a bit of verbosity.  These features interact
with each other, and that interaction should be adequately described.
While subjective, this dynamic seems to warrant inclusion.

David J.


Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Thomas Munro
On Tue, Mar 8, 2022 at 10:39 AM Thomas Munro  wrote:
> Test on a Win10 VM.

Erm, "Tested" (as in, I tested), I meant to write...




Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Thomas Munro
On Tue, Mar 8, 2022 at 12:58 AM Michael Paquier  wrote:
> On Fri, Mar 04, 2022 at 11:26:43PM +1300, Thomas Munro wrote:
> > + /* Skip in-place tablespaces (testing use only) */
> > + if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR)
> > + continue;
>
> I saw the warning when testing base backups with in-place tablespaces
> and it did not annoy me much, but, yes, that can be confusing.
>
> Junction points are directories, no?  Are you sure that this works
> correctly on WIN32?  It seems to me that we'd better use readlink()
> only for entries in pg_tlbspc/ that are PGFILETYPE_LNK on non-WIN32
> and pgwin32_is_junction() on WIN32.

Thanks, you're right.  Test on a Win10 VM.  Here's a new version.
From e231fd45e153279bfb2dd3441b2a34797c446289 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 8 Mar 2022 10:25:59 +1300
Subject: [PATCH] Suppress pg_basebackup warning for in-place tablespaces.

Reported-by: Kyotaro Horiguchi 
Reviewed-by: Michael Paquier 
Discussion: https://postgr.es/m/20220304.165449.1200020258723305904.horikyota.ntt%40gmail.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..ab8be77bd2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -66,6 +66,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -8292,6 +8293,20 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 
 			snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
 
+			/*
+			 * Skip anything that isn't a symlink/junction.  For testing only,
+			 * we sometimes use allow_in_place_tablespaces to create
+			 * directories directly under pg_tblspc, which would produce
+			 * spurious warnings below.
+			 */
+#ifdef WIN32
+			if (!pgwin32_is_junction(fullpath))
+continue;
+#else
+			if (get_dirent_type(fullpath, de, false, ERROR) != PGFILETYPE_LNK)
+continue;
+#endif
+
 #if defined(HAVE_READLINK) || defined(WIN32)
 			rllen = readlink(fullpath, linkpath, sizeof(linkpath));
 			if (rllen < 0)
-- 
2.35.1



Re: role self-revocation

2022-03-07 Thread David G. Johnston
On Mon, Mar 7, 2022 at 1:16 PM Tom Lane  wrote:

> Based on Robert's archaeological dig, it now seems that the fact that
> we have any such behavior at all was just a mistake.  What would be
> lost if we drop it?
>

Probably nothing that couldn't be replaced, and with a better model, but I
do have a concern that there are setups in the wild inadvertently using
this behavior.  Enough so that I would vote to change it but include a
migration GUC to restore the current behavior, probably with a deprecation
warning.  Kinda depends on the post-change dump/restore mechanics.  But
just tearing it out wouldn't seem extraordinary for us.


>
> Having said that, one thing that I find fishy is that it's not clear
> where the admin privilege for a role originates.


I do not see a problem with there being no inherent admin privilege for a
role.  A superuser or CREATEROLE user holds admin privilege on all roles in
the cluster.  They can delegate the privilege to administer a role to yet
another role in the system.  That necessitates creating two roles - the one
being administered and the one being delegated to.  I don't see a benefit
to saving which specific superuser or CREATEROLE user "owns" the role that
is to be administered.  Not unless non-owner CREATEROLE users are prevented
from exercising admin privileges on the role.  That all said, I'd accept
the choice to include such ownership information as a requirement for
meeting the auditing needs of DBAs.  But I would argue that such auditing
probably needs to be external to the working system - the fact that
ownership can be changed reduces the benefit of an in-database value.

> If we recorded which user created the role, we could act as though
> that user has admin privilege (whether or not it's a member).


I suppose we could record the current owner of a role but that seems
unnecessary.  I dislike using the "created" concept by virtue of the fact
that, for routines, "security definer" implies creator but it actually
means "security owner".

David J.


Re: logical decoding and replication of sequences

2022-03-07 Thread Tomas Vondra



On 3/7/22 17:53, Tomas Vondra wrote:
> On 2/28/22 12:46, Amit Kapila wrote:
>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>  wrote:
>>>
>>> On 2/10/22 19:17, Tomas Vondra wrote:
 I've polished & pushed the first part adding sequence decoding
 infrastructure etc. Attached are the two remaining parts.

 I plan to wait a day or two and then push the test_decoding part. The
 last part (for built-in replication) will need more work and maybe
 rethinking the grammar etc.

>>>
>>> I've pushed the second part, adding sequences to test_decoding.
>>>
>>
>> The test_decoding is failing randomly in the last few days. I am not
>> completely sure but they might be related to this work. The two of
>> these appears to be due to the same reason:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>
>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>> "reorderbuffer.c", Line: 1173, PID: 35013)
>> 0   postgres0x00593de0 ExceptionalCondition + 
>> 160\\0
>>
> 
> This might be related to the issue reported by Amit, i.e. that
> sequence_decode does not call ReorderBufferProcessXid(). If this keeps
> failing, we'll have to add some extra debug info (logging LSN etc.), at
> least temporarily. It'd be valuable to inspect the WAL too.
> 
>> Another:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
>>
>> --- 
>> /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
>> 2022-02-14 20:19:14.0 +
>> +++ 
>> /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
>> 2022-02-16 07:42:18.0 +
>> @@ -126,6 +126,7 @@
>>table public.replication_example: INSERT: id[integer]:4
>> somedata[integer]:3 text[character varying]:null
>> testcolumn1[integer]:null
>>table public.replication_example: INSERT: id[integer]:5
>> somedata[integer]:4 text[character varying]:null
>> testcolumn1[integer]:2 testcolumn2[integer]:1
>>COMMIT
>> +  sequence public.replication_example_id_seq: transactional:0
>> last_value: 38 log_cnt: 0 is_called:1
>>BEGIN
>>table public.replication_example: INSERT: id[integer]:6
>> somedata[integer]:5 text[character varying]:null
>> testcolumn1[integer]:3 testcolumn2[integer]:null
>>COMMIT
>> @@ -133,7 +134,7 @@
>>table public.replication_example: INSERT: id[integer]:7
>> somedata[integer]:6 text[character varying]:null
>> testcolumn1[integer]:4 testcolumn2[integer]:null
>>table public.replication_example: INSERT: id[integer]:8
>> somedata[integer]:7 text[character varying]:null
>> testcolumn1[integer]:5 testcolumn2[integer]:null
>> testcolumn3[integer]:1
>>COMMIT
>> - (15 rows)
>> + (16 rows)
>>
> 
> Interesting. I can think of one reason that might cause this - we log
> the first sequence increment after a checkpoint. So if a checkpoint
> happens in an unfortunate place, there'll be an extra WAL record. On
> slow / busy machines that's quite possible, I guess.
> 

I've tweaked checkpoint_timeout to make checkpoints more aggressive
(set it to 1s), and it seems my hunch was correct - it produces failures
exactly like this one. The best fix probably is to just disable decoding
of sequences in those tests that are not aimed at testing sequence decoding.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

2022-03-07 Thread Robert Haas
On Wed, Feb 16, 2022 at 8:46 PM Jeevan Ladhe  wrote:
> Thanks for the comments Robert. I have addressed your comments in the
> attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
> Rest of the patches are similar to v12, but just bumped the version number.

OK, here's a consolidated patch with all your changes from 0002-0004
as 0001 plus a few proposed edits of my own in 0002. By and large I
think this is fine.

My proposed changes are largely cosmetic, but one thing that isn't is
revising the size - pos <= bound tests to instead check size - pos <
bound. My reasoning for that change is: if the number of bytes
remaining in the buffer is exactly equal to the maximum number we can
write, we don't need to flush it yet. If that sounds correct, we
should fix the LZ4 code the same way.
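The flush-threshold reasoning above can be sketched with a toy check (the `BOUND` constant and the helper name are hypothetical stand-ins for the per-chunk compressed-output bound, not the patch's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the maximum number of bytes one compression
 * call can emit (what ZSTD_compressBound()/LZ4_compressBound() would
 * return for one input chunk). */
#define BOUND 16

/* Flush only when the free space remaining in the buffer is strictly
 * less than the maximum one write can emit.  If the remaining space
 * equals BOUND exactly, the write still fits, so the right test is
 * "size - pos < BOUND", not "<=". */
static int
needs_flush(size_t size, size_t pos)
{
	return size - pos < BOUND;
}
```

With size = 32: at pos = 16 the remaining 16 bytes exactly fit one maximal write, so no flush is needed yet; at pos = 17 only 15 bytes remain and the buffer must be flushed first.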

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v14-0002-My-changes.patch
Description: Binary data


v14-0001-Patches-from-JL.patch
Description: Binary data


Re: logical decoding and replication of sequences

2022-03-07 Thread Tomas Vondra



On 3/7/22 17:39, Tomas Vondra wrote:
> 
> 
> On 3/1/22 12:53, Amit Kapila wrote:
>> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila  wrote:
>>>
>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>>  wrote:

 On 2/10/22 19:17, Tomas Vondra wrote:
> I've polished & pushed the first part adding sequence decoding
> infrastructure etc. Attached are the two remaining parts.
>
> I plan to wait a day or two and then push the test_decoding part. The
> last part (for built-in replication) will need more work and maybe
> rethinking the grammar etc.
>

 I've pushed the second part, adding sequences to test_decoding.

>>>
>>> The test_decoding tests have been failing randomly in the last few days. I
>>> am not completely sure but they might be related to this work. Two of
>>> these appear to be due to the same reason:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2022-02-25%2018%3A50%3A09
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2022-02-17%2015%3A17%3A07
>>>
>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>> 0   postgres0x00593de0 ExceptionalCondition + 
>>> 160\\0
>>>
>>
>> While reviewing the code for this, I noticed that in
>> sequence_decode(), we don't call ReorderBufferProcessXid to register
>> the first known lsn in WAL for the current xid. The similar functions
>> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
>> even if they decide not to queue or send the change. Is there a reason
>> for not doing the same here? However, I am not able to deduce any
>> scenario where lack of this will lead to such an Assertion failure.
>> Any thoughts?
>>
> 
> Thanks, that seems like an omission. Will fix.
> 

I've pushed this simple fix. Not sure it'll fix the assert failures on
skink/locust, though. Given the lack of information it'll be difficult
to verify. So let's wait a bit.
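For context, the shape of such a fix mirrors what logicalmsg_decode() and heap_decode() already do — registering the xid with the reorder buffer before any decision to skip the change. This is a PostgreSQL-internal sketch, not standalone-compilable, and the exact variable names may differ from the committed code:

```c
void
sequence_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
{
	TransactionId xid = XLogRecGetXid(buf->record);

	/*
	 * Register the first known LSN in WAL for this xid, as the other
	 * decode routines do, even if we later decide not to queue or send
	 * the change.  Skipping this can leave the reorder buffer unaware
	 * of the transaction's true first LSN.
	 */
	ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);

	/* ... existing sequence decoding logic ... */
}
```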


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New developer papercut - Makefile references INSTALL

2022-03-07 Thread Magnus Hagander
On Mon, Mar 7, 2022 at 4:57 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > If anything, I'm more behind the idea of just getting rid of the
> > INSTALL file. A reference to the install instructions in the README
> > should be enough today. The likelihood of somebody getting a postgres
> > source tarball and trying to build it for the first time while not
> > having internet access is extremely low I'd say.
>
> I agree that there's no longer a lot of reason to insist that the
> installation instructions need to be present in a flat text file
> as opposed to some other way.
>
> However, just putting a URL into README seems problematic, because how
> will we ensure that it's the correct version-specific URL?  (And it does
> need to be version-specific; the set of configure options changes over
> time, and that's not even mentioning whatever user-visible effects
> changing to meson will have.)  You could imagine generating the URL
> during tarball build, but that does nothing for the people who pull
> directly from git.
>
> I thought briefly about directing people to read
> doc/src/sgml/html/installation.html, but that has the same problem
> that it won't be present in a fresh git pull.

Yeah, if we just care about tarballs that works, but then it also
works to inject the version number in the README file.

But taking a step back, who is the actual audience for this? Do we
*need* a link pointing directly there, or is it enough to just point
to "use the docs on the web"? We can't link to the incorrect version,
but can we just link to /docs/ and leave it at that?

If not, could we make the change of URL a part of the branching step?
Branching to a stable release would then include modifying README, and could
be made a step of version_stamp.pl?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: role self-revocation

2022-03-07 Thread Tom Lane
Mark Dilger  writes:
>> On Mar 7, 2022, at 12:01 PM, Robert Haas  wrote:
>> 
>> It's been pointed out upthread that this would have undesirable
>> security implications, because the admin option would be inherited,
>> and the implicit permission isn't.

> Right, but with a reflexive self-admin-option, we could document that it 
> works in a non-inherited way.  We'd just be saying the current hard-coded 
> behavior is an option which can be revoked rather than something you're stuck 
> with.

After reflection, I think that role self-admin is probably a bad idea that
we should stay away from.  It could perhaps be reasonable given some other
system design and/or syntax than what SQL gives us, but we're dealing in
SQL.  It doesn't make sense to GRANT a role to itself, and therefore it
likewise doesn't make sense to GRANT WITH ADMIN OPTION.

Based on Robert's archaeological dig, it now seems that the fact that
we have any such behavior at all was just a mistake.  What would be
lost if we drop it?

Having said that, one thing that I find fishy is that it's not clear
where the admin privilege for a role originates.  After "CREATE ROLE
alice", alice has no members, therefore none that have admin privilege,
therefore the only way that the first member could be added is via
superuser deus ex machina.  This does not seem clean.  If we recorded
which user created the role, we could act as though that user has
admin privilege (whether or not it's a member).  Perhaps I'm
reinventing something that was already discussed upthread.  I wonder
what the SQL spec has to say on this point, too.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 12:03 PM, Mark Dilger  wrote:
> 
> Right, but with a reflexive self-admin-option, we could document that it 
> works in a non-inherited way.  We'd just be saying the current hard-coded 
> behavior is an option which can be revoked rather than something you're stuck 
> with.

We could also say that the default is to not have admin option on yourself, 
with that being something grantable, but that is a larger change from the 
historical behavior and might have more consequences for dump/restore, etc.

My concern about just nuking self-admin is that there may be sites which use 
self-admin and we'd be leaving them without a simple work-around after upgrade, 
because they couldn't restore the behavior by executing a grant.  They'd have 
to more fundamentally restructure their role relationships to not depend on 
self-admin, something which might be harder for them to do.  Perhaps nobody is 
using self-admin, or very few people are using it, and I'm being overly 
concerned.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Time to drop plpython2?

2022-03-07 Thread Tom Lane
Andres Freund  writes:
> Now that the BF has stabilized, I've rebased and cleaned up the patches I'd
> posted earlier. Attached for the first time is my attempt at cleaning up the
> docs.

I looked through this quickly, and have a couple of nitpicks.  The
PGFILEDESC value for jsonb_plpython is worded randomly differently
from those for hstore_plpython and ltree_plpython.  I think we should
make it match those.  I also noted a couple of typos in the docs patch.
See attached delta patch (I named it .txt in hopes of not confusing
the cfbot).

I kind of wonder if we still need "46.1. Python 2 vs. Python 3" at
all.  It certainly doesn't seem like it still deserves its position
of honor as the first subsection.  Perhaps move it down to be the
last subsection?

Also, grepping reveals that vcregress.pl still has two stray references to
"plpythonu".  I did not touch that here, but maybe that has something
to do with the CI failure?

> I did so far leave in the "major version conflict" detection stuff in
> plpy_main.c - that could again be useful? I'm leaning towards removing it, I'd
> hope that there's not again such a painful transition, and we have the git
> history if needed.

I think we should leave it in.  I foresee that somebody will want to build
plpython2u as an out-of-core extension, at least for a few years yet.
If they do, and the core language does not have its half of that guard,
it'd be bad.

regards, tom lane

diff --git a/contrib/jsonb_plpython/Makefile b/contrib/jsonb_plpython/Makefile
index e3f160510d..fea7bdfc00 100644
--- a/contrib/jsonb_plpython/Makefile
+++ b/contrib/jsonb_plpython/Makefile
@@ -4,7 +4,7 @@ MODULE_big = jsonb_plpython$(python_majorversion)
 OBJS = \
 	$(WIN32RES) \
 	jsonb_plpython.o
-PGFILEDESC = "jsonb_plpython - transform between jsonb and plpython3u"
+PGFILEDESC = "jsonb_plpython - jsonb transform for plpython"
 
 PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec) -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
 
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b3b16b8a77..c4223fafb6 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -716,7 +716,7 @@ UPDATE table_name SET jsonb_field[1]['a'] = '1';
   
 
   
-   The extensions for PL/Python is called jsonb_plpython3u.
+   The extension for PL/Python is called jsonb_plpython3u.
If you use it, jsonb values are mapped to Python
dictionaries, lists, and scalars, as appropriate.
   
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index 4ff384c7ee..508f404ae8 100644
--- a/doc/src/sgml/ltree.sgml
+++ b/doc/src/sgml/ltree.sgml
@@ -826,15 +826,15 @@ ltreetest= SELECT ins_label(path,2,'Space') FROM test WHERE path @ 'Top.
   Transforms
 
   
-   The ltree_plpython3u implement transforms for the
-   ltree type for PL/Python. If installed and specified when
+   The ltree_plpython3u extension implements transforms for
+   the ltree type for PL/Python. If installed and specified when
creating a function, ltree values are mapped to Python lists.
(The reverse is currently not supported, however.)
   
 
   

-It is strongly recommended that the transform extensions be installed in
+It is strongly recommended that the transform extension be installed in
 the same schema as ltree.  Otherwise there are
 installation-time security hazards if a transform extension's schema
 contains objects defined by a hostile user.
diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index fab4330cd1..7698d5f068 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -52,7 +52,7 @@
   
PL/Python supports only Python 3. Past versions of
PostgreSQL supported Python 2, using the
-   plpyhonu and plpython2u language
+   plpythonu and plpython2u language
names.
   
  


Re: support for MERGE

2022-03-07 Thread Álvaro Herrera
On Mon, Mar 7, 2022, at 4:59 PM, Álvaro Herrera wrote:
> I attach v13 here.  This version includes a 0002 that does the split of 
> nodeModifyTable.c routines, then 0003 implements MERGE on top of that.  0001 
> is as before.

In 0002, I've opted to have two separate structs.  One is the 
ModifyTableContext, as before, but I've removed 'tupleid' and 'oldtuple' (the 
specification of the tuple to delete/update) because it makes 
ExecCrossPartitionUpdate cleaner if we pass them separately.  The second struct 
is UpdateContext, which is used inside ExecUpdate as output data from its 
subroutines.  This is also for the benefit of cross-partition updates.

I'm pretty happy with how this turned out; even without considering MERGE, it 
seems to me that ExecUpdate benefits from being shorter.

Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 12:01 PM, Robert Haas  wrote:
> 
> It's been pointed out upthread that this would have undesirable
> security implications, because the admin option would be inherited,
> and the implicit permission isn't.

Right, but with a reflexive self-admin-option, we could document that it works 
in a non-inherited way.  We'd just be saying the current hard-coded behavior is 
an option which can be revoked rather than something you're stuck with.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: role self-revocation

2022-03-07 Thread Robert Haas
On Mon, Mar 7, 2022 at 2:59 PM Mark Dilger  wrote:
> This test failure is just a manifestation of the intended change, but 
> assuming we make no other changes, the error message would clearly need to be 
> updated, because it suggests the role should have admin_option on itself, a 
> situation which is not currently supported.

It's been pointed out upthread that this would have undesirable
security implications, because the admin option would be inherited,
and the implicit permission isn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread Mark Dilger



> On Mar 7, 2022, at 10:28 AM, Tom Lane  wrote:
> 
> Does anything interesting break if you do just take it out?

 SET SESSION AUTHORIZATION regress_priv_group2;
 GRANT regress_priv_group2 TO regress_priv_user5; -- ok: a role can self-admin
-NOTICE:  role "regress_priv_user5" is already a member of role "regress_priv_group2"
+ERROR:  must have admin option on role "regress_priv_group2"

This test failure is just a manifestation of the intended change, but assuming 
we make no other changes, the error message would clearly need to be updated, 
because it suggests the role should have admin_option on itself, a situation 
which is not currently supported.

Perhaps we should support that, though, by adding a reflexive aclitem[] to 
pg_authid (meaning it tracks which privileges a role has on itself) with 
tracking of who granted it, so that revocation can be handled properly.  The 
aclitem could start out null, meaning the role has by default the traditional 
limited self-admin which the code comments discuss:

/*
 * A role can admin itself when it matches the session user and we're
 * outside any security-restricted operation, SECURITY DEFINER or
 * similar context.  SQL-standard roles cannot self-admin.  However,
 * SQL-standard users are distinct from roles, and they are not
 * grantable like roles: PostgreSQL's role-user duality extends the
 * standard.  Checking for a session user match has the effect of
 * letting a role self-admin only when it's conspicuously behaving
 * like a user.  Note that allowing self-admin under a mere SET ROLE
 * would make WITH ADMIN OPTION largely irrelevant; any member could
 * SET ROLE to issue the otherwise-forbidden command.
 *
 * Withholding self-admin in a security-restricted operation prevents
 * object owners from harnessing the session user identity during
 * administrative maintenance.  Suppose Alice owns a database, has
 * issued "GRANT alice TO bob", and runs a daily ANALYZE.  Bob creates
 * an alice-owned SECURITY DEFINER function that issues "REVOKE alice
 * FROM carol".  If he creates an expression index calling that
 * function, Alice will attempt the REVOKE during each ANALYZE.
 * Checking InSecurityRestrictedOperation() thwarts that attack.
 *
 * Withholding self-admin in SECURITY DEFINER functions makes their
 * behavior independent of the calling user.  There's no security or
 * SQL-standard-conformance need for that restriction, though.
 *
 * A role cannot have actual WITH ADMIN OPTION on itself, because that
 * would imply a membership loop.  Therefore, we're done either way.
 */

For non-null aclitem[], we could support REVOKE ADMIN OPTION FOR joe FROM joe, 
and for explicit re-grants, we could track who granted it, such that further 
revocations could properly refuse if the revoker doesn't have sufficient 
privileges vis-a-vis the role that granted it in the first place.

I have not yet tried to implement this, and might quickly hit problems with the 
idea, but will take a stab at a proof-of-concept patch unless you suggest a 
better approach.

Thoughts?


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Make relfile tombstone files conditional on WAL level

2022-03-07 Thread Robert Haas
On Fri, Mar 4, 2022 at 12:37 AM Dilip Kumar  wrote:
> In this version I have fixed both of these issues.  Thanks Robert for
> suggesting the solution for both of these problems in our offlist
> discussion.  Basically, for the first problem we can flush the xlog
> immediately because we are actually logging the WAL every time after
> we allocate 64 relfilenode so this should not have much impact on the
> performance and I have added the same in the comments.  And during
> pg_upgrade, whenever we are assigning the relfilenode as part of the
> upgrade we will set that relfilenode + 1 as nextRelFileNode to be
> assigned so that we never generate the conflicting relfilenode.

Anyone else have an opinion on this?
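As a rough sketch of the allocation scheme described above (the names and the batch size of 64 are assumptions drawn from the mail, not the actual patch): the allocator WAL-logs a value safely ahead of the in-memory counter and flushes it immediately, so that after a crash the counter restarts from the durably-logged value — skipping at most one batch of unused values, but never reissuing one already handed out.

```c
#include <assert.h>

#define RFN_LOG_INTERVAL 64		/* assumed batch size from the mail */

static unsigned long long next_rfn = 1;		/* in-memory counter */
static unsigned long long logged_rfn = 1;	/* value durably WAL-logged */

static unsigned long long
get_new_relfilenode(void)
{
	unsigned long long result = next_rfn++;

	/*
	 * Log (and immediately flush, per the mail) a value a full interval
	 * ahead, so the durable value always exceeds anything handed out.
	 */
	if (next_rfn > logged_rfn)
		logged_rfn = next_rfn + RFN_LOG_INTERVAL;	/* XLogPut... + flush */

	return result;
}

/* After a crash, recovery restarts the counter from the logged value. */
static void
simulate_crash_recovery(void)
{
	next_rfn = logged_rfn;
}
```

Restarting from the logged value wastes up to 64 relfilenodes per crash, which is harmless; the invariant that matters is that no value is ever issued twice.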

> The only part I do not like in the patch is that before this patch we
> could directly access the buftag->rnode.  But since now we are not
> having directly relfilenode as part of the buffertag and instead of
> that we are keeping individual fields (i.e. dbOid, tbsOid and relNode)
> in the buffer tag.  So if we have to directly get the relfilenode we
> need to generate it.  However those changes are very limited to just 1
> or 2 file so maybe not that bad.

You're talking here about just needing to introduce BufTagGetFileNode
and BufTagSetFileNode, or something else? I don't find those macros to
be problematic.

BufTagSetFileNode could maybe assert that the OID isn't too big,
though. We should ereport() before we get to this point if we somehow
run out of values, but it might be nice to have a check here as a
backup.
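A sketch of what such accessors and the suggested assertion might look like (the struct layout, field names, and the 56-bit limit here are illustrative assumptions, not the patch's actual definitions):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Illustrative: a widened relfilenumber, conceptually 56 bits. */
typedef struct RelFileNode
{
	Oid			spcOid;
	Oid			dbOid;
	uint64_t	relNumber;
} RelFileNode;

/* The buffer tag keeps the individual fields rather than a RelFileNode. */
typedef struct BufferTag
{
	Oid			spcOid;
	Oid			dbOid;
	uint64_t	relNumber;
	/* fork number, block number, ... omitted */
} BufferTag;

#define MAX_RELFILENUMBER ((UINT64_C(1) << 56) - 1)

/* Reassemble a RelFileNode from the tag's individual fields. */
static RelFileNode
BufTagGetFileNode(const BufferTag *tag)
{
	RelFileNode rnode = {tag->spcOid, tag->dbOid, tag->relNumber};

	return rnode;
}

static void
BufTagSetFileNode(BufferTag *tag, RelFileNode rnode)
{
	/*
	 * The backup check suggested above: an ereport() should catch
	 * running out of values earlier; this is a last line of defense.
	 */
	assert(rnode.relNumber <= MAX_RELFILENUMBER);

	tag->spcOid = rnode.spcOid;
	tag->dbOid = rnode.dbOid;
	tag->relNumber = rnode.relNumber;
}
```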

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Make relfile tombstone files conditional on WAL level

2022-03-07 Thread Robert Haas
On Mon, Feb 21, 2022 at 2:51 AM Dilip Kumar  wrote:
> While working on this I realized that even if we make the relfilenode
> 56 bits we can not remove the loop inside GetNewRelFileNode() for
> checking the file existence.  Because it is always possible that the
> file reaches to the disk even before the WAL for advancing the next
> relfilenode and if the system crashes in between that then we might
> generate the duplicate relfilenode right?

I agree.

> I think the second paragraph in XLogPutNextOid() function explain this
> issue and now even after we get the wider relfilenode we will have
> this issue.  Correct?

I think you are correct.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread Robert Haas
On Mon, Mar 7, 2022 at 2:29 PM David G. Johnston
 wrote:
> You mean the one that was based upon your "ooops"...I discounted that 
> immediately because members cannot revoke their own membership in a group 
> unless they were given WITH ADMIN OPTION on that group.

Oh, hmm. That example might be backwards from the case I'm talking about.

> The mere fact that the pg_hba.conf concern raised there hasn't been reported 
> as a live issue suggests the lack of any meaningful design flaw here.

Not really. The system is full of old bugs, just as all software
systems are, and the particular role self-administration behavior that
is at issue here appears to be something that was accidentally
introduced 16 years ago in a commit that did something else, and it has
never been scrutinized from a design perspective since then.

Personally, I've been shocked by the degree to which this entire area
seems to be full of design flaws and half-baked code. I mean, just the
fact that the pg_auth_members.grantor can be left pointing to a role
OID that no longer exists is pretty crazy, right? I don't think anyone
today would consider something with that kind of wart committable.

> That isn't to say that having a LOGIN role get an automatic temporary WITH 
> ADMIN OPTION on itself is a good thing - but there isn't any privilege 
> escalation vector here to be squashed.  There is just a "DBAs should treat 
> LOGIN roles as leaf nodes" expectation in which case there would be no 
> superuser granted memberships to be removed.

Well, we may not have found one yet, but that doesn't prove none
exists. In any case, if we can agree that it's not necessarily a
desirable behavior, that's good enough for me.

(I still disagree with the idea that LOGIN roles have to be leaf
nodes. We could have a system where that's true, but that's not how
the system we actually have is designed.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread David G. Johnston
On Mon, Mar 7, 2022 at 11:18 AM Robert Haas  wrote:

> In terms of how
> things work today, see Joshua Brindle's email about the use of groups
> in pg_hba.conf. That is an excellent example of how removing oneself
> from a group could enable one to bypass security restrictions intended
> by the DBA.
>
You mean the one that was based upon your "ooops"...I discounted that
immediately because members cannot revoke their own membership in a group
unless they were given WITH ADMIN OPTION on that group.

The mere fact that the pg_hba.conf concern raised there hasn't been
reported as a live issue suggests the lack of any meaningful design flaw
here.

That isn't to say that having a LOGIN role get an automatic temporary WITH
ADMIN OPTION on itself is a good thing - but there isn't any privilege
escalation vector here to be squashed.  There is just a "DBAs should treat
LOGIN roles as leaf nodes" expectation in which case there would be no
superuser granted memberships to be removed.

David J.


Re: role self-revocation

2022-03-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I'm not quite following this bit.  Where would SET ROLE come into play
> > when we're talking about old dump scripts and how the commands in those
> > scripts might be interpreted by newer versions of PG..?
> 
> No, the concern there is the other way around: what if you take a
> script made by newer pg_dump and try to load it into an older server
> that doesn't have the GRANTED BY option?

Wow.  No, I really don't think I can agree that we need to care about
this.

> We're accustomed to saying that that doesn't work if you use a
> database feature that didn't exist in the old server, but
> privilege grants are hardly that.  I don't want us to change the
> pg_dump output in such a way that the grants can't be restored at all
> to an older server, just because of a syntax choice that we could
> make backwards-compatibly instead of not-backwards-compatibly.

GRANTED BY is clearly such a feature that exists in the newer version
and doesn't exist in the older and I can't agree that we should
complicate things for ourselves and bend over backwards to try and make
it work to take a dump from a newer version of PG and make it work on
random older versions.

Folks are also able to exclude privileges from dumps if they want to.

Where do we document that we are going to put in effort to make these
kinds of things work?  What other guarantees are we supposed to be
providing regarding using output from a newer pg_dump against older
servers?  What about newer custom format dumps?  Surely you're not
suggesting that we need to back-patch support for them to released
versions of pg_restore.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: role self-revocation

2022-03-07 Thread Robert Haas
On Mon, Mar 7, 2022 at 1:58 PM Tom Lane  wrote:
> Stephen Frost  writes:
> > I'm not quite following this bit.  Where would SET ROLE come into play
> > when we're talking about old dump scripts and how the commands in those
> > scripts might be interpreted by newer versions of PG..?
>
> No, the concern there is the other way around: what if you take a
> script made by newer pg_dump and try to load it into an older server
> that doesn't have the GRANTED BY option?
>
> We're accustomed to saying that that doesn't work if you use a
> database feature that didn't exist in the old server, but
> privilege grants are hardly that.  I don't want us to change the
> pg_dump output in such a way that the grants can't be restored at all
> to an older server, just because of a syntax choice that we could
> make backwards-compatibly instead of not-backwards-compatibly.

Are you absolutely positive that it's that simple? I mean, what if the
SET ROLE command has other side effects, or if the GRANT command
behaves differently in some way as a result of the SET ROLE having
been done? I feel like a solution that involves explicitly specifying
the behavior that we want (i.e. GRANTED BY) is likely to be more
reliable and more secure than a solution which involves absorbing a
key value from a session property (i.e. the role established by SET
ROLE). Even if we decide that SET ROLE is the way to go for
compatibility reasons, I would personally say that it's an inferior
hack only worth accepting for that reason than a truly desirable
design.

See CVE-2018-1058 for an example of what I'm talking about. The
prevailing search_path turned out to affect not only the creation
schema, as intended, but also the resolution of references to other
objects mentioned in the CREATE COMMAND, as not intended. I don't see
a similar hazard here, but I'm worried that there might be one.
Declarative syntax is a very powerful tool for avoiding those kinds of
mishaps, and I think we should make as much use of it as we can.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Agreed, this is not something to move on quickly.  We might want
> >> to think about adjusting pg_dump to use explicit GRANTED BY
> >> options in GRANT/REVOKE a release or two before making incompatible
> >> changes.
> 
> > I'm with Robert on this though- folks should know already that they need
> > to use the pg_dump of the version of PG that they want to move to and
> > not try to re-use older pg_dump output with newer versions, for a number
> > of reasons and this is just another.
> 
> Yeah, in an ideal world you'd do that, but our users don't always have
> the luxury of living in an ideal world.  Sometimes all you've got is
> an old pg_dump file.  Perhaps this behavior wouldn't mess things up
> enough to make the restored database unusable, but we need to think
> about (and test) that case while we're considering changes.

I agree it's something to consider and deal with if we're able to do so
sanely, but I disagree that we should be beholden to old dump files when
considering how to move the project forward.  Further, they can surely
build and install the version of PG that goes with that dump file in a
great many cases and then dump the data out using a newer version of
pg_dump.  For 5 years they could do that with a completely supported
version of PG, but we've recently agreed to make an effort to do more
here by supporting the building of even older versions on modern
systems.

Thanks,

Stephen




Re: Adding CI to our tree (ccache)

2022-03-07 Thread Andres Freund
Hi,

On 2022-03-06 10:16:54 -0600, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 05:30:03PM -0800, Andres Freund wrote:
> > I tried to use it, but saw that no caching was happening, and debugged
> > it. Which yielded that it can't be used due to the way output files are
> > specified (and due to multiple files, but that can be prevented with an
> > msbuild parameter).
> 
> Could you give a hint about to the msbuild param to avoid processing multiple
> files with cl.exe?  I'm not able to find it...

/p:UseMultiToolTask=true


> I don't know about the issue with output filenames ..

I don't remember the precise details anymore, but it boils down to ccache
requiring the output filename to be specified, but we only specify the output
directory. Or very similar.

Greetings,

Andres Freund




Re: role self-revocation

2022-03-07 Thread Tom Lane
Stephen Frost  writes:
> I'm not quite following this bit.  Where would SET ROLE come into play
> when we're talking about old dump scripts and how the commands in those
> scripts might be interpreted by newer versions of PG..?

No, the concern there is the other way around: what if you take a
script made by newer pg_dump and try to load it into an older server
that doesn't have the GRANTED BY option?

We're accustomed to saying that that doesn't work if you use a
database feature that didn't exist in the old server, but
privilege grants are hardly that.  I don't want us to change the
pg_dump output in such a way that the grants can't be restored at all
to an older server, just because of a syntax choice that we could
make backwards-compatibly instead of not-backwards-compatibly.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Agreed, this is not something to move on quickly.  We might want
>> to think about adjusting pg_dump to use explicit GRANTED BY
>> options in GRANT/REVOKE a release or two before making incompatible
>> changes.

> I'm with Robert on this though- folks should know already that they need
> to use the pg_dump of the version of PG that they want to move to and
> not try to re-use older pg_dump output with newer versions, for a number
> of reasons and this is just another.

Yeah, in an ideal world you'd do that, but our users don't always have
the luxury of living in an ideal world.  Sometimes all you've got is
an old pg_dump file.  Perhaps this behavior wouldn't mess things up
enough to make the restored database unusable, but we need to think
about (and test) that case while we're considering changes.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Mar 6, 2022 at 11:34 AM Tom Lane  wrote:
> > I was thinking the former ... however, after a bit of experimentation
> > I see that we accept "grant foo to bar granted by baz" a VERY long
> > way back, but the "granted by" option for object privileges is
> > (a) pretty new and (b) apparently restrictively implemented:
> >
> > regression=# grant delete on alices_table to bob granted by alice;
> > ERROR:  grantor must be current user
> >
> > That's ... surprising.  I guess whoever put that in was only
> > interested in pro-forma SQL syntax compliance and not in making
> > a usable feature.
> 
> It appears so: 
> https://www.postgresql.org/message-id/2073b6a9-7f79-5a00-5f26-cd19589a52c7%402ndquadrant.com
> 
> It doesn't seem like that would be hard to fix. Maybe we should just do that.

Yeah, that seems like something that should be fixed.  Superusers should
be allowed to set GRANTED BY to whatever they feel like, and I'd argue
that a role who wants a GRANT to actually be GRANTED BY some other role
they're a member of should also be allowed to (as they could anyway by
doing a SET ROLE), provided that role also has the privileges to do the
GRANT itself, of course.

> > So if we decide to extend this change into object privileges
> > it would be advisable to use SET ROLE, else we'd be giving up
> > an awful lot of backwards compatibility in dump scripts.
> > But if we're only talking about role grants then I think
> > GRANTED BY would work fine.
> 
> OK.

I'm not quite following this bit.  Where would SET ROLE come into play
when we're talking about old dump scripts and how the commands in those
scripts might be interpreted by newer versions of PG..?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: role self-revocation

2022-03-07 Thread David G. Johnston
On Mon, Mar 7, 2022 at 11:18 AM Robert Haas  wrote:

> On Sun, Mar 6, 2022 at 11:01 PM David G. Johnston
>  wrote:
> > The example, which you moved here, then attempts to demonstrate this
> "fact" but gets it wrong.  Boss became a member of peon so if you want to
> demonstrate self-administration of a role's membership in a different group
> you have to login as boss, not peon.  Doing that, and then revoking peon
> from boss, yields "ERROR: must have admin option on role "peon"".
>
> This doesn't seem to me to be making a constructive argument. I showed
> an example with certain names demonstrating a certain behavior that I
> find problematic.


Whether you choose the wording of the original thread:

"This is because we allow 'self administration' of roles, meaning that
they can decide what other roles they are a member of."

https://www.postgresql.org/message-id/flat/20211005025746.GN20998%40tamriel.snowman.net

Or you quote at the top of this one:

> The ability of a role to revoke itself from some other role is just
> something we need to accept as being a change that needs to be made,

This example:

rhaas=# create user boss;
CREATE ROLE
rhaas=# create user peon;
CREATE ROLE
rhaas=# grant peon to boss;
GRANT ROLE
rhaas=# \c - peon
You are now connected to database "rhaas" as user "peon".
rhaas=> revoke peon from boss; -- i don't like being bossed around!
REVOKE ROLE

Fails to demonstrate the boss "can revoke itself from peon" / "boss can
decide what other roles they are a member of."

You are logged in as peon when you do the revoke, not boss, so the extent
of what "boss" can or cannot do has not been shown.

boss is a member of peon, not the other way around.  That the wording
"grant peon to boss" makes you think otherwise is unfortunate.

David J.


Re: role self-revocation

2022-03-07 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > 1. What should be the exact rule for whether A can remove a grant made
> > by B? Is it has_privs_of_role()? is_member_of_role()? Something else?
> 
> No strong opinion here, but I'd lean slightly to the more restrictive
> option.
> 
> > 2. What happens if the same GRANT is enacted by multiple users? For
> > example, suppose peon does "GRANT peon to boss" and then the superuser
> > does the same thing afterwards, or vice versa? One design would be to
> > try to track those as two separate grants, but I'm not sure if we want
> > to add that much complexity, since that's not how we do it now and it
> > would, for example, implicate the choice of PK on the pg_auth_members
> > table.
> 
> As you note later, we *do* track such grants separately in ordinary
> ACLs, and I believe this is clearly required by the SQL spec.

Agreed.

> It says (for privileges on objects):
> 
> Each privilege is represented by a privilege descriptor.
> A privilege descriptor contains:
> — The identification of the object on which the privilege is granted.
> — The  of the grantor of the privilege.
>   ^^^
> — The  of the grantee of the privilege.
> — Identification of the action that the privilege allows.
> — An indication of whether or not the privilege is grantable.
> — An indication of whether or not the privilege has the WITH HIERARCHY 
> OPTION specified.
> 
> Further down (4.42.3 in SQL:2021), the granting of roles is described,
> and that says:
> 
> Each role authorization is described by a role authorization descriptor.
> A role authorization descriptor includes:
> — The role name of the role.
> — The authorization identifier of the grantor.
>   
> — The authorization identifier of the grantee.
> — An indication of whether or not the role authorization is grantable.
> 
> If we are not tracking the grantors of role authorizations,
> then we are doing it wrong and we ought to fix that.

Yup, and as noted elsewhere, we are tracking it but not properly dealing
with dependencies nor are we considering the grantor when REVOKE is run.

Looking at the spec for REVOKE is quite useful when trying to understand
how this is all supposed to work (and, admittedly, isn't something I did
enough of when I did the original work on roles... sorry about that, was
early on).  In particular, a REVOKE only works when it finds something
to revoke/remove, and part of that search includes basically "was it the
current role who was the grantor?"

The specific language here being: A role authorization descriptor is
said to be identified if it defines the grant of any of the specified
roles revoked to grantee with grantor A.

Basically, a role authorization descriptor isn't identified unless it's
one that this user/role had previously granted.
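That search could be modelled with a toy grant store keyed by grantor (purely illustrative, not the catalog representation — but it shows why tracking grantors separately changes what REVOKE can remove):

```python
# Toy model: each role authorization descriptor records its grantor.
# A REVOKE only "identifies" (and removes) descriptors whose grantor
# is the role performing the revoke.
grants = set()

def grant(role, grantee, grantor):
    grants.add((role, grantee, grantor))

def revoke(role, grantee, grantor):
    grants.discard((role, grantee, grantor))

grant("peon", "boss", "peon")       # peon grants itself to boss
grant("peon", "boss", "postgres")   # superuser makes the same grant
revoke("peon", "boss", "peon")      # removes only peon's descriptor
```

After the revoke, the superuser's descriptor for the same (role, grantee) pair survives — which is why the grantor would need to be part of the key (e.g. the PK on pg_auth_members).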

> > 3. What happens if a user is dropped after being recorded as a
> > grantor?
> 
> Should work the same as it does now for ordinary ACLs, ie, you
> gotta drop the grant first.

Agreed.

> > 4. Should we apply this rule to other types of grants, rather than
> > just to role membership?
> 
> I am not sure about the reasoning behind the existing rule that
> superuser-granted privileges are recorded as being granted by the
> object owner.  It does feel more like a wart than something we want.
> It might have been a hack to deal with the lack of GRANTED BY
> options in GRANT/REVOKE back in the day.

Yeah, that doesn't seem right and isn't great.

> Changing it could have some bad compatibility consequences though.
> In particular, I believe it would break existing pg_dump files,
> in that after restore all privileges would be attributed to the
> restoring superuser, and there'd be no very easy way to clean that
> up.

Ugh, that's pretty grotty, certainly.

> > Please note that it is not really my intention to try to shove
> > anything into v15 here.
> 
> Agreed, this is not something to move on quickly.  We might want
> to think about adjusting pg_dump to use explicit GRANTED BY
> options in GRANT/REVOKE a release or two before making incompatible
> changes.

I'm with Robert on this though- folks should know already that they need
to use the pg_dump of the version of PG that they want to move to and
not try to re-use older pg_dump output with newer versions, for a number
of reasons and this is just another.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: role self-revocation

2022-03-07 Thread Robert Haas
On Mon, Mar 7, 2022 at 1:28 PM Tom Lane  wrote:
> Ugh, I think you are right.  It's been a long time of course, but it sure
> looks like that was copied-and-pasted without recognizing that it was
> wrong in this function because of the need to check the admin_option flag.
> And then in the later security discussion we didn't realize that the
> problematic behavior was a flat-out thinko, so we narrowed it as much as
> we could instead of just taking it out.
>
> Does anything interesting break if you do just take it out?

That is an excellent question, but I haven't had time yet to
investigate the matter.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread Tom Lane
Robert Haas  writes:
> It appears to me that the actual behavior of having is_admin_of_role()
> return true when member == role dates to
> f9fd1764615ed5d85fab703b0ffb0c323fe7dfd5 (Tom Lane, 2005). If I'm not
> reading this code wrong, prior to that commit, it seems to me that we
> only searched the roles that were members of that role, directly or
> indirectly, and you had to have admin_option on the last hop of the
> membership chain in order to get a "true" result. But that commit,
> among other changes, made member == role a special case, but the
> comment just says /* Fast path for simple case */ which makes it
> appear that it wasn't thought to be a behavior change at all, but it
> looks to me like it was. Am I confused?

Ugh, I think you are right.  It's been a long time of course, but it sure
looks like that was copied-and-pasted without recognizing that it was
wrong in this function because of the need to check the admin_option flag.
And then in the later security discussion we didn't realize that the
problematic behavior was a flat-out thinko, so we narrowed it as much as
we could instead of just taking it out.

Does anything interesting break if you do just take it out?

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Robert Haas
On Sun, Mar 6, 2022 at 11:01 PM David G. Johnston
 wrote:
> The example, which you moved here, then attempts to demonstrate this "fact" 
> but gets it wrong.  Boss became a member of peon so if you want to 
> demonstrate self-administration of a role's membership in a different group 
> you have to login as boss, not peon.  Doing that, and then revoking peon from 
> boss, yields "ERROR: must have admin option on role "peon"".

This doesn't seem to me to be making a constructive argument. I showed
an example with certain names demonstrating a certain behavior that I
find problematic. You don't have to think it's problematic, and you
can show other examples that demonstrate things you want to show. But
please don't tell me that when I literally cut and paste the output
from my terminal into an email window, what I'm showing is somehow
counterfactual. The behavior as it exists today is surely a fact, and
an easily demonstrable one at that. It's not a "fact" in quotes, and
it doesn't "get it wrong". It is the actual behavior and the example
with the names I picked demonstrates precisely what I want to
demonstrate. When you say that I should have chosen a different
example or used different identifier names or talked about it in
different way, *that* is an opinion. I believe that you are wholly
entitled to that opinion, even if (as in this case) I disagree, but I
believe that it is not right at all to make it sound as if I don't
have the right to pick the examples I care about, or as if terminal
output is not a factual representation of how things work today.

> So no, without "WITH ADMIN OPTION" a role cannot decide what other roles they 
> are a member of.

It clearly can in some limited cases, because I showed an example
demonstrating *exactly that thing*.

> I don't necessarily have an issue changing self-administration but if the 
> motivating concern is that all these new pg_* roles we are creating are 
> something a normal user can opt-out of/revoke that simply isn't the case 
> today, unless they are added to the pg_* role WITH ADMIN OPTION.

I agree with this, but that's not my concern, because that's a
different use case from the one that I complained about. Since the
session user exception only applies to login roles, the problem that
I'm talking about only occurs when a login role is granted to some
other role.

> That all said, permissions SHOULD BE strictly additive.  If boss doesn't want 
> to be a member of pg_read_all_files allowing them to revoke themself from 
> that role seems like it should be acceptable.  If there is fear in allowing 
> someone to revoke (not add) themselves as a member of a different role that 
> suggests we have a design issue in another feature of the system.  Today, 
> they neither grant nor revoke, and the self-revocation doesn't seem that 
> important to add.

I disagree with this on principle, and I also think that's not how it
works today. On the general principle, I do not see a compelling
reason why we should have two systems for maintaining groups of users,
one of which is used for additive things and one of which is used for
subtractive things. That is a lot of extra machinery for little gain,
especially given how close we are to having it sorted out so that the
same mechanism can serve both purposes. It presently appears to me
that if we either remove the session user exception OR do the
grantor-tracking thing discussed earlier, we can get to a place where
the same facility can be used for either purpose. That would, I think,
be a significant step forward over the status quo. In terms of how
things work today, see Joshua Brindle's email about the use of groups
in pg_hba.conf. That is an excellent example of how removing oneself
from a group could enable one to bypass security restrictions intended
by the DBA.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson

2022-03-07 Thread Andres Freund
Hi,

On 2022-03-07 14:56:24 +0100, Peter Eisentraut wrote:
> Is there a current patch set to review in this thread at the moment?

I've been regularly rebasing and improving the patchset, but didn't post to
the thread about it most of the time.

I've just pushed another rebase, will work to squash it into a reasonable
number of patches and then repost that.

Greetings,

Andres Freund




Re: role self-revocation

2022-03-07 Thread Robert Haas
On Mon, Mar 7, 2022 at 11:04 AM Tom Lane  wrote:
> > Is there some use case for the behavior described in that last
> > sentence?
>
> Good question.  You might try figuring out when that text was added
> and then see if there's relevant discussion in the archives.

Apparently the permission used to be broader, and commit
fea164a72a7bfd50d77ba5fb418d357f8f2bb7d0 (February 2014, Noah,
CVE-2014-0060) restricted it by requiring that (a) the user had to be
the logged-in user, rather than an identity assumed via SET ROLE (so
maybe my bogus example from before would have worked in 2013) and (b)
that we're not in a security-restricted operation at the time.

Interestingly, it appears to me that the behavior wasn't documented
prior to that commit. The previous text read simply:

If WITH ADMIN OPTION is specified, the member can
in turn grant membership in the role to others, and revoke membership
in the role as well.  Without the admin option, ordinary users cannot do
that.

That doesn't give any hint that self-administration is a special case.

I reviewed the (private) discussion of this vulnerability on the
pgsql-security mailing list where various approaches were considered.
I think it's safe to share a few broad details about that conversation
publicly now, since it was many years ago and the fix has long since
been published. There was discussion of making this
self-administration behavior something that could be turned off, but
such a change was deemed too large for the back-branches. There was no
discussion that I could find about removing the behavior altogether.
It was noted that having a special case for this was different than
granting WITH ADMIN OPTION because WITH ADMIN OPTION is inherited and
being logged in as a certain user is not.

It appears to me that the actual behavior of having is_admin_of_role()
return true when member == role dates to
f9fd1764615ed5d85fab703b0ffb0c323fe7dfd5 (Tom Lane, 2005). If I'm not
reading this code wrong, prior to that commit, it seems to me that we
only searched the roles that were members of that role, directly or
indirectly, and you had to have admin_option on the last hop of the
membership chain in order to get a "true" result. But that commit,
among other changes, made member == role a special case, but the
comment just says /* Fast path for simple case */ which makes it
appear that it wasn't thought to be a behavior change at all, but it
looks to me like it was. Am I confused?
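The difference can be sketched with a toy membership graph (a hypothetical simulation, not the server code):

```python
# Toy model of is_admin_of_role().  grants is a list of
# (member, role, admin_option) edges meaning "member is a member of
# role".  Admin option is required only on the last hop into the
# target role; fast_path toggles the member == role special case.
def is_admin_of_role(member, role, grants, fast_path):
    if fast_path and member == role:
        return True  # the special case added in 2005
    seen, frontier = set(), [member]
    while frontier:
        cur = frontier.pop()
        for m, r, admin in grants:
            if m != cur:
                continue
            if r == role:
                if admin:
                    return True
            elif r not in seen:
                seen.add(r)
                frontier.append(r)
    return False
```

With the fast path, a role is always "admin of itself"; without it, only a membership chain ending in a WITH ADMIN OPTION grant returns true.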

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] minor reloption regression tests improvement

2022-03-07 Thread Nikolay Shaplov
In a message from Monday, 7 March 2022, 20:04:49 MSK, Greg Stark
wrote:
> I don't think this is worth spending time adding tests for. I get what
> you're saying that this is at least semi-intentional behaviour and
> desirable behaviour so it should have tests ensuring that it continues
> to work. But it's not documented behaviour and the test is basically
> testing that the implementation is this specific implementation.
> 
> I don't think the test is really a bad idea but neither is it really
> very useful and I think it's not worth having people spend time
> reviewing and discussing. I'm leaning to saying this patch is
> rejected.

This is a regression test, to check whether the behaviour is broken or not.

Actually I came to the idea of the test when I wrote my big reloption patch.
I broke this behaviour by mistake and did not notice it, as it had not been
properly tested. So I decided it would be good to add a test for it before
adding any big patches.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su







refreshing query id for pg_stat_statements based on comment in sql

2022-03-07 Thread Zhihong Yu
Hi,
Currently the query id for pg_stat_statements gets calculated based on the
parse nodes specifics.
This means that the user cannot add a comment to a SQL query to test
something. (though some other RDBMS allows this practice).

Consider this use case: for query q, admin looks at stats and performs some
optimization (without changing the query). Admin adds / modifies the
comment for q - now the query becomes q'. If query id doesn't change, there
still would be one row in pg_stat_statements which makes it difficult to
gauge the effectiveness of the tuning.
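The trade-off can be illustrated with a toy fingerprint function (hypothetical, not the actual query jumbling code): whether comments participate in the hash determines whether q and q' get separate pg_stat_statements rows.

```python
import hashlib
import re

# Toy query fingerprint.  When include_comments is False, adding or
# editing a comment leaves the query id unchanged (roughly today's
# parse-node-based behavior); when True, the commented query gets a
# new id and thus a new stats row.
def query_id(sql, include_comments):
    if not include_comments:
        sql = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S)
    normalized = " ".join(sql.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

q = "SELECT * FROM t WHERE id = 1"
q_prime = "/* after tuning */ SELECT * FROM t WHERE id = 1"
```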

I want to get opinion from the community whether adding / changing comment
in SQL query should result in new query id for pg_stat_statements.

Cheers


Re: [PATCH] minor reloption regression tests improvement

2022-03-07 Thread Greg Stark
I don't think this is worth spending time adding tests for. I get what
you're saying that this is at least semi-intentional behaviour and
desirable behaviour so it should have tests ensuring that it continues
to work. But it's not documented behaviour and the test is basically
testing that the implementation is this specific implementation.

I don't think the test is really a bad idea but neither is it really
very useful and I think it's not worth having people spend time
reviewing and discussing. I'm leaning to saying this patch is
rejected.




Re: role self-revocation

2022-03-07 Thread David G. Johnston
On Mon, Mar 7, 2022 at 9:04 AM Tom Lane  wrote:

> Just looking at it now, without having done any historical research,
> I wonder why it is that we don't attach significance to WITH ADMIN
> OPTION being granted to the role itself.  It seems like the second
> part of that sentence is effectively saying that a role DOES have
> admin option on itself, contradicting the first part.

WITH ADMIN OPTION is inheritable which is really bad if the group has WITH
ADMIN OPTION on itself.  The session_user exception temporarily grants WITH
ADMIN OPTION to the group but it is done in such a way so that it is not
inheritable.

There is no possible way to even assign WITH ADMIN OPTION on a role to
itself since pg_auth_members doesn't record a self-relationship and
admin_option only exists there.

David J.

P.S. Feature request; modify \du+ to show which "Member of" roles a given
role has the WITH ADMIN OPTION privilege on.


Re: logical decoding and replication of sequences

2022-03-07 Thread Tomas Vondra
On 2/28/22 12:46, Amit Kapila wrote:
> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>  wrote:
>>
>> On 2/10/22 19:17, Tomas Vondra wrote:
>>> I've polished & pushed the first part adding sequence decoding
>>> infrastructure etc. Attached are the two remaining parts.
>>>
>>> I plan to wait a day or two and then push the test_decoding part. The
>>> last part (for built-in replication) will need more work and maybe
>>> rethinking the grammar etc.
>>>
>>
>> I've pushed the second part, adding sequences to test_decoding.
>>
> 
> The test_decoding is failing randomly in the last few days. I am not
> completely sure but they might be related to this work. The two of
> these appears to be due to the same reason:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
> 
> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> "reorderbuffer.c", Line: 1173, PID: 35013)
> 0   postgres0x00593de0 ExceptionalCondition + 
> 160\\0
> 

This might be related to the issue reported by Amit, i.e. that
sequence_decode does not call ReorderBufferProcessXid(). If this keeps
failing, we'll have to add some extra debug info (logging LSN etc.), at
least temporarily. It'd be valuable to inspect the WAL too.

> Another:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
> 
> --- 
> /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
> 2022-02-14 20:19:14.0 +
> +++ 
> /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
> 2022-02-16 07:42:18.0 +
> @@ -126,6 +126,7 @@
>table public.replication_example: INSERT: id[integer]:4
> somedata[integer]:3 text[character varying]:null
> testcolumn1[integer]:null
>table public.replication_example: INSERT: id[integer]:5
> somedata[integer]:4 text[character varying]:null
> testcolumn1[integer]:2 testcolumn2[integer]:1
>COMMIT
> +  sequence public.replication_example_id_seq: transactional:0
> last_value: 38 log_cnt: 0 is_called:1
>BEGIN
>table public.replication_example: INSERT: id[integer]:6
> somedata[integer]:5 text[character varying]:null
> testcolumn1[integer]:3 testcolumn2[integer]:null
>COMMIT
> @@ -133,7 +134,7 @@
>table public.replication_example: INSERT: id[integer]:7
> somedata[integer]:6 text[character varying]:null
> testcolumn1[integer]:4 testcolumn2[integer]:null
>table public.replication_example: INSERT: id[integer]:8
> somedata[integer]:7 text[character varying]:null
> testcolumn1[integer]:5 testcolumn2[integer]:null
> testcolumn3[integer]:1
>COMMIT
> - (15 rows)
> + (16 rows)
> 

Interesting. I can think of one reason that might cause this - we log
the first sequence increment after a checkpoint. So if a checkpoint
happens in an unfortunate place, there'll be an extra WAL record. On
slow / busy machines that's quite possible, I guess.

I wonder if these two issues might be related ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences

2022-03-07 Thread Tomas Vondra



On 3/1/22 12:53, Amit Kapila wrote:
> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila  wrote:
>>
>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>  wrote:
>>>
>>> On 2/10/22 19:17, Tomas Vondra wrote:
 I've polished & pushed the first part adding sequence decoding
 infrastructure etc. Attached are the two remaining parts.

 I plan to wait a day or two and then push the test_decoding part. The
 last part (for built-in replication) will need more work and maybe
 rethinking the grammar etc.

>>>
>>> I've pushed the second part, adding sequences to test_decoding.
>>>
>>
>> The test_decoding is failing randomly in the last few days. I am not
>> completely sure but they might be related to this work. The two of
>> these appears to be due to the same reason:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>
>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>> "reorderbuffer.c", Line: 1173, PID: 35013)
>> 0   postgres0x00593de0 ExceptionalCondition + 
>> 160\\0
>>
> 
> While reviewing the code for this, I noticed that in
> sequence_decode(), we don't call ReorderBufferProcessXid to register
> the first known lsn in WAL for the current xid. The similar functions
> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
> even if they decide not to queue or send the change. Is there a reason
> for not doing the same here? However, I am not able to deduce any
> scenario where lack of this will lead to such an Assertion failure.
> Any thoughts?
> 

Thanks, that seems like an omission. Will fix.


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread vignesh C
On Mon, Mar 7, 2022 at 5:51 PM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Vignesh,
> > I agree with Peter's comment that the changes to
> > FilterRemoteOriginData() should be part of FilterByOrigin()
> >
> > Further, I wonder why "onlylocal_data" is a replication slot's
> > property. A replication slot tracks the progress of replication and it
> > may be used by different receivers with different options. I could
> > start one receiver which wants only local data, say using
> > "pg_logical_slot_get_changes" and later start another receiver which
> > fetches all the data starting from where the first receiver left. This
> > option prevents such flexibility.
> >
> > As discussed earlier in the thread, local_only can be property of
> > publication or subscription, depending upon the use case, but I can't
> > see any reason that it should be tied to a replication slot.
> >
>
> I thought it should be similar to 'streaming' option of subscription
> but may be Vignesh has some other reason which makes it different.

Yes, this can be removed from the replication slot. It is a mistake I
made while making the code similar to two-phase. I'm working on the
changes for this and will post an updated version.

Regards,
Vignesh




Re: New Table Access Methods for Multi and Single Inserts

2022-03-07 Thread Matthias van de Meent
On Sun, 6 Mar 2022 at 12:12, Bharath Rupireddy
 wrote:
>
> On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
>  wrote:
> > > Another rebase due to conflicts in 0003. Attaching v6 for review.
> >
> > I recently touched the topic of multi_insert, and I remembered this
> > patch. I had to dig a bit to find it, but as it's still open I've
> > added some comments:
>
> Thanks for reviving the thread. I almost lost hope in it. In fact, it
> took me a while to recollect the work and respond to your comments.
> I'm now happy to answer or continue working on this patch if you or
> someone is really interested to review it and take it to the end.
>
> > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > > +#define MAX_BUFFERED_TUPLES 1000
> > > +#define MAX_BUFFERED_BYTES 65535
> >
> > It looks like these values were copied over from copyfrom.c, but are
> > now expressed in the context of the heapam.
> > As these values are now heap-specific (as opposed to the
> > TableAM-independent COPY infrastructure), shouldn't we instead
> > optimize for heap page insertions? That is, I suggest using a multiple
> > (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> > similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.
>
> We can do that. In fact, it is a good idea to let callers pass in as
> an input to tuple_insert_begin and have it as part of
> TableInsertState. If okay, I will do that in the next patch.
>
> > > TableInsertState->flushed
> > > TableInsertState->mi_slots
> >
> > I don't quite like the current storage-and-feedback mechanism for
> > flushed tuples. The current assumptions in this mechanism seem to be
> > that
> > 1.) access methods always want to flush all available tuples at once,
> > 2.) access methods want to maintain the TupleTableSlots for all
> > inserted tuples that have not yet had all triggers handled, and
> > 3.) we need access to the not-yet-flushed TupleTableSlots.
> >
> > I think that that is not a correct set of assumptions; I think that
> > only flushed tuples need to be visible to the tableam-agnostic code;
> > and that tableams should be allowed to choose which tuples to flush at
> > which point; as long as they're all flushed after a final
> > multi_insert_flush.
> >
> > Examples:
> > A heap-based access method might want bin-pack tuples and write out
> > full pages only; and thus keep some tuples in the buffers as they
> > didn't fill a page; thus having flushed only a subset of the current
> > buffered tuples.
> > A columnstore-based access method might not want to maintain the
> > TupleTableSlots of original tuples, but instead use temporary columnar
> > storage: TupleTableSlots are quite large when working with huge
> > amounts of tuples; and keeping lots of tuple data in memory is
> > expensive.
> >
> > As such, I think that this should be replaced with a
> > TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> > managed by the tableAM, in which only the flushed tuples are visible
> > to the AM-agnostic code. This is not much different from how the
> > implementation currently works; except that ->mi_slots now does not
> > expose unflushed tuples; and that ->flushed is replaced by an integer
> > value of number of flushed tuples.
>
> Yeah, that makes sense. Let's have table AMs expose the flushed tuples,
> so that the callers can handle the after-insert row triggers on them.
>
> IIUC, TableInsertState needs to have few other variables:
>
> /* Below members are only used for multi inserts. */
> /* Array of buffered slots. */
> TupleTableSlot  **mi_slots;

Not quite: there's no need for TupleTableSlot  **mi_slots in the
TableInsertState; as the buffer used by the tableAM to buffer
unflushed tuples shouldn't be publicly visible. I suspect that moving
that field to HeapMultiInsertState instead would be the prudent thing
to do; limiting the external access of AM-specific buffers.
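That separation could be sketched as follows (a hypothetical Python simulation; the field names follow the proposal in this thread, not any committed API):

```python
# Unflushed tuples live in AM-private state; AM-agnostic code only
# ever sees mi_flushed_slots after a flush.
class HeapMultiInsertState:
    def __init__(self, max_tuples):
        self.slots = []              # private buffer of unflushed tuples
        self.max_tuples = max_tuples

class TableInsertState:
    def __init__(self, max_tuples=3):
        self.mi_flushed_slots = []   # visible to callers (e.g. triggers)
        self.am_state = HeapMultiInsertState(max_tuples)

def multi_insert(state, tup):
    state.mi_flushed_slots = []      # nothing flushed by this call yet
    state.am_state.slots.append(tup)
    if len(state.am_state.slots) >= state.am_state.max_tuples:
        multi_insert_flush(state)

def multi_insert_flush(state):
    # the AM decides what to flush; this toy AM flushes everything
    state.mi_flushed_slots = state.am_state.slots
    state.am_state.slots = []

st = TableInsertState(max_tuples=3)
for t in ("t1", "t2", "t3", "t4"):
    multi_insert(st, t)
```

After the loop, the third tuple has triggered a flush of the first three, and "t4" is still sitting in the AM-private buffer until a final multi_insert_flush().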

> /* Number of slots that are currently buffered. */
> int32   mi_cur_slots;
> /* Array of flushed slots that will be used by callers to handle
> after-insert row triggers or similar events outside */
> TupleTableSlot  **mi_flushed_slots ;
> /* Number of slots that have been flushed. */
> int32   mi_no_of_flushed_slots;
>
> The implementation of heap_multi_insert_flush will just set the
> mi_slots to mi_flushed_slots.

Yes.

> > A further improvement (in my opinion) would be the change from a
> > single multi_insert_flush() to a signalling-based multi_insert_flush:
> > It is not unreasonable for e.g. a columnstore to buffer tens of
> > thousands of inserts; but doing so in TupleTableSlots would introduce
> > a high memory usage. Allowing for batched retrieval of flushed tuples
> > would help in memory usage; which is why multiple calls to
> > multi_insert_flush() could be useful. To handle this gracefully, we'd
> > probably add TIS->mi_flush_remaining, where each insert adds one to
> > 

Re: Changing "Hot Standby" to "hot standby"

2022-03-07 Thread Daniel Westermann (DWE)
>>> Thanks for having a look. Are you suggesting to change it like this?
>>> -Hot Standby is the term used to describe the ability to connect to
>>> +Hot standby is the term used to describe the ability to connect to

>>Yes.  Isn't it the right form of a sentence?

I've created an entry in the Commitfest 2022-07 for this.

Regards
Daniel


Re: role self-revocation

2022-03-07 Thread Tom Lane
Robert Haas  writes:
> Hmm. I think the real issue is what David G. Johnston calls the session
> user exception. I hadn't quite understood how that played into this.
> According to the documentation: "If WITH ADMIN OPTION is specified,
> the member can in turn grant membership in the role to others, and
> revoke membership in the role as well. Without the admin option,
> ordinary users cannot do that. A role is not considered to hold WITH
> ADMIN OPTION on itself, but it may grant or revoke membership in
> itself from a database session where the session user matches the
> role."

> Is there some use case for the behavior described in that last
> sentence?

Good question.  You might try figuring out when that text was added
and then see if there's relevant discussion in the archives.

Just looking at it now, without having done any historical research,
I wonder why it is that we don't attach significance to WITH ADMIN
OPTION being granted to the role itself.  It seems like the second
part of that sentence is effectively saying that a role DOES have
admin option on itself, contradicting the first part.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread David G. Johnston
On Mon, Mar 7, 2022 at 8:37 AM Robert Haas  wrote:

> A role is not considered to hold WITH
> ADMIN OPTION on itself, but it may grant or revoke membership in
> itself from a database session where the session user matches the
> role."
>
> Is there some use case for the behavior described in that last
> sentence?


I can imagine, in particular combined with CREATEROLE, that this allows
any user to delegate their personal permissions to a separate, newly
created user.  Like an assistant.  I'm not all that sure whether
CREATEROLE is presently safe enough to give to a normal user in order to
make this use case work, but it seems reasonable.

I would be concerned about changing the behavior at this point.  But I
would be in favor of at least removing the hard-coded exception and linking
it to a role attribute.  That attribute can default to "SELFADMIN" to match
the existing behavior but then "NOSELFADMIN" would exist to disable that
behavior on the per-role basis.  Still tied to session_user as opposed to
current_user.

David J.

P.S.

create role selfadmin admin selfadmin; -- ERROR: role "selfadmin" is a
member of role "selfadmin"

create role selfadmin;
grant selfadmin to selfadmin with admin option; -- ERROR: role "selfadmin"
is a member of role "selfadmin"

The error message seems odd.  I tried this because instead of a "SELFADMIN"
attribute adding a role to itself WITH ADMIN OPTION could be defined to
have the same effect.  You cannot change WITH ADMIN OPTION independently of
the adding of the role to the group.


Re: role self-revocation

2022-03-07 Thread Robert Haas
On Sun, Mar 6, 2022 at 2:09 PM David G. Johnston
 wrote:
> So, do we really want to treat every single login role as a potential group 
> by keeping the session_user exception?

I think that we DO want to continue to treat login roles as
potentially grantable privileges. That feels fundamentally useful to
me. The superuser is essentially granted the privileges of all users
on the system, and has all the rights they have, including the right
to drop tables owned by those users as if they were the owner of those
tables. If it's useful for the superuser to implicitly have the rights
of all users on the system, why should it not be useful for some
non-superuser to implicitly have the rights of some other users on the
system? I think it pretty clearly is. If one of my colleagues leaves
the company, the DBA can say "grant jdoe to rhaas" and let me mess
around with this stuff. Or, the DBA can grant me the privileges of all
my direct reports even when they're not leaving so that I can sort out
anything I need to do without superuser involvement. That all seems
cool and OK to me.

Now I think it is fair to say that we could have chosen a different
design, and MAYBE that would have been better. Nobody forced us to
conflate users and groups into a unified thing called roles, and I
think there's pretty good evidence that it's confusing and
counterintuitive in some ways. There's also no intrinsic reason why
the superuser has to be able to directly exercise the privileges of
every role rather than, say, having a way to become any given role.
But at this point, those design decisions are pretty well baked into
the system design, and I don't really think it's likely that we want
to change them. To put that another way, just because you don't like
the idea of granting one login role to another login role, that
doesn't mean that the feature doesn't exist, and as long as that
feature does exist, trying to make it work better or differently is
fair game.

But I think that's separate from your other question about whether we
should remove the session user exception. That looks tempting to me at
first glance, because we have exchanged several hundred, and it really
feels more like several million, emails on this list about how much of
a problem it is that an unprivileged user can just log in and run a
REVOKE. It breaks the idea that the people WITH ADMIN OPTION on a role
are the ones who control membership in that role. Joshua Brindle's
note upthread about the interaction of this with pg_hba.conf is
another example of that, and I think there are more. Any idea that a
role is a general-purpose way of designating a group of users for some
security critical purpose is threatened if people can make changes to
the membership of that group without being specifically authorized to
do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New developer papercut - Makefile references INSTALL

2022-03-07 Thread Tom Lane
Magnus Hagander  writes:
> If anything, I'm more behind the idea of just getting rid of the
> INSTALL file. A reference to the install instructions in the README
> should be enough today. The likelihood of somebody getting a postgres
> source tarball and trying to build it for the first time while not
> having internet access is extremely low I'd say.

I agree that there's no longer a lot of reason to insist that the
installation instructions need to be present in a flat text file
as opposed to some other way.

However, just putting a URL into README seems problematic, because how
will we ensure that it's the correct version-specific URL?  (And it does
need to be version-specific; the set of configure options changes over
time, and that's not even mentioning whatever user-visible effects
changing to meson will have.)  You could imagine generating the URL
during tarball build, but that does nothing for the people who pull
directly from git.

I thought briefly about directing people to read
doc/src/sgml/html/installation.html, but that has the same problem
that it won't be present in a fresh git pull.

regards, tom lane




Re: role self-revocation

2022-03-07 Thread Robert Haas
On Sun, Mar 6, 2022 at 11:53 AM Tom Lane  wrote:
> Really?
>
> regression=# grant admin to joe;
> GRANT ROLE
> regression=# grant admin to sally;
> GRANT ROLE
> regression=# \c - joe
> You are now connected to database "regression" as user "joe".
> regression=> revoke admin from sally;
> ERROR:  must have admin option on role "admin"
> regression=> set role admin;
> SET
> regression=> revoke admin from sally;
> ERROR:  must have admin option on role "admin"

Oops. I stand corrected.

> I think there is an issue here around exactly what the admin option
> means, but if it doesn't grant you the ability to remove grants
> made by other people, it's pretty hard to see what it's for.

Hmm. I think the real issue is what David G. Johnston calls the session
user exception. I hadn't quite understood how that played into this.
According to the documentation: "If WITH ADMIN OPTION is specified,
the member can in turn grant membership in the role to others, and
revoke membership in the role as well. Without the admin option,
ordinary users cannot do that. A role is not considered to hold WITH
ADMIN OPTION on itself, but it may grant or revoke membership in
itself from a database session where the session user matches the
role."

Is there some use case for the behavior described in that last
sentence? If that exception is the only case in which an unprivileged
user can revoke a grant made by someone else, then getting rid of it
seems pretty appealing from where I sit. I can't speak to the
standards compliance end of things, but it doesn't intrinsically seem
bothersome that having "WITH ADMIN OPTION" on a role lets you control
who has membership in said role. And certainly it's not bothersome
that the superuser can change whatever they want. The problem here is
just that a user with NO special privileges on any role, including
their own, can make changes that more privileged users might not like.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: role self-revocation

2022-03-07 Thread Robert Haas
On Sun, Mar 6, 2022 at 11:34 AM Tom Lane  wrote:
> I was thinking the former ... however, after a bit of experimentation
> I see that we accept "grant foo to bar granted by baz" a VERY long
> way back, but the "granted by" option for object privileges is
> (a) pretty new and (b) apparently restrictively implemented:
>
> regression=# grant delete on alices_table to bob granted by alice;
> ERROR:  grantor must be current user
>
> That's ... surprising.  I guess whoever put that in was only
> interested in pro-forma SQL syntax compliance and not in making
> a usable feature.

It appears so: 
https://www.postgresql.org/message-id/2073b6a9-7f79-5a00-5f26-cd19589a52c7%402ndquadrant.com

It doesn't seem like that would be hard to fix. Maybe we should just do that.

> So if we decide to extend this change into object privileges
> it would be advisable to use SET ROLE, else we'd be giving up
> an awful lot of backwards compatibility in dump scripts.
> But if we're only talking about role grants then I think
> GRANTED BY would work fine.

OK.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New developer papercut - Makefile references INSTALL

2022-03-07 Thread Magnus Hagander
On Thu, Feb 10, 2022 at 12:52 AM Tom Lane  wrote:

(bringing this one back from the dead)



>
> Andres Freund  writes:
> > On 2022-02-09 22:32:59 +0100, Magnus Hagander wrote:
> >> post-commit hooks don't run on the git server, they run locally on
> >> your machine. There is a "post receive" hook that runs on the git
> >> server, but we definitely don't want that one to fabricate new commits
> >> I think.
>
> > Why not? We probably wouldn't want to do synchronously as part of the 
> > receive
> > hook, but if we have a policy that INSTALL is not to be updated by humans, 
> > but
> > updated automatically whenever its sources are modified, I'd be OK with
> > auto-committing that.

Auto-committing in git becomes very... special. In that case you would
push, but after that you would still not be in sync and would have to
pull back down to compare.

It would also require the running of a comparatively very complex
build script inside the very restricted environment that is the git
master server. I do not want to have to deal with that level of
complexity there, and whatever security implications can come from it.

But if you want to accomplish something similar you could have a batch
job that runs on maybe a daily basis and builds the INSTALL file and
commits it to the repo. I still don't really like the idea of
committing automatically (in the github/gitlab world, having such a
tool generate a MR/PR would be perfectly fine, but pushing directly to
the repo I really prefer being something that has human eyes on the
process). Or post a patch with it for someone to look at.


> What happens when the INSTALL build fails (which is quite possible,
> I believe, even if a plain html build works)?

Yes, it breaks every time somebody accidentally puts a link to
somewhere else in the documentation, doesn't it?


> I don't really want any post-commit or post-receive hooks doing
> anything interesting to the tree.  I think the odds for trouble
> are significantly greater than any value we'd get out of it.

Very much agreed.


> I'm in favor of unifying README and README.git along the lines
> we discussed above.  I think that going further than that
> will be a lot of trouble for very little gain; in fact no gain,
> because I do not buy any of the arguments that have been
> made about why changing the INSTALL setup would be beneficial.
> If we adjust the README contents to be less confusing about that,
> we're done.

+1.

If anything, I'm more behind the idea of just getting rid of the
INSTALL file. A reference to the install instructions in the README
should be enough today. The likelihood of somebody getting a postgres
source tarball and trying to build it for the first time while not
having internet access is extremely low I'd say.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Add parameter jit_warn_above_fraction

2022-03-07 Thread Magnus Hagander
On Mon, Mar 7, 2022 at 2:09 PM Justin Pryzby  wrote:
>
> On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:
> > On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby  wrote:
> > >
> > > I think it should be a NOTICE (or less?)
> >
> > Hmm. I'm not so sure.
> >
> > Other similar parameters often use LOG, but the downside of that is
> > that it won't be sent to the client.
> >
> > The problem with using NOTICE is that it won't go to the log by
> > default. It needs to be at least warning to do that.
>
> Anyone who uses this parameter is already going to be changing GUCs, so it
> doesn't need to work by default.  The people who are likely to enable this
> already monitor their logs and have probably customized their logging
> configuration.

Sure, but if you set log_min_messages to NOTICE you are likely going
to flood your logs beyond usefulness. And the whole idea of this
parameter is that you can keep it on during "normal times" to catch
outliers.


> > > Is it more useful if this is applied combined with 
> > > log_min_duration_statement ?
> > >
> > > It's easy to imagine a query for which the planner computes a high cost, 
> > > but
> > > actually runs quickly.  You might get a bunch of WARNINGs that the query 
> > > took
> > > 10 MS and JIT was 75% of that, even if you don't care about queries that 
> > > take
> > > less than 10 SEC.
> >
> > Yeah, that's definitely a problem. But which way would you want to tie
> > it to log_min_duration_statement?  That a statement would both have to
> > take longer than log_min_duration_statement *and* have JIT above a
> > certain percentage? In my experience that is instead likely to miss
> > most of the interesting times.
>
> I don't understand - why doesn't it combine trivially with
> log_min_duration_statement?  Are you saying that the default / pre-existing 
> min
> duration may not log all of the intended queries ?  I think that just means 
> the
> configuration needs to be changed.  The GUC needs to allow/help finding these
> JIT issues, but going to require an admin's interaction in any case.  Avoiding
> false positives may be important for it to be useful at all.
>
> Hmm .. should it also combine with min_sample_rate ?  Maybe that addresses 
> your
> concern.

For one, what if I don't want to log any queries unless this is the
case? I leave log_min_duration_statement at -1...
Or what if I want to log say only queries taking more than 60 seconds.
Now I will miss all queries taking 30 seconds where 99.9% of the time
is spent on JIT which I definitely *would* want.

If all I wanted was to get the JIT information for all queries over a
certain amount of time, then as someone else mentioned as well, I can
just use auto_explain. The point here is I want to trigger the logging
for queries where the runtime is *shorter* than what I have in
log_min_duration_statement, but a large portion of the runtime is
spent on JIT.

I mean, if I have a bunch of queries that take 10ms and 75% is spent
in JIT (per your example above), that is actually a problem and I'd
like to know about it and fix it (by tuning the JIT triggers either
globally or locally).  Yes, at 10ms I *may* ignore it, and in general
I'd use pg_stat_statements for that. But at 100ms with 75% JIT it
starts becoming exactly the situation that I intended this patch to
help with.
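The triggering condition being argued for here is deliberately independent
of log_min_duration_statement; a hedged sketch of that check (names and
units are illustrative, not the actual patch):

```c
#include <assert.h>

/* Hypothetical sketch: warn when JIT time exceeds the configured
 * fraction of total execution time, regardless of whether the query
 * itself was slow enough to be logged by log_min_duration_statement.
 * A fraction of 0.0 stands in for "feature disabled". */
static int
jit_warning_needed(double total_ms, double jit_ms,
                   double jit_warn_above_fraction)
{
    if (jit_warn_above_fraction <= 0.0)   /* disabled */
        return 0;
    if (total_ms <= 0.0)                  /* nothing measurable */
        return 0;
    return (jit_ms / total_ms) > jit_warn_above_fraction;
}
```

With a fraction of 0.5, a 100ms query spending 75ms in JIT would warn,
while the same query spending 10ms in JIT would not -- matching the
"short query, mostly JIT" case described above.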

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-07 Thread Nitin Jadhav
> + 
> +  
> +   type text
> +  
> +  
> +   Type of checkpoint. See .
> +  
> + 
> +
> + 
> +  
> +   kind text
> +  
> +  
> +   Kind of checkpoint. See .
> +  
> + 
>
> This looks a bit confusing. Two columns, one with the name "checkpoint
> types" and another "checkpoint kinds". You can probably rename
> checkpoint-kinds to checkpoint-flags and let the checkpoint-types be
> as-it-is.

Makes sense. I will change in the next patch.
---

> +  
> +   pg_stat_progress_checkpoint
> +  One row only, showing the progress of the checkpoint.
>
> Let's make this message consistent with the already existing message
> for pg_stat_wal_receiver. See description for pg_stat_wal_receiver
> view in "Dynamic Statistics Views" table.

Do you want me to change "One row only" to "Only one row"?  If that is
the case, then for the other views in the "Collected Statistics Views"
table it is referred to as "One row only". Let me know if you are
pointing out something else.
---

> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-+-
> pid   | 22043
> type  | checkpoint
> kind  | immediate force wait requested time
>
> I think the output in the kind column can be displayed as {immediate,
> force, wait, requested, time}. By the way these are all checkpoint
> flags so it is better to display it as checkpoint flags instead of
> checkpoint kind as mentioned in one of my previous comments.

I will update in the next patch.
---

> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-+-
> pid   | 22043
> type  | checkpoint
> kind  | immediate force wait requested time
> start_lsn | 0/14C60F8
> start_time| 2022-03-03 18:59:56.018662+05:30
> phase | performing two phase checkpoint
>
> This is the output I see when the checkpointer process has come out of
> the two phase checkpoint and is currently writing checkpoint xlog
> records and doing other stuff like updating control files etc. Is this
> okay?

The idea behind choosing the phases is based on the functionality
which takes a longer time to execute. Since the work between the two
phase checkpoint and post checkpoint cleanup doesn't take much time to
execute, I have not added an additional phase for it. But I also agree
that this gives wrong information to the user. How about reporting the
phase information at the end of each phase, like "initializing",
"initialization done", ..., "two phase checkpoint done", "post
checkpoint cleanup done", ..., "finalizing"? Except for the first phase
("initializing") and the last phase ("finalizing"), all the other
phases describe the end of a certain operation. I feel this gives
correct information even though the phase name/description does not
represent the entire code block between two phases. For example, if the
current phase is "two phase checkpoint done", then the user can infer
that the checkpointer has finished the two phase checkpoint and is
doing the other work that comes after it. Thoughts?
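The "completed phase" naming proposal can be sketched as a simple ordered
list, where advancing past a step reports that step as done (phase names
below are illustrative placeholders, not the actual view's phase names):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical phase list: every entry except the first and last marks
 * the *end* of an operation, so a reader of the progress view knows
 * that everything up to that point has completed. */
static const char *const checkpoint_phases[] = {
    "initializing",
    "two phase checkpoint done",
    "post checkpoint cleanup done",
    "finalizing",
};

static int current_phase = 0;

/* Called when the checkpointer finishes one step of its work;
 * returns the phase string that would be reported. */
static const char *
report_phase_done(void)
{
    if (current_phase < 3)
        current_phase++;
    return checkpoint_phases[current_phase];
}
```

The ambiguity described above disappears with this scheme: between steps,
the view never claims the checkpointer is "performing" work it has in fact
already finished.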
---

> The output of log_checkpoint shows the number of buffers written is 3
> whereas the output of pg_stat_progress_checkpoint shows it as 0. See
> below:
>
> 2022-03-03 20:04:45.643 IST [22043] LOG:  checkpoint complete: wrote 3
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
> write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2,
> longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB
>
> --
>
> [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint;
> -[ RECORD 1 ]-+-
> pid   | 22043
> type  | checkpoint
> kind  | immediate force wait requested time
> start_lsn | 0/14C60F8
> start_time| 2022-03-03 18:59:56.018662+05:30
> phase | finalizing
> buffers_total | 0
> buffers_processed | 0
> buffers_written   | 0
>
> Any idea why this mismatch?

Good catch. In BufferSync() we have 'num_to_scan' (buffers_total),
which indicates the total number of buffers to be processed. Based on
that, the 'buffers_processed' and 'buffers_written' counters get
incremented; these values may reach up to 'buffers_total'. The current
pg_stat_progress_checkpoint view supports the above information. There
is another place where 'ckpt_bufs_written' gets incremented (in
SlruInternalWritePage()). This increment is beyond the 'buffers_total'
value and it is included in the server log message (checkpoint end)
but not in the view. I am a bit confused here. If we include this
increment in the view then we cannot calculate the exact
'buffers_total' beforehand. Can we increment 'buffers_total' as well
when 'ckpt_bufs_written' gets incremented, so that we can match the
behaviour of the checkpoint-end message?  Please share your 

Re: Proposal: Support custom authentication methods using hooks,Re: Proposal: Support custom authentication methods using hooks

2022-03-07 Thread Joshua Brindle
On Fri, Mar 4, 2022 at 6:03 PM Tatsuo Ishii  wrote:
>
> >> I still don't understand why using plaintex password authentication
> >> over SSL connection is considered insecure. Actually we have been
> >> stating opposite in the manual:
> >> https://www.postgresql.org/docs/14/auth-password.html
> >>
> >> "If the connection is protected by SSL encryption then password can be
> >> used safely, though."
> >
> > If you aren't doing client verification (i.e., cert in pg_hba) and are
> > not doing verify-full on the client side then a man-in-the-middle
> > attack on TLS is trivial, and the plaintext password will be
> > sniffable.
>
> So the plaintext password is safe if used with hostssl + verify-full
> (server side) and sslmode = verify-full (client side), right?
>

That would be safe-in-transit so long as everything was configured
properly and all certificates were protected. Unfortunately PG doesn't
make this incredibly easy to implement, it allows only 1 client root
cert, the client side doesn't understand system certificate stores or
PKI, etc.

Further, if someone gains access to the password hashes there is still
a pass-the-hash vulnerability, though.




Re: generic plans and "initial" pruning

2022-03-07 Thread Amit Langote
On Fri, Feb 11, 2022 at 7:02 AM Robert Haas  wrote:
> On Thu, Feb 10, 2022 at 3:14 AM Amit Langote  wrote:
> > Maybe this should be more than one patch?  Say:
> >
> > 0001 to add ExecutorPrep and the boilerplate,
> > 0002 to teach plancache.c to use the new facility

Thanks for taking a look and sorry about the delay.

> Could be, not sure. I agree that if it's possible to split this in a
> meaningful way, it would facilitate review. I notice that there is
> some straight code movement e.g. the creation of
> ExecPartitionPruneFixSubPlanIndexes. It would be best, I think, to do
> pure code movement in a preparatory patch so that the main patch is
> just adding the new stuff we need and not moving stuff around.

Okay, created 0001 for moving around the execution pruning code.

> David Rowley recently proposed a patch for some parallel-safety
> debugging cross checks which added a plan tree walker. I'm not sure
> whether he's going to press that patch forward to commit, but I think
> we should get something like that into the tree and start using it,
> rather than adding more bespoke code. Maybe you/we should steal that
> part of his patch and commit it separately.

I looked at the thread you mentioned (I guess [1]), though it seems
David's proposing a path_tree_walker(), so I guess only useful within
the planner and not here.

> What I'm imagining is that
> plan_tree_walker() would know which nodes have subnodes and how to
> recurse over the tree structure, and you'd have a walker function to
> use with it that would know which executor nodes have ExecPrep
> functions and call them, and just do nothing for the others. That
> would spare you adding stub functions for nodes that don't need to do
> anything, or don't need to do anything other than recurse. Admittedly
> it would look a bit different from the existing executor phases, but
> I'd argue that it's a better coding model.
>
> Actually, you might've had this in the patch at some point, because
> you have a declaration for plan_tree_walker but no implementation.

Right, the previous patch indeed used a plan_tree_walker() for this
and I think in a way you seem to think it should work.

I do agree that plan_tree_walker() allows for a better implementation
of the idea of this patch and may also be generally useful, so I've
created a separate patch that adds it to nodeFuncs.c.

> I guess one thing that's a bit awkward about this idea is that in some
> cases you want to recurse to some subnodes but not other subnodes. But
> maybe it would work to put the recursion in the walker function in
> that case, and then just return true; but if you want to walk all
> children, return false.

Right, that's how I've made ExecPrepAppend() etc. do it.
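A minimal sketch of that walker contract -- the callback returns true when
it has handled recursion into (selected) children itself, and false to let
the generic walker visit all children -- might look like this (illustrative
only, not the actual nodeFuncs.c API):

```c
#include <assert.h>
#include <stddef.h>

/* A toy "plan node" standing in for the real Plan tree. */
typedef struct Node
{
    int          tag;
    struct Node *children[2];
    int          nchildren;
} Node;

typedef int (*walker_fn) (Node *node, void *context);

/* Generic walker: knows the tree shape, calls the walker on each
 * node.  If the callback returns true, it has already dealt with
 * (or deliberately skipped) this node's children; otherwise the
 * generic walker recurses into all of them. */
static void
plan_tree_walker(Node *node, walker_fn walker, void *context)
{
    if (node == NULL)
        return;
    if (walker(node, context))
        return;                 /* callback handled the children */
    for (int i = 0; i < node->nchildren; i++)
        plan_tree_walker(node->children[i], walker, context);
}

/* Example callback: count every node, letting the generic walker
 * do all of the recursion (always returns false). */
static int
count_nodes(Node *node, void *context)
{
    (void) node;
    (*(int *) context)++;
    return 0;
}
```

A callback that wanted selective recursion (the Append-style case) would
instead call plan_tree_walker() itself on the children it cares about and
then return true.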

> + bool contains_init_steps;
> + bool contains_exec_steps;
>
> s/steps/pruning/? maybe with contains -> needs or performs or requires as 
> well?

Went with: needs_{init|exec}_pruning

> + * Returned information includes the set of RT indexes of relations 
> referenced
> + * in the plan, and a PlanPrepOutput node for each node in the planTree if 
> the
> + * node type supports producing one.
>
> Aren't all RT indexes referenced in the plan?

Ah yes.  How about:

 * Returned information includes the set of RT indexes of relations that must
 * be locked to safely execute the plan,

> + * This may lock relations whose information may be used to produce the
> + * PlanPrepOutput nodes. For example, a partitioned table before perusing its
> + * PartitionPruneInfo contained in an Append node to do the pruning the 
> result
> + * of which is used to populate the Append node's PlanPrepOutput.
>
> "may lock" feels awfully fuzzy to me. How am I supposed to rely on
> something that "may" happen? And don't we need to have tight logic
> around locking, with specific guarantees about what is locked at which
> points in the code and what is not?

Agree the wording was fuzzy.  I've rewrote as:

 * This locks relations whose information is needed to produce the
 * PlanPrepOutput nodes. For example, a partitioned table before perusing its
 * PartitionedRelPruneInfo contained in an Append node to do the pruning, the
 * result of which is used to populate the Append node's PlanPrepOutput.

BTW, I've added an Assert in ExecGetRangeTableRelation():

   /*
* A cross-check that AcquireExecutorLocks() hasn't missed any relations
* it must not have.
*/
   Assert(estate->es_execprep == NULL ||
  bms_is_member(rti, estate->es_execprep->relationRTIs));

which IOW ensures that the actual execution of a plan only sees
relations that ExecutorPrep() would've told AcquireExecutorLocks() to
take a lock on.

> + * At least one of 'planstate' or 'econtext' must be passed to be able to
> + * successfully evaluate any non-Const expressions contained in the
> + * steps.
>
> This also seems fuzzy. If I'm thinking of calling this function, I
> don't know how I'd know whether this criterion is met.

OK, I have removed this comment (which was on 

Re: suboverflowed subtransactions concurrency performance optimize

2022-03-07 Thread Julien Rouhaud
On Mon, Mar 07, 2022 at 01:27:40PM +, Simon Riggs wrote:
> > +/*
> > + * Single-item cache for results of SubTransGetTopmostTransaction.  It's 
> > worth having
> > + * such a cache because we frequently find ourselves repeatedly checking 
> > the
> > + * same XID, for example when scanning a table just after a bulk insert,
> > + * update, or delete.
> > + */
> > +static TransactionId cachedFetchXid = InvalidTransactionId;
> > +static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
> >
> > The comment is above the 80 chars after
> > s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
> > comment is valid for subtrans.c.
> 
> What aspect makes it invalid? The comment seems exactly applicable to
> me; Andrey thinks so also.

Sorry, I somehow missed the "for example", and was thinking that
SubTransGetTopmostTransaction was used in many other places compared to
TransactionIdDidCommit and friends.

> > It would be nice to see some benchmarks, for both when this change is
> > enough to avoid a contention (when there's a single long-running overflowed
> > backend) and when it's not enough.  That will also be useful if/when 
> > working on
> > the "rethink_subtrans" patch.
> 
> The patch doesn't do anything about the case of when there's a single
> long-running overflowed backend, nor does it claim that.

I was thinking that having a cache for SubTransGetTopmostTransaction could help
at least to some extent for that problem, sorry if that's not the case.

I'm still curious on how much this simple optimization can help in some
scenarios, even if they're somewhat artificial.

> The patch was posted because TransactionLogFetch() has a cache, yet
> SubTransGetTopmostTransaction() does not, yet the argument should be
> identical in both cases.

I totally agree with that.
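For illustration, the single-item cache pattern under discussion can be
sketched like this, with an instrumented stand-in for the SLRU fetch
(hypothetical code, not the actual subtrans.c implementation):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Single-item cache: remember the last (xid -> topmost xid) lookup so
 * that repeatedly checking the same XID -- e.g. while scanning a table
 * just after a bulk insert -- skips the potentially contended read. */
static TransactionId cachedFetchXid = InvalidTransactionId;
static TransactionId cachedFetchTopmostXid = InvalidTransactionId;

static int slow_lookups = 0;    /* instrumentation for this sketch */

/* Stand-in for the real pg_subtrans SLRU fetch; the mapping is
 * arbitrary, chosen only so results are checkable. */
static TransactionId
slow_lookup(TransactionId xid)
{
    slow_lookups++;
    return xid - (xid % 10);
}

static TransactionId
get_topmost_cached(TransactionId xid)
{
    if (xid == cachedFetchXid)
        return cachedFetchTopmostXid;   /* cache hit: no SLRU access */
    cachedFetchXid = xid;
    cachedFetchTopmostXid = slow_lookup(xid);
    return cachedFetchTopmostXid;
}
```

As in TransactionLogFetch(), the win comes purely from repeated lookups of
the same XID; a workload that alternates between XIDs gets no benefit.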




Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-03-07 Thread Nitin Jadhav
> 1) Can we convert below into pgstat_progress_update_multi_param, just
> to avoid function calls?
> pgstat_progress_update_param(PROGRESS_CHECKPOINT_LSN, checkPoint.redo);
> pgstat_progress_update_param(PROGRESS_CHECKPOINT_PHASE,
>
> 2) Why are we not having special phase for CheckPointReplicationOrigin
> as it does good bunch of work (writing to disk, XLogFlush,
> durable_rename) especially when max_replication_slots is large?
>
> 3) I don't think "requested" is necessary here as it doesn't add any
> value or it's not a checkpoint kind or such, you can remove it.
>
> 4) s:/'recycling old XLOG files'/'recycling old WAL files'
> +  WHEN 16 THEN 'recycling old XLOG files'
>
> 5) Can we place CREATE VIEW pg_stat_progress_checkpoint AS definition
> next to pg_stat_progress_copy in system_view.sql? It looks like all
> the progress reporting views are next to each other.

I will take care in the next patch.
---

> 6) How about shutdown and end-of-recovery checkpoint? Are you planning
> to have an ereport_startup_progress mechanism as 0002?

I thought of including it earlier, but then I felt let's first make the
current patch stable. Once all the fields are properly decided and the
patch gets in, we can easily extend the functionality to the shutdown
and end-of-recovery cases. I have also observed that the timer
functionality won't work properly in the case of shutdown, as we are
doing an immediate checkpoint. So this needs a lot of discussion and I
would like to handle it on a separate thread.
---

> 7) I think you don't need to call checkpoint_progress_start and
> pgstat_progress_update_param, any other progress reporting function
> for shutdown and end-of-recovery checkpoint right?

I had included the guards earlier and then removed later based on the
discussion upthread.
---

> 8) Not for all kinds of checkpoints right? pg_stat_progress_checkpoint
> can't show progress report for shutdown and end-of-recovery
> checkpoint, I think you need to specify that here in wal.sgml and
> checkpoint.sgml.
> +   command CHECKPOINT. The checkpointer process running the
> +   checkpoint will report its progress in the
> +   pg_stat_progress_checkpoint view. See
> +for details.
>
> 9) Can you add a test case for pg_stat_progress_checkpoint view? I
> think it's good to add one. See, below for reference:
> -- Add a trigger to catch and print the contents of the catalog view
> -- pg_stat_progress_copy during data insertion.  This allows to test
> -- the validation of some progress reports for COPY FROM where the trigger
> -- would fire.
> create function notice_after_tab_progress_reporting() returns trigger AS
> $$
> declare report record;
>
> 10) Typo: it's not "is happens"
> +   The checkpoint is happens without delays.
>
> 11) Can you be specific about what those "some operations" are that
> forced a checkpoint? Maybe like basebackup, createdb or something?
> +   The checkpoint is started because some operation forced a checkpoint.
>
> 12) Can you be a bit more elaborate here about who waits? Something like:
> the backend that requested the checkpoint will wait until its completion.
> +   Wait for completion before returning.
>
> 13) "removing unneeded or flushing needed logical rewrite mapping files"
> +   The checkpointer process is currently removing/flushing the logical
>
> 14) "old WAL files"
> +   The checkpointer process is currently recycling old XLOG files.

I will take care in the next patch.

Thanks & Regards,
Nitin Jadhav

On Wed, Mar 2, 2022 at 11:52 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 2, 2022 at 4:45 PM Nitin Jadhav
>  wrote:
> > > Also, how about special phases for SyncPostCheckpoint(),
> > > SyncPreCheckpoint(), InvalidateObsoleteReplicationSlots(),
> > > PreallocXlogFiles() (it currently pre-allocates only 1 WAL file, but
> > > it might be increased in future (?)), TruncateSUBTRANS()?
> >
> > SyncPreCheckpoint() is just incrementing a counter and
> > PreallocXlogFiles() currently pre-allocates only 1 WAL file. I feel
> > there is no need to add any phases for these as of now. We can add in
> > the future if necessary. Added phases for SyncPostCheckpoint(),
> > InvalidateObsoleteReplicationSlots() and TruncateSUBTRANS().
>
> Okay.
>
> > > 10) I'm not sure if it's discussed, how about adding the number of
> > > snapshot/mapping files so far the checkpoint has processed in file
> > > processing while loops of
> > > CheckPointSnapBuild/CheckPointLogicalRewriteHeap? Sometimes, there can
> > > be many logical snapshot or mapping files and users may be interested
> > > in knowing the so-far-processed-file-count.
> >
> > I had thought about this while sharing the v1 patch and mentioned my
> > views upthread. I feel it won't give meaningful progress information
> > (It can be treated as statistics). Hence not included. Thoughts?
>
> Okay. If there are any complaints about it we can always add them later.
>
> > > > > As mentioned upthread, there can be multiple backends that request a

Re: [RFC] building postgres with meson

2022-03-07 Thread Peter Eisentraut

Is there a current patch set to review in this thread at the moment?




Re: suboverflowed subtransactions concurrency performance optimize

2022-03-07 Thread Simon Riggs
On Mon, 7 Mar 2022 at 09:49, Julien Rouhaud  wrote:
>
> Hi,
>
> On Mon, Jan 17, 2022 at 01:44:02PM +, Simon Riggs wrote:
> >
> > Re-attached, so that the CFapp isn't confused between the multiple
> > patches on this thread.
>
> Thanks a lot for working on this!
>
> The patch is simple and overall looks good to me.  A few comments though:
>
>
> +/*
> + * Single-item cache for results of SubTransGetTopmostTransaction.  It's worth having
> + * such a cache because we frequently find ourselves repeatedly checking the
> + * same XID, for example when scanning a table just after a bulk insert,
> + * update, or delete.
> + */
> +static TransactionId cachedFetchXid = InvalidTransactionId;
> +static TransactionId cachedFetchTopmostXid = InvalidTransactionId;
>
> The comment goes over 80 chars after
> s/TransactionLogFetch/SubTransGetTopmostTransaction/, and I don't think this
> comment is valid for subtrans.c.

What aspect makes it invalid? The comment seems exactly applicable to
me; Andrey thinks so also.

> Also, maybe naming the first variable cachedFetchSubXid would make it a bit
> clearer?

Sure, that can be done.

> It would be nice to see some benchmarks, for both when this change is
> enough to avoid a contention (when there's a single long-running overflowed
> backend) and when it's not enough.  That will also be useful if/when working
> on the "rethink_subtrans" patch.

The patch doesn't do anything about the case of when there's a single
long-running overflowed backend, nor does it claim that.

The patch will speed up calls to SubTransGetTopmostTransaction(), which occur in
src/backend/access/heap/heapam.c
src/backend/utils/time/snapmgr.c
src/backend/storage/lmgr/lmgr.c
src/backend/storage/ipc/procarray.c

The patch was posted because TransactionLogFetch() has a cache, yet
SubTransGetTopmostTransaction() does not, yet the argument should be
identical in both cases.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Add parameter jit_warn_above_fraction

2022-03-07 Thread Justin Pryzby
On Mon, Mar 07, 2022 at 01:10:32PM +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby  wrote:
> >
> > I think it should be a NOTICE (or less?)
> 
> Hmm. I'm not so sure.
> 
> Other similar parameters often use LOG, but the downside of that is
> that it won't be sent to the client.
> 
> The problem with using NOTICE is that it won't go to the log by
> default. It needs to be at least warning to do that.

Anyone who uses this parameter is already going to be changing GUCs, so it
doesn't need to work by default.  The people who are likely to enable this
already monitor their logs and have probably customized their logging
configuration.

> > Is it more useful if this is applied combined with
> > log_min_duration_statement ?
> >
> > It's easy to imagine a query for which the planner computes a high cost, but
> > actually runs quickly.  You might get a bunch of WARNINGs that the query took
> > 10 MS and JIT was 75% of that, even if you don't care about queries that take
> > less than 10 SEC.
> 
> Yeah, that's definitely a problem. But which way would you want to tie
> it to log_min_duration_statement?  That a statement would both have to
> take longer than log_min_duration_statement *and* have JIT above a
> certain percentage? In my experience that is instead likely to miss
> most of the interesting times.

I don't understand - why doesn't it combine trivially with
log_min_duration_statement?  Are you saying that the default / pre-existing min
duration may not log all of the intended queries?  I think that just means the
configuration needs to be changed.  The GUC needs to allow/help finding these
JIT issues, but it's going to require an admin's interaction in any case.
Avoiding false positives may be important for it to be useful at all.

Hmm .. should it also combine with min_sample_rate ?  Maybe that addresses your
concern.

-- 
Justin




Re: Expose JIT counters/timing in pg_stat_statements

2022-03-07 Thread Dmitry Dolgov
> On Mon, Mar 07, 2022 at 01:27:02PM +0100, Magnus Hagander wrote:
> On Fri, Feb 25, 2022 at 5:40 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > One interesting not-very-relevant for the patch thing I've noticed while
> > reading it, is that there seems to be no way to find out what fraction
> > of time jit_tuple_deforming is taking alone, it's sort of merged
> > together with jit_expressions in generation_counter.
>
> That's missing at a deeper level though, right? We don't have it in
> EXPLAIN ANALYZE either. So while I agree that's useful, I think that's
> the job of another patch, and these two sets of counters should be the
> same.

Right, it's missing on the instrumentation level, I was just surprised
to notice that.




Re: logical replication empty transactions

2022-03-07 Thread Ajin Cherian
On Mon, Mar 7, 2022 at 7:50 PM shiy.f...@fujitsu.com
 wrote:
>
> On Fri, Mar 4, 2022 9:41 AM Ajin Cherian  wrote:
> >
> > I have split the patch into two. I have kept the logic of skipping
> > streaming changes in the second patch.
> > I will work on the second patch once we can figure out a solution for
> > the COMMIT PREPARED after restart problem.
> >
>
> Thanks for updating the patch.
>
> A comment on v23-0001 patch.
>
> @@ -1429,6 +1520,19 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> if (in_streaming)
> xid = txn->xid;
>
> +   /*
> +* Output BEGIN if we haven't yet.
> +* Avoid for non-transactional messages.
> +*/
> +   if (in_streaming || transactional)
> +   {
> +   PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private;
> +
> +   /* Send BEGIN if we haven't yet */
> +   if (txndata && !txndata->sent_begin_txn)
> +   pgoutput_send_begin(ctx, txn);
> +   }
> +
> OutputPluginPrepareWrite(ctx, true);
> logicalrep_write_message(ctx->out,
>  xid,
>
> I think we don't need to send BEGIN if in_streaming is true, right? The first
> patch doesn't skip streamed transaction, so should we modify
> +   if (in_streaming || transactional)
> to
> +   if (!in_streaming && transactional)
> ?
>

Fixed.

regards,
Ajin Cherian
Fujitsu Australia


v24-0002-Skip-empty-streamed-transactions-for-logical-rep.patch
Description: Binary data


v24-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


Re: Expose JIT counters/timing in pg_stat_statements

2022-03-07 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 5:43 PM Julien Rouhaud  wrote:
>
> On Fri, Feb 25, 2022 at 05:38:45PM +0100, Magnus Hagander wrote:
> >
> > Per some off-list discussion with Julien, we have clearly been talking
> > in slightly different terms. So let's summarize the options into what
> > they'd actually be:
> >
> > Option 0: what is in the patch now
> >
> > Option 1:
> > jit_count - number of executions using jit
> > total_jit_time - for sum of functions+inlining+optimization+emission time
> > min_jit_time - for sum of functions+inlining+optimization+emission time
> > max_jit_time - for sum of functions+inlining+optimization+emission time
> > mean_jit_time - for sum of functions+inlining+optimization+emission time
> > stddev_jit_time - for sum of functions+inlining+optimization+emission time
> > jit_functions - number of functions
> > jit_inlining_count - number of executions where inlining happened
> > jit_optimization_count - number of executions where optimization happened
> > jit_emission_count - number of executions where emission happened
> >
> > Option 2:
> > jit_count
> > jit_functions
> > jit_generation_time
> > jit_inlining_count
> > jit_inlining_time
> > jit_optimization_count
> > jit_optimization_time
> > jit_emission_count
> > jit_emission_time
>
> I'm for option 2, I think it's important to have the timing details for
> inlining and optimization and be able to compute correct stats.

I wonder if there might be an interesting middle ground, or if that is
making it too much. That is, we could have an
Option 3:
jit_count
total_jit_time - for sum of functions+inlining+optimization+emission time
min_jit_time - for sum of functions+inlining+optimization+emission time
max_jit_time - for sum of functions+inlining+optimization+emission time
mean_jit_time - for sum of functions+inlining+optimization+emission time
stddev_jit_time - for sum of functions+inlining+optimization+emission time
jit_functions
jit_generation_time
jit_inlining_count
jit_inlining_time
jit_optimization_count
jit_optimization_time
jit_emission_count
jit_emission_time

That is, we'd get the more detailed timings across the total time, but
not on the details. But that might be overkill.

But -- here's an updated patch based on Option 2.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38..d9eacfb364 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 00..8924df2715
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,65 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+OUT userid oid,
+OUT dbid oid,
+OUT toplevel bool,
+OUT queryid bigint,
+OUT query text,
+OUT plans int8,
+OUT total_plan_time float8,
+OUT min_plan_time float8,
+OUT max_plan_time float8,
+OUT mean_plan_time float8,
+OUT stddev_plan_time float8,
+OUT calls int8,
+OUT total_exec_time float8,
+OUT min_exec_time float8,
+OUT max_exec_time float8,
+OUT mean_exec_time float8,
+OUT stddev_exec_time float8,
+OUT rows int8,
+OUT shared_blks_hit int8,
+OUT shared_blks_read int8,
+OUT shared_blks_dirtied int8,
+OUT shared_blks_written int8,
+OUT local_blks_hit int8,
+OUT local_blks_read int8,
+OUT local_blks_dirtied int8,
+OUT local_blks_written int8,
+OUT temp_blks_read int8,
+OUT temp_blks_written int8,
+OUT blk_read_time float8,
+OUT blk_write_time float8,
+OUT wal_records int8,
+OUT wal_fpi int8,
+OUT wal_bytes numeric,
+OUT jit_functions int8,
+OUT jit_generation_time float8,
+OUT jit_inlining_count int8,
+OUT 

Re: Expose JIT counters/timing in pg_stat_statements

2022-03-07 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 5:40 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Fri, Feb 25, 2022 at 04:19:27PM +0100, Magnus Hagander wrote:
> > On Fri, Feb 25, 2022 at 2:33 PM Julien Rouhaud  wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Feb 25, 2022 at 02:06:29PM +0100, Magnus Hagander wrote:
> > > > Here's a patch to add the sum of timings for JIT counters to
> > > > pg_stat_statements, as a way to follow-up on if JIT is doing a good or
> > > > a bad job in a configuration.
> > >
> > > +1, it seems like something quite useful.
> >
> > Given the amount of time often spent debugging JIT -- getting more
> > insight is going to make it easier to tune it instead of like what
> > unfortunately many people do and just turn it off..
>
> Indeed, sounds convenient, although I wonder how exactly one would use it
> to tune JIT? I'm curious, because I got used to situations when one
> single long query takes much longer than expected due to JIT issues --
> but it seems the target of this patch are situations when there are a
> lot of long queries using JIT, and it's easier to analyze them via pgss?

"tune jit" might be a bad wording. But tune the values of
jit_above_cost for example, and in particular figure specific queries
where it needs adapting.

And like most things with pgss, just getting proper insight into what
your system is actually doing and spending its time on. It might be
spending a huge amount of time on JITting many relatively simple
queries, and it may still be a win -- but this lets you know that this
was the case.


> > > > I decided to only store the total time for the timings, since there
> > > > are 4 different timings and storing max/min/etc for each one of them
> > > > would lead to a bit too much data. This can of course be reconsidered,
> > > > but I think that's a reasonable tradeoff.
> > >
> > > I think the cumulated total time is enough.  Looking at the patch, I think
> > > we should also cumulate the number of times jit was triggered, and
> > > probably the same for each other main operation (optimization and inlining).
> > > Otherwise the values may be wrong and look artificially low.
> >
> > So just to be clear, you're basically thinking:
> >
> > jit_count = count of entries where jit_functions>0
> > jit_functions = 
> > jit_optimizations = count of entries where time spent on jit_optimizations > 0
>
> One interesting not-very-relevant for the patch thing I've noticed while
> reading it, is that there seems to be no way to find out what fraction
> of time jit_tuple_deforming is taking alone, it's sort of merged
> together with jit_expressions in generation_counter.

That's missing at a deeper level though, right? We don't have it in
EXPLAIN ANALYZE either. So while I agree that's useful, I think that's
the job of another patch, and these two sets of counters should be the
same.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
 wrote:
>
> Hi Vignesh,
> I agree with Peter's comment that the changes to
> FilterRemoteOriginData() should be part of FilterByOrigin()
>
> Further, I wonder why "onlylocal_data" is a replication slot's
> property. A replication slot tracks the progress of replication and it
> may be used by different receivers with different options. I could
> start one receiver which wants only local data, say using
> "pg_logical_slot_get_changes" and later start another receiver which
> fetches all the data starting from where the first receiver left. This
> option prevents such flexibility.
>
> As discussed earlier in the thread, local_only can be property of
> publication or subscription, depending upon the use case, but I can't
> see any reason that it should be tied to a replication slot.
>

I thought it should be similar to the 'streaming' option of subscription,
but maybe Vignesh has some other reason which makes it different.

> I have a similar question for "two_phase" but the ship has sailed and
> probably it makes some sense there which I don't know.
>

two_phase is different from some of the other subscription options
like 'streaming' in that it can be enabled only at the time of slot
and subscription creation; we can't change/specify it via
pg_logical_slot_get_changes. This is to avoid the case where we won't
know at the time of the commit prepared whether the prepare for the
transaction has already been sent. For the same reason, we need to
also know the 'two_phase_at' information.

> As for publication vs subscription, I think both are useful cases.
> 1. It will be a publication's property, if we want the node to not
> publish any data that it receives from other nodes for a given set of
> tables.
> 2. It will be the subscription's property, if we want the subscription
> to decide whether it wants to fetch the data changed on only upstream
> or other nodes as well.
>

I think it could be useful to allow it via both publication and
subscription but I guess it is better to provide it via one way
initially just to keep things simple and give users some way to deal
with such cases. I would prefer to allow it via subscription initially
for the reasons specified by Vignesh in his previous email [1]. Now, if
we think all of those are ignorable and it is more important to allow
this option first via publication, or we must allow it via both
publication and subscription, then it makes sense to change it.


[1] - 
https://www.postgresql.org/message-id/CALDaNm3jkotRhKfCqu5CXOf36_yiiW_cYE5%3DbG%3Dj6N3gOWJkqw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Add parameter jit_warn_above_fraction

2022-03-07 Thread Magnus Hagander
On Fri, Feb 25, 2022 at 5:23 PM Justin Pryzby  wrote:
>
> On Fri, Feb 25, 2022 at 04:16:01PM +0100, Magnus Hagander wrote:
> > + {
> > + {"jit_warn_above_fraction", PGC_SUSET, LOGGING_WHEN,
> > + gettext_noop("Sets the fraction of query time spent on JIT before writing"
> > +  "a warning to the log."),
> > + gettext_noop("Write a message tot he server log if more than this"
> > +  "fraction of the query runtime is spent on JIT."
> > +  "Zero turns off the warning.")
> > + },
> > + &jit_warn_above_fraction,
> > + 0.0, 0.0, 1.0,
> > + NULL, NULL, NULL
> > + },
>
> Should be PGC_USERSET ?

Yes. Definitely. Copy/paste thinko.


> +   gettext_noop("Write a message tot he server log if more than this"
>
> to the

Yup.


> +   if (jit_enabled && jit_warn_above_fraction > 0)
> +   {
> +   int64 jit_time =
> +   INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.generation_counter) +
> +   INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.inlining_counter) +
> +   INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.optimization_counter) +
> +   INSTR_TIME_GET_MILLISEC(portal->queryDesc->estate->es_jit->instr.emission_counter);
> +
> +   if (jit_time > msecs * jit_warn_above_fraction)
> +   {
> +   ereport(WARNING,
> +   (errmsg("JIT time was %ld ms of %d ms",
> +   jit_time, msecs)));
> +   }
> +   }
>
>
> I think it should be a NOTICE (or less?)

Hmm. I'm not so sure.

Other similar parameters often use LOG, but the downside of that is
that it won't be sent to the client.

The problem with using NOTICE is that it won't go to the log by
default. It needs to be at least warning to do that.


> Is it more useful if this is applied combined with log_min_duration_statement ?
>
> It's easy to imagine a query for which the planner computes a high cost, but
> actually runs quickly.  You might get a bunch of WARNINGs that the query took
> 10 MS and JIT was 75% of that, even if you don't care about queries that take
> less than 10 SEC.

Yeah, that's definitely a problem. But which way would you want to tie
it to log_min_duration_statement?  That a statement would both have to
take longer than log_min_duration_statement *and* have JIT above a
certain percentage? In my experience that is instead likely to miss
most of the interesting times. Maybe it would need a separate guc for
the timing, but I'd like to avoid that, I'm not sure it's a function
worth *that* much...


> I should say that this is already available by processing the output of
> autoexplain.

True. You just can't trigger it based on it. (and it can be somewhat
of a PITA to parse things out of auto_explain on busy systems, but it
does give a lot of very useful details)


Meanwhile here is an updated based on your other comments above, as
well as those from Julien.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..ec4308a480 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6592,6 +6592,21 @@ local0.*/var/log/postgresql

  
 
+ 
+  jit_warn_above_fraction (floating point)
+  
+   jit_warn_above_fraction configuration parameter
+  
+  
+   
+
+ Causes a warning to be written to the log if the time spent on JIT (see )
+ goes above this fraction of the total query runtime.
+ A value of 0 (the default) disables the warning.
+
+   
+ 
+
  
   log_startup_progress_interval (integer)
   
diff --git a/src/backend/tcop/fastpath.c b/src/backend/tcop/fastpath.c
index d429aa4663..a1bff893a3 100644
--- a/src/backend/tcop/fastpath.c
+++ b/src/backend/tcop/fastpath.c
@@ -196,6 +196,7 @@ HandleFunctionRequest(StringInfo msgBuf)
 	struct fp_info *fip;
 	bool		callit;
 	bool		was_logged = false;
+	int			msecs;
 	char		msec_str[32];
 
 	/*
@@ -305,7 +306,7 @@ HandleFunctionRequest(StringInfo msgBuf)
 	/*
 	 * Emit duration logging if appropriate.
 	 */
-	switch (check_log_duration(msec_str, was_logged))
+	switch (check_log_duration(&msecs, msec_str, was_logged))
 	{
 		case 1:
 			ereport(LOG,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d7e39aed64..fcf20428a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1016,7 +1016,9 @@ exec_simple_query(const char *query_string)
 	bool		

Re: pg_tablespace_location() failure with allow_in_place_tablespaces

2022-03-07 Thread Michael Paquier
On Fri, Mar 04, 2022 at 11:26:43PM +1300, Thomas Munro wrote:
> The warning from readlink() while making the mapping file isn't ideal,
> and perhaps we should suppress that with something like the attached.
> Or does the missing map file entry break something on Windows?

> @@ -8292,6 +8293,10 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
>  
>   snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);
>  
> + /* Skip in-place tablespaces (testing use only) */
> + if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR)
> + continue;

I saw the warning when testing base backups with in-place tablespaces
and it did not annoy me much, but, yes, that can be confusing.

Junction points are directories, no?  Are you sure that this works
correctly on WIN32?  It seems to me that we'd better use readlink()
only for entries in pg_tblspc/ that are PGFILETYPE_LNK on non-WIN32
and pgwin32_is_junction() on WIN32.
--
Michael


signature.asc
Description: PGP signature

