Re: Speed up pg_checksums in cases where checksum already set

2021-06-17 Thread Michael Paquier
On Wed, Jun 02, 2021 at 05:09:36PM -0400, Greg Sabino Mullane wrote:
> Newer version attached that adds a small documentation tweak as well.

-   enabling checksums, every file in the cluster is rewritten in-place.
+   enabling checksums, every file in the cluster with a changed checksum is
+   rewritten in-place.

This doc addition is a bit confusing, as it could mean that each file
has just one single checksum.  We could be more precise, say:
"When enabling checksums, each relation file block with a changed
checksum is rewritten in place."

Should we also mention that the sync happens even if no blocks are
rewritten, based on the reasoning upthread (aka we'd better do the
final flush, as an interrupted pg_checksums may leave a portion of the
files unflushed)?
--
Michael




Re: Unresolved replication hang and stop problem.

2021-06-17 Thread Kyotaro Horiguchi
At Thu, 17 Jun 2021 12:56:42 -0400, Alvaro Herrera  
wrote in 
> On 2021-Jun-17, Kyotaro Horiguchi wrote:
> 
> > I don't see a call to hash_*seq*_search there. Instead, I see one in
> > ReorderBufferCheckMemoryLimit().
> 
> Doh, of course -- I misread.
> 
> ReorderBufferCheckMemoryLimit is new in pg13 (cec2edfa7859) so now at
> least we have a reason why this workload regresses in pg13 compared to
> earlier releases.
> 
> Looking at the code, it does seem that increasing the memory limit as
> Amit suggests might solve the issue.  Is that a practical workaround?

I believe so, generally. I'm not sure about the OP's case, though.

Just increasing the working memory to a certain size would solve the
problem.  There is a potential issue, though: as in this case, it may
be hard for users to find out that the GUC addresses their problem (if
it actually does).  pg_stat_replication_slots.spill_count / spill_txns
could be a guide for setting logical_decoding_work_mem.  Is it worth
having an additional explanation like that for the GUC variable in the
documentation?
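
For instance, a hypothetical sketch of such guidance (assuming the
PG14 pg_stat_replication_slots view and the spill_* columns mentioned
above; the value below is only an illustration):

    -- Check how much decoding work spills to disk for each slot.
    SELECT slot_name, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication_slots;

    -- If spilling is frequent, raising the decoding working memory
    -- may help.
    ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
    SELECT pg_reload_conf();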

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: SSL/TLS instead of SSL in docs

2021-06-17 Thread Michael Paquier
On Tue, Jun 15, 2021 at 03:59:18PM +0200, Daniel Gustafsson wrote:
> While in there I added IMO missing items to the glossary and acronyms sections
> as well as fixed up markup around OpenSSL.
> 
> This only deals with docs, but if this is deemed interesting then user-facing
> messages in the code should use SSL/TLS as well of course.

+SNI
+
+ 
+  Server Name Indication
+ 
+
It looks inconsistent to me to point to the libpq documentation to get
the details about SNI.  Wouldn't it be better to have an item in the
glossary that refers to the bits of RFC 6066, and remove the reference
to the RFC from the libpq page?

-   to present a valid (trusted) SSL certificate, while
+   to present a valid (trusted) SSL/TLS certificate, while
This style with two acronyms for what we want to be one thing is
heavy.  Would it be better to just have one single acronym called
SSL/TLS that references both parts?

Patch 0003, for the <productname> markups with OpenSSL, included one
SSL/TLS entry.
--
Michael




Re: [HACKERS] logical decoding of two-phase transactions

2021-06-17 Thread Amit Kapila
On Fri, Jun 18, 2021 at 7:43 AM Peter Smith  wrote:
>
> On Thu, Jun 17, 2021 at 6:22 PM Greg Nancarrow  wrote:
> >
> > For example, in the v86-0001 patch:
> >
> > +/*
> > + * Handle PREPARE message.
> > + */
> > +static void
> > +apply_handle_prepare(StringInfo s)
> > +{
> > + LogicalRepPreparedTxnData prepare_data;
> > + char gid[GIDSIZE];
> > +
> > > + logicalrep_read_prepare(s, &prepare_data);
> > +
> > + Assert(prepare_data.prepare_lsn == remote_final_lsn);
> >
> > The above Assert() should be changed to something like:
> >
> > +if (prepare_data.prepare_lsn != remote_final_lsn)
> > +ereport(ERROR,
> > +(errcode(ERRCODE_PROTOCOL_VIOLATION),
> > + errmsg_internal("incorrect prepare LSN %X/%X in
> > prepare message (expected %X/%X)",
> > + LSN_FORMAT_ARGS(prepare_data.prepare_lsn),
> > > + LSN_FORMAT_ARGS(remote_final_lsn)));
> >
> > Without being more familiar with this code, it's difficult for me to
> > judge exactly how many of such cases are in these patches.
>
> Thanks for the above example. I will fix this one later, after
> receiving some more reviews and reports of other Assert cases just
> like this one.
>

I think on similar lines below asserts also need to be changed.

1.
+static void
+apply_handle_begin_prepare(StringInfo s)
+{
+ LogicalRepPreparedTxnData begin_data;
+ char gid[GIDSIZE];
+
+ /* Tablesync should never receive prepare. */
+ Assert(!am_tablesync_worker());

2.
+static void
+TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid, int szgid)
+{
..
+ Assert(TransactionIdIsValid(xid));

3.
+static void
+apply_handle_stream_prepare(StringInfo s)
+{
+ int nchanges = 0;
+ LogicalRepPreparedTxnData prepare_data;
+ TransactionId xid;
+ char gid[GIDSIZE];
+
..
..
+
+ /* Tablesync should never receive prepare. */
+ Assert(!am_tablesync_worker());


-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2021-06-17 Thread vignesh C
On Thu, Jun 17, 2021 at 7:40 PM Ajin Cherian  wrote:
>
> On Wed, Jun 16, 2021 at 9:08 AM Peter Smith  wrote:
> >
> > On Fri, Jun 11, 2021 at 6:34 PM Peter Smith  wrote:
> >
> > > KNOWN ISSUES: This v85 patch was built and tested using yesterday's
> > > master, but due to lots of recent activity in the replication area I
> > > expect it will be broken for HEAD very soon (if not already). I'll
> > > rebase it again ASAP to try to keep it in working order.
> > >
> >
> > Please find attached the latest patch set v86*
>
>
> I've modified the patchset based on comments received on thread [1]
> for the CREATE_REPLICATION_SLOT
> changes. Based on the request from that thread, I've taken out those
> changes as two new patches (patch-1 and patch-2)
> and made this into 5 patches. I've also changed the logic to align
> with the changes in the command syntax.

A few comments:
1) This content is present in both
v87-0001-Add-option-to-set-two-phase-in-CREATE_REPLICATIO.patch and
v87-0003-Add-support-for-prepared-transactions-to-built-i.patch; it
can be removed from one of them:
   
+   TWO_PHASE
+   
+
+ Specify that this logical replication slot supports decoding
of two-phase
+ transactions. With this option, two-phase commands like
+ PREPARE TRANSACTION, COMMIT
PREPARED
+ and ROLLBACK PREPARED are decoded and transmitted.
+ The transaction will be decoded and transmitted at
+ PREPARE TRANSACTION time.
+
+   
+  
+
+  

2) This change is not required, it can be removed:
   
Logical Decoding Examples
-

 The following example demonstrates controlling logical decoding using the
 SQL interface.

3) We could add a comment saying "example 1" at the beginning of the
first example and "example 2" for the newly added example, with a
description; that will clearly mark the two examples.
COMMIT 693
 ControlC
 $ pg_recvlogical -d postgres --slot=test --drop-slot
+
+$ pg_recvlogical -d postgres --slot=test --create-slot --two-phase
+$ pg_recvlogical -d postgres --slot=test --start -f -

4) You could mention "Before you use two-phase commit commands, you
must set max_prepared_transactions to at least 1" for example 2.
 $ pg_recvlogical -d postgres --slot=test --drop-slot
+
+$ pg_recvlogical -d postgres --slot=test --create-slot --two-phase
+$ pg_recvlogical -d postgres --slot=test --start -f -

5) This should be before --verbose; the options are documented alphabetically:
+ 
+   -t
+   --two-phase
+   
+   
+Enables two-phase decoding. This option should only be used with
+--create-slot
+   
+   
+ 

6) This should be before --verbose; the options are printed alphabetically:
printf(_("  -v, --verbose  output verbose messages\n"));
+   printf(_("  -t, --two-phaseenable two-phase decoding
when creating a slot\n"));
printf(_("  -V, --version  output version information,
then exit\n"));

Regards,
Vignesh




Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Thomas Munro
On Wed, Apr 15, 2020 at 2:21 PM Thomas Munro  wrote:
> On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro  wrote:
> > On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan  wrote:
> > > I'm thinking of a version of "snapshot too old" that amounts to a
> > > statement timeout that gets applied for xmin horizon type purposes in
> > > the conventional way, while only showing an error to the client if and
> > > when they access literally any buffer (though not when the relation is
> > > a system catalog). Is it possible that something along those lines is
> > > appreciably better than nothing to users? If it is, and if we can find
> > > a way to manage the transition, then maybe we could tolerate
> > > supporting this greatly simplified implementation of "snapshot too
> > > old".

I rebased that patch and fleshed it out just a bit more.  Warning:
experiment grade, incomplet, inkorrect, but I think it demonstrates
the main elements of Peter's idea from last year.

For READ COMMITTED, the user experience is much like a statement
timeout, except that it can keep doing stuff that doesn't read
non-catalog data.  Trivial example: pg_sleep(10) happily completes
with old_snapshot_threshold=5s, as do queries that materialise all
their input data in time, and yet your xmin is zapped.  For REPEATABLE
READ it's obviously tied to your first query, and produces "snapshot
too old" if you repeatedly SELECT from a little table and your time
runs out.

In this version I put the check into the heapam visibility + vismap
checks, instead of in the buffer access code.  The reason is that the
lower level buffer access routines don't have a snapshot, but if you
push the check down to buffer access and just check the "oldest"
snapshot (definition in this patch, not in master), then you lose some
potential granularity with different cursors.  If you try to put it at
a higher level in places that have a snapshot and access a buffer, you
run into the problem of being uncertain that you've covered all the
bases.  But I may be underthinking that.

Quite a few unresolved questions about catalog and toast snapshots and
other special stuff, as well as the question of whether it's actually
useful or the overheads can be made cheap enough.

> Hmm.  I suppose it must be possible to put the LSN check back: if
> (snapshot->too_old && PageGetLSN(page) > snapshot->lsn) ereport(...).
> Then the granularity would be the same as today -- block level -- but
> the complexity is transferred from the pruning side (has to deal with
> xid time map) to the snapshot-owning side (has to deal with timers,
> CFI() and make sure all snapshots are registered).  Maybe not a great
> deal, and maybe not easier than fixing the existing bugs.

It is a shame to lose the current LSN-based logic; it's really rather
clever (except for being broken).

> One problem is all the new setitimer() syscalls.  I feel like that
> could be improved, as could statement_timeout, by letting existing
> timers run rather than repeatedly rescheduling eagerly, so that eg a 1
> minute timeout never gets rescheduled more than once per minute.  I
> haven't looked into that, but I guess it's no worse than the existing
> implementation's overheads anyway.

At least that problem was fixed, by commit 09cf1d52.  (Not entirely
sure why I got away with not reenabling the timer between queries, but
I didn't look very hard).
From 656af1863e841664d4c342bb7c783834912d3906 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 15 Apr 2020 12:35:51 +1200
Subject: [PATCH v2] Implement snapshot_too_old using a timer.

Remove the previous implementation of "snapshot too old", and replace it
with a very conservative implementation based only on time passing.

Use a timer to mark active and registered snapshots as too old and
potentially advance the backend's xmin.  Snapshots that reach this state
can no longer be used to read buffers, because data they need to see may
be vacuumed away.

XXX Incomplete experimental code!
---
 contrib/Makefile  |   1 -
 contrib/bloom/blscan.c|   1 -
 contrib/old_snapshot/Makefile |  21 -
 contrib/old_snapshot/old_snapshot--1.0.sql|  14 -
 contrib/old_snapshot/old_snapshot.control |   5 -
 contrib/old_snapshot/time_mapping.c   | 160 
 src/backend/access/brin/brin_revmap.c |   7 -
 src/backend/access/common/toast_internals.c   |   5 +-
 src/backend/access/gin/ginbtree.c |   2 -
 src/backend/access/gin/ginget.c   |   4 -
 src/backend/access/gist/gistget.c |   1 -
 src/backend/access/hash/hashsearch.c  |   6 -
 src/backend/access/heap/heapam.c  |  11 -
 src/backend/access/heap/heapam_visibility.c   |   2 +
 src/backend/access/heap/pruneheap.c   |  96 +--
 src/backend/access/heap/vacuumlazy.c  |   4 +-
 src/backend/access/nbtree/nbtsearch.c |   9 -
 src/backend/access/spgist/spgscan.c   |   1 -
 

Re: Optionally automatically disable logical replication subscriptions on error

2021-06-17 Thread Amit Kapila
On Fri, Jun 18, 2021 at 1:48 AM Mark Dilger
 wrote:
>
> Hackers,
>
> Logical replication apply workers for a subscription can easily get stuck in 
> an infinite loop of attempting to apply a change, triggering an error (such 
> as a constraint violation), exiting with an error written to the subscription 
> worker log, and restarting.
>
> As things currently stand, only superusers can create subscriptions.  Ongoing 
> work to delegate superuser tasks to non-superusers creates the potential for 
> even more errors to be triggered, specifically, errors where the apply worker 
> does not have permission to make changes to the target table.
>
> The attached patch makes it possible to create a subscription using a new 
> subscription_parameter, "disable_on_error", such that rather than going into 
> an infinite loop, the apply worker will catch errors and automatically 
> disable the subscription, breaking the loop.  The new parameter defaults to 
> false.  When false, the PG_TRY/PG_CATCH overhead is avoided, so for 
> subscriptions which do not use the feature, there shouldn't be any change.  
> Users can manually clear the error after fixing the underlying issue with an 
> ALTER SUBSCRIPTION .. ENABLE command.
>

I see this idea has merit and it will help users to repair failing
subscriptions. A few points on a quick look at the patch: (a) The patch
seems to be assuming that the error can happen only in the apply worker,
but I think a constraint violation can happen via one of the table
sync workers as well; (b) What happens if an error occurs while you
are updating the error information in the catalog table? I think
instead of seeing the actual apply-time error, the user might see some
other error, for which it won't be clear what an appropriate action is.

We are also discussing another action like skipping the apply of the
transaction on an error [1]. I think it is better to evaluate both the
proposals as one seems to be an extension of another. Adding
Sawada-San, as he is working on the other proposal.

[1] - 
https://www.postgresql.org/message-id/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK%3D30xJfUVihNZDA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: [PoC] Federated Authn/z with OAUTHBEARER

2021-06-17 Thread Michael Paquier
On Tue, Jun 08, 2021 at 04:37:46PM +, Jacob Champion wrote:
> 1. Prep
> 
>   0001 decouples the SASL code from the SCRAM implementation.
>   0002 makes it possible to use common/jsonapi from the frontend.
>   0003 lets the json_errdetail() result be freed, to avoid leaks.
>
> The first three patches are, hopefully, generally useful outside of
> this implementation, and I'll plan to register them in the next
> commitfest. The middle two patches are the "interesting" pieces, and
> I've split them into client and server for ease of understanding,
> though neither is particularly useful without the other.

Beginning with the beginning, could you spawn two threads for the
jsonapi rework and the SASL/SCRAM business?  I agree that these look
independently useful.  Glad to see someone improving the code with
SASL and SCRAM which are too inter-dependent now.  I saw in the RFCs
dedicated to OAUTH the need for the JSON part as well.

+#  define check_stack_depth()
+#  ifdef JSONAPI_NO_LOG
+#define json_log_and_abort(...) \
+   do { fprintf(stderr, __VA_ARGS__); exit(1); } while(0)
+#  else
In patch 0002, this is the wrong approach.  libpq will not be able to
feed on such reports, and you cannot use any of the APIs from the
palloc() family either as these just fail on OOM.  libpq should be
able to know about the error, and would fill in the error back to the
application.  This abstraction is not necessary on HEAD as
pg_verifybackup is fine with this level of reporting.  My rough guess
is that we will need to split the existing jsonapi.c into two files,
one that can be used in shared libraries and a second that handles the 
errors.

+   /* TODO: SASL_EXCHANGE_FAILURE with output is forbidden in SASL */
if (result == SASL_EXCHANGE_SUCCESS)
sendAuthRequest(port,
AUTH_REQ_SASL_FIN,
output,
outputlen);
Perhaps that's an issue we need to worry about on its own?  I don't
recall this part.
--
Michael




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Noah Misch
On Wed, Jun 16, 2021 at 03:13:30PM -0400, Andrew Dunstan wrote:
> On 6/16/21 2:59 PM, Fabien COELHO wrote:
> > The key feedback for me is the usual one: what is not tested does not
> > work. Wow:-)
> 
> Agreed.
> 
> > I'm unhappy because I already added tap tests for time-sensitive
> > features (-T and others, maybe logging aggregates, cannot remember),
> > which have been removed because they could fail under some
> > circonstances (eg very very very very slow hosts), or required some
> > special handling (a few lines of code) in pgbench, and the net result
> > of this is there is not a single test in place for some features:-(
> 
> I'm not familiar with exactly what happened in this case, but tests need
> to be resilient over a wide range of performance characteristics. One
> way around this issue might be to have a way of detecting that it's on a
> slow platform and if so either skipping tests (Test::More provides
> plenty of support for this) or expecting different results.

Detection would need the host to be consistently slow, like running under
Valgrind or a 20-year-old CPU.  We also test on systems having highly-variable
performance due to other processes competing for the same hardware.  I'd
perhaps add a "./configure --enable-realtime-tests" option that enables
affected tests.  Testers should use the option whenever the execution
environment has sufficient reserved CPU.




Re: Decoding speculative insert with toast leaks memory

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 2:55 PM Amit Kapila  wrote:
>
> Your patch looks good to me as well. I would like to retain the
> comment as it is from master for now. I'll do some testing and push it
> tomorrow unless there are additional comments.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Noah Misch
On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote:
> Greg Stark  writes:
> > I think Andres's point earlier is the one that stands out the most for me:
> > 
> > > I still think that's the most reasonable course. I actually like the
> > > feature, but I don't think a better implementation of it would share
> > > much if any of the current infrastructure.
> > 
> > That makes me wonder whether ripping the code out early in the v15
> > cycle wouldn't be a better choice. It would make it easier for someone
> > to start work on a new implementation.

Deleting the feature early is better than deleting the feature late,
certainly.  (That doesn't tell us about the relative utility of deleting the
feature early versus never deleting the feature.)

> > Fwiw I too think the basic idea of the feature is actually awesome.
> > There are tons of use cases where you might have one long-lived
> > transaction working on a dedicated table (or even a schema) that will
> > never look at the rapidly mutating tables in another schema and never
> > trigger the error even though those tables have been vacuumed many
> > times over during its run-time.
> 
> I agree that's a great use-case.  I don't like this implementation though.
> I think if you want to set things up like that, you should draw a line
> between the tables it's okay for the long transaction to touch and those
> it isn't, and then any access to the latter should predictably draw an
> error.

I agree that would be a useful capability, but it solves a different problem.

> I really do not like the idea that it might work anyway, because
> then if you accidentally break the rule, you have an application that just
> fails randomly ... probably only on the days when the boss wants that
> report *now* not later.

Every site adopting SERIALIZABLE learns that transactions can fail due to
mostly-unrelated concurrent activity.  ERRCODE_SNAPSHOT_TOO_OLD is another
kind of serialization failure, essentially.  Moreover, one can opt for an
old_snapshot_threshold value longer than the runtime of the boss's favorite
report.  Of course, nobody would reject a replacement that has all the
advantages of old_snapshot_threshold and fewer transaction failures.  Once
your feature rewrite starts taking away advantages to achieve fewer
transaction failures, that rewrite gets a lot more speculative.
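
For instance (an illustrative value only, not a recommendation from
this thread), a site whose longest report runs 90 minutes could set,
in postgresql.conf:

    old_snapshot_threshold = '2h'   # tolerate snapshots up to two hours old

so the report finishes before its snapshot can be declared too old.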

nm




Re: Fix for segfault in logical replication on master

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 9:26 PM Mark Dilger
 wrote:
>
> > On Jun 17, 2021, at 6:40 AM, Amit Kapila  wrote:
> >
> > I think such a problem won't happen because we are using historic
> > snapshots in this context. We rely on that in a similar way in
> > reorderbuffer.c, see ReorderBufferProcessTXN.
>
> I think you are right, but that's the part I have trouble fully convincing 
> myself is safe.  We certainly have an historic snapshot when we call 
> RelationGetIndexList, but that has an early exit if the relation already has 
> fields set, and we don't know if those fields were set before or after the 
> historic snapshot was taken.  Within the context of the pluggable 
> infrastructure, I think we're safe.  The only caller of 
> RelationGetIdentityKeyBitmap() in core is logicalrep_write_attrs(), which is 
> only called by logicalrep_write_rel(), which is only called by 
> send_relation_and_attrs(), which is only called by maybe_send_schema(), which 
> is called by pgoutput_change() and pgoutput_truncate(), both being callbacks 
> in core's logical replication plugin.
>
> ReorderBufferProcessTXN calls SetupHistoricSnapshot before opening the 
> relation and then calling ReorderBufferApplyChange to invoke the plugin on 
> that opened relation, so the relation's fields could not have been setup 
> before the snapshot was taken.  Any other plugin would similarly get invoked 
> after that same logic, so they'd be fine, too.  The problem would only be if 
> somebody called RelationGetIdentityKeyBitmap() or one of its calling 
> functions from outside that infrastructure.  Is that worth worrying about?  
> The function comments for those mention having an historic snapshot, and the 
> Assert will catch if code doesn't have one, but I wonder how much of a trap 
> for the unwary that is, considering that somebody might open the relation and 
> lookup indexes for the relation before taking an historic snapshot and 
> calling these functions.
>

I think in such a case the caller must call InvalidateSystemCaches
before setting up a historic snapshot, otherwise, there could be other
problems as well.

> I thought it was cheap enough to check that the relation we open is an index, 
> because if it is not, we'll segfault when accessing fields of the 
> relation->rd_index struct.  I wouldn't necessarily advocate doing any really 
> expensive checks here, but a quick sanity check seemed worth the effort.
>

I am not questioning the cost of these sanity checks. I suggest you
raise an elog rather than return NULL, because if this happens there
is definitely some problem, and continuing won't be a good idea.

-- 
With Regards,
Amit Kapila.




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-17 Thread Ajin Cherian
On Tue, Jun 15, 2021 at 5:34 PM Ajin Cherian  wrote:

Since we've decided not to commit this for PG 14, I've added these
patches along with the larger patch set for subscriber-side 2PC in
thread [1].

[1] -  
https://www.postgresql.org/message-id/cahut+pujktnrjfre0vbufwmz9bescc__nt+puhbsaunw2bi...@mail.gmail.com

regards,
Ajin Cherian




RE: Transactions involving multiple postgres foreign servers, take 2

2021-06-17 Thread tsunakawa.ta...@fujitsu.com
From: Robert Haas 
> On Tue, Jun 15, 2021 at 5:51 AM tsunakawa.ta...@fujitsu.com
>  wrote:
> > Postgres can do that, but other implementations cannot necessarily do
> > it, I'm afraid.  But before that, the FDW interface documentation
> > doesn't describe anything about how to handle interrupts.  Actually,
> > odbc_fdw and possibly other FDWs don't respond to interrupts.
> 
> Well, I'd consider that a bug.

I kind of hesitate to call it a bug...  Unlike libpq, JDBC (for jdbc_fdw)
doesn't have an asynchronous interface, and the Oracle and PostgreSQL ODBC
drivers don't support an asynchronous interface.  Even with libpq, COMMIT
(and other SQL commands) is not always cancellable, e.g., when the (NFS)
storage server hangs while writing WAL.


> > What we're talking about here is mainly whether commit should return
> > success or failure when some participants failed to commit in the
> > second phase of 2PC.  That's new to Postgres, isn't it?  Anyway, we
> > should respect existing relevant specifications and (well-known)
> > implementations before we conclude that we have to devise our own
> > behavior.
> 
> Sure ... but we can only decide to do things that the implementation
> can support, and running code that might fail after we've committed
> locally isn't one of them.

Yes, I understand that Postgres may not be able to conform to specifications or 
well-known implementations in all aspects.  I'm just suggesting to take the 
stance "We carefully considered established industry specifications that we can 
base on, did our best to design the desirable behavior learned from them, but 
couldn't implement a few parts", rather than "We did what we like and can do."


> I think your comparison here is quite unfair. We work hard to add
> overhead in hot paths where it might cost, but the FDW case involves a
> network round-trip anyway, so the cost of an if-statement would surely
> be insignificant. I feel like you want to assume without any evidence
> that a local resolver can never be quick enough, even thought the cost
> of IPC between local processes shouldn't be that high compared to a
> network round trip. But you also want to suppose that we can run code
> that might fail late in the commit process even though there is lots
> of evidence that this will cause problems, starting with the code
> comments that clearly say so.

There may be better examples.  What I wanted to say is just that I believe
it's not up to PG developers' standards to allow serial prepare and commit.
Let's make it clear what makes it difficult to do 2PC from each client
session in normal operation without going through the resolver.


Regards
Takayuki Tsunakawa



RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-17 Thread osumi.takami...@fujitsu.com
On Thursday, June 17, 2021 10:34 PM Simon Riggs  
wrote:
> On Thu, Jun 17, 2021 at 12:57 PM Amit Kapila 
> wrote:
> > On Thu, Jun 17, 2021 at 4:27 PM Amit Kapila 
> wrote:
> > >
> > > On Thu, Jun 17, 2021 at 8:41 AM osumi.takami...@fujitsu.com
> > >  wrote:
> > >
> > > Pushed!
> > >
> > [Responding to Simon's comments]
> >
> > > If LOCK and TRUNCATE is advised against on all user catalog tables,
> > > why would CLUSTER only apply to pg_class? Surely its locking level is the
> same as LOCK?
> > >
> >
> > Cluster will also apply to all user catalog tables. I think we can
> > extend it slightly as we have mentioned for Lock.
> 
> OK, good.
> 
> > > The use of "[user]" isn't fully explained, so it might not be clear
> > > that this applies to both Postgres catalog tables and any user tables that
> have been nominated as catalogs. Probably worth linking to the "Capabilities"
> section to explain.
> > >
> >
> > Sounds reasonable.
Simon, I appreciate your suggestions, and yes, if the user catalog table
is referenced by the output plugin, it can be another cause of deadlock.

I'm going to post a patch for those two changes accordingly.


Best Regards,
Takamichi Osumi



Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-17 Thread Peter Geoghegan
On Thu, Jun 17, 2021 at 5:55 AM Justin Pryzby  wrote:
> (Various sgml typos)

Fixed in the v3 I just posted.

> +  removed until index cleanup is completed.  This option has no
> +  effect for tables that do not have an index and is ignored if
> +  the FULL option is used.
>
> I'd say "tables that have no indexes,"

That wording wasn't mine (it just happened to be moved around by
reformatting), but I think you're right. I went with your suggestion.

Thanks for taking a look
-- 
Peter Geoghegan




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-17 Thread Peter Geoghegan
On Thu, Jun 17, 2021 at 2:14 AM Masahiko Sawada  wrote:
> Thank you for updating the patch! Here are comments on v2 patch:

Thanks for the review!

Attached is v3, which has all the changes that you suggested (plus the
doc stuff from Justin).

I also renamed the "default" VacOptTernaryValue (actually now called
VacOptValue) value -- it seems clearer to call this "unspecified",
because it represents a state that has nothing to do with the default
of the reloption or GUC.  Really, it means "the VACUUM command didn't
have this specified explicitly" (note that it therefore always starts
out this way in an autovacuum worker).  "Unspecified" directly
expresses "fall back on the reloption, and then fall back on the
reloption's default": the value is unspecified, but will have to
*become* specified later, so that vacuumlazy.c has a truly usable
value ("unspecified" is never usable in vacuumlazy.c).

> VacOptTernaryValue is no longer a ternary value. Can we rename it
> something like VacOptValue?

As I said, done that way.

> We should specify TRUE instead.

Ooops. Fixed.

> --force-index-cleanup option isn't shown in the help message.

Fixed.

> ---
> I think we also improve the tab completion for INDEX_CLEANUP option.

Fixed.

> How about listing the available values of INDEX_CLEANUP here instead
> of enum? For example, we do a similar thing in the description of
> FORMAT option of EXPLAIN command. It would be easier to perceive all
> available values.

That looks much better. Fixed.

> It should be --force-index-cleanup.

Fixed.

--
Peter Geoghegan


v3-0001-Generalize-VACUUM-s-INDEX_CLEANUP-option.patch
Description: Binary data


Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
Michael Paquier  writes:
> The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
> Perhaps a comment should be added to explain why a copy is still
> required?

I did add a comment about that in the v2 patch --- the issue is the
call path for EXPLAIN EXECUTE.

regards, tom lane




Re: [HACKERS] logical decoding of two-phase transactions

2021-06-17 Thread Peter Smith
On Thu, Jun 17, 2021 at 6:22 PM Greg Nancarrow  wrote:
>
> On Wed, Jun 16, 2021 at 9:08 AM Peter Smith  wrote:
> >
> >
> > Please find attached the latest patch set v86*
> >
>
> A couple of comments:
>
> (1)  I think one of my suggested changes was missed (or was that 
> intentional?):
>
> BEFORE:
> +The LSN of the commit prepared.
> AFTER:
> +The LSN of the commit prepared transaction.
>

No, not missed. I already dismissed that one and wrote about it when I
posted v85 [1].

>
> (2)  In light of Tom Lane's recent changes in:
>
> fe6a20ce54cbbb6fcfe9f6675d563af836ae799a (Don't use Asserts to check
> for violations of replication protocol)
>
> there appear to be some instances of such code in these patches.

Yes, I already noted [2] there are likely to be such cases which need
to be fixed.

>
> For example, in the v86-0001 patch:
>
> +/*
> + * Handle PREPARE message.
> + */
> +static void
> +apply_handle_prepare(StringInfo s)
> +{
> + LogicalRepPreparedTxnData prepare_data;
> + char gid[GIDSIZE];
> +
> + logicalrep_read_prepare(s, _data);
> +
> + Assert(prepare_data.prepare_lsn == remote_final_lsn);
>
> The above Assert() should be changed to something like:
>
> +if (prepare_data.prepare_lsn != remote_final_lsn)
> +ereport(ERROR,
> +(errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg_internal("incorrect prepare LSN %X/%X in
> prepare message (expected %X/%X)",
> + LSN_FORMAT_ARGS(prepare_data.prepare_lsn),
> + LSN_FORMAT_ARGS(remote_final_lsn)));
>
> Without being more familiar with this code, it's difficult for me to
> judge exactly how many of such cases are in these patches.

Thanks for the above example. I will fix this one later, after
receiving some more reviews and reports of other Assert cases just
like this one.

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvOVkiVBf4P5chdVSoVs5%3Da%3DF_GtTSHHoXDb4LiOM_8Qw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPvdio4%3DOE6cz5pr8VcJNcAgt5uGBPdKf-tnGEMa1mANGg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Avoid call MaintainOldSnapshotTimeMapping, if old_snapshot_threshold is disabled.

2021-06-17 Thread Ranier Vilela
Em qui., 17 de jun. de 2021 às 22:08, Andres Freund 
escreveu:

> Hi,
>
> On 2021-06-17 21:27:15 -0300, Ranier Vilela wrote:
> > While another long thread discusses the situation of
> old_snapshot_threshold,
> > I believe we can improve procarray.c by avoiding calling
> > MaintainOldSnapshotTimeMapping (src/backend/utils/time/snapmgr.c).
> >
> > There's a very explicit comment there, which says (line 1866):
> > "Never call this function when old snapshot checking is disabled."
> >
> > Well, assert should never be used to validate a condition that certainly
> > occurs at runtime.
>
> I don't see how it can happen at runtime currently?
>
> > Since old_snapshot_threshold is -1, it is disabled, so
> > MaintainOldSnapshotTimeMapping doesn't need to be run, right?
>
> It *isn't* run, the caller checks OldSnapshotThresholdActive() first.
>
True. My mistake.
I didn't check GetSnapshotDataInitOldSnapshot correctly.

regards,
Ranier Vilela


Re: Outdated replication protocol error?

2021-06-17 Thread Jeff Davis
On Wed, 2021-06-16 at 16:17 -0700, Andres Freund wrote:
> I think we should explicitly compute the current timeline before
> using
> ThisTimelineID. E.g. in StartReplication() call a new version of
> GetFlushRecPtr() that also returns the current timeline id.

I think all we need to do is follow the pattern in IdentifySystem() by
calling:

am_cascading_walsender = RecoveryInProgress();

first. There are three cases:

1. If the server was a primary the last time RecoveryInProgress() was
called, and it's still a primary, then it returns false immediately.
ThisTimeLineID should be set properly already.

2. If the server was a secondary the last time RecoveryInProgress() was
called, and now it's a primary, then it updates ThisTimeLineID in
InitXLOGAccess() and returns false.

3. If the server was a secondary the last time, and is still a
secondary, then it returns true. Then, StartReplication() will call
GetStandbyFlushRecPtr(), which will update ThisTimeLineID.

It was confusing to me for a while because I was trying to sort out
whether am_cascading_walsender and/or ThisTimeLineID could be out of
date, and how that would result in not updating ThisTimeLineID; and
also why there was a difference between IdentifySystem() and
StartReplication().

Simple patch attached. I didn't test it yet, but wanted to post my
analysis.

Regards,
Jeff Davis

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 32245363561..17cb21220e0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -573,11 +573,6 @@ StartReplication(StartReplicationCmd *cmd)
 	StringInfoData buf;
 	XLogRecPtr	FlushPtr;
 
-	if (ThisTimeLineID == 0)
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("IDENTIFY_SYSTEM has not been run before START_REPLICATION")));
-
 	/* create xlogreader for physical replication */
 	xlogreader =
 		XLogReaderAllocate(wal_segment_size, NULL,
@@ -619,6 +614,7 @@ StartReplication(StartReplicationCmd *cmd)
 	 * that. Otherwise use the timeline of the last replayed record, which is
 	 * kept in ThisTimeLineID.
 	 */
+	am_cascading_walsender = RecoveryInProgress();
 	if (am_cascading_walsender)
 	{
 		/* this also updates ThisTimeLineID */


Re: fdatasync performance problem with large number of DB files

2021-06-17 Thread Justin Pryzby
Thomas, could you comment on this?

On Sat, May 29, 2021 at 02:23:21PM -0500, Justin Pryzby wrote:
> On Tue, May 25, 2021 at 07:13:59PM -0500, Justin Pryzby wrote:
> > On Sat, Mar 20, 2021 at 12:16:27PM +1300, Thomas Munro wrote:
> > > > > + {
> > > > > + {"recovery_init_sync_method", PGC_POSTMASTER, 
> > > > > ERROR_HANDLING_OPTIONS,
> > > > > + gettext_noop("Sets the method for synchronizing 
> > > > > the data directory before crash recovery."),
> > > > > + },
> > 
> > Is there any reason why this can't be PGC_SIGHUP ?
> > (Same as restart_after_crash, remove_temp_files_after_crash)
> 
> I can't see any reason why this is nontrivial.
> What about data_sync_retry?
> 
> commit 2d2d2e10f99548c486b50a1ce095437d558e8649
> Author: Justin Pryzby 
> Date:   Sat May 29 13:41:14 2021 -0500
> 
> Change recovery_init_sync_method to PGC_SIGHUP..
> 
> The setting has no effect except during startup.
> But it's nice to be able to change the setting dynamically, which is 
> expected
> to be pretty useful to an admin following crash recovery when turning the
> service off and on again is not so appealing.
> 
> See also: 2941138e6, 61752afb2
> 
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index d8c0fd3315..ab9916eac5 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -9950,7 +9950,8 @@ dynamic_library_path = 
> 'C:\tools\postgresql;H:\my_project\lib;$libdir'
>  appear only in kernel logs.
> 
> 
> -This parameter can only be set at server start.
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line.
> 
>
>   
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 87bc688704..796b4e83ce 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -4945,7 +4945,7 @@ static struct config_enum ConfigureNamesEnum[] =
>   },
>  
>   {
> - {"recovery_init_sync_method", PGC_POSTMASTER, 
> ERROR_HANDLING_OPTIONS,
> + {"recovery_init_sync_method", PGC_SIGHUP, 
> ERROR_HANDLING_OPTIONS,
>   gettext_noop("Sets the method for synchronizing the 
> data directory before crash recovery."),
>   },
> &recovery_init_sync_method,
> diff --git a/src/backend/utils/misc/postgresql.conf.sample 
> b/src/backend/utils/misc/postgresql.conf.sample
> index ddbb6dc2be..9c4c4a9eec 100644
> --- a/src/backend/utils/misc/postgresql.conf.sample
> +++ b/src/backend/utils/misc/postgresql.conf.sample
> @@ -774,7 +774,6 @@
>   # data?
>   # (change requires restart)
>  #recovery_init_sync_method = fsync   # fsync, syncfs (Linux 5.8+)
> - # (change requires restart)
>  
>  
>  
> #--




Re: Avoid call MaintainOldSnapshotTimeMapping, if old_snapshot_threshold is disabled.

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 21:27:15 -0300, Ranier Vilela wrote:
> While another long thread discusses the situation of old_snapshot_threshold,
> I believe we can improve procarray.c by avoiding calling
> MaintainOldSnapshotTimeMapping (src/backend/utils/time/snapmgr.c).
> 
> There's a very explicit comment there, which says (line 1866):
> "Never call this function when old snapshot checking is disabled."
> 
> Well, assert should never be used to validate a condition that certainly
> occurs at runtime.

I don't see how it can happen at runtime currently?

> Since old_snapshot_threshold is -1, it is disabled, so
> MaintainOldSnapshotTimeMapping doesn't need to be run, right?

It *isn't* run, the caller checks OldSnapshotThresholdActive() first.

Greetings,

Andres Freund




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Michael Paquier
On Thu, Jun 17, 2021 at 02:08:34PM -0700, Andres Freund wrote:
> Unfortunately github is annoying to search through - it doesn't seem to
> have any logic to prevent duplicates across repositories :(. Which means
> there's dozens of copies of postgres code included...

I agree with the position of doing something now while in beta.  I
have not looked at the tree, but I am rather sure that we had changes 
in the hooks while in beta phase in the past.

>> which admittedly is more than none, but it's not a huge number
>> either.  I have to think that fixing this bug reliably is a
>> more important consideration.
> 
> Sure!

The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
Perhaps a comment should be added to explain why a copy is still
required?
--
Michael




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Michael Paquier
On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
> I'm on the fence TBH, I can see that it's really small things and it
> seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15.  (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)

I may be missing something of course, but I don't see any strong
reason why we need to do a revert here if we have patches to discuss
first.

>> Note that Michaël is having a look at fixing pgbench logging issues.
> 
> Yeah I've been catching up with these threads.

Thomas, do you want me to look more at this issue?  I don't feel
comfortable with the idea of doing anything if you are planning to
look at this thread and you are the owner here, so that should be your
call.

From what I can see, we have the same area getting patched with
patches across two threads, so it seems better to give up the other
thread and just focus on the discussion here, where v7 has been sent:
https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520...@sraoss.co.jp
https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com
--
Michael




Avoid call MaintainOldSnapshotTimeMapping, if old_snapshot_threshold is disabled.

2021-06-17 Thread Ranier Vilela
Hi,

While another long thread discusses the situation of old_snapshot_threshold,
I believe we can improve procarray.c by avoiding calling
MaintainOldSnapshotTimeMapping (src/backend/utils/time/snapmgr.c).

There's a very explicit comment there, which says (line 1866):
"Never call this function when old snapshot checking is disabled."

Well, assert should never be used to validate a condition that certainly
occurs at runtime.

Since old_snapshot_threshold is -1, it is disabled, so
MaintainOldSnapshotTimeMapping doesn't need to be run, right?

regards,
Ranier Vilela


avoid_call_if_old_snapshot_threshold_is_disabled.patch
Description: Binary data


Re: Replication protocol doc fix

2021-06-17 Thread Jeff Davis
On Thu, 2021-06-17 at 12:42 -0400, Robert Haas wrote:
> On a casual read-through this seems pretty reasonable, but it
> essentially documents that libpq is doing the wrong thing by sending
> Sync unconditionally. As I say above, I disagree with that from a
> philosophical perspective. Then again, unless we're willing to
> redefine the wire protocol, I don't have an alternative to offer.

What if we simply mandate that a Sync must be sent before the server
will respond with CopyInResponse/CopyBothResponse, and the client must
send another Sync after CopyDone/CopyFail (or after receiving an
ErrorResponse, if the client isn't going to send a CopyDone/CopyFail)?

This will follow what libpq is already doing today, as far as I can
tell, and it will leave the server in a definite state.

In theory, it could break a client that issues Parse+Bind+Execute for a
CopyIn/CopyBoth command without a Sync, but I'm not sure there are any
clients that do that, and it's arguable whether the documentation
permitted that or not anyway.

I hacked together a quick patch; attached.

Regards,
Jeff Davis

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fdf57f15560..f936f76a257 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,16 +171,43 @@ ReceiveCopyBegin(CopyFromState cstate)
 	StringInfoData buf;
 	int			natts = list_length(cstate->attnumlist);
 	int16		format = (cstate->opts.binary ? 1 : 0);
+	int			mtype;
 	int			i;
 
+	cstate->copy_src = COPY_FRONTEND;
+	cstate->fe_msgbuf = makeStringInfo();
+
+	HOLD_CANCEL_INTERRUPTS();
+	pq_startmsgread();
+	mtype = pq_getbyte();
+
+	if (mtype == EOF)
+		ereport(ERROR,
+(errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("unexpected EOF on client connection with an open transaction")));
+
+	/* expecting a Sync */
+	if (mtype != 'S')
+		ereport(ERROR,
+(errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("unexpected message type 0x%02X before COPY FROM STDIN; expected Sync",
+		mtype)));
+
+	/* Now collect the message body */
+	if (pq_getmessage(cstate->fe_msgbuf, PQ_SMALL_MESSAGE_LIMIT))
+		ereport(ERROR,
+(errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("unexpected EOF on client connection with an open transaction")));
+
+	RESUME_CANCEL_INTERRUPTS();
+
 	pq_beginmessage(&buf, 'G');
 	pq_sendbyte(&buf, format);	/* overall format */
 	pq_sendint16(&buf, natts);
 	for (i = 0; i < natts; i++)
 		pq_sendint16(&buf, format); /* per-column formats */
 	pq_endmessage(&buf);
-	cstate->copy_src = COPY_FRONTEND;
-	cstate->fe_msgbuf = makeStringInfo();
+
 	/* We *must* flush here to ensure FE knows it can send. */
 	pq_flush();
 }


Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Tom Lane
Andrew Dunstan  writes:
> On 6/17/21 1:45 PM, Tom Lane wrote:
>> OK.  One other point is that in HEAD, you only need the hunk that
>> prevents atthasmissing from becoming incorrectly set.

> Good point. Should I replace the relcache.c changes in HEAD with an
> Assert? Or just skip them altogether?

I wouldn't bother touching relcache.c.

regards, tom lane




Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Andrew Dunstan


On 6/17/21 1:45 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> revised patch attached.
> OK.  One other point is that in HEAD, you only need the hunk that
> prevents atthasmissing from becoming incorrectly set.  The hacks
> to cope with it being already wrong are only needed in the back
> branches.  Since we already forced initdb for beta2, there will
> not be any v14 installations in which pg_attribute contains
> a wrong value.
>
>   



Good point. Should I replace the relcache.c changes in HEAD with an
Assert? Or just skip them altogether?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
I wrote:
> (In any case, if someone does get excited about this, they
> could rearrange things to push the copyObject calls into the
> individual arms of the switch in ProcessUtility.  Personally
> though I doubt it could be worth the code bloat.)

It occurred to me to try making the copying code look like

if (readOnlyTree)
{
switch (nodeTag(parsetree))
{
case T_TransactionStmt:
/* stmt is immutable anyway, no need to copy */
break;
default:
pstmt = copyObject(pstmt);
parsetree = pstmt->utilityStmt;
break;
}
}

This didn't move the needle at all, in fact it seems maybe a
shade slower:

tps = 23502.288878 (without initial connection time)
tps = 23643.821923 (without initial connection time)
tps = 23082.976795 (without initial connection time)
tps = 23547.527641 (without initial connection time)

So I think this confirms my gut feeling that copyObject on a
TransactionStmt is negligible.  To the extent that the prior
measurement shows a real difference, it's probably a chance effect
of changing code layout elsewhere.

regards, tom lane




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 16:50:57 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
> >> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
> >> that would be a horrid crimp on our ability to fix bugs during beta.
> 
> > Sure, there's no promise. But I still think it's worth taking the amount
> > of breakage more into account than pre beta?
> 
> Are there really so many people using the ProcessUtility hook?
> In a quick look on codesearch.debian.net, I found
> 
> hypopg
> pgaudit
> pgextwlist
> pglogical

There do seem to be quite a few more outside of Debian. E.g.
https://github.com/search?p=2&q=ProcessUtility_hook&type=Code
shows quite a few.

Unfortunately github is annoying to search through - it doesn't seem to
have any logic to prevent duplicates across repositories :(. Which means
there's dozens of copies of postgres code included...


> which admittedly is more than none, but it's not a huge number
> either.  I have to think that fixing this bug reliably is a
> more important consideration.

Sure!

Greetings,

Andres Freund




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Fabien COELHO



Hello Thomas,

> I prepared a draft revert patch for discussion, just in case it comes
> in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).

I had a quick look.

I had forgotten that this patch also fixed the long-running brain-damaged
tps computation that has been bothering me for years, so that one sane
performance figure is now reported instead of two not-clear-to-interpret
take-your-pick figures.

It would be a real loss if this user-facing fix were removed in the
process:-(

--
Fabien.




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
Andres Freund  writes:
> On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
>> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
>> that would be a horrid crimp on our ability to fix bugs during beta.

> Sure, there's no promise. But I still think it's worth taking the amount
> of breakage more into account than pre beta?

Are there really so many people using the ProcessUtility hook?
In a quick look on codesearch.debian.net, I found

hypopg
pgaudit
pgextwlist
pglogical

which admittedly is more than none, but it's not a huge number
either.  I have to think that fixing this bug reliably is a
more important consideration.

regards, tom lane




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
Andres Freund  writes:
> On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
>> Although this adds some overhead in the form of copying of
>> utility node trees that won't actually mutate during execution,
>> I think that won't be too bad because those trees tend to be
>> small and hence cheap to copy.

> Have you evaluated the cost in some form? I don't think it a relevant
> cost for most utility statements, but there's a few exceptions that *do*
> worry me. In particular, in some workloads transaction statements are
> very frequent.

I hadn't, but since you mention it, I tried this test case:

$ cat trivial.sql 
begin;
commit;
$ pgbench -n -M prepared -f trivial.sql -T 60

I got these results on HEAD:
tps = 23853.244130 (without initial connection time)
tps = 23810.759969 (without initial connection time)
tps = 23167.608493 (without initial connection time)
tps = 23784.432746 (without initial connection time)

and adding the v2 patch:
tps = 23298.081147 (without initial connection time)
tps = 23614.466755 (without initial connection time)
tps = 23475.297853 (without initial connection time)
tps = 23530.826527 (without initial connection time)

So if you squint there might be a sub-one-percent difference
there, but it's barely distinguishable from the noise.  In
any situation where the transactions are doing actual work,
I doubt you could measure a difference.

(In any case, if someone does get excited about this, they
could rearrange things to push the copyObject calls into the
individual arms of the switch in ProcessUtility.  Personally
though I doubt it could be worth the code bloat.)

regards, tom lane




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Fabien COELHO


Hello Greg,

> I think the only thing you and I disagree on is that you see a "first
> issue in a corner case" where I see a process failure that is absolutely
> vital for me to improve.

Hmmm. I agree that improvements are needed, but for me there are simply a
few missing (removed) tap tests which should/could have caught these
issues, which are AFAICS limited to the untested area.

Given the speed of the process and the energy and patience needed to move
things forward, reverting means that the patch is probably dead for at
least a year, possibly an eon, and that is too bad because IMHO it was an
improvement (my eyes are watering when I see INSTR_TIME macros), so I'd
prefer a fix rather than a revert if it is possible, which in this case I
think it could be.

> Since the reality is that I might be the best positioned person

Good for you:-)

> to actually move said process forward in a meaningful long-term way, I
> have every intention of applying pressure to the area you're frustrated
> at.  Crunchy has a whole parallel review team to the community one now
> focused on what our corporate and government customers need for software
> process control and procedure compliance.  The primary business problem
> I'm working on now is how to include performance review in that mix.

This idea has been around for some time now. It is quite a task, and a
working and possibly extended pgbench is just one part of the overall
software, infrastructure and procedure needed to have that.

> I already know I need to re-engage with you over how I need min/max
> numbers in the aggregate logging output to accomplish some valuable
> goals.

I do try to review every patch submitted about pgbench. Feel free to fire!

> When I get around to that this summer, I'd really enjoy talking with you
> a bit, video call or something, about really any community topic you're
> frustrated with.

"frustrated" may be a strong word. I'm somehow annoyed, and unlikely to
ever submit many test improvements in the future.

>> There is no problem with proposing tests, the problem is that they are
>> accepted, or if they are accepted then that they are not removed at the
>> first small issue but rather fixed, or their limitations accepted,
>> because testing time-sensitive features is not as simple as testing
>> functional features.

> For 2020 Crunchy gave me a sort of sabbatical year to research community
> oriented benchmarking topics.  Having a self contained project in my home
> turned out to be the perfect way to spend *that* wreck of a year.

Yep.

> I made significant progress toward the idea of having a performance farm
> for PostgreSQL.  On my laptop today is a 14GB database with 1s resolution
> latency traces for 663 days of pgbench time running 4 workloads across a
> small bare metal farm of various operating systems and hardware classes.

Wow.

> I can answer questions like "how long does a typical SSD take to execute
> an INSERT commit?" across my farm with SQL.

So, what is the answer? :-)

> It's at the "works for me!" stage of development, and I thought this was
> the right time in the development cycle to start sharing improvement
> ideas from my work; thus the other submissions in progress I alluded to.

> The logging feature is in an intermediate spot where validating it
> requires light custom tooling that compares its output against known
> variables like the system time.

Sure.

> It doesn't quite have a performance component to it.

Hmmm, if you log all transactions it can become the performance
bottleneck quite quickly:-)

> Since this time logic detail is a well known portability minefield, I
> thought demanding that particular test was a pretty easy sell.

The test I recalled was removed at ad51c6f. Ok, it would not have caught
the issue about the timestamp (although it could have been improved to do
so), but it would have caught the trivial one about the catchup loop in
the aggregate interval generating too many lines because of a forgotten
conversion to µs.

--
Fabien.

Optionally automatically disable logical replication subscriptions on error

2021-06-17 Thread Mark Dilger
Hackers,

Logical replication apply workers for a subscription can easily get stuck in an 
infinite loop of attempting to apply a change, triggering an error (such as a 
constraint violation), exiting with an error written to the subscription worker 
log, and restarting.

As things currently stand, only superusers can create subscriptions.  Ongoing 
work to delegate superuser tasks to non-superusers creates the potential for 
even more errors to be triggered, specifically, errors where the apply worker 
does not have permission to make changes to the target table.

The attached patch makes it possible to create a subscription using a new 
subscription parameter, "disable_on_error", such that rather than going into an
infinite loop, the apply worker will catch errors and automatically disable the 
subscription, breaking the loop.  The new parameter defaults to false.  When 
false, the PG_TRY/PG_CATCH overhead is avoided, so for subscriptions which do 
not use the feature, there shouldn't be any change.  Users can manually clear 
the error after fixing the underlying issue with an ALTER SUBSCRIPTION .. 
ENABLE command. 
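
To make the shape of the change concrete, here is a minimal sketch of the
mechanism only (not the attached patch itself; the disableonerr field, the
DisableSubscription() helper, and the placement around the apply loop are
assumptions based on the current worker.c structure):

    /*
     * Hedged sketch: wrap the apply loop in PG_TRY/PG_CATCH only when the
     * subscription asked for disable_on_error, so the default path pays
     * no extra cost.  Names below are assumptions, not the patch's code.
     */
    if (MySubscription->disableonerr)
    {
        PG_TRY();
        {
            LogicalRepApplyLoop(origin_startpos);
        }
        PG_CATCH();
        {
            /* Log the error as usual, then break the restart loop. */
            EmitErrorReport();
            AbortOutOfAnyTransaction();
            FlushErrorState();

            /* The catalog update would need its own transaction. */
            DisableSubscription(MySubscription->oid);
            proc_exit(0);
        }
        PG_END_TRY();
    }
    else
        LogicalRepApplyLoop(origin_startpos);   /* no try/catch overhead */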
 
In addition to helping on production systems, this makes writing TAP tests 
involving error conditions simpler.  I originally ran into the motivation to 
write this patch when frustrated that TAP tests needed to parse the apply 
worker log file to determine whether permission failures were occurring and 
what they were.  It was also obnoxiously easy to have a test get stuck waiting 
for a permanently stuck subscription to catch up.  This helps with both issues.

I don't think this is quite ready for commit, but I'd like feedback if folks 
like this idea or want to suggest design changes.



v1-0001-Optionally-disabling-subscriptions-on-error.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Centralizing protective copying of utility statements

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Phew. Do we really want to break a quite significant number of
> > extensions this long after feature freeze? Since we already need to find
> > a backpatchable way to deal with the issue it seems like deferring the
> > API change to 15 might be prudent?
> 
> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
> that would be a horrid crimp on our ability to fix bugs during beta.

Sure, there's no promise. But I still think it's worth weighing the amount
of breakage more heavily than we would pre-beta?

Greetings,

Andres Freund




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 17, 2021 at 2:34 PM Tom Lane  wrote:
>> ...  Just because the
>> version-number approach offloads work from us doesn't make it a good
>> idea, because the work doesn't vanish; it will be dumped in the laps
>> of packagers and end users.

> What work? Including an additional #define in a header file doesn't
> create any work for packagers or end-users that I can see. If
> anything, it seems easier for end-users. If you want a function that
> first appears in v16, just test whether the version number is >= 16.

You're omitting the step of "figure out which version the feature you
want to use appeared in".  A few years down the road, that'll get
harder than it might seem to be for a shiny new feature.

As for the packagers, this creates a requirement to include the right
version of the right file in the right sub-package.  Admittedly, if
we hack things so that the #define appears directly in libpq-fe.h through
some configure magic, then there's nothing extra for packagers to get
right; but if we put it anywhere else, we're adding ways for them to
get it wrong.

> On the other hand if we promise to add at least one #define to that
> file for each new release,

New libpq API feature, not every new release.  I don't really see
that that's much harder than, say, bumping catversion.

> ... then somebody's got to be like, oh, let's
> see, this function was added in v16, now which #define got added in
> that release ... hmm, let me go diff the branches in git ... how is
> that any better?

I repeat that you are evaluating this through the lens of how much
work it is for us as opposed to other people, and I fundamentally
disagree with that being the primary metric.

regards, tom lane




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
Andres Freund  writes:
> Phew. Do we really want to break a quite significant number of
> extensions this long after feature freeze? Since we already need to find
> a backpatchable way to deal with the issue it seems like deferring the
> API change to 15 might be prudent?

Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.
I've generally supposed that we don't start expecting that till RC stage.

regards, tom lane




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 2:34 PM Tom Lane  wrote:
> Interesting, but then you have to explain why this is the first time
> that somebody has asked for a version number in libpq-fe.h.  Maybe
> all those previous additions were indeed minor enough that the
> problem didn't come up.  (Another likely possibility, perhaps, is
> that people have been misusing the server version for this purpose,
> and have been lucky enough to not have that approach fail for them.)

Well, I don't know for sure. I sometimes find it difficult to account
for the behavior even of the small number of people I know fairly
well, let alone the rather large number of people I've never even met.
But if I had to speculate ... I think one contributing factor is that
the number of people who write applications that use a C-language
connector to the database isn't terribly large, because most
application developers are going to use a higher-level language like
Java or Python or something. And of those that do, I would guess most
of them aren't trying to write applications that work across versions,
and so the problem doesn't arise. Now I know that personally, I have
tried to do that on a number of occasions, and I've accidentally used
functions that only existed in newer versions on, err, most of those
occasions. I chose to handle that problem by either (a) rewriting the
code to use only functions that appeared in all relevant versions of
libpq or (b) upgrading all the versions of libpq in my environment to
something new enough that it would work. If I'd run into a problem
that couldn't be handled in either of those ways, I likely would have
handled it by (c) depending on some symbol that actually indicates the
server version number, and demanding that anybody compiling my code
use a packaging system where those versions were the same. But none of
those workarounds seem like a real argument against having a version
indicator for libpq proper.

> Anyway, I do not see why we can't establish a principle going forward
> that new additions to libpq's API should involve at least one macro,
> so that they can be checked for with #ifdefs.  Just because the
> version-number approach offloads work from us doesn't make it a good
> idea, because the work doesn't vanish; it will be dumped in the laps
> of packagers and end users.

What work? Including an additional #define in a header file doesn't
create any work for packagers or end-users that I can see. If
anything, it seems easier for end-users. If you want a function that
first appears in v16, just test whether the version number is >= 16.
On the other hand if we promise to add at least one #define to that
file for each new release, then somebody's got to be like, oh, let's
see, this function was added in v16, now which #define got added in
that release ... hmm, let me go diff the branches in git ... how is
that any better? Especially because it seems really likely that we
will fail to actually follow this principle consistently, in which
case they may find that #define that they need doesn't even exist.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
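
Concretely, the version-number approach described here would let client code
do the following (LIBPQ_VERSION_NUM is a hypothetical name for illustration;
no such macro exists in libpq-fe.h as of this thread):

    #include <libpq-fe.h>

    /* LIBPQ_VERSION_NUM is an assumed name for the proposed macro. */
    #if defined(LIBPQ_VERSION_NUM) && LIBPQ_VERSION_NUM >= 140000
        /* PQenterPipelineMode() and friends are available */
    #else
        /* older libpq: stick to one-query-at-a-time extended protocol */
    #endif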




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Justin Pryzby
On Thu, Jun 17, 2021 at 08:15:42PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
> > Robert Haas  writes:
> >> I just went and looked at how exports.txt has evolved over the years.
> >> Since PostgreSQL 8.1, every release except for 9.4 and 11 added at
> >> least one new function to libpq. That means in 14 releases we've done
> >> something that might break someone's compile 12 times. Now maybe you
> >> want to try to argue that few of those changes are "major," but I
> >> don't know how that could be a principled argument. Every new function
> >> is something someone may want to use, and thus a potential compile
> >> break.
> >
> > Interesting, but then you have to explain why this is the first time
> > that somebody has asked for a version number in libpq-fe.h.  Maybe
> > all those previous additions were indeed minor enough that the
> > problem didn't come up.  (Another likely possibility, perhaps, is
> > that people have been misusing the server version for this purpose,
> > and have been lucky enough to not have that approach fail for them.)
> 
> FWIW, the perl DBD::Pg module extracts the version number from
> `pg_config --version` at build time, and uses that to define a

pygresql is also using pg_config --version:

setup.py-wanted = self.escaping_funcs
setup.py:supported = pg_version >= (9, 0)
--
setup.py-wanted = self.pqlib_info
setup.py:supported = pg_version >= (9, 1)
--
setup.py-wanted = self.ssl_info
setup.py:supported = pg_version >= (9, 5)
--
setup.py-wanted = self.memory_size
setup.py:supported = pg_version >= (12, 0)

-- 
Justin




Re: unnesting multirange data types

2021-06-17 Thread Alexander Korotkov
On Thu, Jun 17, 2021 at 7:54 PM Alexander Korotkov  wrote:
> On Wed, Jun 16, 2021 at 3:44 PM Alexander Korotkov  
> wrote:
> > On Tue, Jun 15, 2021 at 8:18 PM Tom Lane  wrote:
> > > Alexander Korotkov  writes:
> > > > I did run "check-world", but it passed for me.  Probably the same
> > > > reason it passed for some buildfarm animals...
> > >
> > > It looks to me like the proximate problem is that you should
> > > have taught pg_dump to skip these new auto-generated functions.
> >
> > Yes, it appears that pg_dump skips auto-generated functions, but
> > doesn't skip auto-generated casts.  It appears to be enough to tune
> > query getCasts() to resolve the issue.  The revised patch is attached.
>
> Here is the next revision of the patch: I've adjusted some comments.
>
> From my point of view this patch is not actually complex.  The reason
> it turned the buildfarm red is purely my fault: I messed up
> "make check-world" :(
>
> I've registered it on the commitfest application to make it go through
> commitfest.cputube.org.  My proposal is to re-push it once it goes
> through commitfest.cputube.org.

Patch successfully passed commitfest.cputube.org.  I'm going to
re-push it if there are no objections.

--
Regards,
Alexander Korotkov




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 13:03:29 -0400, Tom Lane wrote:
> Here's a v2 that does it like that.  In this formulation, we're
> basically hoisting the responsibility for doing copyObject up into
> ProcessUtility from its direct children, which seems like a clearer
> way of thinking about what has to change.
> 
> We could avoid the side-effects on users of ProcessUtility_hook by
> doing the copy step in ProcessUtility itself rather than passing the
> flag on to standard_ProcessUtility.  But that sounded like a bit of a
> kluge.  Also, putting the work in standard_ProcessUtility preserves
> the option to redistribute it into the individual switch arms, in case
> anyone does find the extra copying overhead annoying for statement
> types that don't need it.  (I don't plan to do any such thing as part
> of this bug-fix patch, though.)
> 
> Barring objections, I'm going to push this into HEAD fairly soon,
> since beta2 is hard upon us.  Still thinking about which way to
> fix it in the back branches.

Phew. Do we really want to break a quite significant number of
extensions this long after feature freeze? Since we already need to find
a backpatchable way to deal with the issue it seems like deferring the
API change to 15 might be prudent?

Greetings,

Andres Freund
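
For scale, the breakage in question is mechanical: each out-of-tree
ProcessUtility_hook gains one parameter and passes it through, exactly as the
pg_stat_statements and sepgsql hunks in the v2 patch do.  A sketch of the
extension-side change (my_ProcessUtility and prev_ProcessUtility are
hypothetical extension names):

    static void
    my_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
                      bool readOnlyTree,    /* new in the proposed API */
                      ProcessUtilityContext context, ParamListInfo params,
                      QueryEnvironment *queryEnv,
                      DestReceiver *dest, QueryCompletion *qc)
    {
        /* ... extension-specific work ... */
        if (prev_ProcessUtility)
            prev_ProcessUtility(pstmt, queryString, readOnlyTree,
                                context, params, queryEnv, dest, qc);
        else
            standard_ProcessUtility(pstmt, queryString, readOnlyTree,
                                    context, params, queryEnv, dest, qc);
    }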




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
> Although this adds some overhead in the form of copying of
> utility node trees that won't actually mutate during execution,
> I think that won't be too bad because those trees tend to be
> small and hence cheap to copy.  The statements that can have
> a lot of substructure usually contain expression trees or the
> like, which do have to be copied for safety.  Moreover, we buy
> back a lot of cost by removing pointless copying when we're
> not executing on a cached plan.

Have you evaluated the cost in some form? I don't think it a relevant
cost for most utility statements, but there's a few exceptions that *do*
worry me. In particular, in some workloads transaction statements are
very frequent.

Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 2:48 PM Andres Freund  wrote:
> Or do you mean that looking at the filesystem at all is bypassing shared
> buffers?

This is what I mean. I think we will end up in a better spot if we can
avoid doing that without creating too much ugliness elsewhere.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Robert Haas  writes:
>> I just went and looked at how exports.txt has evolved over the years.
>> Since PostgreSQL 8.1, every release except for 9.4 and 11 added at
>> least one new function to libpq. That means in 14 releases we've done
>> something that might break someone's compile 12 times. Now maybe you
>> want to try to argue that few of those changes are "major," but I
>> don't know how that could be a principled argument. Every new function
>> is something someone may want to use, and thus a potential compile
>> break.
>
> Interesting, but then you have to explain why this is the first time
> that somebody has asked for a version number in libpq-fe.h.  Maybe
> all those previous additions were indeed minor enough that the
> problem didn't come up.  (Another likely possibility, perhaps, is
> that people have been misusing the server version for this purpose,
> and have been lucky enough to not have that approach fail for them.)

FWIW, the perl DBD::Pg module extracts the version number from
`pg_config --version` at build time, and uses that to define a
PGLIBVERSION which is used to define fatal fallbacks for a few
functions:

https://metacpan.org/release/TURNSTEP/DBD-Pg-3.15.0/source/dbdimp.c#L26-55

I have an unfinished branch which does something similar for
PQsetSingleRowMode (added in 9.2).

- ilmari
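
The pattern, roughly: PGLIBVERSION is defined at build time from
`pg_config --version`, and missing libpq entry points get a stub that fails
at runtime instead of at link time (the stub body below is an illustrative
assumption rather than DBD::Pg's exact code):

    #if PGLIBVERSION < 90200
    /* libpq predates single-row mode: report failure instead of crashing */
    static int
    PQsetSingleRowMode(PGconn *conn)
    {
        (void) conn;
        return 0;           /* 0 means "could not enter single-row mode" */
    }
    #endif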




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 14:41:40 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not sure I understand why you think that exposing the version number
> > for libpq is such a bad idea?
> > I think it'd be reasonable to add a few more carefully chosen macros to
> > pg_config_ext.h.
> 
> The primary problem I've got with that is the risk of confusion
> between server and libpq version numbers.  In particular, if we do
> it like that then we've just totally screwed the Debian packagers.
> They will have to choose whether to install pg_config_ext.h from
> their server build or their libpq build.  Both choices are wrong,
> depending on what applications want to know.

That's a fair point.

However, we kind of already force them to do so - libpq already depends
on pg_config_ext.h, so they need to deal with the issue in some
form. It's not particularly likely to lead to a problem to have a
mismatching pg_config_ext.h, though, so maybe that's not too bad.

Our make install actually foresees the issue to some degree, and installs
pg_config_ext.h in two places, which then debian builds on:

$ apt-file search pg_config_ext.h
libpq-dev: /usr/include/postgresql/pg_config_ext.h
postgresql-server-dev-13: /usr/include/postgresql/13/server/pg_config_ext.h
postgresql-server-dev-14: /usr/include/postgresql/14/server/pg_config_ext.h

Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 14:22:52 -0400, Robert Haas wrote:
> On Thu, Jun 17, 2021 at 2:17 PM Andres Freund  wrote:
> > Adding a hacky special case implementation for cross-database relation
> > accesses that violates all kinds of assumptions (like holding a lock on
> > a relation when accessing it / pinning pages, processing relcache
> > invals, ...) doesn't seem like a good plan.
> 
> I agree that we don't want hacky code that violates assumptions, but
> bypassing shared_buffers is a bit hacky, too. Can't we lock the
> relations as we're copying them? We know pg_class's OID a fortiori,
> and we can find out all the other OIDs as we go.

We possibly can - but I'm not sure that won't end up violating some
other assumptions.


> I'm just thinking that the hackiness of going around shared_buffers
> feels irreducible, but maybe the hackiness in the patch is something
> that can be solved with more engineering.

Which bypassing of shared buffers are you talking about here? We'd still
have to solve a subset of the issues around locking (at least on the
source side), but I don't think we need to read pg_class contents to be
able to go through shared_buffers? As I suggested, we can use the init
fork presence to infer relpersistence?

Or do you mean that looking at the filesystem at all is bypassing shared
buffers?

Greetings,

Andres Freund
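
The init-fork inference mentioned above could look roughly like this (a
sketch, under the assumption that temp relations are never copied, so only
permanent vs. unlogged needs distinguishing):

    /* Sketch: an init fork exists only for unlogged relations. */
    static char
    infer_relpersistence(SMgrRelation reln)
    {
        if (smgrexists(reln, INIT_FORKNUM))
            return RELPERSISTENCE_UNLOGGED;
        return RELPERSISTENCE_PERMANENT;
    }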




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Tom Lane
Andres Freund  writes:
> I'm not sure I understand why you think that exposing the version number
> for libpq is such a bad idea?
> I think it'd be reasonable to add a few more carefully chosen macros to
> pg_config_ext.h.

The primary problem I've got with that is the risk of confusion
between server and libpq version numbers.  In particular, if we do
it like that then we've just totally screwed the Debian packagers.
They will have to choose whether to install pg_config_ext.h from
their server build or their libpq build.  Both choices are wrong,
depending on what applications want to know.

Now we could alternatively invent a libpq_version.h and hope that
packagers remember to install the right version of that.  But I
think it's a better user experience all around to do it the other
way.

regards, tom lane




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Tom Lane
Robert Haas  writes:
> I just went and looked at how exports.txt has evolved over the years.
> Since PostgreSQL 8.1, every release except for 9.4 and 11 added at
> least one new function to libpq. That means in 14 releases we've done
> something that might break someone's compile 12 times. Now maybe you
> want to try to argue that few of those changes are "major," but I
> don't know how that could be a principled argument. Every new function
> is something someone may want to use, and thus a potential compile
> break.

Interesting, but then you have to explain why this is the first time
that somebody has asked for a version number in libpq-fe.h.  Maybe
all those previous additions were indeed minor enough that the
problem didn't come up.  (Another likely possibility, perhaps, is
that people have been misusing the server version for this purpose,
and have been lucky enough to not have that approach fail for them.)

Anyway, I do not see why we can't establish a principle going forward
that new additions to libpq's API should involve at least one macro,
so that they can be checked for with #ifdefs.  Just because the
version-number approach offloads work from us doesn't make it a good
idea, because the work doesn't vanish; it will be dumped in the laps
of packagers and end users.

BTW, by that principle, we should likely be adding a symbol
associated with the new tracing features, as well as one for
pipelining.  Or is it good enough to tell people they can
check "#ifdef PQTRACE_SUPPRESS_TIMESTAMPS" ?

regards, tom lane




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Andres Freund
Hi,

On 2021-06-17 13:16:17 -0400, Tom Lane wrote:
> > Then again, why would pg_config.h be absent?
> 
> Likely because somebody decided it was a server-side include rather
> than an application-side include.

Which is the right call - pg_config.h can't easily be included in
applications that themselves use autoconf. Most problematically it
defines all the standard autotools PACKAGE_* macros that are guaranteed
to conflict in any autotools using project. There's obviously also a lot
of other defines in there that quite possibly could conflict.
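
The clash is easy to reproduce in any autoconf-using application
(illustration only; config.h stands for the application's own autoheader
output):

    #include "config.h"     /* app's autoheader output: PACKAGE_NAME, ... */
    #include "pg_config.h"  /* redefines PACKAGE_NAME, PACKAGE_VERSION, ... */
    /* result: macro redefinition warnings, or hard errors with -Werror */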

We should probably split pg_config.h at some point. Even for extensions it can
be annoying because pg_config.h is always included in server code, which
means that the extension can't easily include an autoheader style header
itself.


I'm not sure I understand why you think that exposing the version number
for libpq is such a bad idea?


I think it'd be reasonable to add a few more carefully chosen macros to
pg_config_ext.h.

Greetings,

Andres Freund




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 1:16 PM Tom Lane  wrote:
> We don't really add major new APIs to libpq very often, so I don't
> find that too compelling.  I do find it compelling that user code
> shouldn't embed knowledge about "feature X arrived in version Y".

I just went and looked at how exports.txt has evolved over the years.
Since PostgreSQL 8.1, every release except for 9.4 and 11 added at
least one new function to libpq. That means in 14 releases we've done
something that might break someone's compile 12 times. Now maybe you
want to try to argue that few of those changes are "major," but I
don't know how that could be a principled argument. Every new function
is something someone may want to use, and thus a potential compile
break.

Some of those releases also changed behavior. For example, version 10
allowed multi-host connection strings and URLs. People might want to
know about that sort of thing, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Daniel Gustafsson
> On 17 Jun 2021, at 19:16, Tom Lane  wrote:

> A more critical point is that if pg_config is present, it'll likely
> contain the server version, which might not have a lot to do with the
> libpq version.  Debian's already shipping things in a way that decouples
> those, and I gather Red Hat is moving in that direction too.
> 
> I think what people really want to know is "if I try to call
> PQenterPipelineMode, will that compile?".

I think this is the most compelling argument for feature-based gating rather
than version-based.  +1 on doing "#ifdef LIBPQ_HAS_PIPELINING" or along
those lines, trying to be consistent going forward.  If we've truly failed to
do so in X releases time, then we can revisit this.

--
Daniel Gustafsson   https://vmware.com/
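
Client-side, the feature-gated code would read like this
(LIBPQ_HAS_PIPELINING is the proposal in this thread, not something
libpq-fe.h defines yet):

    #ifdef LIBPQ_HAS_PIPELINING
        if (PQenterPipelineMode(conn) == 1)
        {
            /* queue several queries, then PQpipelineSync(conn) */
        }
    #else
        /* built against an older libpq: issue queries one at a time */
    #endif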





Re: Centralizing protective copying of utility statements

2021-06-17 Thread Julien Rouhaud
On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> 
> Here's a v2 that does it like that.  In this formulation, we're
> basically hoisting the responsibility for doing copyObject up into
> ProcessUtility from its direct children, which seems like a clearer
> way of thinking about what has to change.

I agree that forcing an API break is better.  Just a nit:

+ * readOnlyTree: treat pstmt's node tree as read-only

Maybe it's because I'm not a native English speaker, or because it's quite
late here, but I don't find "treat as read-only" really clear.  I don't have a
better concise wording to suggest.

> Still thinking about which way to fix it in the back branches.

I'm +0.5 for a narrow fix, given the ratio between the risk of unspotted
similar problems and the risk of a performance regression.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 5:20 AM Heikki Linnakangas  wrote:
> You only need relpersistence if you want to use the buffer cache, right?
> I think that's a good argument for not using it.

I think the root of the problem with this feature is that it doesn't
go through shared_buffers, so in my opinion, it would be better if we
can make it all go through shared_buffers. It seems like you're
advocating a middle ground where half of the operation goes through
shared_buffers and the other half doesn't, but that sounds like
getting rid of half of the hack when we could have gotten rid of all
of it. I think things that don't go through shared_buffers are bad,
and we should be making an effort to get rid of them where we can
reasonably do so. I believe I've both introduced and fixed my share of
bugs that were caused by such cases, and I think the behavior of the
whole system would be a lot easier to reason about if we had fewer of
those, or none.

I can also think of at least one significant advantage of driving this
off the remote database's pg_class rather than the filesystem
contents. It's a known defect of PostgreSQL that if you create a table
and then crash, you leave behind a dead file that never gets removed.
If you now copy the database that contains that orphaned file, you
would ideally prefer not to copy that file, but if you do a copy based
on the filesystem contents, then you will. If you drive the copy off
of pg_class, you won't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Tom Lane
Andrew Dunstan  writes:
> revised patch attached.

OK.  One other point is that in HEAD, you only need the hunk that
prevents atthasmissing from becoming incorrectly set.  The hacks
to cope with it being already wrong are only needed in the back
branches.  Since we already forced initdb for beta2, there will
not be any v14 installations in which pg_attribute contains
a wrong value.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Robert Haas
On Wed, Jun 16, 2021 at 6:13 PM Andres Freund  wrote:
> I don't think the main issue is the speed of checkpointing itself? The reason
> to maintain the old paths is that the "new approach" is bloating WAL volume,
> no? Right now cloning a 1TB database costs a few hundred bytes of WAL and 
> about
> 1TB of write IO. With the proposed approach, the write volume approximately
> doubles, because there'll also be about 1TB in WAL.

This is a good point, but on the other hand, I think this smells a lot
like the wal_level=minimal optimization where we don't need to log
data being bulk-loaded into a table created in the same transaction if
wal_level=minimal. In theory, that optimization has a lot of value,
but in practice it gets a lot of bad press on this list, because (1)
sometimes doing the fsync is more expensive than writing the extra WAL
would have been and (2) most people want to run with
wal_level=replica/logical so it ends up being a code path that isn't
used much and is therefore more likely than average to have bugs
nobody's terribly interested in fixing (except Noah ... thanks Noah!).
If we add features in the future, like TDE or perhaps incremental
backup, that rely on new pages getting new LSNs instead of recycled
ones, this may turn into the same kind of wart. And as with that
optimization, you're probably not even better off unless the database
is pretty big, and you might be worse off if you have to do fsyncs or
flush buffers synchronously. I'm not severely opposed to keeping both
methods around, so if that's really what people want to do, OK, but I
guess I wonder whether we're really going to be happy with that
decision down the road.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 17, 2021 at 9:34 AM Tom Lane  wrote:
>> I think putting a version number as such in there is a truly
>> horrid idea.  However, I could get behind adding a boolean flag
>> that says specifically whether the pipeline feature exists.

> I realize that this kind of feature-based testing is generally
> considered a best practice, but the problem is we're unlikely to do it
> consistently. If we put a version number in there, people will be able
> to test for whatever they want.

We don't really add major new APIs to libpq very often, so I don't
find that too compelling.  I do find it compelling that user code
shouldn't embed knowledge about "feature X arrived in version Y".

> Then again, why would pg_config.h be absent?

Likely because somebody decided it was a server-side include rather
than an application-side include.

A more critical point is that if pg_config is present, it'll likely
contain the server version, which might not have a lot to do with the
libpq version.  Debian's already shipping things in a way that decouples
those, and I gather Red Hat is moving in that direction too.

I think what people really want to know is "if I try to call
PQenterPipelineMode, will that compile?".  Comparing v13 and v14
libpq-fe.h, I see that there is a solution available now:
"#ifdef PQ_QUERY_PARAM_MAX_LIMIT".  But depending on that seems
like a bit of a hack, because I'm not sure that it's directly tied
to the pipelining feature.

regards, tom lane




Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
I wrote:
> What seems like a much safer answer is to make the API change noisy.
> That is, move the responsibility for actually calling copyObject
> into ProcessUtility itself, and add a bool parameter saying whether
> that needs to be done.  That forces all callers to consider the
> issue, and gives them the tool they need to make the right thing
> happen.

Here's a v2 that does it like that.  In this formulation, we're
basically hoisting the responsibility for doing copyObject up into
ProcessUtility from its direct children, which seems like a clearer
way of thinking about what has to change.

We could avoid the side-effects on users of ProcessUtility_hook by
doing the copy step in ProcessUtility itself rather than passing the
flag on to standard_ProcessUtility.  But that sounded like a bit of a
kluge.  Also, putting the work in standard_ProcessUtility preserves
the option to redistribute it into the individual switch arms, in case
anyone does find the extra copying overhead annoying for statement
types that don't need it.  (I don't plan to do any such thing as part
of this bug-fix patch, though.)

Barring objections, I'm going to push this into HEAD fairly soon,
since beta2 is hard upon us.  Still thinking about which way to
fix it in the back branches.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 09433c8c96..07fe0e7cda 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -320,6 +320,7 @@ static void pgss_ExecutorRun(QueryDesc *queryDesc,
 static void pgss_ExecutorFinish(QueryDesc *queryDesc);
 static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+bool readOnlyTree,
 ProcessUtilityContext context, ParamListInfo params,
 QueryEnvironment *queryEnv,
 DestReceiver *dest, QueryCompletion *qc);
@@ -1069,6 +1070,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
  */
 static void
 pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+	bool readOnlyTree,
 	ProcessUtilityContext context,
 	ParamListInfo params, QueryEnvironment *queryEnv,
 	DestReceiver *dest, QueryCompletion *qc)
@@ -1126,11 +1128,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 		PG_TRY();
 		{
 			if (prev_ProcessUtility)
-prev_ProcessUtility(pstmt, queryString,
+prev_ProcessUtility(pstmt, queryString, readOnlyTree,
 	context, params, queryEnv,
 	dest, qc);
 			else
-standard_ProcessUtility(pstmt, queryString,
+standard_ProcessUtility(pstmt, queryString, readOnlyTree,
 		context, params, queryEnv,
 		dest, qc);
 		}
@@ -1176,11 +1178,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
 	else
 	{
 		if (prev_ProcessUtility)
-			prev_ProcessUtility(pstmt, queryString,
+			prev_ProcessUtility(pstmt, queryString, readOnlyTree,
 context, params, queryEnv,
 dest, qc);
 		else
-			standard_ProcessUtility(pstmt, queryString,
+			standard_ProcessUtility(pstmt, queryString, readOnlyTree,
 	context, params, queryEnv,
 	dest, qc);
 	}
diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c
index 34de6158d6..19a3ffb7ff 100644
--- a/contrib/sepgsql/hooks.c
+++ b/contrib/sepgsql/hooks.c
@@ -313,6 +313,7 @@ sepgsql_exec_check_perms(List *rangeTabls, bool abort)
 static void
 sepgsql_utility_command(PlannedStmt *pstmt,
 		const char *queryString,
+		bool readOnlyTree,
 		ProcessUtilityContext context,
 		ParamListInfo params,
 		QueryEnvironment *queryEnv,
@@ -378,11 +379,11 @@ sepgsql_utility_command(PlannedStmt *pstmt,
 		}
 
 		if (next_ProcessUtility_hook)
-			(*next_ProcessUtility_hook) (pstmt, queryString,
+			(*next_ProcessUtility_hook) (pstmt, queryString, readOnlyTree,
 		 context, params, queryEnv,
 		 dest, qc);
 		else
-			standard_ProcessUtility(pstmt, queryString,
+			standard_ProcessUtility(pstmt, queryString, readOnlyTree,
 	context, params, queryEnv,
 	dest, qc);
 	}
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 89a4f8f810..b6eacd5baa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -438,14 +438,8 @@ BeginCopyTo(ParseState *pstate,
 		/*
 		 * Run parse analysis and rewrite.  Note this also acquires sufficient
 		 * locks on the source table(s).
-		 *
-		 * Because the parser and planner tend to scribble on their input, we
-		 * make a preliminary copy of the source querytree.  This prevents
-		 * problems in the case that the COPY is in a portal or plpgsql
-		 * function and is executed repeatedly.  (See also the same hack in
-		 * DECLARE CURSOR and PREPARE.)  XXX FIXME someday.
 		 */
-		rewritten = pg_analyze_and_rewrite(copyObject(raw_query),
+		rewritten = 

Re: Add version macro to libpq-fe.h

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 9:34 AM Tom Lane  wrote:
> I think putting a version number as such in there is a truly
> horrid idea.  However, I could get behind adding a boolean flag
> that says specifically whether the pipeline feature exists.
> Then you'd do something like
>
> #ifdef LIBPQ_HAS_PIPELINING
>
> rather than embedding knowledge of exactly which release
> added that.

I realize that this kind of feature-based testing is generally
considered a best practice, but the problem is we're unlikely to do it
consistently. If we put a version number in there, people will be able
to test for whatever they want.

Then again, why would pg_config.h be absent?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Unresolved repliaction hang and stop problem.

2021-06-17 Thread Alvaro Herrera
On 2021-Jun-17, Kyotaro Horiguchi wrote:

> I don't see a call to hash_*seq*_search there. Instead, I see one in
> ReorderBufferCheckMemoryLimit().

Doh, of course -- I misread.

ReorderBufferCheckMemoryLimit is new in pg13 (cec2edfa7859) so now at
least we have a reason why this workload regresses in pg13 compared to
earlier releases.

Looking at the code, it does seem that increasing the memory limit as
Amit suggests might solve the issue.  Is that a practical workaround?

-- 
Álvaro Herrera   Valdivia, Chile




Re: unnesting multirange data types

2021-06-17 Thread Alexander Korotkov
On Wed, Jun 16, 2021 at 3:44 PM Alexander Korotkov  wrote:
> On Tue, Jun 15, 2021 at 8:18 PM Tom Lane  wrote:
> > Alexander Korotkov  writes:
> > > I did run "check-world", but it passed for me.  Probably the same
> > > reason it passed for some buildfarm animals...
> >
> > It looks to me like the proximate problem is that you should
> > have taught pg_dump to skip these new auto-generated functions.
>
> Yes, it appears that pg_dump skips auto-generated functions, but
> doesn't skip auto-generated casts.  It appears to be enough to tune
> query getCasts() to resolve the issue.  The revised patch is attached.

Here is the next revision of the patch: I've adjusted some comments.

From my point of view this patch is not actually complex.  The reason
it turned the buildfarm red is purely my fault: I messed up
"make check-world" :(

I've registered it on the commitfest application to make it go through
commitfest.cputube.org.  My proposal is to re-push it once it goes
through commitfest.cputube.org.

--
Regards,
Alexander Korotkov


multirange_unnest_cast_to_array-v6.patch
Description: Binary data


Re: Replication protocol doc fix

2021-06-17 Thread Robert Haas
On Wed, Jun 16, 2021 at 5:15 PM Jeff Davis  wrote:
> * A normal command, where you know that you've sent everything that you
> will send. In this case, the client needs to send the Sync message in
> order to get the ReadyForQuery message.
>
> * A command that initiates CopyIn/CopyBoth, where you are going to send
> more data after the command. In this case, sending the Sync eagerly is
> wrong, and you can't pipeline more queries in the middle of
> CopyIn/CopyBoth mode. Instead, the client should send Sync after
> receiving an ErrorResponse, or after sending a CopyDone/CopyFail
> (right?).

Well, that's one view of it. I would argue that the protocol ought not
to be designed in such a way that the client has to guess what
response the server might send back. How is it supposed to know? If
the user says, hey, go run this via the extended query protocol, we
don't want libpq to have to try to parse the query text and figure out
whether it looks COPY-ish. That's expensive, hacky, and might create
cross-version compatibility hazards if, say, a new replication command
that uses the copy protocol is added. Nor do we want the user to have
to specify what it thinks the server is going to do. Right now, we
have this odd situation where the client indeed does not try to guess
what the server will do and always send Sync, but the server acts as
if the client is doing what you propose here - only sending the
CopyDone/CopyFail at the end of everything associated with the
command.

> One thing I don't fully understand is what would happen if the client
> issued the Sync as the *first* message in an extended-protocol series.

I don't think that will break anything, because I think you can send a
Sync message to try to reestablish protocol synchronization whenever
you want. But I don't think it will accomplish anything either,
because presumably you've already got protocol synchronization at the
beginning of the sequence. The tricky part is getting resynchronized
after you've done some stuff.

> I attached a doc patch that hopefully clarifies this point as well as
> the weirdness around CopyIn/CopyBoth and the extended protocol. I
> reorganized the sections, as well.

On a casual read-through this seems pretty reasonable, but it
essentially documents that libpq is doing the wrong thing by sending
Sync unconditionally. As I say above, I disagree with that from a
philosophical perspective. Then again, unless we're willing to
redefine the wire protocol, I don't have an alternative to offer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Fix for segfault in logical replication on master

2021-06-17 Thread Mark Dilger


> On Jun 17, 2021, at 6:40 AM, Amit Kapila  wrote:
> 
> I think such a problem won't happen because we are using historic
> snapshots in this context. We rely on that in a similar way in
> reorderbuffer.c, see ReorderBufferProcessTXN.

I think you are right, but that's the part I have trouble fully convincing 
myself is safe.  We certainly have an historic snapshot when we call 
RelationGetIndexList, but that has an early exit if the relation already has 
fields set, and we don't know if those fields were set before or after the 
historic snapshot was taken.  Within the context of the pluggable 
infrastructure, I think we're safe.  The only caller of 
RelationGetIdentityKeyBitmap() in core is logicalrep_write_attrs(), which is 
only called by logicalrep_write_rel(), which is only called by 
send_relation_and_attrs(), which is only called by maybe_send_schema(), which 
is called by pgoutput_change() and pgoutput_truncate(), both being callbacks in 
core's logical replication plugin.

ReorderBufferProcessTXN calls SetupHistoricSnapshot before opening the relation 
and then calling ReorderBufferApplyChange to invoke the plugin on that opened 
relation, so the relation's fields could not have been setup before the 
snapshot was taken.  Any other plugin would similarly get invoked after that 
same logic, so they'd be fine, too.  The problem would only be if somebody 
called RelationGetIdentityKeyBitmap() or one of its calling functions from 
outside that infrastructure.  Is that worth worrying about?  The function 
comments for those mention having an historic snapshot, and the Assert will 
catch if code doesn't have one, but I wonder how much of a trap for the unwary 
that is, considering that somebody might open the relation and lookup indexes 
for the relation before taking an historic snapshot and calling these functions.

I thought it was cheap enough to check that the relation we open is an index, 
because if it is not, we'll segfault when accessing fields of the 
relation->rd_index struct.  I wouldn't necessarily advocate doing any really 
expensive checks here, but a quick sanity check seemed worth the effort.  If 
you don't want to commit that part, I'm not going to put up a huge fuss.

Since neither of you knew why I was performing that check, it is clear that my 
code comment was insufficient.  I have added a more detailed code comment to 
explain the purpose of the check.  I also changed the first check to use 
RelationIsValid(), as suggested upthread.
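
For reference, the check described above amounts to something like this
inside RelationGetIdentityKeyBitmap(), assuming the caller holds a suitable
lock (a sketch; the exact messages and placement in the attached patch may
differ):

    Relation    indexDesc = RelationIdGetRelation(indexoid);

    if (!RelationIsValid(indexDesc))
        elog(ERROR, "could not open relation with OID %u", indexoid);
    /* Guard against touching rd_index on something that isn't an index. */
    if (indexDesc->rd_rel->relkind != RELKIND_INDEX)
        elog(ERROR, "relation %u is not an index", indexoid);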



v2-0001-Fixing-bug-in-logical-replication.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Andrew Dunstan

On 6/17/21 11:13 AM, Andrew Dunstan wrote:
> On 6/17/21 11:05 AM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Here's a patch I propose to apply to fix this bug (See
>>> )
>> If I'm reading the code correctly, your change in RelationBuildTupleDesc
>> is scribbling directly on the disk buffer, which is surely not okay.
>> I don't understand why you need that at all given the other defenses
>> you added ... but if you need it, you have to modify the tuple AFTER
>> copying it.
>
> OK, will fix. I think we do need it (See Andres' comment in the bug
> thread). It should be a fairly simple fix.
>
>


revised patch attached.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

>From 06b073a01da5f3e69417ffb88fea1320c0e0a08b Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Thu, 17 Jun 2021 11:49:33 -0400
Subject: [PATCH] Don't set a fast default for anything but a plain table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The fast default code added in Release 11 omitted to check that the
table a fast default was being added to was a plain table. Thus one
could be added to a foreign table, which predictably blows up. Here we
perform that check, and in addition if we encounter a missing value for
an attribute of something other than a plain table we ignore it.

Fixes bug #17056

Backpatch to release 11.

Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane
---
 src/backend/catalog/heap.c | 10 +-
 src/backend/commands/tablecmds.c   |  5 +++--
 src/backend/utils/cache/relcache.c | 19 ++-
 src/test/regress/expected/fast_default.out | 19 +++
 src/test/regress/sql/fast_default.sql  | 14 ++
 5 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index afa830d924..1293dc04ca 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2161,6 +2161,13 @@ SetAttrMissing(Oid relid, char *attname, char *value)
 	/* lock the table the attribute belongs to */
 	tablerel = table_open(relid, AccessExclusiveLock);
 
+	/* Don't do anything unless it's a plain table */
+	if (tablerel->rd_rel->relkind != RELKIND_RELATION)
+	{
+		table_close(tablerel, AccessExclusiveLock);
+		return;
+	}
+
 	/* Lock the attribute row and get the data */
 	attrrel = table_open(AttributeRelationId, RowExclusiveLock);
 	atttup = SearchSysCacheAttName(relid, attname);
@@ -2287,7 +2294,8 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
 		valuesAtt[Anum_pg_attribute_atthasdef - 1] = true;
 		replacesAtt[Anum_pg_attribute_atthasdef - 1] = true;
 
-		if (add_column_mode && !attgenerated)
+		if (rel->rd_rel->relkind == RELKIND_RELATION  && add_column_mode &&
+			!attgenerated)
 		{
 			expr2 = expression_planner(expr2);
 			estate = CreateExecutorState();
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..2eef07e452 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12215,9 +12215,10 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	/*
 	 * Here we go --- change the recorded column type and collation.  (Note
 	 * heapTup is a copy of the syscache entry, so okay to scribble on.) First
-	 * fix up the missing value if any.
+	 * fix up the missing value if any. There shouldn't be any missing values
+	 * for anything except plain tables, but if there are, ignore them.
 	 */
-	if (attTup->atthasmissing)
+	if (rel->rd_rel->relkind == RELKIND_RELATION  && attTup->atthasmissing)
 	{
 		Datum		missingval;
 		bool		missingNull;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index fd05615e76..446cfcc4e8 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -551,6 +551,7 @@ RelationBuildTupleDesc(Relation relation)
 	{
 		Form_pg_attribute attp;
 		int			attnum;
+		bool		atthasmissing;
 
 		attp = (Form_pg_attribute) GETSTRUCT(pg_attribute_tuple);
 
@@ -563,6 +564,22 @@ RelationBuildTupleDesc(Relation relation)
 			   attp,
 			   ATTRIBUTE_FIXED_PART_SIZE);
 
+		/*
+		 * Fix atthasmissing flag - it's only for plain tables. Others
+		 * should not have missing values set, but there may be some left
+		 * over from before this check was in place, so this code
+		 * defensively ignores such values.
+		 */
+		atthasmissing = attp->atthasmissing;
+		if (relation->rd_rel->relkind != RELKIND_RELATION && atthasmissing)
+		{
+			Form_pg_attribute nattp;
+
+			atthasmissing = false;
+			nattp = TupleDescAttr(relation->rd_att, attnum - 1);
+			nattp->atthasmissing = false;
+		}
+
 		/* Update constraint/default info */
 		if (attp->attnotnull)
 			constr->has_not_null = true;
@@ -572,7 +589,7 @@ RelationBuildTupleDesc(Relation relation)
 	

Re: pgbench logging broken by time logic changes

2021-06-17 Thread Fabien COELHO




> Is there an identified issue beyond the concrete example Greg gave of
> the timestamps?

AFAICS, there is a patch which fixes all known issues linked to pgbench
logging. Other issues may well exist, but the "broken" area was quite
specific. There are also some TAP tests on pgbench which do catch issues.


--
Fabien.




Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Stephen Frost
Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jun 16, 2021 at 12:06 PM Andres Freund  wrote:
> > > I would think that it wouldn't really matter inside VACUUM -- it would
> > > only really need to be either an opportunistic pruning or an
> > > opportunistic index deletion thing -- probably both. Most of the time
> > > VACUUM doesn't seem to end up doing most of the work of removing
> > > garbage versions. It's mostly useful for "floating garbage", to use
> > > the proper GC memory management term.
> >
> > I don't fully agree with this. For one, there are workloads where VACUUM
> > removes the bulk of the dead tuples.
> 
> It's definitely much more important that VACUUM run often when non-HOT
> updates are the norm, and there are lots of them. But that's probably
> not going to be helped all that much by this technique anyway.

I don't follow this argument.  Surely there are many, many cases out
there where there are very few HOT updates but lots of non-HOT updates
which create lots of dead rows that can't currently be cleaned up if
there's a long running transaction hanging around.

> Mostly I'm just saying I'd start elsewhere and do heapam later. And
> probably do VACUUM itself last of all, if that usefully cut scope.

Not quite following what 'elsewhere' means here or what it would entail
if it involves cleaning up dead tuples but doesn't involve heapam.  I
can sort of follow the idea of working on the routine page-level cleanup
of tuples rather than VACUUM, except that would seem to require one to
deal with the complexities of ctid chains discussed below and therefore
be a larger and more complicated effort than if one were to tackle
VACUUM and perhaps in the first round cut scope by explicitly ignoring
ctid chains.

> > For another, slowing down VACUUM
> > can cause a slew of follow-on problems, so being careful to not
> > introduce new bottlenecks is important. And I don't think just doing
> > this optimization as part of on-access pruning is reasonable
> > solution. And it's not like making on-access pruning slower is
> > unproblematic either.

I don't know that slowing down VACUUM, which already goes purposefully
slow by default when run from autovacuum, really needs to be stressed
over, particularly when what we're talking about here are CPU cycles.  I
do think it'd make sense to have a heuristic which decides if we're
going to put in the effort to try to do this kind of pruning.  That is,
if the global Xmin and the current transaction are only a few thousand
apart or something along those lines then don't bother, but if there's
been 100s of thousands of transactions then enable it (perhaps allowing
control over this or allowing users to explicitly ask VACUUM to 'work
harder' or such).
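
A rough sketch of such a heuristic (the threshold and names are
hypothetical; the subtraction uses the usual modular-arithmetic trick so it
stays wraparound-safe):

    #define INTERMEDIATE_PRUNE_XID_THRESHOLD 100000     /* hypothetical */

    static bool
    worth_intermediate_pruning(TransactionId next_xid,
                               TransactionId global_xmin)
    {
        /* signed 32-bit difference handles XID wraparound correctly */
        int32   distance = (int32) (next_xid - global_xmin);

        return distance > INTERMEDIATE_PRUNE_XID_THRESHOLD;
    }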

> I think that on-access pruning is much more important because it's the
> only hope we have of keeping the original heap page intact, in the
> sense that there are no non-HOT updates over time, though there may be
> many HOT updates. And no LP_DEAD items ever accumulate. It's not so
> much about cleaning up bloat as it is about *preserving* the heap
> pages in this sense.
> 
> If in the long run it's impossible to keep the page intact in this
> sense then we will still have most of our current problems. It might
> not make that much practical difference if we simply delay the problem
> -- we kinda have to prevent it entirely, at least for a given
> workload. So I'm mostly concerned about keeping things stable over
> time, at the level of individual pages.

I do think that's a worthwhile goal, but if we could get some kind of
cleanup happening, that strikes me as better than the nothing that we
have today.  Which side makes sense to tackle first is certainly a
discussion that could be had but I'd go for "do the simple thing first".

> > But as I said nearby, I think the hardest part is figuring out how to
> > deal with ctid chains, not the efficiency of the xid->visibility lookup
> > (or the collection of data necessary for that lookup).
> 
> Definitely true.

It strikes me that stressing over ctid chains, while certainly something
to consider, at this point is putting the cart before the horse in this
discussion- there's not much sense in it if we haven't actually got the
data collection piece figured out and working (and hopefully in a manner
that minimizes the overhead from it) and then worked out the logic to
figure out if a given tuple is actually visible to any running
transaction.  As I say above, it seems like it'd be a great win even if
it was initially only able to deal with 'routine'/non-chained cases and
only with VACUUM.

The kind of queue tables that I'm thinking of, at least, are ones like
what PgQ uses: https://github.com/pgq/pgq

Now, that already works around what we lack here by using TRUNCATE and
table rotation, but if we improved here then it'd potentially be able to
be rewritten to use routine DELETE's instead of TRUNCATE.  Even the
UPDATEs which are done to process a batch for a 

Re: Centralizing protective copying of utility statements

2021-06-17 Thread Tom Lane
I wrote:
> What I found is that there are only two places that call
> ProcessUtility with a statement that might be coming from the plan
> cache.  _SPI_execute_plan is always doing so, so it can just
> unconditionally copy the statement.  The other one is
> PortalRunUtility, which can examine the Portal to see if the
> parsetree came out of cache or not.  Having added copyObject
> calls there, we can get rid of the retail calls that exist
> in not-quite-enough utility statement execution routines.

In the light of morning, I'm not too pleased with this patch either.
It's essentially creating a silent API change for ProcessUtility.
I don't know whether there are any out-of-tree ProcessUtility
callers, but if there are, this could easily break them in a way
that basic testing might not catch.

What seems like a much safer answer is to make the API change noisy.
That is, move the responsibility for actually calling copyObject
into ProcessUtility itself, and add a bool parameter saying whether
that needs to be done.  That forces all callers to consider the
issue, and gives them the tool they need to make the right thing
happen.

However, this clearly is not a back-patchable approach.  I'm
thinking there are two plausible alternatives for the back branches:

1. Narrow fix as per my original patch for #17053.  Low cost,
but no protection against other bugs of the same ilk.

2. Still move the responsibility for calling copyObject into
ProcessUtility; but lacking the bool parameter, just do it
unconditionally.  Offers solid protection at some uncertain
performance cost.

I'm not terribly certain which way to go.  Thoughts?

regards, tom lane




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Gregory Smith
On Wed, Jun 16, 2021 at 2:59 PM Fabien COELHO  wrote:

> I'm unhappy because I already added tap tests for time-sensitive features
> (-T and others, maybe logging aggregates, cannot remember), which have
> been removed because they could fail under some circonstances (eg very
> very very very slow hosts), or required some special handling (a few lines
> of code) in pgbench, and the net result of this is there is not a single
> test in place for some features:-(
>

I understand your struggle and I hope I was clear about two things:

-I am excited by all the progress made in pgbench, and this problem is an
integration loose end rather than a developer failure at any level.
-Doing better in this messy area takes a team that goes from development to
release management, and I had no right to complain unless I brought
resources to improve in the specific areas of the process that I want to be
better.

I think the only thing you and I disagree on is that you see a "first issue
in a corner case" where I see a process failure that is absolutely vital
for me to improve.  Since the reality is that I might be the best
positioned person to actually move said process forward in a meaningful
long-term way, I have every intention of applying pressure to the area
you're frustrated at.  Crunchy has a whole parallel review team to the
community one now focused on what our corporate and government customers
need for software process control and procedure compliance.  The primary
business problem I'm working on now is how to include performance review in
that mix.

I already know I need to re-engage with you over how I need min/max numbers
in the aggregate logging output to accomplish some valuable goals.  When I
get around to that this summer, I'd really enjoy talking with you a bit,
video call or something, about really any community topic you're frustrated
with.  I have a lot riding now on the productivity of the PostgreSQL hacker
community and I want everyone to succeed at the best goals.

There is no problem with proposing tests, the problem is that they are
> accepted, or if they are accepted then that they are not removed at the
> first small issue but rather fixed, or their limitations accepted, because
> testing time-sensitive features is not as simple as testing functional
> features.
>

For 2020 Crunchy gave me a sort of sabbatical year to research community
oriented benchmarking topics.  Having a self contained project in my home
turned out to be the perfect way to spend *that* wreck of a year.

I made significant progress toward the idea of having a performance farm
for PostgreSQL.  On my laptop today is a 14GB database with 1s resolution
latency traces for 663 days of pgbench time running 4 workloads across a
small bare metal farm of various operating systems and hardware classes.  I
can answer questions like "how long does a typical SSD take to execute an
INSERT commit?" across my farm with SQL.  It's at the "works for me!" stage
of development, and I thought this was the right time in the development
cycle to start sharing improvement ideas from my work; thus the other
submissions in progress I alluded to.

The logging feature is in an intermediate spot where validating it requires
light custom tooling that compares its output against known variables like
the system time.  It doesn't quite have a performance component to it.
Since this time logic detail is a well known portability minefield, I
thought demanding that particular test was a pretty easy sell.
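
(The thread does not include that tooling; as a minimal sketch of the kind
of check meant, here is a standalone program that reads aggregate-log
lines on stdin and verifies that the first field -- interval_start,
documented as Unix epoch seconds -- is near the current system time:)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main(void)
{
    char    line[1024];
    int     bad = 0;

    while (fgets(line, sizeof(line), stdin))
    {
        double  ts = strtod(line, NULL);
        double  now = (double) time(NULL);

        if (ts < now - 86400 || ts > now + 86400)
        {
            fprintf(stderr, "suspicious interval_start: %.0f (now %.0f)\n",
                    ts, now);
            bad = 1;
        }
    }
    return bad;
}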

That you in particular are frustrated here makes perfect sense to me.  I am
fresh and ready to carry this forward some distance, and I hope the outcome
makes you happy.


Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation

2021-06-17 Thread Matthias van de Meent
On Fri, 23 Apr 2021 at 12:45, Matthias van de Meent
 wrote:
>
> On Sat, 17 Apr 2021 at 01:05, Peter Geoghegan  wrote:
>
> > It would be convenient if the first patch could be treated
> > as an independent thing.
>
> Patch 0002 was the reason for writing 0001, and uses the performance
> improvements of 0001 to show it's worth. As such, I submitted them as
a set. If you'd like, I could submit 0002 separately?

For now, version 2 of the patchset to make MSVC and cfbot happy (only
fixes the compilation issues, no significant changes). I'll try to
benchmark the patches in this patchset (both 0001, and 0001+0002) in
the upcoming weekend.

Kind regards,

Matthias van de Meent.
From 3507ac6a110ad37a540db03342be8d7135fa7612 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Sat, 10 Apr 2021 19:01:50 +0200
Subject: [PATCH v2 1/2] Implement and use index tuple attribute iteration

index_getattr's fast-path only works for fixed-size prefixes of non-null
attributes and for null attributes; for all other cases it is an O(n)
lookup. Because it is also often called in a loop for each attribute, we
can re-use the offset results from earlier attribute lookups and use that
to speed up subsequent attribute lookups.
---
 src/backend/access/common/indextuple.c | 149 +
 src/backend/access/gist/gistutil.c |  32 --
 src/backend/access/nbtree/nbtsearch.c  |  11 +-
 src/backend/access/nbtree/nbtsort.c|  15 ++-
 src/backend/access/nbtree/nbtutils.c   |  75 -
 src/backend/utils/sort/tuplesort.c |  28 ++---
 src/include/access/itup.h  |  89 +++
 7 files changed, 339 insertions(+), 60 deletions(-)

diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index 8df882da7a..f0bdc5669b 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -424,6 +424,155 @@ nocache_index_getattr(IndexTuple tup,
 	return fetchatt(TupleDescAttr(tupleDesc, attnum), tp + off);
 }
 
+/*
+ * Initiate an index attribute iterator positioned at attribute attnum,
+ * filling *iter with the iteration state.
+ *
+ * This is nearly the same as index_deform_tuple, except that this
+ * stores the internal state up to attnum, instead of populating the
+ * datum- and isnull-arrays.
+ */
+void
+nocache_index_attiterinit(IndexTuple tup, AttrNumber attnum, TupleDesc tupleDesc, IAttrIterState iter)
+{
+	bool		hasnulls = IndexTupleHasNulls(tup);
+	int			curatt;
+	char	   *tp;/* ptr to tuple data */
+	int			off;			/* offset in tuple data */
+	bits8	   *bp;/* ptr to null bitmap in tuple */
+	bool		slow = false;	/* can we use/set attcacheoff? */
+	bool		null = false;
+
+	/* Assert to protect callers */
+	Assert(tupleDesc->natts <= INDEX_MAX_KEYS);
+	Assert(attnum <= tupleDesc->natts);
+	Assert(attnum > 0);
+
+	/* XXX "knows" t_bits are just after fixed tuple header! */
+	bp = (bits8 *) ((char *) tup + sizeof(IndexTupleData));
+
+	tp = (char *) tup + IndexInfoFindDataOffset(tup->t_info);
+	off = 0;
+
+	for (curatt = 0; curatt < attnum; curatt++)
+	{
+		Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, curatt);
+
+		if (hasnulls && att_isnull(curatt, bp))
+		{
+			null = true;
+			slow = true;		/* can't use attcacheoff anymore */
+			continue;
+		}
+
+		null = false;
+
+		if (!slow && thisatt->attcacheoff >= 0)
+			off = thisatt->attcacheoff;
+		else if (thisatt->attlen == -1)
+		{
+			/*
+			 * We can only cache the offset for a varlena attribute if the
+			 * offset is already suitably aligned, so that there would be no
+			 * pad bytes in any case: then the offset will be valid for either
+			 * an aligned or unaligned value.
+			 */
+			if (!slow &&
+off == att_align_nominal(off, thisatt->attalign))
+thisatt->attcacheoff = off;
+			else
+			{
+off = att_align_pointer(off, thisatt->attalign, -1,
+		tp + off);
+slow = true;
+			}
+		}
+		else
+		{
+			/* not varlena, so safe to use att_align_nominal */
+			off = att_align_nominal(off, thisatt->attalign);
+
+			if (!slow)
+thisatt->attcacheoff = off;
+		}
+
+		off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+		if (thisatt->attlen <= 0)
+			slow = true;		/* can't use attcacheoff anymore */
+	}
+
+	iter->isNull = null;
+	iter->offset = off;
+	iter->slow = slow;
+}
+
+Datum
+nocache_index_attiternext(IndexTuple tup, AttrNumber attnum, TupleDesc tupleDesc, IAttrIterState iter)
+{
+	bool		hasnulls = IndexTupleHasNulls(tup);
+	char	   *tp;/* ptr to tuple data */
+	bits8	   *bp;/* ptr to null bitmap in tuple */
+	Datum		datum;
+
+	Assert(tupleDesc->natts <= INDEX_MAX_KEYS);
+	Assert(attnum <= tupleDesc->natts);
+	Assert(attnum > 0);
+
+	bp = (bits8 *) ((char *) tup + sizeof(IndexTupleData));
+
+	tp = (char *) tup + IndexInfoFindDataOffset(tup->t_info);
+	Form_pg_attribute thisatt = TupleDescAttr(tupleDesc, attnum - 1);
+
+	if (hasnulls && att_isnull(attnum - 1, bp))
+	{
+		iter->isNull = true;
+		iter->slow = true;		/* can't 

Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Andrew Dunstan


On 6/17/21 11:05 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Here's a patch I propose to apply to fix this bug (See
>> )
> If I'm reading the code correctly, your change in RelationBuildTupleDesc
> is scribbling directly on the disk buffer, which is surely not okay.
> I don't understand why you need that at all given the other defenses
> you added ... but if you need it, you have to modify the tuple AFTER
> copying it.


OK, will fix. I think we do need it (See Andres' comment in the bug
thread). It should be a fairly simple fix.


Thanks for looking.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Tom Lane
Andrew Dunstan  writes:
> Here's a patch I propose to apply to fix this bug (See
> )

If I'm reading the code correctly, your change in RelationBuildTupleDesc
is scribbling directly on the disk buffer, which is surely not okay.
I don't understand why you need that at all given the other defenses
you added ... but if you need it, you have to modify the tuple AFTER
copying it.

regards, tom lane
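
For readers following along, the pattern being asked for looks roughly
like this (a sketch, not the actual fix; heap_copytuple() and GETSTRUCT()
are existing helpers from access/htup_details.h):

    /* WRONG: attp points into shared storage (disk buffer / syscache) */
    attp->atthasmissing = false;

    /* RIGHT: copy first, then scribble on the private copy */
    HeapTuple   tupcopy = heap_copytuple(tuple);
    Form_pg_attribute attcopy = (Form_pg_attribute) GETSTRUCT(tupcopy);

    attcopy->atthasmissing = false;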




Patch for bug #17056 fast default on non-plain table

2021-06-17 Thread Andrew Dunstan

Here's a patch I propose to apply to fix this bug (See
)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

>From 5780fd5bf6878dcfd48289d80ca74c6d5b64 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Thu, 17 Jun 2021 08:19:28 -0400
Subject: [PATCH] Don't set a fast default for anything but a plain table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The fast default code added in Release 11 omitted to check that the
table a fast default was being added to was a plain table. Thus one
could be added to a foreign table, which predictably blows up. Here we
perform that check, and in addition, if we encounter a missing value for
an attribute of something other than a plain table, we ignore it.

Fixes bug #17056

Backpatch to release 11.

Reviewed by: Andres Freund  and Álvaro Herrera
---
 src/backend/catalog/heap.c | 10 +-
 src/backend/commands/tablecmds.c   |  5 +++--
 src/backend/utils/cache/relcache.c |  9 +
 src/test/regress/expected/fast_default.out | 19 +++
 src/test/regress/sql/fast_default.sql  | 14 ++
 5 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index afa830d924..1293dc04ca 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2161,6 +2161,13 @@ SetAttrMissing(Oid relid, char *attname, char *value)
 	/* lock the table the attribute belongs to */
 	tablerel = table_open(relid, AccessExclusiveLock);
 
+	/* Don't do anything unless it's a plain table */
+	if (tablerel->rd_rel->relkind != RELKIND_RELATION)
+	{
+		table_close(tablerel, AccessExclusiveLock);
+		return;
+	}
+
 	/* Lock the attribute row and get the data */
 	attrrel = table_open(AttributeRelationId, RowExclusiveLock);
 	atttup = SearchSysCacheAttName(relid, attname);
@@ -2287,7 +2294,8 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
 		valuesAtt[Anum_pg_attribute_atthasdef - 1] = true;
 		replacesAtt[Anum_pg_attribute_atthasdef - 1] = true;
 
-		if (add_column_mode && !attgenerated)
+		if (rel->rd_rel->relkind == RELKIND_RELATION  && add_column_mode &&
+			!attgenerated)
 		{
 			expr2 = expression_planner(expr2);
 			estate = CreateExecutorState();
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..2eef07e452 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12215,9 +12215,10 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	/*
 	 * Here we go --- change the recorded column type and collation.  (Note
 	 * heapTup is a copy of the syscache entry, so okay to scribble on.) First
-	 * fix up the missing value if any.
+	 * fix up the missing value if any. There shouldn't be any missing values
+	 * for anything except plain tables, but if there are, ignore them.
 	 */
-	if (attTup->atthasmissing)
+	if (rel->rd_rel->relkind == RELKIND_RELATION  && attTup->atthasmissing)
 	{
 		Datum		missingval;
 		bool		missingNull;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index fd05615e76..1d15671177 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -559,6 +559,15 @@ RelationBuildTupleDesc(Relation relation)
 			elog(ERROR, "invalid attribute number %d for relation \"%s\"",
  attp->attnum, RelationGetRelationName(relation));
 
+		/*
+		 * Fix atthasmissing flag - it's only for plain tables. Others
+		 * should not have missing values set, but some may be left over
+		 * from before this check existed, so this code defensively
+		 * ignores such values.
+		 */
+		if (relation->rd_rel->relkind != RELKIND_RELATION)
+			attp->atthasmissing = false;
+
 		memcpy(TupleDescAttr(relation->rd_att, attnum - 1),
 			   attp,
 			   ATTRIBUTE_FIXED_PART_SIZE);
diff --git a/src/test/regress/expected/fast_default.out b/src/test/regress/expected/fast_default.out
index 10bc5ff757..91f25717b5 100644
--- a/src/test/regress/expected/fast_default.out
+++ b/src/test/regress/expected/fast_default.out
@@ -797,7 +797,26 @@ SELECT * FROM t WHERE a IS NULL;
 (1 row)
 
 ROLLBACK;
+-- verify that a default set on a non-plain table doesn't set a missing
+-- value on the attribute
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER s0 FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE ft1 (c1 integer NOT NULL) SERVER s0;
+ALTER FOREIGN TABLE ft1 ADD COLUMN c8 integer DEFAULT 0;
+ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE char(10);
+SELECT count(*)
+  FROM pg_attribute
+  WHERE attrelid = 'ft1'::regclass AND
+(attmissingval IS NOT NULL OR atthasmissing);
+ count 
+---
+ 0
+(1 row)
+
 -- cleanup
+DROP FOREIGN TABLE ft1;
+DROP SERVER s0;
+DROP FOREIGN DATA WRAPPER dummy;
 DROP TABLE vtype;
 DROP TABLE vtype2;
 

Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-06-17 Thread Robert Haas
On Thu, Jun 17, 2021 at 4:54 AM Amit Kapila  wrote:
> I have responded about heavy-weight locking stuff in my next email [1]
> and why I think the approach I mentioned will work. I don't deny that
> I might be missing something here.
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1%2BT2CWqp40YqYttDA1Skk7wK6yDrkCD5GZ80QGr5ze-6g%40mail.gmail.com

I mean I saw that but I don't see how it addresses the visibility
issue. There could be a relation that is not visible to your snapshot
and upon which AccessExclusiveLock is held which needs to be
invalidated. You can't lock it because it's AccessExclusiveLock'd
already.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM

2021-06-17 Thread Greg Sabino Mullane
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Latest patch looks fine to me, to be clear.

The new status of this patch is: Ready for Committer


Re: pgbench logging broken by time logic changes

2021-06-17 Thread Andrew Dunstan


On 6/17/21 8:49 AM, Thomas Munro wrote:
> On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO  wrote:
>>> Seems like it should be straightforward to fix, though, with fixes
>>> already proposed (though I haven't studied them yet, will do).
>> I think that fixing logging is simple enough, thus a revert is not
>> necessary.
> I prepared a draft revert patch for discussion, just in case it comes
> in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).
>
> I'm on the fence TBH, I can see that it's really small things and it
> seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15.  


Is there an identified issue beyond the concrete example Greg gave of
the timestamps?


We are still fixing a few things with potentially far more impact than
anything in pgbench, so fixing this wouldn't bother me that much, as
long as we get it done for Beta2.


> (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)
>

Indeed.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Fix for segfault in logical replication on master

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 6:50 PM Mark Dilger
 wrote:
>
> > On Jun 17, 2021, at 3:39 AM, osumi.takami...@fujitsu.com wrote:
> >
> > For the 1st check, isn't it better to use RelationIsValid() ?
>
> Yes, you are right.
>
> > Additionally, In what kind of actual scenario, did you think that
> > we come to the part to "log a complaint" ?
>
> The way that RelationGetIndexList assigns rd_replidindex to the Relation 
> seems to lack sufficient locking.  After scanning pg_index to find indexes 
> associated with the relation, pg_index is closed and the access share lock 
> released.  I couldn't prove to myself that by the time we use the 
> rd_replidindex field thus computed that it was safe to assume that the Oid 
> stored there still refers to an index.  The most likely problem would be that 
> the index has since been dropped in a concurrent transaction, but it also 
> seems just barely possible that the Oid has been reused and refers to 
> something else, a table perhaps.
>

I think such a problem won't happen because we are using historic
snapshots in this context. We rely on that in a similar way in
reorderbuffer.c, see ReorderBufferProcessTXN.

-- 
With Regards,
Amit Kapila.




Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-17 Thread Simon Riggs
On Thu, Jun 17, 2021 at 12:57 PM Amit Kapila  wrote:
>
> On Thu, Jun 17, 2021 at 4:27 PM Amit Kapila  wrote:
> >
> > On Thu, Jun 17, 2021 at 8:41 AM osumi.takami...@fujitsu.com
> >  wrote:
> >
> > Pushed!
> >
> [Responding to Simon's comments]
>
> > If LOCK and TRUNCATE is advised against on all user catalog tables, why 
> > would CLUSTER only apply to pg_class? Surely its locking
> > level is the same as LOCK?
> >
>
> Cluster will also apply to all user catalog tables. I think we can
> extend it slightly as we have mentioned for Lock.

OK, good.

> > The use of "[user]" isn't fully explained, so it might not be clear that 
> > this applies to both Postgres catalog tables and any user tables
> > that have been nominated as catalogs. Probably worth linking to the 
> > "Capabilities" section to explain.
> >
>
> Sounds reasonable.
>
> > It would be worth coalescing the following sections into a single page, 
> > since they are just a few lines each:
> > Streaming Replication Protocol Interface
> > Logical Decoding SQL Interface
> > System Catalogs Related to Logical Decoding
> >
>
> I think this is worth considering but we might want to discuss this as
> a separate change/patch.

Makes sense.

Thanks

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Add version macro to libpq-fe.h

2021-06-17 Thread Tom Lane
Boris Kolpackov  writes:
> I am making use of the new pipeline mode added to libpq in
> PostgreSQL 14. At the same time I would still like to support
> older libpq versions by not providing the extended functionality
> that depends on this mode.

Good point.

> The natural way to achieve this in C/C++ is to conditionally
> enable code that depends on the additional APIs based on the
> preprocessor macro. And I could easily do this if libpq-fe.h
> provided a macro containing its version.

I think putting a version number as such in there is a truly
horrid idea.  However, I could get behind adding a boolean flag
that says specifically whether the pipeline feature exists.
Then you'd do something like

#ifdef LIBPQ_HAS_PIPELINING

rather than embedding knowledge of exactly which release
added that.

regards, tom lane
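
Client code could then look something like this (macro name per the
example above; maybe_enter_pipeline() is a made-up wrapper, while
PQenterPipelineMode() is the PG14 function in question):

#include <libpq-fe.h>

#ifdef LIBPQ_HAS_PIPELINING
static int
maybe_enter_pipeline(PGconn *conn)
{
    return PQenterPipelineMode(conn);   /* 1 on success, 0 on failure */
}
#else
static int
maybe_enter_pipeline(PGconn *conn)
{
    (void) conn;                /* feature not available at build time */
    return 0;
}
#endif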




Re: Fix for segfault in logical replication on master

2021-06-17 Thread Mark Dilger



> On Jun 17, 2021, at 3:39 AM, osumi.takami...@fujitsu.com wrote:
> 
> For the 1st check, isn't it better to use RelationIsValid() ?

Yes, you are right.

> Additionally, In what kind of actual scenario, did you think that
> we come to the part to "log a complaint" ?

The way that RelationGetIndexList assigns rd_replidindex to the Relation seems 
to lack sufficient locking.  After scanning pg_index to find indexes associated 
with the relation, pg_index is closed and the access share lock released.  I 
couldn't prove to myself that by the time we use the rd_replidindex field thus 
computed that it was safe to assume that the Oid stored there still refers to 
an index.  The most likely problem would be that the index has since been 
dropped in a concurrent transaction, but it also seems just barely possible 
that the Oid has been reused and refers to something else, a table perhaps.  
The check that I added is not completely bulletproof, because the new object 
reusing that Oid could be a different index, and we'd be none the wiser.  Do 
you think we should do something about that?  I felt the checks I put in place 
were very cheap and would work in almost all cases.  In any event, they seemed 
better than no checks, which is what we have now.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Dilip Kumar
On Thu, Jun 17, 2021 at 2:50 PM Heikki Linnakangas  wrote:
>
> On 17/06/2021 08:45, Dilip Kumar wrote:
> > Another problem with directly scanning the directory is, how we
> > are supposed to get the "relpersistence" which is stored in pg_class
> > tuple right?
>
> You only need relpersistence if you want to use the buffer cache, right?
> I think that's a good argument for not using it.

Yeah, that is one place. Another place I am using it is to decide
whether to WAL-log the new page while writing into the target
relfilenode: if it is an unlogged relation then I am not WAL-logging. But
now I think that is not the right idea; while creating the database
we should WAL-log all the pages irrespective of whether the table is
logged or unlogged.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
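
As a rough sketch of that idea (modeled on the existing
RelationCopyStorage(), but with the relpersistence test dropped; checksum
and LSN handling and the final sync are omitted, and page_std = true
assumes a standard page layout):

#include "postgres.h"
#include "access/xloginsert.h"
#include "storage/smgr.h"

static void
copy_fork_always_logged(SMgrRelation src, SMgrRelation dst, ForkNumber forknum)
{
    PGAlignedBlock buf;
    BlockNumber nblocks = smgrnblocks(src, forknum);
    BlockNumber blkno;

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        smgrread(src, forknum, blkno, buf.data);

        /* WAL-log every page, even for unlogged relations */
        log_newpage(&dst->smgr_rnode.node, forknum, blkno, buf.data, true);

        smgrextend(dst, forknum, blkno, buf.data, false);
    }
}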




Re: Issue with some calls to GetMultiXactIdMembers()

2021-06-17 Thread Heikki Linnakangas

On 16/06/2021 13:22, Greg Nancarrow wrote:

Hi,

There's a couple of calls to GetMultiXactIdMembers() in heapam.c which
subsequently pfree() the returned "members" pointer (pass-by-reference
parameter) if it's non-NULL.
However, there's an error return within GetMultiXactIdMembers() that
returns -1 without NULLing out "members", and the callers have simply
allocated that pointer on the stack without initializing it to NULL.
If that error condition were to ever happen, pfree() would likely be
called with a junk value.
Also note that there's another error return (about 15 lines further
down) in GetMultiXactIdMembers() that returns -1 and does NULL out
"members", so the handling is inconsistent.
The attached patch adds the NULLing out of the "members" pointer in
the first error case, to fix that and guard against possible pfree()
on error by such callers.


Thanks! Committed with a few additional cleanups.


I also note that there are other callers which pfree() "members" based
on the returned "nmembers" value, and this is also inconsistent.
Some pfree() "members" if nmembers >= 0, while others pfree() it if nmembers > 0.
After looking at the code for a while, it looks like the "nmembers ==
0" case can't actually happen (right?). I decided not to mess with any
of the calling code.


I added an assertion that it never returns nmembers==0.

- Heikki
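
For anyone auditing callers, the calling pattern the fix makes safe looks
roughly like this (a sketch; real callers pass various flag combinations):

    MultiXactMember *members;
    int         nmembers;

    nmembers = GetMultiXactIdMembers(multi, &members, false, false);

    /*
     * With the fix, *members is NULL on every error return, so testing
     * nmembers alone is enough to decide whether a pfree() is due.
     */
    if (nmembers > 0)
    {
        /* ... inspect members[0 .. nmembers - 1] ... */
        pfree(members);
    }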




Re: Teaching users how they can get the most out of HOT in Postgres 14

2021-06-17 Thread Justin Pryzby
+  AUTO.  With OFF index
+  cleanup is disabled, with ON it is enabled,

OFF comma

+  bypassing index cleanup in cases where there is more than zero
+  dead tuples.

*are* more than zero
Or (preferably): "cases when there are no dead tuples at all"

+  If INDEX_CLEANUP is set to
+  OFF performance may suffer, because as the

OFF comma

+  removed until index cleanup is completed.  This option has no
+  effect for tables that do not have an index and is ignored if
+  the FULL option is used.

I'd say "tables that have no indexes,"

-- 
Justin




Re: when the startup process doesn't (logging startup delays)

2021-06-17 Thread Justin Pryzby
+ * Codes of the operations performed during startup process
+ */
+typedef enum StartupProcessOp
+{
+   SYNCFS_IN_PROGRESS,
+   FSYNC_IN_PROGRESS,
+   RECOVERY_IN_PROGRESS,
+   RESET_UNLOGGED_REL_IN_PROGRESS,
+   DUMMY,
+   SYNCFS_END,
+   FSYNC_END,
+   RECOVERY_END,
+   RESET_UNLOGGED_REL_END
+} StartupProcessOp;

What is DUMMY about ?  If you just want to separate the "start" from "end",
you could write:

/* codes for start of operations */
FSYNC_IN_PROGRESS
SYNCFS_IN_PROGRESS
...
/* codes for end of operations */
FSYNC_END
SYNCFS_END
...

Or group them together like:

FSYNC_IN_PROGRESS,
FSYNC_END,
SYNCFS_IN_PROGRESS, 
SYNCFS_END,
RECOVERY_IN_PROGRESS,
RECOVERY_END,
RESET_UNLOGGED_REL_IN_PROGRESS,
RESET_UNLOGGED_REL_END,

-- 
Justin




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Thomas Munro
On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO  wrote:
> > Seems like it should be straightforward to fix, though, with fixes
> > already proposed (though I haven't studied them yet, will do).
>
> I think that fixing logging is simple enough, thus a revert is not
> necessary.

I prepared a draft revert patch for discussion, just in case it comes
in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
Synchronize client threads." remains (slightly rearranged).

I'm on the fence TBH, I can see that it's really small things and it
seems we have the patches, but it's late, late enough that
benchmarking gurus are showing up to benchmark with it for real, and
it's not great to be getting in the way of that with what is mostly
refactoring work, so I don't think it would be a bad thing if we
agreed to try again in 15.  (A number of arguments for and against
moving pgbench out of the postgresql source tree and release cycle
occur to me, but I guess that's a topic for another thread.)

> > [...] For that reason, I'm not super attached to that new pg_time_usec_t
> > stuff at all, and wouldn't be sad if we reverted that piece.
>
> Well, I was sooo happy to get rid of INSTR_TIME ugly and inefficient
> macros in pgbench… so anything looks better to me.

I don't love it either, in this code or the executor.  (I know you
also don't like the THREAD_CREATE etc macros.  I have something to
propose to improve that for 15)

> Note that Michaël is having a look at fixing pgbench logging issues.

Yeah I've been catching up with these threads.
From 07d2102ed7faee6c1ed7cbacf44b3c0d95d3e2d5 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 17 Jun 2021 22:36:56 +1200
Subject: [PATCH] Revert "pgbench: Improve time logic."

Discussion: https://postgr.es/m/CAHLJuCW_8Vpcr0%3Dt6O_gozrg3wXXWXZXDioYJd3NhvKriqgpfQ%40mail.gmail.com
---
 doc/src/sgml/ref/pgbench.sgml|  39 ++-
 src/bin/pgbench/pgbench.c| 443 ---
 src/tools/pgindent/typedefs.list |   1 -
 3 files changed, 248 insertions(+), 235 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 0c60077e1f..eba6409148 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -58,10 +58,8 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
-latency average = 11.013 ms
-latency stddev = 7.351 ms
-initial connection time = 45.758 ms
-tps = 896.967014 (without initial connection time)
+tps = 85.184871 (including connections establishing)
+tps = 85.296346 (excluding connections establishing)
 
 
   The first six lines report some of the most important parameter
@@ -70,7 +68,8 @@ tps = 896.967014 (without initial connection time)
   and number of transactions per client); these will be equal unless the run
   failed before completion.  (In -T mode, only the actual
   number of transactions is printed.)
-  The last line reports the number of transactions per second.
+  The last two lines report the number of transactions per second,
+  figured with and without counting the time to start database sessions.
  
 
   
@@ -2339,22 +2338,22 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
-latency average = 10.870 ms
-latency stddev = 7.341 ms
-initial connection time = 30.954 ms
-tps = 907.949122 (without initial connection time)
+latency average = 15.844 ms
+latency stddev = 2.715 ms
+tps = 618.764555 (including connections establishing)
+tps = 622.977698 (excluding connections establishing)
 statement latencies in milliseconds:
-0.001  \set aid random(1, 10 * :scale)
-0.001  \set bid random(1, 1 * :scale)
-0.001  \set tid random(1, 10 * :scale)
-0.000  \set delta random(-5000, 5000)
-0.046  BEGIN;
-0.151  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
-0.107  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
-4.241  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
-5.245  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
-0.102  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
-0.974  END;
+0.002  \set aid random(1, 10 * :scale)
+0.005  \set bid random(1, 1 * :scale)
+0.002  \set tid random(1, 10 * :scale)
+0.001  \set delta random(-5000, 5000)
+0.326  BEGIN;
+0.603  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+0.454  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+5.528  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+7.335  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+0.371  INSERT INTO pgbench_history (tid, bid, aid, 

Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 4:27 PM Amit Kapila  wrote:
>
> On Thu, Jun 17, 2021 at 8:41 AM osumi.takami...@fujitsu.com
>  wrote:
>
> Pushed!
>
[Responding to Simon's comments]

> If LOCK and TRUNCATE is advised against on all user catalog tables, why would 
> CLUSTER only apply to pg_class? Surely its locking
> level is the same as LOCK?
>

Cluster will also apply to all user catalog tables. I think we can
extend it slightly as we have mentioned for Lock.

> The use of "[user]" isn't fully explained, so it might not be clear that this 
> applies to both Postgres catalog tables and any user tables
> that have been nominated as catalogs. Probably worth linking to the 
> "Capabilities" section to explain.
>

Sounds reasonable.

> It would be worth coalescing the following sections into a single page, since 
> they are just a few lines each:
> Streaming Replication Protocol Interface
> Logical Decoding SQL Interface
> System Catalogs Related to Logical Decoding
>

I think this is worth considering but we might want to discuss this as
a separate change/patch.


-- 
With Regards,
Amit Kapila.




Re: pgbench logging broken by time logic changes

2021-06-17 Thread Fabien COELHO




I found you forgot to fix printProgressReport().


Indeed.


Also, according to the document, interval_start in Aggregated Logging
seems to be printed in seconds instead of ms.


Indeed. I'm unsure about what we should really want there, but for a beta 
bug fix I agree that it must simply comply with the old documented behavior.



The attached patch includes these fixes.


Thanks. Works for me.

--
Fabien.




Re: Unresolved repliaction hang and stop problem.

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 7:28 AM Kyotaro Horiguchi
 wrote:
>
> At Wed, 16 Jun 2021 18:28:28 -0400, Alvaro Herrera  
> wrote in
> > On 2021-Jun-16, Ha Ka wrote:
> > # Children  Self  Command   Shared Object  Symbol
> > # ........  ........  ........  .............  ..........................
> > #
> >100.00% 0.00%  postgres  postgres   [.] exec_replication_command
> > |
> > ---exec_replication_command
> >WalSndLoop
> >XLogSendLogical
> >LogicalDecodingProcessRecord
> >|
> > --99.51%--ReorderBufferQueueChange
> >   |
> >   |--96.06%--hash_seq_search
> >   |
> >   |--1.78%--ReorderBufferSerializeTXN
> >   |  |
> >   |   --0.52%--errstart
> >   |
> >--0.76%--deregister_seq_scan
> >
> > What this tells me is that ReorderBufferQueueChange is spending a lot of
> > time doing hash_seq_search, which probably is the one in
> > ReorderBufferTXNByXid.
>
> I don't see a call to hash_*seq*_search there. Instead, I see one in
> ReorderBufferCheckMemoryLimit().
>
> I added an elog line in hash_seq_search that is visited only when it
> is called under ReorderBufferQueueChange, then set
> logical_decoding_work_mem to 64kB.
>
> Running the following query calls hash_seq_search (relatively) frequently.
>
> pub=# create table t1 (a int primary key);
> pub=# create publication p1 for table t1;
> sub=# create table t1 (a int primary key);
> sub=# create subscription s1 connection 'host=/tmp port=5432' publication p1;
> pub=# insert into t1 (select a from generate_series(0, ) a);
>
> The insert above makes 20 calls to ReorderBufferLargestTXN() (via
> ReorderBufferCheckmemoryLimit()), which loops over hash_seq_search.
>

If there are large transactions then someone can probably set
logical_decoding_work_mem to a higher value.

-- 
With Regards,
Amit Kapila.




Re: Fix for segfault in logical replication on master

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 4:09 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, June 17, 2021 2:43 PM Mark Dilger  
> wrote:
> > > On Jun 16, 2021, at 10:19 PM, osumi.takami...@fujitsu.com wrote:
> > >
> > > I started to analyze your report and
> > > will reply after my idea to your modification is settled.
> >
> > Thank you.
> I'll share my first analysis.
>
> > In commit e7eea52b2d, you introduced a new function,
> > RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> > if a relation has a replica identity index.  That code segfaults under 
> > certain
> > conditions.  A test case to demonstrate that is attached.  Prior to patching
> > the code, this new test gets stuck waiting for replication to finish, which 
> > never
> > happens.  You have to break out of the test and check
> > tmp_check/log/021_no_replica_identity_publisher.log.
> >
> > I believe this bit of logic in src/backend/utils/cache/relcache.c:
> >
> >   indexDesc = RelationIdGetRelation(relation->rd_replidindex);
> >   for (i = 0; i < indexDesc->rd_index->indnatts; i++)
> >
> > is unsafe without further checks, also attached.
> You are absolutely right.
> I checked the crash scenario and reproduced the core,
> which has a null indexDesc. Also, rd_replidindex must be checked beforehand
> as you included in your patch, because having an index does not necessarily
> mean having a replica identity index. As proof of this, the oid of
> rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
> has passed with your fix.
>
> Also, your test looks essentially minimum(suitable for the problem) to me.
>
> * RelationGetIdentityKeyBitmap
> +   /*
> +* Fall out if the description is not for an index, suggesting
> +* affairs have changed since we looked.  XXX Should we log a
> +* complaint here?
> +*/
> +   if (!indexDesc)
> +   return NULL;
> +   if (!indexDesc->rd_index)
> +   {
> +   RelationClose(indexDesc);
> +   return NULL;
> +   }
> For the 1st check, isn't it better to use RelationIsValid() ?
> I agree with having the check itself of course, though.
>
> Additionally, In what kind of actual scenario, did you think that
> we come to the part to "log a complaint" ?
>

Yeah, I think that part is not required unless there is some case
where it can happen. I guess we might want to have an elog at that
place with a check like:
if (!RelationIsValid(relation))
    elog(ERROR, "could not open relation with OID %u", relid);

-- 
With Regards,
Amit Kapila.




Re: PoC: Using Count-Min Sketch for join cardinality estimation

2021-06-17 Thread Tomas Vondra
On 6/17/21 2:23 AM, Tomas Vondra wrote:
> On 6/17/21 1:31 AM, John Naylor wrote:
>> On Wed, Jun 16, 2021 at 12:23 PM Tomas Vondra 
>> mailto:tomas.von...@enterprisedb.com>> 
>> wrote:
>>
>>  ...
>>
>> + * depth 8 and width 128 is sufficient for relative error ~1.5% with a
>> + * probability of approximately 99.6%
>>
>> Okay, so in the example above, we have a 99.6% probability of having 
>> less than 14.5M, but the actual error is much smaller. Do we know how 
>> tight the error bounds are with some lower probability?
>>
> 
> I don't recall such formula mentioned in any of the papers. The [3]
> paper has a proof in section 4.2, deriving the formula using Markov's
> inequality, but it's not obvious how to relax that (it's been ages since
> I last did things like this).
> 


I've been thinking about this a bit more, and while I still don't know
about a nice formula, I think I have a fairly good illustration that may
provide some intuition about the "typical" error. I'll talk about self
joins, because it makes some of the formulas simpler. But in principle
the same thing works for a join of two relations too.

Imagine you have a relation with N rows and D distinct values, and let's
build a count-min sketch on it, with W counters. So assuming d=1 for
simplicity, we have one set of counters with frequencies:

[f(1), f(2), ..., f(W)]

Now, the dot product effectively calculates

S = sum[ f(i)^2 for i in 1 ... W ]

which treats each counter as if it was just a single distinct value. But
we know that this is the upper boundary of the join size estimate,
because if we "split" a group in any way, the join will always be lower:

(f(i) - X)^2 + X^2 <= f(i)^2

It's as if you have a rectangle - if you split a side in some way and
calculate the area of those smaller rectangles, it'll be smaller than
the area of the whole rectangle. To minimize the area, the parts need to
be of equal size, and for K parts it's

K * (f(i) / K) ^ 2 = f(i)^2 / K

This is the "minimum relative error" case assuming uniform distribution
of the data, I think. If there are D distinct values in the data set,
then for uniform distribution we can assume each counter represents
about D / W = K distinct values, and we can assume f(i) = N / W, so then

S = W * (N/W)^2 / (D/W) = N^2 / D

Of course, this is the exact cardinality of the join - the count-min
sketch simply multiplies the f(i) values, ignoring D entirely. But I
think this shows that the fewer distinct values are there and/or the
more skewed the data set is, the closer the estimate is to the actual
value. More uniform data sets with more distinct values will end up
closer to the (N^2 / D) size, and the sketch will significantly
over-estimate this.
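
To put rough, purely illustrative numbers on that (not from the thread):
with N = 10^6 rows, D = 10^3 distinct values and W = 128 counters of
uniform data, the sketch computes

    S = W * (N/W)^2 = N^2 / W ~= 7.8e9

while the true self-join size is N^2 / D = 1e9, i.e. an over-estimate by
a factor of D/W ~= 8. Fewer distinct values or more counters shrink that
factor, matching the intuition above.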

So the question is whether to attempt to do any "custom" correction
based on number of distinct values (which I think the count-min sketch
does not do, because the papers assume it's unknown).

I still don't know about an analytical solution, giving us smaller
confidence interval (with lower probability). But we could perform some
experiments, generating data sets with various data distribution and
then measure how accurate the adjusted estimate is.

But I think the fact that for "more skewed" data sets the estimate is
closer to reality is very interesting, and pretty much what we want.
It's probably better than just assuming uniformity on both sides, which
is what we do when we only have MCV on one side (that's a fairly common
case, I think).

The other interesting feature is that it *always* overestimates (at
least the default version, not the variant adjusted by distinct values).
That's probably good, because under-estimates are generally much more
dangerous than over-estimates (the execution often degrades pretty
quickly, not gracefully).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: when the startup process doesn't

2021-06-17 Thread Nitin Jadhav
> Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow
> changing the value without restart, doesn't it follow that SIGHUP is what's
> wanted ?

Yes. I have done the changes in the attached patch.

Apart from this, I have done a few other changes to the patch. The
changes include

1. Renamed 'InitCurrentOperation' to 'InitStartupProgress()'.
2. Divided the functionality of 'LogStartupProgress()' into 2 parts.
One for logging the progress and the other to log the completion
information. The first part's function name remains as is and a new
function 'CloseStartupProgress()' added for the second part.
3. In case any invalid operation is found during logging of the
startup progress, an error is thrown. This is not a concern unless the
developer makes a mistake.
4. Modified the 'StartupProcessOp' enums like 'FSYNC_START' to
'FSYNC_IN_PROGRESS' for better readability.
5. Updated the comments and some cosmetic changes.

Kindly share your comments.

Thanks & Regards,
Nitin Jadhav

On Thu, Jun 10, 2021 at 3:31 PM Justin Pryzby  wrote:
>
> On Thu, Jun 10, 2021 at 03:19:20PM +0530, Nitin Jadhav wrote:
> > > > > +   {"log_min_duration_startup_process", PGC_SUSET, 
> > > > > LOGGING_WHEN,
> > > > >
> > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > > Obviously it has no effect except during startup, but the change will 
> > > > > be
> > > > > effective if the current process crashes.
> > > > > See also: 
> > > > > https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com
> > > >
> > > > I did not get exactly how it will change behaviour. In my
> > > > understanding, when the server restarts after a crash, it fetches the
> > > > value from the config file. So if there is any change that gets
> > > > affected. Kindly correct me if I am wrong.
> >
> > Sorry my understanding was wrong. But I'm not completely convinced
> > with the above description saying that the change will be effective if
> > the current process crashes.
> > AFAIK, whenever we set the GucContext less than PGC_SIGHUP (that is
> > either PGC_POSTMASTER or PGC_INTERNAL) then any change in the config
> > file will not get affected during restart after crash. If the
> > GucContext is greater than or equal to PGC_SIGHUP, then any change in
> > the config file will be changed once it receives the SIGHUP signal. So
> > it gets affected by a restart after a crash. So since the GucContext
> > set here is PGC_SUSET which is greater than PGC_SIGHUP, there is no
> > change in the behaviour wrt this point.
>
> Since you agreed that SUSET was wrong, and PGC_POSTMASTER doesn't allow
> changing the value without restart, doesn't it follow that SIGHUP is what's
> wanted ?
>
> > > I've triple checked the behavior using a patch I submitted for Thomas' 
> > > syncfs
> > > feature.  ALTER SYSTEM recovery_init_sync_method=syncfs was not picked up 
> > > when
> > > I sent SIGABRT.  But with my patch, if I also do SELECT pg_reload_conf(), 
> > > then
> > > a future crash uses syncfs.
> > > https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com
> >
> > The difference is since the behaviour is compared between
> > PGC_POSTMASTER and PGC_SIGHUP.
> >
> > > The GUC definitely isn't SUSET, since it's not useful to write in a 
> > > (super)
> > > user session SET log_min_duration_startup_process=123.
> > I agree with this. I may have to change this value as setting in a
> > user session is not at all useful. But I am confused between
> > PGC_POSTMASTER and PGC_SIGHUP. We should use PGC_SIGHUP if we would
> > like to allow the change during restart after a crash. Otherwise
> > PGC_POSTMASTER would be sufficient. Kindly share your thoughts.
> >
> > Thanks & Regards,
> > Nitin Jadhav
> >
> > On Wed, Jun 9, 2021 at 9:49 PM Justin Pryzby  wrote:
> > >
> > > On Wed, Jun 09, 2021 at 05:09:54PM +0530, Nitin Jadhav wrote:
> > > > > +   {"log_min_duration_startup_process", PGC_SUSET, 
> > > > > LOGGING_WHEN,
> > > > >
> > > > > I think it should be PGC_SIGHUP, to allow changing it during runtime.
> > > > > Obviously it has no effect except during startup, but the change will 
> > > > > be
> > > > > effective if the current process crashes.
> > > > > See also: 
> > > > > https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com
> > > >
> > > > I did not get exactly how it will change behaviour. In my
> > > > understanding, when the server restarts after a crash, it fetches the
> > > > value from the config file. So if there is any change that gets
> > > > affected. Kindly correct me if I am wrong.
> > >
> > > I don't think so.  I checked and SelectConfigFiles is called only once to 
> > > read
> > > config files and cmdline args.  And not called on restart_after_crash.
> > >
> > > The GUC definitely isn't SUSET, since it's not useful to write in a 
> > > (super)
> > > user session SET log_min_duration_startup_process=123.
> > >
> > > I've triple checked the 

Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 8:41 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, June 16, 2021 7:21 PM Amit Kapila  
> wrote:
> > On Mon, Jun 14, 2021 at 5:33 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On  Friday, June 11, 2021 2:13 PM  vignesh C 
> > wrote:
> > >
> > > Attached the patch-set that addressed those two comments.
> > >
> >
> > Few minor comments:
> > 1.
> > +  
> > +   
> > +Clustering pg_class in a transaction.
> >
> > Can we change above to: Perform CLUSTER on
> > pg_class in a transaction.
> Looks better.
>
> >
> > 2.
> > +  
> > +   
> > +Executing TRUNCATE on user catalog
> > table
> > in a transaction.
> > +   
> >
> > Square brackets are missing for user.
> Thanks for catching it. You are right.
>
>
> > 3.
> > +
> > + Overview
> > +
> > ..
> > ..
> > +
> > + Caveats
> > +
> >
> > Why are these required when we already have titles? I have seen other places
> > in the docs where we use titles for Overview and Caveats but they didn't 
> > have
> > similar usage.
> Sorry, this was a mistake. We didn't need those sections.
>
>
> > 4.
> > 
> > +Performing PREPARE TRANSACTION
> > after
> > LOCK
> > +command on pg_class and logical
> > decoding of published
> > +table.
> >
> > Can we change above to: PREPARE
> > TRANSACTION after LOCK
> > command on pg_class and allow logical
> > decoding of two-phase transactions.
> >
> > 5.
> > +   
> > +Clustering pg_trigger and decoding
> > PREPARE
> > +TRANSACTION, if any published table have a trigger
> > and any
> > +operations that will be decoded are conducted.
> > +   
> >
> > Can we change above to: PREPARE
> > TRANSACTION after CLUSTER
> > command on pg_trigger and allow logical
> > decoding of two-phase transactions. This will lead to deadlock only when
> > published table have a trigger.
> Yeah, I needed the nuance to turn on logical decoding of two-phase 
> transactions...
> Your above suggestions are much tidier and more accurate than mine.
> I agree with your all suggestions.
>

Pushed!

-- 
With Regards,
Amit Kapila.




RE: Fix for segfault in logical replication on master

2021-06-17 Thread osumi.takami...@fujitsu.com
On Thursday, June 17, 2021 2:43 PM Mark Dilger  
wrote:
> > On Jun 16, 2021, at 10:19 PM, osumi.takami...@fujitsu.com wrote:
> >
> > I started to analyze your report and
> > will reply after my idea to your modification is settled.
> 
> Thank you.
I'll share my first analysis.

> In commit e7eea52b2d, you introduced a new function,
> RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> if a relation has a replica identity index.  That code segfaults under certain
> conditions.  A test case to demonstrate that is attached.  Prior to patching
> the code, this new test gets stuck waiting for replication to finish, which 
> never
> happens.  You have to break out of the test and check
> tmp_check/log/021_no_replica_identity_publisher.log.
> 
> I believe this bit of logic in src/backend/utils/cache/relcache.c:
> 
>   indexDesc = RelationIdGetRelation(relation->rd_replidindex);
>   for (i = 0; i < indexDesc->rd_index->indnatts; i++)
> 
> is unsafe without further checks, also attached.
You are absolutely right.
I checked the crash scenario and reproduced the core,
which has a null indexDesc. Also, rd_replidindex must be checked beforehand
as you included in your patch, because having an index does not necessarily
mean having a replica identity index. As proof of this, the oid of
rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
has passed with your fix.

Also, your test looks essentially minimum(suitable for the problem) to me.

* RelationGetIdentityKeyBitmap
+   /*
+* Fall out if the description is not for an index, suggesting
+* affairs have changed since we looked.  XXX Should we log a
+* complaint here?
+*/
+   if (!indexDesc)
+   return NULL;
+   if (!indexDesc->rd_index)
+   {
+   RelationClose(indexDesc);
+   return NULL;
+   }
For the 1st check, isn't it better to use RelationIsValid() ?
I agree with having the check itself of course, though.

Additionally, In what kind of actual scenario, did you think that
we come to the part to "log a complaint" ?

I'm going to spend some time analyzing RelationGetIndexAttrBitmap next
to see if similar hazards can happen, because RelationGetIdentityKeyBitmap's
logic mainly comes from that function.


Best Regards,
Takamichi Osumi
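
For reference, the first check rewritten with that macro would look
roughly like this (a sketch of the suggestion, not a final patch):

    indexDesc = RelationIdGetRelation(relation->rd_replidindex);
    if (!RelationIsValid(indexDesc))
        return NULL;
    if (indexDesc->rd_index == NULL)
    {
        RelationClose(indexDesc);
        return NULL;
    }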



Re: pgbench bug candidate: negative "initial connection time"

2021-06-17 Thread Fabien COELHO




Second, currently the *only* function to change the client state is
advanceConnectionState, so it can be checked there and any bug is only
there. We had issues before when several functions were doing updates,
and it was a mess to understand what was going on. I really want that it
stays that way, so I disagree with setting the state to ABORTED from
threadRun. Moreover I do not see that it brings a feature, so ISTM that it
is not an actual issue not to do it?


Ok. I gave up on changing the state in threadRun. Instead, I changed the
condition at the end of the bench, which enables reporting aborts due to
socket errors.

+@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
+ #endif/* 
ENABLE_THREAD_SAFETY */
+
+   for (int j = 0; j < thread->nstate; j++)
+-  if (thread->state[j].state == CSTATE_ABORTED)
++  if (thread->state[j].state != CSTATE_FINISHED)
+   exit_code = 2;
+
+   /* aggregate thread level stats */

Does this make sense?


Yes, definitely.

--
Fabien.




Re: Less compiler errors in pg_crc32c_sse42_choose.c

2021-06-17 Thread Julien Rouhaud
On Tue, Jun 15, 2021 at 03:04:11PM +0200, Nat! wrote:

Please don't send emails in html only.

Those includes are protected by some #ifdef which shouldn't be set unless
configure detects that they're usable.  Do you have a different behavior?




Less compiler errors in pg_crc32c_sse42_choose.c

2021-06-17 Thread Nat!
I am just quoting the whole file here for simplicity, as it's small:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/pg_crc32c_sse42_choose.c;h=0608e02011f7f5d8dbba7673a5ab4ba071d017a0;hb=e4f9737fac77a5cb03a84d1f4038d300ffd28afd

In line #43 the compiler errors out if there is no cpuid.h or intrin.h
available. As the code is supposed to fall back on a CRC software
implementation if SSE isn't available, returning false here instead
would be better.

Ciao
   Nat!

  1 /*-------------------------------------------------------------------------
  2  *
  3  * pg_crc32c_sse42_choose.c
  4  *    Choose between Intel SSE 4.2 and software CRC-32C implementation.
  5  *
  6  * On first call, checks if the CPU we're running on supports Intel SSE
  7  * 4.2. If it does, use the special SSE instructions for CRC-32C
  8  * computation. Otherwise, fall back to the pure software implementation
  9  * (slicing-by-8).
 10  *
 11  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
 12  * Portions Copyright (c) 1994, Regents of the University of California
 13  *
 14  *
 15  * IDENTIFICATION
 16  *    src/port/pg_crc32c_sse42_choose.c
 17  *
 18  *-------------------------------------------------------------------------
 19  */
 20
 21 #include "c.h"
 22
 23 #ifdef HAVE__GET_CPUID
 24 #include <cpuid.h>
 25 #endif
 26
 27 #ifdef HAVE__CPUID
 28 #include <intrin.h>
 29 #endif
 30
 31 #include "port/pg_crc32c.h"
 32
 33 static bool
 34 pg_crc32c_sse42_available(void)
 35 {
 36     unsigned int exx[4] = {0, 0, 0, 0};
 37
 38 #if defined(HAVE__GET_CPUID)
 39     __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
 40 #elif defined(HAVE__CPUID)
 41     __cpuid(exx, 1);
 42 #else
 43 #error cpuid instruction not available
 44 #endif
 45
 46     return (exx[2] & (1 << 20)) != 0;   /* SSE 4.2 */
 47 }
 48
 49 /*
 50  * This gets called on the first call. It replaces the function pointer
 51  * so that subsequent calls are routed directly to the chosen implementation.
 52  */
 53 static pg_crc32c
 54 pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
 55 {
 56     if (pg_crc32c_sse42_available())
 57         pg_comp_crc32c = pg_comp_crc32c_sse42;
 58     else
 59         pg_comp_crc32c = pg_comp_crc32c_sb8;
 60
 61     return pg_comp_crc32c(crc, data, len);
 62 }
 63
 64 pg_crc32c   (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose;



Re: Decoding speculative insert with toast leaks memory

2021-06-17 Thread Amit Kapila
On Thu, Jun 17, 2021 at 1:35 PM Amit Langote  wrote:
>
> Hi Dilip,
>
> On Thu, Jun 17, 2021 at 4:45 PM Dilip Kumar  wrote:
> > On Thu, Jun 17, 2021 at 12:52 PM Amit Langote  
> > wrote:
> >
> > > Oh I missed that the problem report is for the PG13 branch.
> > >
> > > How about the attached patch then?
> > >
> > Looks good,
>
> Thanks for checking.
>
> > one minor comment, how about making the below comment,
> > same as on the head?
> >
> > - if (!found || !entry->replicate_valid)
> > + if (!found)
> > + {
> > + /*
> > + * Make the new entry valid enough for the callbacks to look at, in
> > + * case any of them get invoked during the more complicated
> > + * initialization steps below.
> > + */
> >
> > On head:
> > if (!found)
> > {
> > /* immediately make a new entry valid enough to satisfy callbacks */
>
> Agree it's better to have the same comment in both branches.
>
> Though, I think it should be "the new entry", not "a new entry".  I
> find the sentence I wrote a bit more enlightening, but I am fine with
> just fixing the aforementioned problem with the existing comment.
>
> I've updated the patch.  Also, attaching a patch for HEAD for the
> s/a/the change.  While at it, I also capitalized "immediately".
>

Your patch looks good to me as well. I would like to retain the
comment as it is from master for now. I'll do some testing and push it
tomorrow unless there are additional comments.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-06-17 Thread Masahiko Sawada
On Thu, Jun 17, 2021 at 3:24 PM Masahiko Sawada  wrote:
>
> > Now, if this function is used by super
> > users then we can probably trust that they provide the XIDs that we
> > can trust to be skipped but OTOH making a restriction to allow these
> > functions to be used by superusers might restrict the usage of this
> > repair tool.
>
> If we specify the subscription id or name, maybe we can also allow the
> owner of the subscription to do that operation?

Ah, the owner of the subscription must be superuser.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-06-17 Thread Heikki Linnakangas

On 17/06/2021 08:45, Dilip Kumar wrote:

Another problem with directly scanning the directory is, how we
are supposed to get the "relpersistence" which is stored in pg_class
tuple right?


You only need relpersistence if you want to use the buffer cache, right? 
I think that's a good argument for not using it.


- Heikki




Re: pgbench bug candidate: negative "initial connection time"

2021-06-17 Thread Yugo NAGATA
Hello Fabien,

On Thu, 17 Jun 2021 10:37:05 +0200 (CEST)
Fabien COELHO  wrote:

> > ).  Is this acceptable for you?
> 
> I disagree on two counts:
> 
> First, thread[0] should not appear.
> 
> Second, currently the *only* function to change the client state is 
> advanceConnectionState, so it can be checked there and any bug is only 
> there. We had issues before when several functions were doing updates, 
> and it was a mess to understand what was going on. I really want that it 
> stays that way, so I disagree with setting the state to ABORTED from 
> threadRun. Moreover I do not see that it brings a feature, so ISTM that it 
> is not an actual issue not to do it?

Ok. I gave up on changing the state in threadRun. Instead, I changed the
condition at the end of the bench, which enables reporting aborts due to
socket errors.

+@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
+ #endif/* 
ENABLE_THREAD_SAFETY */
+ 
+   for (int j = 0; j < thread->nstate; j++)
+-  if (thread->state[j].state == CSTATE_ABORTED)
++  if (thread->state[j].state != CSTATE_FINISHED)
+   exit_code = 2;
+ 
+   /* aggregate thread level stats */

Does this make sense?

-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..901f25aab7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -3171,6 +3171,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 	if ((st->con = doConnect()) == NULL)
 	{
+		/* as the bench is already running, we do not abort */
 		pg_log_error("client %d aborted while establishing connection", st->id);
 		st->state = CSTATE_ABORTED;
 		break;
@@ -4405,7 +4406,10 @@ runInitSteps(const char *initialize_steps)
 	initPQExpBuffer();
 
 	if ((con = doConnect()) == NULL)
+	{
+		pg_log_fatal("connection for initialization failed");
 		exit(1);
+	}
 
 	setup_cancel_handler(NULL);
 	SetCancelConn(con);
@@ -6332,7 +6336,10 @@ main(int argc, char **argv)
 	/* opening connection... */
 	con = doConnect();
 	if (con == NULL)
+	{
+		pg_log_fatal("setup connection failed");
 		exit(1);
+	}
 
 	pg_log_debug("pghost: %s pgport: %s nclients: %d %s: %d dbName: %s",
  PQhost(con), PQport(con), nclients,
@@ -6437,7 +6444,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */
@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
 #endif			/* ENABLE_THREAD_SAFETY */
 
 		for (int j = 0; j < thread->nstate; j++)
-			if (thread->state[j].state == CSTATE_ABORTED)
+			if (thread->state[j].state != CSTATE_FINISHED)
 exit_code = 2;
 
 		/* aggregate thread level stats */
@@ -6548,7 +6558,7 @@ threadRun(void *arg)
 		if (thread->logfile == NULL)
 		{
 			pg_log_fatal("could not open logfile \"%s\": %m", logpath);
-			goto done;
+			exit(1);
 		}
 	}
 
@@ -6572,16 +6582,10 @@ threadRun(void *arg)
 		{
 			if ((state[i].con = doConnect()) == NULL)
 			{
-/*
- * On connection failure, we meet the barrier here in place of
- * GO before proceeding to the "done" path which will cleanup,
- * so as to avoid locking the process.
- *
- * It is unclear whether it is worth doing anything rather
- * than coldly exiting with an error message.
- */
-THREAD_BARRIER_WAIT();
-goto done;
+/* coldly abort on connection failure */
+pg_log_fatal("cannot create connection for thread %d client %d",
+			 thread->tid, i);
+exit(1);
 			}
 		}
 
@@ -6707,7 +6711,7 @@ threadRun(void *arg)
 	continue;
 }
 /* must be something wrong */
-pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
 goto done;
 			}
 		}

