Re: [PATCH] Introduce array_shuffle() and array_sample()

2022-08-03 Thread Martin Kalcher

Patch update without merge conflicts.

Martin

From 0ecffcf3ed2eb59d045941b69bb86a34b93f3391 Mon Sep 17 00:00:00 2001
From: Martin Kalcher 
Date: Sun, 17 Jul 2022 18:06:04 +0200
Subject: [PATCH v3] Introduce array_shuffle() and array_sample()

* array_shuffle() shuffles the elements of an array.
* array_sample() chooses n elements from an array at random.

The new functions share their PRNG state with random() and thus interoperate
with setseed().
---
 doc/src/sgml/func.sgml  |  40 +-
 src/backend/utils/adt/Makefile  |   1 +
 src/backend/utils/adt/array_userfuncs.c | 176 
 src/backend/utils/adt/float.c   |  37 +
 src/backend/utils/adt/user_prng.c   |  87 
 src/include/catalog/pg_proc.dat |   6 +
 src/include/utils/user_prng.h   |  19 +++
 src/test/regress/expected/arrays.out|  66 +
 src/test/regress/sql/arrays.sql |  16 +++
 9 files changed, 413 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/utils/adt/user_prng.c
 create mode 100644 src/include/utils/user_prng.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 053d4dc650..ef7c001fe7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1820,7 +1820,8 @@ repeat('Pg', 4) PgPgPgPg
 void


-Sets the seed for subsequent random() calls;
+Sets the seed for subsequent calls to random functions, like
+random() or array_shuffle();
 argument must be between -1.0 and 1.0, inclusive


@@ -1838,7 +1839,8 @@ repeat('Pg', 4) PgPgPgPg
applications; see the  module for a more
secure alternative.
If setseed() is called, the series of results of
-   subsequent random() calls in the current session
+   subsequent calls to random functions, like random() or
+   array_shuffle(), in the current session
can be repeated by re-issuing setseed() with the same
argument.
Without any prior setseed() call in the same
@@ -19398,6 +19400,40 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+
+ array_sample
+
+array_sample ( array anyarray, n integer )
+anyarray
+   
+   
+Returns n randomly chosen elements from array in selection order.
+   
+   
+array_sample(ARRAY[[1,2],[3,4],[5,6]], 2)
+{{5,6},{1,2}}
+   
+  
+
+  
+   
+
+ array_shuffle
+
+array_shuffle ( anyarray )
+anyarray
+   
+   
+Shuffles the first dimension of the array.
+   
+   
+array_shuffle(ARRAY[[1,2],[3,4],[5,6]])
+{{5,6},{1,2},{3,4}}
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 7c722ea2ce..42b65e58bb 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -109,6 +109,7 @@ OBJS = \
 	tsvector.o \
 	tsvector_op.o \
 	tsvector_parser.o \
+	user_prng.o \
 	uuid.o \
 	varbit.o \
 	varchar.o \
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index ca70590d7d..c4a2117df7 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -17,6 +17,7 @@
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/user_prng.h"
 #include "utils/typcache.h"
 
 
@@ -902,3 +903,178 @@ array_positions(PG_FUNCTION_ARGS)
 
 	PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext));
 }
+
+/*
+ * Produce array with n randomly chosen items from given array in random order.
+ *
+ * array: array object to examine (must not be NULL)
+ * n: number of items (must not be greater than the size of the array's first dimension)
+ * elmtyp, elmlen, elmbyval, elmalign: info for the datatype of the items
+ *
+ * NOTE: it would be cleaner to look up the elmlen/elmbyval/elmalign info
+ * from the system catalogs, given the elmtyp. However, the caller is
+ * in a better position to cache this info across multiple uses, or even
+ * to hard-wire values if the element type is hard-wired.
+ */
+static ArrayType *
+array_shuffle_n(ArrayType *array, int n,
+Oid elmtyp, int elmlen, bool elmbyval, char elmalign)
+{
+	ArrayType  *result;
+	int			ndim,
+			   *dims,
+			   *lbs,
+rdims[MAXDIM],
+nelm,
+nitem,
+i,
+j,
+k;
+	Datum		elm,
+			   *elms,
+			   *ielms,
+			   *jelms;
+	bool		nul,
+			   *nuls,
+			   *inuls,
+			   *jnuls;
+
+	ndim = ARR_NDIM(array);
+	dims = ARR_DIMS(array);
+	lbs = ARR_LBOUND(array);
+
+	if (ndim < 1 || dims[0] < 1 || n < 1)
+		return construct_empty_array(elmtyp);
+
+	deconstruct_array(array, elmtyp, elmlen, elmbyval, elmalign,
+					  &elms, &nuls, &nelm);
+
+	nitem = dims[0];			/* total number of items */
+	nelm /= nitem;			/* number of elements per item */
+
+	ielms = elms;
+	inuls = nuls;
+
+	/*
+	 * Shuffle array using Fisher-Yates 
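[The hunk is truncated here. The algorithm named in the comment is the classic Fisher-Yates shuffle; as a hedged, stand-alone sketch — plain C with rand(), not the actual patch code, which swaps whole items of nelm Datums at a time and draws from the user PRNG:]

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of a Fisher-Yates shuffle over an int array.  The patch applies
 * the same idea to parallel Datum/bool arrays, swapping nelm entries per
 * item; this simplified version swaps single ints.
 */
static void
shuffle_ints(int *a, int n)
{
	for (int i = n - 1; i > 0; i--)
	{
		/* pick j uniformly from [0, i]; the patch uses the shared PRNG */
		int		j = rand() % (i + 1);
		int		tmp = a[i];

		a[i] = a[j];
		a[j] = tmp;
	}
}
```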

Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF

2022-08-03 Thread Robert Treat
On Mon, Jul 18, 2022 at 10:39 AM Andrew Dunstan  wrote:
> On 2022-07-18 Mo 10:33, Justin Pryzby wrote:
> > It's easy to use CREATE TABLE..LIKE + ALTER..ATTACH PARTITION to avoid
> > acquiring a strong lock when creating a new partition.
> > But it's also easy to forget.
> >
> > commit 76c0d1198cf2908423b321cd3340d296cb668c8e
> > Author: Justin Pryzby 
> > Date:   Mon Jul 18 09:24:55 2022 -0500
> >
> > doc: mention CREATE+ATTACH PARTITION as an alternative to CREATE 
> > TABLE..PARTITION OF
> >
> > See also: 898e5e3290a72d288923260143930fb32036c00c
> > Should backpatch to v12
> >
> > diff --git a/doc/src/sgml/ref/create_table.sgml 
> > b/doc/src/sgml/ref/create_table.sgml
> > index 6bbf15ed1a4..db7d8710bae 100644
> > --- a/doc/src/sgml/ref/create_table.sgml
> > +++ b/doc/src/sgml/ref/create_table.sgml
> > @@ -619,6 +619,16 @@ WITH ( MODULUS  > class="parameter">numeric_literal, REM
> >with DROP TABLE requires taking an ACCESS
> >EXCLUSIVE lock on the parent table.
> >   
> > +
> > + 
> > +  Note that creating a partition acquires an ACCESS
> > +  EXCLUSIVE lock on the parent table.
> > +  It may be preferable to first CREATE a separate table and then 
> > ATTACH it,
> > +  which does not require as strong of a lock.
> > +  See ATTACH 
> > PARTITION
> > +  and  for more information.
> > + 
> > +
> >  
> > 
> >
>
> Style nitpick.
>
> I would prefer "does not require as strong a lock."
>
FWIW, this is proper grammar as well.

After reading this again, I wonder whether this advice would be more
appropriately placed in Section 5.11, aka
https://www.postgresql.org/docs/current/ddl-partitioning.html, but in
lieu of a specific suggestion for where to place it there (I haven't
settled on one yet), IMHO the first sentence of the suggested change
should be rewritten as:


Note that creating a partition using PARTITION OF
requires taking an ACCESS EXCLUSIVE lock on the parent table.
It may be preferable to first CREATE a separate table...


Robert Treat
https://xzilla.net




Re: support for SSE2 intrinsics

2022-08-03 Thread Masahiko Sawada
Hi,

On Wed, Aug 3, 2022 at 2:01 PM John Naylor  wrote:
>
>
> On Tue, Aug 2, 2022 at 11:53 PM Nathan Bossart  
> wrote:
> > I did a bit of cross-checking, and AFAICT this is a reasonable starting
> > point.  emmintrin.h appears to be sufficient for one of my patches that
> > makes use of SSE2 instructions.  That being said, I imagine it'll be
> > especially important to keep an eye on the buildfarm when this change is
> > committed.
>
> Thanks for checking! Here's a concrete patch for testing.

I also think it's a good start. There is a typo in the commit message:

s/hepler/helper/

The rest looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Cygwin cleanup

2022-08-03 Thread Tom Lane
Thomas Munro  writes:
> It may be madness to try to work around this, but I wonder if we could
> use a static local variable that we update with atomic compare
> exchange, inside PG_SIGNAL_HANDLER_ENTRY(), and
> PG_SIGNAL_HANDLER_EXIT() macros that do nothing on every other system.
> On entry, if you can do 0->1 it means you are allowed to run the
> function.  If it's non-zero, set n->n+1 and return immediately: signal
> blocked, but queued for later.  On exit, you CAS n->0.  If n was > 1,
> then you have to jump back to the top and run the function body again.

And ... we're expending all this effort for what exactly?

regards, tom lane




Re: Cygwin cleanup

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 4:16 PM Thomas Munro  wrote:
> On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby  wrote:
> > [train wreck]
>
> Oh my, so I'm getting the impression we might actually be totally
> unstable on Cygwin.  Which surprises me because ... wait a minute ...
> lorikeet isn't even running most of the tests.  So... we don't really
> know the degree to which any of this works at all?

Hmm, it's possible that all these failures are just new-to-me effects
of the known bug.  Certainly the assertion failures are of the usual
type, and I think it might be possible for the weird parallel query
failure to be explained by the postmaster forking extra phantom child
processes.

It may be madness to try to work around this, but I wonder if we could
use a static local variable that we update with atomic compare
exchange, inside PG_SIGNAL_HANDLER_ENTRY(), and
PG_SIGNAL_HANDLER_EXIT() macros that do nothing on every other system.
On entry, if you can do 0->1 it means you are allowed to run the
function.  If it's non-zero, set n->n+1 and return immediately: signal
blocked, but queued for later.  On exit, you CAS n->0.  If n was > 1,
then you have to jump back to the top and run the function body again.
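
[A minimal single-threaded sketch of the entry/exit protocol described above, using C11 atomics. All names here (pending, run_count, handle_signal) are invented for illustration, not from any patch:]

```c
#include <stdatomic.h>

static _Atomic int pending;		/* 0 = idle, n > 0 = running plus n-1 queued */
static int	run_count;			/* counts how many times the body actually ran */

/* Hypothetical handler following the ENTRY/EXIT protocol sketched above */
static void
handle_signal(int signo)
{
	int		expected = 0;

	(void) signo;

	/* ENTRY: only the 0->1 winner may run the body */
	if (!atomic_compare_exchange_strong(&pending, &expected, 1))
	{
		atomic_fetch_add(&pending, 1);	/* blocked: queue for later */
		return;
	}

	for (;;)
	{
		run_count++;			/* the function body goes here */

		/* EXIT: try 1->0; failure means signals were queued meanwhile */
		expected = 1;
		if (atomic_compare_exchange_strong(&pending, &expected, 0))
			break;
		atomic_store(&pending, 1);	/* consume queued signals, rerun body */
	}
}
```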




Re: Cleaning up historical portability baggage

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 3:43 PM Andres Freund  wrote:
> > We retain a HAVE_SHM_OPEN macro, because it's clearer to readers than
> > something like !defined(WIN32).
>
> I don't like these. I don't find them clearer - if we really just assume this
> to be the case on windows, it's easier to understand the checks if they talk
> about windows rather than having to know whether this specific check just
> applies to windows or potentially an unspecified separate set of systems.
>
> But I guess I should complain upthread...

Thanks for reviewing.

For this point, I'm planning to commit with those "vestigial" macros
that Tom asked for, and then we can argue about removing them
separately later.




Re: Generalize ereport_startup_progress infrastructure

2022-08-03 Thread Bharath Rupireddy
On Wed, Aug 3, 2022 at 12:11 AM Robert Haas  wrote:
>
> On Tue, Aug 2, 2022 at 3:25 AM Bharath Rupireddy
>  wrote:
> > ereport_startup_progress infrastructure added by commit 9ce346e [1]
> > will be super-useful for reporting progress of any long-running server
> > operations, not just the startup process operations. For instance,
> > postmaster can use it for reporting progress of temp file and temp
> > relation file removals [2], checkpointer can use it for reporting
> > progress of snapshot or mapping file processing or even WAL file
> > processing and so on. And I'm sure there can be many places in the
> > code where we have while or for loops which can, at times, take a long
> > time to finish and having a log message there would definitely help.
> >
> > Here's an attempt to generalize the ereport_startup_progress
> > infrastructure. The attached v1 patch places the code in elog.c/.h,
> > renames associated functions and variables, something like
> > ereport_startup_progress to ereport_progress,
> > log_startup_progress_interval to log_progress_report_interval and so
> > on.
>
> I'm not averse to reusing this infrastructure in other places, but I
> doubt we'd want all of those places to be controlled by a single GUC,
> especially because that GUC is also the on/off switch for the feature.

Thanks Robert! How about we tweak the function a bit -
begin_progress_report_phase(int timeout), so that each process can use
their own timeout interval? In  this case, do we want to retain
log_startup_progress_interval as-is specific to the startup process?
If yes, other processes might come up with their own GUCs (if they
don't want to use hard-coded timeouts) similar to
log_startup_progress_interval, which isn't the right way IMO.

I think the notion of ereport_progress feature being disabled when the
timeout is 0, makes sense to me at least.

On the flip side, what if we just have a single GUC
log_progress_report_interval (as proposed in the v1 patch)? Do we ever
want different processes to emit progress report messages at different
frequencies? Well, I can think of the startup process during standby
recovery needing to emit recovery progress report messages at a much
lower frequency than the startup process during the crash recovery.
Again, controlling the frequencies with different GUCs isn't the way
forward. But we can do something like: process 1 emits messages with a
frequency of 2*log_progress_report_interval, process 2 with a
frequency 4*log_progress_report_interval and so on without needing
additional GUCs.
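
[A hedged illustration of that multiplier idea; every name below is invented for this sketch, not a proposed API. Each process derives its own report deadline from the single base GUC times a per-process factor, and a value of 0 keeps the on/off semantics:]

```c
#include <stdbool.h>

/* invented stand-in for the single shared GUC, in seconds */
static int	log_progress_report_interval = 10;

typedef struct ProgressState
{
	int		multiplier;		/* per-process factor, e.g. 2 or 4 */
	long	next_report;	/* next time (seconds) a message is due */
} ProgressState;

/* Return true when a progress message is due, and schedule the next one. */
static bool
progress_report_due(ProgressState *state, long now)
{
	if (log_progress_report_interval <= 0)
		return false;			/* 0 disables the feature entirely */
	if (now < state->next_report)
		return false;
	state->next_report = now + log_progress_report_interval * state->multiplier;
	return true;
}
```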

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Re: Cygwin cleanup

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-04 16:16:06 +1200, Thomas Munro wrote:
> Ok, that's slightly reassuring, so maybe we *can* fix this, but I'm
> one step closer to what Tom said, re wasting developer time...

It might be worth checking whether the cygwin installer, which at some point
at least allowed installing postgres, has download numbers available anywhere.

It's possible we could e.g. get away with just allowing libpq to be built.

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 4:09 PM Tom Lane  wrote:
> Andres Freund  writes:
> >> XXX This can only be committed once prairedog is decommissioned, because
> >> macOS 10.4 didn't have clock_gettime().
>
> > Maybe put it later in the queue?
>
> clock_gettime is required by SUSv2 (1997), so I have to admit that
> macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
> prairiedog is just sitting there doing its thing until I find cycles
> to install a newer OS.  If you want to move ahead with this, don't
> let prairiedog block you.

Thanks, will do.  Just having an argument with MSYS about something I
seem to have messed up in the most recent version, and then I'll start
pushing these...




Re: Cygwin cleanup

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 3:38 PM Justin Pryzby  wrote:
> [train wreck]

Oh my, so I'm getting the impression we might actually be totally
unstable on Cygwin.  Which surprises me because ... wait a minute ...
lorikeet isn't even running most of the tests.  So... we don't really
know the degree to which any of this works at all?

> This shows that it *can* pass, if slowly, and infrequently:
> https://cirrus-ci.com/task/6546858536337408

Ok, that's slightly reassuring, so maybe we *can* fix this, but I'm
one step closer to what Tom said, re wasting developer time...

> [lots of improvements]

Cool.

> Why did you write "|| exit /b 1" in all the bash invocations ?  I think cirrus
> handles that automatically.

Cargo-culted from libarchive.




Re: Refactoring postgres_fdw/connection.c

2022-08-03 Thread Amul Sul
On Thu, Jul 28, 2022 at 11:56 AM Fujii Masao
 wrote:
>
>
>
> On 2022/07/27 10:36, Kyotaro Horiguchi wrote:
> > At Tue, 26 Jul 2022 18:33:04 +0900, Fujii Masao 
> >  wrote in
> >>> I'm not sure the two are similar with each other.  The new function
> >>> pgfdw_exec_pre_commit() looks like a merger of two isolated code paths
> >>> intended to share a seven-line codelet.  I feel the code gets a bit
> >>> harder to understand after the change.  I mildly oppose to this part.
> >>
> >> If so, we can split pgfdw_exec_pre_commit() into two, one is the common
> >> function that sends or executes the command (i.e., calls
> >> do_sql_command_begin() or do_sql_command()), and another is
> >> the function only for toplevel. The latter function calls
> >> the common function and then executes DEALLOCATE ALL things.
> >>
> >> But this is not the way that other functions like
> >> pgfdw_abort_cleanup()
> >> is implemented. Those functions have both codes for toplevel and
> >> !toplevel (i.e., subxact), and run the processings depending
> >> on the argument "toplevel". So I'm thinking that
> >> pgfdw_exec_pre_commit() implemented in the same way is better.
> >
> > I didn't see it from that viewpoint but I don't think that
> > unconditionally justifies other refactoring.  If we merge
> > pgfdw_finish_pre_(sub)?commit_cleanup()s this way, in turn
> > pgfdw_subxact_callback() and pgfdw_xact_callback() are going to be
> > almost identical except event IDs to handle. But I don't think we
> > would want to merge them.
>
> I don't think they are so identical because (as you say) they have to handle 
> different event IDs. So I agree we don't want to merge them.
>
>
> > A concern on 0002 is that it is hiding the subxact-specific steps from
> > the subxact callback.  It would look reasonable if it were called from
> > two or more places for each topleve and !toplevel, but actually it has
> > only one caller for each.  So I think that pgfdw_exec_pre_commit
> > should not do that and should be renamed to pgfdw_commit_remote() or
> > something.  On the other hand pgfdw_finish_pre_commit() hides
> > toplevel-specific steps from the caller so the same argument holds.
>
> So you conclusion is to rename pgfdw_exec_pre_commit() to 
> pgfdw_commit_remote() or something?
>
>
> > Another point that makes me concern about the patch is the new
> > function takes an SQL statement, along with the toplevel flag. I guess
> > the reason is that the command for subxact (RELEASE SAVEPOINT %d)
> > requires the current transaction level.  However, the values are
> > obtainable very cheaply within the cleanup functions. So I propose to
> > get rid of the parameter "sql" from the two functions.
>
> Yes, that's possible. That is, pgfdw_exec_pre_commit() can construct the 
> query string by executing the following codes, instead of accepting the query 
> as an argument. But one downside of this approach is that the following codes 
> are executed for every remote subtransaction entries. Maybe it's cheap to 
> construct the query string as follows, but I'd like to avoid any unnecessray 
> overhead if possible. So the patch makes the caller, 
> pgfdw_subxact_callback(), construct the query string only once and give it to 
> pgfdw_exec_pre_commit().
>
> curlevel = GetCurrentTransactionNestLevel();
> snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);
>

Another possibility I can see is that instead of calling
pgfdw_exec_pre_commit() (similarly pgfdw_abort_cleanup) for every
connection entry, we should call that once from the callback function,
and for that we need to move the hash table loop inside that function.

The structure of the callback function looks a little fuzzy to me:
the same event is checked for every entry of the connection hash
table. Instead, the loop could live inside those functions (e.g.
pgfdw_exec_pre_commit and pgfdw_abort_cleanup), so that each function
is called once per event and itself takes care of every entry of the
connection hash table. The benefit is that we would save a few
processing cycles otherwise needed to match events and call the same
function for each connection entry.

I tried this refactoring in 0004 patch which is not complete, and
reattaching other patches too to make CFbot happy.

Thoughts? Suggestions?

Regards,
Amul
From d6e241cb946afe3b74c2893bda6dab8d3288716b Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Mon, 25 Jul 2022 23:27:14 +0900
Subject: [PATCH v2 3/4] Merge pgfdw_finish_pre_commit_cleanup and
 pgfdw_finish_pre_subcommit_cleanup into one.

---
 contrib/postgres_fdw/connection.c | 78 ++-
 1 file changed, 25 insertions(+), 53 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index ec290459be3..6e23046ad69 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -113,9 +113,8 @@ static bool pgfdw_get_result_timed(PGconn *conn, TimestampTz 

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Dilip Kumar
On Thu, Aug 4, 2022 at 12:18 AM Justin Pryzby  wrote:
>
> On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> > Hm. This looks more like an issue of DROP DATABASE not being interruptible. 
> > I
> > suspect this isn't actually related to STRATEGY wal_log and could likely be
> > reproduced in older versions too.
>
> I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
> That may just mean that it's easier to hit now.

I think this looks like a problem with drop db, but IMHO you are seeing
this behavior only when a database is created using WAL_LOG, because in
this strategy we use buffers to write the destination database pages,
and some dirty buffers and sync requests might still be pending.  When
we then try to drop the database, it drops all the dirty buffers and
all pending sync requests, and then, before it actually removes the
directory, it gets interrupted; now you see a database directory on
disk that is partially corrupted.  See the below sequence of drop
database:


dropdb()
{
...
DropDatabaseBuffers(db_id);
...
ForgetDatabaseSyncRequests(db_id);
...
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
 -- Inside this it can process the cancel query and get interrupted
remove_dbtablespaces(db_id);
..
}

I reproduced the same error by inducing error just before
WaitForProcSignalBarrier.

postgres[14968]=# CREATE DATABASE a STRATEGY WAL_LOG ; drop database a;
CREATE DATABASE
ERROR:  XX000: test error
LOCATION:  dropdb, dbcommands.c:1684
postgres[14968]=# \c a
connection to server on socket "/tmp/.s.PGSQL.5432" failed: PANIC:
could not open critical system index 2662
Previous connection kept
postgres[14968]=#


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Cleaning up historical portability baggage

2022-08-03 Thread Tom Lane
Andres Freund  writes:
>> XXX This can only be committed once prairedog is decommissioned, because
>> macOS 10.4 didn't have clock_gettime().

> Maybe put it later in the queue?

clock_gettime is required by SUSv2 (1997), so I have to admit that
macOS 10.4 doesn't have a lot of excuse not to have it.  In any case,
prairiedog is just sitting there doing its thing until I find cycles
to install a newer OS.  If you want to move ahead with this, don't
let prairiedog block you.

regards, tom lane




Re: Cygwin cleanup

2022-08-03 Thread Andres Freund
Hi,

On 2022-07-28 17:23:19 -0500, Justin Pryzby wrote:
> On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
> > > > XXX This should use a canned Docker image with all the right packages
> > > > installed
> > >
> > > Has anyone tried using non-canned images ?  It sounds like this could 
> > > reduce
> > > the 4min startup time for windows.
> > >
> > > https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
> >
> > Yeah, I had that working once.  Not sure what the pros and cons would be 
> > for us.
>
> I think it could be a lot faster to start, since cirrus caches the generated
> docker image locally.  Rather than (I gather) pulling the image every time.

I'm quite certain that is not true. All the docker images built are just
uploaded to the google container registry and then downloaded onto a
*separate* windows host. The dockerfile: stuff generates a separate task
running on a separate machine...

It's a bit better for non-windows containers, because there google has some
optimization for pulling image (pieces) on demand or such.

Greetings,

Andres Freund




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-03 Thread Amit Kapila
On Mon, Aug 1, 2022 at 9:52 PM Önder Kalacı  wrote:
>
> Hi,
>
> As far as I can see, the following is the answer to the only remaining open 
> discussion in this thread. Let me know if anything is missed.
>
>> (b) it appears to me that the patch decides
>> >> which index to use the first time it opens the rel (or if the rel gets
>> >> invalidated) on subscriber and then for all consecutive operations it
>> >> uses the same index. It is quite possible that after some more
>> >> operations on the table, using the same index will actually be
>> >> costlier than a sequence scan or some other index scan
>> >
>> >
>> > Regarding (b), yes that is a concern I share. And, I was actually 
>> > considering sending another patch regarding this.
>> >
>> > Currently, I can see two options and happy to hear your take on these (or 
>> > maybe another idea?)
>> >
>> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE 
>> > or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to 
>> > re-create the cache entries. In this case, as far as I can see, we need a 
>> > callback that is called when table "ANALYZE"d, because that is when the 
>> > statistics change. That is the time picking a new index makes sense.
>> > However, that seems like adding another dimension to this patch, which I 
>> > can try but also see that committing becomes even harder.
>> >
>>
>> This idea sounds worth investigating. I see that this will require
>> more work but OTOH, we can't allow the existing system to regress
>> especially because depending on workload it might regress badly. We
>> can create a patch for this atop the base patch for easier review/test
>> but I feel we need some way to address this point.
>>
>
> It turns out that we already invalidate the relevant entries in 
> LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of 
> the statistics in pg_class.
>
> The call-stack for analyze is roughly:
> do_analyze_rel()
>-> vac_update_relstats()
>  -> heap_inplace_update()
>  -> if needs to apply any statistical change
>  -> CacheInvalidateHeapTuple()
>

Yeah, it appears that this will work, but I see that we don't update
here for inherited stats; how does it work in such cases?

-- 
With Regards,
Amit Kapila.




Re: Eliminating SPI from RI triggers - take 2

2022-08-03 Thread Amit Langote
On Wed, Jul 13, 2022 at 8:59 PM Amit Langote  wrote:
> On Sat, Jul 9, 2022 at 1:15 AM Robert Haas  wrote:
> Thanks for taking a look at this.  I'll try to respond to other points
> in a separate email, but I wanted to clarify something about below:
>
> > I find my ego slightly wounded by the comment that "the partition
> > descriptor machinery has a hack that assumes that the queries
> > originating in this module push the latest snapshot in the
> > transaction-snapshot mode." It's true that the partition descriptor
> > machinery gives different answers depending on the active snapshot,
> > but, err, is that a hack, or just a perfectly reasonable design
> > decision?
>
> I think my calling it a hack of "partition descriptor machinery" is
> not entirely fair (sorry), because it's talking about the following
> comment in find_inheritance_children_extended(), which describes it as
> being a hack, so I mentioned the word "hack" in my comment too:
>
> /*
>  * Cope with partitions concurrently being detached.  When we see a
>  * partition marked "detach pending", we omit it from the returned set
>  * of visible partitions if caller requested that and the tuple's xmin
>  * does not appear in progress to the active snapshot.  (If there's no
>  * active snapshot set, that means we're not running a user query, so
>  * it's OK to always include detached partitions in that case; if the
>  * xmin is still running to the active snapshot, then the partition
>  * has not been detached yet and so we include it.)
>  *
>  * The reason for this hack is that we want to avoid seeing the
>  * partition as alive in RI queries during REPEATABLE READ or
>  * SERIALIZABLE transactions: such queries use a different snapshot
>  * than the one used by regular (user) queries.
>  */
>
> That bit came in to make DETACH CONCURRENTLY produce sane answers for
> RI queries in some cases.
>
> I guess my comment should really have said something like:
>
> HACK: find_inheritance_children_extended() has a hack that assumes
> that the queries originating in this module push the latest snapshot
> in transaction-snapshot mode.

Posting a new version with this bit fixed; cfbot complained that 0002
needed a rebase over 3592e0ff98.

I will try to come up with a patch to enhance the PartitionDirectory
interface to allow passing the snapshot to use when scanning
pg_inherits explicitly, so we won't need the above "hack".

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v3-0002-Avoid-using-an-SQL-query-for-some-RI-checks.patch
Description: Binary data


v3-0001-Avoid-using-SPI-in-RI-trigger-functions.patch
Description: Binary data


Re: Cleaning up historical portability baggage

2022-08-03 Thread Andres Freund
Hi,

Can we get a few more of these committed soon? It's all tests that I need to
sync with the meson stuff, and I'd rather get it over with :). And it reduces
the set of tests that need to be compared...  Or is there a blocker (leaving
the prairedog one aside)?


On 2022-08-03 14:25:01 +1200, Thomas Munro wrote:
> Subject: [PATCH v3 01/13] Remove configure probe for dlopen.
> Subject: [PATCH v3 02/13] Remove configure probe and extra tests for
>  getrlimit.

LGTM.


> From 96a4935ff9480c2786634e9892b1f44782b403fb Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Sat, 23 Jul 2022 23:49:27 +1200
> Subject: [PATCH v3 03/13] Remove configure probe for shm_open.
>
> shm_open() is in SUSv2 (realtime) and all targeted Unix systems have it.
>
> We retain a HAVE_SHM_OPEN macro, because it's clearer to readers than
> something like !defined(WIN32).

I don't like these. I don't find them clearer - if we really just assume this
to be the case on windows, it's easier to understand the checks if they talk
about windows rather than having to know whether this specific check just
applies to windows or potentially an unspecified separate set of systems.

But I guess I should complain upthread...


> Subject: [PATCH v3 04/13] Remove configure probe for setsid.

LGTM.



> Subject: [PATCH v3 05/13] Remove configure probes for symlink/readlink, and
>  dead code.

Nice win.

>

> From 143f6917bbc7d8f457d52d02a5fbc79d849744e1 Mon Sep 17 00:00:00 2001
> From: Thomas Munro 
> Date: Sun, 24 Jul 2022 01:19:05 +1200

> Subject: [PATCH v3 06/13] Remove configure probe for link.
> --- a/src/include/port.h
> +++ b/src/include/port.h
> @@ -402,7 +402,8 @@ extern float pg_strtof(const char *nptr, char **endptr);
>  #define strtof(a,b) (pg_strtof((a),(b)))
>  #endif
>
> -#ifndef HAVE_LINK
> +#ifdef WIN32
> +/* src/port/win32link.c */
>  extern int   link(const char *src, const char *dst);
>  #endif

It bothers me that we have all this windows crap in port.h instead of
win32_port.h. But that's not this patch's fault.


> Subject: [PATCH v3 07/13] Remove dead replacement code for clock_gettime().

Nice.


> XXX This can only be committed once prairedog is decommissioned, because
> macOS 10.4 didn't have clock_gettime().

Maybe put it later in the queue?


> Subject: [PATCH v3 08/13] Remove configure probes for poll and poll.h.
>
> poll() and  are in SUSv2 and all targeted Unix systems have
> them.
>
> Retain HAVE_POLL and HAVE_POLL_H macros for readability.  There's an
> error in latch.c that is now unreachable (since logically we always have
> one of WIN32 or HAVE_POLL defined), but that falls out of a decision to
> keep using defined(HAVE_POLL) instead of !defined(WIN32) to guard the
> poll() code.

Wonder if we instead should add an empty poll.h to src/include/port/win32?


> Subject: [PATCH v3 09/13] Remove dead setenv, unsetenv replacement code.
> Subject: [PATCH v3 10/13] Remove dead pread and pwrite replacement code.

LGTM.


> Subject: [PATCH v3 11/13] Simplify replacement code for preadv and pwritev.
>
> preadv() and pwritev() are not standardized by POSIX.  Most targeted
> Unix systems have had them for more than a decade, since they are
> obvious combinations of standard p- and -v functions.
>
> In 15, we had two replacement implementations: one based on lseek() + -v
> function if available, and the other based on a loop over p- function.
> They aren't used for much yet, but are heavily used in a current
> proposal.
>
> Supporting two ways of falling back, at the cost of having a
> pg_preadv/pg_pwritev that could never be used in a multi-threaded
> program accessing the same file descriptor from two threads without
> unpleasant locking does not sound like a good trade.
>
> Therefore, drop the lseek()-based variant, and also the pg_ prefix, now
> that the file position portability hazard is gone.  Previously, both
> fallbacks had the file position portability hazard, because our
> pread()/pwrite() replacement had the same hazard, but that problem has
> been fixed for pread()/pwrite() by an earlier commit.  Now the way is
> clear to expunge the file position portability hazard of the
> lseek()-based variants too.
>
> At the time of writing, the following systems in our build farm lack
> native preadv/pwritev and thus use fallback code:
>
>  * Solaris (but not illumos)
>  * macOS before release 11.0
>  * Windows with Cygwin
>  * Windows native
>
> With this commit, all of the above systems will now use the *same*
> fallback code, the one that loops over pread()/pwrite() (which is
> translated to equivalent calls in Windows).  Previously, all but Windows
> native would use the readv()/writev()-based fallback that this commit
> removes.

Given that it's just solaris and old macOS that "benefited" from writev, just
using the "full" fallback there makes sense.


> Subject: [PATCH v3 12/13] Remove fdatasync configure probe.

> @@ -1928,7 +1925,6 @@ if test "$PORTNAME" = "win32"; then
>AC_CHECK_FUNCS(_configthreadlocale)
>

Re: Cygwin cleanup

2022-08-03 Thread Justin Pryzby
On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
> > > XXX We would never want this to run by default in CI, but it'd be nice
> > > to be able to ask for it with ci-os-only!  (See commented out line)
> > >  only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
> >
> > Doesn't this already do what's needed?
> > As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
> > the task will run only on request.
> 
> Yeah I was just trying to say that I was sharing the script in a way
> that always runs, but for commit we'd want that.

That makes more sense after noticing that you created a cf entry (for which
cfbot has been skipping my patch due to my "only_if" line).  There are still a
few persistent issues:

This fails ~50% of the time in recovery 010-truncate
I hacked around this by setting data_sync_retry.
https://cirrus-ci.com/task/5289444063313920
I found these, not sure if they're relevant.
https://www.postgresql.org/message-id/flat/CAA4eK1Kft05mwNuZbTVRmz8SNS3r%2BuriuCT8DxL5KJy5btoS-A%40mail.gmail.com
https://www.postgresql.org/message-id/flat/CAFiTN-uGxgo5258hZy2QJoz%3Ds7_Cs7v9%3Db8Z2GgFV7qmQUOwxw%40mail.gmail.com

And an fsync abort in 013 which seems similar to this other one.
data_sync_retry also avoids this issue.
https://cirrus-ci.com/task/6283023745286144?logs=cores#L34
https://www.postgresql.org/message-id/flat/CAMVYW_4QhjZ-19Xpr2x1B19soRCNu1BXHM8g1mOnAVtd5VViDw%40mail.gmail.com

And sometimes various assertions fail in regress parallel_select (and then it
times out)
https://api.cirrus-ci.com/v1/artifact/task/5537540282253312/log/src/test/regress/log/postmaster.log
https://api.cirrus-ci.com/v1/artifact/task/6108746773430272/log/src/test/regress/log/postmaster.log
Or "could not map dynamic shared memory segment" (actually in 
027-stream-regress):
https://cirrus-ci.com/task/6168860746317824

And segfault in vacuum parallel
https://api.cirrus-ci.com/v1/artifact/task/5404589569605632/log/src/test/regress/log/postmaster.log

Sometimes semctl() failed: Resource temporarily unavailable
https://api.cirrus-ci.com/v1/artifact/task/5027860623654912/log/src/test/subscription/tmp_check/log/014_binary_publisher.log
https://api.cirrus-ci.com/v1/artifact/task/5027860623654912/log/src/bin/pg_rewind/tmp_check/log/001_basic_standby_local.log

Some more
https://cirrus-ci.com/task/6468927780814848

If you're lucky, there are only one or two problems, of which those are
different symptoms.  Maybe for now this needs to disable TAP tests :(

This shows that it *can* pass, if slowly, and infrequently:
https://cirrus-ci.com/task/6546858536337408

This fixes my changes to configure for getopt.
And simplifies the changes to *.pl (the .exe changes weren't necessary at all).
And removes the changes for implicit-fallthrough; I realized that configure was
  just deciding that it didn't work and not using it at all.
And adds support for backtraces.
And removes kerberos and adds libxml

Why did you write "|| exit /b 1" in all the bash invocations?  I think cirrus
handles that automatically.

-- 
Justin
>From b929ea7acc33a2fda1ec10693736a2fa83d364e1 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 25 Jul 2022 23:05:10 +1200
Subject: [PATCH v3] WIP CI support for Cygwin.

ci-os-only: cygwin

See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com

https://cirrus-ci.com/task/5145086722834432

XXX This should use a canned Docker image with all the right packages
installed?  But if the larger image is slower to start, then maybe not...
---
 .cirrus.yml   | 67 +++
 configure |  2 +-
 configure.ac  |  2 +-
 src/test/perl/PostgreSQL/Test/Cluster.pm  |  4 +-
 src/test/perl/PostgreSQL/Test/Utils.pm| 12 +++-
 src/test/recovery/t/020_archive_status.pl |  2 +-
 src/tools/ci/cores_backtrace.sh   | 28 +-
 src/tools/ci/pg_ci_base.conf  |  2 +
 8 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 4b7918ef456..84341ac1b94 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -34,6 +34,7 @@ on_failure: _failure
   - "**/*.log"
   - "**/*.diffs"
   - "**/regress_log_*"
+  - "**/*.stackdump"
 type: text/plain
 
 task:
@@ -464,6 +465,72 @@ task:
   type: text/plain
 
 
+task:
+  name: Windows - Cygwin
+  #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
+  timeout_in: 90m
+
+  env:
+CPUS: 4
+BUILD_JOBS: 4
+TEST_JOBS: 1
+CCACHE_DIR: /tmp/ccache
+CONFIGURE_FLAGS: --enable-debug --enable-tap-tests --with-ldap --with-ssl=openssl --with-libxml --enable-cassert
+# --with-gssapi
+CONFIGURE_CACHE: /tmp/ccache/configure.cache
+PG_TEST_USE_UNIX_SOCKETS: 1
+CCACHE_LOGFILE: ccache.log
+EXTRA_REGRESS_OPTS: --max-connections=1
+PG_TEST_EXTRA: ldap ssl # disable kerberos
+
+  windows_container:
+image: cirrusci/windowsservercore:2019
+os_version: 

Re: PostgreSQL 15 minor fixes in protocol.sgml

2022-08-03 Thread Amit Kapila
On Wed, Aug 3, 2022 at 4:23 PM Alvaro Herrera  wrote:
>
> On 2022-Aug-03, Amit Kapila wrote:
>
> > Thanks for the report and Thanks Michael for including me. I am just
> > redirecting it to -hackers so that others involved in this feature
> > also can share their views.
>
> I'm sorry, but our policy is that crossposts are not allowed.  I think
> this policy is bad, precisely because it prevents legitimate cases like
> this one; but it is what it is.
>
> I think we should change the policy, not back to allow indiscriminate
> cross-posting, but to allow some limited form of it.  For example I
> think pg-bugs+pg-hackers and pg-docs+pg-hackers should be allowed
> combinations.  Just saying.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-08-03 Thread Thomas Munro
On Sun, Jul 31, 2022 at 11:17 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > I noticed this is a 32 bit FBSD system.  Is it running on UFS, perhaps
> > on slow storage?  Are soft updates enabled (visible as options in
> > output of "mount")?
>
> It's an ancient (2006) mac mini with 5400RPM spinning rust.
> "mount" says
>
> /dev/ada0s2a on / (ufs, local, soft-updates, journaled soft-updates)
> devfs on /dev (devfs)

I don't have all the details and I may be way off here but I have the
impression that when you create and then unlink trees of files
quickly, sometimes soft-updates are flushed synchronously, which turns
into many 5400 RPM seeks; dtrace could be used to check, but some
clues in your numbers would be some kind of correlation between time
and number of clusters that are set up and torn down by each test.
Without soft-updates, it'd be much worse, because then many more
things become synchronous I/O.  Even with write caching enabled,
soft-updates flush the drive cache when there's a barrier needed for
crash safety.  It may also be that there is something strange about
Apple hardware that makes it extra slow at full-cache-flush operations
(cf unexplainable excess slowness of F_FULLFSYNC under macOS including
old spinning rust systems and current flash systems, and complaints
about this general area on current Apple hardware from the Asahi
Linux/M1 port people, though how relevant that is to 2006 spinning
rust I dunno).  It would be nice to look into how to tune, fix or work
around all of that, as it also affects CI, which has IO limits
(though admittedly a couple of orders of mag higher IOPS than 5400
RPM).




Re: [PATCH] postgresql.conf.sample comment alignment.

2022-08-03 Thread Peter Smith
On Thu, Aug 4, 2022 at 11:09 AM Michael Paquier  wrote:
>
> On Wed, Aug 03, 2022 at 12:58:04PM +0200, Alvaro Herrera wrote:
> > On 2022-Aug-01, Tom Lane wrote:
> >> One idea for avoiding confusion is to legislate that we won't
> >> use tabs at all in this file (which we could enforce via
> >> .gitattributes, I think).
> >
> > +1.
>
> That's not the first time this 4- or 8-character tab issue has popped
> up around here, so enforcing spaces and having a rule sounds like a
> good idea in the end.
>

Well, it was only assumed that I had probably confused 4- and 8-character
tabs, but I don't think I did, so the tabbing issue did not really "pop
up" here.

e.g. you can see some of the existing alignments I'd suggested
modifying here [1]
- #shared_preload_libraries = '' # (change requires restart)
- #idle_in_transaction_session_timeout = 0 # in milliseconds, 0 is
disable <- (moved comments of the neighbours to keep them all aligned)
- etc.

I'm not saying replacing the tabs with spaces isn't a good idea - I
also agree it probably is, but that's a different problem from the
alignment issues I was trying to correct with the patch.

--
[1] 
https://github.com/postgres/postgres/blob/master/src/backend/utils/misc/postgresql.conf.sample

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Cleaning up historical portability baggage

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 21:52:04 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Another potential cleanup is the fallback for strtoll/strtoull.
>
> +1, I suspect the alternate spellings are dead.

Looks like that includes systems where there's no declaration for strtoll,
strtoull. The test was introduced in

commit a6228128fc48c222953dfd41fd438522a184054c
Author: Tom Lane 
Date:   2018-05-18 22:42:10 -0400

Arrange to supply declarations for strtoll/strtoull if needed.

The check was introduced for animal dromedary, afaics. Looks like that stopped
reporting 2019-09-27 and transformed into florican.

A query on the bf database didn't see any runs in the last 30 days that didn't
have strtoll declared.

See attached patch.

Greetings,

Andres Freund
>From 971cd9491f27b52b982b0a9735bc0e678805e14b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 3 Aug 2022 19:21:56 -0700
Subject: [PATCH v1] Remove fallbacks for strtoll, strtoull

strtoll was backfilled with either __strtoll or strtoq on systems without
strtoll. The last such system on the buildfarm was an ancient HP-UX animal. We
don't support HP-UX anymore, so remove.

On other systems strtoll was present, but did not have a declaration. The last
known instance on the buildfarm was running an ancient OSX and shut down in
2019.

Discussion: https://postgr.es/m/20220804013546.h65najrzig764...@awork3.anarazel.de
---
 src/include/c.h| 29 ---
 src/include/pg_config.h.in | 26 -
 configure  | 47 --
 configure.ac   |  5 
 src/tools/msvc/Solution.pm |  8 ---
 5 files changed, 115 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index d35405f191a..8c4baeb0ec3 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1294,35 +1294,6 @@ typedef union PGAlignedXLogBlock
 extern int	fdatasync(int fildes);
 #endif
 
-/* Older platforms may provide strto[u]ll functionality under other names */
-#if !defined(HAVE_STRTOLL) && defined(HAVE___STRTOLL)
-#define strtoll __strtoll
-#define HAVE_STRTOLL 1
-#endif
-
-#if !defined(HAVE_STRTOLL) && defined(HAVE_STRTOQ)
-#define strtoll strtoq
-#define HAVE_STRTOLL 1
-#endif
-
-#if !defined(HAVE_STRTOULL) && defined(HAVE___STRTOULL)
-#define strtoull __strtoull
-#define HAVE_STRTOULL 1
-#endif
-
-#if !defined(HAVE_STRTOULL) && defined(HAVE_STRTOUQ)
-#define strtoull strtouq
-#define HAVE_STRTOULL 1
-#endif
-
-#if defined(HAVE_STRTOLL) && !HAVE_DECL_STRTOLL
-extern long long strtoll(const char *str, char **endptr, int base);
-#endif
-
-#if defined(HAVE_STRTOULL) && !HAVE_DECL_STRTOULL
-extern unsigned long long strtoull(const char *str, char **endptr, int base);
-#endif
-
 /*
  * Thin wrappers that convert strings to exactly 64-bit integers, matching our
  * definition of int64.  (For the naming, compare that POSIX has
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index f9618e19863..00abe62f5c6 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -161,14 +161,6 @@
don't. */
 #undef HAVE_DECL_STRNLEN
 
-/* Define to 1 if you have the declaration of `strtoll', and to 0 if you
-   don't. */
-#undef HAVE_DECL_STRTOLL
-
-/* Define to 1 if you have the declaration of `strtoull', and to 0 if you
-   don't. */
-#undef HAVE_DECL_STRTOULL
-
 /* Define to 1 if you have the `dlopen' function. */
 #undef HAVE_DLOPEN
 
@@ -531,18 +523,6 @@
 /* Define to 1 if you have the `strtof' function. */
 #undef HAVE_STRTOF
 
-/* Define to 1 if you have the `strtoll' function. */
-#undef HAVE_STRTOLL
-
-/* Define to 1 if you have the `strtoq' function. */
-#undef HAVE_STRTOQ
-
-/* Define to 1 if you have the `strtoull' function. */
-#undef HAVE_STRTOULL
-
-/* Define to 1 if you have the `strtouq' function. */
-#undef HAVE_STRTOUQ
-
 /* Define to 1 if the system has the type `struct addrinfo'. */
 #undef HAVE_STRUCT_ADDRINFO
 
@@ -747,12 +727,6 @@
 /* Define to 1 if your compiler understands _Static_assert. */
 #undef HAVE__STATIC_ASSERT
 
-/* Define to 1 if you have the `__strtoll' function. */
-#undef HAVE___STRTOLL
-
-/* Define to 1 if you have the `__strtoull' function. */
-#undef HAVE___STRTOULL
-
 /* Define to the appropriate printf length modifier for 64-bit ints. */
 #undef INT64_MODIFIER
 
diff --git a/configure b/configure
index c5bc3823958..b9add9022b8 100755
--- a/configure
+++ b/configure
@@ -17230,53 +17230,6 @@ $as_echo "#define HAVE_INT_OPTRESET 1" >>confdefs.h
 
 fi
 
-for ac_func in strtoll __strtoll strtoq
-do :
-  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
-ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
-if eval test \"x\$"$as_ac_var"\" = x"yes"; then :
-  cat >>confdefs.h <<_ACEOF
-#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
-_ACEOF
- break
-fi
-done
-
-for ac_func in strtoull __strtoull strtouq
-do :
-  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
-ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
-if eval test 

Re: [PATCH] postgresql.conf.sample comment alignment.

2022-08-03 Thread Julien Rouhaud
On Thu, Aug 04, 2022 at 10:09:27AM +0900, Michael Paquier wrote:
> On Wed, Aug 03, 2022 at 12:58:04PM +0200, Alvaro Herrera wrote:
> > On 2022-Aug-01, Tom Lane wrote:
> >> One idea for avoiding confusion is to legislate that we won't
> >> use tabs at all in this file (which we could enforce via
> >> .gitattributes, I think).
> >
> > +1.
>
> That's not the first time this 4- or 8-character tab issue has popped
> up around here, so enforcing spaces and having a rule sounds like a
> good idea in the end.

+1




Re: Mingw task for Cirrus CI

2022-08-03 Thread Thomas Munro
I noticed that this says:

[01:01:45.657] sqlda.pgc: In function 'dump_sqlda':
[01:01:45.657] sqlda.pgc:45:33: warning: format '%d' expects argument of type 'int', but argument 3 has type 'long long int' [-Wformat=]
[01:01:45.657] 45 | "name sqlda descriptor: '%s' value %I64d\n",
[01:01:45.657] | ^~~
[01:01:45.657] ..
[01:01:45.657] 49 | sqlda->sqlvar[i].sqlname.data, *(long long int *)sqlda->sqlvar[i].sqldata);
[01:01:45.657] | ~~
[01:01:45.657] | |
[01:01:45.657] | long long int

... but fairywren doesn't.  Why would they disagree on recognising %I64d?

The other warning I'm seeing is present on both, so it's not really
relevant to this thread, just mentioning it...  seems kinda like a
worlds-colliding-problem without an elegant fix (POSIX says you have
to declare it yourself exactly like that, but Windows says linkage
ain't going to work the way you want unless you sprinkle the right
linkage dust on it...), so maybe we just want to put #if
!defined(something something mingw) around it and defer to MinGW's
header's declaration...

[00:48:08.925] c:/cirrus/src/backend/postmaster/postmaster.c: In
function 'PostmasterMain':
[00:48:08.925] c:/cirrus/src/backend/postmaster/postmaster.c:973:31:
warning: '__p__environ' redeclared without dllimport attribute:
previous dllimport ignored [-Wattributes]
[00:48:08.925] 973 | extern char **environ;
[00:48:08.925] | ^~~




Re: Cleaning up historical portability baggage

2022-08-03 Thread Tom Lane
Andres Freund  writes:
> Another potential cleanup is the fallback for strtoll/strtoull.

+1, I suspect the alternate spellings are dead.

regards, tom lane




Re: Introduce wait_for_subscription_sync for TAP tests

2022-08-03 Thread Amit Kapila
On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila  wrote:
>
> Pushed this one and now I'll look at your other patch.
>

I have pushed the second patch as well after making minor changes in
the comments. Alvaro [1] and Tom [2] suggest to back-patch this and
they sound reasonable to me. Will you be able to produce back branch
patches?

[1] - 
https://www.postgresql.org/message-id/20220803104544.k2luy5hr2ugnhgr2%40alvherre.pgsql
[2] - https://www.postgresql.org/message-id/2966703.1659535343%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: Cleaning up historical portability baggage

2022-08-03 Thread Andres Freund
Hi,

Another potential cleanup is the fallback for strtoll/strtoull. Some of the
spellings were introduced because of "ancient HPUX":

commit 06f66cff9e0b93db81db1595156b2aff8ba1786e
Author: Tom Lane 
Date:   2018-05-19 14:22:18 -0400

Support platforms where strtoll/strtoull are spelled __strtoll/__strtoull.

Ancient HPUX, for one, does this.  We hadn't noticed due to the lack
of regression tests that required a working strtoll.

(I was slightly tempted to remove the other historical spelling,
strto[u]q, since it seems we have no buildfarm members testing that case.
But I refrained.)

Discussion: 
https://postgr.es/m/151935568942.1461.14623890240535309...@wrigleys.postgresql.org

and some much longer ago:

commit 9394d391b803c55281879721ea393a50df4a0be6
Author: Peter Eisentraut 
Date:   2000-11-20 15:56:14 +

Add configure checks for strtoll, strtoull (or strto[u]q).  Disable
'long long int' portions of ecpg if the type or these functions don't
exist.

since strtoq, strtouq apparently were already obsolete in 2018, and hpux is
now obsolete...


I only noticed this because I'd ported the configure check a bit naively,
without the break in the if-found case, and was looking into why HAVE_STRTOQ,
HAVE_STRTOUQ were defined with meson, but not autoconf...
AC_CHECK_FUNCS([strtoll __strtoll strtoq], [break])
AC_CHECK_FUNCS([strtoull __strtoull strtouq], [break])


Greetings,

Andres Freund




Re: A test for replay of regression tests

2022-08-03 Thread Justin Pryzby
On Thu, Aug 04, 2022 at 09:24:24AM +1200, Thomas Munro wrote:
> On Thu, Aug 4, 2022 at 3:30 AM Justin Pryzby  wrote:
> > On Thu, Dec 09, 2021 at 12:10:23PM +1300, Thomas Munro wrote:
> > > This adds 2 whole minutes to the recovery check, when running with the
> > > Windows serial-only scripts on Cirrus CI (using Andres's CI patches).
> > > For Linux it adds ~20 seconds to the total of -j8 check-world.
> > > Hopefully that's time well spent, because it adds test coverage for
> > > all the redo routines, and hopefully soon we won't have to run 'em in
> > > series on Windows.
> >
> > Should 027-stream-regress be renamed to something that starts earlier ?
> > Off-list earlier this year, Andres referred to 000.

See also: 
https://www.postgresql.org/message-id/20220213220709.vjz5rziuhfdpq...@alap3.anarazel.de

> Do you have any data on improved times from doing that?
> 
> I have wondered about moving it into 001_stream_rep.pl.

The immediate motive for raising the question comes from working on your cygwin
patch (where I've set PROVE_FLAGS=-j3).  The last invocation I have open ends
like:

[20:46:47.577] [13:46:47] t/026_overwrite_contrecord.pl  ok  10264 ms ( 0.02 usr  0.02 sys + 11.25 cusr 35.95 csys = 47.24 CPU)
[20:47:08.087] [13:47:08] t/028_pitr_timelines.pl .. ok  13153 ms ( 0.00 usr  0.00 sys +  4.03 cusr 14.79 csys = 18.82 CPU)
[20:47:08.999] [13:47:09] t/029_stats_restart.pl ... ok  12631 ms ( 0.00 usr  0.02 sys +  7.40 cusr 23.30 csys = 30.71 CPU)
[20:47:34.353] [13:47:34] t/031_recovery_conflict.pl ... ok  11337 ms ( 0.00 usr  0.00 sys +  3.84 cusr 11.82 csys = 15.66 CPU)
[20:47:35.070] [13:47:35] t/030_stats_cleanup_replica.pl ... ok  14054 ms ( 0.02 usr  0.00 sys +  7.64 cusr 25.02 csys = 32.68 CPU)
[20:48:04.887] [13:48:04] t/032_relfilenode_reuse.pl ... ok  12755 ms ( 0.00 usr  0.00 sys +  3.36 cusr 11.57 csys = 14.93 CPU)
[20:48:42.055] [13:48:42] t/033_replay_tsp_drops.pl  ok  43529 ms ( 0.00 usr  0.00 sys + 12.29 cusr 41.43 csys = 53.71 CPU)
[20:50:02.770] [13:50:02] t/027_stream_regress.pl .. ok  198408 ms ( 0.02 usr  0.06 sys + 44.92 cusr 142.42 csys = 187.42 CPU)
[20:50:02.771] [13:50:02]
[20:50:02.771] All tests successful.
[20:50:02.771] Files=33, Tests=411, 402 wallclock secs ( 0.16 usr  0.27 sys + 138.03 cusr 441.56 csys = 580.01 CPU)

If 027 had been started sooner, this test might have finished up to 78sec
earlier.  If lots of tests are added in the future, maybe it won't matter, but
it seems like it does now.

As I understand it, checks are usually parallelized by "make -j" and not by
"prove".  In that case, starting a slow test later doesn't matter.  But it'd be
better for anyone who runs TAP tests manually, and (I think) for meson.

As a one-off test on localhost:
time make check -C src/test/recovery
=> 11m42,790s
time make check -C src/test/recovery PROVE_FLAGS=-j2
=> 7m56,315s

After renaming it to 001:
time make check -C src/test/recovery
=> 11m33,887s (~same)
time make check -C src/test/recovery PROVE_FLAGS=-j2
=> 6m59,969s

I don't know how it affects the buildfarm (but I think that's not optimized
primarily for speed anyway).

-- 
Justin




Re: [PATCH] postgresql.conf.sample comment alignment.

2022-08-03 Thread Michael Paquier
On Wed, Aug 03, 2022 at 12:58:04PM +0200, Alvaro Herrera wrote:
> On 2022-Aug-01, Tom Lane wrote:
>> One idea for avoiding confusion is to legislate that we won't
>> use tabs at all in this file (which we could enforce via
>> .gitattributes, I think).
> 
> +1.

That's not the first time this 4- or 8-character tab issue has popped
up around here, so enforcing spaces and having a rule sounds like a
good idea in the end.

>> But that might just be making things equally inconvenient for
>> everybody.
> 
> In this situation, the only disadvantaged users are those using a
> non-fixed-width font in their editor, but those are lost souls already.

Haha.
--
Michael




Re: enable/disable broken for statement triggers on partitioned tables

2022-08-03 Thread Amit Langote
On Thu, Aug 4, 2022 at 3:01 AM Alvaro Herrera  wrote:
> On 2022-Aug-02, Amit Langote wrote:
> > Regarding the patch, I agree that storing the recurse flag rather than
> > overwriting subtype might be better.
> >
> > +   boolexecTimeRecursion; /* set by ATPrepCmd if ATExecCmd must
> > +   * recurse to children */
> >
> > Might it be better to call this field simply 'recurse'?  I think it's
> > clear from the context and the comment above the flag is to be used
> > during execution.
>
> Yeah, I guess we can do that and also reword the overall ALTER TABLE
> comment about recursion.  That's in the attached first patch, which is
> intended as backpatchable.

Thanks.  This one looks good to me.

> The second patch is just to show how we'd rewrite AT_AddColumn to no
> longer use the Recurse separate enum value but instead use the ->recurse
> flag.  This is pretty straightforward and it's a clear net reduction of
> code.  We can't backpatch this kind of thing of course, both because of
> the ABI break (easily fixed) and because of potential destabilization
> (scary).  We can do similar things for the other AT enum values for
> recursion.  This isn't complete since there are a few other values in
> that enum that we should process in this way too; I don't intend to
> push it just yet.

I like the idea of removing all AT_*Recurse subtypes in HEAD.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Tom Lane
"Jonathan S. Katz"  writes:
> I did rule out wanting to do the "xid + $X" check after reviewing some 
> of the output. I think that both $X could end up varying, and it really 
> feels like a bandaid.

It is that.  I wouldn't feel comfortable with $X less than 100 or so,
which is probably sloppy enough to draw Robert's ire.  Still, realizing
that what we want right now is a band-aid for 15beta3, I don't think
it's an unreasonable short-term option.

> Andres suggested upthread using "txid_current()" -- for the comparison, 
> that's one thing I looked at. Would any of the XID info from 
> "pg_control_checkpoint()" also serve for this test?

I like the idea of txid_current(), but we have no comparable
function for mxid, do we?  While you could get both numbers from
pg_control_checkpoint(), I doubt that's sufficiently up-to-date.

regards, tom lane




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Jonathan S. Katz

On 8/3/22 2:08 PM, Peter Geoghegan wrote:

On Wed, Aug 3, 2022 at 1:47 PM Tom Lane  wrote:

Again, this seems to me to be breaking the test's real-world applicability
for a (false?) sense of stability.


I agree.

A lot of the VACUUM test flappiness issues we've had to deal with in
the past now seem like problems with VACUUM itself, the test's design,
or both. For example, why should we get a totally different
pg_class.reltuples because we couldn't get a cleanup lock on some
page? Why not just make sure to give the same answer either way,
which happens to be the most useful behavior to the user? That way
the test isn't just targeting implementation details.


After catching up (and reviewing approaches that could work while on 
poor wifi), it does make me wonder if we can have a useful test ready 
before beta 3.


I did rule out wanting to do the "xid + $X" check after reviewing some 
of the output. I think that both $X could end up varying, and it really 
feels like a bandaid.


Andres suggested upthread using "txid_current()" -- for the comparison, 
that's one thing I looked at. Would any of the XID info from 
"pg_control_checkpoint()" also serve for this test?


If yes to the above, I should be able to modify this fairly quickly.

Jonathan




Re: [doc] fix a potential grammer mistake

2022-08-03 Thread Junwang Zhao
On Thu, Aug 4, 2022 at 12:42 AM Robert Treat  wrote:
>
> On Wed, Aug 3, 2022 at 11:15 AM Junwang Zhao  wrote:
> >
> > Attachment is a corrected version based on Tom's suggestion.
> >
> > Thanks.
> >
> > On Wed, Aug 3, 2022 at 9:56 PM Tom Lane  wrote:
> > >
> > > Erikjan Rijkers  writes:
> > > > I don't think these  "were"s  are wrong but arguably changing them to
> > > > "have" helps non-native speakers (like myself), as it doesn't change the
> > > > meaning significantly as far as I can see.
> > >
> > > I think it does --- it changes the meaning from passive to active.
> > > I don't necessarily object to rewriting these sentences more broadly,
> > > but I don't think "have issued" is the correct phrasing.
> > >
> > > Possibly "The user issued ..." would work.
> > >
>
> Is there a reason that the first case says "just" issued vs the other
> two cases? It seems to me that it should be removed.
Attachment is a patch with the "just" removed.

Thanks
>
> Robert Treat
> https://xzilla.net



-- 
Regards
Junwang Zhao


0001-doc-rewrite-some-comments-to-make-them-more-precise.patch
Description: Binary data


Re: postgres_fdw: batch inserts vs. before row triggers

2022-08-03 Thread Matthias van de Meent
On Wed, 3 Aug 2022 at 23:57, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > I don't have a current version of the SQL spec, but one preliminary
> > version of SQL:2012 I retrieved via the wiki details that all BEFORE
> > triggers on INSERT/UPDATE/DELETE statements are all executed before
> > _any_ of that statements' affected data is modified.
> > ...
> > I don't know about the semantics of triggers in the latest SQL
> > standard versions, but based on that sample it seems like we're
> > non-compliant on BEFORE trigger behaviour, and it doesn't seem like
> > it's documented in the trigger documentation.
>
> I think we're compliant if you declare the trigger functions as
> stable (or immutable, but in any case where this matters, I think
> you'd be lying).  They'll then run with the snapshot of the calling
> query, in which those updates are not yet visible.
>
> This is documented somewhere, but maybe not anywhere near triggers.

Thank you for this pointer.

Looking around a bit, it seems like this behaviour for functions is
indeed documented in xfunc.sgml, but the rendered docs page [0] does not
seem to mention triggers, nor does the triggers page link to that part
of the xfunc document. This makes it quite easy to overlook that this
is expected (?) behaviour for VOLATILE functions only.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/docs/current/xfunc-volatility.html




Re: Unstable tests for recovery conflict handling

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 16:33:46 -0400, Robert Haas wrote:
> On Wed, Aug 3, 2022 at 1:57 PM Andres Freund  wrote:
> > The reason nothing might get logged in some cases is that
> > e.g. ResolveRecoveryConflictWithLock() tells
> > ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
> > /*
> >  * Prevent ResolveRecoveryConflictWithVirtualXIDs() from 
> > reporting
> >  * "waiting" in PS display by disabling its argument 
> > report_waiting
> >  * because the caller, WaitOnLock(), has already reported 
> > that.
> >  */
> >
> > so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
> > logging anything.
> 
> I understand why we need to avoid adding "waiting" to the PS status
> when we've already done that, but it doesn't seem like that should
> imply skipping ereport() of log messages.
> 
> I think we could redesign the way the ps display works to make things
> a whole lot simpler. Let's have a function set_ps_display() and
> another function set_ps_display_suffix(). What gets reported to the OS
> is the concatenation of the two. Calling set_ps_display() implicitly
> resets the suffix to empty.
> 
> AFAICS, that'd let us get rid of this tricky logic, and some other
> tricky logic as well. Here, we'd just say set_ps_display_suffix("
> waiting") and not worry about whether the caller might have already
> done something similar.

That sounds like it'd be an improvement.  Of course we still need to fix the
fact that we can signal at a rate that doesn't allow the other side to handle
the conflict, but at least that'd be easier to identify...


> > Another question I have about ResolveRecoveryConflictWithLock() is whether
> > it's ok that we don't check deadlocks around the
> > ResolveRecoveryConflictWithVirtualXIDs() call? It might be ok, because we'd
> > only block if there's a recovery conflict, in which case killing the
> > process ought to succeed?
> 
> The startup process is supposed to always "win" in any deadlock
> situation, so I'm not sure what you think is a problem here. We get
> the conflicting lockers. We kill them. If they don't die, that's a
> bug, but killing ourselves doesn't really help anything; if we die,
> the whole system goes down, which seems undesirable.

The way deadlock timeout for the startup process works is that we wait for it
to pass and then send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK to the
backends. So it's not that the startup process would die.

The question is basically whether there are cases where
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK would resolve a conflict but
PROCSIG_RECOVERY_CONFLICT_LOCK wouldn't. It seems plausible that there isn't,
but it's also not obvious enough that I'd fully trust it.

Greetings,

Andres Freund




Re: postgres_fdw: batch inserts vs. before row triggers

2022-08-03 Thread Tom Lane
Matthias van de Meent  writes:
> I don't have a current version of the SQL spec, but one preliminary
> version of SQL:2012 I retrieved via the wiki details that all BEFORE
> triggers on INSERT/UPDATE/DELETE statements are executed before
> _any_ of that statement's affected data is modified.
> ...
> I don't know about the semantics of triggers in the latest SQL
> standard versions, but based on that sample it seems like we're
> non-compliant on BEFORE trigger behaviour, and it doesn't seem like
> it's documented in the trigger documentation.

I think we're compliant if you declare the trigger functions as
stable (or immutable, but in any case where this matters, I think
you'd be lying).  They'll then run with the snapshot of the calling
query, in which those updates are not yet visible.

This is documented somewhere, but maybe not anywhere near triggers.

regards, tom lane




Re: postgres_fdw: batch inserts vs. before row triggers

2022-08-03 Thread Matthias van de Meent
On Tue, 19 Apr 2022 at 14:00, Tomas Vondra
 wrote:
>
> On 4/19/22 11:16, Etsuro Fujita wrote:
> > On Sun, Apr 17, 2022 at 6:20 PM Etsuro Fujita  
> > wrote:
> >> Here
> >> is an example using HEAD:
> >>
> >> create extension postgres_fdw;
> >> create server loopback foreign data wrapper postgres_fdw options
> >> (dbname 'postgres');
> >> create user mapping for current_user server loopback;
> >> create table t (a int);
> >> create foreign table ft (a int) server loopback options (table_name 't');
> >> create function ft_rowcount_tf() returns trigger as $$ begin raise
> >> notice '%: rows = %', tg_name, (select count(*) from ft); return new;
> >> end; $$ language plpgsql;
> >> create trigger ft_rowcount before insert on ft for each row execute
> >> function ft_rowcount_tf();
> >>
> >> insert into ft select i from generate_series(1, 10) i;
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 1
> >> NOTICE:  ft_rowcount: rows = 2
> >> NOTICE:  ft_rowcount: rows = 3
> >> NOTICE:  ft_rowcount: rows = 4
> >> NOTICE:  ft_rowcount: rows = 5
> >> NOTICE:  ft_rowcount: rows = 6
> >> NOTICE:  ft_rowcount: rows = 7
> >> NOTICE:  ft_rowcount: rows = 8
> >> NOTICE:  ft_rowcount: rows = 9
> >> INSERT 0 10
> >>
> >> This looks good, but when batch insert is enabled, the trigger
> >> produces incorrect results:
> >>
> >> alter foreign table ft options (add batch_size '10');
> >> delete from ft;
> >>
> >> insert into ft select i from generate_series(1, 10) i;
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> NOTICE:  ft_rowcount: rows = 0
> >> INSERT 0 10
> >
> > Actually, the results are correct, as we do batch-insert here.  But I
> > just wanted to show that the trigger behaves *differently* when doing
> > batch-insert.
>
> +1, I think it's a bug to do batch insert in this case.

I don't have a current version of the SQL spec, but one preliminary
version of SQL:2012 I retrieved via the wiki details that all BEFORE
triggers on INSERT/UPDATE/DELETE statements are executed before
_any_ of that statement's affected data is modified.

See the "SQL:2011 (preliminary)" document you can grab on the wiki,
Part 2: for INSERT, in 15.10 (4) the BEFORE triggers on the changeset
are executed, and only after that in section 15.10 (5)(c) the
changeset is inserted into the target table. During the BEFORE-trigger
this table does not contain the rows of the changeset, thus a count(*)
on that table would result in a single value for all the BEFORE
triggers triggered on that statement, regardless of the FOR EACH ROW
specifier. The sections for DELETE are 15.7 (6) and 15.7 (7); and for
UPDATE 15.13(7) and 15.13(9) respectively.

I don't know about the semantics of triggers in the latest SQL
standard versions, but based on that sample it seems like we're
non-compliant on BEFORE trigger behaviour, and it doesn't seem like
it's documented in the trigger documentation.

I seem to recall a mail on this topic (changes in trigger execution
order with respect to the DML it is triggered by in the newest SQL
spec) but I can't seem to find that thread.

Kind regards,

Matthias van de Meent




Re: Proposal: Support custom authentication methods using hooks

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 17:21:58 -0400, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2022-08-03 16:28:08 -0400, Stephen Frost wrote:
> > > Again, server-side only is not interesting and not a direction that
> > > makes sense to go in because it doesn't provide any way to have
> > > trust established in both directions, which is what all modern
> > > authentication methods do (certificates, kerberos, scram) and is what we
> > > should expect from anything new in this space.
> > 
> > As explained repeatedly before, that's plainly not true. The patch allows
> > e.g. to use the exact scram flow we already have, with the code we have for
> > that (e.g. using a different source of secret).  In fact the example 
> > extension
> > does so (starting in v3, from 2022-03-15):
> 
> Sure, thanks to the bespoke code in libpq for supporting SCRAM.  I don't
> think it makes sense to move the server-side code for that into an
> extension as it just makes work for a bunch of people.  Having a way to
> have the authenticator token for scram exist elsewhere doesn't strike me
> as unreasonable but that's not the same thing and is certainly more
> constrained in terms of what it's doing.

The question is: Why is providing that ability not a good step, even if
somewhat constrained? One argument would be that a later "protocol level
extensibility" effort would require significant changes to the API - but I
don't think that'd be the case. And even if it did, this isn't a large
extensibility surface, so API evolution wouldn't be a problem.


> Further, I outlined exactly how extensibility in this area could be
> achieved: use some good third party library that provides multiple SASL
> methods then just add support for that library to *PG*, on both the
> libpq and the server sides.

> What I don't think makes sense is adding extensibility to the server
> side, especially if it's through a 3rd party library, but then not
> adding support for that to libpq.  I don't follow what the rationale is
> for explicitly excluding libpq from this discussion.

The rationale is trivial: Breaking down projects into manageable, separately
useful steps is a very sensible way of doing development. Even if we get
protocol extensibility, we're still going to need an extension API like the
one provided by the patchset. After protocol extensibility, authentication
extensions would have more functions they could call to control the auth
flow with the client.


> To be clear- I'm not explicitly saying that we can only add
> extensibility with SCRAM, I'm just saying that whatever we're doing here
> to add other actual authentication methods we should be ensuring is done
> on both sides with a way for each side to authenticate the other side,
> as all of the modern authentication methods we have already do.

But *why* do these need to be tied together?


> > > If anything, the other auth methods should be ripped out entirely 
> > > (password,
> > > md5, ldap, etc), but certainly not used as a basis for new work or a place
> > > to try and add new features, as they're all well known to have serious
> > > vulnerabilities.
> > 
> > I don't think we'd help users if we ripped out all those methods without a
> > replacement, but relegating them to contrib/ and thus requiring that they
> > explicitly get configured in the server seems like a decent step. But imo
> > that's a separate discussion.
> 
> We have a replacement for password and md5 and it's SCRAM.  For my 2c, I
> don't see the value in continuing to have those in any form at this
> point.  I concede that I may not get consensus on that but I don't
> really see how moving them to contrib would actually be helpful.

I'm much more on board with ripping out password and md5 than ldap, radius,
pam et al.

Greetings,

Andres Freund




Re: Checking pgwin32_is_junction() errors

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 9:28 AM Andrew Dunstan  wrote:
> On 2022-08-01 Mo 16:06, Andrew Dunstan wrote:
> > I'll try it out on fairywren/drongo.

> They are happy with patches 2, 3, and 4.

Thanks for testing!

If there are no objections, I'll go ahead and commit these later today.




Re: Checking pgwin32_is_junction() errors

2022-08-03 Thread Andrew Dunstan


On 2022-08-01 Mo 16:06, Andrew Dunstan wrote:
> On 2022-08-01 Mo 01:09, Thomas Munro wrote:
>> On Thu, Jul 28, 2022 at 9:31 PM Thomas Munro  wrote:
>>> There's one curious change in the draft patch attached: you can't
>>> unlink() a junction point, you have to rmdir() it.  Previously, things
>>> that traverse directories without ever calling pgwin32_is_junction()
>>> would see junction points as S_ISDIR() and call rmdir(), which was OK,
>>> but now they see S_ISLNK() and call unlink().  So I taught unlink() to
>>> try both things.  Which is kinda weird, and not beautiful, especially
>>> when combined with the existing looping weirdness.
>> Here's a new attempt at unlink(), this time in its own patch.  This
>> version is a little more careful about calling rmdir() only after
>> checking that it is a junction point, so that unlink("a directory")
>> fails just like on Unix (well, POSIX says that that should fail with
>> EPERM, not EACCES, and implementations are allowed to make it work
>> anyway, but it doesn't seem helpful to allow it to work there when
>> every OS I know of fails with EPERM or EISDIR).  That check is racy,
>> but should be good enough for our purposes, no (see comment for a note
>> on that)?
>>
>> Longer term, I wonder if we should get rid of our use of symlinks, and
>> instead just put paths in a file and do our own path translation.  But
>> for now, this patch set completes the set of junction point-based
>> emulations, and, IMHO, cleans up a confusing aspect of our code.
>>
>> As before, 0001 is just for cfbot to add an MSYS checkmark.
>
>
> I'll try it out on fairywren/drongo.
>
>

They are happy with patches 2, 3, and 4.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: A test for replay of regression tests

2022-08-03 Thread Thomas Munro
On Thu, Aug 4, 2022 at 3:30 AM Justin Pryzby  wrote:
> On Thu, Dec 09, 2021 at 12:10:23PM +1300, Thomas Munro wrote:
> > This adds 2 whole minutes to the recovery check, when running with the
> > Windows serial-only scripts on Cirrus CI (using Andres's CI patches).
> > For Linux it adds ~20 seconds to the total of -j8 check-world.
> > Hopefully that's time well spent, because it adds test coverage for
> > all the redo routines, and hopefully soon we won't have to run 'em in
> > series on Windows.
>
> Should 027-stream-regress be renamed to something that starts earlier?
> Off-list earlier this year, Andres referred to 000.

Do you have any data on improved times from doing that?

I have wondered about moving it into 001_stream_rep.pl.




Re: Proposal: Support custom authentication methods using hooks

2022-08-03 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2022-08-03 16:28:08 -0400, Stephen Frost wrote:
> > Again, server-side only is not interesting and not a direction that
> > makes sense to go in because it doesn't provide any way to have
> > trust established in both directions, which is what all modern
> > authentication methods do (certificates, kerberos, scram) and is what we
> > should expect from anything new in this space.
> 
> As explained repeatedly before, that's plainly not true. The patch allows
> e.g. to use the exact scram flow we already have, with the code we have for
> that (e.g. using a different source of secret).  In fact the example extension
> does so (starting in v3, from 2022-03-15):

Sure, thanks to the bespoke code in libpq for supporting SCRAM.  I don't
think it makes sense to move the server-side code for that into an
extension as it just makes work for a bunch of people.  Having a way to
have the authenticator token for scram exist elsewhere doesn't strike me
as unreasonable but that's not the same thing and is certainly more
constrained in terms of what it's doing.

> If you're ideologically opposed to allowing extensibility in this specific
> area: Say so, instead of repeating this bogus argument over and over.

Considering how much I pushed for and supported the effort to include
SCRAM, I hardly would say that I'm against making improvements in this
area.

Further, I outlined exactly how extensibility in this area could be
achieved: use some good third party library that provides multiple SASL
methods, then just add support for that library to *PG*, on both the
libpq and the server sides.

What I don't think makes sense is adding extensibility to the server
side, especially if it's through a 3rd party library, but then not
adding support for that to libpq.  I don't follow what the rationale is
for explicitly excluding libpq from this discussion.

To be clear- I'm not explicitly saying that we can only add
extensibility with SCRAM, I'm just saying that whatever we're doing here
to add other actual authentication methods we should be ensuring is done
on both sides with a way for each side to authenticate the other side,
as all of the modern authentication methods we have already do.  If we
got SASL added and it included support for SCRAM and Kerberos through
that and was a common enough library that we felt comfortable ripping
out our own implementations in favor of what that library provides,
sure, that'd be something to consider too (though I tend to doubt we'd
get so lucky as to have one that worked with the existing SASL/SCRAM
stuff we have today since we had to do some odd things there with the
username, as I recall).

> > If anything, the other auth methods should be ripped out entirely (password,
> > md5, ldap, etc), but certainly not used as a basis for new work or a place
> > to try and add new features, as they're all well known to have serious
> > vulnerabilities.
> 
> I don't think we'd help users if we ripped out all those methods without a
> replacement, but relegating them to contrib/ and thus requiring that they
> explicitly get configured in the server seems like a decent step. But imo
> that's a separate discussion.

We have a replacement for password and md5 and it's SCRAM.  For my 2c, I
don't see the value in continuing to have those in any form at this
point.  I concede that I may not get consensus on that but I don't
really see how moving them to contrib would actually be helpful.

> > I also don't agree that this makes sense as an extension as we don't
> > have any way for extensions to make changes in libpq or psql, again
> > leading to the issue that it either can't be exercised or we create some
> > dependency on an external SASL library for libpq but object to having
> > that same dependency on the server side, which doesn't seem sensible to
> > me.  Requiring admins to jump through hoops to install an extension
> > where we have such a dependency on a library anyway doesn't make sense
> > either.
> 
> This argument doesn't make a whole lot of sense to me - as explained above you
> can use the existing scram flow for plenty of use cases. I'd be a bit more
> convinced if you'd argue that the extension API should just allow providing a
> different source of secrets for the existing scram flow (I'd argue that that's
> not the best point for extensibility, but that'd be more a question of taste).

As I think I said before, I don't particularly object to the idea of
having an alternative backing store for pg_authid (though note that I'm
actively working on extending that further to allow multiple
authenticators to be able to be configured for a role, so whatever that
backing store is would need to be adjusted if that ends up getting
committed into core).  That's quite a different thing from my
perspective.

Thanks,

Stephen




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Peter Geoghegan
On Wed, Aug 3, 2022 at 1:47 PM Tom Lane  wrote:
> Again, this seems to me to be breaking the test's real-world applicability
> for a (false?) sense of stability.

I agree.

A lot of the VACUUM test flappiness issues we've had to deal with in
the past now seem like problems with VACUUM itself, the test's design,
or both. For example, why should we get a totally different
pg_class.reltuples because we couldn't get a cleanup lock on some
page? Why not just make sure to give the same answer either way,
which happens to be the most useful behavior to the user? That way
the test isn't just targeting implementation details.

--
Peter Geoghegan




Re: Clarifying Commitfest policies

2022-08-03 Thread Matthias van de Meent
On Wed, 3 Aug 2022 at 20:04, Jacob Champion  wrote:
>
> [was: CF app: add "Returned: Needs more interest"]
>
> On Wed, Aug 3, 2022 at 10:09 AM Julien Rouhaud  wrote:
> > I'm afraid that
> > patches will still be left alone to rot and there will still be no clear
> > rules on what to do and when, reminders for the CFM and such, and that
> > this new status would never be used anyway.
>
> Yeah, so the lack of clear rules is an issue -- maybe not because we
> can't work without them (we have, clearly, and we can continue to do
> so) but because each of us kind of makes it up as we go along? When
> discussions about these "rules" happen on the list, it doesn't always
> happen with the same people, and opinions can vary wildly.
>
> There have been a couple of suggestions recently:
> - Revamp the CF Checklist on the wiki. I plan to do so later this
> month, but that will still need some community review.
> - Provide in-app explanations and documentation for some of the less
> obvious points. (What should the target version be? What's the
> difference between Rejected and Returned?)
>
> Is that enough, or should we do more?

"The CF Checklist" seems to refer to only the page that is (or seems
to be) intended for the CFM only. We should probably also update the
pages of "Commitfest", "Submitting a patch", "Reviewing a Patch", "So,
you want to be a developer?", and the "Developer FAQ" page, which
doesn't have to be more than removing outdated information and
referring to any (new) documentation on how to participate in the
PostgreSQL development and/or Commitfest workflow as a non-CFM.

Additionally, we might want to add extra text to the "developers"
section of the main website [0] to refer to any new documentation.
This suggestion does depend on whether the new documentation has a
high value for potential community members.

Lastly, a top-level CONTRIBUTING.md file in git repositories is also
often used as an entry point for potential contributors. I don't
suggest we copy all documentation into the main repo, just that a
pointer to our existing contributor entry documentation in such a file
could help lower the barrier to entry.
As an example, the GitHub mirror of the main PostgreSQL repository
receives a decent amount of pull request traffic. When a project has a
CONTRIBUTING.md file at the top level, people writing a pull request
message will be pointed to those contributing guidelines. This could
help redirect such contributions to the right channels.

Thank you for raising this to a topical thread.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/developer/




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Andres Freund
On 2022-08-03 16:46:57 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > Or we could disable autovacuum on the new cluster, which I think is a
> > better solution. I like it when things match exactly; it makes me feel
> > that the universe is well-ordered.
> 
> Again, this seems to me to be breaking the test's real-world applicability
> for a (false?) sense of stability.

Yea, that doesn't seem like an improvement. I e.g. found the issues around
relfilenode reuse in 15 due to autovacuum running in the pg_upgrade target
cluster.  And I recall other bugs in the area...

Greetings,

Andres Freund




Re: Proposal: Support custom authentication methods using hooks

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 16:28:08 -0400, Stephen Frost wrote:
> Again, server-side only is not interesting and not a direction that
> makes sense to go in because it doesn't provide any way to have
> trust established in both directions, which is what all modern
> authentication methods do (certificates, kerberos, scram) and is what we
> should expect from anything new in this space.

As explained repeatedly before, that's plainly not true. The patch allows
e.g. to use the exact scram flow we already have, with the code we have for
that (e.g. using a different source of secret).  In fact the example extension
does so (starting in v3, from 2022-03-15):
Check 0002 from
https://www.postgresql.org/message-id/CAJxrbyxgFzfqby%2BVRCkeAhJnwVZE50%2BZLPx0JT2TDg9LbZtkCg%40mail.gmail.com

If you're ideologically opposed to allowing extensibility in this specific
area: Say so, instead of repeating this bogus argument over and over.


> If anything, the other auth methods should be ripped out entirely (password,
> md5, ldap, etc), but certainly not used as a basis for new work or a place
> to try and add new features, as they're all well known to have serious
> vulnerabilities.

I don't think we'd help users if we ripped out all those methods without a
replacement, but relegating them to contrib/ and thus requiring that they
explicitly get configured in the server seems like a decent step. But imo
that's a separate discussion.


> I also don't agree that this makes sense as an extension as we don't
> have any way for extensions to make changes in libpq or psql, again
> leading to the issue that it either can't be exercised or we create some
> dependency on an external SASL library for libpq but object to having
> that same dependency on the server side, which doesn't seem sensible to
> me.  Requiring admins to jump through hoops to install an extension
> where we have such a dependency on a library anyway doesn't make sense
> either.

This argument doesn't make a whole lot of sense to me - as explained above you
can use the existing scram flow for plenty of use cases. I'd be a bit more
convinced if you'd argue that the extension API should just allow providing a
different source of secrets for the existing scram flow (I'd argue that that's
not the best point for extensibility, but that'd be more a question of taste).

Greetings,

Andres Freund




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Tom Lane
Robert Haas  writes:
> Or we could disable autovacuum on the new cluster, which I think is a
> better solution. I like it when things match exactly; it makes me feel
> that the universe is well-ordered.

Again, this seems to me to be breaking the test's real-world applicability
for a (false?) sense of stability.

regards, tom lane




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Peter Geoghegan
On Wed, Aug 3, 2022 at 1:34 PM Tom Lane  wrote:
> That doesn't seem like it'd be all that thorough: we expect VACUUM
> to skip pages whenever possible.  I'm also a bit concerned about
> the expense, though admittedly this test is ridiculously expensive
> already.

I bet the SKIP_PAGES_THRESHOLD stuff will be enough to make VACUUM
visit every heap page in practice for a test case like this. That is
all it takes to be able to safely advance relfrozenxid to whatever the
oldest extant XID happened to be. However, I'm no fan of the
SKIP_PAGES_THRESHOLD behavior, and already have plans to get rid of it
-- so I wouldn't rely on that continuing to be true forever.

It's probably not really necessary to have that kind of coverage in
this particular test case. VACUUM will complain about weird
relfrozenxid values in a large variety of contexts, even without
assertions enabled. Mostly I was just saying: if we really do need
test coverage of relfrozenxid in this context, then VACUUM is probably
the way to go.

-- 
Peter Geoghegan




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Robert Haas
On Wed, Aug 3, 2022 at 4:20 PM Andres Freund  wrote:
> > I don't really like this approach. Imagine that the code got broken in
> > such a way that relfrozenxid and relminmxid were set to a value chosen
> > at random - say, the contents of 4 bytes of unallocated memory that
> > contained random garbage. Well, right now, the chances that this would
> > cause a test failure are nearly 100%. With this change, they'd be
> > nearly 0%.
>
> Can't that pretty easily be addressed by subsequently querying txid_current(),
> and checking that the value isn't newer than that?

Hmm, maybe. The old cluster shouldn't have wrapped around ever, since
we just created it. So the value in the new cluster should be >= that
value and <= the result of txid_current() ignoring wraparound.

Or we could disable autovacuum on the new cluster, which I think is a
better solution. I like it when things match exactly; it makes me feel
that the universe is well-ordered.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Tom Lane
Peter Geoghegan  writes:
> It couldn't hurt to do that as well, in passing (at the same time as
> testing that newrelfrozenxid >= oldrelfrozenxid directly). But
> deliberately running VACUUM afterwards seems like a good idea. We
> really ought to expect VACUUM to catch cases where
> relfrozenxid/relminmxid is faulty, at least in cases where it can be
> proven wrong by noticing some kind of inconsistency.

That doesn't seem like it'd be all that thorough: we expect VACUUM
to skip pages whenever possible.  I'm also a bit concerned about
the expense, though admittedly this test is ridiculously expensive
already.

regards, tom lane




Re: Unstable tests for recovery conflict handling

2022-08-03 Thread Robert Haas
On Wed, Aug 3, 2022 at 1:57 PM Andres Freund  wrote:
> The reason nothing might get logged in some cases is that
> e.g. ResolveRecoveryConflictWithLock() tells
> ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
> /*
>  * Prevent ResolveRecoveryConflictWithVirtualXIDs() from reporting
>  * "waiting" in PS display by disabling its argument report_waiting
>  *  because the caller, WaitOnLock(), has already reported that.
>  */
>
> so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
> logging anything.

I understand why we need to avoid adding "waiting" to the PS status
when we've already done that, but it doesn't seem like that should
imply skipping ereport() of log messages.

I think we could redesign the way the ps display works to make things
a whole lot simpler. Let's have a function set_ps_display() and
another function set_ps_display_suffix(). What gets reported to the OS
is the concatenation of the two. Calling set_ps_display() implicitly
resets the suffix to empty.

AFAICS, that'd let us get rid of this tricky logic, and some other
tricky logic as well. Here, we'd just say set_ps_display_suffix("
waiting") and not worry about whether the caller might have already
done something similar.

> Another question I have about ResolveRecoveryConflictWithLock() is whether
> it's ok that we don't check deadlocks around the
> ResolveRecoveryConflictWithVirtualXIDs() call? It might be ok, because we'd
> only block if there's a recovery conflict, in which case killing the process
> ought to succeed?

The startup process is supposed to always "win" in any deadlock
situation, so I'm not sure what you think is a problem here. We get
the conflicting lockers. We kill them. If they don't die, that's a
bug, but killing ourselves doesn't really help anything; if we die,
the whole system goes down, which seems undesirable.

> I think there's also might be a problem with the wait loop in ProcSleep() wrt
> recovery conflicts: We rely on interrupts to be processed to throw recovery
> conflict errors, but ProcSleep() is called in a bunch of places with
> interrupts held. An Assert(INTERRUPTS_CAN_BE_PROCESSED()) after releasing the
> partition lock triggers a bunch.  It's possible that these aren't problematic
> cases for recovery conflicts, because they're all around extension locks:
> [...]
> Independent of recovery conflicts, isn't it dangerous that we acquire the
> relation extension lock with a buffer content lock held? I *guess* it might be
> ok because BufferAlloc(P_NEW) only acquires buffer content locks in a
> conditional way.

These things both seem a bit sketchy but it's not 100% clear to me
that anything is actually broken. Now it's also not clear to me that
nothing is broken ...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Peter Geoghegan
On Wed, Aug 3, 2022 at 1:20 PM Andres Freund  wrote:
> > I don't really like this approach. Imagine that the code got broken in
> > such a way that relfrozenxid and relminmxid were set to a value chosen
> > at random - say, the contents of 4 bytes of unallocated memory that
> > contained random garbage. Well, right now, the chances that this would
> > cause a test failure are nearly 100%. With this change, they'd be
> > nearly 0%.
>
> Can't that pretty easily be addressed by subsequently querying txid_current(),
> and checking that the value isn't newer than that?

It couldn't hurt to do that as well, in passing (at the same time as
testing that newrelfrozenxid >= oldrelfrozenxid directly). But
deliberately running VACUUM afterwards seems like a good idea. We
really ought to expect VACUUM to catch cases where
relfrozenxid/relminmxid is faulty, at least in cases where it can be
proven wrong by noticing some kind of inconsistency.

-- 
Peter Geoghegan




Re: Proposal: Support custom authentication methods using hooks

2022-08-03 Thread Stephen Frost
Greetings,

* samay sharma (smilingsa...@gmail.com) wrote:
> On Tue, Aug 2, 2022 at 2:48 PM Jacob Champion 
> wrote:
> > [dev hat] That said, I plan to do some additional dev work on top of
> > this over the next couple of months. The ideal case would be to provide
> > a server-only extension that provides additional features to existing
> > clients in the wild (i.e. no client-side changes).
> >
> > I'm also interested in plugging an existing 3rd-party SASL library into
> > the client, which would help extensibility on that side.
> 
> That would be interesting.

Again, server-side only is not interesting and not a direction that
makes sense to go in because it doesn't provide any way to have
trust established in both directions, which is what all modern
authentication methods do (certificates, kerberos, scram) and is what we
should expect from anything new in this space.  If anything, the other
auth methods should be ripped out entirely (password, md5, ldap, etc),
but certainly not used as a basis for new work or a place to try and add
new features, as they're all well known to have serious vulnerabilities.

As it specifically relates to SASL- if there's work to properly support
SASL for psql/libpq while linking to an external library which supports,
say, Kerberos through SASL, then sure, having that could work out well
and I wouldn't object to it, but I don't agree that we should have
dedicated SASL code which links to an external library on the server
side without any way to exercise it through libpq/psql, or vice-versa.

I also don't agree that this makes sense as an extension as we don't
have any way for extensions to make changes in libpq or psql, again
leading to the issue that it either can't be exercised or we create some
dependency on an external SASL library for libpq but object to having
that same dependency on the server side, which doesn't seem sensible to
me.  Requiring admins to jump through hoops to install an extension
where we have such a dependency on a library anyway doesn't make sense
either.

Thanks,

Stephen




Re: optimize lookups in snapshot [sub]xip arrays

2022-08-03 Thread Nathan Bossart
On Wed, Aug 03, 2022 at 11:06:58AM -0700, Andres Freund wrote:
> On 2022-08-02 16:43:57 -0700, Nathan Bossart wrote:
>> >> +#ifdef USE_SSE2
>> >> +pg_attribute_no_sanitize_alignment()
>> >> +#endif
>> > 
>> > What's the deal with this annotation? Needs a comment.
>> 
>> Will do.  c.h suggests that this should only be used for x86-specific code.
> 
> What I'm asking is why the annotation is needed at all?

Upon further inspection, I don't think this is needed.  I originally
borrowed it from the SSE version of the CRC code, but while it is trivial
to produce alignment failures with the CRC code, I haven't been able to
generate any with my patches.  Looking at the code, I'm not sure why I was
worried about this in the first place.  Please pardon the brain fade.

Here is a new patch set without the annotation.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From bd523948876f801b7f1b909f399b2cc41acf06cf Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 3 Aug 2022 11:07:40 +0700
Subject: [PATCH v6 1/3] Support SSE2 intrinsics where available

SSE2 vector instructions are part of the spec for the 64-bit x86
architecture. Until now we have relied on the compiler to autovectorize
in some limited situations, but some useful coding idioms can only be
expressed explicitly via compiler intrinsics. To this end, add a header
that defines USE_SSE2 when available. While x86-only for now, we can
add other architectures in the future. This will also be the intended
place for low-level helper functions that use vector operations.

Reviewed by Nathan Bossart

Discussion: https://www.postgresql.org/message-id/CAFBsxsE2G_H_5Wbw%2BNOPm70-BK4xxKf86-mRzY%3DL2sLoQqM%2B-Q%40mail.gmail.com
---
 src/include/port/simd.h | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 src/include/port/simd.h

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
new file mode 100644
index 00..a571e79f57
--- /dev/null
+++ b/src/include/port/simd.h
@@ -0,0 +1,30 @@
+/*-
+ *
+ * simd.h
+ *	  Support for platform-specific vector operations.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/simd.h
+ *
+ *-
+ */
+#ifndef SIMD_H
+#define SIMD_H
+
+/*
+ * SSE2 instructions are part of the spec for the 64-bit x86 ISA. We assume
+ * that compilers targeting this architecture understand SSE2 intrinsics.
+ *
+ * We use emmintrin.h rather than the comprehensive header immintrin.h in
+ * order to exclude extensions beyond SSE2. This is because MSVC, at least,
+ * will allow the use of intrinsics that haven't been enabled at compile
+ * time.
+ */
+#if (defined(__x86_64__) || defined(_M_AMD64))
+#include <emmintrin.h>
+#define USE_SSE2
+#endif
+
+#endif			/* SIMD_H */
-- 
2.25.1

From 9b70c265fa7a254117436eed59c2d0effd07a00d Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 3 Aug 2022 09:49:04 -0700
Subject: [PATCH v6 2/3] Introduce optimized routine for linear searches
 through an array of integers.

If SSE2 is available, this function uses it to speed up the search.  Otherwise,
it uses a simple 'for' loop.  This is a prerequisite for a follow-up commit
that will use this function to optimize [sub]xip lookups in
XidInMVCCSnapshot(), but it can be used anywhere that might benefit from such
an optimization.

It might be worthwhile to add an ARM-specific code path to this function in the
future.

Author: Nathan Bossart
Reviewed by: Andres Freund, John Naylor
Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
---
 src/include/utils/linearsearch.h | 70 
 1 file changed, 70 insertions(+)
 create mode 100644 src/include/utils/linearsearch.h

diff --git a/src/include/utils/linearsearch.h b/src/include/utils/linearsearch.h
new file mode 100644
index 00..a23ad45d82
--- /dev/null
+++ b/src/include/utils/linearsearch.h
@@ -0,0 +1,70 @@
+/*-
+ *
+ * linearsearch.h
+ *	  Optimized linear search routines.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/linearsearch.h
+ *
+ *-
+ */
+#ifndef LINEARSEARCH_H
+#define LINEARSEARCH_H
+
+#include "port/simd.h"
+
+/*
+ * pg_linearsearch_uint32
+ *
+ * Returns true if there is an element in 'base' that equals 'key'.  Otherwise,
+ * returns false.
+ */
+static inline bool
+pg_linearsearch_uint32(uint32 key, uint32 *base, uint32 nelem)
+{
+	uint32		i = 0;
+
+	/* If possible, use SSE2 intrinsics to speed up the search. */
+#ifdef USE_SSE2
+	__m128i		keys = _mm_set1_epi32(key);	/* load 4 copies of key */
+	uint32		its = nelem & ~0xF;		
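
The patch text is cut off above. A self-contained sketch of this kind of SSE2 linear search follows — a reconstruction, not the submitted code: the real patch masks with `~0xF` and presumably unrolls several vector compares per 16-element block, while this sketch checks one 4-element vector per iteration for brevity.

```c
/*
 * Reconstruction sketch of a pg_linearsearch_uint32-style routine.
 * The SSE2 loop body is guessed from the visible fragment above, not
 * copied from the v6 patch.
 */
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_AMD64)
#include <emmintrin.h>
#define USE_SSE2
#endif

static inline bool
linearsearch_uint32(uint32_t key, const uint32_t *base, uint32_t nelem)
{
	uint32_t	i = 0;

#ifdef USE_SSE2
	const __m128i keys = _mm_set1_epi32((int) key);	/* 4 copies of key */
	uint32_t	tail = nelem & ~(uint32_t) 3;	/* 4 elements per iteration */

	for (; i < tail; i += 4)
	{
		__m128i		vals = _mm_loadu_si128((const __m128i *) &base[i]);
		__m128i		cmp = _mm_cmpeq_epi32(keys, vals);

		/* any lane equal -> some byte of the mask is set */
		if (_mm_movemask_epi8(cmp) != 0)
			return true;
	}
#endif

	/* scalar path: remainder, and the whole array on non-SSE2 targets */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
}
```

On non-x86 builds the scalar loop is the whole function, mirroring how simd.h gates USE_SSE2.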

Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 09:59:40 -0400, Robert Haas wrote:
> On Tue, Aug 2, 2022 at 3:51 PM Tom Lane  wrote:
> > > The test does look helpful and it would catch regressions. Loosely
> > > quoting Robert on a different point upthread, we don't want to turn off
> > > the alarm just because it's spuriously going off.
> > > I think the weakened check is OK (and possibly mimics the real-world
> > > where autovacuum runs), unless you see a major drawback to it?
> >
> > I also think that ">=" is a sufficient requirement.  It'd be a
> > bit painful to test if we had to cope with potential XID wraparound,
> > but we know that these installations haven't been around nearly
> > long enough for that, so a plain ">=" test ought to be good enough.
> > (Replacing the simple "eq" code with something that can handle that
> > doesn't look like much fun, though.)
> 
> I don't really like this approach. Imagine that the code got broken in
> such a way that relfrozenxid and relminmxid were set to a value chosen
> at random - say, the contents of 4 bytes of unallocated memory that
> contained random garbage. Well, right now, the chances that this would
> cause a test failure are nearly 100%. With this change, they'd be
> nearly 0%.

Can't that pretty easily be addressed by subsequently querying txid_current(),
and checking that the value isn't newer than that?

Greetings,

Andres Freund




Re: Smoothing the subtrans performance catastrophe

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 15:36:40 -0400, Robert Haas wrote:
> On Wed, Aug 3, 2022 at 3:18 PM Andres Freund  wrote:
> > In contrast to e.g. clog or multixact we don't need to access a lot of old
> > entries [...]
> > While we can't put a useful hard cap on the number of potential subtrans
> > entries (we can only throw subxid->parent mappings away once no existing
> > snapshot might need them), saying that there can't be more subxids
> > "considered running" at a time than can fit in memory doesn't seem like a
> > particularly problematic restriction.
>
> That sounds really problematic to me, unless I misunderstand what
> you're proposing. Say I have a plpgsql containing a FOR loop which in
> turn contains an EXCEPTION block which in turn does DML. Right now,
> that loop could iterate millions of times and everything would still
> work. Sure, there might be performance impacts depending on what else
> is happening on the system, but it might also be totally fine. IIUC,
> you'd like to make that case fail outright. I think that's a
> non-starter.

I don't think this scenario would fundamentally change - we already keep the
set of subxids in backend local memory (e.g. either a dedicated
TransactionStateData or an element in ->childXids) and in the locking table
(via XactLockTableInsert()).

The problematic part isn't keeping "actually" running subxids in memory, but
keeping subxids that might be "considered running" in memory (i.e. subxids
that are considered running by an old snapshot in another backend).

A hashtable just containing child->parent mapping for subxids doesn't actually
need that much memory. It'd be approximately (2 * 4 bytes) * subxids *
(2-fillfactor) or such? So maybe ~10MB for 1 milllion subxids?  Allocating
that on-demand doesn't strike me as prohibitive.
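A quick check of that back-of-the-envelope figure — a sketch only; the 0.75 fillfactor plugged in below is an assumed example value, not something stated in the mail:

```c
/*
 * Sanity check of the estimate above: two 4-byte xids (child and
 * parent) per mapping, scaled by (2 - fillfactor) for hash-table
 * slack.  For 1 million subxids at fillfactor 0.75 this gives
 * 8 bytes * 1e6 * 1.25 = 10,000,000 bytes, i.e. roughly 10 MB.
 */
static double
subxid_table_bytes(double nsubxids, double fillfactor)
{
	return (2.0 * 4.0) * nsubxids * (2.0 - fillfactor);
}
```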


> I don't know whether Simon's ideas here are amazingly good, utterly
> terrible, or something in between, but I think we can evaluate the
> patch actually submitted rather than propose a complete redesign of
> the entire mechanism - especially one that seems like it would break
> stuff that currently works.

We've had quite a few patches that try to address issues around subids, but
only ever improve things on the margins. I'm doubtful that's a useful use of
time.

Greetings,

Andres Freund




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Tom Lane
Andres Freund  writes:
> I agree very much with that - just am doubtful that "lacks interest" is a good
> way of dealing with it, unless we just want to treat it as a nicer sounding
> "rejected".

I think there is a difference.  "Lacks interest" suggests that there
is a path forward for the patch, namely (as Jacob has mentioned
repeatedly) doing some sort of consensus-building that it's worth
pursuing.  The author may or may not have the interest/skills to do
that, but it's possible that it could happen.  "Rejected" says "don't
bother pursuing this, it's a bad idea".  Neither of these seems the
same as RWF, which I think we mostly understand to mean "this patch
has technical problems that can probably be fixed".

regards, tom lane




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 12:06:03 -0700, Jacob Champion wrote:
> On 8/3/22 11:41, Andres Freund wrote:
> > What patches are we concretely talking about?
> >
> > My impression is that a lot of the patches floating from CF to CF have
> > gotten sceptical feedback and at best a minor amount of work to address
> > that has been done.
> 
> - https://commitfest.postgresql.org/38/2482/

Hm - "Returned: Needs more interest" doesn't seem like it'd have been more
descriptive? It was split off a patchset that was committed at the tail end of
15 (and which still has *severe* code quality issues). Imo having a CF entry
before the rest of the jsonpath stuff made it in doesn't seem like a good
idea.


> - https://commitfest.postgresql.org/38/3338/

Here it'd have fit.


> - https://commitfest.postgresql.org/38/3181/

FWIW, I mentioned at least once that I didn't think this was worth pursuing.


> - https://commitfest.postgresql.org/38/2918/

Hm, certainly not a lot of review activity.


> - https://commitfest.postgresql.org/38/2710/

A good bit of this was committed in some form with a decent amount of review
activity for a while.


> - https://commitfest.postgresql.org/38/2266/ (this one was particularly
> miscommunicated during the first RwF)

I'd say misunderstanding rather than miscommunication...

It seems partially stalled due to the potential better approach based on
https://www.postgresql.org/message-id/flat/15848.1576515643%40sss.pgh.pa.us ?
In which case RwF doesn't seem too inappropriate.


> - https://commitfest.postgresql.org/38/2218/

Yep.


> - https://commitfest.postgresql.org/38/3256/

Yep.


> - https://commitfest.postgresql.org/38/3310/

I don't really understand why this has been RwF'd, doesn't seem that long
since the last review leading to changes.


> - https://commitfest.postgresql.org/38/3050/

Given that a non-author did a revision of the patch, listed a number of TODO
items and said "I'll create regression tests firstly." - I don't think "lacks
interest" would have been appropriate, and RwF is?


> (Even if they'd all received skeptical feedback, if the author replies in
> good faith and is met with silence for months, we need to not keep stringing
> them along.)

I agree very much with that - just am doubtful that "lacks interest" is a good
way of dealing with it, unless we just want to treat it as a nicer sounding
"rejected".

Greetings,

Andres Freund




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Peter Geoghegan
On Wed, Aug 3, 2022 at 6:59 AM Robert Haas  wrote:
> I don't really like this approach. Imagine that the code got broken in
> such a way that relfrozenxid and relminmxid were set to a value chosen
> at random - say, the contents of 4 bytes of unallocated memory that
> contained random garbage. Well, right now, the chances that this would
> cause a test failure are nearly 100%. With this change, they'd be
> nearly 0%.

If that kind of speculative bug existed, and somehow triggered before
the concurrent autovacuum ran (which seems very likely to be the
source of the test flappiness), then it would still be caught, most
likely. VACUUM itself has the following defenses:

* The defensive "can't happen" errors added to
heap_prepare_freeze_tuple() and related freezing routines by commit
699bf7d0 in 2017, as hardening following the "freeze the dead" bug.
That'll catch XIDs that are before the relfrozenxid at the start of
the VACUUM (ditto for MXIDs/relminmxid).

* The assertion added in my recent commit 0b018fab, which verifies
that we're about to set relfrozenxid to something sane.

* VACUUM now warns when it sees a *previous* relfrozenxid that's
apparently "in the future", following recent commit e83ebfe6. This
problem scenario is associated with several historic bugs in
pg_upgrade, where for one reason or another it failed to carry forward
correct relfrozenxid and/or relminmxid values for a table (see the
commit message for references to those old pg_upgrade bugs).

It might make sense to run a manual VACUUM right at the end of the
test, so that you reliably get this kind of coverage, even without
autovacuum.

-- 
Peter Geoghegan




Re: Smoothing the subtrans performance catastrophe

2022-08-03 Thread Robert Haas
On Wed, Aug 3, 2022 at 3:18 PM Andres Freund  wrote:
> In contrast to e.g. clog or multixact we don't need to access a lot of old
> entries [...]
> While we can't put a useful hard cap on the number of potential subtrans
> entries (we can only throw subxid->parent mappings away once no existing
> snapshot might need them), saying that there can't be more subxids "considered
> running" at a time than can fit in memory doesn't seem like a particularly
> problematic restriction.

That sounds really problematic to me, unless I misunderstand what
you're proposing. Say I have a plpgsql containing a FOR loop which in
turn contains an EXCEPTION block which in turn does DML. Right now,
that loop could iterate millions of times and everything would still
work. Sure, there might be performance impacts depending on what else
is happening on the system, but it might also be totally fine. IIUC,
you'd like to make that case fail outright. I think that's a
non-starter.

I don't know whether Simon's ideas here are amazingly good, utterly
terrible, or something in between, but I think we can evaluate the
patch actually submitted rather than propose a complete redesign of
the entire mechanism - especially one that seems like it would break
stuff that currently works.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Smoothing the subtrans performance catastrophe

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-01 17:42:49 +0100, Simon Riggs wrote:
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans - this patch takes a
> different approach: remove as many subtrans lookups as possible. (So
> is not competing with those other solutions).
> 
> Attached patch improves on the situation, as also shown in the attached 
> diagram.

I think we should consider redesigning subtrans more substantially - even with
the changes you propose here, there's still plenty ways to hit really bad
performance. And there's only so much we can do about that without more
fundamental design changes.

One way to fix a lot of the issues around pg_subtrans would be remove the
pg_subtrans SLRU and replace it with a purely in-memory hashtable. IMO there's
really no good reason to use an SLRU for it (anymore).

In contrast to e.g. clog or multixact we don't need to access a lot of old
entries, we don't need persistency etc. Nor is it a good use of memory and IO
to have loads of pg_subtrans pages that don't point anywhere, because the xid
is just a "normal" xid.

While we can't put a useful hard cap on the number of potential subtrans
entries (we can only throw subxid->parent mappings away once no existing
snapshot might need them), saying that there can't be more subxids "considered
running" at a time than can fit in memory doesn't seem like a particularly
problematic restriction.

So, why don't we use a dshash table with some amount of statically allocated
memory for the mapping? In common cases that will *reduce* memory usage
(because we don't need to reserve space for [as many] subxids in snapshots /
procarray anymore) and IO (no mostly-zeroes pg_subtrans).

Greetings,

Andres Freund




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Jacob Champion
On 8/3/22 11:41, Andres Freund wrote:
> What patches are we concretely talking about?
>
> My impression is that a lot of the patches floating from CF to CF have gotten
> sceptical feedback and at best a minor amount of work to address that has been
> done.

- https://commitfest.postgresql.org/38/2482/
- https://commitfest.postgresql.org/38/3338/
- https://commitfest.postgresql.org/38/3181/
- https://commitfest.postgresql.org/38/2918/
- https://commitfest.postgresql.org/38/2710/
- https://commitfest.postgresql.org/38/2266/ (this one was particularly
miscommunicated during the first RwF)
- https://commitfest.postgresql.org/38/2218/
- https://commitfest.postgresql.org/38/3256/
- https://commitfest.postgresql.org/38/3310/
- https://commitfest.postgresql.org/38/3050/

Looking though, some of those have received skeptical feedback as you
say, but certainly not all; not even a majority IMO. (Even if they'd all
received skeptical feedback, if the author replies in good faith and is
met with silence for months, we need to not keep stringing them along.)

--Jacob




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Tom Lane
Andres Freund  writes:
> My impression is that a lot of the patches floating from CF to CF have gotten
> sceptical feedback and at best a minor amount of work to address that has been
> done.

That I think is a distinct issue: nobody wants to take on the
unpleasant job of saying "no, we don't want this" in a final way.
We may raise some objections but actually rejecting a patch is hard.
So it tends to slide forward until the author gives up.

regards, tom lane




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Peter Geoghegan
On Tue, Aug 2, 2022 at 12:32 PM Tom Lane  wrote:
> Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl
> that attempts to turn off autovacuum on either the source server or
> the destination.  So one plausible theory is that autovac moved the
> numbers since we checked.

It's very easy to believe that my work in commit 0b018fab could make
that happen, which is only a few months old. It's now completely
routine for non-aggressive autovacuums to advance relfrozenxid by at
least a small amount.

For example, when autovacuum runs against either the tellers table or
the branches table during a pgbench run, it will now advance
relfrozenxid, every single time. And to a very recent value. This will
happen in spite of the fact that no freezing ever takes place -- it's
just a consequence of the oldest extant XID consistently being quite
young each time, due to workload characteristics.

-- 
Peter Geoghegan




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Justin Pryzby
On Wed, Aug 03, 2022 at 11:26:43AM -0700, Andres Freund wrote:
> Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
> suspect this isn't actually related to STRATEGY wal_log and could likely be
> reproduced in older versions too.

I couldn't reproduce it with file_copy, but my recipe isn't exactly reliable.
That may just mean that it's easier to hit now.

-- 
Justin




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 10:52:36 -0700, Jacob Champion wrote:
> The situation I'm looking at, though, is where we have a dozen patches
> floating forward that multiple CFMs in a row feel should be returned,
> but they won't because claiming "they have feedback" is clearly unfair
> to the author. I think this will improve that situation.

What patches are we concretely talking about?

My impression is that a lot of the patches floating from CF to CF have gotten
sceptical feedback and at best a minor amount of work to address that has been
done.

Greetings,

Andres Freund




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-03 12:01:18 -0500, Justin Pryzby wrote:
> Now, I've reproduced the problem under valgrind, but it doesn't show anything
> useful

Yea, that looks like an issue on a different level.

> 
> pryzbyj@pryzbyj:~$ while :; do psql -h /tmp template1 -c "DROP DATABASE a" -c 
> "CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log"; done
> ERROR:  database "a" does not exist
> CREATE DATABASE
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> ERROR:  database "a" already exists
> ^C

Hm. This looks more like an issue of DROP DATABASE not being interruptible. I
suspect this isn't actually related to STRATEGY wal_log and could likely be
reproduced in older versions too.

It's pretty obvious that dropdb() isn't safe against being interrupted. We
delete the data before we have committed the deletion of the pg_database
entry.

Seems like we should hold interrupts across the remove_dbtablespaces() until
*after* we've committed the transaction?

Greetings,

Andres Freund




Re: optimize lookups in snapshot [sub]xip arrays

2022-08-03 Thread Andres Freund
Hi,

On 2022-08-02 16:43:57 -0700, Nathan Bossart wrote:
> >> +/*
> >> + * pg_linearsearch_uint32
> >> + *
> >> + * Returns the address of the first element in 'base' that equals 'key', or
> >> + * NULL if no match is found.
> >> + */
> >> +#ifdef USE_SSE2
> >> +pg_attribute_no_sanitize_alignment()
> >> +#endif
> > 
> > What's the deal with this annotation? Needs a comment.
> 
> Will do.  c.h suggests that this should only be used for x86-specific code.

What I'm asking is why the annotation is needed at all?

Greetings,

Andres Freund




Clarifying Commitfest policies

2022-08-03 Thread Jacob Champion
[was: CF app: add "Returned: Needs more interest"]

On Wed, Aug 3, 2022 at 10:09 AM Julien Rouhaud  wrote:
> I'm afraid that
> patches will still be left alone to rot and there still be no clear rules on
> what to do and when, reminder for CFM and such, and that this new status would
> never be used anyway.

Yeah, so the lack of clear rules is an issue -- maybe not because we
can't work without them (we have, clearly, and we can continue to do
so) but because each of us kind of makes it up as we go along? When
discussions about these "rules" happen on the list, it doesn't always
happen with the same people, and opinions can vary wildly.

There have been a couple of suggestions recently:
- Revamp the CF Checklist on the wiki. I plan to do so later this
month, but that will still need some community review.
- Provide in-app explanations and documentation for some of the less
obvious points. (What should the target version be? What's the
difference between Rejected and Returned?)

Is that enough, or should we do more?

My preference, as I think Daniel also said in a recent thread, would
be for most of this information to be in the application somewhere.
That would make it more immediately accessible to everyone. (The
tradeoff is, it gets harder to update.)

--Jacob




Re: enable/disable broken for statement triggers on partitioned tables

2022-08-03 Thread Alvaro Herrera
On 2022-Aug-02, Amit Langote wrote:

> Regarding the patch, I agree that storing the recurse flag rather than
> overwriting subtype might be better.
> 
> +   boolexecTimeRecursion; /* set by ATPrepCmd if ATExecCmd must
> +   * recurse to children */
> 
> Might it be better to call this field simply 'recurse'?  I think it's
> clear from the context and the comment above the flag is to be used
> during execution.

Yeah, I guess we can do that and also reword the overall ALTER TABLE
comment about recursion.  That's in the attached first patch, which is
intended as backpatchable.

The second patch is just to show how we'd rewrite AT_AddColumn to no
longer use the Recurse separate enum value but instead use the ->recurse
flag.  This is pretty straightforward and it's a clear net reduction of
code.  We can't backpatch this kind of thing of course, both because of
the ABI break (easily fixed) and because potential destabilization
(scary).  We can do similar things for the other AT enum values for
recursion.  This isn't complete since there are a few other values in
that enum that we should process in this way too; I don't intend to
push it just yet.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.
https://burningbird.net/the-parable-of-the-languages/
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d22dd44712..70b94bbb39 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -580,7 +580,8 @@ static void ATExecSetRelOptions(Relation rel, List *defList,
 AlterTableType operation,
 LOCKMODE lockmode);
 static void ATExecEnableDisableTrigger(Relation rel, const char *trigname,
-	   char fires_when, bool skip_system, LOCKMODE lockmode);
+	   char fires_when, bool skip_system, bool recurse,
+	   LOCKMODE lockmode);
 static void ATExecEnableDisableRule(Relation rel, const char *rulename,
 	char fires_when, LOCKMODE lockmode);
 static void ATPrepAddInherit(Relation child_rel);
@@ -4057,9 +4058,7 @@ AlterTableLookupRelation(AlterTableStmt *stmt, LOCKMODE lockmode)
  * be done in this phase.  Generally, this phase acquires table locks,
  * checks permissions and relkind, and recurses to find child tables.
  *
- * ATRewriteCatalogs performs phase 2 for each affected table.  (Note that
- * phases 2 and 3 normally do no explicit recursion, since phase 1 already
- * did it --- although some subcommands have to recurse in phase 2 instead.)
+ * ATRewriteCatalogs performs phase 2 for each affected table.
  * Certain subcommands need to be performed before others to avoid
  * unnecessary conflicts; for example, DROP COLUMN should come before
  * ADD COLUMN.  Therefore phase 1 divides the subcommands into multiple
@@ -4067,6 +4066,12 @@ AlterTableLookupRelation(AlterTableStmt *stmt, LOCKMODE lockmode)
  *
  * ATRewriteTables performs phase 3 for those tables that need it.
  *
+ * For most subcommand types, phases 2 and 3 do no explicit recursion,
+ * since phase 1 already does it.  However, for certain subcommand types
+ * it is only possible to determine how to recurse at phase 2 time; for
+ * those cases, phase 1 sets the cmd->recurse flag (or, in some older coding,
+ * changes the command subtype of a "Recurse" variant XXX to be cleaned up.)
+ *
  * Thanks to the magic of MVCC, an error anywhere along the way rolls back
  * the whole operation; we don't have to do anything special to clean up.
  *
@@ -4487,10 +4492,10 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 errhint("Use ALTER TABLE ... DETACH PARTITION ... FINALIZE to complete the pending detach operation."));
 
 	/*
-	 * Copy the original subcommand for each table.  This avoids conflicts
-	 * when different child tables need to make different parse
-	 * transformations (for example, the same column may have different column
-	 * numbers in different children).
+	 * Copy the original subcommand for each table, so we can scribble on it.
+	 * This avoids conflicts when different child tables need to make
+	 * different parse transformations (for example, the same column may have
+	 * different column numbers in different children).
 	 */
 	cmd = copyObject(cmd);
 
@@ -4777,8 +4782,9 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		case AT_DisableTrigAll:
 		case AT_DisableTrigUser:
 			ATSimplePermissions(cmd->subtype, rel, ATT_TABLE | ATT_FOREIGN_TABLE);
-			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-				ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode, context);
+			/* Set up recursion for phase 2; no other prep needed */
+			if (recurse)
+				cmd->recurse = true;
 			pass = AT_PASS_MISC;
 			break;
 		case AT_EnableRule:		/* ENABLE/DISABLE RULE variants */
@@ -5119,35 +5125,51 @@ ATExecCmd(List 

Re: Unstable tests for recovery conflict handling

2022-08-03 Thread Andres Freund
Hi,

On 2022-07-26 13:03:54 -0700, Andres Freund wrote:
> On 2022-07-26 14:30:30 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> > >> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> > >> the startup process hasn't detected the buffer conflict in the first
> > >> place.
> >
> > > I wonder if this, at least partially, could be be due to the elog thing
> > > I was complaining about nearby. I.e. we decide to FATAL as part of a
> > > recovery conflict interrupt, and then during that ERROR out as part of
> > > another recovery conflict interrupt (because nothing holds interrupts as
> > > part of FATAL).
> >
> > There are all sorts of things one could imagine going wrong in the
> > backend receiving the recovery conflict interrupt, but AFAICS in these
> > failures, the startup process hasn't sent a recovery conflict interrupt.
> > It certainly hasn't logged anything suggesting it noticed a conflict.
>
> I don't think we reliably emit a log message before the recovery
> conflict is resolved.

I played around trying to reproduce this kind of issue.

One way to quickly run into trouble on a slow system is that
ResolveRecoveryConflictWithVirtualXIDs() can end up sending signals more
frequently than the target can process them. The next signal can arrive by the
time SIGUSR1 processing has finished, which, at least on Linux, causes the
queued signal to be processed immediately, without "normal" postgres code
gaining control.

The reason nothing might get logged in some cases is that
e.g. ResolveRecoveryConflictWithLock() tells
ResolveRecoveryConflictWithVirtualXIDs() to *not* report the waiting:
	/*
	 * Prevent ResolveRecoveryConflictWithVirtualXIDs() from reporting
	 * "waiting" in PS display by disabling its argument report_waiting
	 * because the caller, WaitOnLock(), has already reported that.
	 */

so ResolveRecoveryConflictWithLock() can end up looping indefinitely without
logging anything.



Another question I have about ResolveRecoveryConflictWithLock() is whether
it's ok that we don't check for deadlocks around the
ResolveRecoveryConflictWithVirtualXIDs() call. It might be ok, because we'd
only block if there's a recovery conflict, in which case killing the process
ought to succeed?


I think there also might be a problem with the wait loop in ProcSleep() wrt
recovery conflicts: we rely on interrupts being processed to throw recovery
conflict errors, but ProcSleep() is called in a bunch of places with
interrupts held. An Assert(INTERRUPTS_CAN_BE_PROCESSED()) after releasing the
partition lock triggers a bunch of failures.  It's possible that these aren't
problematic cases for recovery conflicts, because they're all around extension
locks:

#2  0x562032f1968d in ExceptionalCondition (conditionName=0x56203310623a "INTERRUPTS_CAN_BE_PROCESSED()", errorType=0x562033105f6c "FailedAssertion", fileName=0x562033105f30 "/home/andres/src/postgresql/src/backend/storage/lmgr/proc.c", lineNumber=1208)
    at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3  0x562032d50f41 in ProcSleep (locallock=0x562034cafaf0, lockMethodTable=0x562033281740 )
    at /home/andres/src/postgresql/src/backend/storage/lmgr/proc.c:1208
#4  0x562032d3e2ce in WaitOnLock (locallock=0x562034cafaf0, owner=0x562034d12c58)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1859
#5  0x562032d3cd0a in LockAcquireExtended (locktag=0x7ffc7b4d0810, lockmode=7, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:1101
#6  0x562032d3c1c4 in LockAcquire (locktag=0x7ffc7b4d0810, lockmode=7, sessionLock=false, dontWait=false)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lock.c:752
#7  0x562032d3a696 in LockRelationForExtension (relation=0x7f54646b1dd8, lockmode=7)
    at /home/andres/src/postgresql/src/backend/storage/lmgr/lmgr.c:439
#8  0x562032894276 in _bt_getbuf (rel=0x7f54646b1dd8, blkno=4294967295, access=2)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtpage.c:975
#9  0x56203288f1cb in _bt_split (rel=0x7f54646b1dd8, itup_key=0x562034ea7428, buf=770, cbuf=0, newitemoff=408, newitemsz=16, newitem=0x562034ea3fc8, orignewitem=0x0, nposting=0x0, postingoff=0)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:1715
#10 0x56203288e4bb in _bt_insertonpg (rel=0x7f54646b1dd8, itup_key=0x562034ea7428, buf=770, cbuf=0, stack=0x562034ea1fb8, itup=0x562034ea3fc8, itemsz=16, newitemoff=408, postingoff=0, split_only_page=false)
    at /home/andres/src/postgresql/src/backend/access/nbtree/nbtinsert.c:1212
#11 0x56203288caf9 in _bt_doinsert (rel=0x7f54646b1dd8, itup=0x562034ea3fc8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, heapRel=0x7f546823dde0)
    at

Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Jacob Champion
On Wed, Aug 3, 2022 at 10:09 AM Julien Rouhaud  wrote:
> First of all, I didn't want to imply that rejecting a patch should be 
> pleasant,
> sorry if that sounded that way.

No worries, I don't think it really sounded that way. :D

> It's not that I'm opposed to adding that status, I just don't see how it's
> really going to improve the situation on its own.

If the situation you're referring to is the fact that we have a lot of
patches sitting without review, it won't improve that situation, I
agree.

The situation I'm looking at, though, is where we have a dozen patches
floating forward that multiple CFMs in a row feel should be returned, but
won't return, because claiming "they have feedback" is clearly unfair to
the author. I think this change will improve that situation.

> Or maybe because it wouldn't
> make any difference to me as a patch author to get my patches returned "with
> feedback" or "for any other reason" if they are ignored.

Sure. I think this change helps the newer contributors (and the CFMs
talking to them) more than it helps the established ones.

I'm in your boat, where I don't personally care how a patch of mine is
returned (and I'm fine with Withdrawing them myself). But I'm also
paid to do this. From some of my past experiences with other projects,
I tend to feel more sensitive to bad communication if I've developed a
patch using volunteer hours, on evenings or weekends.

> I'm afraid that
> patches will still be left alone to rot and there will still be no clear rules on
> what to do and when, reminder for CFM and such, and that this new status would
> never be used anyway.  So I guess I will just stop hijacking this thread and
> wait for a discussion on that, sorry for the noise.

Well, here, let's keep that conversation going too while there's
momentum. One sec while I switch Subjects and continue...

--Jacob




Re: optimize lookups in snapshot [sub]xip arrays

2022-08-03 Thread Nathan Bossart
Here is a new patch set.  0001 is the currently-proposed patch from the
other thread [0] for determining SSE2 support.  0002 introduces the
optimized linear search function.  And 0003 makes use of the new function
for the [sub]xip lookups in XidInMVCCSnapshot().

[0] 
https://postgr.es/m/CAFBsxsGktHL7%3DJXbgnKTi_uL0VRPcH4FSAqc6yK-3%2BJYfqPPjA%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From bd523948876f801b7f1b909f399b2cc41acf06cf Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 3 Aug 2022 11:07:40 +0700
Subject: [PATCH v5 1/3] Support SSE2 intrinsics where available

SSE2 vector instructions are part of the spec for the 64-bit x86
architecture. Until now we have relied on the compiler to autovectorize
in some limited situations, but some useful coding idioms can only be
expressed explicitly via compiler intrinsics. To this end, add a header
that defines USE_SSE2 when available. While x86-only for now, we can
add other architectures in the future. This will also be the intended
place for low-level helper functions that use vector operations.

Reviewed by Nathan Bossart

Discussion: https://www.postgresql.org/message-id/CAFBsxsE2G_H_5Wbw%2BNOPm70-BK4xxKf86-mRzY%3DL2sLoQqM%2B-Q%40mail.gmail.com
---
 src/include/port/simd.h | 30 ++
 1 file changed, 30 insertions(+)
 create mode 100644 src/include/port/simd.h

diff --git a/src/include/port/simd.h b/src/include/port/simd.h
new file mode 100644
index 00..a571e79f57
--- /dev/null
+++ b/src/include/port/simd.h
@@ -0,0 +1,30 @@
+/*-
+ *
+ * simd.h
+ *	  Support for platform-specific vector operations.
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/port/simd.h
+ *
+ *-
+ */
+#ifndef SIMD_H
+#define SIMD_H
+
+/*
+ * SSE2 instructions are part of the spec for the 64-bit x86 ISA. We assume
+ * that compilers targeting this architecture understand SSE2 intrinsics.
+ *
+ * We use emmintrin.h rather than the comprehensive header immintrin.h in
+ * order to exclude extensions beyond SSE2. This is because MSVC, at least,
+ * will allow the use of intrinsics that haven't been enabled at compile
+ * time.
+ */
+#if (defined(__x86_64__) || defined(_M_AMD64))
+#include 
+#define USE_SSE2
+#endif
+
+#endif			/* SIMD_H */
-- 
2.25.1

>From 89d17ba8a669b53814551284f8f8c82192eb1402 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 3 Aug 2022 09:49:04 -0700
Subject: [PATCH v5 2/3] Introduce optimized routine for linear searches
 through an array of integers.

If SSE2 is available, this function uses it to speed up the search.  Otherwise,
it uses a simple 'for' loop.  This is a prerequisite for a follow-up commit
that will use this function to optimize [sub]xip lookups in
XidInMVCCSnapshot(), but it can be used anywhere that might benefit from such
an optimization.

It might be worthwhile to add an ARM-specific code path to this function in the
future.

Author: Nathan Bossart
Reviewed by: Andres Freund, John Naylor
Discussion: https://postgr.es/m/20220713170950.GA3116318%40nathanxps13
---
 src/include/utils/linearsearch.h | 76 
 1 file changed, 76 insertions(+)
 create mode 100644 src/include/utils/linearsearch.h

diff --git a/src/include/utils/linearsearch.h b/src/include/utils/linearsearch.h
new file mode 100644
index 00..51298b4355
--- /dev/null
+++ b/src/include/utils/linearsearch.h
@@ -0,0 +1,76 @@
+/*-
+ *
+ * linearsearch.h
+ *	  Optimized linear search routines.
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/include/utils/linearsearch.h
+ *
+ *-
+ */
+#ifndef LINEARSEARCH_H
+#define LINEARSEARCH_H
+
+#include "port/simd.h"
+
+/*
+ * pg_linearsearch_uint32
+ *
+ * Returns true if there is an element in 'base' that equals 'key'.  Otherwise,
+ * returns false.
+ *
+ * Since pg_attribute_no_sanitize_alignment() is only intended for x86-specific
+ * code, we surround it with an SSE2 check.
+ */
+#ifdef USE_SSE2
+pg_attribute_no_sanitize_alignment()
+#endif
+static inline bool
+pg_linearsearch_uint32(uint32 key, uint32 *base, uint32 nelem)
+{
+	uint32		i = 0;
+
+	/* If possible, use SSE2 intrinsics to speed up the search. */
+#ifdef USE_SSE2
+	__m128i		keys = _mm_set1_epi32(key);	/* load 4 copies of key */
+	uint32		its = nelem & ~0xF;			/* round down to multiple of 16 */
+
+	for (; i < its; i += 16)
+	{
+		/* load the next 16 values into __m128i variables */
		__m128i vals1 = _mm_loadu_si128((__m128i *) &base[i]);
		__m128i vals2 = _mm_loadu_si128((__m128i *) &base[i + 4]);
+		__m128i vals3 = 

Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Julien Rouhaud
Hi,

On Wed, Aug 03, 2022 at 08:58:49AM -0700, Jacob Champion wrote:
> On Tue, Aug 2, 2022 at 8:00 PM Julien Rouhaud  wrote:
> > I'm personally fine with the current statutes, as closing a patch with RwF
> > explaining that there was no interest is still a feedback,
>
> Making that explanation each time we intend to close a patch "needs
> interest" takes a lot of time and wordsmithing. "Returned with
> feedback" clearly has an established meaning to the community, and
> this is counter to that meaning, so people just avoid using it that
> way.
>
> When they do, miscommunications happen easily, which can lead to
> authors reopening patches thinking that there's been some kind of
> mistake (as happened to at least one of the patches in this past CF,
> which I had to close again). Language and cultural differences likely
> exacerbate the problem, so the less ad hoc messaging a CFM has to do
> to explain that "this is RwF but not actually RwF", the better.
>
> > and having a
> > different status won't make it any more pleasant for both the CFM and the
> > author.
>
> "More pleasant" is not really the goal here. I don't think it should
> ever be pleasant for a CFM to return someone's patch when it hasn't
> received review, and it's certainly not going to be pleasant for the
> author. But we can be more honest and clear about why we're returning
> it, and hopefully make it less unpleasant.
>
> > My biggest complaint here is that it doesn't really do anything to try to
> > improve the current situation (lack of review and/or lack of committer
> > interest).
>
> It's not really meant to improve that. This is just trying to move the
> needle a little bit, in a way that's been requested several times.
>
> > Maybe it would be better to discuss some clear rules and thresholds on when
> > action should be taken on such patches.
>
> I think that's also important to discuss, and I have thoughts on that
> too, but I don't think the discussions for these sorts of incremental
> changes should wait for that discussion.

First of all, I didn't want to imply that rejecting a patch should be pleasant,
sorry if that sounded that way.

It's not that I'm opposed to adding that status, I just don't see how it's
really going to improve the situation on its own.  Or maybe because it wouldn't
make any difference to me as a patch author to get my patches returned "with
feedback" or "for any other reason" if they are ignored.  I'm afraid that
patches will still be left alone to rot and there will still be no clear rules on
what to do and when, reminder for CFM and such, and that this new status would
never be used anyway.  So I guess I will just stop hijacking this thread and
wait for a discussion on that, sorry for the noise.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Justin Pryzby
On Wed, Aug 03, 2022 at 11:02:00AM -0500, Justin Pryzby wrote:
> But I reproduced the first problem with a handful of tries interrupting the
> while loop:
> 
> 2022-08-03 10:39:50.129 CDT client backend[5530] [unknown] PANIC:  could not 
> open critical system index 2662
...
> So far, I haven't succeeded in eliciting anything useful from valgrind.

Now, I've reproduced the problem under valgrind, but it doesn't show anything
useful:

pryzbyj@pryzbyj:~$ while :; do psql -h /tmp template1 -c "DROP DATABASE a" -c 
"CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log"; done
ERROR:  database "a" does not exist
CREATE DATABASE
^CCancel request sent
ERROR:  canceling statement due to user request
ERROR:  database "a" already exists
^C
pryzbyj@pryzbyj:~$ ^C
pryzbyj@pryzbyj:~$ ^C
pryzbyj@pryzbyj:~$ ^C
pryzbyj@pryzbyj:~$ psql -h /tmp a -c ''
2022-08-03 11:57:39.178 CDT client backend[31321] [unknown] PANIC:  could not 
open critical system index 2662
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
PANIC:  could not open critical system index 2662


On the server process, nothing interesting but the backtrace (the error was
before this, while writing the relation file, but there's nothing suspicious).

2022-08-03 11:08:06.628 CDT client backend[2841] [unknown] PANIC:  could not 
open critical system index 2662
==2841==
==2841== Process terminating with default action of signal 6 (SIGABRT)
==2841==at 0x5FBBE97: raise (raise.c:51)
==2841==by 0x5FBD800: abort (abort.c:79)
==2841==by 0x2118DEF: errfinish (elog.c:675)
==2841==by 0x20F6002: load_critical_index (relcache.c:4328)
==2841==by 0x20F727A: RelationCacheInitializePhase3 (relcache.c:4103)
==2841==by 0x213DFA5: InitPostgres (postinit.c:1087)
==2841==by 0x1D20D72: PostgresMain (postgres.c:4081)
==2841==by 0x1B15CFD: BackendRun (postmaster.c:4490)
==2841==by 0x1B1D6E1: BackendStartup (postmaster.c:4218)
==2841==by 0x1B1DD59: ServerLoop (postmaster.c:1808)
==2841==by 0x1B1F86D: PostmasterMain (postmaster.c:1480)
==2841==by 0x17B7150: main (main.c:197)

Below, I reproduced it without valgrind (and without LANG):

pryzbyj@pryzbyj:~/src/postgres$ while :; do psql -qh /tmp template1 -c "DROP 
DATABASE a" -c "CREATE DATABASE a TEMPLATE postgres STRATEGY wal_log"; done
2022-08-03 11:59:52.675 CDT checkpointer[1881] LOG:  checkpoint starting: 
immediate force wait
2022-08-03 11:59:52.862 CDT checkpointer[1881] LOG:  checkpoint complete: wrote 
4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.045 s, 
sync=0.038 s, total=0.188 s; sync files=3, longest=0.019 s, average=0.013 s; 
distance=3 kB, estimate=3 kB; lsn=0/24862508, redo lsn=0/248624D0
2022-08-03 11:59:53.213 CDT checkpointer[1881] LOG:  checkpoint starting: 
immediate force wait
2022-08-03 11:59:53.409 CDT checkpointer[1881] LOG:  checkpoint complete: wrote 
4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.030 s, 
sync=0.054 s, total=0.196 s; sync files=4, longest=0.029 s, average=0.014 s; 
distance=4042 kB, estimate=4042 kB; lsn=0/24C54D88, redo lsn=0/24C54D50
^CCancel request sent
2022-08-03 11:59:53.750 CDT checkpointer[1881] LOG:  checkpoint starting: 
immediate force wait
2022-08-03 11:59:53.930 CDT checkpointer[1881] LOG:  checkpoint complete: wrote 
4 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.029 s, 
sync=0.027 s, total=0.181 s; sync files=4, longest=0.022 s, average=0.007 s; 
distance=4042 kB, estimate=4042 kB; lsn=0/250476D0, redo lsn=0/25047698
2022-08-03 11:59:54.270 CDT checkpointer[1881] LOG:  checkpoint starting: 
immediate force wait
^C2022-08-03 11:59:54.294 CDT client backend[1903] psql ERROR:  canceling 
statement due to user request
2022-08-03 11:59:54.294 CDT client backend[1903] psql STATEMENT:  DROP DATABASE 
a
Cancel request sent
ERROR:  canceling statement due to user request
2022-08-03 11:59:54.296 CDT client backend[1903] psql ERROR:  database "a" 
already exists
2022-08-03 11:59:54.296 CDT client backend[1903] psql STATEMENT:  CREATE 
DATABASE a TEMPLATE postgres STRATEGY wal_log
ERROR:  database "a" already exists
^C
pryzbyj@pryzbyj:~/src/postgres$ ^C
pryzbyj@pryzbyj:~/src/postgres$ ^C
pryzbyj@pryzbyj:~/src/postgres$ 2022-08-03 11:59:54.427 CDT checkpointer[1881] 
LOG:  checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 0 
removed, 0 recycled; write=0.024 s, sync=0.036 s, total=0.158 s; sync files=4, 
longest=0.027 s, average=0.009 s; distance=4042 kB, estimate=4042 kB; 
lsn=0/2543A108, redo lsn=0/2543A0A8
^C
pryzbyj@pryzbyj:~/src/postgres$ ^C
pryzbyj@pryzbyj:~/src/postgres$ ^C
pryzbyj@pryzbyj:~/src/postgres$ psql -h /tmp a -c ''

2022-08-03 11:59:56.617 CDT client backend[1914] [unknown] PANIC:  could not open critical system index 2662




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Dilip Kumar
On Wed, Aug 3, 2022 at 9:32 PM Justin Pryzby  wrote:
>
> On Wed, Aug 03, 2022 at 04:45:23PM +0530, Dilip Kumar wrote:
> > Another version of the patch which closes the smgr at the end using
> > smgrcloserellocator() and I have also added a commit message.
>
> Thanks for providing a patch.
> This seems to fix the second problem with accessing freed memory.

Thanks for the confirmation.

> But I reproduced the first problem with a handful of tries interrupting the
> while loop:
>
> 2022-08-03 10:39:50.129 CDT client backend[5530] [unknown] PANIC:  could not 
> open critical system index 2662
>
> In the failure, when trying to connect to the new "a" DB, it does this:
>
> [pid 10700] openat(AT_FDCWD, "base/17003/pg_filenode.map", O_RDONLY) = 11
> [pid 10700] read(11, 
> "\27'Y\0\21\0\0\0\353\4\0\0\353\4\0\0\341\4\0\0\341\4\0\0\347\4\0\0\347\4\0\0\337\4\0\0\337\4\0\0\24\v\0\0\24\v\0\0\25\v\0\0\25\v\0\0K\20\0\0K\20\0\0L\20\0\0L\20\0\0\202\n\0\0\202\n\0\0\203\n\0\0\203\n\0\0\217\n\0\0\217\n\0\0\220\n\0\0\220\n\0\0b\n\0\0b\n\0\0c\n\0\0c\n\0\0f\n\0\0f\n\0\0g\n\0\0g\n\0\0\177\r\0\0\177\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\362\366\252\337",
>  524) = 524
> [pid 10700] close(11)   = 0
> [pid 10700] openat(AT_FDCWD, "base/17003/pg_internal.init", O_RDONLY) = -1 
> ENOENT (No such file or directory)
> [pid 10700] openat(AT_FDCWD, "base/17003/1259", O_RDWR) = 11
> [pid 10700] lseek(11, 0, SEEK_END)  = 106496
> [pid 10700] lseek(11, 0, SEEK_END)  = 106496
>
> And then reads nothing but zero bytes from FD 11 (rel 1259/pg_class)
>
> So far, I haven't succeeded in eliciting anything useful from valgrind.

I tried multiple times but had no luck reproducing this issue.
Do you have logs showing which query was the last one executed just
before the ^C, and whether it was canceled or ran to completion?
Theoretically, if CREATE DATABASE failed anywhere in between, it
should at least clean up the directory of the newly created database;
but since that directory contains corrupted data, this doesn't look
like a symptom of a CREATE DATABASE failure alone, but of some
combination of multiple things.  I will put more thought into this
tomorrow.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [doc] fix a potential grammer mistake

2022-08-03 Thread Robert Treat
On Wed, Aug 3, 2022 at 11:15 AM Junwang Zhao  wrote:
>
> Attachment is a corrected version based on Tom's suggestion.
>
> Thanks.
>
> On Wed, Aug 3, 2022 at 9:56 PM Tom Lane  wrote:
> >
> > Erikjan Rijkers  writes:
> > > I don't think these  "were"s  are wrong but arguably changing them to
> > > "have" helps non-native speakers (like myself), as it doesn't change the
> > > meaning significantly as far as I can see.
> >
> > I think it does --- it changes the meaning from passive to active.
> > I don't necessarily object to rewriting these sentences more broadly,
> > but I don't think "have issued" is the correct phrasing.
> >
> > Possibly "The user issued ..." would work.
> >

Is there a reason that the first case says "just" issued vs the other
two cases? It seems to me that it should be removed.

Robert Treat
https://xzilla.net




Re: fix typos

2022-08-03 Thread Robert Haas
On Tue, Aug 2, 2022 at 4:32 AM Erik Rijkers  wrote:
> > The part of the sentence inside parentheses is not clear to me, before
> > or after the patch:
> >
> >  Dropping an extension causes its component objects, and other
> > explicitly
> >  dependent routines (see ,
> > -   the depends on extension action), to be dropped as well.
> > +   that depend on extension action), to be dropped as well.
> > 
> >
>
> Hm, I see what you mean, I did not notice that earlier and I won't make
> a guess as to intention.  Maybe Bruce can have another look? (commit
> 5fe2d4c56e)

I think that it's talking about this (documented) syntax:

ALTER ROUTINE name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
[ NO ] DEPENDS ON EXTENSION extension_name

So the change from "depends" to "depend" here is incorrect. Maybe we
can say something like:

the DEPENDS ON EXTENSION
extension_name action

(I haven't tested whether this markup works.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: support for SSE2 intrinsics

2022-08-03 Thread Nathan Bossart
On Wed, Aug 03, 2022 at 12:00:39PM +0700, John Naylor wrote:
> Thanks for checking! Here's a concrete patch for testing.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Robert Haas
On Wed, Aug 3, 2022 at 10:13 AM Tom Lane  wrote:
> If you have a different solution that you can implement by, say,
> tomorrow, then go for it.  But I want to see some fix in there
> within about 24 hours, because 15beta3 wraps on Monday and we
> will need at least a few days to see if the buildfarm is actually
> stable with whatever solution is applied.

I doubt that I can come up with something that quickly, so I guess we
need some stopgap for now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Justin Pryzby
On Wed, Aug 03, 2022 at 04:45:23PM +0530, Dilip Kumar wrote:
> Another version of the patch which closes the smgr at the end using
> smgrcloserellocator() and I have also added a commit message.

Thanks for providing a patch.
This seems to fix the second problem with accessing freed memory.

But I reproduced the first problem with a handful of tries interrupting the
while loop:

2022-08-03 10:39:50.129 CDT client backend[5530] [unknown] PANIC:  could not 
open critical system index 2662

In the failure, when trying to connect to the new "a" DB, it does this:

[pid 10700] openat(AT_FDCWD, "base/17003/pg_filenode.map", O_RDONLY) = 11
[pid 10700] read(11, 
"\27'Y\0\21\0\0\0\353\4\0\0\353\4\0\0\341\4\0\0\341\4\0\0\347\4\0\0\347\4\0\0\337\4\0\0\337\4\0\0\24\v\0\0\24\v\0\0\25\v\0\0\25\v\0\0K\20\0\0K\20\0\0L\20\0\0L\20\0\0\202\n\0\0\202\n\0\0\203\n\0\0\203\n\0\0\217\n\0\0\217\n\0\0\220\n\0\0\220\n\0\0b\n\0\0b\n\0\0c\n\0\0c\n\0\0f\n\0\0f\n\0\0g\n\0\0g\n\0\0\177\r\0\0\177\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\362\366\252\337",
 524) = 524
[pid 10700] close(11)   = 0
[pid 10700] openat(AT_FDCWD, "base/17003/pg_internal.init", O_RDONLY) = -1 
ENOENT (No such file or directory)
[pid 10700] openat(AT_FDCWD, "base/17003/1259", O_RDWR) = 11
[pid 10700] lseek(11, 0, SEEK_END)  = 106496
[pid 10700] lseek(11, 0, SEEK_END)  = 106496

And then reads nothing but zero bytes from FD 11 (rel 1259/pg_class)

So far, I haven't succeeded in eliciting anything useful from valgrind.

-- 
Justin




Re: [PATCH] CF app: add "Returned: Needs more interest"

2022-08-03 Thread Jacob Champion
On Tue, Aug 2, 2022 at 8:00 PM Julien Rouhaud  wrote:
> I'm personally fine with the current statutes, as closing a patch with RwF
> explaining that there was no interest is still a feedback,

Hi Julien,

Making that explanation each time we intend to close a patch "needs
interest" takes a lot of time and wordsmithing. "Returned with
feedback" clearly has an established meaning to the community, and
this is counter to that meaning, so people just avoid using it that
way.

When they do, miscommunications happen easily, which can lead to
authors reopening patches thinking that there's been some kind of
mistake (as happened to at least one of the patches in this past CF,
which I had to close again). Language and cultural differences likely
exacerbate the problem, so the less ad hoc messaging a CFM has to do
to explain that "this is RwF but not actually RwF", the better.

> and having a
> different status won't make it any more pleasant for both the CFM and the
> author.

"More pleasant" is not really the goal here. I don't think it should
ever be pleasant for a CFM to return someone's patch when it hasn't
received review, and it's certainly not going to be pleasant for the
author. But we can be more honest and clear about why we're returning
it, and hopefully make it less unpleasant.

> My biggest complaint here is that it doesn't really do anything to try to
> improve the current situation (lack of review and/or lack of committer
> interest).

It's not really meant to improve that. This is just trying to move the
needle a little bit, in a way that's been requested several times.

> Maybe it would be better to discuss some clear rules and thresholds on when
> action should be taken on such patches.

I think that's also important to discuss, and I have thoughts on that
too, but I don't think the discussions for these sorts of incremental
changes should wait for that discussion.

--Jacob




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Dilip Kumar
On Wed, 3 Aug 2022 at 9:22 PM, Robert Haas  wrote:

> On Wed, Aug 3, 2022 at 7:15 AM Dilip Kumar  wrote:
> > Another version of the patch which closes the smgr at the end using
> > smgrcloserellocator() and I have also added a commit message.
>
> Hmm, but didn't we decide against doing it that way intentionally? The
> comment you're deleting says "If we didn't do this and used the smgr
> layer directly, we would have to worry about invalidations."


I think we only need to worry if we keep the smgr reference around and try
to reuse it.  But in this patch I am not keeping the reference to the smgr.

—
Dilip

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Robert Haas
On Wed, Aug 3, 2022 at 7:15 AM Dilip Kumar  wrote:
> Another version of the patch which closes the smgr at the end using
> smgrcloserellocator() and I have also added a commit message.

Hmm, but didn't we decide against doing it that way intentionally? The
comment you're deleting says "If we didn't do this and used the smgr
layer directly, we would have to worry about invalidations."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: A test for replay of regression tests

2022-08-03 Thread Justin Pryzby
On Thu, Dec 09, 2021 at 12:10:23PM +1300, Thomas Munro wrote:
> This adds 2 whole minutes to the recovery check, when running with the
> Windows serial-only scripts on Cirrus CI (using Andres's CI patches).
> For Linux it adds ~20 seconds to the total of -j8 check-world.
> Hopefully that's time well spent, because it adds test coverage for
> all the redo routines, and hopefully soon we won't have to run 'em in
> series on Windows.

Should 027-stream-regress be renamed to something that starts earlier?
Off-list earlier this year, Andres referred to 000.

-- 
Justin




Re: [doc] fix a potential grammer mistake

2022-08-03 Thread Junwang Zhao
Attachment is a corrected version based on Tom's suggestion.

Thanks.

On Wed, Aug 3, 2022 at 9:56 PM Tom Lane  wrote:
>
> Erikjan Rijkers  writes:
> > I don't think these  "were"s  are wrong but arguably changing them to
> > "have" helps non-native speakers (like myself), as it doesn't change the
> > meaning significantly as far as I can see.
>
> I think it does --- it changes the meaning from passive to active.
> I don't necessarily object to rewriting these sentences more broadly,
> but I don't think "have issued" is the correct phrasing.
>
> Possibly "The user issued ..." would work.
>
> regards, tom lane



-- 
Regards
Junwang Zhao


0001-doc-rewrite-some-comments-to-make-them-more-precise.patch
Description: Binary data


Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Jonathan S. Katz


> On Aug 3, 2022, at 10:14 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>>> On Tue, Aug 2, 2022 at 3:51 PM Tom Lane  wrote:
>>> I also think that ">=" is a sufficient requirement.
> 
>> I don't really like this approach. Imagine that the code got broken in
>> such a way that relfrozenxid and relminmxid were set to a value chosen
>> at random - say, the contents of 4 bytes of unallocated memory that
>> contained random garbage. Well, right now, the chances that this would
>> cause a test failure are nearly 100%. With this change, they'd be
>> nearly 0%.
> 
> If you have a different solution that you can implement by, say,
> tomorrow, then go for it.  But I want to see some fix in there
> within about 24 hours, because 15beta3 wraps on Monday and we
> will need at least a few days to see if the buildfarm is actually
> stable with whatever solution is applied.

Yeah, I would argue that the current proposal guards against the false
positives as they currently stand.

I do think Robert raises a fair point, but I wonder if another test
would catch that? I don’t want to say “this would never happen”
because, well, it could happen. But AIUI this would probably manifest
itself in other places too?

> A possible compromise is to allow new values that are between
> old value and old-value-plus-a-few-dozen.

Well, that’s kind of deterministic :-) I’m OK with that tweak, where
“OK” means not thrilled, but I don’t see a better way to get more
granular details (at least through my phone searches).

I can probably have a tweak for this in a couple of hours if and when
I’m on plane wifi.

Jonathan




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 2, 2022 at 3:51 PM Tom Lane  wrote:
>> I also think that ">=" is a sufficient requirement.

> I don't really like this approach. Imagine that the code got broken in
> such a way that relfrozenxid and relminmxid were set to a value chosen
> at random - say, the contents of 4 bytes of unallocated memory that
> contained random garbage. Well, right now, the chances that this would
> cause a test failure are nearly 100%. With this change, they'd be
> nearly 0%.

If you have a different solution that you can implement by, say,
tomorrow, then go for it.  But I want to see some fix in there
within about 24 hours, because 15beta3 wraps on Monday and we
will need at least a few days to see if the buildfarm is actually
stable with whatever solution is applied.

A possible compromise is to allow new values that are between
old value and old-value-plus-a-few-dozen.

regards, tom lane




Re: pg15b2: large objects lost on upgrade

2022-08-03 Thread Robert Haas
On Tue, Aug 2, 2022 at 3:51 PM Tom Lane  wrote:
> > The test does look helpful and it would catch regressions. Loosely
> > quoting Robert on a different point upthread, we don't want to turn off
> > the alarm just because it's spuriously going off.
> > I think the weakened check is OK (and possibly mimics the real-world
> > where autovacuum runs), unless you see a major drawback to it?
>
> I also think that ">=" is a sufficient requirement.  It'd be a
> bit painful to test if we had to cope with potential XID wraparound,
> but we know that these installations haven't been around nearly
> long enough for that, so a plain ">=" test ought to be good enough.
> (Replacing the simple "eq" code with something that can handle that
> doesn't look like much fun, though.)

I don't really like this approach. Imagine that the code got broken in
such a way that relfrozenxid and relminmxid were set to a value chosen
at random - say, the contents of 4 bytes of unallocated memory that
contained random garbage. Well, right now, the chances that this would
cause a test failure are nearly 100%. With this change, they'd be
nearly 0%.
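To put rough numbers on the above: for a uniformly random 32-bit garbage value, the chance of slipping past each style of check can be estimated with a back-of-the-envelope sketch (nothing from the actual test suite):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Probability that a uniformly random 32-bit value lands in the
 * inclusive range [lo, hi] -- i.e. that a garbage relfrozenxid would
 * still pass a range-style check.
 */
static double
prob_in_range(uint32_t lo, uint32_t hi)
{
    return ((double) hi - (double) lo + 1.0) / 4294967296.0;
}
```

For a small old value such as 1000, prob_in_range(1000, UINT32_MAX) is essentially 1 (a plain ">=" check would almost never fire), while prob_in_range(1000, 1000 + 64) is about 1.5e-8, which is why bounding the accepted window above the old value restores nearly all of the detection power.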

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [doc] fix a potential grammer mistake

2022-08-03 Thread Tom Lane
Erikjan Rijkers  writes:
> I don't think these  "were"s  are wrong but arguably changing them to 
> "have" helps non-native speakers (like myself), as it doesn't change the 
> meaning significantly as far as I can see.

I think it does --- it changes the meaning from passive to active.
I don't necessarily object to rewriting these sentences more broadly,
but I don't think "have issued" is the correct phrasing.

Possibly "The user issued ..." would work.

regards, tom lane




Re: automatically generating node support functions

2022-08-03 Thread Tom Lane
Amit Kapila  writes:
> I have a question related to commit 964d01ae90. Today, after getting
> the latest code, when I compiled it on my Windows machine, it led to
> a compilation error because the outfuncs.funcs.c was not regenerated.
> I did the usual steps which I normally perform after getting the
> latest code (a) run "perl mkvcbuild.pl" and (b) then build the code
> using MSVC. Now, after that, I manually removed "node-support-stamp"
> from folder src/backend/nodes/ and re-did the steps and I see that the
> outfuncs.funcs.c got regenerated, and the build is also successful. I
> see that there is handling to clean the file "node-support-stamp" in
> nodes/Makefile but not sure how it works for windows. I think I am
> missing something here. Can you please guide me?

More likely, we need to add something explicit to Mkvcbuild.pm
for this.  I recall that it has stanzas to deal with updating
other autogenerated files; I bet we either missed that or
fat-fingered it for node-support-stamp.

regards, tom lane




Re: logical replication restrictions

2022-08-03 Thread Amit Kapila
On Mon, Aug 1, 2022 at 6:46 PM Euler Taveira  wrote:
>
> On Tue, Jul 5, 2022, at 9:29 AM, Amit Kapila wrote:
>
> I wonder why we don't apply the delay on commit/commit_prepared
> records only similar to physical replication. See recoveryApplyDelay.
> One more advantage would be then we don't need to worry about
> transactions that we are going to skip due to the SKIP feature for
> subscribers.
>
> I added an explanation at the top of apply_delay(). I didn't read the 
> "parallel
> apply" patch yet. I'll do that soon to understand how the current design for
> streamed transactions conflicts with the parallel apply patch.
>
> + * It applies the delay for the next transaction, but before starting the
> + * transaction. The main reason for this design is to avoid a long-running
> + * transaction (which can cause some operational challenges) if the user
> + * sets a high value for the delay. This design is different from physical
> + * replication (which applies the delay at commit time) mainly because write
> + * operations may cause some issues (such as bloat and locks) that can be
> + * minimized if it does not keep the transaction open for such a long time.
> + */

Your explanation makes sense to me. The other point to consider is
that there can be cases where we may not apply operation for the
transaction because of empty transactions (we don't yet skip empty
xacts for prepared transactions). So, won't it be better to apply the
delay just before we apply the first change for a transaction? Do we
want to apply the delay during table sync as we sometimes do need to
enter apply phase while doing table sync?
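For illustration, the "delay before the first change of a transaction" idea reduces to a computation like the following (all names are made up for this sketch; PostgreSQL's TimestampTz is measured in microseconds):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* microseconds, as in PostgreSQL */

/*
 * Compute how long (in microseconds) the apply worker should sleep
 * before applying the first change of a remote transaction: the
 * transaction's commit time plus the configured delay, minus the
 * current time.  A result of 0 means no wait is needed.  These names
 * are illustrative, not the patch's actual API.
 */
static int64_t
remaining_apply_delay(TimestampTz commit_time, int64_t min_apply_delay_us,
                      TimestampTz now)
{
    int64_t wakeup = commit_time + min_apply_delay_us;

    return wakeup > now ? wakeup - now : 0;
}
```

Applying the sleep at this point, rather than after BEGIN, avoids holding a transaction open for the whole delay, which is the operational concern the comment describes.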

>
> One more thing that might be worth discussing is whether introducing a
> new subscription parameter for this feature is a better idea or can we
> use guc (either an existing or a new one). Users may want to set this
> only for a particular subscription or set of subscriptions in which
> case it is better to have this as a subscription level parameter.
> OTOH, I was slightly worried that if this will be used for all
> subscriptions on a subscriber then it will be burdensome for users.
>
> That's a good point. Logical replication is per database and it is slightly
> different from physical replication, which is per cluster. In physical
> replication, you have no choice but to have a GUC. It is very unlikely that
> someone wants to delay all logical replicas. Therefore, the benefit of
> having a GUC is quite small.
>

Fair enough.

-- 
With Regards,
Amit Kapila.




Correct comment in RemoveNonParentXlogFiles()

2022-08-03 Thread Ashutosh Sharma
HI All,

The following comment in RemoveNonParentXlogFiles() says that we are trying to
remove any WAL file whose segment number is >= the segment number of the
first WAL file on the new timeline. However, looking at the code, I can say
that we are trying to remove the WAL files from the previous timeline whose
segment number is just greater than (not equal to) the segment number of
the first WAL file in the new timeline. I think we should improve this
comment, thoughts?


/*
 * Remove files that are on a timeline older than the new one we're
 * switching to, but with a segment number >= the first segment on the
 * new timeline.
 */
if (strncmp(xlde->d_name, switchseg, 8) < 0 &&
    strcmp(xlde->d_name + 8, switchseg + 8) > 0)
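For readers following along: WAL segment file names are 24 hexadecimal digits, the first 8 being the timeline ID and the remaining 16 the segment number, which is what the strncmp/strcmp split above exploits. A small self-contained sketch of the same condition (the segment names below are made-up examples) shows that a file whose segment number merely equals switchseg's is kept, not removed:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * WAL segment file names are 24 hex digits: 8 for the timeline ID
 * followed by 16 for the segment number.  Mirror the quoted condition:
 * remove the file only when it is on an older timeline AND its segment
 * number is strictly greater than the first segment of the new timeline.
 */
static bool
is_removal_candidate(const char *fname, const char *switchseg)
{
    return strncmp(fname, switchseg, 8) < 0 &&   /* older timeline */
           strcmp(fname + 8, switchseg + 8) > 0; /* strictly later segment */
}
```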

--
With Regards,
Ashutosh Sharma.


Re: Smoothing the subtrans performance catastrophe

2022-08-03 Thread Dilip Kumar
On Mon, Aug 1, 2022 at 10:13 PM Simon Riggs  wrote:
>
> "A mathematical catastrophe is a point in a model of an input-output
> system, where a vanishingly small change in the input can produce a
> large change in the output."
>
> We have just such a change in Postgres: when a snapshot overflows. In
> this case it takes only one subxid over the subxid cache limit to slow
> down every request in XidInMVCCSnapshot(), which becomes painful when
> a long running transaction exists at the same time. This situation has
> been noted by various bloggers, but is illustrated clearly in the
> attached diagram, generated by test results from Julien Tachoires.
>
> The reason for the slowdown is clear: when we overflow we check every
> xid against subtrans, producing a large stream of lookups. Some
> previous hackers have tried to speed up subtrans - this patch takes a
> different approach: remove as many subtrans lookups as possible. (So it
> is not competing with those other solutions.)
>
> Attached patch improves on the situation, as also shown in the attached 
> diagram.
>
> The patch does these things:
>
> 1. Rework XidInMVCCSnapshot() so that it always checks the snapshot
> first, before attempting to lookup subtrans. A related change means
> that we always keep full subxid info in the snapshot, even if one of
> the backends has overflowed.
>
> 2. Use binary search for standby snapshots, since the snapshot subxip
> is in sorted order.
>
> 3. Rework GetTopmostTransaction so that it a) checks xmin as it goes,
> b) only does one iteration on standby snapshots, both of which save
> subtrans lookups in appropriate cases.
> (This was newly added in v6)
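Point 2 above hinges on the standby snapshot's subxip array being kept in sorted order, which turns each membership probe from a linear scan into a binary search. A generic sketch of that probe (not the patch's actual code, and ignoring XID wraparound for simplicity):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Binary search for xid in a sorted array of subtransaction XIDs,
 * replacing an O(n) linear scan with O(log n).  Uses plain uint32
 * comparison; real XID comparisons must account for wraparound,
 * which is omitted here.
 */
static bool
xid_in_sorted_subxip(TransactionId xid, const TransactionId *subxip, int n)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (subxip[mid] == xid)
            return true;
        if (subxip[mid] < xid)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}
```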
>
> Now, is this a panacea? Not at all. What this patch does is smooth out
> the catastrophic effect so that a few overflowed subxids don't spoil
> everybody else's performance, but eventually, if many or all sessions
> have their overflowed subxid caches then the performance will descend
> as before, albeit that the attached patch has some additional
> optimizations (2, 3 above). So what this gives is a better flight
> envelope in case of a small number of occasional overflows.
>
> Please review. Thank you.

+1,
I had a quick look into the patch to understand the idea and I think
the idea looks really promising to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-08-03 Thread Dilip Kumar
On Wed, Aug 3, 2022 at 1:41 PM Dilip Kumar  wrote:
>
> On Wed, Aug 3, 2022 at 12:00 PM Dilip Kumar  wrote:
> >
>
> > Okay, so AtEOXact_SMgr will only get rid of unowned smgr and ours are
> > owned by a fake relcache and whose lifetime is just portal memory
> > context which will go away at the transaction end.  So as Andres
> > suggested options could be that a) we catch the error and do
> > FreeFakeRelcacheEntry.  b) directly use smgropen instead of
> > RelationGetSmgr because actually, we do not want the owner to be set
> > for these smgrs.
> >
> > I think option b) looks better to me, I will prepare a patch and test
> > whether the error goes away with that or not.
> >
>
> Here is the patch which directly uses smgropen instead of using fake
> relcache entry.  We don't preserve the smgr pointer and whenever
> required we again call the smgropen.
>
> With this patch it resolved the problem for me at least what I was
> able to reproduce.  I was able to reproduce the WARNING messages that
> Robert got as well as the valgrind error that Justin got and with this
> patch both are resolved.

Another version of the patch which closes the smgr at the end using
smgrcloserellocator() and I have also added a commit message.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From b151b54880dd17c94a25e8de908e30fe0d9a8542 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 3 Aug 2022 13:28:46 +0530
Subject: [PATCH v1] Avoid setting the fake relcache entry as smgr owner
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During CREATE DATABASE, we are not connected to the source and the
destination DB so in order to operate on the storage we are using
FakeRelCacheEntry and by using that we are calling RelationGetSmgr().

So the problem is that this function will set the temporary
FakeRelCacheEntry as the owner of the smgr.  Now if there is any
error before we close the FakeRelCacheEntry, the memory of the
fake relcache entry will be released at transaction abort but
the smgr will survive the transaction.  The smgr is then pointing
to already-released memory and will behave unpredictably
when we next try to access it.

To fix the issue, instead of using a FakeRelCacheEntry, call
smgropen() directly, but do not keep the reference to the smgr.
Instead, call smgropen() each time we need it.  This is required to
ensure that we do not access an smgr pointer that might have already
been closed during interrupt processing.
---
 src/backend/commands/dbcommands.c   | 15 +++
 src/backend/storage/buffer/bufmgr.c | 51 +
 2 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 7bc53f3..9342e8e 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -258,8 +258,8 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 	Page		page;
 	List	   *rlocatorlist = NIL;
 	LockRelId	relid;
-	Relation	rel;
 	Snapshot	snapshot;
+	SMgrRelation	smgr;
 	BufferAccessStrategy bstrategy;
 
 	/* Get pg_class relfilenumber. */
@@ -276,16 +276,9 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 	rlocator.dbOid = dbid;
 	rlocator.relNumber = relfilenumber;
 
-	/*
-	 * We can't use a real relcache entry for a relation in some other
-	 * database, but since we're only going to access the fields related to
-	 * physical storage, a fake one is good enough. If we didn't do this and
-	 * used the smgr layer directly, we would have to worry about
-	 * invalidations.
-	 */
-	rel = CreateFakeRelcacheEntry(rlocator);
-	nblocks = smgrnblocks(RelationGetSmgr(rel), MAIN_FORKNUM);
-	FreeFakeRelcacheEntry(rel);
+	smgr = smgropen(rlocator, InvalidBackendId);
+	nblocks = smgrnblocks(smgr, MAIN_FORKNUM);
+	smgrclose(smgr);
 
 	/* Use a buffer access strategy since this is a bulk read operation. */
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6b30138..8a7ccf5 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -487,9 +487,9 @@ static void FindAndDropRelationBuffers(RelFileLocator rlocator,
 	   ForkNumber forkNum,
 	   BlockNumber nForkBlock,
 	   BlockNumber firstDelBlock);
-static void RelationCopyStorageUsingBuffer(Relation src, Relation dst,
-		   ForkNumber forkNum,
-		   bool isunlogged);
+static void RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
+		   RelFileLocator dstlocator,
+		   ForkNumber forkNum, bool permanent);
 static void AtProcExit_Buffers(int code, Datum arg);
 static void CheckForBufferLeaks(void);
 static int	rlocator_comparator(const void *p1, const void *p2);
@@ -3701,8 +3701,9 @@ FlushRelationsAllBuffers(SMgrRelation *smgrs, int nrels)
  * 

RE: collate not support Unicode Variation Selector

2022-08-03 Thread 荒井元成
Thank you for your reply.

About 60,000 characters are registered in the IPAmj Mincho font designated by 
the national specifications. 
It should be able to handle all characters.

regards.


-Original Message-
From: Kyotaro Horiguchi  
Sent: Wednesday, August 3, 2022 3:26 PM
To: thomas.mu...@gmail.com
Cc: t...@sss.pgh.pa.us; n2...@ndensan.co.jp; pgsql-hackers@lists.postgresql.org
Subject: Re: collate not support Unicode Variation Selector

At Wed, 3 Aug 2022 14:02:08 +1200, Thomas Munro  wrote in
> On Wed, Aug 3, 2022 at 12:56 PM Tom Lane  wrote:
> > Maybe it would help if you run the strings through normalize() first?
> > I'm not sure if that can combine combining characters.
> 
> I think the similarity between Latin combining characters and these 
> ideographic variations might end there.  I don't think there is a 
> single codepoint version of U&'\+003436' || U&'\+0E0101', unlike é.

Right. At least in Japanese texts, the two "character"s are the same glyph.  In 
that sense the loss of variation selectors from a text doesn't alter its 
meaning and doesn't hurt correctness at all.
Ideographic variation is useful in special cases where their ideographic 
identity is crucial.

> This system is for controlling small differences in rendering for the 
> "same" character[1].  My computer doesn't even show the OP's example 
> glyphs as different (to my eyes, at least; I can see on a random 
> picture I found[2] that the one with the e0101 selector is supposed to 
> have a ... what do you call that ... a tiny gap :-)).

They need variation-aware fonts and application support to render.  So even
when *I* view the two characters in Excel (which I believe doesn't have that
support by default), they look exactly the same.  In that sense, my opinion
on the behavior is that all ideographic variations should rather be treated as
the same character when searching in a general context. In other words, text
matching should just drop variation selectors as the default behavior.
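As a sketch of what "drop variation selectors" could mean in code: the two standardized selector ranges are U+FE00..U+FE0F, and, for ideographic variation sequences like the U+3436 U+E0101 example upthread, U+E0100..U+E01EF. This is illustrative only, not a proposal for any particular PostgreSQL code path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Variation selectors: U+FE00..U+FE0F (VS1-VS16) and U+E0100..U+E01EF
 * (VS17-VS256, used for ideographic variation sequences).  A matcher
 * that wants to treat all variations of a glyph as equal can simply
 * skip these codepoints when comparing.
 */
static bool
is_variation_selector(uint32_t cp)
{
    return (cp >= 0xFE00 && cp <= 0xFE0F) ||
           (cp >= 0xE0100 && cp <= 0xE01EF);
}

/* Remove variation selectors in place; returns the new length. */
static int
strip_variation_selectors(uint32_t *cps, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++)
        if (!is_variation_selector(cps[i]))
            cps[out++] = cps[i];
    return out;
}
```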

ICU's Collator [1] has the notion of "collation strength", and I saw in an article
that only Collator::IDENTICAL among the five alternatives makes a distinction
between ideographic variations of a glyph.

> [1] http://www.unicode.org/reports/tr37/tr37-14.html
> [2] https://glyphwiki.org/wiki/u3436

[1] 
https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/classicu_1_1Collator.html

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [PATCH] postgresql.conf.sample comment alignment.

2022-08-03 Thread Alvaro Herrera
On 2022-Aug-01, Tom Lane wrote:

> One idea for avoiding confusion is to legislate that we won't
> use tabs at all in this file (which we could enforce via
> .gitattributes, I think).

+1.

> But that might just be making things equally inconvenient for
> everybody.

In this situation, the only disadvantaged users are those using a
non-fixed-width font in their editor, but those are lost souls already.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)




Re: PostgreSQL 15 minor fixes in protocol.sgml

2022-08-03 Thread Alvaro Herrera
On 2022-Aug-03, Amit Kapila wrote:

> Thanks for the report, and thanks Michael for including me. I am just
> redirecting it to -hackers so that others involved in this feature
> also can share their views.

I'm sorry, but our policy is that crossposts are not allowed.  I think
this policy is bad, precisely because it prevents legitimate cases like
this one; but it is what it is.

I think we should change the policy, not back to allow indiscriminate
cross-posting, but to allow some limited form of it.  For example I
think pg-bugs+pg-hackers and pg-docs+pg-hackers should be allowed
combinations.  Just saying.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?

2022-08-03 Thread Dmitry Koterov
Thank you for the detailed explanation!

I doubt many people from -general would actually be able to provide such
info since the spirit of that list is to find work-arounds for problems and
questions at user level rather than dig into the details on how something
actually works.

It's worth adding to the documentation, with that exact example BTW:
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE
(I can try submitting a docs PR if you think it's a good idea).

Also, when I said that we use PQexec, I did it just for an illustration: in
practice we use the node-postgres JS library which sends multi-statement
protocol messages. So - transaction wise - it works the same way as PQexec
with multiple queries, but it returns responses for ALL queries in the
batch, not just for the last one (very convenient BTW, saves on network
round-trip latency). This mode is fully supported by PG wire protocol:
https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT


On Wed, Aug 3, 2022 at 12:32 AM Kyotaro Horiguchi  wrote:



> =# select pg_current_wal_insert_lsn();
>  pg_current_wal_insert_lsn
> ---
>  0/*68E5038*
> (1 row)
> =# insert into t values(0)\; select pg_current_wal_lsn();
> INSERT 0 1
>  pg_current_wal_lsn
> 
>  0/*68E5038*
> (1 row)
> =# select pg_current_wal_insert_lsn();
>  pg_current_wal_insert_lsn
> ---
>  0/68E50A0
> (1 row)





> =# select pg_current_wal_insert_lsn();
>  pg_current_wal_insert_lsn
> ---
>  0/*68E75C8*
> (1 row)
> =# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn();
>  pg_current_wal_lsn
> 
>  0/*68E7958*
>
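As background for comparing the values above: an LSN is printed as two 32-bit hexadecimal halves separated by "/", and the caught-up check in the subject amounts to comparing the replica's pg_last_wal_replay_lsn with the primary's pg_current_wal_insert_lsn as 64-bit integers. A hypothetical client-side helper (not a libpq or server API):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse PostgreSQL's textual LSN form "XXXXXXXX/XXXXXXXX" (high and low
 * 32-bit halves in hex) into a single 64-bit value, so that replica and
 * primary positions can be compared numerically.  Returns 0 on parse
 * failure for simplicity; a real implementation would report errors.
 */
static uint64_t
lsn_from_text(const char *text)
{
    uint32_t hi, lo;

    if (sscanf(text, "%" SCNx32 "/%" SCNx32, &hi, &lo) != 2)
        return 0;
    return ((uint64_t) hi << 32) | lo;
}
```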


RE: Improve logging when using Huge Pages

2022-08-03 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hello,

> As discussed in [1], we're taking this opportunity to return some patchsets 
> that don't appear to be getting enough reviewer interest.
Thank you for your helpful comments.
As you say, my patch doesn't seem to be of much interest to reviewers either.
I will discard the patch I proposed this time and consider it again.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Jacob Champion  
Sent: Tuesday, August 2, 2022 5:45 AM
To: Shinoda, Noriyoshi (PN Japan FSIP) ; Masahiko 
Sawada ; Fujii Masao 
Cc: Justin Pryzby ; PostgreSQL-development 
; Julien Rouhaud ; Tom Lane 
; Kyotaro Horiguchi 
Subject: Re: Improve logging when using Huge Pages

As discussed in [1], we're taking this opportunity to return some patchsets 
that don't appear to be getting enough reviewer interest.

This is not a rejection, since we don't necessarily think there's anything 
unacceptable about the entry, but it differs from a standard "Returned with 
Feedback" in that there's probably not much actionable feedback at all. Rather 
than code changes, what this patch needs is more community interest. You might

- ask people for help with your approach,
- see if there are similar patches that your code could supplement,
- get interested parties to agree to review your patch in a CF, or
- possibly present the functionality in a way that's easier to review
  overall.

(Doing these things is no guarantee that there will be interest, but it's 
hopefully better than endlessly rebasing a patchset that is not receiving any 
feedback from the community.)

Once you think you've built up some community support and the patchset is ready 
for review, you (or any interested party) can resurrect the patch entry by 
visiting

https://commitfest.postgresql.org/38/3310/ 

and changing the status to "Needs Review", and then changing the status again 
to "Move to next CF". (Don't forget the second step; hopefully we will have 
streamlined this in the near future!)

Thanks,
--Jacob

[1] https://postgr.es/m/86140760-8ba5-6f3a-3e6e-5ca6c060b...@timescale.com 


  1   2   >