Re: A proposal to force-drop replication slots to make disabling async/sync standbys or logical replication faster in production environments

2022-06-08 Thread Tom Lane
Bharath Rupireddy  writes:
> How about we provide a function to force-drop a replication slot?

Isn't this akin to filing off the safety interlock on the loaded revolver
you keep in your hip pocket?  IMO the entire point of replication slots
is to not make it easy to lose data.

regards, tom lane




Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-06-08 Thread Michael Paquier
On Thu, Jun 09, 2022 at 04:55:45PM +1200, Thomas Munro wrote:
> The Cygwin stuff in installation.sgml also mentions NT, 2000, XP, but
> it's not clear from the phrasing if it meant "and later" or "and
> earlier", so I'm not sure if it needs adjusting or removing...

Right.  We could just remove the entire mention of "NT, 2000 or XP"
instead?  There would be no loss in clarity IMO.

> I think we can drop mention of Itanium (RIP): the ancient versions of
> Windows that could run on that arch are desupported with your patch.
> It might be more relevant to say that we can't yet run on ARM, and
> Windows 11 is untested by us, but let's fix those problems instead of
> documenting them :-)

Okay to remove the Itanium part for me.

> The mention of "64-Bit" in that title sounds a little anachronistic
> to me (isn't that the norm now?), but I'm not sure what to suggest.

Not sure.  I think that I would leave this part alone for now.

> There are more mentions of older Windows releases near the compiler
> stuff but perhaps your MSVC version vacuuming work will take care of
> those.

Yes, I have a few changes like the one in main.c for _M_AMD64.  Are
you referring to something else?
--
Michael




Re: Collation version tracking for macOS

2022-06-08 Thread Peter Geoghegan
On Wed, Jun 8, 2022 at 10:24 PM Jeremy Schneider
 wrote:
> Even if PG supports two versions of ICU, how does someone actually go about 
> removing every dependency on the old version and replacing it with the new?

They simply REINDEX, without changing anything. The details are still
fuzzy, but at least that's what I was thinking of.

This should be possible by generalizing the definition of a collation
to recognize that different ICU versions can support the same
collation. Of course we'd also have to remember which actual ICU
version and specific "physical collation" was currently in use by each
index. We'd also probably have to have some policy about which ICU
version was the latest (or some suitably generalized version of that
that applies to collation providers more generally).

> Can it be done without downtime? Can it be done without modifying a running 
> application?

Clearly the only way that we can ever transition to a new "physical
collation" is by reindexing using a newer ICU version. And clearly
there is going to be a need to fully deprecate any legacy version of
ICU on a long enough timeline. There is just no getting around that.

The advantage of an approach along the lines that I've laid out is
that everything can be done incrementally, possibly some time after an
initial OS or Postgres upgrade, once everything has settled. Much much
later, even. If the same new ICU version isn't available in your
original/old environment (which is likely), you can avoid reindexing,
and so reserve the option of backing out of a complex upgrade until
very late in the process. You're going to have to do it eventually,
but it can probably just be an afterthought.
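
As a rough illustration of the incremental, check-then-reindex workflow I
have in mind (a sketch only; it leans on the per-index collation version
tracking that exists today in pg_depend, so the catalog columns and the
function named here are current ones, not whatever a generalized design
would end up using):

SELECT DISTINCT d.objid::regclass AS index_name
FROM pg_depend AS d
JOIN pg_collation AS c ON d.refobjid = c.oid
WHERE d.classid = 'pg_class'::regclass
  AND d.refclassid = 'pg_collation'::regclass
  AND d.refobjversion IS NOT NULL
  AND d.refobjversion <> pg_collation_actual_version(c.oid);
-- then, for each affected index: REINDEX INDEX CONCURRENTLY ...;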

-- 
Peter Geoghegan




A proposal to force-drop replication slots to make disabling async/sync standbys or logical replication faster in production environments

2022-06-08 Thread Bharath Rupireddy
Hi,

Currently postgres doesn't allow dropping a replication slot that's active
[1]. This can make certain operations slow or stuck in production
environments, namely disabling async/sync standbys and disabling logical
replication, both of which require the postgres server running on the
standby or the subscriber to go down first. If stopping that server takes
time, the VM or container has to be killed forcefully, which can itself take
a considerable amount of time because there are many layers in between.
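
As things stand, an attempt to drop such a slot simply fails (this is the
existing function, shown only to illustrate the current behavior):

SELECT pg_drop_replication_slot('foo');
ERROR:  replication slot "foo" is active for PID 2598155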

How about we provide a function to force-drop a replication slot? All other
things, such as stopping postgres and gracefully unprovisioning the VM, can
be taken care of in the background. This force-drop function will also have
to ensure that the walsender that's active for the replication slot is
terminated gracefully, without letting the postmaster restart the other
backends (right now, if a walsender exits or is terminated, the postmaster
restarts all other backends too). The main advantage of the force-drop
function is that the disable operations can be quicker and there is no
downtime/crash on the primary/source server.

Thoughts?

[1] ERROR:  replication slot "foo" is active for PID 2598155

Regards,
Bharath Rupireddy.


Re: Collation version tracking for macOS

2022-06-08 Thread Jeremy Schneider


> On Jun 8, 2022, at 03:19, Thomas Munro  wrote:
> 
> On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan  wrote:
>> ISTM that there are two mostly-distinct questions here:
>> 
>> 1. How do we link to multiple versions of ICU at the same time, in a
>> way that is going to work smoothly on mainstream platforms?
>> 
> Yeah.  Well I couldn't resist doing some (very!) experimental hacking.
> See attached. 


Even if PG supports two versions of ICU, how does someone actually go about 
removing every dependency on the old version and replacing it with the new? Can 
it be done without downtime? Can it be done without modifying a running 
application? Avoiding “collate” clauses on SQL statements requires working 
behind the scenes with defaults and indexes and partitions and constraints and 
everything else. I’m having a hard time coming up with a way this would be 
possible in practice, with all the places collations can show up.

Is the idea of “alter database” to change the default collation even realistic?

I’m having a bit of trouble picturing what the end game is here.

-Jeremy


Sent from my TI-83





Re: [PoC] Let libpq reject unexpected authentication requests

2022-06-08 Thread Michael Paquier
On Tue, Jun 07, 2022 at 02:22:28PM -0700, Jacob Champion wrote:
> v2 rebases over latest, removes the alternate spellings of "password",
> and implements OR operations with a comma-separated list. For example:
> 
> - require_auth=cert means that the server must ask for, and the client
> must provide, a client certificate.

Hmm..  Nya.

> - require_auth=password,md5 means that the server must ask for a
> plaintext password or an MD5 hash.
> - require_auth=scram-sha-256,gss means that one of SCRAM, Kerberos
> authentication, or GSS transport encryption must be successfully
> negotiated.

Makes sense.

> - require_auth=scram-sha-256,cert means that either a SCRAM handshake
> must be completed, or the server must request a client certificate. It
> has a potential pitfall in that it allows a partial SCRAM handshake,
> as long as a certificate is requested and sent.

Er, this one could be a problem protocol-wise for SASL, because that
would mean that the authentication gets completed but that the
exchange has begun and is not finished?

> AND and NOT, discussed upthread, are not yet implemented. I tied
> myself up in knots trying to make AND generic, so I think I"m going to
> tackle NOT first, instead. The problem with AND is that it only makes
> sense when one (and only one) of the options is a form of transport
> authentication. (E.g. password+md5 never makes sense.) And although
> cert+ and gss+ could be useful, the latter case
> is already handled by gssencmode=require, and the gssencmode option is
> more powerful since you can disable it (or set it to don't-care).

I am on the fence regarding NOT as well, and I am unsure of the actual
benefits we could get from it as long as one can provide a white list
of auth methods.  If we don't see an immediate benefit in that, I'd
rather choose a minimal, still useful, design.  As a whole, I would
vote for adding only support for OR and a comma-separated list like
your proposal.

> I'm not generally happy with how the "cert" option is working. With
> the other methods, if you don't include a method in the list, then the
> connection fails if the server tries to negotiate it. But if you don't
> include the cert method in the list, we don't forbid the server from
> asking for a cert, because the server always asks for a client
> certificate via TLS whether it needs one or not. Behaving in the
> intuitive way here would effectively break all use of TLS.

Agreed.  Looking at what you are doing with allowed_auth_method_cert,
this makes the code harder to follow, which is risky for any
security-related feature, and that's different than the other methods
where we have the AUTH_REQ_* codes.  This leads to extra complications
in the shape of ssl_cert_requested and ssl_cert_sent, as well.  This
strongly looks like what we do for channel binding as it has
requirements separated from the actual auth methods but has dependency
with them, so a different parameter, as suggested, would make sense.
If we are not sure about this part, we could discard it in the first
instance of the patch.

> So I think Tom's recommendation that the cert method be handled by an
> orthogonal option was a good one, and if that works then maybe we
> don't need an AND syntax at all. Presumably I can just add an option
> that parallels gssencmode, and then the current don't-care semantics
> can be explicitly controlled. Skipping AND also means that I don't
> have to create a syntax that can handle AND and NOT at the same time,
> which I was dreading.

I am not convinced that we have any need for the AND grammar within
one parameter, as that's basically the same thing as defining multiple
connection parameters, isn't it?  This is somewhat a bit similar to
the interactions of channel binding with this new parameter and what
you have implemented.  For example, channel_binding=require
require_auth=md5 would imply both and should fail, even if it makes
little sense because MD5 has no idea of channel binding.  One
interesting case comes down to stuff like channel_binding=require
require_auth="md5,scram-sha-256", where I think that we should still
fail even if the server asks for MD5 and enforce an equivalent of an
AND grammar through the parameters.  This reasoning limits the
dependencies between each parameter and treats the areas where these
are checked independently, which is what check_expected_areq() does
for channel binding.  So that's more robust at the end.

Speaking of which, I would add tests to check some combinations of
channel_binding and require_auth.
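
For instance, a connection string along these lines (illustrative values
only) should fail unless both a full SCRAM exchange and channel binding are
negotiated:

psql "host=example.com dbname=app user=alice sslmode=require channel_binding=require require_auth=scram-sha-256"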

+   appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("auth method \"%s\" required, but %s\n"),
+ conn->require_auth, reason);
This one is going to make translation impossible.  One way to tackle
this issue is to use "auth method \"%s\" required: %s".

+   {"require_auth", NULL, NULL, NULL,
+   "Require-Auth", "", 14, /* sizeof("scram-sha-256") == 14 */
+   offsetof(struct pg_conn, 

Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-06-08 Thread Thomas Munro
On Thu, Jun 9, 2022 at 3:55 PM Michael Paquier  wrote:
> On Mon, May 30, 2022 at 03:59:52PM +0900, Michael Paquier wrote:
> > Okay, I am outnumbered, and that would mean bumping MIN_WINNT to
> > 0x0A00.  So, ready to move to this version at full speed for 16?  We
> > still have a couple of weeks ahead before the next dev cycle begins,
> > so feel free to comment, of course.
>
> And attached is an updated patch to do exactly that.

PostgreSQL can be expected to work on
these operating
-   systems: Linux (all recent distributions), Windows (XP and later),
+   systems: Linux (all recent distributions), Windows (10 and later),
FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris.

Cool.  (I'm not sure what "all recent distributions" contributes but
that's not from your patch...).

The Cygwin stuff in installation.sgml also mentions NT, 2000, XP, but
it's not clear from the phrasing if it meant "and later" or "and
earlier", so I'm not sure if it needs adjusting or removing...

While looking for more stuff to vacuum, I found this:

  Special Considerations for 64-Bit Windows

  
   PostgreSQL will only build for the x64 architecture on 64-bit Windows, there
   is no support for Itanium processors.
  

I think we can drop mention of Itanium (RIP): the ancient versions of
Windows that could run on that arch are desupported with your patch.
It might be more relevant to say that we can't yet run on ARM, and
Windows 11 is untested by us, but let's fix those problems instead of
documenting them :-)

The mention of "64-Bit" in that title sounds a little anachronistic
to me (isn't that the norm now?), but I'm not sure what to suggest.

There are more mentions of older Windows releases near the compiler
stuff but perhaps your MSVC version vacuuming work will take care of
those.




A proposal to provide a timeout option for CREATE_REPLICATION_SLOT/pg_create_logical_replication_slot

2022-06-08 Thread Bharath Rupireddy
Hi,

Currently CREATE_REPLICATION_SLOT/pg_create_logical_replication_slot waits
unboundedly if there are any in-progress write transactions [1]. The wait
exists for a reason, i.e. building an initial snapshot, but waiting
unboundedly isn't good for the usability of the command/function, and when
it is stuck the callers have no information as to why.

How about we provide a timeout for the command/function instead of letting
them wait unboundedly? The behavior will be something like this - if the
logical replication slot isn't created within this timeout, the
command/function will fail.

We could ask callers to set statement_timeout before calling
CREATE_REPLICATION_SLOT/pg_create_logical_replication_slot, but that impacts
the queries running in all other sessions, and it may not always be possible
to set this parameter just for the session that runs the
CREATE_REPLICATION_SLOT command.
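
For reference, that session-level workaround looks roughly like this (the
timeout value is illustrative):

SET statement_timeout = '60s';
SELECT pg_create_logical_replication_slot('slot1', 'test_decoding');
RESET statement_timeout;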

Thoughts?

[1]
(gdb) bt
#0  0x7fc21509a45a in epoll_wait (epfd=9, events=0x561874204e88,
maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x56187350e9cc in WaitEventSetWaitBlock (set=0x561874204e28,
cur_timeout=-1, occurred_events=0x7fff72b3a4a0, nevents=1) at latch.c:1467
#2  0x56187350e847 in WaitEventSetWait (set=0x561874204e28, timeout=-1,
occurred_events=0x7fff72b3a4a0, nevents=1, wait_event_info=50331653) at
latch.c:1413
#3  0x56187350db64 in WaitLatch (latch=0x7fc21292f324, wakeEvents=33,
timeout=0, wait_event_info=50331653) at latch.c:475
#4  0x56187353b5b2 in ProcSleep (locallock=0x56187422aa58,
lockMethodTable=0x561873a61a20 <default_lockmethod>) at proc.c:1337
#5  0x561873527e49 in WaitOnLock (locallock=0x56187422aa58,
owner=0x5618742888b0) at lock.c:1859
#6  0x561873526730 in LockAcquireExtended (locktag=0x7fff72b3a8a0,
lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true,
locallockp=0x0) at lock.c:1101
#7  0x561873525b9d in LockAcquire (locktag=0x7fff72b3a8a0, lockmode=5,
sessionLock=false, dontWait=false) at lock.c:752
#8  0x561873524099 in XactLockTableWait (xid=734, rel=0x0, ctid=0x0,
oper=XLTW_None) at lmgr.c:702
#9  0x5618734a69c4 in SnapBuildWaitSnapshot (running=0x561874315a18,
cutoff=735) at snapbuild.c:1416
#10 0x5618734a67a2 in SnapBuildFindSnapshot (builder=0x561874311a80,
lsn=21941704, running=0x561874315a18) at snapbuild.c:1328
#11 0x5618734a62c4 in SnapBuildProcessRunningXacts
(builder=0x561874311a80, lsn=21941704, running=0x561874315a18) at
snapbuild.c:1117
#12 0x56187348cab0 in standby_decode (ctx=0x5618742fb9e0,
buf=0x7fff72b3aa00) at decode.c:346
#13 0x56187348c34e in LogicalDecodingProcessRecord (ctx=0x5618742fb9e0,
record=0x5618742fbda0) at decode.c:119
#14 0x56187349124e in DecodingContextFindStartpoint
(ctx=0x5618742fb9e0) at logical.c:613
#15 0x5618734c2ab3 in create_logical_replication_slot
(name=0x56187420d848 "slot1", plugin=0x56187420d8f8 "test_decoding",
temporary=false, two_phase=false, restart_lsn=0, find_startpoint=true) at
slotfuncs.c:158
#16 0x5618734c2bb8 in pg_create_logical_replication_slot
(fcinfo=0x5618742efdd0) at slotfuncs.c:187
#17 0x5618732def6b in ExecMakeTableFunctionResult
(setexpr=0x5618742dc318, econtext=0x5618742dc1d0,
argContext=0x5618742efcb0, expectedDesc=0x5618742ec098, randomAccess=false)
at execSRF.c:234
#18 0x5618732fbc27 in FunctionNext (node=0x5618742dbfb8) at
nodeFunctionscan.c:95
#19 0x5618732e0987 in ExecScanFetch (node=0x5618742dbfb8,
accessMtd=0x5618732fbb72 <FunctionNext>, recheckMtd=0x5618732fbf6e
<FunctionRecheck>) at execScan.c:133
#20 0x5618732e0a00 in ExecScan (node=0x5618742dbfb8,
accessMtd=0x5618732fbb72 <FunctionNext>, recheckMtd=0x5618732fbf6e
<FunctionRecheck>) at execScan.c:182
#21 0x5618732fbfc4 in ExecFunctionScan (pstate=0x5618742dbfb8) at
nodeFunctionscan.c:270
#22 0x5618732dc693 in ExecProcNodeFirst (node=0x5618742dbfb8) at
execProcnode.c:463
#23 0x5618732cfe80 in ExecProcNode (node=0x5618742dbfb8) at
../../../src/include/executor/executor.h:259

Regards,
Bharath Rupireddy.


Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-06-08 Thread Michael Paquier
On Mon, May 30, 2022 at 03:59:52PM +0900, Michael Paquier wrote:
> Okay, I am outnumbered, and that would mean bumping MIN_WINNT to
> 0x0A00.  So, ready to move to this version at full speed for 16?  We
> still have a couple of weeks ahead before the next dev cycle begins,
> so feel free to comment, of course.

And attached is an updated patch to do exactly that.
--
Michael
From 6ef8fbe7a4ef1c3f63d1d1b578f0d85129c07798 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Jun 2022 12:54:07 +0900
Subject: [PATCH v2] Bump WIN_MINNT to 0x0A00 everywhere (Windows 10~)

---
 src/include/port/win32.h  | 11 +++
 src/backend/utils/adt/pg_locale.c |  4 ++--
 doc/src/sgml/installation.sgml|  2 +-
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index c6213c77c3..fe0829cedc 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -11,15 +11,10 @@
 
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
- * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
- * the minimum version is Windows XP (0x0501).
+ * Leave a higher value in place.  The minimum requirement is Windows 10.
  */
-#if defined(_MSC_VER) && _MSC_VER >= 1900
-#define MIN_WINNT 0x0600
-#else
-#define MIN_WINNT 0x0501
+#if defined(_MSC_VER)
+#define MIN_WINNT 0x0A00
 #endif
 
 #if defined(_WIN32_WINNT) && _WIN32_WINNT < MIN_WINNT
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a0490a7522..4c39841b99 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1729,7 +1729,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 		else
 			ereport(ERROR,
 	(errmsg("could not load locale \"%s\"", collcollate)));
-#elif defined(WIN32) && _WIN32_WINNT >= 0x0600
+#elif defined(WIN32)
 		/*
 		 * If we are targeting Windows Vista and above, we can ask for a name
 		 * given a collation name (earlier versions required a location code
@@ -1757,7 +1757,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 			collcollate,
 			GetLastError())));
 		}
-		collversion = psprintf("%d.%d,%d.%d",
+		collversion = psprintf("%ld.%ld,%ld.%ld",
 			   (version.dwNLSVersion >> 8) & 0xFFFF,
 			   version.dwNLSVersion & 0xFF,
 			   (version.dwDefinedVersion >> 8) & 0xFFFF,
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index c585078029..7e6e7cc6bd 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2136,7 +2136,7 @@ export MANPATH
 
   
PostgreSQL can be expected to work on these operating
-   systems: Linux (all recent distributions), Windows (XP and later),
+   systems: Linux (all recent distributions), Windows (10 and later),
FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris.
Other Unix-like systems may also work but are not currently
being tested.  In most cases, all CPU architectures supported by
-- 
2.36.1





Re: Convert macros to static inline functions

2022-06-08 Thread Peter Geoghegan
On Mon, May 16, 2022 at 1:28 AM Peter Eisentraut
 wrote:
> Inspired by [0], I looked to convert more macros to inline functions.
> The attached patches are organized "bottom up" in terms of their API
> layering; some of the later ones depend on some of the earlier ones.

Big +1 from me.

I converted over most of the nbtree.h function style macros in
Postgres 13, having put it off in Postgres 12 (there is one remaining
function macro due to an issue with #include dependencies). This
vastly improved the maintainability of the code, and I wish I'd done
it sooner.

Inline functions made it a lot easier to pepper various B-Tree code
utility functions with defensive assertions concerning preconditions
and postconditions. That's something that I am particular about. In
theory you can just use AssertMacro() in a function style macro. In
practice that approach is ugly, and necessitates thinking about
multiple evaluation hazards, which is enough to discourage good
defensive coding practices.
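
To illustrate with a made-up example (the names are hypothetical, not from
nbtree.h; Postgres-style Assert() and types assumed):

/* Macro form: the argument may be evaluated more than once, so callers must
 * avoid side effects, and adding checks means AssertMacro() contortions. */
#define ItemFlagsAreLive(flags) (((flags) & 0x01) != 0)

/* Inline-function form: single evaluation, and a defensive precondition
 * assertion is trivial to add. */
static inline bool
ItemFlagsAreLiveInline(uint16 flags)
{
	Assert((flags & ~0x03) == 0);	/* only known bits may be set */
	return (flags & 0x01) != 0;
}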

-- 
Peter Geoghegan




Re: amcheck is using a wrong macro to check compressed-ness

2022-06-08 Thread Michael Paquier
On Wed, May 18, 2022 at 09:55:07AM +0900, Michael Paquier wrote:
> On Tue, May 17, 2022 at 04:58:11PM +0900, Michael Paquier wrote:
>> Adding Robert in CC, as this has been added with bd807be.  I have
>> added an open item for now.
> 
> With the individual in CC, that's even better.

Three weeks later, ping.  Robert, could you look at this thread?
--
Michael




Re: Logging query parameters in auto_explain

2022-06-08 Thread Michael Paquier
On Tue, Jun 07, 2022 at 12:18:52PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Point, fixed in the attached v2. I've also added a test that truncation
> and disabling works.

The tests are structured so that all the queries are run first, then the
full set of logs is slurped and scanned.  With things like tests for
truncation, it becomes easy for a given test to overlap with
another one in terms of pattern matching, so we could silently lose
coverage.  Wouldn't it be better to reorganize things so we save the
current size of the log file (as in -s $node->logfile), run one query,
and then slurp the log file starting from the position saved previously?
The GUC updates had better be localized in each sub-section of the
tests, as well.
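
Roughly something like this for each sub-section (a sketch; the query and
the expected log pattern are placeholders, and it assumes slurp_file()
taking a start offset as the TAP utilities do now):

my $log_offset = -s $node->logfile;
$node->safe_psql('postgres', 'SELECT 1 + 1;');
my $log = slurp_file($node->logfile, $log_offset);
like($log, qr/Query Parameters:/, 'expected output logged for this query');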
--
Michael




June 16, 2022 out-of-cycle release for PostgreSQL 14

2022-06-08 Thread Jonathan S. Katz

Hi,

Due to the issue with potential data corruption when running `CREATE 
INDEX CONCURRENTLY` / `REINDEX CONCURRENTLY` on PostgreSQL 14[1], the 
release team has decided to make an out-of-cycle release available for 
PostgreSQL 14 on June 16, 2022.


This release will be numbered 14.4. This release is *only* for 
PostgreSQL 14; we are not releasing any other versions as part of this 
release.


If you are planning to include any other bug fixes, please do so no 
later than Saturday, June 11 11:59pm AoE[2]. Because this is an 
out-of-cycle release, we ask that you use extra discretion when 
committing fixes to the REL_14_STABLE branch as we would like to avoid 
additional regressions.


Please let us know if you have any questions.

Thanks,

Jonathan

[1] 
https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org

[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth




Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

2022-06-08 Thread Michael Paquier
On Wed, Jun 08, 2022 at 04:13:37PM -0500, Justin Pryzby wrote:
> On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:
>> And applied, to take care of this open item.
> 
> Shouldn't this wait for the buildfarm to be updated again ?

The TAP logic is able to find any logs by itself on failure, so what
would be impacted is the case of the tests running pg_upgrade via the
old route in TestUpgrade.pm (it had better not run in the buildfarm
client for 15~, and I am wondering if it would be worth backpatching
the TAP test once it brews a bit more).  Anyway, given my time sheet
for the next couple of days, coupled with a potential beta2 in the very
short term and the broken upgrade workflow, I have given priority to
fixing the issue, because that's what directly impacts people looking
at 15 and testing their upgrades, which is what Tushar did.

That said, I have already sent a pull request to the buildfarm repo
to refresh the set of logs, as per the patch attached.  This updates
the logic so that it works with any changes in the structure of
pg_upgrade_output.d/, fetching any files whose names end in ".log".
--
Michael
From c5cf14647c7acc5c4f704bb788a89afa0f5c5732 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 9 Jun 2022 09:36:18 +0900
Subject: [PATCH] Update TestUpgrade.pm to grab for any existing log files with
 Postgres 15~

Upstream has changed the location of the log files generated by
pg_upgrade to be in a directory located within the target cluster, named
pg_upgrade_output.d.  Its structure has changed to use more
subdirectories, and glob() is not really able to cope with that.  This
modifies the logic grabbing the log files to use "find" and grab all the
files within pg_upgrade_output.d whose names end in ".log".
---
 PGBuild/Modules/TestUpgrade.pm | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/PGBuild/Modules/TestUpgrade.pm b/PGBuild/Modules/TestUpgrade.pm
index f193251..1e0a1e3 100644
--- a/PGBuild/Modules/TestUpgrade.pm
+++ b/PGBuild/Modules/TestUpgrade.pm
@@ -18,6 +18,7 @@ use PGBuild::SCM;
 use PGBuild::Utils qw(:DEFAULT $steps_completed);
 
 use File::Basename;
+use File::Find;
 
 use strict;
 use warnings;
@@ -139,9 +140,18 @@ sub check
  $self->{pgsql}/src/bin/pg_upgrade/*.log
  $self->{pgsql}/src/bin/pg_upgrade/log/*
  $self->{pgsql}/src/bin/pg_upgrade/tmp_check/*/*.diffs
- $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/log/*
  $self->{pgsql}/src/test/regress/*.diffs"
 	);
+
+	# Extra location of logs, changed as per Postgres 15~ with multiple
+	# levels of subdirectories.
+	find(
+		sub {
+			push @logfiles, $File::Find::name
+			  if $File::Find::name =~ m/.*\.log/;
+		},
+		"$self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/");
+
 	$log->add_log($_) foreach (@logfiles);
 
 	my $status = $? >> 8;
-- 
2.36.1





Re: Error from the foreign RDBMS on a foreign table I have no privilege on

2022-06-08 Thread Laurenz Albe
On Wed, 2022-06-08 at 19:06 +0900, Etsuro Fujita wrote:
> On Wed, Jun 8, 2022 at 2:51 PM Kyotaro Horiguchi  
> wrote:
> > At Wed, 08 Jun 2022 07:05:09 +0200, Laurenz Albe  
> > wrote in
> > > diff --git a/doc/src/sgml/postgres-fdw.sgml 
> > > b/doc/src/sgml/postgres-fdw.sgml
> > > index b43d0aecba..b4b7e36d28 100644
> > > --- a/doc/src/sgml/postgres-fdw.sgml
> > > +++ b/doc/src/sgml/postgres-fdw.sgml
> > > @@ -274,6 +274,14 @@ OPTIONS (ADD password_required 'false');
> > >     but only for that table.
> > >     The default is false.
> > >    
> > > +
> > > +  
> > > +   Note that EXPLAIN will be run on the remote server
> > > +   at query planning time, before permissions on the
> > > +   foreign table are checked.  This is not a security problem, since the
> > > +   subsequent error from the permission check will prevent the user from
> > > +   seeing any of the resulting data.
> > > +  
> > >   
> > >  
> > 
> > Looks fine.  I'd like to add something like "If needed, depriving
> > unprivileged users of relevant user mappings will prevent such remote
> > executions that happen at planning-time."
> 
> I agree on that point; if the EXPLAIN done on the remote side is
> really a problem, I think the user should revoke privileges from the
> remote user specified in the user mapping, to prevent it.  I’d rather
> recommend granting to the remote user privileges consistent with those
> granted to the local user.

I don't think that is better.  Even if the local and remote privileges are
consistent, you will get an error from the *remote* table access when trying
to use a foreign table on which you don't have permissions.
The above paragraph describes why.
Note that the original complaint against oracle_fdw that led to this thread
was just such a case.

Yours,
Laurenz Albe




Re: PGDOCS - "System Catalogs" table-of-contents page structure

2022-06-08 Thread Peter Smith
On Thu, Jun 9, 2022 at 10:00 AM Tom Lane  wrote:
>
> Peter Smith  writes:
> > e.g.2 What I thought it should look like:
>
> > Chapter 53. "System Catalogs and Views" <-- chapter name change
> > ==
> > 53.1. System Catalogs  <-- heading name change
> > 53.1.1. pg_aggregate
> > 53.1.2. pg_am
> > 53.1.3. pg_amop
> > 53.1.4. pg_amproc
>
> Then the catalog descriptions would not be on separate pages.
>

Oh, right. Thanks for the explanation.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: PGDOCS - "System Catalogs" table-of-contents page structure

2022-06-08 Thread Tom Lane
Peter Smith  writes:
> e.g.2 What I thought it should look like:

> Chapter 53. "System Catalogs and Views" <-- chapter name change
> ==
> 53.1. System Catalogs  <-- heading name change
> 53.1.1. pg_aggregate
> 53.1.2. pg_am
> 53.1.3. pg_amop
> 53.1.4. pg_amproc

Then the catalog descriptions would not be on separate pages.

> OTOH it looks like this table of contents page has been this way
> forever (20+ years?). It is hard to believe nobody else suggested
> modifying it in all that time, so perhaps there is some reason for it
> being like it is?

Perhaps that it's just fine as-is.

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-06-08 Thread Jacob Champion
On Tue, Jun 7, 2022 at 5:45 PM Robert Haas  wrote:
> Perhaps that is
> ill-founded, but I don't think "should be serialized" is necessarily
> something that everybody is going to have the same view on, or even
> know what it means.

Using this thread as an example, once it was decided that the parallel
workers needed the additional info, the need for serialization
followed directly from that. I don't get the feeling that developers
are going to jump through the hoops of writing serialization logic for
a new field in the struct just by accident; they should know why
they're writing that code, and hopefully it would be easy for
reviewers to catch a patch that showed up with pointless
serialization.

> Also, I don't think we want to end up with a situation where we have a
> struct that contains wildly unrelated things that all need to be
> serialized. If the definition of the struct is "stuff we should
> serialize and send to the worker," well then maybe the transaction
> snapshot ought to go in there! Well, no. I mean, we already have a
> separate place for that, but suppose somehow we didn't. It doesn't
> belong here, because yes the things in this struct get serialized, but
> it's not only any old thing that needs serializing, it's more specific
> than that.

I completely agree with you here -- the name should not be so generic
that it's just a catch-all for any serialized fields that exist.

> I guess what this boils down to is that I really want this thing to
> have a meaningful name by means of which a future developer can make a
> guess as to whether some field they're adding ought to go in there. I
> theorize that SharedPort is not too great because (a) Port is already
> a bad name and (b) how am I supposed to know whether my stuff ought to
> be shared or not? I like something like ClientConnectionInfo better
> because it seems to describe what the stuff in the struct *is* rather
> than what we *do* with it.

I think having both would be useful in this case -- what the stuff is,
so that it's clear what doesn't belong in it, and what we do with it,
so it's clear that you have to write serialization code if you add new
things. The nature of the struct is such that I think you _have_ to
figure out whether or not your stuff ought to be shared before you
have any business adding it.

But I don't have any better ideas for how to achieve both. I'm fine
with your suggestion of ClientConnectionInfo, if that sounds good to
others; the doc comment can clarify why it differs from Port? Or add
one of the Shared-/Gang-/Group- prefixes to it, maybe?
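
To make that concrete, roughly this shape (a sketch only; the comment is the
point, and the field list is illustrative rather than a proposal):

/*
 * Information about the client connection, fixed at authentication time.
 * Unlike Port, everything in here must be serialized and passed down to
 * parallel workers.
 */
typedef struct ClientConnectionInfo
{
	char	   *authn_id;		/* authenticated identity, or NULL */
	UserAuth	auth_method;	/* how authn_id was established */
} ClientConnectionInfo;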

Thanks,
--Jacob




Re: BTMaxItemSize seems to be subtly incorrect

2022-06-08 Thread Peter Geoghegan
On Wed, Jun 8, 2022 at 4:18 PM Robert Haas  wrote:
> I wasn't originally setting out to modify BTPageOpaqueData at all,
> just borrow some special space. See the "storing an explicit nonce"
> discussion and patch set. But when this regression failure turned up I
> said to myself, hmm, I think this is an unrelated bug.

FWIW I don't see much difference between borrowing special space and
adding something to BTPageOpaqueData. If anything I'd prefer the
latter.

> > I think that we should fix this on HEAD, on general principle. There
> > is no reason to believe that this is a live bug, so a backpatch seems
> > unnecessary.
>
> Yeah, I agree with not back-patching the fix, unless it turns out that
> there is some platform where the same issue occurs without any
> cabbage. I assume that if it happened on any common system someone
> would have complained about it by now, so probably it doesn't.

I don't think it's possible on 32-bit platforms, which is the only
further variation I can think of. But let's assume that the same
problem was in fact possible "without cabbage", just for the sake of
argument. The worst problem that could result would be a failure to
split a page that had maximally large keys, without TOAST compression
-- which is what you demonstrated. There wouldn't be any downstream
consequences.

Here's why: BTMaxItemSizeNoHeapTid() is actually what BTMaxItemSize()
looked like prior to Postgres 12. So the limit on internal pages never
changed, even in Postgres 12. There was no separate leaf page limit
prior to 12. Only the rules on the leaf level ever really changed.

Note also that amcheck has tests for this stuff. Though that probably
doesn't matter at all.

-- 
Peter Geoghegan




PGDOCS - "System Catalogs" table-of-contents page structure

2022-06-08 Thread Peter Smith
Hi hackers,

(Resending because my previous post was missing the subject - sorry for noise)

Recently when looking at the "System Catalogs" Tables of Contents [1],
I was wondering why are those headings "Overview" and "System Views"
at the same section level as the catalogs/views within them.

~~~

e.g.1. Current:

Chapter 53. "System Catalogs"
==
53.1. Overview
53.2. pg_aggregate
53.3. pg_am
53.4. pg_amop
53.5. pg_amproc
...
53.66. System Views
53.67. pg_available_extensions
53.68. pg_available_extension_versions
53.69. pg_backend_memory_contexts
53.70. pg_config
...
==

e.g.2 What I thought it should look like:

Chapter 53. "System Catalogs and Views" <-- chapter name change
==
53.1. System Catalogs  <-- heading name change
53.1.1. pg_aggregate
53.1.2. pg_am
53.1.3. pg_amop
53.1.4. pg_amproc
...
53.2. System Views
53.2.1. pg_available_extensions
53.2.2. pg_available_extension_versions
53.2.3. pg_backend_memory_contexts
53.2.4. pg_config
...
==

~~~

OTOH it looks like this table of contents page has been this way
forever (20+ years?). It is hard to believe nobody else suggested
modifying it in all that time, so perhaps there is some reason for it
being like it is?

Thoughts?

--
[1] https://www.postgresql.org/docs/15/catalogs.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Collation version tracking for macOS

2022-06-08 Thread Jeremy Schneider
New emoji are getting added with some frequency, it’s a thing lately…

New Unicode chars use existing but unassigned code points. All code points are 
able to be encoded, claimed or unclaimed.

Someone on old glibc or ICU can still store the new characters. As long as 
there’s an input field. You wouldn’t believe some stuff I’ve seen people enter 
in the “name” field for web apps…  It’ll get some undefined or default sort 
behavior for unrecognized or unassigned code points.

When the libs are updated, those new chars begin to sort correctly, which is a 
change and breaks indexes (and potentially other stuff).

-Jeremy

Sent from my TI-83

> On Jun 8, 2022, at 16:34, Thomas Munro  wrote:
> On Thu, Jun 9, 2022 at 5:42 AM Tom Lane  wrote:
>> I'm sure that Apple are indeed updating the UTF8 data behind
>> their proprietary i18n APIs, but the libc APIs are mostly getting benign
>> neglect.
> 
> As for how exactly they might be doing that, I don't know, but a bit
> of light googling tells me that a private, headerless,
> please-don't-call-me-directly copy of ICU arrived back in macOS
> 10.3[1].  I don't see it on my 12.4 system, but I also know that 12.x
> started hiding system libraries completely (the linker is magic and
> pulls libraries from some parallel dimension, there is no
> /usr/lib/libSystem.B.dylib file on disk, and yet otool -L
>  references it).
> 
> It's a lovely client machine, but I don't know if anyone really runs
> meaningful database server stuff on macOS.  I think if I did I'd be
> very keen to use ICU for everything directly, rather than trying to
> unpick any of that and talk to Apple's API...  I think the
> how-to-support-multiple-ICUs subrant/subthread is a much more
> interesting topic.  I have no idea if the dlopen() concept I mentioned
> is the right way forward, but FWIW the experimental patch I posted
> seems to work just fine on a Mac, using multiple ICU libraries
> installed by MacPorts, which might be useful to developers
> contemplating that stuff.
> 
> [1] https://lists.apple.com/archives/xcode-users/2005/Jun/msg00633.html




Re: Tightening behaviour for non-immutable behaviour in immutable functions

2022-06-08 Thread Mark Dilger



> On Jun 8, 2022, at 2:42 PM, Greg Stark  wrote:
> 
> Thinking of plpgsql here, we already run the raw parser on all sql
> when the function is defined. We could somehow check whether the
> raw_parser found any non-schema-qualified references. This looks like
> it would be awkward but doable. That would allow users to write
> non-search_path-dependent code and if postgres doesn't warn they would
> know they've done it properly. It would still put quite a burden on
> users, especially when it comes to operators...
> 
> Or alternatively we could offer lexical scoping so that all objects
> are looked up at parse time and the fully qualified reference is
> stored instead of the non-qualified reference. That would be more
> similar to how views and other object references are handled.

I like the general idea, but I'm confused why you are limiting the analysis to 
search path resolution.  The following is clearly wrong, but not for that 
reason:

create function public.identity () returns double precision as $$
  select random()::integer;
$$
language sql
immutable
parallel safe
-- set search_path to 'pg_catalog'
;

Uncommenting that last bit wouldn't make it much better.

Isn't the more general approach to look for non-immutable (or non-stable) 
operations, with object resolution just one type of non-immutable operation?  
Perhaps raise an error when you can prove the given function's provolatile 
marking is wrong, and a warning when you cannot prove the marking is correct?  
That would tend to give warnings for polymorphic functions that use functions 
or operators over the polymorphic types, or which use dynamic sql, but maybe 
that's ok.  Those functions probably deserve closer scrutiny anyway.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Collation version tracking for macOS

2022-06-08 Thread Thomas Munro
On Thu, Jun 9, 2022 at 5:42 AM Tom Lane  wrote:
> I'm sure that Apple are indeed updating the UTF8 data behind
> their proprietary i18n APIs, but the libc APIs are mostly getting benign
> neglect.

As for how exactly they might be doing that, I don't know, but a bit
of light googling tells me that a private, headerless,
please-don't-call-me-directly copy of ICU arrived back in macOS
10.3[1].  I don't see it on my 12.4 system, but I also know that 12.x
started hiding system libraries completely (the linker is magic and
pulls libraries from some parallel dimension, there is no
/usr/lib/libSystem.B.dylib file on disk, and yet otool -L
 references it).

It's a lovely client machine, but I don't know if anyone really runs
meaningful database server stuff on macOS.  I think if I did I'd be
very keen to use ICU for everything directly, rather than trying to
unpick any of that and talk to Apple's API...  I think the
how-to-support-multiple-ICUs subrant/subthread is a much more
interesting topic.  I have no idea if the dlopen() concept I mentioned
is the right way forward, but FWIW the experimental patch I posted
seems to work just fine on a Mac, using multiple ICU libraries
installed by MacPorts, which might be useful to developers
contemplating that stuff.

[1] https://lists.apple.com/archives/xcode-users/2005/Jun/msg00633.html
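
(In case it helps anyone picture it, the dlopen() idea is roughly the
following; this is a sketch with assumed library and symbol names, not code
from the patch.  ICU's C symbols are version-suffixed by default, e.g.
ucol_open_71, which is what makes loading several major versions side by
side feasible.)

#include <dlfcn.h>

/* The real signature is UCollator *ucol_open(const char *, UErrorCode *);
 * an opaque pointer and int keep this sketch self-contained. */
typedef void *(*ucol_open_fn) (const char *locale, int *status);

static void *
open_collator_from(const char *libname, const char *symbol, const char *locale)
{
	void	   *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
	ucol_open_fn fn;
	int			status = 0;

	if (handle == NULL)
		return NULL;
	fn = (ucol_open_fn) dlsym(handle, symbol);
	return fn ? fn(locale, &status) : NULL;
}

/* e.g. open_collator_from("libicui18n.so.71", "ucol_open_71", "en-US") */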





Re: BTMaxItemSize seems to be subtly incorrect

2022-06-08 Thread Robert Haas
On Wed, Jun 8, 2022 at 5:55 PM Peter Geoghegan  wrote:
> > That's a problem, because if in that scenario you allow three 2704
> > byte items that don't need a heap TID and later you find you need to
> > add a heap TID to one of those items, the result will be bigger than
> > 2704 bytes, and then you can't fit three of them into a page.
>
> Seems you must be right. I'm guessing that the field "cabbage" was
> originally a nonce value, as part of a draft patch you're working on?

I wasn't originally setting out to modify BTPageOpaqueData at all,
just borrow some special space. See the "storing an explicit nonce"
discussion and patch set. But when this regression failure turned up I
said to myself, hmm, I think this is an unrelated bug.

> I think that we should fix this on HEAD, on general principle. There
> is no reason to believe that this is a live bug, so a backpatch seems
> unnecessary.

Yeah, I agree with not back-patching the fix, unless it turns out that
there is some platform where the same issue occurs without any
cabbage. I assume that if it happened on any common system someone
would have complained about it by now, so probably it doesn't. I
suppose we could try to enumerate plausibly different values of the
quantities involved and see if any of the combinations look like they
lead to a bad result. I'm not really sure how many things could
plausibly be different, though, apart from MAXIMUM_ALIGNOF.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BTMaxItemSize seems to be subtly incorrect

2022-06-08 Thread Peter Geoghegan
On Wed, Jun 8, 2022 at 2:23 PM Robert Haas  wrote:
> In my tests, PageGetPageSize(page) = 8192, SizeOfPageHeaderData = 24,
> sizeof(ItemIdData) = 4, sizeof(ItemPointerData) = 6, and
> sizeof(BTPageOpaqueData) = 16. Assuming MAXIMUM_ALIGNOF == 8, I
> believe that makes BTMaxItemSize come out to 2704 and
> BTMaxItemSizeNoHeapTid come out to 2712.

I agree that these numbers are what you get on all mainstream
platforms. I know these specifics from memory alone, actually.

> To see why, suppose that sizeof(BTPageOpaqueData) were 24 rather than
> 16. Then we'd have:
>
> BTMaxItemSize = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4 + 3 * 6) -
> MAXALIGN(24)) / 3) = MAXALIGN_DOWN((8192 - MAXALIGN(54)  - 24) / 3) =
> MAXALIGN_DOWN(2704) = 2704
> BTMaxItemSizeNoHeapTid = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4) -
> MAXALIGN(24)) / 3 = MAXALIGN_DOWN((8192 - MAXALIGN(36)  - 24) / 3) =
> MAXALIGN_DOWN(2709) = 2704
>
> That's a problem, because if in that scenario you allow three 2704
> byte items that don't need a heap TID and later you find you need to
> add a heap TID to one of those items, the result will be bigger than
> 2704 bytes, and then you can't fit three of them into a page.

Seems you must be right. I'm guessing that the field "cabbage" was
originally a nonce value, as part of a draft patch you're working on?

I actually tested this in a fairly brute force fashion back when I was
working on the Postgres 12 nbtree stuff. Essentially I found a way to
build the tallest possible B-Tree, consisting of only 3 items (plus a
high key) on each leaf page, each of which was the largest possible
size, up to the byte. If memory serves, it is just about impossible to
get beyond 7 levels. It took as long as 30 minutes or more to run the
test.

I think that we should fix this on HEAD, on general principle. There
is no reason to believe that this is a live bug, so a backpatch seems
unnecessary.

--
Peter Geoghegan




Tightening behaviour for non-immutable behaviour in immutable functions

2022-06-08 Thread Greg Stark
Recently I had someone complaining about a pg_restore failure and I
believe we semi-regularly get complaints that are similar -- though
I'm having trouble searching for them because the keywords "dump
restore failure" are pretty generic.

The root cause here -- and I believe for a lot of users -- are
functions that are declared IMMUTABLE but are not for reasons that
aren't obvious to the user. Indeed poking at this more carefully I
think it's downright challenging to write an IMMUTABLE function at
all. I suspect *most* users, perhaps even nearly all users, who write
functions intending them to be immutable are actually not really as
successful as they believe.

The biggest culprit is of course search_path. Afaics it's nigh
impossible to write any non-trivial immutable function without just
setting the search_path GUC on the function. And there's nothing
Postgres that requires that. I don't even see anything in the docs
recommending it.

Many users probably always run with the same search_path, so in
practice they're probably mostly safe. But one day they could insert
data with a different search_path that resolves to a different function
definition, and corrupt their index, which would be poor... Or, as with
my user, they could discover the problem only in the middle of an
upgrade, which is a terrible time to discover it.

I would suggest we should probably at the very least warn users if
they create an immutable function that doesn't have search_path set
for the function. I would actually prefer to make it an error but that
brings in compatibility issues. Perhaps it could be optional.
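
(For anyone following along, the pattern that avoids the problem today looks
something like this; the names are illustrative:)

CREATE FUNCTION norm_key(t text) RETURNS text
LANGUAGE sql IMMUTABLE
SET search_path = pg_catalog
AS $$ SELECT lower(t) $$;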

But putting a GUC on a function imposes a pretty heavy performance
cost. I'm not sure how bad it is compared to running plpgsql code let
alone other languages but IIUC it invalidates some catalog caches
which for something happening repeatedly in, e.g. a data load would be
pretty bad.

It would be nice to have a way to avoid the performance cost and I see
two options.

Thinking of plpgsql here, we already run the raw parser on all sql
when the function is defined. We could somehow check whether the
raw_parser found any non-schema-qualified references. This looks like
it would be awkward but doable. That would allow users to write
non-search_path-dependent code and if postgres doesn't warn they would
know they've done it properly. It would still put quite a burden on
users, especially when it comes to operators...

Or alternatively we could offer lexical scoping so that all objects
are looked up at parse time and the fully qualified reference is
stored instead of the non-qualified reference. That would be more
similar to how views and other object references are handled.

I suppose there's a third option that we could provide something which
instead of *setting* the guc when a function is entered just verifies
that guc is set as expected. That way the function would simply throw
an error if search_path is "incorrect" and not have to invalidate any
caches. That would at least avoid index corruption but not guarantee
dump/reload would work.

-- 
greg




Re: Request for assistance to backport CVE-2022-1552 fixes to 9.6 and 9.4

2022-06-08 Thread Roberto C . Sánchez
On Wed, Jun 08, 2022 at 04:15:47PM -0400, Tom Lane wrote:
> Roberto =?iso-8859-1?Q?C=2E_S=E1nchez?=  writes:
> > I am investigating backporting the fixes for CVE-2022-1552 to 9.6 and
> > 9.4 as part of Debian LTS and Extended LTS.  I am aware that these
> > releases are no longer supported upstream, but I have made an attempt at
> > adapting commits ef792f7856dea2576dcd9cab92b2b05fe955696b and
> > f26d5702857a9c027f84850af48b0eea0f3aa15c from the REL_10_STABLE branch.
> > I would appreciate a review of the attached patches and any comments on
> > things that may have been missed and/or adapted improperly.
> 
> FWIW, I would not recommend being in a huge hurry to back-port those
> changes, pending the outcome of this discussion:
> 
> https://www.postgresql.org/message-id/flat/f8a4105f076544c180a87ef0c4822352%40stmuk.bayern.de
> 
Thanks for the pointer.

> We're going to have to tweak that code somehow, and it's not yet
> entirely clear how.
> 
I will monitor the discussion to see what comes of it.

Regards,

-Roberto
-- 
Roberto C. Sánchez




BTMaxItemSize seems to be subtly incorrect

2022-06-08 Thread Robert Haas
We have these two definitions in the source code:

#define BTMaxItemSize(page) \
 MAXALIGN_DOWN((PageGetPageSize(page) - \
MAXALIGN(SizeOfPageHeaderData + \
 3*sizeof(ItemIdData)  + \
 3*sizeof(ItemPointerData)) - \
MAXALIGN(sizeof(BTPageOpaqueData))) / 3)
#define BTMaxItemSizeNoHeapTid(page) \
 MAXALIGN_DOWN((PageGetPageSize(page) - \
MAXALIGN(SizeOfPageHeaderData + 3*sizeof(ItemIdData)) - \
MAXALIGN(sizeof(BTPageOpaqueData))) / 3)

In my tests, PageGetPageSize(page) = 8192, SizeOfPageHeaderData = 24,
sizeof(ItemIdData) = 4, sizeof(ItemPointerData) = 6, and
sizeof(BTPageOpaqueData) = 16. Assuming MAXIMUM_ALIGNOF == 8, I
believe that makes BTMaxItemSize come out to 2704 and
BTMaxItemSizeNoHeapTid come out to 2712. I have no quibble with the
formula for BTMaxItemSizeNoHeapTid. It's just saying that if you
subtract out the page header data, the special space, and enough space
for 3 line pointers, you have a certain amount of space left (8152
bytes) and so a single item shouldn't use more than a third of that
(2717 bytes) but since items use up space in increments of MAXALIGN,
you have to round down to the next such multiple (2712 bytes).

But what's up with BTMaxItemSize? Here, the idea as I understand it is
that we might need to add a TID to the item, so it has to be small
enough to squeeze one in while still fitting under the limit. And at
first glance everything seems to be OK, because BTMaxItemSize comes
out to be 8 bytes less than BTMaxItemSizeNoHeapTid and that's enough
space to fit a heap TID for sure. However, it seems to me that the
formula calculates this as if those additional 6 bytes were being
separately added to the page header or the line pointer array, whereas
in reality they will be part of the tuple itself. I think that we
should be subtracting sizeof(ItemPointerData) at the very end, rather
than subtracting 3*sizeof(ItemPointerData) from the space available to
be divided by three.

To see why, suppose that sizeof(BTPageOpaqueData) were 24 rather than
16. Then we'd have:

BTMaxItemSize = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4 + 3 * 6) -
MAXALIGN(24)) / 3) = MAXALIGN_DOWN((8192 - MAXALIGN(54)  - 24) / 3) =
MAXALIGN_DOWN(2704) = 2704
BTMaxItemSizeNoHeapTid = MAXALIGN_DOWN((8192 - MAXALIGN(24 + 3 * 4) -
MAXALIGN(24)) / 3 = MAXALIGN_DOWN((8192 - MAXALIGN(36)  - 24) / 3) =
MAXALIGN_DOWN(2709) = 2704

That's a problem, because if in that scenario you allow three 2704
byte items that don't need a heap TID and later you find you need to
add a heap TID to one of those items, the result will be bigger than
2704 bytes, and then you can't fit three of them into a page.
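
A standalone way to check the arithmetic above (constants hard-coded for an
8kB page and MAXIMUM_ALIGNOF == 8; this is just a scratch program, not
proposed code):

#include <stdio.h>

#define MAXALIGN(x)      (((x) + 7) & ~(size_t) 7)
#define MAXALIGN_DOWN(x) ((x) & ~(size_t) 7)

static void
report(size_t opaque)
{
	size_t		page = 8192, hdr = 24, itemid = 4, tid = 6;
	size_t		with_tid = MAXALIGN_DOWN((page - MAXALIGN(hdr + 3 * itemid + 3 * tid)
										  - MAXALIGN(opaque)) / 3);
	size_t		no_tid = MAXALIGN_DOWN((page - MAXALIGN(hdr + 3 * itemid)
										- MAXALIGN(opaque)) / 3);

	printf("opaque=%zu: BTMaxItemSize=%zu, BTMaxItemSizeNoHeapTid=%zu\n",
		   opaque, with_tid, no_tid);
}

int
main(void)
{
	report(16);					/* prints 2704 and 2712, as above */
	report(24);					/* prints 2704 and 2704: no room for a TID */
	return 0;
}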

Applying the attached patch and running 'make check' suffices to
demonstrate the problem for me:

diff -U3 /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
/Users/rhaas/pgsql/src/test/regress/results/vacuum.out
--- /Users/rhaas/pgsql/src/test/regress/expected/vacuum.out
2022-06-06 14:46:17.0 -0400
+++ /Users/rhaas/pgsql/src/test/regress/results/vacuum.out
2022-06-08 17:20:58.0 -0400
@@ -137,7 +137,9 @@
 repeat('1234567890',269));
 -- index cleanup option is ignored if VACUUM FULL
 VACUUM (INDEX_CLEANUP TRUE, FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 VACUUM (FULL TRUE) no_index_cleanup;
+ERROR:  cannot insert oversized tuple of size 2712 on internal page
of index "no_index_cleanup_idx"
 -- Toast inherits the value from its parent table.
 ALTER TABLE no_index_cleanup SET (vacuum_index_cleanup = false);
 DELETE FROM no_index_cleanup WHERE i < 15;

-- 
Robert Haas
EDB: http://www.enterprisedb.com


add-cabbage.patch
Description: Binary data


Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

2022-06-08 Thread Justin Pryzby
On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:
> And applied, to take care of this open item.

Shouldn't this wait for the buildfarm to be updated again ?

-- 
Justin




Re: Collation version tracking for macOS

2022-06-08 Thread Mark Dilger



> On Jun 7, 2022, at 1:10 PM, Tom Lane  wrote:
> 
> This is not the concern that I have.  I agree that if we tell a user
> that collation X changed behavior and he'd better reindex his indexes
> that use collation X, but none of them actually contain any cases that
> changed behavior, that's not a "false positive" --- that's "it's cheaper
> to reindex than to try to identify whether there's a problem".

I don't see this problem as limited to indexes, though I do understand why that 
might be the most common place for the problem to manifest itself.

As a simple example, text[] constructed using array_agg over sorted data can be 
corrupted by a collation change, and reindex won't fix it.
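
For instance (a sketch; the table, column and collation names are only
illustrative):

-- The element order inside "names" is fixed at INSERT time under the old
-- collation rules; a later collation change leaves it stale, and REINDEX
-- does not touch it.
INSERT INTO summaries (grp, names)
SELECT grp, array_agg(name ORDER BY name COLLATE "en-x-icu")
FROM people
GROUP BY grp;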

If we extend the table-AM interface to allow query quals to be pushed down to 
the table-AM, we might develop table-AMs that care about sort order, too.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Wed, Jun 8, 2022 at 4:02 PM Tom Lane  wrote:
> I'm very skeptical of this process as being a reason to push users
> to reindex everything in sight.  If U+ was not a thing last year,
> there's no reason to expect that it appears in anyone's existing data,
> and therefore the fact that it sorts differently this year is a poor
> excuse for sounding time-to-reindex alarm bells.

That seems completely wrong to me. It's not like a new character shows
up and people wait to start using it until it makes its way into
everyone's collation data. That is emphatically not what happens, I
would say. What happens is that people upgrade their libc packages at
one time and their postgres packages at another time, and it's
unlikely that they have any idea which order they do or did those
things. Meanwhile, people start using all the latest emojis. The idea
that the average PostgreSQL user has any idea whether a certain emoji
shows up in the data set for the first time before or after they
install the libc version that knows about it seems absurd. We don't
even know how to figure out which emojis the installed libc supports
-- if we did, we could reject data that we don't know how to sort
properly instead of ending up with corrupted indexes later. The user
has no more ability to figure it out than we do, and even if they did,
they probably wouldn't want to compare their stream of input data to
their collation definitions using some process external to the database.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for assistance to backport CVE-2022-1552 fixes to 9.6 and 9.4

2022-06-08 Thread Tom Lane
Roberto C. Sánchez  writes:
> I am investigating backporting the fixes for CVE-2022-1552 to 9.6 and
> 9.4 as part of Debian LTS and Extended LTS.  I am aware that these
> releases are no longer supported upstream, but I have made an attempt at
> adapting commits ef792f7856dea2576dcd9cab92b2b05fe955696b and
> f26d5702857a9c027f84850af48b0eea0f3aa15c from the REL_10_STABLE branch.
> I would appreciate a review of the attached patches and any comments on
> things that may have been missed and/or adapted improperly.

FWIW, I would not recommend being in a huge hurry to back-port those
changes, pending the outcome of this discussion:

https://www.postgresql.org/message-id/flat/f8a4105f076544c180a87ef0c4822352%40stmuk.bayern.de

We're going to have to tweak that code somehow, and it's not yet
entirely clear how.

regards, tom lane




Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
"Daniel Verite"  writes:
> Independently of these rules, all Unicode collations change frequently
> because each release of Unicode adds new characters. Any string
> that contains a code point that was previously unassigned is going
> to be sorted differently by all collations when that code point gets
> assigned to a character.
> Therefore the versions of all collations need to be bumped at every
> Unicode release. This is what ICU does.

I'm very skeptical of this process as being a reason to push users
to reindex everything in sight.  If U+ was not a thing last year,
there's no reason to expect that it appears in anyone's existing data,
and therefore the fact that it sorts differently this year is a poor
excuse for sounding time-to-reindex alarm bells.

I'm quite concerned that we are going to be training users to ignore
collation-change warnings.  They have got to be a lot better targeted
than this, or we're just wasting everyone's time, including ours.

regards, tom lane




Request for assistance to backport CVE-2022-1552 fixes to 9.6 and 9.4

2022-06-08 Thread Roberto C . Sánchez
Hello Devs,

I am investigating backporting the fixes for CVE-2022-1552 to 9.6 and
9.4 as part of Debian LTS and Extended LTS.  I am aware that these
releases are no longer supported upstream, but I have made an attempt at
adapting commits ef792f7856dea2576dcd9cab92b2b05fe955696b and
f26d5702857a9c027f84850af48b0eea0f3aa15c from the REL_10_STABLE branch.
I would appreciate a review of the attached patches and any comments on
things that may have been missed and/or adapted improperly.

The first thing I did was to adapt the full patches, with functional
changes and regression tests.  Since amcheck was new to version 10, I
dropped that part of the patch.  Additionally, since partitioned tables
were new in 10 I dropped those parts of the tests.  The absence of block
range indices in 9.4 means I also dropped that part of the change and
associated test as well.

Once everything built successfully, I built again with only the
regression tests to confirm that the vulnerability was present and
triggered by the regression test [*].

When building with only the adapted regression tests, the 9.6 build
failed with this in the test output:

+ ERROR:  sro_ifun(10) called by pbuilder
+ CONTEXT:  PL/pgSQL function sro_ifun(integer) line 4 at ASSERT

This seems to indicate that the vulnerability was encountered and that
the function was called as the invoking user rather than the owning
user.  Naturally, there were further differences in the test output
owing to the index creation failure.

For 9.4, the error looked like this:

+ ERROR:  called by pbuilder

As a result of ASSERT not being present in 9.4 I had to resort to an IF
statement with a RAISE.  However, it appears to be the identical
problem.

There are 4 patches attached to this mail, one for each of the two
commits referenced above as adapted for 9.6 and 9.4.  Please advise on
whether adjustments are needed or whether I can proceed with publishing
updated 9.6 and 9.4 packages for Debian with said patches.

Regards,

-Roberto

[*] Side note: my approach revealed that the adapted regression tests
trigger the vulnerability in both 9.6 and 9.4.  However, the SUSE
security information page for CVE-2022-1552 [0] lists 9.6 as "not
affected".  Presumably this is based on the language in the upstream
advisory "Versions Affected: 10 - 14."

[0] https://www.suse.com/security/cve/CVE-2022-1552.html

-- 
Roberto C. Sánchez
From ef792f7856dea2576dcd9cab92b2b05fe955696b Mon Sep 17 00:00:00 2001
From: Noah Misch 
Date: Mon, 9 May 2022 08:35:08 -0700
Subject: [PATCH] Make relation-enumerating operations be security-restricted
 operations.

When a feature enumerates relations and runs functions associated with
all found relations, the feature's user shall not need to trust every
user having permission to create objects.  BRIN-specific functionality
in autovacuum neglected to account for this, as did pg_amcheck and
CLUSTER.  An attacker having permission to create non-temp objects in at
least one schema could execute arbitrary SQL functions under the
identity of the bootstrap superuser.  CREATE INDEX (not a
relation-enumerating operation) and REINDEX protected themselves too
late.  This change extends to the non-enumerating amcheck interface.
Back-patch to v10 (all supported versions).

Sergey Shinderuk, reviewed (in earlier versions) by Alexander Lakhin.
Reported by Alexander Lakhin.

Security: CVE-2022-1552
---
 src/backend/access/brin/brin.c   |   30 -
 src/backend/catalog/index.c  |   41 +--
 src/backend/commands/cluster.c   |   35 
 src/backend/commands/indexcmds.c |   53 +--
 src/backend/utils/init/miscinit.c|   24 --
 src/test/regress/expected/privileges.out |   42 
 src/test/regress/sql/privileges.sql  |   36 +
 7 files changed, 231 insertions(+), 30 deletions(-)

--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -28,6 +28,7 @@
 #include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
+#include "utils/guc.h"
 #include "utils/index_selfuncs.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -786,6 +787,9 @@
 	Oid			heapoid;
 	Relation	indexRel;
 	Relation	heapRel;
+	Oid			save_userid;
+	int			save_sec_context;
+	int			save_nestlevel;
 	double		numSummarized = 0;
 
 	if (RecoveryInProgress())
@@ -799,10 +803,28 @@
 	 * passed indexoid isn't an index then IndexGetRelation() will fail.
 	 * Rather than emitting a not-very-helpful error message, postpone
 	 * complaining, expecting that the is-it-an-index test below will fail.
+	 *
+	 * unlike brin_summarize_range(), autovacuum never calls this.  hence, we
+	 * don't switch userid.
 	 */
 	heapoid = IndexGetRelation(indexoid, true);
 	if (OidIsValid(heapoid))
+	{
 		heapRel = heap_open(heapoid, ShareUpdateExclusiveLock);
+
+		/*
+		 * Autovacuum calls us.  For its benefit, switch 

Re: Collation version tracking for macOS

2022-06-08 Thread Peter Geoghegan
On Wed, Jun 8, 2022 at 10:51 AM Tom Lane  wrote:
> Their POSIX collations seem to be legacy code that's entirely unrelated to
> any modern collation support; in particular the "UTF8" ones are that in
> name only.  I'm sure that Apple are indeed updating the UTF8 data behind
> their proprietary i18n APIs, but the libc APIs are mostly getting benign
> neglect.

I find that easy to believe. It's consistent with the overall picture
of Apple not caring about the POSIX collations beyond the basic
requirement for compatibility. ISTM that their totally inefficient
approach to implementing strxfrm() is another example of the same
thing. (The Apple strxfrm() produces huge low entropy binary strings,
unlike the glibc version, which is pretty well optimized.)

-- 
Peter Geoghegan




Re: Collation version tracking for macOS

2022-06-08 Thread Daniel Verite
Tom Lane wrote:

> Yeah, and it's exactly at the level of quirks that things are likely
> to change.  Nobody's going to suddenly start sorting B before A.
> They might, say, change their minds about where the digram "cz"
> sorts relative to single letters, in languages where special rules
> for that are a thing.

Independently of these rules, all Unicode collations change frequently
because each release of Unicode adds new characters. Any string
that contains a code point that was previously unassigned is going
to be sorted differently by all collations when that code point gets
assigned to a character.
Therefore the versions of all collations need to be bumped at every
Unicode release. This is what ICU does.

If the libc in macOS doesn't follow Unicode, that's not relevant
to macOS, but let's assume an OS that tries to be up-to-date.
If major OS upgrades happen every year or less frequently,
each OS upgrade is likely to imply an upgrade of all the collations,
since the interval between Unicode releases tends to be a year
or less:
https://www.unicode.org/history/publicationdates.html



Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: How about a psql backslash command to show GUCs?

2022-06-08 Thread Jonathan S. Katz

On 6/8/22 1:26 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

Interesting. OK, I'd say let's keep the behavior that's in the patch.


Pushed then.


Excellent -- thank you!

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 7, 2022 at 4:10 PM Tom Lane  wrote:
>> I mean by "false positive" is telling every macOS user that they'd better
>> reindex everything every year, when in point of fact Apple changes those
>> collations almost never.

> Do we actually know that to be true? Given how fast things seem to be
> getting added to Unicode, it wouldn't surprise me at all if they're
> updating their Unicode tables for new characters with some regularity,
> if nothing else, and that's a breaking change for us.

Their POSIX collations seem to be legacy code that's entirely unrelated to
any modern collation support; in particular the "UTF8" ones are that in
name only.  I'm sure that Apple are indeed updating the UTF8 data behind
their proprietary i18n APIs, but the libc APIs are mostly getting benign
neglect.

Maybe the report that started this thread indicates that this is changing,
but I'll believe that when I see it.

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-06-08 Thread Tom Lane
"Jonathan S. Katz"  writes:
> Interesting. OK, I'd say let's keep the behavior that's in the patch.

Pushed then.

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-06-08 Thread Jonathan S. Katz

On 6/7/22 10:57 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

I don't know how frequently issues around "max_stack_depth" being too
small are reported -- I'd be curious to know that -- but I don't have
any strong arguments against allowing the behavior you describe based on
our current docs.


I can't recall any recent gripes on our own lists, but the issue was
top-of-mind for me after discovering that NetBSD defaults "ulimit -s"
to 2MB on at least some platforms.  That would leave us setting
max_stack_depth to something less than that, probably about 1.5MB.


Interesting. OK, I'd say let's keep the behavior that's in the patch.

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: replacing role-level NOINHERIT with a grant-level option

2022-06-08 Thread Robert Haas
On Wed, Jun 8, 2022 at 10:16 AM Stephen Frost  wrote:
> > That's why I proposed the name SET, not MEMBERSHIP. You would still
> > get a catalog entry in pg_auth_members, so you are still a member in
> > some loose sense even if your grant has INHERIT FALSE and SET FALSE,
> > but in such a case the fact that you are a member isn't really doing
> > anything for you in terms of getting you access to privileges because
> > you're neither allowed to exercise them implicitly nor SET ROLE to the
> > role.
>
> Naming things is hard. :)  I'm still not a fan of calling that option
> 'SET' and 'membership' feels like how it's typically described today
> when someone has the rights of a group (implicitly or explicitly).  As
> for what to call "has a pg_auth_members row but no actual access", maybe
> 'associated'?

What I want here is two Boolean flags, one of which controls whether
you implicitly have the privileges of the granted role and the other
of which controls whether you can access those privileges via SET
ROLE. In each case, I want TRUE to be the state where you have or can
get the privileges and FALSE the state where you can't. I'm not
especially stuck on the names INHERIT and SET for those things,
although in my opinion INHERIT is pretty good and SET is where there
is, perhaps, more likely to be room for improvement. However, I don't
think ASSOCIATED is an improvement because it reverses the sense: now,
TRUE means you don't have the privileges (because you're only
associated, not actually a member, I guess) and FALSE means you do
(because you are not merely associated, but are a member). Yick. That
seems super-unintuitive.

Other alternatives to SET:

- BECOME (can you become that user?)
- ASSUME (can you assume that roles's privileges?)
- EXPLICIT (perhaps also renaming the INHERIT permission to IMPLICIT)

Honestly the main thing I don't like about SET is that it sounds like
it ought to be the action verb in the command, rather than just an
option name. But we've already crossed that Rubicon by deciding that
you can GRANT SET on a GUC, and what's the difference between that and
granting a role WITH SET? I'm actually a bit afraid that if we get
creative here it will sound inconsistent.

> That does lead me down a bit of a rabbit hole because every role in the
> entire system could be considered 'associated' with every other one and
> if the case of "no pg_auth_members row" is identical to the case of
> "pg_auth_members row with everything off/default" then it feels a bit
> odd to have an entry for it- and is there any way to get rid of that
> entry?

"everything off" and "everything default" are two very different
things. If you just do "GRANT foo TO bar" without setting any options,
then you get all the options set to their default values, and that's
going to generate a pg_auth_members entry that we certainly can't
omit. However, let's say that you then alter that grant by removing
every single privilege bit, one by one:

GRANT foo TO bar; -- normal grant
GRANT foo TO bar WITH INHERIT OFF; -- implicit inheritance of privileges removed
GRANT foo TO bar WITH SET OFF; -- explicit SET ROLE privilege now also removed
GRANT foo TO bar WITH JOIE_DE_VIVRE OFF; -- now there's nothing left

One can certainly make an argument that the last GRANT ought to just
go ahead and nuke the pg_auth_members entry entirely, and I'm willing
to do that if that's what most people want. However, I think it might
be better to leave well enough alone. In my proposal, that last GRANT
command is just adjusting the options of the existing grant, not
removing it. True, it is a pretty worthless grant, lacking even joie
de vivre, but it is a grant all the same, and it can still be dumped
and restored just fine. With this change, that last GRANT command
becomes, in effect, a REVOKE, which is a little odd. Right now, the
pg_auth_members row has no "payload data" and perhaps it never will,
but if we were storing, I don't know, the number of times the grant
had ever been used, or when it got created, or whatever, that
information would be lost when that row is deleted, and that seems
like something that the user might want to have happen only when they
explicitly asked for it.

I don't think this is a big deal either way. I've designed systems for
other things both ways in the past, and my general experience has been
that the implicit-removal behavior tends to turn out to be more
annoying than you initially think that it will be, but if people want
that, it should be possible.

> > > In your proposal, does:
> > >
> > > GRANT foo TO bar WITH FRUIT STRAWBERRY;
> > >
> > > mean that 'foo' is grant'd to 'bar' too?
>
> That is, regardless of how we track these things in the catalog or such,
> we have to respect that:
>
> GRANT foo TO bar;
>
> is a SQL-defined thing that says that a 'role authorization descriptor'
> is created.  SET ROLE then checks if a role authorization descriptor
> exists or not matching the current role to the new role and if 

Re: invoker function security issues

2022-06-08 Thread David G. Johnston
On Wed, Jun 8, 2022 at 7:29 AM Virender Singla 
wrote:

>  but I still expect Postgres to save us from such data inconsistency
> issues by using early binding for functional Indexes.
>

Well, if the functions you are writing are "black boxes" to PostgreSQL this
expectation seems unreasonable.  As of v14 at least you have the option to
write a SQL-standard function definition, whose parsed expression
and dependencies are saved instead of a black box of text.


> However still there are cases where functional Indexes are created on
> extension functions (For Ex - cube extension) which are present in
> different schemas and then those cube functions are defined as invoker
> security type with nested functions calls without any schema qualification.
>

Right, because if you try doing that in a security definer context the lack
of a schema qualification will provoke a function not found error due to
the sanitized search_path  (IIRC, if it doesn't then the two cases should
behave identically...)

> One way would be to define search_path for such nested functions.
>

Which the user is well advised to do, or, better, just schema-qualify all
object references.  This can get a bit annoying for operators, in which
case an explicit, localized, search_path becomes more appealing.
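
As a minimal sketch of that last option (pinning an explicit search_path),
reusing the sumcall/wrapsum names from the test case in the original report,
so the nested call no longer depends on the caller's search_path:

    create schema idxusr1;
    create function idxusr1.sumcall(int, int) returns int
        language sql immutable strict parallel safe
        as 'select ($1 + $2)';
    create function idxusr1.wrapsum(int, int) returns int
        language sql immutable strict parallel safe
        set search_path = idxusr1, pg_temp  -- nested lookup resolved here, not per-caller
        as 'select sumcall($1, $2)';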

The tools are available for one to protect themself.  I suspect the
historical baggage we carry prevents the server from being more aggressive
in enforcing the use of these tools since not all cases where they are not
used are problematic and there is lots of legacy code working just fine.
The security lockdown for this dynamic has already happened by basically
admitting that the idea of a "public" schema at the front of the default
search_path was a poor decision.  And while I see that there is possibly
room for improvement here if desired, it is, for me, acceptable for the
project to put the responsibility of not executing problematic code in the
hands of the DBA.

I'm curious how the "EXECUTE" command and dynamic SQL fit into your POV here
(specifically in function bodies).  Right now, with "black box inside of a
black box" mechanics it isn't really an issue, but if you no longer want to
keep function bodies as black boxes, dynamic SQL becomes the top-most
late-binding point.

David J.


Re: pgcon unconference / impact of block size on performance

2022-06-08 Thread Tomas Vondra
On 6/8/22 16:15, Jakub Wartak wrote:
> Hi, got some answers!
> 
> TL;DR for fio it would make sense to use many stressfiles (instead of 1) and 
> same for numjobs ~ VCPU to avoid various pitfalls.
>  The really
 puzzling thing is why is the filesystem so much slower for smaller
 pages. I mean, why would writing 1K be 1/3 of writing 4K?
 Why would a filesystem have such effect?
>>>
>>> Ha! I don't care at this point as 1 or 2kB seems too small to handle
>>> many real world scenarios ;)
> [..]
>> Independently of that, it seems like an interesting behavior and it might 
>> tell us
>> something about how to optimize for larger pages.
> 
> OK, curiosity won:
> 
> With randwrite on ext4 directio using 4kb the avgqu-sz reaches ~90-100 (close 
> to fio's 128 queue depth?) and I'm getting ~70k IOPS [with maxdepth=128]
> With randwrite on ext4 directio using 1kb the avgqu-sz is just 0.7 and I'm 
> getting just ~17-22k IOPS [with maxdepth=128] ->  conclusion: something is 
> being locked thus preventing queue to build up
> With randwrite on ext4 directio using 4kb the avgqu-sz reaches ~2.3 (so 
> something is queued) and I'm also getting ~70k IOPS with minimal possible 
> maxdepth=4 ->  conclusion: I just need to split the lock contention by 4.
> 
> The 1kB (slow) profile top function is aio_write() ->  -> 
> iov_iter_get_pages() -> internal_get_user_pages_fast() and there's sadly 
> plenty of "lock" keywords inside {related to memory manager, padding to full 
> page size, inode locking} also one can find some articles / commits related 
> to it [1] which didn't make a good feeling to be honest as the fio is using 
> just 1 file (even while I'm on kernel 5.10.x). So I've switched to 4x files 
> and numjobs=4 and got easily 60k IOPS, contention solved whatever it was :) 
> So I would assume PostgreSQL (with its splitting data files by default on 
> 1GB boundaries and multiprocess architecture) should be relatively safe from 
> such ext4 inode(?)/mm(?) contentions even with smallest 1kb block sizes on 
> Direct I/O some day.  
> 

Interesting. So what parameter values would you suggest?

FWIW some of the tests I did were on xfs, so I wonder if that might be
hitting similar/other bottlenecks.

> [1] - https://www.phoronix.com/scan.php?page=news_item=EXT4-DIO-Faster-DBs
> 
>>> Both scenarios (raw and fs) have had direct=1 set. I just cannot understand
>> how having direct I/O enabled (which disables caching) achieves better read
>> IOPS on ext4 than on raw device... isn't it contradiction?
>>>
>>
>> Thanks for the clarification. Not sure what might be causing this. Did you 
>> use the
>> same parameters (e.g. iodepth) in both cases?
> 
> Explanation: it's the CPU scheduler migrations mixing the performance result 
> during the runs of fio  (as you have in your framework). Various VCPUs seem 
> to be having varying max IOPS characteristics (sic!) and CPU scheduler seems 
> to be unaware of it. At least on 1kB and 4kB blocksize this happens; also 
> notice that some VCPUs [ marker] don't reach 100% CPU reaching almost 
> twice the result; while cores 0, 3 do reach 100% and lack CPU power to 
> perform more. The only thing that I don't get is that it doesn't make sense 
> from extended lscpu output (but maybe it's AWS XEN mixing real CPU mappings, 
> who knows).

Uh, that's strange. I haven't seen anything like that, but I'm running
on physical HW and not AWS, so it's either that or maybe I just didn't
do the same test.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




invoker function security issues

2022-06-08 Thread Virender Singla
I believe functions in Postgres follow a late binding approach and hence
nested function dependencies are resolved using search_path at run time.
This way a user can override nested functions in their own schema and change the
behaviour of wrapper functions. However, a more serious issue is when
functional Indexes (with nested function calls) are created on a table and
then the data inserted into the Indexes can be entirely dependent on which user
is inserting the data (by overriding a nested function).

I performed a couple of test cases where the data inserted is dependent on the
user overriding nested functions. I understand it is not the best
practice to scatter functions/indexes/tables across different schemas
and to use such a schema setup, but I still expect Postgres to save us from
such data inconsistency issues by using early binding for functional
Indexes. In fact Postgres does that linking for a single-function Index
(where no nested functions are involved) and qualifies the function used in the
Index with its schema name, and it also works in cases where all
functions, tables, and Indexes are present in the same schema.

However, there are still cases where functional Indexes are created on
extension functions (for example, the cube extension) which are present in
different schemas, and those cube functions are defined with invoker
security with nested function calls without any schema qualification.

The issue that would arise with late binding for functional Indexes is that
when we are migrating such tables/indexes/data from one database to another
(using pg_dump/pg_restore or any other method), the data can change
depending on which user we are using for the import.
(I performed these tests using invoker functions; I think definer functions
produce correct behavior.) One way would be to define search_path for such
nested functions.


   1. =Case1==
   2.
   3. ##Table and functions are in different schemas.
   4.
   5. Session1::
   6. User:Postgres
   7.
   8. create user idxusr1 with password '*';
   9. grant idxusr1 to postgres;
   10. create schema idxusr1 AUTHORIZATION idxusr1;
   11.
   12. create user idxusr2 with password '*';
   13. grant idxusr2 to postgres;
   14. create schema idxusr2 AUTHORIZATION idxusr2;
   15.
   16. Session2::
   17. User:idxusr1
   18.
   19. set search_path to idxusr1,public;
   20.
   21. CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';
   22.
   23. CREATE FUNCTION wrapsum(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';
   24.
   25. ##create table in another schema
   26.
   27. create table public.test(n1 int);
   28. create unique index idxtst on public.test(idxusr1.wrapsum(n1,1));
   29.
   30. grant insert on table public.test to idxusr2;
   31.
   32. postgres=> insert into test values(1);
   33. INSERT 0 1
   34. postgres=> insert into test values(1);
   35. ERROR:  duplicate key value violates unique constraint "idxtst"
   36. DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.
   37.
   38. Session3::
   39. User:idxusr2
   40.
   41. set search_path to idxusr2,public;
   42.
   43. CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1 - $2)';
   44.
   45. postgres=> insert into test values(1);
   46. INSERT 0 1
   47. postgres=> insert into test values(1);
   48. ERROR:  duplicate key value violates unique constraint "idxtst"
   49. DETAIL:  Key (idxusr1.wrapsum(n1, 1))=(0) already exists.
   50.
   51. ==Case2==
   52.
   53.  ##Functions are in different schemas.
   54.
   55. Session1::
   56. User:Postgres
   57.
   58. create user idxusr1 with password '*';
   59. grant idxusr1 to postgres;
   60. create schema idxusr1 AUTHORIZATION idxusr1;
   61.
   62. create user idxusr2 with password '*';
   63. grant idxusr2 to postgres;
   64. create schema idxusr2 AUTHORIZATION idxusr2;
   65.
   66. Session2::
   67. User:idxusr1
   68.
   69. set search_path to idxusr1,public;
   70.
   71. ##create internal function in own schema and wrapper function
in another schema.
   72.
   73. CREATE FUNCTION sumcall(int, int) RETURNS int LANGUAGE SQL
IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT ($1+$2)';
   74.
   75. CREATE FUNCTION public.wrapsum(int, int) RETURNS int LANGUAGE
SQL IMMUTABLE STRICT PARALLEL SAFE AS 'SELECT sumcall($1,$2)';
   76.
   77. create table test(n1 int);
   78. create unique index idxtst on test(public.wrapsum(n1,1));
   79.
   80. grant usage on schema idxusr1 to idxusr2;
   81. grant insert on table test to idxusr2;
   82. postgres=> insert into test values(1);
   83. INSERT 0 1
   84. postgres=> insert into test values(1);
   85. ERROR:  duplicate key value violates unique constraint "idxtst"
   86. DETAIL:  Key (wrapsum(n1, 1))=(2) already exists.
   87.
   88. Session3::
   89. User:idxusr2
   90.
   91. set search_path to idxusr2,public;
   92.
   93. 

Re: replacing role-level NOINHERIT with a grant-level option

2022-06-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jun 6, 2022 at 7:21 PM Stephen Frost  wrote:
> > > To revoke a grant entirely, you just say REVOKE foo FROM bar, as now.
> > > To change an option for an existing grant, you can re-execute the
> > > grant statement with a different WITH clause. Any options that are
> > > explicitly mentioned will be changed to have the associated values;
> > > unmentioned options will retain their existing values. If you want to
> > > change the value of a Boolean option to false, you have a second
> > > option, which is to write "REVOKE option_name OPTION FOR foo FROM
> > > bar," which means exactly the same thing as "GRANT foo TO bar WITH
> > > option_name FALSE".
> >
> > I'm a bit concerned about this because, iiuc, it would mean:
> >
> > GRANT foo TO bar WITH FRUIT KIWI, SARDINES;
> > GRANT foo TO bar WITH FRUIT STRAWBERRY;
> >
> > would mean that the GRANT of FRUIT would then *only* have STRAWBERRY,
> > right?
> 
> I think that you are misunderstanding what kind of option I intended
> FRUIT to be. Here, I was imagining FRUIT as a property that every
> grant has. Any given grant is either strawberry, or it's kiwi, or it's
> banana. It cannot be more than one of those things, nor can it be none
> of those things. It follows that if you execute GRANT without
> specifying a FRUIT, there's some default - hopefully banana, but
> that's a matter of taste. Later, you can change the fruit associated
> with a grant, but you cannot remove it, because there's no such thing
> as a fruitless grant. Imagine that the catalog representation is a
> "char" that is either 's', 'k', or 'b'.

Ah, yeah, if it's always single-value then that seems reasonable to me
too.  If we ever get to wanting to support multiple choices for a given
option then we could possibly require they be provided as an ARRAY or
using ()'s or something else, but we probably don't need to try and sort
that today.

> > In your proposal, does:
> >
> > GRANT foo TO bar WITH FRUIT STRAWBERRY;
> >
> > mean that 'foo' is grant'd to 'bar' too?  Seems to be closest to current
> > usage and the spec's ideas on these things.  I'm thinking that could be
> > dealt with by having a MEMBERSHIP option (which would be a separate
> > column in pg_auth_members and default would be 'true') but otherwise
> > using exactly what you have here, eg:
> 
> Currently, GRANT always creates an entry in pg_auth_members, or
> modifies an existing one, or does nothing because the one that's there
> is the same as the one it would have created. I think we should stick
> with that idea.

Alright.

> That's why I proposed the name SET, not MEMBERSHIP. You would still
> get a catalog entry in pg_auth_members, so you are still a member in
> some loose sense even if your grant has INHERIT FALSE and SET FALSE,
> but in such a case the fact that you are a member isn't really doing
> anything for you in terms of getting you access to privileges because
> you're neither allowed to exercise them implicitly nor SET ROLE to the
> role.

Naming things is hard. :)  I'm still not a fan of calling that option
'SET' and 'membership' feels like how it's typically described today
when someone has the rights of a group (implicitly or explicitly).  As
for what to call "has a pg_auth_members row but no actual access", maybe
'associated'?

That does lead me down a bit of a rabbit hole because every role in the
entire system could be considered 'associated' with every other one and
if the case of "no pg_auth_members row" is identical to the case of
"pg_auth_members row with everything off/default" then it feels a bit
odd to have an entry for it- and is there any way to get rid of that
entry?

All that said ... we have a similar thing with GRANT today when it comes
to privileges on objects in that we go from NULL to owner-all+whatever,
and while it's a bit odd, it works well enough.

> I find that idea - that GRANT always grants membership but membership
> by itself doesn't really do anything for you unless you've got some
> other options enabled somewhere - more appealing than the design you
> seem to have in mind, which seems to me that membership is the same
> thing as the ability to SET ROLE and thus, if the ability to SET ROLE
> has not been granted, you have a grant that didn't confer membership
> in any sense. I'm not saying we couldn't make that work, but I think
> it's awkward to make that work. Among other problems, what happens
> with the actual catalog representation? You could for example still
> create a role in pg_auth_members and then have a Boolean column
> membership = false, but that's a bit odd. Or you could add a new
> catalog or you could rename the existing catalog, but that's more
> complicated for not much benefit. I think there's some fuzziness at
> the semantic level with this kind of thing too: if I do a GRANT with
> MEMBERSHIP FALSE, what exactly is it that I am granting? I like the
> conceptual simplicity of being able to say 

Re: Collation version tracking for macOS

2022-06-08 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 7, 2022 at 3:53 PM Tom Lane  wrote:
>> No, I quite agree that we have a problem.  What I don't agree is that
>> issuing a lot of false-positive warnings is a solution.

> I mean, how many false-positive warnings do you think we'll get?

The proposed patch would result in a warning about every collation-
sensitive index during every macOS major version upgrade, ie about
once a year for most people.  Seeing that Apple only actually touch
their POSIX collations once a decade or so, that's way too far over
on the crying-wolf end of the scale for me.  We need something that
has at least *some* connection to actual changes.

regards, tom lane




RE: pgcon unconference / impact of block size on performance

2022-06-08 Thread Jakub Wartak
Hi, got some answers!

TL;DR for fio it would make sense to use many stressfiles (instead of 1) and 
same for numjobs ~ VCPU to avoid various pitfalls.

> >> The really
> >> puzzling thing is why is the filesystem so much slower for smaller
> >> pages. I mean, why would writing 1K be 1/3 of writing 4K?
> >> Why would a filesystem have such effect?
> >
> > Ha! I don't care at this point as 1 or 2kB seems too small to handle
> > many real world scenarios ;)
[..]
> Independently of that, it seems like an interesting behavior and it might 
> tell us
> something about how to optimize for larger pages.

OK, curiosity won:

With randwrite on ext4 directio using 4kb the avgqu-sz reaches ~90-100 (close 
to fio's 128 queue depth?) and I'm getting ~70k IOPS [with maxdepth=128]
With randwrite on ext4 directio using 1kb the avgqu-sz is just 0.7 and I'm 
getting just ~17-22k IOPS [with maxdepth=128] ->  conclusion: something is 
being locked thus preventing queue to build up
With randwrite on ext4 directio using 4kb the avgqu-sz reaches ~2.3 (so 
something is queued) and I'm also getting ~70k IOPS with minimal possible 
maxdepth=4 ->  conclusion: I just need to split the lock contention by 4.

The 1kB (slow) profile top function is aio_write() ->  -> 
iov_iter_get_pages() -> internal_get_user_pages_fast() and there's sadly plenty 
of "lock" keywords inside {related to memory manager, padding to full page 
size, inode locking} also one can find some articles / commits related to it 
[1] which didn't make a good feeling to be honest as the fio is using just 1 
file (even while I'm on kernel 5.10.x). So I've switched to 4x files and 
numjobs=4 and got easily 60k IOPS, contention solved whatever it was :) So I 
would assume PostgreSQL (with its splitting data files by default on 1GB 
boundaries and multiprocess architecture) should be relatively safe from such 
ext4 inode(?)/mm(?) contentions even with smallest 1kb block sizes on Direct 
I/O some day.  

[1] - https://www.phoronix.com/scan.php?page=news_item=EXT4-DIO-Faster-DBs

> > Both scenarios (raw and fs) have had direct=1 set. I just cannot understand
> how having direct I/O enabled (which disables caching) achieves better read
> IOPS on ext4 than on raw device... isn't it contradiction?
> >
> 
> Thanks for the clarification. Not sure what might be causing this. Did you 
> use the
> same parameters (e.g. iodepth) in both cases?

Explanation: it's the CPU scheduler migrations mixing the performance result 
during the runs of fio  (as you have in your framework). Various VCPUs seem to 
be having varying max IOPS characteristics (sic!) and CPU scheduler seems to be 
unaware of it. At least on 1kB and 4kB blocksize this happens; also notice that 
some VCPUs [ marker] don't reach 100% CPU reaching almost twice the result; 
while cores 0, 3 do reach 100% and lack CPU power to perform more. The only 
thing that I don't get is that it doesn't make sense from extended lscpu output 
(but maybe it's AWS XEN mixing real CPU mappings, who knows). 

[root@x ~]# for((x=0; x<=3; x++)) ; do echo "$x:"; taskset -c $x fio fio.ext4 | 
grep -e 'read :' -e 'cpu '; done
0:
  read : io=2416.8MB, bw=123730KB/s, iops=123730, runt= 20001msec
  cpu  : usr=42.98%, sys=56.52%, ctx=2317, majf=0, minf=41 [: 100% 
cpu bottleneck and just 123k IOPS]
1:
  read : io=4077.9MB, bw=208774KB/s, iops=208773, runt= 20001msec
  cpu  : usr=29.47%, sys=51.43%, ctx=2993, majf=0, minf=42  [, some 
idle power and 208k IOPS just by switching to core1...]
2:
  read : io=4036.7MB, bw=206636KB/s, iops=206636, runt= 20001msec
  cpu  : usr=31.00%, sys=52.41%, ctx=2815, majf=0, minf=42 []
3:
  read : io=2398.4MB, bw=122791KB/s, iops=122791, runt= 20001msec
  cpu  : usr=44.20%, sys=55.20%, ctx=2522, majf=0, minf=41
[root@x ~]# for((x=0; x<=3; x++)) ; do echo "$x:"; taskset -c $x fio fio.raw | 
grep -e 'read :' -e 'cpu '; done
0:
  read : io=2512.3MB, bw=128621KB/s, iops=128620, runt= 20001msec
  cpu  : usr=47.62%, sys=51.58%, ctx=2365, majf=0, minf=42
1:
  read : io=4070.2MB, bw=206748KB/s, iops=206748, runt= 20159msec
  cpu  : usr=29.52%, sys=42.86%, ctx=2808, majf=0, minf=42 []
2:
  read : io=4101.3MB, bw=209975KB/s, iops=209975, runt= 20001msec
  cpu  : usr=28.05%, sys=45.09%, ctx=3419, majf=0, minf=42 []
3:
  read : io=2519.4MB, bw=128985KB/s, iops=128985, runt= 20001msec
  cpu  : usr=46.59%, sys=52.70%, ctx=2371, majf=0, minf=41

[root@x ~]# lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0   0    0      0    0:0:0:0       yes    3000.  1200.
1   0    0      1    1:1:1:0       yes    3000.  1200.
2   0    0      0    0:0:0:0       yes    3000.  1200.
3   0    0      1    1:1:1:0       yes    3000.  1200.
[root@x ~]# lscpu | grep -e ^Model -e ^NUMA -e ^Hyper
NUMA node(s):1
Model:   79
Model name:  Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

Re: => operator for named parameters in open cursor

2022-06-08 Thread Pavel Stehule
st 8. 6. 2022 v 14:29 odesílatel Martin Butter <
martin.but...@splendiddata.com> napsal:

> Since Postgres 9.5 it is possible to use the => operators for filling in
> named parameters in a function call, like perform my_func(named_param =>
> 'some value'). The old form, with the := operator is still allowed for
> backward reference.
>
> But with open cursor, still only the old := operator is allowed.
> Wouldn't it be more appropriate to at least allow the => operator there
> as well, like open my_cursor(named_param => 'some value')?
>

+1

Pavel


Re: Invalid memory access in pg_stat_get_subscription

2022-06-08 Thread Kuntal Ghosh
Hello Tom,

On Wed, Jun 8, 2022 at 12:44 AM Tom Lane  wrote:
>
> Kuntal Ghosh  writes:
> > While exploring some code in logical replication worker
> > implementation, I noticed that we're accessing an invalid memory while
> > traversing LogicalRepCtx->workers[i].
> > For the above structure, we're allocating
> > max_logical_replication_workers times LogicalRepWorker amount of
> > memory in ApplyLauncherShmemSize. But, in the for loop, we're
> > accessing the max_logical_replication_workers + 1 location which is
> > resulting in random crashes.
>
> I concur that that's a bug, but eyeing the code, it seems like an
> actual crash would be improbable.  Have you seen one?  Can you
> reproduce it?
Thank you for looking into it. Unfortunately, I'm not able to
reproduce the crash, but I've seen one crash while executing the
function. The crash occurred at the following line:
> if (!worker.proc || !IsBackendPid(worker.proc->pid))
(gdb) p worker.proc
$6 = (PGPROC *) 0x2bf0b9
The PGPROC structure was pointing to an invalid memory location.

-- 
Thanks & Regards,
Kuntal Ghosh




Checking for missing heap/index files

2022-06-08 Thread Bruce Momjian
We currently can check for missing heap/index files by comparing
pg_class with the database directory files.  However, I am not clear if
this is safe during concurrent DDL.  I assume we create the file before
the update to pg_class is visible, but do we always delete the file
after the update to pg_class is visible?  I assume any external checking
tool would need to lock the relation to prevent concurrent DDL.

Also, how would it check if the number of extents is correct?  Seems we
would need this value to be in pg_class, and have the same update
protections outlined above.  Seems that would require heavier locking.

Is this something anyone has even needed or had requested?
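
For illustration, a rough in-database sketch of that comparison, limited to
the first segment of each relation (it assumes PostgreSQL 12 or later for
pg_stat_file()'s missing_ok argument, and ignores the concurrent-DDL and
extra-segment problems above):

    select c.oid::regclass as rel, pg_relation_filepath(c.oid) as path
    from pg_class c
    where c.relkind in ('r', 'i', 'm', 't', 'S')   -- relations with storage
      and c.relpersistence <> 't'                  -- skip temp relations
      and (pg_stat_file(pg_relation_filepath(c.oid), true)).size is null;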

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: [RFC] building postgres with meson -v8

2022-06-08 Thread Peter Eisentraut


Attached is a patch that finishes up the work to move the snowball SQL 
script generation into a separate script.

From 02ca51dfb918666dfde8e48499a4c73afae4e89e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 8 Jun 2022 11:05:31 +0200
Subject: [PATCH] fixup! meson: prereq: move snowball_create.sql creation into
 perl file.

- Remove LANGUAGES list from Makefile, keep in Perl script.
- Add list of stopword files to install.
- Make depfile generation optional.
- Call new Perl script from Install.pm.
---
 src/backend/snowball/Makefile   | 81 -
 src/backend/snowball/meson.build|  3 +-
 src/backend/snowball/snowball_create.pl | 73 +-
 src/tools/msvc/Install.pm   | 36 +--
 4 files changed, 86 insertions(+), 107 deletions(-)

diff --git a/src/backend/snowball/Makefile b/src/backend/snowball/Makefile
index 259104f8eb..c12a77055d 100644
--- a/src/backend/snowball/Makefile
+++ b/src/backend/snowball/Makefile
@@ -72,40 +72,22 @@ OBJS += \
stem_UTF_8_turkish.o \
stem_UTF_8_yiddish.o
 
-# first column is language name and also name of dictionary for not-all-ASCII
-# words, second is name of dictionary for all-ASCII words
-# Note order dependency: use of some other language as ASCII dictionary
-# must come after creation of that language
-LANGUAGES=  \
-   arabic  arabic  \
-   armenianarmenian\
-   basque  basque  \
-   catalan catalan \
-   danish  danish  \
-   dutch   dutch   \
-   english english \
-   finnish finnish \
-   french  french  \
-   german  german  \
-   greek   greek   \
-   hindi   english \
-   hungarian   hungarian   \
-   indonesian  indonesian  \
-   irish   irish   \
-   italian italian \
-   lithuanian  lithuanian  \
-   nepali  nepali  \
-   norwegian   norwegian   \
-   portuguese  portuguese  \
-   romanianromanian\
-   russian english \
-   serbian serbian \
-   spanish spanish \
-   swedish swedish \
-   tamil   tamil   \
-   turkish turkish \
-   yiddish yiddish
-
+stop_files = \
+   danish.stop \
+   dutch.stop \
+   english.stop \
+   finnish.stop \
+   french.stop \
+   german.stop \
+   hungarian.stop \
+   italian.stop \
+   nepali.stop \
+   norwegian.stop \
+   portuguese.stop \
+   russian.stop \
+   spanish.stop \
+   swedish.stop \
+   turkish.stop
 
 SQLSCRIPT= snowball_create.sql
 DICTDIR=tsearch_data
@@ -119,35 +101,22 @@ all: all-shared-lib $(SQLSCRIPT)
 
 include $(top_srcdir)/src/Makefile.shlib
 
-$(SQLSCRIPT): snowball_create.pl Makefile snowball_func.sql.in snowball.sql.in
+$(SQLSCRIPT): snowball_create.pl snowball_func.sql.in snowball.sql.in
$(PERL) $< --input ${srcdir} --output .
 
 install: all installdirs install-lib
$(INSTALL_DATA) $(SQLSCRIPT) '$(DESTDIR)$(datadir)'
-   @set -e; \
-   set $(LANGUAGES) ; \
-   while [ "$$#" -gt 0 ] ; \
-   do \
-   lang=$$1; shift; shift; \
-   if [ -s $(srcdir)/stopwords/$${lang}.stop ] ; then \
-   $(INSTALL_DATA) $(srcdir)/stopwords/$${lang}.stop 
'$(DESTDIR)$(datadir)/$(DICTDIR)' ; \
-   fi \
-   done
+   $(INSTALL_DATA) $(addprefix $(srcdir)/stopwords/,$(stop_files)) 
'$(DESTDIR)$(datadir)/$(DICTDIR)'
 
 installdirs: installdirs-lib
$(MKDIR_P) '$(DESTDIR)$(datadir)' '$(DESTDIR)$(datadir)/$(DICTDIR)'
 
 uninstall: uninstall-lib
rm -f '$(DESTDIR)$(datadir)/$(SQLSCRIPT)'
-   @set -e; \
-   set $(LANGUAGES) ; \
-   while [ "$$#" -gt 0 ] ; \
-   do \
-   lang=$$1; shift; shift;  \
-   if [ -s $(srcdir)/stopwords/$${lang}.stop ] ; then \
-   rm -f '$(DESTDIR)$(datadir)/$(DICTDIR)/'$${lang}.stop ; \
-   fi \
-   done
-
-clean distclean maintainer-clean: clean-lib
-   rm -f $(OBJS) $(SQLSCRIPT) snowball_create.dep
+   rm -f $(addprefix '$(DESTDIR)$(datadir)/$(DICTDIR)/',$(stop_files))
+
+clean distclean: clean-lib
+   rm -f $(OBJS)
+
+maintainer-clean: distclean
+   rm -f $(SQLSCRIPT)
diff --git a/src/backend/snowball/meson.build b/src/backend/snowball/meson.build
index 30aed714dc..c6326380e1 100644
--- a/src/backend/snowball/meson.build
+++ b/src/backend/snowball/meson.build
@@ -70,12 +70,11 @@ snowball_create = custom_target('snowball_create',
   input: ['snowball_create.pl'],
   output: ['snowball_create.sql'],
   depfile: 'snowball_create.dep',
-  command: [perl, 

=> operator for named parameters in open cursor

2022-06-08 Thread Martin Butter
Since Postgres 9.5 it is possible to use the => operator for filling in 
named parameters in a function call, like perform my_func(named_param => 
'some value'). The old form, with the := operator, is still allowed for 
backward compatibility.


But with open cursor, still only the old := operator is allowed. 
Wouldn't it be more appropriate to at least allow the => operator there 
as well, like open my_cursor(named_param => 'some value')?
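
A self-contained PL/pgSQL sketch of the asymmetry (names invented for the
example):

    do $$
    declare
        curs cursor (named_param integer) for
            select i from generate_series(1, named_param) as g(i);
        val integer;
    begin
        open curs(named_param := 3);    -- the old := notation is accepted
        -- open curs(named_param => 3); -- the proposed spelling, matching function calls
        fetch curs into val;
        close curs;
    end
    $$;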






Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Tue, Jun 7, 2022 at 4:10 PM Tom Lane  wrote:
> I mean by "false positive" is telling every macOS user that they'd better
> reindex everything every year, when in point of fact Apple changes those
> collations almost never.

Do we actually know that to be true? Given how fast things seem to be
getting added to Unicode, it wouldn't surprise me at all if they're
updating their Unicode tables for new characters with some regularity,
if nothing else, and that's a breaking change for us.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Skipping schema changes in publication

2022-06-08 Thread Amit Kapila
On Fri, Jun 3, 2022 at 3:37 PM vignesh C  wrote:
>
> Thanks for the comments, the attached v8 patch has the changes for the same.
>

AFAICS, the summary of this proposal is that we want to support
excluding certain objects from a publication, with two kinds of
variants. The first variant is to add support to exclude specific
tables from an ALL TABLES publication. Without this feature, users need
to manually add all tables of a database even when they want to exclude
only a handful of tables, say because those contain
sensitive information or are not required. We have seen that other
databases like MySQL also provide a similar feature [1] (see
REPLICATE_WILD_IGNORE_TABLE). The proposed syntax for this is as
follows:

CREATE PUBLICATION pub1 FOR ALL TABLES EXCEPT TABLE t1,t2;
or
ALTER PUBLICATION pub1 ADD ALL TABLES EXCEPT TABLE t1,t2;

This will allow us to publish all the tables in the current database
except t1 and t2. Now, I see that pg_dump has a similar option
provided by switch --exclude-table but that allows tables matching
patterns which is not the case here. I am not sure if we need a
similar variant here.

Then users will be allowed to reset the publication by:
ALTER PUBLICATION pub1 RESET;

This will reset the publication to the default state which includes
resetting the publication parameters, setting the ALL TABLES flag to
false, and dropping the relations and schemas that are associated with
the publication. I don't know if we want to go further with allowing
RESET of specific parameters and, if so, which parameters and what the
syntax for that would be.

The second variant is to add support to exclude certain columns of a
table while publishing a particular table. Currently, users need to
list all required columns' names even if they don't want to hide most
of the columns in the table (for example Create Publication pub For
Table t1 (c1, c2)). Consider user doesn't want to publish the 'salary'
or other sensitive information of executives/employees but would like
to publish all other columns. I feel in such cases it will be a lot of
work for the user especially when the table has many columns. I see
that Oracle has a similar feature [2]. I think without this it will be
difficult for users to use this feature in some cases. The patch for
this is not proposed but I would imagine syntax for it to be something
like "Create Publication pub For Table t1 Except (c3)" and similar
variants for Alter Publication.

Have I missed anything?

Thoughts on the proposal/syntax would be appreciated.

[1] - https://dev.mysql.com/doc/refman/5.7/en/change-replication-filter.html
[2] - 
https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-columns.html#GUID-9A851C8B-48F7-43DF-8D98-D086BE069E20

-- 
With Regards,
Amit Kapila.




Re: Collation version tracking for macOS

2022-06-08 Thread Robert Haas
On Tue, Jun 7, 2022 at 3:53 PM Tom Lane  wrote:
> No, I quite agree that we have a problem.  What I don't agree is that
> issuing a lot of false-positive warnings is a solution.  That will
> just condition people to ignore the warnings, and then when their
> platform really does change behavior, they're still screwed.  If we
> could *accurately* report collation behavioral changes, I'd be all
> for that.

I mean, how many false-positive warnings do you think we'll get?

I would argue that if we put out something that's wrong half the time
-- it tells you about all the real problems and an equal number of
imaginary ones -- we'd be way ahead of where we are right now.  If on
the other hand we put out something that's wrong 99% of the time -- it
tells you about all the real problems and ninety-nine times as many
imaginary ones -- that's worse than useless.

There can be some weasel wording in the language e.g. "WARNING:  glibc
has been updated, collation definitions may have changed". It's worth
keeping in mind that the user doesn't necessarily have another source
of information that is more accurate than what we're providing. If
they REINDEX somewhat more often than is really necessary, that may be
painful, but it can still be a lot better than having queries return
wrong answers. If it's not, nobody's forcing them to issue that
REINDEX command.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Collation version tracking for macOS

2022-06-08 Thread Thomas Munro
On Wed, Jun 8, 2022 at 12:23 PM Peter Geoghegan  wrote:
> ISTM that there are two mostly-distinct questions here:
>
> 1. How do we link to multiple versions of ICU at the same time, in a
> way that is going to work smoothly on mainstream platforms?
>
> 2. What semantics around collations do we want for Postgres once we
> gain the ability to use multiple versions of ICU at the same time? For
> example, do we want to generalize the definition of a collation, so
> that it's associated with one particular ICU version and collation for
> the purposes of on-disk compatibility, but isn't necessarily tied to
> the same ICU version in other contexts, such as on a dump and restore?

Yeah.  Well I couldn't resist doing some (very!) experimental hacking.
See attached.  The idea of putting a raw library name in there is just
a straw-man, and I already found a major problem with it: I also need
to get my hands on u_strToLower and friends for formatting.c, but
those functions are in a different library that needs to be dlopen'd
separately, so we need *two* names.  That's not done in the attached
patch, but at least this demonstrates some of the mechanics of a
dlopen() based solution that can do the collating part...  of course
there are all kinds of problems apparent (security of loading
arbitrary libraries, API stability, interaction with the "default" ICU
that our binary is linked against, creation of initial set of
collations in initdb, naming, upgrades, ...).

Demo:

$ sudo apt-get install libicu63 libicu67

postgres=# create schema icu63;
CREATE SCHEMA
postgres=# create schema icu67;
CREATE SCHEMA
postgres=# create collation icu63."en-US-x-icu" (provider = icu,
locale = 'libicui18n.so.63:en-US');
CREATE COLLATION
postgres=# create collation icu67."en-US-x-icu" (provider = icu,
locale = 'libicui18n.so.67:en-US');
CREATE COLLATION
postgres=# select collname, collnamespace::regnamespace,
colliculocale, collversion
 from pg_collation
where collname = 'en-US-x-icu';
  collname   | collnamespace | colliculocale  | collversion
-+---++-
 en-US-x-icu | pg_catalog| en-US  | 153.14
 en-US-x-icu | icu63 | libicui18n.so.63:en-US | 153.88
 en-US-x-icu | icu67 | libicui18n.so.67:en-US | 153.14
(3 rows)
postgres=# select relname from pg_class order by relname collate
icu63."en-US-x-icu" limit 2;
  relname
---
 _pg_foreign_data_wrappers
 _pg_foreign_servers
(2 rows)
From 5622f25172881e021d0f436add8a785f9e3445e5 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH] WIP: allow multiple ICU libraries

XXX This is highly experimental code
---
 src/backend/access/hash/hashfunc.c |  16 +--
 src/backend/utils/adt/pg_locale.c  | 209 +++--
 src/backend/utils/adt/varchar.c|  16 +--
 src/backend/utils/adt/varlena.c|  47 +++
 src/include/utils/pg_locale.h  |  47 +++
 5 files changed, 284 insertions(+), 51 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c4..c1847149de 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -298,11 +298,11 @@ hashtext(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-	uchar, ulen, NULL, 0);
+			bsize = mylocale->info.icu.funcs->getSortKey(mylocale->info.icu.ucol,
+		 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-			uchar, ulen, buf, bsize);
+			mylocale->info.icu.funcs->getSortKey(mylocale->info.icu.ucol,
+ uchar, ulen, buf, bsize);
 
 			result = hash_any(buf, bsize);
 
@@ -355,11 +355,11 @@ hashtextextended(PG_FUNCTION_ARGS)
 
 			ulen = icu_to_uchar(, VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key));
 
-			bsize = ucol_getSortKey(mylocale->info.icu.ucol,
-	uchar, ulen, NULL, 0);
+			bsize = mylocale->info.icu.funcs->getSortKey(mylocale->info.icu.ucol,
+		 uchar, ulen, NULL, 0);
 			buf = palloc(bsize);
-			ucol_getSortKey(mylocale->info.icu.ucol,
-			uchar, ulen, buf, bsize);
+			mylocale->info.icu.funcs->getSortKey(mylocale->info.icu.ucol,
+ uchar, ulen, buf, bsize);
 
 			result = hash_any_extended(buf, bsize, PG_GETARG_INT64(1));
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a0490a7522..3a8951fe46 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -78,6 +78,10 @@
 #include 
 #endif
 
+#ifdef HAVE_DLOPEN
+#include 
+#endif
+
 #define		MAX_L10N_DATA		80
 
 
@@ -1435,29 +1439,204 @@ lc_ctype_is_c(Oid collation)
 	return (lookup_collation_cache(collation, true))->ctype_is_c;
 }
 
+#ifdef USE_ICU
+
 struct pg_locale_struct default_locale;
 
+/* Linked list of ICU libraries we have 

Re: Defer selection of asynchronous subplans until the executor initialization stage

2022-06-08 Thread Etsuro Fujita
On Fri, Jun 3, 2022 at 1:03 AM Zhihong Yu  wrote:
> Suggestion on formatting the comment:
>
> + * node (or that for any plan node in the subplan tree), 2)
> + * set_plan_references() modifies the tlist for every plan node in the
>
> It would be more readable if `2)` is put at the beginning of the second line 
> above.
>
> + * preserves the length and order of the tlist, and 3) set_plan_references()
> + * might delete the topmost plan node like an Append or MergeAppend from the
>
> Similarly you can move `3) set_plan_references()` to the beginning of the 
> next line.

Seems like a good idea, so I updated the patch as you suggest.  I did
some indentation as well, which I think improves readability a bit
further.  Attached is an updated version.  If no objections, I’ll
commit the patch.

Thanks for reviewing!

Best regards,
Etsuro Fujita


Improve-comments-for-trivial_subqueryscan-2.patch
Description: Binary data


Re: Error from the foreign RDBMS on a foreign table I have no privilege on

2022-06-08 Thread Etsuro Fujita
On Wed, Jun 8, 2022 at 2:51 PM Kyotaro Horiguchi
 wrote:
> At Wed, 08 Jun 2022 07:05:09 +0200, Laurenz Albe  
> wrote in
> > I take Tom's comment above as saying that the current behavior is fine.
> > So yes, perhaps some documentation would be in order:
> >
> > diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
> > index b43d0aecba..b4b7e36d28 100644
> > --- a/doc/src/sgml/postgres-fdw.sgml
> > +++ b/doc/src/sgml/postgres-fdw.sgml
> > @@ -274,6 +274,14 @@ OPTIONS (ADD password_required 'false');
> > but only for that table.
> > The default is false.
> >
> > +
> > +  
> > +   Note that EXPLAIN will be run on the remote server
> > +   at query planning time, before permissions on the
> > +   foreign table are checked.  This is not a security problem, since the
> > +   subsequent error from the permission check will prevent the user from
> > +   seeing any of the resulting data.
> > +  
> >   
> >  
>
> Looks fine.  I'd like to add something like "If needed, depriving
> unprivileged users of relevant user mappings will prevent such remote
> executions that happen at planning-time."

I agree on that point; if the EXPLAIN done on the remote side is
really a problem, I think the user should revoke privileges from the
remote user specified in the user mapping, to prevent it.  I’d rather
recommend granting to the remote user privileges consistent with those
granted to the local user.

Best regards,
Etsuro Fujita




Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-06-08 Thread Kyotaro Horiguchi
At Wed, 08 Jun 2022 18:15:09 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Tue, 07 Jun 2022 16:05:47 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > At Tue, 07 Jun 2022 12:39:38 +0900 (JST), Kyotaro Horiguchi 
> >  wrote in 
> > > One possible way to detect promotion reliably is to look into timeline
> > > history files. It is written immediately at promotion even on
> > > standbys.
> > 
> > The attached seems to work. It uses timeline history files to identify
> > the source timeline.  With this change pg_waldump no longer need to
> > wait for end-of-recovery to finish.
> > 
> > (It lacks doc part and test.. But I'm not sure how we can test this
> > behavior.)
> 
> This is a revised version.
> 
> Revised getTimelineHistory()'s logic (refactored, and changed so that
> it doesn't pick-up the wrong history files).
> 
> perform_rewind always identify endtli based on source's timeline
> history.

There is no need to "search" the history files to identify it: the latest
timeline in the source's history must be the end timeline.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 69d6924b3a..7f752b2ed0 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -334,15 +334,6 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
 
   
 
-  
-   When executing pg_rewind using an online
-   cluster as source which has been recently promoted, it is necessary
-   to execute a CHECKPOINT after promotion such that its
-   control file reflects up-to-date timeline information, which is used by
-   pg_rewind to check if the target cluster
-   can be rewound using the designated source cluster.
-  
-
   
How It Works
 
diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index 6cb288f099..2a407da1e4 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -309,9 +309,11 @@ sync_target_dir(void)
  * buffer is actually *filesize + 1. That's handy when reading a text file.
  * This function can be used to read binary files as well, you can just
  * ignore the zero-terminator in that case.
+ *
+ * If noerror is true, returns NULL when the file is not found.
  */
 char *
-slurpFile(const char *datadir, const char *path, size_t *filesize)
+slurpFile(const char *datadir, const char *path, size_t *filesize, bool noerror)
 {
int fd;
char   *buffer;
@@ -323,8 +325,13 @@ slurpFile(const char *datadir, const char *path, size_t *filesize)
snprintf(fullpath, sizeof(fullpath), "%s/%s", datadir, path);
 
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) == -1)
+   {
+   if (noerror && errno == ENOENT)
+   return NULL;
+
pg_fatal("could not open file \"%s\" for reading: %m",
 fullpath);
+   }
 
if (fstat(fd, ) < 0)
pg_fatal("could not open file \"%s\" for reading: %m",
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index 54a853bd42..92e19042cb 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -21,7 +21,8 @@ extern void create_target(file_entry_t *t);
 extern void remove_target(file_entry_t *t);
 extern void sync_target_dir(void);
 
-extern char *slurpFile(const char *datadir, const char *path, size_t *filesize);
+extern char *slurpFile(const char *datadir, const char *path, size_t *filesize,
+   bool noerror);
 
 typedef void (*process_file_callback_t) (const char *path, file_type_t type, size_t size, const char *link_target);
 extern void traverse_datadir(const char *datadir, process_file_callback_t callback);
diff --git a/src/bin/pg_rewind/libpq_source.c b/src/bin/pg_rewind/libpq_source.c
index 011c9cce6e..751c96e6e4 100644
--- a/src/bin/pg_rewind/libpq_source.c
+++ b/src/bin/pg_rewind/libpq_source.c
@@ -68,7 +68,7 @@ static void libpq_queue_fetch_range(rewind_source *source, const char *path,
 									off_t off, size_t len);
 static void libpq_finish_fetch(rewind_source *source);
 static char *libpq_fetch_file(rewind_source *source, const char *path,
-							  size_t *filesize);
+							  size_t *filesize, bool noerror);
 static XLogRecPtr libpq_get_current_wal_insert_lsn(rewind_source *source);
 static void libpq_destroy(rewind_source *source);
 
@@ -620,9 +620,12 @@ appendArrayEscapedString(StringInfo buf, const char *str)
 
 /*
  * Fetch a single file as a malloc'd buffer.
+ *
+ * If noerror is true, returns NULL if pg_read_binary_file() failed.
  */
 static char *
-libpq_fetch_file(rewind_source *source, const char *path, size_t *filesize)
+libpq_fetch_file(rewind_source *source, const char *path, size_t *filesize,
+bool noerror)
 {
PGconn *conn 
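
To see how the new noerror flag is meant to be used, here is a rough,
hypothetical fragment (not part of the patch) of the kind of caller this
enables in pg_rewind: probing for a timeline history file that may
legitimately be absent, instead of treating its absence as a fatal error.
TLHistoryFilePath() and MAXPGPATH are existing definitions; datadir and tli
stand in for whatever the surrounding function has at hand.

	char		path[MAXPGPATH];
	char	   *histdata;
	size_t		size;

	TLHistoryFilePath(path, tli);		/* "pg_wal/%08X.history" */
	histdata = slurpFile(datadir, path, &size, true);	/* noerror = true */
	if (histdata == NULL)
	{
		/* No history file (e.g. TLI 1): fall back to a trivial history. */
	}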

Re: pg_rewind: warn when checkpoint hasn't happened after promotion

2022-06-08 Thread Kyotaro Horiguchi
At Tue, 07 Jun 2022 16:05:47 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Tue, 07 Jun 2022 12:39:38 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > One possible way to detect promotion reliably is to look into timeline
> > history files. It is written immediately at promotion even on
> > standbys.
> 
> The attached seems to work. It uses timeline history files to identify
> the source timeline.  With this change pg_waldump no longer need to
> wait for end-of-recovery to finish.
> 
> (It lacks doc part and test.. But I'm not sure how we can test this
> behavior.)

This is a revised version.

Revised getTimelineHistory()'s logic (refactored, and changed so that
it doesn't pick up the wrong history files).

perform_rewind now always identifies endtli based on the source's timeline
history.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 69d6924b3a..7f752b2ed0 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -334,15 +334,6 @@ GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, b
 
   
 
-  
-   When executing pg_rewind using an online
-   cluster as source which has been recently promoted, it is necessary
-   to execute a CHECKPOINT after promotion such that its
-   control file reflects up-to-date timeline information, which is used by
-   pg_rewind to check if the target cluster
-   can be rewound using the designated source cluster.
-  
-
   
How It Works
 
diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index 6cb288f099..2a407da1e4 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -309,9 +309,11 @@ sync_target_dir(void)
  * buffer is actually *filesize + 1. That's handy when reading a text file.
  * This function can be used to read binary files as well, you can just
  * ignore the zero-terminator in that case.
+ *
+ * If noerror is true, returns NULL when the file is not found.
  */
 char *
-slurpFile(const char *datadir, const char *path, size_t *filesize)
+slurpFile(const char *datadir, const char *path, size_t *filesize, bool noerror)
 {
int fd;
char   *buffer;
@@ -323,8 +325,13 @@ slurpFile(const char *datadir, const char *path, size_t *filesize)
snprintf(fullpath, sizeof(fullpath), "%s/%s", datadir, path);
 
if ((fd = open(fullpath, O_RDONLY | PG_BINARY, 0)) == -1)
+   {
+   if (noerror && errno == ENOENT)
+   return NULL;
+
pg_fatal("could not open file \"%s\" for reading: %m",
 fullpath);
+   }
 
if (fstat(fd, ) < 0)
pg_fatal("could not open file \"%s\" for reading: %m",
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index 54a853bd42..92e19042cb 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -21,7 +21,8 @@ extern void create_target(file_entry_t *t);
 extern void remove_target(file_entry_t *t);
 extern void sync_target_dir(void);
 
-extern char *slurpFile(const char *datadir, const char *path, size_t *filesize);
+extern char *slurpFile(const char *datadir, const char *path, size_t *filesize,
+   bool noerror);
 
 typedef void (*process_file_callback_t) (const char *path, file_type_t type, size_t size, const char *link_target);
 extern void traverse_datadir(const char *datadir, process_file_callback_t callback);
diff --git a/src/bin/pg_rewind/libpq_source.c b/src/bin/pg_rewind/libpq_source.c
index 011c9cce6e..751c96e6e4 100644
--- a/src/bin/pg_rewind/libpq_source.c
+++ b/src/bin/pg_rewind/libpq_source.c
@@ -68,7 +68,7 @@ static void libpq_queue_fetch_range(rewind_source *source, const char *path,
 									off_t off, size_t len);
 static void libpq_finish_fetch(rewind_source *source);
 static char *libpq_fetch_file(rewind_source *source, const char *path,
-							  size_t *filesize);
+							  size_t *filesize, bool noerror);
 static XLogRecPtr libpq_get_current_wal_insert_lsn(rewind_source *source);
 static void libpq_destroy(rewind_source *source);
 
@@ -620,9 +620,12 @@ appendArrayEscapedString(StringInfo buf, const char *str)
 
 /*
  * Fetch a single file as a malloc'd buffer.
+ *
+ * If noerror is true, returns NULL if pg_read_binary_file() failed.
  */
 static char *
-libpq_fetch_file(rewind_source *source, const char *path, size_t *filesize)
+libpq_fetch_file(rewind_source *source, const char *path, size_t *filesize,
+bool noerror)
 {
PGconn *conn = ((libpq_source *) source)->conn;
PGresult   *res;
@@ -631,6 +634,34 @@ libpq_fetch_file(rewind_source *source, const char *path, size_t *filesize)
const char *paramValues[1];
 
 

Re: effective_io_concurrency and NVMe devices

2022-06-08 Thread Tomas Vondra
On 6/8/22 08:29, Jakub Wartak wrote:
>>>> The attached patch is a trivial version that waits until we're at least
>>>> 32 pages behind the target, and then prefetches all of them. Maybe give it a try?
>>>> (This pretty much disables prefetching for e_i_c below 32, but for an
>>>> experimental patch that's enough.)
>>>
>>> I've tried it at e_i_c=10 initially on David's setup.sql, and most defaults
>>> s_b=128MB, dbsize=8kb but with forced disabled parallel query (for easier
>>> inspection with strace just to be sure//so please don't compare times).
>>>
>>> run:
>>> a) master (e_i_c=10)  181760ms, 185680ms, 185384ms @ ~ 340MB/s and 44k
>>> IOPS (~122k IOPS practical max here for libaio)
>>> b) patched(e_i_c=10)  237774ms, 236326ms, ..as you stated it disabled
>>> prefetching, fadvise() not occurring
>>> c) patched(e_i_c=128) 90430ms, 88354ms, 85446ms, 78475ms, 74983ms,
>>> 81432ms (mean=83186ms +/- 5947ms) @ ~570MB/s and 75k IOPS (it even
>>> peaked for a second on ~122k)
>>> d) master (e_i_c=128) 116865ms, 101178ms, 89529ms, 95024ms, 89942ms
>>> 99939ms (mean=98746ms +/- 10118ms) @ ~510MB/s and 65k IOPS (rare peaks
>>> to 90..100k IOPS)
>>>
>>> ~16% benefit sounds good (help me understand: L1i cache?). Maybe it is
>>> worth throwing that patch onto more advanced / complete performance
>>> test farm too ? (although it's only for bitmap heap scans)
> 
> I hope you have some future plans for this patch :)
> 

I think the big challenge is to make this adaptive, i.e. make it work well
for access patterns that are not known in advance. The existing prefetching
works fine for random access (even on NVMe devices), but not so much for
sequential access (as demonstrated by David's example).

>> Yes, kernel certainly does it's own read-ahead, which works pretty well for
>> sequential patterns. What does
>>
>>blockdev --getra /dev/...
>>
>> say?
> 
> It's default, 256 sectors (128kb) so it matches.
> 

Right. I think this is pretty much why (our) prefetching performs so
poorly on sequential access patterns - the kernel read-ahead works very
well in this case, and our prefetching can't help but can interfere.
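
For context, the prefetching discussed here boils down to PostgreSQL issuing
an explicit readahead hint for each heap block it expects to need, roughly
equivalent to the sketch below (the real code path goes through
PrefetchBuffer() rather than a direct posix_fadvise() call; BLCKSZ is
PostgreSQL's default 8 kB block size).  The kernel's own read-ahead, by
contrast, detects sequential access by itself, so per-block hints add little
there and can even get in the way.

#include <fcntl.h>
#include <sys/types.h>

#define BLCKSZ 8192

/* Ask the kernel to start reading the given block in the background. */
static void
prefetch_block(int fd, unsigned int blockno)
{
	(void) posix_fadvise(fd, (off_t) blockno * BLCKSZ, BLCKSZ,
						 POSIX_FADV_WILLNEED);
}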


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Replica Identity check of partition table on subscriber

2022-06-08 Thread shiy.f...@fujitsu.com
Hi hackers,

I found a problem in logical replication: when the target table on the
subscriber is a partitioned table, only the Replica Identity of the
partitioned table is checked for consistency with the publisher, while the
Replica Identity of the partition that actually receives the change is not
checked.

For example:
-- publisher --
create table tbl (a int not null, b int);
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create publication pub for table tbl;

-- subscriber --
-- table tbl (parent table) has RI index, while table child has no RI index.
create table tbl (a int not null, b int) partition by range (a);
create table child partition of tbl default;
create unique INDEX ON tbl (a);
alter table tbl replica identity using INDEX tbl_a_idx;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
insert into tbl values (11,11);
update tbl set a=a+1;

It caused an assertion failure on the subscriber:
TRAP: FailedAssertion("OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)", File: "worker.c", Line: 2088, PID: 1616523)

The backtrace is attached.

We got the assertion failure because idxoid is invalid, as table child has no
Replica Identity or Primary Key. There is a check in check_relation_updatable(),
but what it checks is the table tbl (the parent table), and that check passed.

I think one approach to fix it is to check the target partition in this case,
instead of the partitioned table.
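
As a rough sketch of that idea (this is not the attached patch; placement and
variable names are illustrative only), the updatability check could simply be
repeated against the partition's own map entry once the tuple has been routed
in apply_handle_tuple_routing(), along these lines:

	part_entry = logicalrep_partition_open(relmapentry, partrel, attrmap);

	/*
	 * Check the leaf partition itself, not just the partitioned table the
	 * subscription originally resolved to.
	 */
	check_relation_updatable(part_entry);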

While trying to fix it, I noticed some other problems with updating the
partition map cache.

a) In logicalrep_partmap_invalidate_cb(), the type of the entry in
LogicalRepPartMap should be LogicalRepPartMapEntry, instead of
LogicalRepRelMapEntry.

b) In logicalrep_partition_open(), it didn't check if the entry is valid.

c) When the publisher sends a new relation mapping, only the relation map cache
is updated, and the partition map cache isn't. I think it should be updated as
well, because it also holds remote relation information.

Attach two patches which tried to fix them.
0001 patch: fix the above three problems about partition map cache.
0002 patch: fix the assertion failure, check the Replica Identity of the
partition if the target table is a partitioned table.

Thanks to Hou Zhijie for helping me in the first patch.

I will add a test for it later if no one objects to this fix.

Regards,
Shi yu


v1-0001-Fix-partition-map-cache-issues.patch
Description: v1-0001-Fix-partition-map-cache-issues.patch


v1-0002-Check-partition-table-replica-identity-on-subscri.patch
Description:  v1-0002-Check-partition-table-replica-identity-on-subscri.patch
#0  0x7fd35e76970f in raise () from /lib64/libc.so.6
#1  0x7fd35e753b25 in abort () from /lib64/libc.so.6
#2  0x00fff9b4 in ExceptionalCondition (conditionName=0x1237a80 
"OidIsValid(idxoid) || (remoterel->replident == REPLICA_IDENTITY_FULL)",
errorType=0x1237208 "FailedAssertion", fileName=0x123737f "worker.c", 
lineNumber=2088) at assert.c:69
#3  0x00c354d4 in FindReplTupleInLocalRel (estate=0x1c92240, 
localrel=0x7fd352ce7bf8, remoterel=0x1ca5b28, remoteslot=0x1c962a0, 
localslot=0x7ffc8b70c360)
at worker.c:2087
#4  0x00c35a0f in apply_handle_tuple_routing (edata=0x1c7b3b0, 
remoteslot=0x1c7b808, newtup=0x7ffc8b70c420, operation=CMD_UPDATE) at 
worker.c:2192
#5  0x00c34a1d in apply_handle_update (s=0x7ffc8b70c4f0) at 
worker.c:1855
#6  0x00c36cab in apply_dispatch (s=0x7ffc8b70c4f0) at worker.c:2481
#7  0x00c37834 in LogicalRepApplyLoop (last_received=22131584) at 
worker.c:2757
#8  0x00c39ee5 in start_apply (origin_startpos=0) at worker.c:3531
#9  0x00c3ae14 in ApplyWorkerMain (main_arg=0) at worker.c:3787
#10 0x00bbd903 in StartBackgroundWorker () at bgworker.c:858
#11 0x00bd175a in do_start_bgworker (rw=0x1bd4910) at postmaster.c:5815
#12 0x00bd1ea5 in maybe_start_bgworkers () at postmaster.c:6039
#13 0x00bcfdb7 in sigusr1_handler (postgres_signal_arg=10) at 
postmaster.c:5204
#14 
#15 0x7fd35e8254ab in select () from /lib64/libc.so.6
#16 0x00bc74fc in ServerLoop () at postmaster.c:1770
#17 0x00bc6a67 in PostmasterMain (argc=3, argv=0x1ba9b00) at 
postmaster.c:1478
#18 0x00a10b78 in main (argc=3, argv=0x1ba9b00) at main.c:202


Re: CPU time for pg_stat_statement

2022-06-08 Thread Michail Nikolaev
Hello, Tom.

>> This is a pretty broad claim to make on the basis of one undocumented
>> test case on one unmentioned platform.

> I'll try to use pg_stat_kcache to check the difference between Wall
> and CPU for my case.

In my case I see a pretty high correlation between pg_stat_kcache and
pg_stat_statements (clock_gettime vs getrusage).
It looks like the CPU usage is hidden somewhere else (probably planning,
which is not measured on Postgres 11, but I see really high
*clauselist_selectivity* in perf).

Thanks,
Michail.




RE: effective_io_concurrency and NVMe devices

2022-06-08 Thread Jakub Wartak
> >> The attached patch is a trivial version that waits until we're at least
> >> 32 pages behind the target, and then prefetches all of them. Maybe give it a try?
> >> (This pretty much disables prefetching for e_i_c below 32, but for an
> >> experimental patch that's enough.)
> >
> > I've tried it at e_i_c=10 initially on David's setup.sql, and most defaults
> > s_b=128MB, dbsize=8kb but with forced disabled parallel query (for easier
> > inspection with strace just to be sure//so please don't compare times).
> >
> > run:
> > a) master (e_i_c=10)  181760ms, 185680ms, 185384ms @ ~ 340MB/s and 44k
> > IOPS (~122k IOPS practical max here for libaio)
> > b) patched(e_i_c=10)  237774ms, 236326ms, ..as you stated it disabled
> > prefetching, fadvise() not occurring
> > c) patched(e_i_c=128) 90430ms, 88354ms, 85446ms, 78475ms, 74983ms,
> > 81432ms (mean=83186ms +/- 5947ms) @ ~570MB/s and 75k IOPS (it even
> > peaked for a second on ~122k)
> > d) master (e_i_c=128) 116865ms, 101178ms, 89529ms, 95024ms, 89942ms
> > 99939ms (mean=98746ms +/- 10118ms) @ ~510MB/s and 65k IOPS (rare peaks
> > to 90..100k IOPS)
> >
> > ~16% benefit sounds good (help me understand: L1i cache?). Maybe it is
> > worth throwing that patch onto more advanced / complete performance
> > test farm too ? (although it's only for bitmap heap scans)

I hope you have some future plans for this patch :)

> Yes, kernel certainly does it's own read-ahead, which works pretty well for
> sequential patterns. What does
> 
>blockdev --getra /dev/...
> 
> say?

It's the default, 256 sectors (128 kB), so it matches.

-J.




Re: [RFC] building postgres with meson -v8

2022-06-08 Thread Peter Eisentraut
I looked at some of the "prereq" patches again to see what state they 
are in:


commit 351a12f48e395b31cce4aca239b934174b36ea9d
Author: Andres Freund 
Date:   Wed Apr 20 22:46:54 2022

prereq: deal with \ paths in basebackup_to_shell tests.

This is a new component in PG15, so a fix might be in scope for PG15 
too.  But I don't know if this change is really necessary.  There are 
other tests that use the GZIP and TAR environment variables (e.g., 
pg_verifybackup).  If this is a problem there too, we should think of a 
general solution.  If not, it could use some explanation.



commit c00642483a53f4ee6e351085c7628363c293ee61
Author: Andres Freund 
Date:   Fri Mar 25 21:44:48 2022

meson: prereq: unicode: allow to specify output directory.

OK with attached fixup (but see below).


commit 31313056e153e099f236a29b752f7610c4f7764f
Author: Andres Freund 
Date:   Thu Jan 20 08:36:50 2022

meson: prereq: generate-errcodes.pl: accept output file

This is OK, but seems unnecessary, since meson can capture the command's
output into a single output file.  (See also the similar script
generate-errcodes-table.pl in doc/, which uses capture.)



commit e4e77c0e20f3532be4ed270a7cf8b965b7cafa49
Author: Andres Freund 
Date:   Thu Jan 20 08:36:50 2022

meson: prereq: add output path arg in generate-lwlocknames.pl

We should make the command-line interface here the same as the unicode 
script: Either make the output directory a positional argument or an 
option. I don't have a strong feeling about it either way, but perhaps 
the solution with the option is more elegant and would also not require 
changing the makefiles.  Also, we should decide on short or long option: 
The code declares a long option, but the build uses a short option. 
It's confusing that that even works.



commit 7866620afa65223f6e657da972f501615fd32d3b
Author: Andres Freund 
Date:   Wed Apr 20 21:01:31 2022

meson: prereq: output and depencency tracking work.

This could be split into multiple parts with more detailed explanations. 
 I see where you're going but not everything is fully clear to me 
(especially the guc-file.c.h stuff).

From 51c6d3544ae9e652c7aac26102a8bf5a116fb182 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 7 Jun 2022 22:54:26 +0200
Subject: [PATCH] fixup! meson: prereq: unicode: allow to specify output
 directory.

---
 src/common/unicode/generate-unicode_norm_table.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/common/unicode/generate-unicode_norm_table.pl b/src/common/unicode/generate-unicode_norm_table.pl
index 3d0c3f0ebc..ed4181fce6 100644
--- a/src/common/unicode/generate-unicode_norm_table.pl
+++ b/src/common/unicode/generate-unicode_norm_table.pl
@@ -40,7 +40,7 @@
 my @characters = ();
 my %character_hash = ();
 open($FH, '<', "$directory/UnicodeData.txt")
-  or die "Could not open UnicodeData.txt: $!.";
+  or die "Could not open $directory/UnicodeData.txt: $!.";
 while (my $line = <$FH>)
 {
 
-- 
2.36.1