Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-26 Thread Michael Paquier
On Fri, May 26, 2023 at 09:10:17AM -0700, Jacob Champion wrote:
> Can X509_get_signature_nid be moved to the required section up above?
> As it is now, anyone configuring with -Dssl=auto can still pick up a
> 1.0.1 build, which Meson accepts, and then the build fails downstream.
> If we require the function instead, Meson will ignore 1.0.1 (or, for
> -Dssl=openssl, complain before we compile).

Yes, I was wondering a bit if something more should be marked as
required, but just saw more value in removing all references to this
function.  Making the build fail immediately at setup time is OK
by me, but I am not convinced that X509_get_signature_nid() would be
the right choice for the job, as it is an OpenSSL artifact originally,
AFAIK.

Doesn't the same problem exist with OpenSSL 1.0.0 on HEAD when building
with meson?  CRYPTO_new_ex_data() and SSL_new() exist there.

> t/001_ssltests.pl has a reference to 1.0.1 that can probably be
> entirely deleted:
> 
> # ... (Nor for OpenSSL
> # 1.0.1, but that's old enough that accommodating it isn't worth the
> # cost.)

Not touching that is intentional.  It sounded useful to me as an
historic reference for LibreSSL ;)
--
Michael




Re: Implement generalized sub routine find_in_log for tap test

2023-05-26 Thread vignesh C
On Fri, 26 May 2023 at 04:09, Michael Paquier  wrote:
>
> On Thu, May 25, 2023 at 06:34:20PM +0100, Dagfinn Ilmari Mannsåker wrote:
> > However, none of the other functions in ::Utils know anything about node
> > objects, which makes me think it should be a method on the node itself
> > (i.e. in PostgreSQL::Test::Cluster) instead.  Also, I think log_contains
> > would be a better name, since it just returns a boolean.  The name
> > find_in_log makes me think it would return the log lines matching the
> > pattern, or the position of the match in the file.
> >
> > In that case, the slurp_file() call would have to be fully qualified,
> > since ::Cluster uses an empty import list to avoid polluting the method
> > namespace with imported functions.
>
> Hmm.  connect_ok() and connect_fails() in Cluster.pm have a similar
> log comparison logic, feeding from the offset of a log file.  Couldn't
> you use the same code across the board for everything?  Note that this
> stuff is parameterized so that it is possible to check if patterns match
> or do not match, for multiple patterns.  It seems to me that we could
> use the new log finding routine there as well, so how about extending
> it a bit more?  You would need, at least:
> - One parameter for log entries matching.
> - One parameter for log entries not matching.

I felt that adding these to log_contains made the function slightly
complex with multiple checks, and I was not able to simplify it with the
approach I tried. How about a common function,
check_connect_log_contents(), which holds the common log-content checks
for connect_ok() and connect_fails(), like the attached v2-0002 patch?

Regards,
Vignesh
From 1e835b6d032ea380bc213ff130e87fbb250be77f Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 26 May 2023 20:07:30 +0530
Subject: [PATCH v2 1/2] Remove duplicate find_in_log sub routines from tap
 tests.

Remove duplicate find_in_log sub routines from tap tests.
---
 src/test/authentication/t/003_peer.pl | 17 ++
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 16 +
 src/test/recovery/t/019_replslot_limit.pl | 33 +--
 src/test/recovery/t/033_replay_tsp_drops.pl   | 15 ++---
 .../t/035_standby_logical_decoding.pl | 26 +++
 5 files changed, 33 insertions(+), 74 deletions(-)

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index 3272e52cae..2a035c2d0d 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -69,17 +69,6 @@ sub test_role
 	}
 }
 
-# Find $pattern in log file of $node.
-sub find_in_log
-{
-	my ($node, $offset, $pattern) = @_;
-
-	my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile, $offset);
-	return 0 if (length($log) <= 0);
-
-	return $log =~ m/$pattern/;
-}
-
 my $node = PostgreSQL::Test::Cluster->new('node');
 $node->init;
 $node->append_conf('postgresql.conf', "log_connections = on\n");
@@ -91,9 +80,9 @@ reset_pg_hba($node, 'peer');
 # Check if peer authentication is supported on this platform.
 my $log_offset = -s $node->logfile;
 $node->psql('postgres');
-if (find_in_log(
-		$node, $log_offset,
-		qr/peer authentication is not supported on this platform/))
+if ($node->log_contains(
+		qr/peer authentication is not supported on this platform/,
+		$log_offset))
 {
 	plan skip_all => 'peer authentication is not supported on this platform';
 }
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index baea0fcd1c..f10e44ba51 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2536,6 +2536,22 @@ sub log_content
 }
 
 
+=pod
+
+=item $node->log_contains(pattern, offset)
+
+Returns true if the pattern is found in the node's log file, starting at the given byte offset (which defaults to 0).
+
+=cut
+
+sub log_contains
+{
+	my ($self, $pattern, $offset) = @_;
+
+	$offset = 0 unless defined $offset;
+	return PostgreSQL::Test::Utils::slurp_file($self->logfile, $offset) =~ m/$pattern/;
+}
+
 =pod
 
 =item $node->run_log(...)
diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index a1aba16e14..95acf9e357 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -161,8 +161,7 @@ $node_primary->wait_for_catchup($node_standby);
 
 $node_standby->stop;
 
-ok( !find_in_log(
-		$node_standby,
+ok( !$node_standby->log_contains(
 		"requested WAL segment [0-9A-F]+ has already been removed"),
 	'check that required WAL segments are still available');
 
@@ -184,8 +183,8 @@ $node_primary->safe_psql('postgres', "CHECKPOINT;");
 my $invalidated = 0;
 for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
 {
-	if (find_in_log(
-			$node_primary, 'invalidating obsolete replication slot "rep1"',
+	if ($node_primary->log_contains(
+			'invalidating obsolete replication slot "rep1"',
 			$logstart))
 	{
 		$invalidated = 1;
@@ -207,7 +206,7 @@ is($result, 

Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 4:05 PM Michael Paquier  wrote:
> On Fri, May 26, 2023 at 10:56:53AM +0200, Alvaro Herrera wrote:
> > I can't make up my mind about this.  What do others think?
>
> When I looked at the patch yesterday, my impression was that this
> would be material for v17 as it is refactoring work, not v16.

I'd have thought the subject line "Cleaning up nbtree after logical
decoding on standby work" made it quite clear that this patch was
targeting 16.

It's not refactoring work -- not really. The whole idea of outright
removing the use of P_NEW in nbtree was where I landed with this after
a couple of hours of work. In fact I almost posted a version without
that, though it was worse in every way than my final approach.

I first voiced concerns about this whole area way back on April 4,
which is only 3 days after commit 61b313e4 went in:

https://postgr.es/m/CAH2-Wz=jgryxwm74g1khst0znpunhezyjnvsjno2t3jswtb...@mail.gmail.com

--
Peter Geoghegan




Fix search_path for all maintenance commands

2023-05-26 Thread Jeff Davis
Maintenance commands (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW,
REINDEX, and VACUUM) currently run as the table owner, and as a
SECURITY_RESTRICTED_OPERATION.

I propose that we also fix the search_path to "pg_catalog, pg_temp"
when running maintenance commands, for two reasons:

1. Make the behavior of maintenance commands more consistent because
they'd always have the same search_path.

2. Now that we have the MAINTAIN privilege in 16, privileged non-
superusers can execute maintenance commands on other users' tables.
That raises the possibility that a user with MAINTAIN privilege may be
able to use search_path tricks to escalate privileges to the table
owner. The MAINTAIN privilege is only given to highly-privileged users,
but there's still some risk. For this reason I also propose that it
goes in v16.
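
To illustrate the risk in (2), here is a hedged sketch of the kind of
search_path trick a fixed search_path forecloses (all object names here
are hypothetical):

  -- Suppose the table owner indexed an expression whose SQL function
  -- calls an unqualified helper, resolved via search_path at run time
  -- (the owner has a legitimate munge() on their own search_path):
  CREATE FUNCTION owner_func(x text) RETURNS text
      LANGUAGE sql IMMUTABLE
      AS $$ SELECT munge(x) $$;
  CREATE INDEX ON owner_table ((owner_func(col)));

  -- A role holding only MAINTAIN plants its own munge() and has it run
  -- with the table owner's privileges:
  SET search_path = evil_schema, public;
  REINDEX TABLE owner_table;   -- evil_schema.munge() executes as owner

With search_path pinned to "pg_catalog, pg_temp" during maintenance, the
attacker's schema is never consulted.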


There's one interesting part: in the code path for creating a
materialized view, ExecCreateTableAs() has this comment:

/*
 * For materialized views, lock down security-restricted operations and
 * arrange to make GUC variable changes local to this command.  This is
 * not necessary for security, but this keeps the behavior similar to
 * REFRESH MATERIALIZED VIEW.  Otherwise, one could create a
 * materialized view not possible to refresh.
 */

My patch doesn't address this ExecCreateTableAs() check. To do so, we
would need to set the search path after DefineRelation(), otherwise it
will try to create the object in pg_catalog. But DefineRelation() is
happening at execution time, well after we entered the
SECURITY_RESTRICTED_OPERATION, and it doesn't seem good to separate the
SECURITY_RESTRICTED_OPERATION from setting search_path.

This ExecCreateTableAs() check doesn't seem terribly important, so I
don't think it's necessary to improve it as a part of this patch (it
won't be perfect anyway: functions can behave inconsistently for all
kinds of reasons). But I'm open to suggestions if someone knows a good
way to do it.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From 5c6d707a88c887641d551ed9a6983c74d6a82a7a Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 18 Apr 2023 10:45:51 -0700
Subject: [PATCH] Fix search_path to a safe value during maintenance
 operations.

While executing maintenance operations (ANALYZE, CLUSTER, REFRESH
MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to
'pg_catalog, pg_temp' to prevent inconsistent behavior.

Functions that are used for functional indexes, in index expressions,
or in materialized views and depend on a different search path must be
declared with CREATE FUNCTION ... SET search_path='...'.

This change addresses a small security risk introduced in commit
60684dd834, where a role with MAINTAIN privileges on a table may be
able to escalate privileges to the table owner. That commit is not yet
part of any release, so no need to backpatch.
---
 contrib/amcheck/verify_nbtree.c  |  2 ++
 src/backend/access/brin/brin.c   |  2 ++
 src/backend/catalog/index.c  |  8 
 src/backend/commands/analyze.c   |  2 ++
 src/backend/commands/cluster.c   |  2 ++
 src/backend/commands/indexcmds.c |  6 ++
 src/backend/commands/matview.c   |  2 ++
 src/backend/commands/vacuum.c|  2 ++
 src/bin/scripts/t/100_vacuumdb.pl|  4 
 src/include/utils/guc.h  |  6 ++
 .../test_oat_hooks/expected/test_oat_hooks.out   |  4 
 src/test/regress/expected/privileges.out | 12 ++--
 src/test/regress/expected/vacuum.out |  2 +-
 src/test/regress/sql/privileges.sql  |  8 
 src/test/regress/sql/vacuum.sql  |  2 +-
 15 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 6979aff727..2b5c8a300b 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -282,6 +282,8 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
 		SetUserIdAndSecContext(heaprel->rd_rel->relowner,
 			   save_sec_context | SECURITY_RESTRICTED_OPERATION);
 		save_nestlevel = NewGUCNestLevel();
+		SetConfigOption("search_path", GUC_SAFE_SEARCH_PATH, PGC_USERSET,
+		PGC_S_SESSION);
 	}
 	else
 	{
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3c6a956eaa..11e78cb62c 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1066,6 +1066,8 @@ brin_summarize_range(PG_FUNCTION_ARGS)
 		SetUserIdAndSecContext(heapRel->rd_rel->relowner,
 			   save_sec_context | SECURITY_RESTRICTED_OPERATION);
 		save_nestlevel = NewGUCNestLevel();
+		

Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Michael Paquier
On Fri, May 26, 2023 at 10:56:53AM +0200, Alvaro Herrera wrote:
> I suppose you're not thinking of applying this to current master, but
> instead just leave it for when pg17 opens, right?  I mean, clearly it
> seems far too invasive to put it in after beta1.  On the other hand,
> it's painful to know that we're going to have code that exists only on
> 16 and not any other release, in an area that's likely to have bugs here
> and there, so we're going to need to heavily adjust backpatches for 16
> especially.
> 
> I can't make up my mind about this.  What do others think?

When I looked at the patch yesterday, my impression was that this
would be material for v17 as it is refactoring work, not v16.
--
Michael




Re: Docs: Encourage strong server verification with SCRAM

2023-05-26 Thread Jacob Champion
On Thu, May 25, 2023 at 6:10 PM Jonathan S. Katz  wrote:
> I read through the proposal and like this much better.

Great!

> I rewrote this to just focus on server spoofing that can occur with
> SCRAM authentication and did some wordsmithing. I was torn on keeping in
> the part of offline analysis of an intercepted hash, given one can do
> this with md5 as well, but perhaps it helps elaborate on the consequences.

This part:

> +   To prevent server spoofing from occurring when using
> +   scram-sha-256 password authentication
> +   over a network, you should ensure you are connecting using SSL.

seems to backtrack on the recommendation -- you have to use
sslmode=verify-full, not just SSL, to avoid handing a weak(er) hash to
an untrusted party.

--Jacob




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 2:49 PM Andres Freund  wrote:
> What do we gain by not passing down the heap relation to those places?

Just clearer code, free from noisy changes. Easier when backpatching, too.

> If you're concerned about the efficiency of passing down the parameters,
> I doubt it will make a meaningful difference, because the parameter can
> just stay in the register to be passed down further.

I'm not concerned about the efficiency of passing down heapRel in so
many places.

As things stand, there is no suggestion that any _bt_getbuf() call is
exempt from the requirement to pass down a heaprel -- every caller
(internal and external) goes to the trouble of making sure that they
comply with the apparent requirement to supply a heapRel. In some
cases callers do so just to be able to read the metapage. Even code as
far removed from nbtree as heapam_relation_copy_for_cluster() will now
go to the trouble of passing its own heap rel, just to perform a
CLUSTER-based tuplesort. The relevant tuplesort call site even has
comments that try to justify this approach, with a reference to
_bt_log_reuse_page(). So heapam_handler.c now references a static
helper function private to nbtpage.c -- an obvious modularity
violation.

It's not even the modularity violation itself that bothers me. It's
just 100% unnecessary for heapam_relation_copy_for_cluster() to do any
of this, because there simply isn't going to be a call to
_bt_log_reuse_page() during its cluster tuplesort, no matter what.
This has nothing to do with any underlying implementation detail from
nbtree, or from any other index AM.

-- 
Peter Geoghegan




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Andres Freund
Hi,

On 2023-05-25 18:50:31 -0700, Peter Geoghegan wrote:
> Commit 61b313e4, which prepared index access method code for the
> logical decoding on standbys work, made quite a few mechanical
> changes. Many routines were revised in order to make sure that heaprel
> was available in _bt_getbuf()'s P_NEW (new page allocation) path. But
> this went further than it really had to. Many of the changes to nbtree
> seem excessive.
> 
> We only need a heaprel in those code paths that might end up calling
> _bt_getbuf() with blkno = P_NEW. This excludes most callers that pass
> access = BT_WRITE, and all callers that pass access = BT_READ. This
> doesn't have to be haphazard -- there just aren't that many places
> that can allocate new nbtree pages.

What do we gain by not passing down the heap relation to those places?
If you're concerned about the efficiency of passing down the parameters,
I doubt it will make a meaningful difference, because the parameter can
just stay in the register to be passed down further.

Note that I do agree with some aspects of the change for other reasons,
see below...

> It's just page splits, and new
> root page allocations (which are actually a slightly different kind of
> page split). The rule doesn't need to be complicated (to be fair it
> looks more complicated than it really is).
> 
> Attached patch completely removes the changes to _bt_getbuf()'s
> signature from 61b313e4. This is possible without any loss of
> functionality. The patch splits _bt_getbuf() in two: the code that
> handles BT_READ/BT_WRITE stays in _bt_getbuf(), which is now far
> shorter. Handling of new page allocation is moved to a new routine
> I've called _bt_alloc(). This is clearer in general, and makes it
> clear what the rules really are. Any code that might need to call
> _bt_alloc() must be prepared for that, by having a heaprel to pass to
> it (the slightly complicated case is interrupted page splits).

I think it's a very good idea to split the "new page" case off
_bt_getbuf().  We probably should evolve the design in the area, and
that will be easier with such a change.


Greetings,

Andres Freund




Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 10:28 AM Peter Geoghegan  wrote:
> I've added several defensive assertions that make it hard to get the
> details wrong. These will catch the issue much earlier than the main
> "heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
> reasonably straightforward and enforceable.

Though it's not an issue new to 16, or a problem that anybody is
obligated to deal with on this thread, I wonder: Why is it okay that
we're using "rel" (the index rel) for our TestForOldSnapshot() calls
in nbtree, rather than using heapRel? Why wouldn't the rules be the
same as they are for the new code paths needed for logical decoding on
standbys?  (Namely, the "use heapRel, not rel" rule.)

More concretely, I'm pretty sure that RelationIsUsedAsCatalogTable()
(which TestForOldSnapshot() relies on) gives an answer that would
change if we decided to pass heapRel to the main TestForOldSnapshot()
call within _bt_moveright(), instead of doing what we actually do,
which is to just pass it the index rel. I suppose that that
interaction might have been overlooked when bugfix commit bf9a60ee33
first added RelationIsUsedAsCatalogTable() -- since that happened a
couple of months after the initial "snapshot too old" commit went in,
a fix that happened under time pressure.

More generally, the high level rules/invariants that govern when
TestForOldSnapshot() should be called (and with what rel/snapshot)
feel less than worked out. I find it suspicious that there isn't any
attempt to relate TestForOldSnapshot() behaviors to the conceptually
similar PredicateLockPage() behavior. We don't need predicate locks on
internal pages, but TestForOldSnapshot() *does* get called for
internal pages. Many PredicateLockPage() calls happen very close to
TestForOldSnapshot() calls, each of which use the same snapshot -- not
addressing that seems like a glaring omission to me.

Basically it seems like there should be one standard set of rules for
all this stuff. Though it's not the fault of Bertrand or Andres, all
that we have now is two poorly documented sets of rules that partially
overlap. This has long bothered me.

-- 
Peter Geoghegan




Re: Is NEW.ctid usable as table_tuple_satisfies_snapshot?

2023-05-26 Thread Kaiting Chen
On Fri, May 26, 2023 at 12:49 PM Tom Lane  wrote:

> > Just to clarify, there's no way for SELECT FROM foo WHERE ctid = NEW.ctid
> > to return a row that ordinarily wouldn't be visible, right? There's no magic
> > going on with the qual on ctid that skips a visibility check, right?
>
> No, a ctid test isn't magic in that way; nodeTidscan.c applies the
> same snapshot check as any other relation scan.
>

Okay thanks!


Re: pg_collation.collversion for C.UTF-8

2023-05-26 Thread Jeff Davis
On Thu, 2023-05-25 at 14:48 -0400, Tom Lane wrote:
> Jeff Davis  writes:
> > What should we do with locales like C.UTF-8 in both libc and ICU? 
> 
> I vote for passing those to the existing C-specific code paths,

Great, this would be a big step toward solving the ICU usability issues
in this thread:

https://postgr.es/m/000b01d97465%24c34bbd60%2449e33820%24%40pcorp.us

> Probably "C", or "C.anything", or "POSIX", or "POSIX.anything".
> Case-independent might be good, but we haven't accepted such in
> the past, so I don't feel strongly about it.  (Arguably, passing
> lower case "c" to the provider would provide an "out" to anybody
> who dislikes our choices here.)

Patch attached with your suggestions. It's based on the first patch in
the series I posted here:

https://postgr.es/m/a4388fa3acabf7794ac39fdb471ad97eebdfbe11.ca...@j-davis.com
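
As a sketch of the intended behavior under the patch (such locale
strings are no longer passed to the provider at all):

  -- Both collations take the built-in C code path: memcmp() for
  -- collation, pg_ascii_*() for ctype.
  CREATE COLLATION c_libc (provider = libc, locale = 'C.UTF-8');
  CREATE COLLATION c_icu  (provider = icu,  locale = 'C.UTF-8');

  SELECT 'B' < 'a' COLLATE c_libc;  -- true: byte order, where a
                                    -- linguistic locale would say false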

We still need to consider backwards compatibility. If someone has a
collation with locale name C.UTF-8 in an earlier version, any change to
the interpretation of that locale name after an upgrade carries a
corruption risk. The risks are different in ICU vs libc:

  For ICU: iculocale=C in an earlier version was a mistake that must
have been explicitly requested by the user. However, if such a mistake
was made, the indexes would have been created using the ICU root
locale, which is very different from the C locale. So reinterpreting
iculocale=C as memcmp() would be likely to result in index corruption.
Patch 0002 (also based on a patch from the series linked above) solves
this with a pg_upgrade check for iculocale=C in versions 15 and
earlier. The upgrade check is not likely to affect many users, and
those it does affect have a mis-defined collation and would benefit
from the check.

  For libc: this change may affect any user who happened to have
LANG=C.UTF-8 in their environment at initdb time, which is probably a
lot of users, and some buildfarm members. However, the average risk
seems to be much lower, because we've gone a long time with the
assumption that C.UTF-8 has the same behavior as C, and this only
recently came up. Also, I'm not sure how obscure the cases are even if
there is a difference; perhaps they don't often occur in practice? It's
not clear to me how we mitigate this risk further, though.

Regards,
Jeff Davis

From c6721272200c14931ad757185a3aaeb615c432ed Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 24 Apr 2023 15:46:17 -0700
Subject: [PATCH 1/2] Interpret C locales consistently between ICU and libc.

Treat a locale named C, C.anything, POSIX, or POSIX.anything as
equivalent to the C locale; implemented with built-in semantics
(memcmp() for collation and pg_ascii_*() for ctype).

Such locales are not passed to the provider at all, so have identical
behavior regardless of whether it's declared with provider ICU or
libc.

Previously, only C and POSIX locales had this behavior (not
e.g. "C.UTF-8"), and only if the provider was declared as libc. That
caused problems on libc for locales like C.UTF-8, which may have
subtly different behavior in some versions of libc; and it caused
problems on ICU because newer versions don't recognize C locales.

Discussion: https://postgr.es/m/1559006.1685040...@sss.pgh.pa.us
Discussion: https://postgr.es/m/c840107b-4cb9-c8e9-abb7-1d8c5e0d51df%40enterprisedb.com
Discussion: https://postgr.es/m/87v8hoexdv@news-spur.riddles.org.uk
---
 doc/src/sgml/charset.sgml |   3 +-
 src/backend/commands/collationcmds.c  |  42 +++---
 src/backend/commands/dbcommands.c |  41 +++---
 src/backend/utils/adt/pg_locale.c | 126 +-
 src/backend/utils/init/postinit.c |   4 +-
 src/backend/utils/mb/mbutils.c|   3 +-
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out |   6 +
 src/test/regress/expected/collate.out |   5 +
 src/test/regress/sql/collate.icu.utf8.sql |   4 +
 src/test/regress/sql/collate.sql  |   5 +
 11 files changed, 167 insertions(+), 73 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index ed84465996..8ba3117557 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -136,7 +136,8 @@ initdb --locale=sv_SE

 If you want the system to behave as if it had no locale support,
 use the special locale name C, or equivalently
-POSIX.
+POSIX. An encoding may also be appended, for
+example C.UTF-8.

 

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 2969a2bb21..a451ae8843 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -264,26 +264,38 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 		 errmsg("parameter \"locale\" must be specified")));
 
-			/*
-			 * During binary upgrade, preserve the 

Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 1:56 AM Alvaro Herrera  wrote:
> I suppose you're not thinking of applying this to current master, but
> instead just leave it for when pg17 opens, right?  I mean, clearly it
> seems far too invasive to put it in after beta1.

I was planning on targeting 16 with this. Although I only posted a
patch recently, I complained about the problems in this area shortly
after the code first went in. It's fairly obvious to me that the
changes made to nbtree went quite a bit further than they needed to.
Admittedly that's partly because I'm an expert on this particular
code.

> On the other hand,
> it's painful to know that we're going to have code that exists only on
> 16 and not any other release, in an area that's likely to have bugs here
> and there, so we're going to need to heavily adjust backpatches for 16
> especially.

Right -- it's important to keep things reasonably consistent to ease
backpatching. Though I doubt that changes to nbtree itself will turn
out to be buggy -- with or without my patch. The changes to nbtree
were all pretty mechanical. A little too mechanical, in fact.

As I said already, there just aren't that many ways that new nbtree
pages can come into existence -- it's naturally limited to page splits
(including root page splits), and the case where we need to add a new
root page that's also a leaf page at the point that the first ever
tuple is inserted into the index (before that we just have a metapage)
-- so I only have three _bt_allocbuf() callers to worry about. It's
completely self-evident (even to people who know little about nbtree)
that the only type of page access that could possibly need a heapRel
in the first place is P_NEW (i.e., a new page allocation). Once you
know all that, this situation begins to look much more
straightforward.

Now, to be fair to Bertrand, it *looks* more complicated than it
really is, in large part due to the obscure case where VACUUM finishes
an interrupted page split (during page deletion), which itself goes on
to cause a page split one level up. So it's possible (barely) that
VACUUM will enlarge an index by one page, which requires a heapRel,
just like any other place where an index is enlarged by a page split
(I documented all this in commit 35bc0ec7).

I've added several defensive assertions that make it hard to get the
details wrong. These will catch the issue much earlier than the main
"heapRel != NULL" assertion in _bt_allocbuf(). So, the rules are
reasonably straightforward and enforceable.

-- 
Peter Geoghegan




Re: Implement generalized sub routine find_in_log for tap test

2023-05-26 Thread vignesh C
On Thu, 25 May 2023 at 23:04, Dagfinn Ilmari Mannsåker
 wrote:
>
> vignesh C  writes:
>
> > Hi,
> >
> > The recovery tap test has 4 implementations of find_in_log sub routine
> > for various uses, I felt we can generalize these and have a single
> > function for the same. The attached patch is an attempt to have a
> > generalized sub routine find_in_log which can be used by all of them.
> > Thoughts?
>
> +1 on factoring out this common code. Just a few comments on the 
> implementation.
>
>
> > diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm 
> > b/src/test/perl/PostgreSQL/Test/Utils.pm
> > index a27fac83d2..5c9b2f6c03 100644
> > --- a/src/test/perl/PostgreSQL/Test/Utils.pm
> > +++ b/src/test/perl/PostgreSQL/Test/Utils.pm
> > @@ -67,6 +67,7 @@ our @EXPORT = qw(
> >slurp_file
> >append_to_file
> >string_replace_file
> > +  find_in_log
> >check_mode_recursive
> >chmod_recursive
> >check_pg_config
> > @@ -579,6 +580,28 @@ sub string_replace_file
> >
> >  =pod
> >
> > +
> > +=item find_in_log(node, pattern, offset)
> > +
> > +Find pattern in logfile of node after offset byte.
> > +
> > +=cut
> > +
> > +sub find_in_log
> > +{
> > + my ($node, $pattern, $offset) = @_;
> > +
> > + $offset = 0 unless defined $offset;
> > + my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
>
> Since this function is in the same package, there's no need to qualify
> it with the full name.  I know the callers you copied it from did, but
> they wouldn't have had to either, since it's exported by default (in the
> @EXPORT array above), unless the use statement has an explicit argument
> list that excludes it.

I have moved this function to Cluster.pm now; since it is moved, I had
to qualify slurp_file() with its full name.

> > + return 0 if (length($log) <= 0 || length($log) <= $offset);
> > +
> > + $log = substr($log, $offset);
>
> Also, the existing callers don't seem to have got the memo that
> slurp_file() takes an optional offset parameter, which will cause it to
> seek to that position before slurping the file, which is more efficient
> than reading the whole file in and substr-ing it.  There's not much
> point in the length checks either, since regex-matching against an empty
> string is very cheap (and if the provided pattern can match the empty
> string the whole function call is rather pointless).
>
> > + return $log =~ m/$pattern/;
> > +}
>
> All in all, it could be simplified to:
>
> sub find_in_log {
> my ($node, $pattern, $offset) = @_;
>
> return slurp_file($node->logfile, $offset) =~ $pattern;
> }

Modified along similar lines.

> However, none of the other functions in ::Utils know anything about node
> objects, which makes me think it should be a method on the node itself
> (i.e. in PostgreSQL::Test::Cluster) instead.  Also, I think log_contains
> would be a better name, since it just returns a boolean.  The name
> find_in_log makes me think it would return the log lines matching the
> pattern, or the position of the match in the file.

Modified

> In that case, the slurp_file() call would have to be fully qualified,
> since ::Cluster uses an empty import list to avoid polluting the method
> namespace with imported functions.

Modified.

Thanks for the comments; the attached v2 patch includes these changes.

Regards,
Vignesh
From 8df06d233de4e5b5abe7a4dd4bc314c215b2dfc2 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 26 May 2023 20:07:30 +0530
Subject: [PATCH v2] Remove duplicate find_in_log sub routines from tap tests.

Remove duplicate find_in_log sub routines from tap tests.
---
 src/test/authentication/t/003_peer.pl | 17 ++
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 16 +
 src/test/recovery/t/019_replslot_limit.pl | 33 +--
 src/test/recovery/t/033_replay_tsp_drops.pl   | 15 ++---
 .../t/035_standby_logical_decoding.pl | 26 +++
 5 files changed, 33 insertions(+), 74 deletions(-)

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index 3272e52cae..2a035c2d0d 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -69,17 +69,6 @@ sub test_role
 	}
 }
 
-# Find $pattern in log file of $node.
-sub find_in_log
-{
-	my ($node, $offset, $pattern) = @_;
-
-	my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile, $offset);
-	return 0 if (length($log) <= 0);
-
-	return $log =~ m/$pattern/;
-}
-
 my $node = PostgreSQL::Test::Cluster->new('node');
 $node->init;
 $node->append_conf('postgresql.conf', "log_connections = on\n");
@@ -91,9 +80,9 @@ reset_pg_hba($node, 'peer');
 # Check if peer authentication is supported on this platform.
 my $log_offset = -s $node->logfile;
 $node->psql('postgres');
-if (find_in_log(
-		$node, $log_offset,
-		qr/peer authentication is not supported on this platform/))
+if ($node->log_contains(
+		qr/peer authentication is not supported 

Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Peter Geoghegan
On Fri, May 26, 2023 at 12:46 AM Michael Paquier  wrote:
> Nice cleanup overall.

Thanks.

To be clear, when I said "it would be nice to get rid of P_NEW", what
I meant was that it would be nice to go much further than what I've
done in the patch by getting rid of the general idea of P_NEW. So the
handling of P_NEW at the top of ReadBuffer_common() would be removed,
for example. (Note that nbtree doesn't actually rely on that code,
even now; while its use of the P_NEW constant creates the impression
that it needs that bufmgr.c code, it actually doesn't, even now.)

> +if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
>  {
> -/* Okay to use page.  Initialize and return it. */
> -_bt_pageinit(page, BufferGetPageSize(buf));
> -return buf;
> +safexid = BTPageGetDeleteXid(page);
> +isCatalogRel = RelationIsAccessibleInLogicalDecoding(heaprel);
> +_bt_log_reuse_page(rel, blkno, safexid, isCatalogRel);
>
> There is only one caller of _bt_log_reuse_page(), so assigning a
> boolean rather than the heap relation is a bit strange to me.  I think
> that calling RelationIsAccessibleInLogicalDecoding() within
> _bt_log_reuse_page() where xlrec_reuse is filled with its data is much
> more natural, like HEAD.

Attached is v2, which deals with this by moving the code from
_bt_log_reuse_page() into _bt_allocbuf() itself -- there is no need
for a separate logging function. This structure seems like a clear
improvement, since such logging is largely the point of having a
separate _bt_allocbuf() function that deals with new page allocations
and requires a valid heapRel in all cases.

v2 also renames "heaprel" to "heapRel" in function signatures, for
consistency with older code that always used that convention.

-- 
Peter Geoghegan


v2-0001-nbtree-Remove-use-of-P_NEW.patch
Description: Binary data


Re: Is NEW.ctid usable as table_tuple_satisfies_snapshot?

2023-05-26 Thread Tom Lane
Kaiting Chen  writes:
> On Fri, May 26, 2023 at 11:34 AM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>> On Fri, May 26, 2023 at 8:04 AM Kaiting Chen  wrote:
>>> 2. If I lookup the row by its ctid, will the visibility map be consulted.

>> No, but that doesn't seem to be material anyway.  Your user-space pl/pgsql
>> function shouldn't care about such a purely performance optimization.

It'd be a waste of cycles to consult the map in this usage, since the
tuple of interest is surely not all-visible and thus the page couldn't
be either.

> Just to clarify, there's no way for SELECT FROM foo WHERE ctid = NEW.ctid
> to return a row that ordinarily wouldn't be visible, right? There's no magic
> going on with the qual on ctid that skips a visibility check, right?

No, a ctid test isn't magic in that way; nodeTidscan.c applies the
same snapshot check as any other relation scan.
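
A quick sketch of that guarantee:

  CREATE TABLE foo (id int);
  INSERT INTO foo VALUES (1);
  SELECT ctid FROM foo;                     -- say it reports (0,1)
  DELETE FROM foo;
  SELECT * FROM foo WHERE ctid = '(0,1)';   -- zero rows: the TID scan
                                            -- applies the snapshot check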

regards, tom lane




Re: Order changes in PG16 since ICU introduction

2023-05-26 Thread Daniel Verite
Jeff Davis wrote:

> > #1
> > 
> > postgres=# create database test1 locale='fr_FR.UTF-8';
> > NOTICE:  using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
> > ERROR:  new ICU locale (fr-FR) is incompatible with the ICU locale of
> 
> I don't see a problem here. If you specify LOCALE to CREATE DATABASE,
> you should either be using "TEMPLATE template0", or you should be
> expecting an error if the LOCALE doesn't match exactly.
> 
> What would you like to see happen here?

What's odd is that initdb starting in an fr_FR.UTF-8 environment
found that "fr" was the default ICU locale to use, whereas
"create database" reports that "fr" and "fr_FR.UTF-8" refer to
incompatible locales.

To me initdb is wrong when coming up with the less precise "fr"
instead of "fr-FR".

I suggest the attached patch to call uloc_getDefault() instead of the
current code that somehow leaves out the country/region component.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 31156e863b..09a5c98cc0 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2354,42 +2354,13 @@ icu_validate_locale(const char *loc_str)
 }
 
 /*
- * Determine default ICU locale by opening the default collator and reading
- * its locale.
- *
- * NB: The default collator (opened using NULL) is different from the collator
- * for the root locale (opened with "", "und", or "root"). The former depends
- * on the environment (useful at initdb time) and the latter does not.
+ * Determine the default ICU locale
  */
 static char *
 default_icu_locale(void)
 {
 #ifdef USE_ICU
-	UCollator  *collator;
-	UErrorCode	status;
-	const char *valid_locale;
-	char	   *default_locale;
-
-	status = U_ZERO_ERROR;
-	collator = ucol_open(NULL, &status);
-	if (U_FAILURE(status))
-		pg_fatal("could not open collator for default locale: %s",
- u_errorName(status));
-
-	status = U_ZERO_ERROR;
-	valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE,
-										&status);
-	if (U_FAILURE(status))
-	{
-		ucol_close(collator);
-		pg_fatal("could not determine default ICU locale");
-	}
-
-	default_locale = pg_strdup(valid_locale);
-
-	ucol_close(collator);
-
-	return default_locale;
+	return pg_strdup(uloc_getDefault());
 #else
 	pg_fatal("ICU is not supported in this build");
 #endif


Re: Is NEW.ctid usable as table_tuple_satisfies_snapshot?

2023-05-26 Thread Kaiting Chen
On Fri, May 26, 2023 at 11:34 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Fri, May 26, 2023 at 8:04 AM Kaiting Chen  wrote:
>
>> I need to implement a trigger that will behave similarly to a foreign key
>> constraint. The trigger itself will be created with:
>>
>>   CREATE CONSTRAINT TRIGGER ... AFTER INSERT OR UPDATE OF ... ON foo
>>
>> I'd like to skip execution of the trigger logic if, by the time that the
>> trigger
>> is executed, the NEW row is no longer valid.
>>
>
> To be clear, when using deferrable constraints, and you insert then
> subsequently delete a row, you wish to return early in the insert trigger
> body as if the row had never been inserted in the first place?
>

Yes this is exactly the behavior I'm looking for. Furthermore, if the row
is updated more than once in the same transaction and the constraint has
been deferred, or even if the constraint hasn't been deferred but the row
has been updated since the trigger was queued (for example, when there are
multiple writeable CTEs), then I'd like to skip the trigger body as if that
update didn't occur. Essentially I'm looking for the same behavior as the
builtin triggers that enforce referential integrity.

Specifically:
>
> 1. Is there any possibility that, by the time the trigger function is
>    called, the NEW row's ctid no longer refers to the row version in NEW,
>    but to an entirely different row? For example, is it possible for
>    VACUUM to reclaim the space at that page number and offset in between
>    the INSERT/UPDATE and when the trigger function is called?
>
> No.  Transaction and MVCC semantics prevent that from happening.
>

Okay I think this is exactly what I'm looking for.


>> 2. If I look up the row by its ctid, will the visibility map be consulted?
>>
>
> No, but that doesn't seem to be material anyway.  Your user-space pl/pgsql
> function shouldn't care about such a purely performance optimization.
>

Just to clarify, there's no way for SELECT FROM foo WHERE ctid = NEW.ctid
to return a row that ordinarily wouldn't be visible, right? There's no magic
going on with the qual on ctid that skips a visibility check, right?


Re: ERROR: no relation entry for relid 6

2023-05-26 Thread Tom Lane
Richard Guo  writes:
> On Fri, May 26, 2023 at 6:06 AM Tom Lane  wrote:
>> Based on what deconstruct_distribute_oj_quals is doing, it seems
>> likely to me that there are cases that require ignoring
>> commute_above_r, but I've not tried to devise one.  It'd be good to
>> have one to include in the commit, if we can find one.

> It seems that queries of the second form of identity 3 require ignoring
> commute_above_r.
> select 1 from t t1 left join (t t2 left join t t3 on t2.a = t3.a) on
> t1.a = t2.a;
> When removing t2/t3 join, the clone of 't2.a = t3.a' with t1 join in the
> nulling bitmap would be put back if we do not ignore commute_above_r.
> There is no observable problem though because it would be rejected later
> in subbuild_joinrel_restrictlist, but still I think it should not be put
> back in the first place.

Ah.  I realized that we could make the problem testable by adding
assertions that a joinclause we're not removing doesn't contain
any surviving references to the target rel or join.  That turns
out to fire (without the bug fix) in half a dozen existing test
cases, so I felt that we didn't need to add another one.

I did the other refactoring we discussed and pushed it.
Thanks for the report and review!

regards, tom lane




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-26 Thread Jacob Champion
On Thu, May 25, 2023 at 7:09 PM Michael Paquier  wrote:
> Please find attached an updated patch only for the removal of 1.0.1.
> Thanks for the review.

Nice! Sorry about the new complications with LibreSSL. :(

> -  # Functions introduced in OpenSSL 1.0.2. LibreSSL does not have
> +  # Function introduced in OpenSSL 1.0.2. LibreSSL does not have
># SSL_CTX_set_cert_cb().
> -  AC_CHECK_FUNCS([X509_get_signature_nid SSL_CTX_set_cert_cb])
> +  AC_CHECK_FUNCS([SSL_CTX_set_cert_cb])

Can X509_get_signature_nid be moved to the required section up above?
As it is now, anyone configuring with -Dssl=auto can still pick up a
1.0.1 build, which Meson accepts, and then the build fails downstream.
If we require the function instead, Meson will ignore 1.0.1 (or, for
-Dssl=openssl, complain before we compile).

t/001_ssltests.pl has a reference to 1.0.1 that can probably be
entirely deleted:

# ... (Nor for OpenSSL
# 1.0.1, but that's old enough that accommodating it isn't worth the cost.)

Thanks,
--Jacob




Re: Is NEW.ctid usable as table_tuple_satisfies_snapshot?

2023-05-26 Thread David G. Johnston
On Fri, May 26, 2023 at 8:04 AM Kaiting Chen  wrote:

> I need to implement a trigger that will behave similarly to a foreign key
> constraint. The trigger itself will be created with:
>
>   CREATE CONSTRAINT TRIGGER ... AFTER INSERT OR UPDATE OF ... ON foo
>
> I'd like to skip execution of the trigger logic if, by the time that the
> trigger
> is executed, the NEW row is no longer valid.
>

To be clear, when using deferrable constraints, and you insert then
subsequently delete a row, you wish to return early in the insert trigger
body as if the row had never been inserted in the first place?

The table_tuple_satisfies_snapshot() function is obviously unavailable from
> PL/pgSQL. Is this a reliable substitute?
>

If a row is not visible at the time the trigger fires, the SELECT will not
return it.  When the deferred trigger eventually fires, the inserted row
will no longer exist and the SELECT will not return it.

The above is not tested; assuming it does indeed behave that way I would
expect the behavior to be deterministic given that there are no concurrency
issues involved.
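
A self-contained sketch of that scenario (hypothetical names, and
untested, per the caveat above):

  CREATE TABLE t (id int);

  CREATE FUNCTION t_check() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
      -- A key lookup stands in for the thread's ctid test; it shows the
      -- timing, though unlike a ctid test it would not distinguish a row
      -- version replaced by UPDATE.
      IF NOT EXISTS (SELECT FROM t WHERE id = NEW.id) THEN
          RAISE NOTICE 'row % gone before the deferred check', NEW.id;
          RETURN NULL;
      END IF;
      RAISE NOTICE 'row % still visible', NEW.id;
      RETURN NULL;
  END;
  $$;

  CREATE CONSTRAINT TRIGGER t_ri AFTER INSERT ON t
      DEFERRABLE INITIALLY DEFERRED
      FOR EACH ROW EXECUTE FUNCTION t_check();

  BEGIN;
  INSERT INTO t VALUES (1);
  DELETE FROM t WHERE id = 1;
  COMMIT;   -- the queued trigger fires here; its SELECT finds nothing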


>
>   IF NOT EXISTS (SELECT FROM foo WHERE ctid = NEW.ctid) THEN
> RETURN NULL;
>   END IF;
>
> Specifically:
>
> 1. Is there any possibility that, by the time the trigger function is
>    called, the NEW row's ctid no longer refers to the row version in NEW,
>    but to an entirely different row? For example, is it possible for
>    VACUUM to reclaim the space at that page number and offset in between
>    the INSERT/UPDATE and when the trigger function is called?
>

No.  Transaction and MVCC semantics prevent that from happening.


> 2. If I look up the row by its ctid, will the visibility map be consulted?
>

No, but that doesn't seem to be material anyway.  Your user-space pl/pgsql
function shouldn't care about such a purely performance optimization.

David J.


Is NEW.ctid usable as table_tuple_satisfies_snapshot?

2023-05-26 Thread Kaiting Chen
I need to implement a trigger that will behave similarly to a foreign key
constraint. The trigger itself will be created with:

  CREATE CONSTRAINT TRIGGER ... AFTER INSERT OR UPDATE OF ... ON foo

I'd like to skip execution of the trigger logic if, by the time that the
trigger is executed, the NEW row is no longer valid. For a normal
FOREIGN KEY trigger, this is handled in ri_triggers.c by:

  /*
   * We should not even consider checking the row if it is no longer valid,
   * since it was either deleted (so the deferred check should be skipped)
   * or updated (in which case only the latest version of the row should be
   * checked).  Test its liveness according to SnapshotSelf.  We need pin
   * and lock on the buffer to call HeapTupleSatisfiesVisibility.  Caller
   * should be holding pin, but not lock.
   */
  if (!table_tuple_satisfies_snapshot(trigdata->tg_relation, newslot,
                                      SnapshotSelf))
      return PointerGetDatum(NULL);

The table_tuple_satisfies_snapshot() function is obviously unavailable from
PL/pgSQL. Is this a reliable substitute?

  IF NOT EXISTS (SELECT FROM foo WHERE ctid = NEW.ctid) THEN
RETURN NULL;
  END IF;

Specifically:

1. Is there any possibility that, by the time the trigger function is
   called, the NEW row's ctid no longer refers to the row version in NEW,
   but to an entirely different row? For example, is it possible for
   VACUUM to reclaim the space at that page number and offset in between
   the INSERT/UPDATE and when the trigger function is called?

2. If I look up the row by its ctid, will the visibility map be consulted?
   And if so, is there any material difference between what that would do
   vs what table_tuple_satisfies_snapshot() does?

Thanks!


Re: vector search support

2023-05-26 Thread Jonathan S. Katz

On 4/26/23 9:31 AM, Giuseppe Broccolo wrote:

Hi Nathan,

I find the patches really interesting. Personally, as a Data/MLOps
Engineer, I'm involved in a project where we use embedding techniques to
generate vectors from documents, and use clustering and kNN searches to
find similar documents based on the spatial neighbourhood of the
generated vectors.


Thanks! This seems to be a pretty common use-case these days.

We finally opted for ElasticSearch as our search engine, considering
that it provided what we needed:


* support for storing dense vectors
* support for kNN searches (the latest version of ElasticSearch allows this)


I do want to note that we can implement index methods with GiST that
perform K-NN searches via the "distance" support function[1], so adding
the fundamental distance functions for the known vector search
techniques would open the door to index support. We already have this
today with "cube", but as Nathan mentioned, it's limited to 100 dims.


An internal benchmark showed us that we were able to achieve the
expected performance, although we are still missing a few things:


* clustering of vectors (this has to be done outside the search engine,
using DBSCAN for our use case)


From your experience, have you found any particular clustering 
algorithms better at driving a good performance/recall tradeoff?


* concurrency in updating the ElasticSearch indexes storing the dense 
vectors


I do think concurrent updates of vector-based indexes are one area
PostgreSQL can ultimately be pretty good at, whether in core or in an
extension.


I found these patches really interesting, considering that they would
solve some of the open issues with storing dense vectors. Index support
would help a lot with searches, though.


Great -- thanks for the feedback,

Jonathan

[1] https://www.postgresql.org/docs/devel/gist-extensibility.html




Re: vector search support

2023-05-26 Thread Jonathan S. Katz

On 5/25/23 1:48 PM, Oliver Rice wrote:

A nice side effect of using float8[] to represent vectors is that it
allows for vectors of different sizes to coexist in the same column.


We most frequently see (pgvector) vector columns being used for storing 
ML embeddings. Given that different models produce embeddings with a 
different number of dimensions, the need to specify a vector’s size in 
DDL tightly couples the schema to a single model. Support for variable 
length vectors would be a great way to decouple those concepts. It would 
also be a differentiating feature from existing vector stores.


I hadn't thought of that, given most of what I've seen (or at least my
personal bias in designing systems) is that you keep vectors of a single
dimensionality in a column. But this sounds like a place where native
support in a variable-length array would help.


One drawback is that variable-length vectors complicate indexing for
similarity search, because similarity measures require vectors of
consistent length. Partial indexes are a possible solution to that
challenge.


Yeah, that presents a challenge. This may also be an argument for a 
vector data type, since that would eliminate the need to check for 
consistent dimensionality when indexing.


Jonathan




Re: vector search support

2023-05-26 Thread Jonathan S. Katz

Hi,

On 4/21/23 8:07 PM, Nathan Bossart wrote:

Attached is a proof-of-concept/work-in-progress patch set that adds
functions for "vectors" repreѕented with one-dimensional float8 arrays.
These functions may be used in a variety of applications, but I am
proposing them with the AI/ML use-cases in mind.  I am posting this early
in the v17 cycle in hopes of gathering feedback prior to PGCon.


Thanks for proposing this. Looking forward to discussing more in person. 
There's definitely demand to use PostgreSQL to store / search over 
vector data, and I do think we need to improve upon this in core.



With the accessibility of AI/ML tools such as large language models (LLMs),
there has been a demand for storing and manipulating high-dimensional
vectors in PostgreSQL, particularly around nearest-neighbor queries.  Many
of these vectors have more than 1500 dimensions. 


1536 seems to be a popular one from LLMs, but I've been seeing much 
higher dimensionality (8K, 16K etc). My hunch is that at a practical 
level, apps are going to favor data sets / sources that use a reduced 
dimensionality, but I wouldn't be shocked if we see vectors of all sizes.



The cube extension [0]
provides some of the distance functionality (e.g., taxicab, Euclidean, and
Chebyshev), but it is missing some popular functions (e.g., cosine
similarity, dot product), and it is limited to 100 dimensions.  We could
extend cube to support more dimensions, but this would require reworking
its indexing code and filling in gaps between the cube data type and the
array types.  For some previous discussion about using the cube extension
for this kind of data, see [1].


I've stared at the cube code quite a bit over the past few months. There 
are definitely some clever methods in it for handling searches over 
(now) lower dimensionality data, but I generally agree we should add 
functionality that's specific to ARRAY types.
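
As a point of reference, a measure like cosine similarity is already
expressible over float8[] in plain SQL, correct but slow; the proposed
C functions (and eventual index support) are about doing it efficiently:

  -- cosine similarity of two float8[] vectors via multi-argument unnest
  SELECT sum(a * b) / (sqrt(sum(a * a)) * sqrt(sum(b * b))) AS cosine
  FROM unnest(ARRAY[1, 2, 3]::float8[],
              ARRAY[4, 5, 6]::float8[]) AS t(a, b);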


I'll start making specific comments on the patches below.



float8[] is well-supported and allows for effectively unlimited dimensions
of data.  float8 matches the common output format of many AI embeddings,
and it allows us or extensions to implement indexing methods around these
functions.  This patch set does not yet contain indexing support, but we
are exploring using GiST or GIN for the use-cases in question.  It might
also be desirable to add support for other linear algebra operations (e.g.,
operations on matrices).  The attached patches likely only scratch the
surface of the "vector search" use-case.

The patch set is broken up as follows:

  * 0001 does some minor refactoring of dsqrt() in preparation for 0002.


This seems pretty benign, and we may as well do it anyway, though we may
need to expand on it based on comments on the next patch. Question on:


+static inline float8
+float8_sqrt(const float8 val)
+{
+   float8  result;
+
+   if (unlikely(val < 0))

Should this be:

  if (unlikely(float8_lt(val, 0)))

Similarly:

+   if (unlikely(result == 0.0) && val != 0.0)

  if (unlikely(float8_eq(result, 0.0)) && float8_ne(val, 0.0))



  * 0002 adds several vector-related functions, including distance functions
and a kmeans++ implementation.


Nice. Generally I like this patch. The functions seem to match the most
commonly used vector distance functions I'm seeing, and it includes a 
function that can let a user specify a constraint on an ARRAY column so 
they can ensure it contains valid vectors.


While I think supporting float8 is useful, I've been seeing a mix of 
data types in the different AI/ML vector embeddings, i.e. float4 / 
float8. Additionally, it could be helpful to support integers as well, 
particularly based on some of the dimensionality reduction techniques 
I've seen. I think this holds doubly true for kmeans, which is often
used in those calculations.


I'd suggest ensure these functions support:

* float4, float8
* int2, int4, int8

There's probably some nuance in how we document this too, given our
docs[1] specify real / double precision, and smallint, int, bigint.


(Separately, we mention the int2/int4/int8 aliases in [1], but not 
float4/float8, which seems like a small addition we should make).


If you agree, I'd be happy to review more closely once that's implemented.

Other things:

* kmeans -- we're using kmeans++, should the function name reflect that? 
Do you think we could end up with a different kmeans algo in the future? 
Maybe we allow the user to specify the kmeans algo from the function 
name (with the default / only option today being kmeans++)?



  * 0003 adds support for optionally using the OpenBLAS library, which is an
implementation of the Basic Linear Algebra Subprograms [2]
specification.  Basic testing with this library showed a small
performance boost, although perhaps not enough to justify giving this
patch serious consideration.


It'd be good to see what else we could use OpenBLAS with. Maybe that's a 
discussion for PGCon.



Of 

Re: running logical replication as the subscription owner

2023-05-26 Thread Masahiko Sawada
On Thu, May 25, 2023 at 5:41 PM Amit Kapila  wrote:
>
> On Thu, May 25, 2023 at 12:33 PM Masahiko Sawada  
> wrote:
> >
> > On Tue, May 23, 2023 at 8:21 PM Amit Kapila  wrote:
> > >
> > > On Mon, May 22, 2023 at 6:06 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > Thank you for updating the patch! Here are review comments:
> > > >
> > > > +   /*
> > > > +* Make sure that the copy command runs as the table owner, 
> > > > unless
> > > > +* the user has opted out of that behaviour.
> > > > +*/
> > > > +   run_as_owner = MySubscription->runasowner;
> > > > +   if (!run_as_owner)
> > > > +   SwitchToUntrustedUser(rel->rd_rel->relowner, &ucxt);
> > > > +
> > > > /* Now do the initial data copy */
> > > > PushActiveSnapshot(GetTransactionSnapshot());
> > > >
> > > > I think we should switch users before the acl check in
> > > > LogicalRepSyncTableStart().
> > > >
> > >
> > > Agreed, we should check acl with the user that is going to perform
> > > operations on the target table. BTW, is it okay to perform an
> > > operation on the system table with the changed user as that would be
> > > possible with your suggestion (see replorigin_create())?
> >
> > Do you see any problem in particular?
> >
> > As per the documentation, pg_replication_origin_create() is only
> > allowed to the superuser by default, but in CreateSubscription() a
> > non-superuser (who has pg_create_subscription privilege) can call
> > replorigin_create().
>
> Nothing in particular, but it seems a bit odd to perform operations on
> catalog tables with some other user (the table owner) when that was not
> the actual intent of this option.
>
> > OTOH, we don't necessarily need to switch to the
> > table owner user for checking ACL and RLS. We can just pass either
> > table owner OID or subscription owner OID to pg_class_aclcheck() and
> > check_enable_rls() without actually switching the user.
> >
>
> I think that would be better.

Agreed.
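
To be concrete, a minimal sketch of that shape could look like the
following; the variable placement, and the use of ACL_INSERT for the
initial copy into the target table, are assumptions for illustration,
not the exact patch hunk:

```
/*
 * Sketch: check ACL and RLS as the role that will perform the copy,
 * without actually switching users.
 */
Oid         userid;
AclResult   aclresult;

userid = MySubscription->runasowner ?
    MySubscription->owner : rel->rd_rel->relowner;

aclresult = pg_class_aclcheck(RelationGetRelid(rel), userid, ACL_INSERT);
if (aclresult != ACLCHECK_OK)
    aclcheck_error(aclresult, OBJECT_TABLE,
                   RelationGetRelationName(rel));

if (check_enable_rls(RelationGetRelid(rel), userid, false) == RLS_ENABLED)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot replicate into relation \"%s\" with row-level security",
                    RelationGetRelationName(rel))));
```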

I've attached the updated patch. Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v3-0001-Fix-tablesync-worker-missed-using-run_as_owner-op.patch
Description: Binary data


Re: Adding SHOW CREATE TABLE

2023-05-26 Thread Andrew Dunstan


On 2023-05-25 Th 09:23, Jelte Fennema wrote:

On Mon, 22 May 2023 at 13:52, Andrew Dunstan  wrote:

A performant server side set of functions would be written in C and follow the 
patterns in ruleutils.c.

We have lots of DDL ruleutils in our Citus codebase:
https://github.com/citusdata/citus/blob/main/src/backend/distributed/deparser/citus_ruleutils.c

I'm pretty sure we'd be happy to upstream those if that meant we
wouldn't have to update them for every postgres release.

We also have the master_get_table_ddl_events UDF, which does what SHOW
CREATE TABLE would do.



Sounds promising. I'd love to see a patch.
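
(For anyone picking this up: the server-side entry point would
presumably be shaped like the sketch below. The function name is
hypothetical, and all of the actual deparsing work, which is the hard
part, is elided.)

```
/*
 * Hypothetical ruleutils-style entry point for SHOW CREATE TABLE.
 * Only the skeleton is shown; real code must deparse columns,
 * constraints, storage options, etc., as citus_ruleutils.c does.
 */
#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"

PG_FUNCTION_INFO_V1(pg_get_tabledef);

Datum
pg_get_tabledef(PG_FUNCTION_ARGS)
{
    Oid         relid = PG_GETARG_OID(0);
    char       *relname = get_rel_name(relid);
    char       *nspname;
    StringInfoData buf;

    if (relname == NULL)
        PG_RETURN_NULL();
    nspname = get_namespace_name(get_rel_namespace(relid));

    initStringInfo(&buf);
    appendStringInfo(&buf, "CREATE TABLE %s (",
                     quote_qualified_identifier(nspname, relname));
    /* ... deparse attributes, constraints, etc. here ... */
    appendStringInfoString(&buf, ")");

    PG_RETURN_TEXT_P(cstring_to_text(buf.data));
}
```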


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: ERROR: no relation entry for relid 6

2023-05-26 Thread Richard Guo
On Fri, May 26, 2023 at 6:06 AM Tom Lane  wrote:

> 1. The test case you give upthread only needs to ignore commute_below_l,
> that is it still passes without the lines
>
> +join_plus_commute = bms_add_members(join_plus_commute,
> +
> removable_sjinfo->commute_above_r);
>
> Based on what deconstruct_distribute_oj_quals is doing, it seems
> likely to me that there are cases that require ignoring
> commute_above_r, but I've not tried to devise one.  It'd be good to
> have one to include in the commit, if we can find one.


It seems that queries of the second form of identity 3 require ignoring
commute_above_r.

select 1 from t t1 left join (t t2 left join t t3 on t2.a = t3.a) on
t1.a = t2.a;

When removing the t2/t3 join, the clone of 't2.a = t3.a' that has the t1
join in its nulling bitmap would be put back if we do not ignore
commute_above_r.  There is no observable problem though, because it would
be rejected later in subbuild_joinrel_restrictlist, but I still think it
should not be put back in the first place.


> 2. We could go a little further in refactoring, specifically the
> computation of joinrelids could be moved into remove_rel_from_query,
> since remove_useless_joins itself doesn't need it.  Not sure if
> this'd be an improvement or not.  Certainly it'd be a loser if
> remove_useless_joins grew a reason to need the value; but I can't
> foresee a reason for that to happen --- remove_rel_from_query is
> where basically all the work is going on.


+1 to move the computation of joinrelids into remove_rel_from_query().
We also do that in join_is_removable().

BTW, I doubt that the check of 'sjinfo->ojrelid != 0' in
remove_useless_joins() is needed.  Since we've known that the join is
removable, I think we can just Assert that sjinfo->ojrelid cannot be 0.

--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -91,8 +91,8 @@ restart:

/* Compute the relid set for the join we are considering */
joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
-   if (sjinfo->ojrelid != 0)
-   joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);
+   Assert(sjinfo->ojrelid != 0);
+   joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);

remove_rel_from_query(root, innerrelid, sjinfo, joinrelids);


> 3. I called the parameter removable_sjinfo because sjinfo is already
> used within another loop, leading to a shadowed-parameter warning.
> In a green field we'd probably have called the parameter sjinfo
> and used another name for the loop's local variable, but that would
> make the patch bulkier without adding anything.  Haven't decided
> whether to rename before commit or leave it as-is.


Personally I prefer to rename before commit but I'm OK with both ways.

Thanks
Richard


Re: PG 16 draft release notes ready

2023-05-26 Thread Alvaro Herrera
On 2023-May-25, Laurenz Albe wrote:

> @@ -1335,7 +1335,7 @@ Author: Peter Eisentraut 
>  
>  
>  
> -Add Windows process the system collations (Jose Santamaria Flecha)
> +Add Windows to process the system collations (Jose Santamaria Flecha)
>  ADD THIS?
>  
>  

Hmm, not sure this describes the change properly.  Maybe something like
"On Windows, system locales are now imported automatically.  Previously,
only ICU locales were imported automatically on Windows."

Maybe the Windows improvements should be listed together in a separate
section.

Also, "Juan José Santamaría Flecha" is the spelling Juan José uses for
his name.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: pg_get_indexdef() modification to use TxnSnapshot

2023-05-26 Thread vignesh C
On Fri, 26 May 2023 at 15:25, shveta malik  wrote:
>
> I have attempted to convert pg_get_indexdef() to use
> systable_beginscan() based on transaction-snapshot rather than using
> SearchSysCache(). The latter does not have any old info and thus
> provides only the latest info as per the committed txns, which could
> result in errors in some scenarios. One such case is mentioned atop
> pg_dump.c. The patch is an attempt to fix the same pg_dump's issue.
> Any feedback is welcome.
>
> There is a long list of pg_get_* functions which use SearchSysCache()
> and thus may expose similar issues. I can give it a try to review the
> possibility of converting all of them. Thoughts?

I could reproduce this issue in HEAD with pg_dump dumping a database
having a table and an index like:
create table t1(c1 int);
create index idx1 on t1(c1);

Steps to reproduce:
a) Run "./pg_dump -d postgres -f dump.txt" under a debugger and hold a
breakpoint at the getTables function, just before it takes a lock on the
table t1.
b) When the breakpoint is hit, drop the index idx1.
c) Continue the pg_dump execution.
d) pg_dump calls pg_get_indexdef to get the index definition.
e) Since pg_get_indexdef uses SearchSysCache1, which sees the latest
catalog snapshot, it does not find the dropped index.
f) pg_dump gets an empty index statement in this case, like:
-
--
-- Name: idx; Type: INDEX; Schema: public; Owner: vignesh
--

;
-

This issue is solved by shveta's patch: it registers the transaction
snapshot and calls systable_beginscan, which does not see transactions
that started after pg_dump's transaction (so the DROP INDEX is not
seen), and gets the index definition as expected, like:
-
--
-- Name: idx; Type: INDEX; Schema: public; Owner: vignesh
--

CREATE INDEX idx ON public.t1 USING btree (c1);
-
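
For reference, the core of the approach is to replace the
SearchSysCache1() lookup with a catalog scan under the registered
transaction snapshot, roughly like this sketch (an illustration of the
idea, not the patch itself; error handling elided):

```
/*
 * Sketch: look up pg_index under the registered transaction snapshot
 * instead of via SearchSysCache1(), so an index dropped after the
 * snapshot was taken is still seen by pg_dump's repeatable-read
 * transaction.  "indexrelid" is the function's input OID.
 */
Snapshot    snapshot;
Relation    indrel;
SysScanDesc scan;
ScanKeyData skey;
HeapTuple   tup;

snapshot = RegisterSnapshot(GetTransactionSnapshot());
indrel = table_open(IndexRelationId, AccessShareLock);

ScanKeyInit(&skey,
            Anum_pg_index_indexrelid,
            BTEqualStrategyNumber, F_OIDEQ,
            ObjectIdGetDatum(indexrelid));

scan = systable_beginscan(indrel, IndexRelidIndexId, true,
                          snapshot, 1, &skey);
tup = systable_getnext(scan);

/*
 * ... deparse from tup as before; a missing tuple now means the index
 * never existed in this snapshot ...
 */

systable_endscan(scan);
table_close(indrel, AccessShareLock);
UnregisterSnapshot(snapshot);
```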

Regards,
Vignesh




pg_get_indexdef() modification to use TxnSnapshot

2023-05-26 Thread shveta malik
I have attempted to convert pg_get_indexdef() to use
systable_beginscan() based on transaction-snapshot rather than using
SearchSysCache(). The latter does not have any old info and thus
provides only the latest info as per the committed txns, which could
result in errors in some scenarios. One such case is mentioned atop
pg_dump.c. The patch is an attempt to fix the same pg_dump's issue.
Any feedback is welcome.

There is a long list of pg_get_* functions which use SearchSysCache()
and thus may expose similar issues. I can give it a try to review the
possibility of converting all of them. Thoughts?

thanks
Shveta


v1-0001-pg_get_indexdef-modification-to-use-TxnSnapshot.patch
Description: Binary data


Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Alvaro Herrera
On 2023-May-25, Peter Geoghegan wrote:

> Attached patch completely removes the changes to _bt_getbuf()'s
> signature from 61b313e4.

I suppose you're not thinking of applying this to current master, but
instead just leaving it for when pg17 opens, right?  I mean, clearly it
seems far too invasive to put it in after beta1.  On the other hand,
it's painful to know that we're going to have code that exists only on
16 and not any other release, in an area that's likely to have bugs here
and there, so we're going to need to heavily adjust backpatches for 16
especially.

I can't make up my mind about this.  What do others think?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-05-26 Thread Daniel Gustafsson
> On 26 May 2023, at 04:08, Michael Paquier  wrote:
> On Thu, May 25, 2023 at 10:16:27AM +0200, Daniel Gustafsson wrote:

>> The changes to the Windows buildsystem worry me a bit, but they don't move
>> the goalposts in either direction wrt LibreSSL on Windows, so for the
>> purpose of this patch we don't need to do more.  Longer term we should
>> either make LibreSSL work on Windows builds, or explicitly not support it
>> on Windows.
> 
> Yes, I was wondering what to do about that in the long term.  With the
> argument that the MSVC scripts may be removed in favor of meson, it is
> not really appealing to dig into that.

Yeah, let's revisit this during the v17 cycle when the future of these scripts
will become clearer.

> Something that was missing in 0001 is the bump of OPENSSL_API_COMPAT
> in meson.build.  This was in 0002.
> 
> Please find attached an updated patch only for the removal of 1.0.1.
> Thanks for the review. 

LGTM.

--
Daniel Gustafsson





Re: Cleaning up nbtree after logical decoding on standby work

2023-05-26 Thread Michael Paquier
On Thu, May 25, 2023 at 06:50:31PM -0700, Peter Geoghegan wrote:
> It's possible that Bertand would have done it this way to begin with
> were it not for the admittedly pretty bad nbtree convention around
> P_NEW. It would be nice to get rid of P_NEW in the near future, too --
> I gather that there was discussion of that in the context of recent
> work in this area.

Nice cleanup overall.

+if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
 {
-/* Okay to use page.  Initialize and return it. */
-_bt_pageinit(page, BufferGetPageSize(buf));
-return buf;
+safexid = BTPageGetDeleteXid(page);
+isCatalogRel = RelationIsAccessibleInLogicalDecoding(heaprel);
+_bt_log_reuse_page(rel, blkno, safexid, isCatalogRel);

There is only one caller of _bt_log_reuse_page(), so assigning a
boolean rather than the heap relation is a bit strange to me.  I think
that calling RelationIsAccessibleInLogicalDecoding() within
_bt_log_reuse_page() where xlrec_reuse is filled with its data is much
more natural, like HEAD.  One argument in favor of HEAD is that it is
not possible to pass down a wrong value for isCatalogRel, but your
patch would make that possible.
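
Roughly, the suggested shape would be the following sketch (illustrative
only; the fields follow the existing xl_btree_reuse_page record):

```
/*
 * Sketch: derive isCatalogRel where the WAL record is filled, so that
 * callers cannot pass down an inconsistent value.
 */
static void
_bt_log_reuse_page(Relation rel, Relation heaprel, BlockNumber blkno,
                   FullTransactionId safexid)
{
    xl_btree_reuse_page xlrec_reuse;

    Assert(XLogStandbyInfoActive() && RelationNeedsWAL(rel));

    xlrec_reuse.isCatalogRel =
        RelationIsAccessibleInLogicalDecoding(heaprel);
    xlrec_reuse.locator = rel->rd_locator;
    xlrec_reuse.block = blkno;
    xlrec_reuse.snapshotConflictHorizon = safexid;

    XLogBeginInsert();
    XLogRegisterData((char *) &xlrec_reuse, SizeOfBtreeReusePage);
    (void) XLogInsert(RM_BTREE_ID, XLOG_BTREE_REUSE_PAGE);
}
```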
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-05-26 Thread Peter Smith
On Thu, May 25, 2023 at 6:59 PM Melih Mutlu  wrote:
>
> Hi,
>
>
> On Wed, 24 May 2023 at 05:59, Peter Smith  wrote:
>>
>> Hi, and thanks for the patch! It is an interesting idea.
>>
>> I have not yet fully read this thread, so below are only my first
>> impressions after looking at patch 0001. Sorry if some of these were
>> already discussed earlier.
>>
>> TBH the patch "reuse-workers" logic seemed more complicated than I had
>> imagined it might be.
>
>
> If you mean patch 0001 by the patch "reuse-workers", most of the complexity 
> comes with some refactoring to split apply worker and tablesync worker paths. 
> [1]
> If you mean the whole patch set, then I believe it's because reusing 
> replication slots also requires having a proper snapshot each time the worker 
> moves to a new table. [2]
>

Yes, I was mostly referring to the same as point 1 below about patch
0001. I guess I just found the concept of mixing A) launching TSW (via
apply worker) with B) reassigning TSW to another relation (by the TSW
battling with its peers) to be a bit difficult to understand. I
thought most of the refactoring seemed to arise from choosing to do it
that way.

>>
>>
>> 1.
>> IIUC with patch 0001, each/every tablesync worker (a.k.a. TSW) when it
>> finishes dealing with one table then goes looking to find if there is
>> some relation that it can process next. So now every TSW has a loop
>> where it will fight with every other available TSW over who will get
>> to process the next relation.
>>
>> Somehow this seems all backwards to me. Isn't it strange for the TSW
>> to be the one deciding what relation it would deal with next?
>>
>> IMO it seems more natural to simply return the finished TSW to some
>> kind of "pool" of available workers and the main Apply process can
>> just grab a TSW from that pool instead of launching a brand new one in
>> the existing function process_syncing_tables_for_apply(). Or, maybe
>> those "available" workers can be returned to a pool maintained within
>> the launcher.c code, which logicalrep_worker_launch() can draw from
>> instead of launching a whole new process?
>>
>> (I need to read the earlier posts to see if these options were already
>> discussed and rejected)
>
>
> I think ([3]) relying on a single apply worker for the assignment of several 
> tablesync workers might bring some overhead, it's possible that some 
> tablesync workers wait in idle until the apply worker assigns them something. 
> OTOH yes, the current approach makes tablesync workers race for a new table 
> to sync.

Yes, it might be slower than the 'patched' code because "available"
workers might be momentarily idle while they wait to be re-assigned to
the next relation. We would need to try it to find out.

> TBF both ways might be worth discussing/investigating more, before deciding 
> which way to go.

+1. I think it would be nice to see a POC of both ways for benchmark
comparison, because IMO performance is not the only consideration --
unless there is an obvious winner, they also need to be judged by the
complexity of the logic, the amount of code that needs to be
refactored, etc.

>
>>
>> 2.
>> AFAIK the thing that identifies a tablesync worker is the fact that
>> only TSW will have a 'relid'.
>>
>> But it feels very awkward to me to have a TSW marked as "available"
>> and yet that LogicalRepWorker must still have some OLD relid field
>> value lurking (otherwise it will forget that it is a "tablesync"
>> worker!).
>>
>> IMO perhaps it is time now to introduce some enum 'type' to the
>> LogicalRepWorker. Then an "in_use" type=TSW would have a valid 'relid'
>> whereas an "available" type=TSW would have relid == InvalidOid.
>
>
> Hmm, relid will be immediately updated when the worker moves to a new table. 
> And the time between finishing sync of a table and finding a new table to 
> sync should be minimal. I'm not sure how having an old relid for such a small 
> amount of time can do any harm.

There is no "harm", but it just didn't feel right to make the
LogicalRepWorker to transition through some meaningless state
("available" for re-use but still assigned some relid), just because
it was easy to do it that way. I think it is more natural for the
'relid' to be valid only when it is valid for the worker and to be
InvalidOid when it is not valid. --- Maybe this gripe would become
more apparent if the implementation use the "free-list" idea because
then you would have a lot of bogus relids assigned to the workers of
that list for longer than just a moment.
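
As a rough sketch of that suggestion (the 'type' field and the names
are hypothetical, not in the current tree):

```
/*
 * Hypothetical worker classification, so that "available" no longer
 * has to be inferred from a stale relid.
 */
typedef enum LogicalRepWorkerType
{
    WORKERTYPE_APPLY,
    WORKERTYPE_TABLESYNC
} LogicalRepWorkerType;

/*
 * An "available" tablesync worker keeps its type but has relid reset
 * to InvalidOid until the next relation is assigned.
 */
static inline bool
tablesync_worker_available(LogicalRepWorker *worker)
{
    return worker->type == WORKERTYPE_TABLESYNC &&
        !OidIsValid(worker->relid);
}
```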

>
>>
>> 3.
>> Maybe I am mistaken, but it seems the benchmark results posted are
>> only using quite a small/default values for
>> "max_sync_workers_per_subscription", so I wondered how those results
>> are affected by increasing that GUC. I think having only very few
>> workers would cause more sequential processing, so conveniently the
>> effect of the patch avoiding re-launch might be seen in the best
>> possible light. OTOH, using more TSW 

BF animal dikkop reported a failure in 035_standby_logical_decoding

2023-05-26 Thread Yu Shi (Fujitsu)
Hi hackers,

I saw a buildfarm failure on "dikkop"[1]. It failed in
035_standby_logical_decoding.pl, because the slots row_removal_inactiveslot and
row_removal_activeslot are not invalidated after vacuum.

regress_log_035_standby_logical_decoding:
```
[12:15:05.943](4.442s) not ok 22 - inactiveslot slot invalidation is logged with vacuum on pg_class
[12:15:05.945](0.003s) 
[12:15:05.946](0.000s) #   Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_class'
#   at t/035_standby_logical_decoding.pl line 238.
[12:15:05.948](0.002s) not ok 23 - activeslot slot invalidation is logged with vacuum on pg_class
[12:15:05.949](0.001s) 
[12:15:05.950](0.000s) #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
#   at t/035_standby_logical_decoding.pl line 244.
[13:38:26.977](5001.028s) # poll_query_until timed out executing this query:
# select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[13:38:26.980](0.003s) not ok 24 - confl_active_logicalslot updated
[13:38:26.982](0.002s) 
[13:38:26.982](0.000s) #   Failed test 'confl_active_logicalslot updated'
#   at t/035_standby_logical_decoding.pl line 251.
Timed out waiting confl_active_logicalslot to be updated at t/035_standby_logical_decoding.pl line 251.
```

035_standby_logical_decoding.pl:
```
# This should trigger the conflict
$node_primary->safe_psql(
'testdb', qq[
  CREATE TABLE conflict_test(x integer, y text);
  DROP TABLE conflict_test;
  VACUUM pg_class;
  INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal
]);

$node_primary->wait_for_replay_catchup($node_standby);

# Check invalidation in the logfile and in pg_stat_database_conflicts
check_for_invalidation('row_removal_', $logstart, 'with vacuum on pg_class');
```

Is it possible that the vacuum command didn't remove any tuples, and so the
conflict was not triggered? It seems we can't confirm this because there is
not enough information. Maybe "vacuum verbose" could be used to provide
more information.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2023-05-24%2006%3A16%3A18

Regards,
Shi Yu


Re: testing dist tarballs

2023-05-26 Thread Peter Eisentraut

On 24.05.23 23:24, Andres Freund wrote:

First thing I noticed is that 'make dist' doesn't work in a vpath build,
failing in a somewhat obscure way (likely because in a vpath build the copy
from the source dir doesn't include GNUMakefile). Do we expect it to work?


I don't think so.


Separately, it's somewhat confusing that we include errcodes.h etc. in
src/backend/utils rather than in its final location, src/include/utils. It
works even without perl, because copying the file doesn't require perl; only
generating it does...


The "copying" is actually a symlink, right?  I don't think we want to 
ship symlinks in the tarball?