Re: Improve EXPLAIN output for multicolumn B-Tree Index

2024-06-24 Thread Jelte Fennema-Nio
On Mon, 24 Jun 2024 at 13:02, Matthias van de Meent
 wrote:
> It does not really behave similar: index scan keys (such as the
> id3=101 scankey) don't require visibility checks in the btree code,
> while the Filter condition _does_ require a visibility check, and
> delegates the check to the table AM if the scan isn't Index-Only, or
> if the VM didn't show all-visible during the check.

Any chance you could point me in the right direction for the
code/docs/comment about this? I'd like to learn a bit more about why
that is the case, because I didn't realize visibility checks worked
differently for index scan keys and Filter keys.

> Furthermore, the index could use the scankey to improve the number of
> keys to scan using "skip scans"; by realising during a forward scan
> that if you've reached tuple (1, 2, 3) and looking for (1, _, 1) you
> can skip forward to (1, 3, _), rather than having to go through tuples
> (1, 2, 4), (1, 2, 5), ... (1, 2, n). This is not possible for
> INCLUDE-d columns, because their datatypes and structure are opaque to
> the index AM; the AM cannot assume anything about or do anything with
> those values.

Does Postgres actually support this currently? I thought skip scans
were not available (yet).

> I don't want A to be the plan, while showing B' to the user, as the
> performance picture for the two may be completely different. And, as I
> mentioned upthread, the differences between AMs in the (lack of)
> meaning in index column order also makes it quite wrong to generally
> separate prefixes equalities from the rest of the keys.

Yeah, that makes sense. These specific explain lines probably
only/mostly make sense for btree. So yeah we'd want the index AM to be
able to add some stuff to the explain plan.

> As you can see, there's a huge difference in performance. Putting both
> non-bound and "normal" filter clauses in the same Filter clause will
> make it more difficult to explain performance issues based on only the
> explain output.

Fair enough, that's of course the main point of this patch in the
first place: being able to better interpret the explain plan when you
don't have access to the schema. Still I think Filter is the correct
keyword for both, so how about we make it less confusing by making the
current "Filter" more specific, calling it something like "Non-key
Filter" or "INCLUDE Filter", and calling the other something like
"Index Filter" or "Secondary Bound Filter"?




Re: Improve EXPLAIN output for multicolumn B-Tree Index

2024-06-24 Thread Jelte Fennema-Nio
+1 for the idea.

On Mon, 24 Jun 2024 at 11:11, Matthias van de Meent
 wrote:
> I think this is too easy to confuse with the pre-existing 'Filter'
> condition, which you'll find on indexes with INCLUDE-d columns or
> filters on non-index columns.

Why not combine them, and call both of them Filter? In a sense this
filtering acts very similarly to INCLUDE-based filtering (for btrees at
least). Although I might be wrong about that, because when I try to
confirm the same perf using the following script I do get quite
different timings (maybe you have an idea what's going on here). But
even if it does mean something slightly different perf-wise, I think
using Filter for both is unlikely to confuse anyone. While allowed, it
seems extremely unlikely in practice that someone will use the same
column both as part of the indexed columns and as part of the
INCLUDE-d columns (why would you store the same info twice?).

CREATE TABLE test (id1 int, id2 int, id3 int, value varchar(32));
INSERT INTO test (SELECT i % 10, i % 1000, i, 'hello' FROM
generate_series(1,100) s(i));
vacuum freeze test;
CREATE INDEX test_idx_include ON test(id1, id2) INCLUDE (id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE
id1 = 1 AND id3 = 101;
CREATE INDEX test_idx ON test(id1, id2, id3);
ANALYZE test;
EXPLAIN (VERBOSE, ANALYZE, BUFFERS) SELECT id1, id3 FROM test WHERE
id1 = 1 AND id3 = 101;

  QUERY PLAN
───
 Index Only Scan using test_idx_include on public.test
(cost=0.42..3557.09 rows=1 width=8) (actual time=0.708..6.639 rows=1
loops=1)
   Output: id1, id3
   Index Cond: (test.id1 = 1)
   Filter: (test.id3 = 101)
   Rows Removed by Filter: 9
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=8 read=1
 Planning Time: 0.091 ms
 Execution Time: 6.656 ms
(12 rows)

Time: 7.139 ms
  QUERY PLAN
─
 Index Only Scan using test_idx on public.test  (cost=0.42..2591.77
rows=1 width=8) (actual time=0.238..2.110 rows=1 loops=1)
   Output: id1, id3
   Index Cond: ((test.id1 = 1) AND (test.id3 = 101))
   Heap Fetches: 0
   Buffers: shared hit=1 read=386
 Query Identifier: 471139784017641093
 Planning:
   Buffers: shared hit=10 read=1
 Planning Time: 0.129 ms
 Execution Time: 2.128 ms
(10 rows)

Time: 2.645 ms




Re: Partial aggregates pushdown

2024-06-24 Thread Jelte Fennema-Nio
On Sun, 23 Jun 2024 at 10:24, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I attached the POC patch(the second one) which supports avg(int8) whose 
> standard format is _numeric type.

Okay, very interesting. So instead of defining the
serialization/deserialization functions to text/binary, you serialize
the internal type to an existing non-internal type, which then in turn
gets serialized to text. In the specific case of avg(int8) this is
done to an array of numeric (with length 2).

> However, I do not agree that I modify the internal transtype to the native 
> data type. The reasons are the following three.
> 1. Generality
> I believe we should develop a method that can theoretically apply to any 
> aggregate function, even if we cannot implement it immediately. However, I 
> find it exceptionally challenging to demonstrate that any internal transtype 
> can be universally converted to a native data type for aggregate functions 
> that are parallel-safe under the current parallel query feature. 
> Specifically, proving this for user-defined aggregate functions presents a 
> significant difficulty in my view.
> On the other hand, I think that the usage of export and import functions can 
> theoretically apply to any aggregate functions.

The only thing required when doing CREATE TYPE is having an INPUT and
OUTPUT function for the type, which (de)serialize the type to text
format. As far as I can tell, by definition that requirement should be
fine for any aggregates that we can do partial aggregation pushdown
for. To clarify, I'm not suggesting we change the internal
representation of the type for the current internal aggregates. I'm
suggesting we create new native types (i.e. do CREATE TYPE) for those
internal representations and then use the name of that type instead of
internal.

> 2. Amount of codes.
> It could need more codes.

I think it would be about the same as your proposal. Instead of
serializing to an intermediary existing type you serialize to string
straight away. I think it would probably be slightly less code
actually, since you don't have to add code to handle the new
aggpartialexportfn and aggpartialimportfn columns.

> 3. Concern about performance
> I'm concerned that altering the current internal data types could impact 
> performance.

As explained above in my proposal all the aggregation code would
remain unchanged, only new native types will be added. Thus
performance won't be affected, because all aggregation code will be
the same. The only thing that's changed is that the internal type now
has a name and an INPUT and OUTPUT function.

> I know. In my proposal, the standard format is not seriarized data by 
> serialfn, instead, is text or other native data type.
> Just to clarify, I'm writing this to avoid any potential misunderstanding.

Ah alright, that definitely clarifies the proposal. I was looking at
the latest patch file on the thread and that one was still using
serialfn. Your new one indeed doesn't, so this is fine.

> Basically responding to your advice,
> for now, I prepare two POC patches.

Great! I definitely think this makes the review/discussion easier.

> The first supports case a, currently covering only avg(int4) and other 
> aggregate functions that do not require import or export functions, such as 
> min, max, and count.

Not a full review but some initial notes:

1. Why does this patch introduce aggpartialpushdownsafe? I'd have
expected that any aggregate with a non-pseudo/internal type as
aggtranstype would be safe to partially push down.
2. It seems the int4_avg_import function shouldn't be part of this
patch (but maybe of a future one).
3. I think splitting this patch in two pieces would make it even
easier to review: First adding support for the new PARTIAL_AGGREGATE
keyword (adds the new feature). Second, using PARTIAL_AGGREGATE in
postgres_fdw (starts using the new feature). Citus would only need the
first patch not the second one, so I think the PARTIAL_AGGREGATE
feature has merit to be added on its own, even without the
postgres_fdw usage.
4. Related to 3, I think it would be good to have some tests of
PARTIAL_AGGREGATE that don't involve postgres_fdw at all. I also
spotted some comments too that mention FDW, even though they apply to
the "pure" PARTIAL_AGGREGATE code.
5. This comment now seems incorrect:
-* Apply the agg's finalfn if one is provided, else return transValue.
+* If the agg's finalfn is provided and PARTIAL_AGGREGATE keyword is
+* not specified, apply the agg's finalfn.
+* If PARTIAL_AGGREGATE keyword is specified and the transValue type
+* is internal, apply the agg's serialfn. In this case the agg's
+* serialfn must not be invalid. Otherwise return transValue.

6. These errors are not on purpose afaict (if they are, a comment in
the test explaining why would be good)

+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1;
+ERROR:  could not connect to server "loopback"
+DETAIL:  

Re: [EXTERNAL] Re: Add non-blocking version of PQcancel

2024-06-23 Thread Jelte Fennema-Nio
On Sat, 22 Jun 2024 at 17:00, Alexander Lakhin  wrote:
> @@ -2775,6 +2775,7 @@
>   SET LOCAL statement_timeout = '10ms';
>   select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- 
> this takes very long
>   ERROR:  canceling statement due to statement timeout
> +WARNING:  could not get result of cancel request due to timeout
>   COMMIT;

As you describe it, this problem occurs when the cancel request is
processed by the foreign server, before the query is actually
received. And postgres then (rightly) ignores the cancel request. I'm
not sure if the existing test is easily changeable to fix this. The
only thing that I can imagine works in practice is increasing the
statement_timeout, e.g. to 100ms.

> I also came across another failure of the test:
> @@ -2774,7 +2774,7 @@
>   BEGIN;
>   SET LOCAL statement_timeout = '10ms';
>   select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- 
> this takes very long
> -ERROR:  canceling statement due to statement timeout
> +ERROR:  canceling statement due to user request
>   COMMIT;
>
> which is reproduced with a sleep added here:
> @@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string)
>*/
>   parsetree_list = pg_parse_query(query_string);
> +pg_usleep(11000);

After investigating, I realized this actually exposes a bug in our
statement timeout logic. It has nothing to do with postgres_fdw and
reproduces with any regular postgres query too. Attached is a patch
that fixes this issue. This one should probably be backported.


v1-0001-Do-not-reset-statement_timeout-indicator-outside-.patch
Description: Binary data


cfbot update: Using GitHub for patch review

2024-06-21 Thread Jelte Fennema-Nio
I recently got write access to the cfbot repo[1] and machine from
Thomas. And I deployed a few improvements this week. The most
significant one is that it is now much easier to use GitHub as part of
your patch review workflow.

On the cfbot website[2] there's now a "D" (diff) link next to each
commit fest entry.  A good example of such a link would be the one for
my most recent commitfest entry[3]. There is a separate commit for
each patch file and those commits contain the "git format-patch"
metadata. (this is not done using git am, but using git mailinfo +
patch + sed, because git am is horrible at resolving conflicts)

The killer feature (imho) of GitHub diffs over looking at patch files:
You can press the "Expand up"/"Expand down" buttons on the left of the
diff to see some extra context that the patch file doesn't contain.

You can also add the cfbot repo as a remote to your local git
repository. That way you don't have to manually download patches and
apply them to your local checkout anymore:

# Add the remote
git remote add -f cfbot https://github.com/postgresql-cfbot/postgresql.git
# make future git pulls much quicker (optional)
git maintenance start
# check out a commitfest entry
git checkout cf/5065

P.S. Suggestions for further improvements are definitely appreciated.
We're currently already working on better integration between the
commitfest app website and the cfbot website.

P.P.S The "D" links don't work for patches that need to be rebased
since before I deployed this change, but that problem should fix
itself with time.

[1]: https://github.com/macdice/cfbot
[2]: http://cfbot.cputube.org/
[3]: https://github.com/postgresql-cfbot/postgresql/compare/cf/5065~1...cf/5065




Re: Small LO_BUFSIZE slows down lo_import and lo_export in libpq

2024-06-21 Thread Jelte Fennema-Nio
On Fri, 21 Jun 2024 at 10:46, Дмитрий Питаков  wrote:
> Why not increase the buffer size?

I think changing the buffer size sounds like a reasonable idea, if
that speeds stuff up. But I think it would greatly help your case if
you showed the perf increase using a simple benchmark, especially if
people could run this benchmark on their own machines to reproduce.




libpq: Fix lots of discrepancies in PQtrace

2024-06-21 Thread Jelte Fennema-Nio
After initially trying to add trace support for
StartupMessage/SSLRequest/GSSENCRequest[1] I realized there were many
more cases where PQtrace was not correctly tracing messages, or not
even tracing them at all. These patches fix all of the issues that I
was able to find.

0001 is some cleanup after f4b54e1ed9
0002 does some preparatory changes for 0004 & 0007

All the others improve the tracing, and apart from 0004 and 0007
depending on 0002, none of these patches depend on each other.
Although you could argue that 0007 and 0008 depend on 0006, because
without 0006 the code added by 0007 and 0008 won't ever really be
executed.

To test you can add a PQreset(conn) call to the start of the
test_cancel function in:
src/test/modules/libpq_pipeline/libpq_pipeline.c.

And then run:
ninja -C build all install-quiet &&
build/src/test/modules/libpq_pipeline/libpq_pipeline cancel
'port=5432' -t test.trace

And then look at the top of test.trace

[1]: 
https://www.postgresql.org/message-id/CAGECzQTTN5aGqtDaRifJXPyd_O5qHWQcOxsHJsDSVNqMugGNEA%40mail.gmail.com
From 8567a09b4e1816ccdd58786f912ee055accfbc93 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Fri, 21 Jun 2024 01:05:24 +0200
Subject: [PATCH v1 1/8] libpq: Use PqMsg macros in fe-auth.c

In f4b54e1ed9 macros were introduced for protocol characters to improve
readability of code. But these new macros were not used everywhere in
fe-auth.c by accident. This fixes the remaining places.
---
 src/interfaces/libpq/fe-auth.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 256f596e6bb..b5b367f24d0 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -124,7 +124,7 @@ pg_GSS_continue(PGconn *conn, int payloadlen)
 		 * first or subsequent packet, just send the same kind of password
 		 * packet.
 		 */
-		if (pqPacketSend(conn, 'p',
+		if (pqPacketSend(conn, PqMsg_GSSResponse,
 		 goutbuf.value, goutbuf.length) != STATUS_OK)
 		{
 			gss_release_buffer(&lmin_s, &goutbuf);
@@ -324,7 +324,7 @@ pg_SSPI_continue(PGconn *conn, int payloadlen)
 		 */
 		if (outbuf.pBuffers[0].cbBuffer > 0)
 		{
-			if (pqPacketSend(conn, 'p',
+			if (pqPacketSend(conn, PqMsg_GSSResponse,
 			 outbuf.pBuffers[0].pvBuffer, outbuf.pBuffers[0].cbBuffer))
 			{
 FreeContextBuffer(outbuf.pBuffers[0].pvBuffer);
@@ -683,7 +683,7 @@ pg_SASL_continue(PGconn *conn, int payloadlen, bool final)
 		/*
 		 * Send the SASL response to the server.
 		 */
-		res = pqPacketSend(conn, 'p', output, outputlen);
+		res = pqPacketSend(conn, PqMsg_SASLResponse, output, outputlen);
 		free(output);
 
 		if (res != STATUS_OK)
@@ -754,7 +754,7 @@ pg_password_sendauth(PGconn *conn, const char *password, AuthRequest areq)
 		default:
 			return STATUS_ERROR;
 	}
-	ret = pqPacketSend(conn, 'p', pwd_to_send, strlen(pwd_to_send) + 1);
+	ret = pqPacketSend(conn, PqMsg_PasswordMessage, pwd_to_send, strlen(pwd_to_send) + 1);
 	free(crypt_pwd);
 	return ret;
 }

base-commit: c5c82123d3050c3a5eef0f51e9783f1cc5004ba0
-- 
2.34.1

From 1187d694bbff1e1c13041bb361db906a0cb329f1 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Fri, 21 Jun 2024 10:45:38 +0200
Subject: [PATCH v1 2/8] libpq: Add suppress argument to pqTraceOutputNchar

In future commits we're going to trace authentication related messages.
Some of these messages contain challenge bytes as part of a
challenge-response flow. Since these bytes are different for every
connection we want to normalize them when the PQTRACE_REGRESS_MODE trace
flag is set. This commit modifies pqTraceOutputNchar to take a suppress
argument, which makes it possible to do so.
---
 src/interfaces/libpq/fe-trace.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/interfaces/libpq/fe-trace.c b/src/interfaces/libpq/fe-trace.c
index d7a61ec9cc1..6ea7bb821a1 100644
--- a/src/interfaces/libpq/fe-trace.c
+++ b/src/interfaces/libpq/fe-trace.c
@@ -185,12 +185,19 @@ pqTraceOutputString(FILE *pfdebug, const char *data, int *cursor, bool suppress)
  * pqTraceOutputNchar: output a string of exactly len bytes message to the log
  */
 static void
-pqTraceOutputNchar(FILE *pfdebug, int len, const char *data, int *cursor)
+pqTraceOutputNchar(FILE *pfdebug, int len, const char *data, int *cursor, bool suppress)
 {
 	int			i,
 next;			/* first char not yet printed */
 	const char *v = data + *cursor;
 
+	if (suppress)
+	{
+		fprintf(pfdebug, " ''");
+		*cursor += len;
+		return;
+	}
+
 	fprintf(pfdebug, " \'");
 
 	for (next = i = 0; i < len; ++i)
@@ -246,7 +253,7 @@ pqTraceOutput_Bind(FILE *f, const char *message, int *cursor)
 		nbytes = pqTraceOutputInt32(f, message, cursor, false);
 		if (nbytes == -1)
 			continue;
-		pqTraceOutputNchar(f, nbytes, message, cursor);
+		pqTraceOutputNchar(f, nbytes, message, cursor, false);
 	}
 
 	nparams = pqTraceOutputInt16(f, message, cursor);
@@ -283,

Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-20 Thread Jelte Fennema-Nio
On Wed, 19 Jun 2024 at 17:22, David G. Johnston
 wrote:
>
> On Wed, Jun 19, 2024 at 8:19 AM Jelte Fennema-Nio  wrote:
>
>>
>> Because part of it would
>> only be relevant once we support upgrading from PG18. So for now the
>> upgrade_code I haven't actually run.
>
>
> Does it apply against v16?  If so, branch off there, apply it, then upgrade 
> from the v16 branch to master.


I realized it's possible to do an "upgrade" with pg_upgrade from v17
to v17. So I was able to test both the pre and post PG18 upgrade logic
manually by changing the version in this line:

if (fout->remoteVersion >= 18)

As expected the new pg_upgrade code was severely broken. Attached is a
new patch where the pg_upgrade code now actually works.


v3-0001-Add-support-for-extensions-with-an-owned-schema.patch
Description: Binary data


Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-19 Thread Jelte Fennema-Nio
On Wed, 19 Jun 2024 at 19:55, Tom Lane  wrote:
>
> Jelte Fennema-Nio  writes:
> > On Wed, 19 Jun 2024 at 17:28, Robert Haas  wrote:
> >> I have a feeling that this might be pretty annoying to implement, and
> >> if that is true, then never mind.
>
> > Based on a quick look it's not trivial, but also not super bad.
> > Basically it seems like in src/backend/catalog/namespace.c, every time
> > we loop over activeSearchPath and CurrentExtensionObject is set, then
> > we should skip any item that's not stored in pg_catalog, unless
> > there's a DEPENDENCY_EXTENSION pg_depend entry for the item (and that
> > pg_depend entry references the extension or the requires list).
>
> We could change the lookup rules that apply during execution of
> an extension script, but we already restrict search_path at that
> time so I'm not sure how much further this'd move the goalposts.

The point I tried to make in my first email is that this restricted
search_path you mention is not very helpful at preventing privilege
escalations, since it's often possible for a non-superuser to create
functions in one of the schemas in this search_path, e.g. by having
the non-superuser first create the schema & create some functions in
it, and then asking the DBA/control plane to create the extension in
that schema.

My patch tries to address that problem by creating the schema of the
extension during extension creation, and failing if it already exists.
Thus implicitly ensuring that a non-superuser cannot mess with the
schema.

The proposal from Robert tries to instead address this by changing the
lookup rules during execution of an extension script to be more strict
than they would be outside of it (i.e. even if a function/operator
matches search_path it might still not be picked).

> The *real* problem IMO is that if you create a PL function or
> (old-style) SQL function within an extension, execution of that
> function is not similarly protected.

That's definitely a big problem too, and that's the problem that [1]
tries to fix. But first the lookup in extension scripts would need to
be made secure, because it doesn't seem very useful (security wise) to
use the same lookup mechanism in functions as we do in extension
scripts, if the lookup in extension scripts is not secure in the first
place. I think the method of making the lookup secure in my patch
would transfer over well, because it adds a way for a safe search_path
to exist, so all that's needed is for the PL function to use that
search_path. Robert's proposal would be more difficult I think. When
executing a PL function from an extension we'd need to use the same
changed lookup rules that we'd use during the extension script of that
extension. I think that should be possible, but it's definitely more
involved.

[1]: 
https://www.postgresql.org/message-id/flat/CAE9k0P%253DFcZ%253Dao3ZpEq29BveF%252B%253D27KBcRT2HFowJxoNCv02dHLA%2540mail.gmail.com




Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-19 Thread Jelte Fennema-Nio
On Wed, 19 Jun 2024 at 17:28, Robert Haas  wrote:
> But I wonder if there might also be another possible approach: could
> we, somehow, prevent object references in extension scripts from
> resolving to anything other than the system catalogs and the contents
> of that extension?

This indeed does sound like the behaviour that pretty much every
existing extension wants to have. One small addition/clarification
that I would make to your definition: fully qualified references to
other objects should still be allowed.

I do think, even if we have this, there would be other good reasons to
use "owned schemas" for extension authors. At least the following two:
1. To have a safe search_path that can be used in SET search_path on a
function (see also [1]).
2. To make it easy for extension authors to avoid conflicts with other
extensions/UDFs.

> Perhaps with a control file setting to specify a
> list of trusted extensions which we're also allowed to reference?

I think we could simply use the already existing "requires" field from
the control file, i.e. you're allowed to reference only your own
extension and the extensions listed in its "requires" list.

> I have a feeling that this might be pretty annoying to implement, and
> if that is true, then never mind.

Based on a quick look it's not trivial, but also not super bad.
Basically it seems like in src/backend/catalog/namespace.c, every time
we loop over activeSearchPath and CurrentExtensionObject is set, then
we should skip any item that's not stored in pg_catalog, unless
there's a DEPENDENCY_EXTENSION pg_depend entry for the item (and that
pg_depend entry references the extension or the requires list).

There's quite a few loops over activeSearchPath in namespace.c, but
they all seem pretty similar. So while a bunch of code would need to
be changed, the changes could probably be well encapsulated in a
function.

[1]: 
https://www.postgresql.org/message-id/flat/00d8f046156e355ec0eb49585408bafc8012e4a5.camel%40j-davis.com#3ad7a8073d5ef50cfe44e305c38d




Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-19 Thread Jelte Fennema-Nio
Attached is an updated version of this patch that fixes a few issues
that CI reported (autoconf, compiler warnings and broken docs).

I also think I changed the pg_upgrade to do the correct thing, but I'm
not sure how to test this (even manually). Because part of it would
only be relevant once we support upgrading from PG18. So for now the
upgrade_code I haven't actually run.
From 37b6fa45bf877bcc15ce76d7e342199b7ca76d50 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Fri, 31 May 2024 02:04:31 -0700
Subject: [PATCH v2] Add support for extensions with an owned schema

Writing the sql migration scripts that are run by CREATE EXTENSION and
ALTER EXTENSION UPDATE is a security minefield for extension authors.
One big reason for this is that search_path is set to the schema of the
extension while running these scripts, and thus if a user with lower
privileges can create functions or operators in that schema they can do
all kinds of search_path confusion attacks if not every function and
operator that is used in the script is schema qualified. While doing
such schema qualification is possible, it relies on the author to never
make a mistake in any of the sql files. And sadly humans have a tendency
to make mistakes.

This patch adds a new "owned_schema" option to the extension control
file that can be set to true to indicate that this extension wants to
own the schema in which it is installed. What that means is that the
schema should not exist before creating the extension, and will be
created during extension creation. This thus gives the extension author
an easy way to use a safe search_path, while still allowing all objects
to be grouped together in a schema. The implementation also has the
pleasant side effect that the schema will be automatically dropped when
the extension is dropped.
---
 doc/src/sgml/extend.sgml  |  13 ++
 doc/src/sgml/ref/create_extension.sgml|   3 +-
 src/backend/commands/extension.c  | 141 +-
 src/backend/utils/adt/pg_upgrade_support.c|  20 ++-
 src/bin/pg_dump/pg_dump.c |  15 +-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/include/catalog/pg_extension.h|   1 +
 src/include/catalog/pg_proc.dat   |   2 +-
 src/include/commands/extension.h  |   4 +-
 src/test/modules/test_extensions/Makefile |   7 +-
 .../expected/test_extensions.out  |  50 +++
 src/test/modules/test_extensions/meson.build  |   4 +
 .../test_extensions/sql/test_extensions.sql   |  27 
 .../test_ext_owned_schema--1.0.sql|   2 +
 .../test_ext_owned_schema.control |   5 +
 ...test_ext_owned_schema_relocatable--1.0.sql |   2 +
 .../test_ext_owned_schema_relocatable.control |   4 +
 17 files changed, 248 insertions(+), 53 deletions(-)
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema.control
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable.control

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 218940ee5ce..36dc692abef 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -809,6 +809,19 @@ RETURNS anycompatible AS ...
   
  
 
+ 
+  owned_schema (boolean)
+  
+   
+An extension is owned_schema if it requires a
+new dedicated schema for its objects. Such a requirement can make
+security concerns related to search_path injection
+much easier to reason about. The default is false,
+i.e., the extension can be installed into an existing schema.
+   
+  
+ 
+
  
   schema (string)
   
diff --git a/doc/src/sgml/ref/create_extension.sgml b/doc/src/sgml/ref/create_extension.sgml
index ca2b80d669c..6e767c7bfca 100644
--- a/doc/src/sgml/ref/create_extension.sgml
+++ b/doc/src/sgml/ref/create_extension.sgml
@@ -102,7 +102,8 @@ CREATE EXTENSION [ IF NOT EXISTS ] extension_name

 The name of the schema in which to install the extension's
 objects, given that the extension allows its contents to be
-relocated.  The named schema must already exist.
+relocated.  The named schema must already exist if the extension's
+control file does not specify owned_schema.
 If not specified, and the extension's control file does not specify a
 schema either, the current default object creation schema is used.

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1643c8c69a0..c9586ad62a1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -83,6 +83,8 @@ typedef struct ExtensionControlFile
 	 * MODULE_PATHNAME */
 	char	   *comment;		/* comme

Re: RFC: adding pytest as a supported test framework

2024-06-17 Thread Jelte Fennema-Nio
On Sun, 16 Jun 2024 at 23:04, Robert Haas  wrote:
>
> On Sat, Jun 15, 2024 at 6:00 PM Melanie Plageman
>  wrote:
> > Writing a new test framework in a popular language that makes it more
> > likely that more people will write more tests and test infrastructure
> > is such a completely different thing than suggesting we rewrite
> > Postgres in Rust that I feel that this comparison is unfair and,
> > frankly, a distraction from the discussion at hand.
>
> I don't really agree with this.
> 
> it's not *as* problematic if some tests are
> written in one language and others in another as it would be if
> different parts of the backend used different languages, and it
> wouldn't be *as* hard if at some point we decided we wanted to convert
> all remaining code to the new language.

Honestly, it sounds like you actually do agree with each other. It
seems you interpreted Melanie's use of "thing" as "people wanting to
use Rust/Python in the Postgres codebase" while I believe she probably
meant "all the problems and effort involved in the task of making that
possible". And afaict from your response, you definitely agree that
making it possible to use Rust in our main codebase is a lot more
difficult than for Python for our testing code.

> But, would I feel respected and valued as a participant in
> that project? Would I have to use weird tools and follow arcane and
> frustrating processes? If I did, *that* would make me give up. I don't
> want to say that the choice of programming language doesn't matter at
> all, but it seems to me that it might matter more because it's a
> symptom of being unwilling to modernize things rather than for its own
> sake.

I can personally definitely relate to this (although I wouldn't frame
it as strongly as you did). Postgres development definitely requires
weird tools and arcane processes (imho) when compared to most other
open source projects. The elephant in the room is of course the
mailing list development flow. But we have some good reasons for using
that[^1]. But most people have some limit on the amount of weirdness
they are willing to accept when wanting to contribute, and the mailing
list pushes us quite close to that limit for a bunch of people
already. Any additional weird tools/arcane processes might push some
people over that limit.

We've definitely made big improvements in modernizing our development
workflow over the last few years though: We now have CI (cfbot), a
modern build system (meson), and working autoformatting (requiring
pgindent on commit). These improvements have been very noticeable to
me, and I think we should continue such efforts. I think allowing
people to write tests in Python is one of the easier improvements that
we can make.

[^1]: Although I think those reasons apply much less to the
documentation, maybe we could allow github contributions for just
those.




Re: RFC: adding pytest as a supported test framework

2024-06-15 Thread Jelte Fennema-Nio
On Sat, 15 Jun 2024 at 19:27, Robert Haas  wrote:
> This surprises me. I agree that the current state of affairs is kind
> of annoying, but the contents of regress_log_whatever are usually
> quite long. Printing all of that out to standard output seems like
> it's just going to flood the terminal with output. I don't think I'd
> be a fan of that change.

I think at the very least the locations of the different logs should
be listed in the output.




Re: RFC: adding pytest as a supported test framework

2024-06-15 Thread Jelte Fennema-Nio
On Sat, 15 Jun 2024 at 16:45, Andrew Dunstan  wrote:
> I see the fact that we stash the output in a file as a feature. Without
> it, capturing failure information in the buildfarm client would be more
> difficult, especially if there are multiple failures. So this is
> actually something I think we would need for any alternative framework.

I indeed heard that the current behaviour was somehow useful to the
buildfarm client.

> Maybe we need an environment setting that would output the
> regress_log_00whatever file to stderr on failure.  That should be pretty
> easy to arrange in the END handler for PostgreSQL::Test::Utils.

That sounds awesome! But I'm wondering: do we really need a setting to
enable/disable that? Can't we always output it to stderr on failure?
If we output the log both to stderr and as a file, would that be fine
for the build farm? If not, a setting should work. (but I'd prefer the
default for that setting to be on in that case, it seems much easier
to turn it off in the buildfarm client, instead of asking every
developer to turn the feature on)

> > 4. Pytest has autodiscovery of test files and functions, so we
> > probably wouldn't have to specify all of the exact test files anymore
> > in the meson.build files.
>
>
> I find this comment a bit ironic. We don't need to do that with the
> Makefiles, and the requirement to do so was promoted as a meson feature
> rather than a limitation, ISTR.

Now, I'm very curious why that would be considered a feature. I
certainly have had many cases where I forgot to add the test file to
the meson.build file.

> > Regarding 2, there are ~150 checks that are using a suboptimal way of
> > testing for a comparison. Mostly a lot that could use "like(..., ...)"
> > instead of "ok(... ~= ...)"
> > ❯ grep '\bok(.*=' **.pl | wc -l
> > 151
>
>
> Well, let's fix those. I would be tempted to use cmp_ok() for just about
> all of them.

Sounds great to me.

> But the fact that Test::More has a handful of test primitives rather
> than just one strikes me as a relatively minor complaint.

It is indeed a minor paper cut, but paper-cuts add up.

Honestly, my primary *objective* complaint about our current test
suite is that when a test fails, it's very often impossible for me to
understand why the test failed by only looking at the output of
"meson test". I think logging the postgres log to stderr for Perl, as
you proposed, would significantly improve that situation. I think the
only thing that we cannot get from Perl Test::More that we can from
pytest is the fancy recursive introspection of the expression that
pytest shows on error.


Apart from that my major *subjective* complaint is that I very much
dislike writing Perl code. I'm slow at writing it and I don't (want
to) improve at it because I don't have reasons to use it except for
Postgres tests. So currently I'm not really incentivised to write more
tests than the bare minimum, help improve the current test tooling, or
add new testing frameworks for things we currently cannot test.
Afaict, there's a significant part of our current community who feel
the same way (and I'm pretty sure every sub-30 year old person who
newly joins the community would feel the exact same way too).

As a project I think we would like to have more tests, and to have
more custom tooling to test things that we currently cannot (e.g.
oauth or manually messing with the wire-protocol). I think the only
way to achieve that is by encouraging more people to work on these
things. I very much appreciate that you and others are improving our
Perl tooling, because that makes our current tests easier to work
with. But I don't think it significantly increases the willingness to
write tests or test-tooling for people that don't want to write Perl
in the first place.

So I think the only way to get more people involved in contributing
tests and test-tooling is by allowing testing in another language than
Perl (but also still allow writing tests in Perl). Even if that means
that we have two partially-overlapping test frameworks, that are both
easy to use for different things. In my view that's even a positive
thing, because that means we are able to test more with two languages
than we would be able to with either one (and it's thus useful to have
both).

And I agree with Robert that Python seems like the best choice for
this other language, given its current popularity level. But as I said
before, I'm open to other languages as well.




Re: RFC: adding pytest as a supported test framework

2024-06-14 Thread Jelte Fennema-Nio
On Fri, 14 Jun 2024 at 17:49, Tom Lane  wrote:
> But what I'd really like to see is some comparison of the
> language-provided testing facilities that we're proposing
> to depend on.  Why is pytest (or whatever) better than Test::More?

Some advantages of pytest over Test::More:

1. It's much easier to debug failing tests using the output that
pytest gives. A good example of this is on pytest's homepage[1]
(i.e. it shows the value given to the call to inc in the error)
2. No need to remember a specific comparison DSL
(is/isnt/is_deeply/like/ok/cmp_ok/isa_ok), just put assert in front of
a boolean expression and your output is great (if you want to add a
message too for clarity you can use: assert a == b, "the world is
ending")
3. Very easy to output postgres log files on stderr/stdout when a test
fails. This might be possible/easy with Perl too, but we currently
don't do that. So right now for many failures you're forced to
traverse the build/testrun/... directory tree to find the logs.
4. Pytest has autodiscovery of test files and functions, so we
probably wouldn't have to specify all of the exact test files anymore
in the meson.build files.

Regarding 2, there are ~150 checks that are using a suboptimal way of
testing for a comparison. Mostly a lot that could use "like(..., ...)"
instead of "ok(... ~= ...)"
❯ grep '\bok(.*=' **.pl | wc -l
151

[1]: https://docs.pytest.org/en/8.2.x/#a-quick-example




Re: RFC: adding pytest as a supported test framework

2024-06-14 Thread Jelte Fennema-Nio
On Fri, 14 Jun 2024 at 22:33, Greg Sabino Mullane  wrote:
> I am not happy with the state of Perl, as it has made some MAJOR missteps 
> along the way, particularly in the last 5 years. But can we dispel this 
> strawman? There is a difference between "unpopular" and "unmaintained". The 
> latest version of Perl was released May 20, 2024. The latest release of 
> Test::More was April 25, 2024. Both are heavily used. Just not as heavily as 
> they used to be. :)

Sorry, yes I exaggerated here. Looking at the last Perl changelog[1]
it's definitely getting more new features and improvements than I had
thought.

Test::More, on the other hand, while indeed still maintained, is
definitely not getting significant new feature development or
improvements[2]. Especially when comparing it to pytest[3].

[1]: https://perldoc.perl.org/perldelta
[2]: https://github.com/Test-More/test-more/blob/master/Changes
[3]: https://docs.pytest.org/en/stable/changelog.html




Re: RFC: adding pytest as a supported test framework

2024-06-13 Thread Jelte Fennema-Nio
On Thu, 13 Jun 2024 at 23:35, Andrew Dunstan  wrote:
> Clearly there are areas we can improve, and we need to be somewhat more
> proactive about it.

To follow that great suggestion, I updated the meson wiki[1] after I
realized some of the major gripes I had with the Perl tap test output
were not actually caused by Perl but by meson:

The main change I made was using "-q --print-errorlogs" instead of
"-v", to reduce the enormous clutter in the output of the commands in
the wiki to something much more reasonable.

I also added examples of how to run specific tests.

[1]: https://wiki.postgresql.org/wiki/Meson#Test_related_commands




Re: RFC: adding pytest as a supported test framework

2024-06-13 Thread Jelte Fennema-Nio
On Thu, 13 Jun 2024 at 20:11, Robert Haas  wrote:
> > But Perl is at the next level of unmaintained infrastructure. It is
> > actually clear how you can contribute to it, but still no new
> > community members actually want to contribute to it. Also, it's not
> > only unmaintained by us but it's also pretty much unmaintained by the
> > upstream community.
>
> I feel like I already agreed to this in a previous email and you're
> continuing to argue with me as if I were disagreeing.

Sorry about that.

> I also agree with this. I'm just not super optimistic about how much
> of that will actually happen. And I'd like to hear you acknowledge
> that concern and think about whether it can be addressed in some way,
> instead of just repeating that we should do it anyway. Because I agree
> we probably should do it anyway, but that doesn't mean I wouldn't like
> to see the downsides mitigated as much as we can.

I'm significantly more optimistic than you, but I also definitely
understand and agree with the concern. I also agree that mitigating
that concern beforehand would be a good thing.

> In particular, if
> the proposal is exactly "let's add the smallest possible patch that
> enables people to write tests in Python and then add a few new tests
> in Python while leaving almost everything else in Perl, with no
> migration plan and no clear vision of how the Python support ever gets
> any better than the minimum stub that is proposed for initial commit,"
> then I don't know that I can vote for that plan. Honestly, that sounds
> like very little work for the person proposing that minimal patch and
> a whole lot of work for the rest of the community later on, and the
> evidence is not in favor of volunteers showing up to take care of that
> work. The plan should be more front-loaded than that: enough initial
> development should get done by the people making the proposal that if
> the work stops after, we don't have another big mess on our hands.
>
> Or so I think, anyway.

I understand and agree with your final stated goal of not ending up in
another big mess. It's also clear to me that you don't think the
current proposal achieves that goal. So I assume you have some
additional ideas for the proposal to help achieve that goal and/or
some specific worries that you'd like to get addressed better in the
proposal. But currently it's not really clear to me what either of
those are. Could you clarify?




Re: RFC: adding pytest as a supported test framework

2024-06-13 Thread Jelte Fennema-Nio
On Thu, 13 Jun 2024 at 15:38, Robert Haas  wrote:
> For whatever reason, it seems like every piece of
> infrastructure that the PostgreSQL community has suffers from severe
> neglect. Literally everything I know of either has one or maybe two
> very senior hackers maintaining it, or no maintainer at all. Andrew
> maintains the buildfarm and it evolves quite slowly. Andres did all
> the work on meson, with some help from Peter. Thomas maintains cfbot
> as a skunkworks. The Perl-based TAP test framework gets barely any
> love at all. The CommitFest application is pretty much totally
> stagnant, and in fact is a great example of what I'm talking about
> here: I wrote an original version in Perl and somebody -- I think
> Magnus -- rewrote it in a more maintainable framework -- and then the
> development pace went to basically zero. All of this stuff is critical
> project infrastructure and yet it feels like nobody wants to work on
> it.

Overall, I agree with the sentiment of us not maintaining our tooling
well (although I think meson maintenance has been pretty decent so
far). I think there's a bunch of reasons for this (not all apply to
each of the tools):
1. pretty much solely maintained by senior community members who don't
have time to maintain it
2. no clear way to contribute. e.g. where should I send a patch/PR for
the commitfest app, or the cfbot?
3. (related to 1) unresponsive when somehow contributions are actually
sent in (I have two open PRs on the cfbot app from 3 years ago without
any response)

I think 1 & 3 could be addressed by more easily giving commit/merge
access to these tools than to the main PG repo. And I think 2 could be
addressed by writing on the relevant wiki page where to go, and
probably putting a link to the wiki page on the actual website of the
tool.

But Perl is at the next level of unmaintained infrastructure. It is
actually clear how you can contribute to it, but still no new
community members actually want to contribute to it. Also, it's not
only unmaintained by us but it's also pretty much unmaintained by the
upstream community.

> But I still
> think it's fair to question whether the preference of many developers
> for Python over Perl will translate into sustained investment in
> improving the infrastructure. Again, I will be thrilled if it does,
> but that just doesn't seem to be the way that things go around here,
> and I bet the reasons go well beyond choice of programming language.

As you said, no one in our community wants to maintain our test suite
full time. But our test suite consists partially of upstream
dependencies and partially of our own code. Right now pretty much
no-one improves the upstream code, and pretty much no-one improves our
own code. Using a more modern language gives us much more frequent
upstream improvements for free, and it will allow new community
members to contribute to our own test suite.

And I understand you are sceptical that people will contribute to our
own test suite, just because it's Python. But as a counterpoint:
people are currently already doing exactly that, just outside of the
core postgres repo[1][2][3]. I don't see why those people would
suddenly stop doing that if we include such a suite in the official
repo. Apparently many people hate writing tests in Perl so much that
they'd rather build Python test frameworks to test their extensions,
than to use/improve the Perl testing framework included in Postgres.

[1]: https://github.com/pgbouncer/pgbouncer/tree/master/test
[2]: https://github.com/jchampio/pg-pytest-suite
[3]: https://github.com/postgrespro/testgres


PS. I don't think it makes sense to host our tooling like the
commitfest app on our own git server instead of github/gitlab. That
only makes it harder for community members to contribute and also much
harder to set up CI. I understand the reasons why we use mailing lists
for the development of core postgres, but I don't think those apply
nearly as much to our tooling repos. And honestly also not to stuff
like the website.




Re: RFC: adding pytest as a supported test framework

2024-06-13 Thread Jelte Fennema-Nio
On Thu, 13 Jun 2024 at 17:19, Tom Lane  wrote:
> I wonder if we should be checking out some of the other newer
> languages that were mentioned upthread.

If this is actually something that we want to seriously evaluate, I
think that's a significant effort. And I think the people that want a
language would need to step in to make that effort. So far Jacob[1],
Alexander[2] and me[3] seem to be doing that for Python, and Sutou has
done that for Ruby[4].

[1]: https://github.com/jchampio/pg-pytest-suite
[2]: https://github.com/postgrespro/testgres
[3]: https://github.com/pgbouncer/pgbouncer/tree/master/test
[4]: https://github.com/pgroonga/pgroonga/blob/main/test/test-streaming-replication.rb

> It feels like going to
> Python here will lead to having two testing infrastructures with
> mas-o-menos the same capabilities, leaving us with a situation
> where people have to know both languages in order to make sense of
> our test suite.  I find it hard to picture that as an improvement
> over the status quo.

You don't have to be fluent in writing Python to be able to read and
understand tests written in it. As someone familiar with Python I can
definitely read our test suite, and I expect everyone smart enough to
be fluent in Perl to be able to read and understand Python with fairly
little effort too.

I think having significantly more tests being written, and those tests
being written faster and more correctly, is definitely worth the
slight mental effort of learning to read two very similarly looking
scripting languages (they both pretty much looking like pseudo code).




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 21:07, Robert Haas  wrote:
> Yeah, I don't think depending on psycopg2 is practical at all. We can
> either shell out to psql like we do now, or we can use something like
> CFFI.

Quick clarification: I meant psycopg3, not psycopg2. And I'd very much
like to avoid using psql for sending queries; luckily CFFI in Python
is very good.

> Many of
> those improvements could be made on top of the Perl framework we have
> today, and some of them have been discussed, but nobody's done the
> work.

I agree it's not a technical issue. It is a people issue. There are
very few people skilled in Perl active in the community. And most of
those are very senior hackers that have much more important things to
do than make our Perl testing framework significantly better. And the
less senior people, who might see improving tooling as a way to help
out in the community, try to stay away from Perl with a 10 foot pole.
So the result is, nothing gets improved. Especially since very few
people outside our community improve this tooling either.

> I also don't understand the argument that assert a == b is some
> new and wonderful thing; I mean, you can already do is($a,$b,"test
> name") which *also* shows you the values when they don't match, and
> includes a test name, too!

Sure you can, if you know the function exists. And clearly not
everyone knows that it exists, as the quick grep below demonstrates:

❯ grep 'ok(.* == .*' **.pl | wc -l
41

But apart from the obvious syntax doing what you want, the output is
also much better when looking at a slightly more complex case. With
the following code:

def some_returning_func():
return 1234

def some_func(val):
if val > 100:
return 100
return val

def test_mytest():
assert some_func(some_returning_func()) == 50

Pytest will show the following output

def test_mytest():
>   assert some_func(some_returning_func()) == 50
E   assert 100 == 50
E+  where 100 = some_func(1234)
E+where 1234 = some_returning_func()

I have no clue how you could get output that's even close to that
clear with Perl.

Another problem I run into is that, as you probably know, sometimes
you need to look at the postgres logs to find out what actually went
wrong. Currently the only way to find them (for me) is following the
following steps: hmm, let me figure out what that directory was called
again... ah okay it is build/testrun/pg_upgrade/001_basic/... okay
let's start opening log files that all have very similar names until I
find the right one.

When a test in pytest fails it automatically outputs all stdout/stderr
that was outputted, and hides it on success. So for the PgBouncer test
suite, I simply send all the relevant log files to stdout, prefixed by
some capitalized identifying line with a few newlines around it.
Something like "PG_LOG: /path/to/actual/logfile". Then when a test
fails in my terminal window I can look at the files related to the
failed test instantly. This allows me to debug failures much faster.
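A minimal sketch of that approach (not the actual PgBouncer suite code,
and the log path handling is purely illustrative): a fixture that dumps
the server log to stdout with a recognizable prefix, so pytest's output
capturing shows it only for failing tests.

import pytest

@pytest.fixture
def pg_log(tmp_path):
    logfile = tmp_path / "postgresql.log"
    yield logfile
    # Printed during teardown; pytest captures this output and only
    # shows it in the report of a failing test.
    if logfile.exists():
        print(f"\nPG_LOG: {logfile}\n")
        print(logfile.read_text())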

A related thing that also doesn't help at all is that (afaik) seeing
any of the perl tap test output in your terminal requires running
`meson test` with the -v option, and then scrolling up past all the
super verbose output of successfully passing tests to find out what
exactly failed in the single test that failed. And if you don't want
to do that you have to navigate to the magic directory path (
build/testrun/pg_upgrade/001_basic/) of the specific tests to look at
the stdout file there... Which then turns out not to even be there if
you actually had a compilation failure in your perl script (which
happens very often to anyone that doesn't use perl often). So now you
have to scroll up anyway.

Pytest instead is very good at only showing output for the tests that
failed, and hiding pretty much all output for the tests that passed.




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 18:46, Daniele Varrazzo
 wrote:
> This is true, but [citation needed] :D I assume the pointer wanted to
> be https://www.psycopg.org/psycopg3/docs/api/pq.html#pq-impl

Ugh, yes I definitely meant to add a link to that [1]. I meant this one though:

[1]: 
https://www.psycopg.org/psycopg3/docs/basic/install.html#pure-python-installation

> - using the pure Python psycopg (enforced by exporting
> 'PSYCOPG_IMPL=python') you would use the libpq found on the
> LD_LIBRARY_PATH, which can be useful to test regressions to the libpq
> itself.

This indeed was the main idea I had in mind.

> - if you want to test new libpq functions you can reach them in Python
> by dynamic lookup. See [2] for an example of a function only available
> from libpq v17.
>
> [2]: 
> https://github.com/psycopg/psycopg/blob/2bf7783d66ab239a2fa330a842fd461c4bb17c48/psycopg/psycopg/pq/_pq_ctypes.py#L564-L569

Yeah, that dynamic lookup would work. But due to the cyclic dependency
on postgres commit vs psycopg PR we couldn't depend on psycopg for
those dynamic lookups. So we'd need to have our own dynamic lookup
code to do this.

I don't see a huge problem with using psycopg for already released
commonly used features (i.e. connecting to postgres and doing
queries), but still use our own custom dynamic lookup for the rare
tests that test newly added features. But I can definitely see people
making the argument that if we need to write & maintain some dynamic
lookup code ourselves anyway, we might as well have all the dynamic
lookup code in our repo to avoid dependencies.
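To illustrate, a minimal sketch of what such custom dynamic lookup could
look like, using only ctypes from the stdlib. The function name
PQsomeNewFunction is purely hypothetical; this is not code from psycopg
or any existing test suite.

import ctypes
import ctypes.util

# Load whichever libpq the dynamic linker finds (e.g. a locally built one).
libpq = ctypes.CDLL(ctypes.util.find_library("pq"))

try:
    # Look up a symbol that psycopg doesn't know about yet.
    new_func = libpq.PQsomeNewFunction
    new_func.restype = ctypes.c_int
except AttributeError:
    new_func = None  # running against an older libpq without the symbol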

On Wed, 12 Jun 2024 at 18:46, Daniele Varrazzo
 wrote:
>
> On Wed, 12 Jun 2024 at 18:08, Jelte Fennema-Nio  wrote:
> >
> > On Wed, 12 Jun 2024 at 17:50, Andres Freund  wrote:
> > > > The OAuth pytest suite makes extensive use of
> > > > - psycopg, to easily drive libpq;
> > >
> > > That's probably not going to fly. It introduces painful circular 
> > > dependencies
> > > between building postgres (for libpq), building psycopg (requiring libpq) 
> > > and
> > > testing postgres (requiring psycopg).
> >
> > psycopg has a few implementations binary, c, & pure python. The pure
> > python one can be linked to a specific libpq.so file at runtime[1]. As
>
> This is true, but [citation needed] :D I assume the pointer wanted to
> be https://www.psycopg.org/psycopg3/docs/api/pq.html#pq-impl
>
> I see the following use cases and how I would use psycopg to implement them:
>
> - by installing 'psycopg[binary]' you would get a binary bundle
> shipping with a stable version of the libpq, so you can test the
> database server regardless of libpq instabilities in the same
> codebase.
> - using the pure Python psycopg (enforced by exporting
> 'PSYCOPG_IMPL=python') you would use the libpq found on the
> LD_LIBRARY_PATH, which can be useful to test regressions to the libpq
> itself.
> - if you want to test new libpq functions you can reach them in Python
> by dynamic lookup. See [2] for an example of a function only available
> from libpq v17.
>
> [2]: 
> https://github.com/psycopg/psycopg/blob/2bf7783d66ab239a2fa330a842fd461c4bb17c48/psycopg/psycopg/pq/_pq_ctypes.py#L564-L569
>
> -- Daniele




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 17:50, Andres Freund  wrote:
> > The OAuth pytest suite makes extensive use of
> > - psycopg, to easily drive libpq;
>
> That's probably not going to fly. It introduces painful circular dependencies
> between building postgres (for libpq), building psycopg (requiring libpq) and
> testing postgres (requiring psycopg).
>
> You *can* solve such issues, but we've debated that in the past, and I doubt
> we'll find agreement on the awkwardness it introduces.

psycopg has a few implementations: binary, C, & pure Python. The pure
Python one can be linked to a specific libpq.so file at runtime[1]. As
long as we don't break the libpq API (which we shouldn't), we can just
point it to the libpq compiled by meson/make. We wouldn't be able to
use the newest libpq features that way (because psycopg wouldn't know
about them), but that seems totally fine for most usages (i.e. sending
a query over a connection). If we really want to use those from the
python tests we could write our own tiny CFFI layer specifically for
those.
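
As a rough sketch of what that could look like in a test (the paths and
the way the environment is set up are just for illustration): if the
test runner is started with PSYCOPG_IMPL=python and LD_LIBRARY_PATH
pointing at the freshly built src/interfaces/libpq directory, the
pure-Python psycopg loads that libpq.so and the test itself stays
trivial:

import psycopg

def test_simple_query():
    # psycopg here drives whatever libpq.so the dynamic loader found,
    # i.e. the one we just built.
    with psycopg.connect("dbname=postgres") as conn:
        assert conn.execute("SELECT 1").fetchone() == (1,)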

> One thing worth thinking about is that such dependencies have to work on a
> relatively large number of platforms / architectures. A lot of projects
> don't...

Do they really? A bunch of the Perl tests we just skip on Windows or on
uncommon platforms. I think we could do the same for these.




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 15:49, FWS Neil  wrote:
> I believe that anyone coming out of school these days would have a relatively
> easy transition to any of Go, Rust, Kotlin, Swift, etc.  In other words, any 
> of
> the modern languages.

Agreed, which is why I said they'd be acceptable to me. But I think
one important advantage of Python is that it's clear that many people
in the community are willing to write tests in it. At PGConf.dev there
were a lot of people in the unconference session about this. Also, many
people have already written Postgres testing frameworks in Python, and
are using them (see the list of projects that Alexander shared). I haven't seen
that level of willingness to write tests for any of those other
languages (yet).

> In addition, the language should scale well to
> multiprocessors, because parallel testing is becoming more important every 
> day.
> 
> Python is good for package depth and resource availability, but fails IMO in 
> the
> other categories.

You can easily pin packages or call a different function based on the
version of the package, so I'm not sure what the problem is with
package stability. Also, chances are we'll pull in very few external
packages and rely mostly on the stdlib (which is quite stable).

Regarding parallelised running of tests, I agree that's very
important. And indeed normally parallelism in python can be a pain
(although async/await makes I/O parallelism a lot better at least).
But running pytest tests in parallel is extremely easy by using
pytest-xdist[1], so I don't really think there's an issue for this
specific Python use case.
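
For reference, assuming pytest-xdist is installed, parallelising a run
is just a command-line flag:

pytest -n auto   # one worker process per CPU core
pytest -n 8      # or a fixed number of worker processes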

> My experience with python where the program flow can change
> because of non-visible characters is a terrible way to write robust long term
> maintainable code.  Because of this most of the modern languages are going to 
> be
> closer in style to Postgres’s C code base than Python.

I'm assuming this is about spaces vs curly braces for blocks? Now that
we have auto formatters for every modern programming language I indeed
prefer curly braces myself too. But honestly that's pretty much a tabs
vs spaces discussion.

[1]: https://pypi.org/project/pytest-xdist/

On Wed, 12 Jun 2024 at 15:49, FWS Neil  wrote:
>
>
> > On Jun 12, 2024, at 6:40 AM, Jelte Fennema-Nio  wrote:
> >
> > I think C#, Java, Go, Rust, Kotlin, and Swift would be acceptable
> > choices for me (and possibly some more). They allow some type of
> > introspection, they have a garbage collector, and their general
> > tooling is quite good.
> >
>
> Having used Python for 15+ years and then abandoned it for all projects I 
> would
> say the single most important points for a long term project like Postgres is,
> not necessarily in order, package stability, package depth, semantic 
> versioning,
> available resources, and multiprocessor support.
>
> The reason I abandoned Python was for the constant API breaks in packages. 
> Yes,
> python is a great language to teach in school for a one-time class project, 
> but
> that is not Postgres’s use-case.  Remember that Fortran and Pascal were the
> darlings for teaching in school prior to Python and no-one uses them any more.
>
> Yes Python innovates fast and is fashionable.  But again, not Postgres’s 
> use-case.
>
> I believe that anyone coming out of school these days would have a relatively
> easy transition to any of Go, Rust, Kotlin, Swift, etc.  In other words, any 
> of
> the modern languages.  In addition, the language should scale well to
> multiprocessors, because parallel testing is becoming more important every 
> day.
>
> If the Postgres project is going to pick a new language for testing, it should
> pick a language for the next 50 years based on the projects needs.
>
> Python is good for package depth and resource availability, but fails IMO in 
> the
> other categories. My experience with python where the program flow can change
> because of non-visible characters is a terrible way to write robust long term
> maintainable code.  Because of this most of the modern languages are going to 
> be
> closer in style to Postgres’s C code base than Python.




Re: Proposal: Document ABI Compatibility

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 14:44, Peter Eisentraut  wrote:
> I think since around 6 years ago we have been much more vigilant about
> avoiding ABI breaks.  So if there aren't any more recent examples of
> breakage, then maybe that was ultimately successful, and the upshot is,
> continue to be vigilant at about the same level?

While not strictly an ABI break I guess, the backport of 32d5a4974c81
broke building Citus against 13.10 and 14.7[1].

[1]: https://github.com/citusdata/citus/pull/6711




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 01:48, Noah Misch  wrote:
> If we're going to test in a non-Perl language, I'd pick C over Python.
> 
> We'd need a library like today's Perl
> PostgreSQL::Test to make C-language tests nice, but the same would apply to
> any new language.

P.P.S. We already write tests in C, we use it for testing libpq[1].
I'd personally definitely welcome a C library to make those tests
nicer to write, because I've written a fair bit of those in the past
and currently it's not very fun to do.

[1]: 
https://github.com/postgres/postgres/blob/master/src/test/modules/libpq_pipeline/libpq_pipeline.c




Re: RFC: adding pytest as a supported test framework

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 01:48, Noah Misch  wrote:
> If we're going to test in a non-Perl language, I'd pick C over Python.  There
> would be several other unlikely-community-choice languages I'd pick over
> Python (C#, Java, C++).

My main goals of this thread are:
1. Allowing people to quickly write tests
2. Have those tests do what the writer intended them to do
3. Have good error reporting by default

Those goals indeed don't necessitate Python.

But I think those are really hard to achieve with any C based
framework, and probably with C++ too. Also manual memory management in
tests seems to add tons of complexity for basically no benefit.

I think C#, Java, Go, Rust, Kotlin, and Swift would be acceptable
choices for me (and possibly some more). They allow some type of
introspection, they have a garbage collector, and their general
tooling is quite good.

But I think a dynamically typed scripting language is much more
fitting for writing tests like this. I love static typing for
production code, but imho it really doesn't have much benefit for
tests.

As scripting languages go, the ones that are still fairly heavily in
use are JavaScript, Python, Ruby, and PHP. I think all of those could
probably work, but my personal order of preference would be Python,
Ruby, JavaScript, PHP.

Finally, I'm definitely biased towards using Python myself. But I
think there are good reasons for that:
1. In the data space, which Postgres is in, Python is very heavily used for analysis
2. Everyone coming out of university these days knows it to some extent
3. Multiple people in the community have been writing Postgres related
tests in python and have enjoyed doing so (me[1], Jacob[2],
Alexander[3])

What language people want to write tests in is obviously very
subjective. And obviously we cannot allow every language for writing
tests. But I think if ~25% of the community prefers to write their
tests in Python. Then that should be enough of a reason to allow them
to do so.

TO CLARIFY: This thread is not a proposal to replace Perl with Python.
It's a proposal to allow people to also write tests in Python.

> I also want the initial scope to be the new language coexisting with the
> existing Perl tests.  If a bulk translation ever happens, it should happen
> long after the debut of the new framework.  That said, I don't much trust a
> human-written bulk language translation to go through without some tests
> accidentally ceasing to test what they test in Perl today.

I definitely don't think we should rewrite all the tests that we have
in Perl today into some other language. But I do think that whatever
language we choose, that language should make it as least as easy to
write tests, as easy to read them and as easy to see that they are
testing the intended thing, as is currently the case for Perl.
Rewriting a few Perl tests into the new language, even if not merging
the rewrite, is a good way of validating that imho.

PS. For PgBouncer I actually hand-rewrote all the tests that we had in
bash (which is the worst testing language ever) in Python and doing so
actually found more bugs in PgBouncer code that our bash tests
wouldn't catch. So it's not necessarily the case that you lose
coverage by rewriting tests.

[1]: https://github.com/pgbouncer/pgbouncer/tree/master/test
[2]: https://github.com/jchampio/pg-pytest-suite
[3]: https://github.com/postgrespro/testgres




Re: Re: Re: Add support to TLS 1.3 cipher suites and curves lists

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 04:32, Erica Zhang  wrote:
> There are certain government, financial and other enterprise organizations 
> that have very strict requirements about the encrypted communication and more 
> specifically about fine grained params like the TLS ciphers and curves that 
> they use. The default ones for those customers are not acceptable. Any 
> products that integrate Postgres and requires encrypted communication with 
> the Postgres would have to fulfil those requirements.

Yeah, I ran into such requirements before too. So I do think it makes
sense to have such a feature in Postgres.

> So if we can have this patch in the upcoming new major version, that means 
> Postgres users who have similar requirements can upgrade to PG17.

As Daniel mentioned you can already achieve the same using the
"Ciphersuites" directive in openssl.conf. Also you could of course
always disable TLSv1.3 support.
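
For reference, this is roughly what that looks like in openssl.cnf (the
section names are arbitrary and the file location depends on the
distribution; the openssl_conf line has to be in the unnamed default
section at the top of the file):

openssl_conf = default_conf

[default_conf]
ssl_conf = ssl_sect

[ssl_sect]
system_default = system_default_sect

[system_default_sect]
Ciphersuites = TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256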




Re: Add support to TLS 1.3 cipher suites and curves lists

2024-06-12 Thread Jelte Fennema-Nio
On Mon, 10 Jun 2024 at 12:31, Daniel Gustafsson  wrote:
> Regarding the ciphersuites portion of the patch.  I'm not particularly 
> thrilled
> about having a GUC for TLSv1.2 ciphers and one for TLSv1.3 ciphersuites, users
> not all that familiar with TLS will likely find it confusing to figure out 
> what
> to do.

I don't think it's easy to create a single GUC because OpenSSL has
different APIs for both. So we'd have to add some custom parsing for
the combined string, which is likely to cause some problems imho. I
think separating them is the best option from the options we have, and
I don't think it matters much in practice for users. Users not familiar
with TLS might indeed be confused, but those users shouldn't touch
these settings anyway, and just use the defaults. The users that care
about this probably already get two cipher strings from their
compliance teams, because many other applications also have two
separate options for specifying both.




Re: Partial aggregates pushdown

2024-06-12 Thread Jelte Fennema-Nio
On Wed, 12 Jun 2024 at 07:27, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Could you please clarify what you mean?
> Are you referring to:
>   Option 1: Modifying existing aggregate functions to minimize the use of 
> internal state values.
>   Option 2: Not supporting the push down of partial aggregates for functions 
> with internal state values.

Basically I mean both Option 1 and Option 2 together, i.e. once we do
Option 1 and support partial aggregate pushdown for all important
aggregates that currently have internal state values, supporting
pushdown of internal state values themselves becomes unnecessary.

> There are many aggregate functions with internal state values, so if we go 
> with Option 1,
> we might need to change a lot of existing code, like transition functions and 
> finalize functions.
> Also, I'm not sure how many existing aggregate functions can be modified this 
> way.

There are indeed 57 aggregate functions with internal state values:

> SELECT count(*) from pg_aggregate where aggtranstype = 'internal'::regtype;
 count
───────
    57
(1 row)

But there are only 26 different aggtransfns. And the most used 8 of
those cover 39 of those 57 aggregates.

> SELECT
      sum(count) OVER (ORDER BY count DESC, aggtransfn
                       ROWS BETWEEN unbounded preceding AND current row) AS cumulative_count,
      *
  FROM (
      SELECT count(*), aggtransfn
      FROM pg_aggregate
      WHERE aggtranstype = 'internal'::regtype
      GROUP BY aggtransfn
      ORDER BY count(*) DESC, aggtransfn
  ) agg;

 cumulative_count │ count │               aggtransfn
──────────────────┼───────┼─────────────────────────────────────────
                7 │     7 │ ordered_set_transition
               13 │     6 │ numeric_accum
               19 │     6 │ int2_accum
               25 │     6 │ int4_accum
               31 │     6 │ int8_accum
               35 │     4 │ ordered_set_transition_multi
               37 │     2 │ int8_avg_accum
               39 │     2 │ numeric_avg_accum
               40 │     1 │ array_agg_transfn
               41 │     1 │ json_agg_transfn
               42 │     1 │ json_object_agg_transfn
               43 │     1 │ jsonb_agg_transfn
               44 │     1 │ jsonb_object_agg_transfn
               45 │     1 │ string_agg_transfn
               46 │     1 │ bytea_string_agg_transfn
               47 │     1 │ array_agg_array_transfn
               48 │     1 │ range_agg_transfn
               49 │     1 │ multirange_agg_transfn
               50 │     1 │ json_agg_strict_transfn
               51 │     1 │ json_object_agg_strict_transfn
               52 │     1 │ json_object_agg_unique_transfn
               53 │     1 │ json_object_agg_unique_strict_transfn
               54 │     1 │ jsonb_agg_strict_transfn
               55 │     1 │ jsonb_object_agg_strict_transfn
               56 │     1 │ jsonb_object_agg_unique_transfn
               57 │     1 │ jsonb_object_agg_unique_strict_transfn
(26 rows)


And actually most of those don't have a serialfn, so they wouldn't be
supported by your suggested approach either. Looking at the
distribution of aggserialfns instead we see the following:

> SELECT
      sum(count) OVER (ORDER BY count DESC, aggserialfn
                       ROWS BETWEEN unbounded preceding AND current row) AS cumulative_count,
      *
  FROM (
      SELECT count(*), aggserialfn
      FROM pg_aggregate
      WHERE aggtranstype = 'internal'::regtype
        AND aggserialfn != 0
      GROUP BY aggserialfn
      ORDER BY count(*) DESC, aggserialfn
  ) agg;

 cumulative_count │ count │        aggserialfn
──────────────────┼───────┼────────────────────────────
               12 │    12 │ numeric_serialize
               24 │    12 │ numeric_poly_serialize
               26 │     2 │ numeric_avg_serialize
               28 │     2 │ int8_avg_serialize
               30 │     2 │ string_agg_serialize
               31 │     1 │ array_agg_serialize
               32 │     1 │ array_agg_array_serialize
(7 rows)

So there are only 7 aggserialfns, and thus at most 7 new postgres
types that you would need to create to support the same aggregates as
in your current proposal. But looking at the implementations of these
serialize functions even that is an over-estimation: numeric_serialize
and numeric_avg_serialize both serialize a NumericAggState, and
numeric_poly_serialize and int8_avg_serialize both serialize a
PolyNumAggState. So probably we could even make do with only 5 types. And
to be clear: only converting PolyNumAggState and NumericAggState to
actual postgres types would already cover 28 out of the 32 aggregates.
That seems quite feasible to do.

So I agree it's probably more code than your current approach. At the
very least because you would need to implement in/out text
serialization functions for these internal types that currently don't
have them. But I do think it would be quite a feasible amount. And to
clarify, I see a few benefits of using the approach that I'm
proposing:

1. So far aggserialfn and 

Re: Partial aggregates pushdown

2024-06-11 Thread Jelte Fennema-Nio
On Tue, 11 Jun 2024 at 13:40, fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> > From: Bruce Momjian 
> > So, we need import/export text representation for the partial aggregate 
> > mode for these eight, and call the base data type
> > text import/export functions for the zero ones when in this mode?
>
> I think that you are basically right.
> But, I think, in a perfect world we should also add an import/export function 
> for the following
> two category.
>
>   Category1. Validation Chek is needed for Safety.
> For example, I think a validation check is needed for avg(float4),
>  whose transition type is not internal. (See p.18 in [1])
> I plan to add import functions of avg, count (See p.18, p.19 in [1]).
>   Category1. Transition type is a pseudo data type.
> Aggregate functions of this category needs to accept many actual data 
> types,
> including user-defined types. So I think that it is hard to implement 
> import/export functions.
> Consequently, I do not plan to support these category. (See p.19 in [1])

How about, instead of trying to serialize the output of
serialfn/deserialfn, we don't use the "internal" type and instead
create actual types in pg_type for these transtypes? Then we can
simply use the in/out and recv/send functions of those types to
serialize the values of the partial aggregate over the network.
Instead of having to rely on serialfn/deserialfn to be network-safe
(which they probably aren't).

That indeed still leaves the pseudo types. Since none of those
pseudotypes have a working in/recv function (they always error by
definition), I agree that we can simply not support those.

Basically that would mean that any aggregate with a non-internal and
non-pseudotype as a transtype could be used in this multi-node partial
aggregate pushdown.




Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-11 Thread Jelte Fennema-Nio
On Tue, 11 Jun 2024 at 11:54, Ashutosh Sharma  wrote:
> 1) Extends the CREATE EXTENSION command to support a new option, SET
> SEARCH_PATH.


I don't think it makes sense to add such an option to CREATE EXTENSION.
I feel like such a thing should be part of the extension control file
instead. That way the extension author controls the search path, not
the person that installs the extension.




Re: RFC: adding pytest as a supported test framework

2024-06-10 Thread Jelte Fennema-Nio
On Mon, 10 Jun 2024 at 22:47, Andrew Dunstan  wrote:
> As for what up and coming developers learn, they mostly don't learn C either, 
> and that's far more critical to what we do.

I think many up and coming devs have at least touched C somewhere
(e.g. in university). And because it's more critical to the project
and also to many other low level projects, they don't mind learning it
so much if they don't know it yet. But I, for example, try to write as
few Perl tests as possible, because getting good at Perl has pretty
much no use to me outside of writing tests for postgres.

(I do personally think that official Rust support in Postgres would
probably be a good thing, but that is a whole other discussion that
I'd like to save for some other day)

> But let's not throw the baby out with the bathwater. Quite apart from 
> anything else, a wholesale rework of the test infrastructure would make 
> backpatching more painful.

Backporting test improvements to decrease backporting pain is
something we don't look badly upon afaict (Citus's test suite breaks
semi-regularly on minor PG version updates due to some slight output
changes introduced by e.g. an updated version of the isolationtester).




Re: RFC: adding pytest as a supported test framework

2024-06-10 Thread Jelte Fennema-Nio
On Mon, 10 Jun 2024 at 20:46, Jacob Champion
 wrote:
> For the v18 cycle, I would like to try to get pytest [1] in as a
> supported test driver, in addition to the current offerings.

Huge +1 from me (but I'm definitely biased here)

> Thoughts? Suggestions?

I think the most important thing is that we make it easy for people to
use this thing, and use it in a "correct" way. I have met very few
people that actually like writing tests, so I think it's very
important to make the barrier to do so as low as possible.

For the PgBouncer repo I created my own pytest-based test suite
~1.5 years ago now. I tried to make it as easy as possible to write
tests there, and it has worked out quite well imho. I don't think it
makes sense to copy all things I did there verbatim, because some of
it is quite specific to testing PgBouncer. But I do think there's
quite a few things that could probably be copied (or at least inspire
what you do). Some examples:

1. helpers to easily run shell commands, most importantly setting
check=True by default[1]
2. helper to get a free tcp port[2]
3. helper to check if the log contains a specific string[3]
4. automatically show PG logs on test failure[4]
5. helpers to easily run sql commands (psycopg interface isn't very
user friendly imho for the common case)[5]
6. startup/teardown cleanup logic[6]

[1]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L83-L131
[2]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L210-L233
[3]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L1125-L1143
[4]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L1075-L1103
[5]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L326-L338
[6]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L546-L642
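
To give an idea of how small such helpers can be, here's a rough sketch
of the general shape of 1 and 3 (not a copy of the PgBouncer code):

import subprocess

def run(command, **kwargs):
    # Unlike plain subprocess.run, fail loudly by default: a non-zero
    # exit code raises and thereby fails the test, instead of being
    # silently ignored.
    kwargs.setdefault("check", True)
    kwargs.setdefault("shell", isinstance(command, str))
    return subprocess.run(command, **kwargs)

def log_contains(logfile, needle):
    # Check whether the (server) log contains a specific string.
    with open(logfile, encoding="utf-8", errors="replace") as f:
        return needle in f.read()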


On Mon, 10 Jun 2024 at 22:04, Andres Freund  wrote:
> > Problem 1 (rerun failing tests): One architectural roadblock to this
> > in our Test::More suite is that tests depend on setup that's done by
> > previous tests. pytest allows you to declare each test's setup
> > requirements via pytest fixtures, letting the test runner build up the
> > world exactly as it needs to be for a single isolated test. These
> > fixtures may be given a "scope" so that multiple tests may share the
> > same setup for performance or other reasons.
>
> OTOH, that's quite likely to increase overall test times very
> significantly. Yes, sometimes that can be avoided with careful use of various
> features, but often that's hard, and IME is rarely done rigiorously.

You definitely want to cache things like initdb and "pg_ctl start".
But that's fairly easy to do with some startup/teardown logic. For
PgBouncer I create a dedicated schema for each test that needs to
create objects and automatically drop that schema at the end of the
test[6] (including any temporary objects outside of schemas like
users/replication slots). You can even choose not to clean up certain
large schemas if they are shared across multiple tests.

[6]: 
https://github.com/pgbouncer/pgbouncer/blob/3f791020fb017c570fcd2db390600a353f1cba0c/test/utils.py#L546-L642
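
A rough sketch of what that caching can look like with pytest fixtures
(start_postgres() and the pg.sql() helper are made up for illustration;
this is not the PgBouncer code nor a concrete proposal):

from uuid import uuid4
import pytest

@pytest.fixture(scope="session")
def pg(tmp_path_factory):
    # initdb + pg_ctl start happen only once per test session; every
    # test reuses the same running cluster.
    cluster = start_postgres(tmp_path_factory.mktemp("pgdata"))
    yield cluster
    cluster.stop()

@pytest.fixture
def test_schema(pg):
    # Each test gets its own schema, which is dropped afterwards, so a
    # test can create objects freely without per-object cleanup.
    name = f"test_{uuid4().hex[:10]}"
    pg.sql(f"CREATE SCHEMA {name}")
    yield name
    pg.sql(f"DROP SCHEMA {name} CASCADE")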

> > Problem 2 (seeing what failed): pytest does this via assertion
> > introspection and very detailed failure reporting. If you haven't seen
> > this before, take a look at the pytest homepage [1]; there's an
> > example of a full log.
>
> That's not really different than what the perl tap test stuff allows. We
> indeed are bad at utilizing it, but I'm not sure that switching languages will
> change that.

It's not about allowing, it's about doing the thing that you want by
default. The following code

assert a == b

will show you the actual values of both a and b when the test fails,
instead of saying something like "false is not true". Of course you can
provide a message here too, like with Perl's ok() function, but even
when you don't, the output is helpful.

> I think part of the problem is that the information about what precisely
> failed is often much harder to collect when testing multiple servers
> interacting than when doing localized unit tests.
>
> I think we ought to invest a bunch in improving that, I'd hope that a lot of
> that work would be largely independent of the language the tests are written
> in.

Well, as you already noted no-one that started doing dev stuff in the
last 10 years knows Perl nor wants to learn it. So a large part of the
community tries to touch the current perl test suite as little as
possible. I personally haven't tried to improve anything about our
perl testing framework, even though I'm normally very much into
improving developer tooling.


> > Python's standard library has lots of power by itself, with very good
> > documentation. And virtualenvs and better 

Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-06-06 Thread Jelte Fennema-Nio
On Thu, 6 Jun 2024 at 20:10, Isaac Morland  wrote:
>
> On Thu, 6 Jun 2024 at 12:53, Jeff Davis  wrote:
>
>>
>> > I didn't get you completely here. w.r.t extensions how will this have
>> > an impact if we set the search_path for definer functions.
>>
>> If we only set the search path for SECURITY DEFINER functions, I don't
>> think that solves the whole problem.
>
>
> Indeed. While the ability for a caller to set the search_path for a security 
> definer functions introduces security problems that are different than for 
> security invoker functions, it's still weird for the behaviour of a function 
> to depend on the caller's search_path. It’s even weirder for the default 
> search path behaviour to be different depending on whether or not the 
> function is security definer.

+1

And +1 to the general idea and direction this thread is going in. I
definitely think we should be making extensions more secure by
default, and this is an important piece of it.

Even by default making the search_path "pg_catalog, pg_temp" for
functions created by extensions would be very useful.




Re: ssl tests fail due to TCP port conflict

2024-06-06 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 23:37, Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > On 2024-06-05 We 16:00, Alexander Lakhin wrote:
> >> That is, psql from the test instance 001_ssltests_34 opened a
> >> connection to
> >> the test server with the client port 50072 and it made using the port by
> >> the server from the test instance 001_ssltests_30 impossible.
>
> > Oh. (kicks self)
>
> D'oh.
>
> > Should we really be allocating ephemeral server ports in the range
> > 41952..65535? Maybe we should be looking for an unallocated port
> > somewhere below 41952, and above, say, 32767, so we couldn't have a
> > client socket collision.
>
> Hmm, are there really any standards about how these port numbers
> are used?
>
> I wonder if we don't need to just be prepared to retry the whole
> thing a few times.  Even if it's true that "clients" shouldn't
> choose ports below 41952, we still have a small chance of failure
> against a non-Postgres server starting up at the wrong time.

My suggestion would be to not touch the ephemeral port range at all
for these ports. In practice the ephemeral port range is used for
cases where the operating system assigns the port, and the application
doesn't care what it is. It is not meant for when you want to get a
free port, but want to know in advance which one it is.

For the PgBouncer test suite we do something similar to what the PG
Perl tests do, but there we allocate a port between 10200 and 32768:
https://github.com/pgbouncer/pgbouncer/blob/master/test/utils.py#L192-L215

Sure theoretically it's possible to hit a rare case where another
server starts up at the wrong time, but that chance seems way lower
than a client starting up at the wrong time. Especially since there
aren't many servers that use a port with 5 digits.
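
As an illustration of that approach, a simplified sketch (not the
actual PgBouncer helper) of handing out ports from a fixed range below
the ephemeral range:

import socket

def get_free_port(start=10200, end=32768):
    # Walk a fixed range below the ephemeral port range and return the
    # first port we can bind; the OS normally won't hand these out to
    # outgoing client connections. There's still a tiny race until the
    # server actually binds it, but no client socket can grab it by
    # accident.
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue
            return port
    raise RuntimeError("no free port found")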

Attached is a patch that updates the port numbers.


v1-0001-Don-t-use-ephemeral-port-range.patch
Description: Binary data


Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-06 Thread Jelte Fennema-Nio
On Thu, 6 Jun 2024 at 18:01, Robert Haas  wrote:
> As I see it, the issue here is whether the default value would ever be
> different from the latest value. If not, then using blank to mean the
> latest seems fine, but if so, then I'd expect blank to mean the
> default version and latest to mean the latest version.

Alright, that's fair. And we already seem to follow that pattern:
There's currently no connection option that has a default that's not
the empty string, but still accepts the empty string as an argument.

> > I'll look into adding min_protocol_version to the patchset soonish.
> > Some review of the existing code in the first few patches would
> > definitely be appreciated.
>
> Yeah, I understand, and I do want to do that, but keep in mind I've
> already spent considerable time on this patch set, way more than most
> others, and if I want to get it committed I'm nowhere close to being
> done. It's probably multiple weeks of additional work for me, and I
> think I've probably already spent close to a week on this, and I only
> work ~48 weeks a year, and there are ~300 patches in the CommitFest.

I very much appreciate the time you spent on this patchset so far. I
mainly thought that instead of only discussing the more complex parts
of the patchset, it would be nice to also actually move forward a
little bit too. And the first 3 patches in this patchset are very
small and imho straightforward improvements.

To be clear, I'm not saying that should be all on you. I think those
first three patches can be reviewed by pretty much anyone.

> Plus, right now there is no possibility of actually committing
> anything until after we branch.

Totally fair, but even a LGTM on one of the patches would be quite nice.

> And, respectfully, I feel like there
> has to be some give and take here. I've been trying to give this patch
> set higher priority because it's in an area that I know something
> about and have opinions about and also because I can tell that you're
> kind of frustrated and I don't want you to leave the development
> community.

Thank you for giving it a higher priority, it's definitely appreciated
and noticed.

> But, at the same time, I don't think you've done much to
> help me get my patches committed, and while you have done some review
> of other people's patches, it doesn't seem to often be the kind of
> detailed, line-by-line review that is needed to get most patches
> committed. So I'm curious how you expect this system to scale.

Of course there's always the possibility to review more. But I don't
really agree with this summary of my review activity. I did see your
patches related to the incremental backup stuff. They looked
interesting, but at the time from an outside perspective it didn't
seem like those threads needed my reviews to progress (a bunch of
people more knowledgable on the topic were already responding). So I
spent my time mainly on threads where I felt I could add something
useful, and often that was more on the design front than the exact
code. Honestly that's what triggered this whole patchset in the first
place: Adding infrastructure for protocol changes so that the several
other threads that try to introduce protocol changes can actually move
forward, instead of being in limbo forever.

Regarding line-by-line reviews, imho I definitely do that for the
smaller patches I tend to review (even if they are part of a bigger
patchset). But the bigger ones I don't think line-by-line reviews are
super helpful at the start, so I generally comment more on the design
in those cases.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-06 Thread Jelte Fennema-Nio
On Thu, 6 Jun 2024 at 03:03, Robert Haas  wrote:
> This makes some sense to me. I don't think that I believe
> max_protocol_version=latest is future-proof: just because I want to
> opt into this round of new features doesn't mean I feel the same way
> about the next round. But it may still be a good idea.

I think for most people the only reason not to opt-in to improvements
(even if they are small) is if those improvements break something
else. Once the NegotiateProtocolVersion message is implemented
everywhere in the ecosystem, nothing should break when going from e.g.
3.2 to 3.4. So for the majority of the people I think
max_protocol_version=latest is what they'll want to use once the
ecosystem has caught up. Of course there will be people that want
tight control, but they can set max_protocol_version=3.2 instead.

> I suppose the semantics are that we try to connect with the version
> specified by max_protocol_version and, if we get downgraded by the
> server, we abort if we end up below min_protocol_version.

Correct

> I like those
> semantics, and I think I also like having both parameters, but I'm not
> 100% sure of the naming. It's a funny use of "max" and "min", because
> the max is really what we're trying to do and the min is what we end
> up with, and those terms don't necessarily bring those ideas to mind.
> I don't have a better idea off-hand, though.

I borrowed this terminology from the ssl_min_protocol_version and
ssl_max_protocol_version connection options that we already have.
Those basically have the same semantics as what I'm proposing here,
but for the TLS protocol version instead of the Postgres protocol
version. I'm also not a huge fan of the min_protocol_version and
max_protocol_version names, but staying consistent with existing
options seems quite nice.

Looking at ssl_max_protocol_version closer though, to stay really
consistent I'd have to change "latest" to be renamed to empty string
(i.e. there is no max_protocol_version). I think I might prefer
staying consistent over introducing an imho slightly clearer name.
Another way to stay consistent would of course be also adding "latest"
as an option to ssl_max_protocol_version? What do you think?

I'll look into adding min_protocol_version to the patchset soonish.
Some review of the existing code in the first few patches would
definitely be appreciated.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 22:48, Robert Haas  wrote:
> I agree that we need such a mechanism, but if the driving feature is
> cancel-key length, I expect that opt-in isn't going to work very well.
> Opt-in seems like it would work well with compression or transparent
> column encryption, but few users will specify a connection string
> option just to get a longer cancel key length, so the feature won't
> get much use if we do it that way.

I know Neon wants to make use of this for their proxy (to encode some
tenant_id into the key). So they might want to require people to
opt-in when using their proxy.

> I won't be crushed if we decide to
> somehow make it opt-in, but I kind of doubt that will happen.

> Would we
> make everyone add longcancelkey=true to all their connection strings
> for one release and then carry that connection parameter until the
> heat death of the universe even though after the 1-release transition
> period there would be no reason to ever use it? Would we rip the
> parameter back out after the transition period and break existing
> connection strings? Would we have people write protocolversion=3.1 to
> opt in and then they could just keep that in the connection string
> without harm, or at least without harm until 3.2 comes out? I don't
> really like any of these options that well.

I agree longcancelkey=true is not what we want. In my patch 0004, you
can specify max_protocol_version=latest to use the latest protocol
version as opt-in. This is a future-proof version of
protocolversion=3.1 that you're proposing, because it will
automatically start using 3.2 when it comes out. So I think that
solves your concern here. (although maybe it should be called
latest-3.x or something, in case we ever want to add a 4.0 protocol,
naming is hard)

I personally quite like the max_protocol_version connection parameter.
I think even for testing it is pretty useful to tell libpq what
protocol version to try to connect as. It could even be accompanied
with a min_protocol_version, e.g. in case you only want the connection
attempt to fail when the server does not support this more secure
cancel key.
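
To make that concrete, with the options proposed in this patchset a
connection string could hypothetically look like this, meaning: try the
newest protocol the server supports, but fail the connection attempt if
we would end up below 3.2 (and thus without the longer cancel key):

host=myhost dbname=mydb max_protocol_version=latest min_protocol_version=3.2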




Re: [multithreading] extension compatibility

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 22:05, Robert Haas  wrote:
> The attached patch is a sketch of one possible approach: PostgreSQL
> signals whether it is multithreaded by defining or not defining
> PG_MULTITHREADING in pg_config_manual.h, and an extension signals
> thread-readiness by defining PG_THREADSAFE_EXTENSION before including
> any PostgreSQL headers other than postgres.h.

My first gut-reaction: It seems kinda annoying to have to do this for
every C file that you use to build your extension, e.g. citus or
postgis have a ton of those.

PG_MODULE_MAGIC seems like a better fit imho.

If we really want a compile time failure, then I think I'd prefer to
have a new postgres.h file (e.g. postgres-thread.h) that you would
include instead of plain postgres.h. Basically this file could then
contain the below two lines:

#include "postgres.h"
#define PG_THREADSAFE_EXTENSION 1




Re: Extension security improvement: Add support for extensions with an owned schema

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 19:53, Jeff Davis  wrote:
> Is this orthogonal to relocatability?

It's fairly orthogonal, but it has some impact on relocatability: You
can only relocate to a schema name that does not exist yet (while
currently you can only relocate to a schema that already exists). This
is because, for owned_schema=true, relocation is not actually changing
the schema of the extension objects, it only renames the existing
schema to the new name.

> When you say "an easy way to use a safe search_path": the CREATE
> EXTENSION command already sets the search_path, so your patch just
> ensures that it's empty (and therefore safe) first, right?

Correct: **safe** is the key word in that sentence. Without
owned_schema, you get an **unsafe** search_path by default unless you
go out of your way to set "schema=pg_catalog" in the control file.

> Should we go further and try to prevent creating objects in an
> extension-owned schema with normal SQL?

That would be nice for sure, but security wise it doesn't matter
afaict. Only the creator of the extension would be able to add stuff
in the extension-owned schema anyway, so there's no privilege
escalation concern there.

> Approximately how many extensions should be adjusted to use
> owned_schema=true?

Adjusting existing extensions would be hard at the moment, because the
current patch does not introduce a migration path. But basically I
think for most new extension installs (even of existing extensions) it
would be fine if owned_schema=true would be the default. I didn't
propose (yet) to make it the default though, to avoid discussing the
tradeoff of security vs breaking installation for an unknown amount of
existing extensions.

I think having a generic migration path would be hard, due to the many
ways in which extensions can now be installed. But I think we might be
able to add one fairly easily for relocatable extensions: e.g. "ALTER
EXTENSION SET SCHEMA new_schema OWNED_SCHEMA", which would essentially
do CREATE SCHEMA new_schema + move all objects from old_schema to
new_schema. And even for non-relocatable one you could do something
like:

CREATE SCHEMA temp_schema_{random_id};
-- move all objects from ext_schema to temp_schema_{random_id};
DROP SCHEMA ext_schema; -- if this fails, ext_schema was not empty
ALTER SCHEMA temp_schema_{random_id} RENAME TO ext_schema;

> What are the reasons an extension would not want to
> own the schema in which the objects are created? I assume some would
> still create objects in pg_catalog, but ideally we'd come up with a
> better solution to that as well.

Some extensions depend on putting stuff into the public schema. But
yeah it would be best if they didn't.

> This protects the extension script, but I'm left wondering if we could
> do something here to make it easier to protect extension functions
> called from outside the extension script, also. It would be nice if we
> could implicitly tack on a "SET search_path TO @extschema@, pg_catalog,
> pg_temp" to each function in the extension. I'm not proposing that, but
> perhaps a related idea might work. Probably outside the scope of your
> proposal.

Yeah, this proposal definitely doesn't solve all security problems
with extensions. And indeed what you're proposing would solve another
major issue, another option would be to default to the "safe"
search_path that you proposed a while back. But yeah I agree that it's
outside of the scope of this proposal. I feel like if we try to solve
every security problem at once, probably nothing gets solved instead.
That's why I tried to keep this proposal very targeted, i.e. have this
be step 1 of an N step plan to make extensions more secure by default.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-06-05 Thread Jelte Fennema-Nio
On Wed, 5 Jun 2024 at 17:12, Robert Haas  wrote:
> Well, hang on. The discussion on that thread suggests that this is
> only going to break connections to servers that don't have
> NegotiateProtocolVersion.

Yes, correct.

> What
> I really don't want is for v18 to break connections to v17 servers.
> That would be exponentially more disruptive.

Totally agreed, and that's easily achievable because of the
NegotiateProtocolVersion message. We're all good on v18 to v17
connections and protocol changes, as long as we make any new behaviour
depend on either a protocol parameter or a protocol version bump.

> I do take your point that poolers haven't added support for
> NegotiateProtocolVersion yet, but I bet that will change pretty
> quickly once there's a benefit to doing so.

I think at minimum we'll need a mechanism for people to connect using
a StartupMessage that doesn't break existing poolers. If users have no
way to connect at all to their semi-recently installed connection
pooler with the newest libpq, then I expect we'll have many unhappy
users. I think it's debatable whether the compatible StartupMessage
should be opt-in or opt-out. I'd personally at minimum want to wait
one PG release before using the incompatible StartupMessage by
default, just to give pooler installs some time to catch up.

> I think it's a lot easier
> for people to install a new pooler version than a new server version,
> because the server has the data in it. Also, it's not like they
> haven't had time.

I agree that it's a lot easier to upgrade poolers than it is to
upgrade PG, but still people are generally hesitant to upgrade stuff
in their infrastructure. And I totally agree that poolers have had the
time to implement NegotiateProtocolVersion support, but I'm pretty
sure many users will assign blame to libpq anyway.

> But the real question here is whether we think the longer cancel key
> is really important enough to justify the partial compatibility break.
> I'm not deathly opposed to it, but I think it's debatable and I'm sure
> some people are going to be unhappy.

I think if it's an opt-in compatibility break, users won't care how
important it is. It's either important enough to opt-in to this
compatibility break for them, or it's not and nothing changes for
them.

My feeling is that we're unlikely to find a feature that's worth
breaking compatibility (with poolers and pre-9.3 PGs) by default on
its own. But after adding a few new protocol features, at some point
together these features become worth this break. Especially, because
by then 9.3 will be even older and poolers will have started to
support NegotiateProtocolVersion (due to people complaining that they
cannot connect with the new opt-in libpq break-compatibility flag).
But if we're going to wait until we have the one super important
protocol feature, then I don't see us moving forward.

To summarize: I'd like to make some relatively small opt-in change(s)
in PG18 that breaks compatibility (with poolers and pre-9.3 PGs). So
that when we have an important enough reason to break compatibility by
default, that break will be much less painful to do.




libpq: Trace StartupMessage/SSLRequest/GSSENCRequest correctly

2024-06-05 Thread Jelte Fennema-Nio
While working on my patchset for protocol changes I realized that the
StartupMessage/SSLRequest/GSSENCRequest was not shown correctly in the
tracing output of libpq. With this change these messages are now shown
correctly in the tracing output.

To test you can add a PQreset(conn) call to the start of the
test_cancel function in
src/test/modules/libpq_pipeline/libpq_pipeline.c.

And then run:
ninja -C build all install-quiet &&
build/src/test/modules/libpq_pipeline/libpq_pipeline cancel
'port=5432' -t test.trace

And then look at the top of test.trace
From 6fe11a1656e3bc85608e3d883848e42cd765e553 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Wed, 5 Jun 2024 12:07:09 +0200
Subject: [PATCH v1] libpq: Trace StartupMessage/SSLRequest/GSSENCRequest
 correctly

These messages would previously be logged in the tracing output as:
Unknown message: length %d

With this commit their type and contents are now correctly listed.
---
 src/interfaces/libpq/fe-trace.c | 47 -
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/src/interfaces/libpq/fe-trace.c b/src/interfaces/libpq/fe-trace.c
index d7a61ec9cc1..cb9648e5ceb 100644
--- a/src/interfaces/libpq/fe-trace.c
+++ b/src/interfaces/libpq/fe-trace.c
@@ -697,6 +697,7 @@ pqTraceOutputNoTypeByteMessage(PGconn *conn, const char *message)
 {
 	int			length;
 	int			logCursor = 0;
+	int			version = 0;
 
 	if ((conn->traceFlags & PQTRACE_SUPPRESS_TIMESTAMPS) == 0)
 	{
@@ -712,19 +713,41 @@ pqTraceOutputNoTypeByteMessage(PGconn *conn, const char *message)
 
 	fprintf(conn->Pfdebug, "F\t%d\t", length);
 
-	switch (length)
+	if (length < 8)
 	{
-		case 16:/* CancelRequest */
-			fprintf(conn->Pfdebug, "CancelRequest\t");
-			pqTraceOutputInt32(conn->Pfdebug, message, , false);
-			pqTraceOutputInt32(conn->Pfdebug, message, , false);
-			pqTraceOutputInt32(conn->Pfdebug, message, , false);
-			break;
-		case 8:	/* GSSENCRequest or SSLRequest */
-			/* These messages do not reach here. */
-		default:
-			fprintf(conn->Pfdebug, "Unknown message: length is %d", length);
-			break;
+		fprintf(conn->Pfdebug, "Unknown message: length is %d\n", length);
+		return;
+	}
+
+	memcpy(, message + logCursor, 4);
+	version = (int) pg_ntoh32(version);
+
+	if (version == CANCEL_REQUEST_CODE)
+	{
+		fprintf(conn->Pfdebug, "CancelRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+	}
+	else if (version == NEGOTIATE_SSL_CODE)
+	{
+		fprintf(conn->Pfdebug, "SSLRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+	}
+	else if (version == NEGOTIATE_GSS_CODE)
+	{
+		fprintf(conn->Pfdebug, "GSSENCRequest\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+	}
+	else
+	{
+		fprintf(conn->Pfdebug, "StartupMessage\t");
+		pqTraceOutputInt32(conn->Pfdebug, message, , false);
+		while (message[logCursor] != '\0')
+		{
+			pqTraceOutputString(conn->Pfdebug, message, , false);
+			pqTraceOutputString(conn->Pfdebug, message, , false);
+		}
 	}
 
 	fputc('\n', conn->Pfdebug);

base-commit: 7b71e5bbccd6c86bc12ba0124e7282cfb3aa3226
-- 
2.34.1



Extension security improvement: Add support for extensions with an owned schema

2024-06-01 Thread Jelte Fennema-Nio
Writing the sql migration scripts that are run by CREATE EXTENSION and
ALTER EXTENSION UPDATE is a security minefield for extension authors.
One big reason for this is that search_path is set to the schema of the
extension while running these scripts, and thus if a user with lower
privileges can create functions or operators in that schema they can do
all kinds of search_path confusion attacks if not every function and
operator that is used in the script is schema qualified. While doing
such schema qualification is possible, it relies on the author to never
make a mistake in any of the sql files. And sadly humans have a tendency
to make mistakes.

This patch adds a new "owned_schema" option to the extension control
file that can be set to true to indicate that this extension wants to
own the schema in which it is installed. What that means is that the
schema should not exist before creating the extension, and will be
created during extension creation. This thus gives the extension author
an easy way to use a safe search_path, while still allowing all objects
to be grouped together in a schema. The implementation also has the
pleasant side effect that the schema will be automatically dropped when
the extension is dropped.

One way in which certain extensions currently hack around the
non-existence of this feature is by using the approach that pg_cron
uses: Setting the schema to pg_catalog and running "CREATE SCHEMA
pg_cron" from within the extension script. While this works, it's
obviously a hack, and a big downside of it is that it doesn't allow
users to choose the schema name used by the extension.

PS. I have never added fields to pg_catalog tables before, so there's
a clear TODO in the pg_upgrade code related to that. If anyone has
some pointers for me to look at to address that one that would be
helpful, if not I'll probably figure it out myself. All other code is
in pretty finished state, although I'm considering if
AlterExtensionNamespace should maybe be split a bit somehow, because
owned_schema skips most of the code in that function.
From cca2fc5a820822a29d9bc3b810a811adb1d081e1 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio 
Date: Fri, 31 May 2024 02:04:31 -0700
Subject: [PATCH v1] Add support for extensions with an owned schema

Writing the sql migration scripts that are run by CREATE EXTENSION and
ALTER EXTENSION UPDATE are security minefields for extension authors.
One big reason for this is that search_path is set to the schema of the
extension while running these scripts, and thus if a user with lower
privileges can create functions or operators in that schema they can do
all kinds of search_path confusion attacks if not every function and
operator that is used in the script is schema qualified. While doing
such schema qualification is possible, it relies on the author to never
make a mistake in any of the sql files. And sadly humans have a tendency
to make mistakes.

This patch adds a new "owned_schema" option to the extension control
file that can be set to true to indicate that this extension wants to
own the schema in which it is installed. What that means is that the
schema should not exist before creating the extension, and will be
created during extension creation. This thus gives the extension author
an easy way to use a safe search_path, while still allowing all objects
to be grouped together in a schema. The implementation also has the
pleasant side effect that the schema will be automatically dropped when
the extension is dropped.
---
 doc/src/sgml/extend.sgml  |  13 ++
 doc/src/sgml/ref/create_extension.sgml|   2 +-
 src/backend/commands/extension.c  | 139 +-
 src/backend/utils/adt/pg_upgrade_support.c|   1 +
 src/include/catalog/pg_extension.h|   1 +
 src/include/commands/extension.h  |   4 +-
 .../expected/test_extensions.out  |  50 +++
 src/test/modules/test_extensions/meson.build  |   4 +
 .../test_extensions/sql/test_extensions.sql   |  27 
 .../test_ext_owned_schema--1.0.sql|   2 +
 .../test_ext_owned_schema.control |   5 +
 ...test_ext_owned_schema_relocatable--1.0.sql |   2 +
 .../test_ext_owned_schema_relocatable.control |   4 +
 13 files changed, 214 insertions(+), 40 deletions(-)
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema.control
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable--1.0.sql
 create mode 100644 src/test/modules/test_extensions/test_ext_owned_schema_relocatable.control

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 218940ee5ce..36dc692abef 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -809,6 +809,19 @@ RETURNS anycompatible AS ...
   
  
 
+ 
+  owned_schema (boolean)
+  

Re: In-placre persistance change of a relation

2024-05-28 Thread Jelte Fennema-Nio
On Fri, 24 May 2024 at 00:09, Kyotaro Horiguchi  wrote:
> Along with rebasing, I changed the interface of XLogFsyncFile() to
> return a boolean instead of an error message.

Two notes after looking at this quickly during the advanced patch
feedback session:

1. I would maybe split 0003 into two separate patches. One to make SET
UNLOGGED fast, which seems quite easy to do because no WAL is needed.
And then a follow up to make SET LOGGED fast, which does all the
XLOG_FPI stuff.
2. When wal_level = minimal, some WAL logging is still needed. The
pages that were changed since the last checkpoint still need to be made
available for crash recovery.




Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Jelte Fennema-Nio
On Mon, 27 May 2024 at 18:16, Ranier Vilela  wrote:
> Is it mandatory to call PQcancelFinish in case PQcancelCreate fails?


Yes, see the following line in the docs:

Note that when PQcancelCreate returns a non-null pointer, you must
call PQcancelFinish when you are finished with it, in order to dispose
of the structure and any associated memory blocks. This must be done
even if the cancel request failed or was abandoned.

Source: 
https://www.postgresql.org/docs/17/libpq-cancel.html#LIBPQ-PQCANCELCREATE

> IMO, I would expect problems with users.

This pattern is taken from regular connection creation and is really
common in libpq code so I don't expect issues. And even if there were
an issue, your v1 patch would not be nearly enough. Because you're
only freeing the connhost and addr field now in case of OOM. But there
are many more fields that would need to be freed.

>> And that function will free any
>> partially initialized PGconn correctly. This is also how
>> pqConnectOptions2 works.
>
> Well, I think that function *pqReleaseConnHost*, is incomplete.
> 1. Don't free connhost field;

ehm... it does free that afaict? It only doesn't set it to NULL. Which
indeed would be good to do, but not doing so doesn't cause any issues
with its current 2 usages afaict.

> 2. Don't free addr field;

That's what release_conn_addrinfo is for.

> 3. Leave nconnhost and naddr, non-zero.

I agree that it would be good to do that in pqReleaseConnHost and
release_conn_addrinfo. But also here, I don't see any issues caused by
not doing that currently.

So overall I agree pqReleaseConnHost and release_conn_addrinfo can be
improved for easier safe usage in the future, but I don't think those
improvements should be grouped into the same commit with an actual
bugfix.




Re: Fix calloc check if oom (PQcancelCreate)

2024-05-27 Thread Jelte Fennema-Nio
On Mon, 27 May 2024 at 16:06, Ranier Vilela  wrote:
> Em seg., 27 de mai. de 2024 às 10:23, Daniel Gustafsson  
> escreveu:
>> > On 27 May 2024, at 14:25, Ranier Vilela  wrote:
>> > I think that commit 61461a3, left some oversight.
>> > The function *PQcancelCreate* fails in check,
>> > return of *calloc* function.
>> >
>> > Trivial fix is attached.
>>
>> Agreed, this looks like a copy/paste from the calloc calls a few lines up.
>
> Yeah.

Agreed, this was indeed a copy paste mistake


>> > But, IMO, I think that has more problems.
>> > If any allocation fails, all allocations must be cleared.
>> > Or is the current behavior acceptable?
>>
>> Since this is frontend library code I think we should free all the 
>> allocations
>> in case of OOM.
>
> Agreed.
>
> With v1 patch, it is handled.

I much prefer the original trivial patch to the v1. Even in case of
OOM users are expected to call PQcancelFinish on a non-NULL result,
which in turn calls freePGConn. And that function will free any
partially initialized PGconn correctly. This is also how
pqConnectOptions2 works.




Re: DISCARD ALL does not force re-planning of plpgsql functions/procedures

2024-05-27 Thread Jelte Fennema-Nio
On Sun, 26 May 2024 at 19:39, Tom Lane  wrote:
> Hm, should it be?  That's hard-won knowledge, and I'm not sure there
> is a good reason to believe it's no longer applicable.

Okay, so I looked into this a bit more and there's a similar case here
that's imho very clearly doing something incorrectly: num_custom_plans
of prepared statements is not reset when you change the search_path.
When the search_path is changed, there's no reason to assume that the
previous generic plans have any relevance to the new generic plans,
because the tables that are being accessed might be completely
different. See below for an (imho) obviously incorrect choice of using
the generic plan after changing search_path. Maybe the fix for this
issue should be that if a plan gets invalidated, then num_custom_plans
for the source of that plan should be set to zero too. So to be clear,
that means I now think that DISCARD PLANS should also reset
num_custom_plans (as opposed to what I said before).

create schema a;
create schema b;
create table a.test_mode (a int);
create table b.test_mode (a int);
insert into a.test_mode select 1 from generate_series(1,100) union
all select 2;
insert into b.test_mode select 2 from generate_series(1,100) union
all select 1;
create index on a.test_mode (a);
create index on b.test_mode (a);
analyze a.test_mode;
analyze b.test_mode;

SET search_path = a;
PREPARE test_mode_func(int) as select count(*) from test_mode where a = $1;

\timing on
-- trigger execution 5 times
EXECUTE test_mode_func(1);
EXECUTE test_mode_func(1);
EXECUTE test_mode_func(1);
EXECUTE test_mode_func(1);
EXECUTE test_mode_func(1);
-- slow because of bad plan, even after changing search_path
SET search_path = b;
EXECUTE test_mode_func(1);
\c
-- fast after re-connect, because of custom plan
SET search_path = a;
PREPARE test_mode_func(int) as select count(*) from test_mode where a = $1;
SET search_path = b;
EXECUTE test_mode_func(1);




Re: DISCARD ALL does not force re-planning of plpgsql functions/procedures

2024-05-26 Thread Jelte Fennema-Nio
On Sun, May 26, 2024, 12:26 Jelte Fennema-Nio  wrote:

> DISCARD PLANS should probably forget about it though indeed.
>

DISCARD PLANS should probably **not** forget about it


> > Note that any change in behavior there would affect prepared
> > statements in general, not only plpgsql.
>
> DISCARD ALL already removes all prepared statements and thus their run
> counts, so for prepared statements there would be no difference there.
>


Re: DISCARD ALL does not force re-planning of plpgsql functions/procedures

2024-05-26 Thread Jelte Fennema-Nio
On Sun, 26 May 2024 at 19:39, Tom Lane  wrote:
> Hm, should it be?  That's hard-won knowledge, and I'm not sure there
> is a good reason to believe it's no longer applicable.

I think for DISCARD ALL it would probably make sense to forget this
knowledge, since that is advertised as "reset the session to its initial
state". DISCARD PLANS should probably forget about it though indeed.

> Note that any change in behavior there would affect prepared
> statements in general, not only plpgsql.

DISCARD ALL already removes all prepared statements and thus their run
counts, so for prepared statements there would be no difference there.


DISCARD ALL does not force re-planning of plpgsql functions/procedures

2024-05-26 Thread Jelte Fennema-Nio
I got a report on the PgBouncer repo[1] that running DISCARD ALL was
not sufficient between connection handoffs to force replanning of
stored procedures. Turns out that while DISCARD ALL and DISCARD PLANS
reset the plan cache, they do not reset the num_custom_plans fields of
the existing PlanSources. So while the generic plan is re-planned
after DISCARD ALL, the decision on whether to choose it or not won't
be changed. See below for a minimally reproducing example:


create table test_mode (a int);
insert into test_mode select 1 from generate_series(1,100) union
all select 2;
create index on test_mode (a);
analyze test_mode;

create function test_mode_func(int)
returns integer as $$
declare
v_count integer;
begin
select into v_count count(*) from test_mode where a = $1;
return v_count;
END;
$$ language plpgsql;

\timing on
-- trigger execution 5 times
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
DISCARD ALL;
-- slow because of bad plan, even after DISCARD ALL
SELECT test_mode_func(2);
\c
-- fast after re-connect, because of custom plan
SELECT test_mode_func(2);



[1]: https://github.com/pgbouncer/pgbouncer/issues/912#issuecomment-2131250109




Re: Volatile write caches on macOS and Windows, redux

2024-05-25 Thread Jelte Fennema-Nio
On Tue, 18 Jul 2023 at 05:29, Thomas Munro  wrote:
> It's possible that fcntl(F_FULLFSYNC) might fail with ENOSUPP or other
> errors in obscure cases (eg unusual file systems).  In that case, you
> could manually lower fsync to just "on" and do your own research on
> whether power loss can toast your database, but that doesn't seem like
> a reason for us not to ship good solid defaults for typical users.

Is this the only reason why you're suggesting adding fsync=full,
instead of simply always setting F_FULLFSYNC when fsync=true on MacOS?
If so, I'm not sure we really gain anything by this tri-state. I think
people either care about data loss on power loss, or they don't. I
doubt many people want this third intermediate option, which afaict
basically means losing data on power loss less often than with
fsync=false, but still losing data most of the time.

If you're going to keep this tri-state for MacOS, then it still seems
nicer to me to "fix" fsync=true on MacOS and introduce a fsync=partial
or something. Then defaults are the same across platforms and anyone
setting fsync=yes currently in their postgresql.conf would get the
fixed behaviour on upgrade.
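
For completeness, this is the kind of call we're talking about (a
minimal sketch, not the actual patch; the ENOTSUP fallback is an
assumption on my part about which errors can occur):

#include <fcntl.h>
#include <unistd.h>

/* Flush a file all the way to stable storage on macOS; fall back to a
 * plain fsync() if the filesystem doesn't support F_FULLFSYNC. */
static int
flush_file_fully(int fd)
{
#if defined(F_FULLFSYNC)
    if (fcntl(fd, F_FULLFSYNC, 0) == 0)
        return 0;
    /* e.g. ENOTSUP on filesystems without full-fsync support */
#endif
    return fsync(fd);
}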




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-25 Thread Jelte Fennema-Nio
(pressed send too early)

On Sat, 25 May 2024 at 12:39, Jelte Fennema-Nio  wrote:
> But again if I'm alone in this, then I don't

... mind budging on this to move this decision along. Using _pq_.xxx
parameters for all protocol changes would totally be acceptable to me.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-25 Thread Jelte Fennema-Nio
On Thu, 23 May 2024 at 20:12, Tom Lane  wrote:
>
> Jelte Fennema-Nio  writes:
> > On Fri, 17 May 2024 at 21:24, Robert Haas  wrote:
> >> Perhaps these are all minority positions, but I can't tell you what
> >> everyone thinks, only what I think.
>
> > I'd love to hear some opinions from others on these design choices. So
> > far it seems like we're the only two that have opinions on this (which
> > seems hard to believe) and our opinions are clearly conflicting. And
> > above all I'd like to move forward with this, be it my way or yours
> > (although I'd prefer my way of course ;) )
>
> I got around to looking through this thread in preparation for next
> week's patch review session.  I have a couple of opinions to offer:
>
> 1. Protocol versions suck.  Bumping them is seldom a good answer,
> and should never be done if you have any finer-grained negotiation
> mechanism available.  My aversion to this is over thirty years old:
> I learned that lesson from watching the GIF87-to-GIF89 transition mess.
> Authors of GIF-writing tools tended to take the easy way out and write
> "GIF89" in the header whether they were actually using any of the new
> version's features or not.  This led to an awful lot of pictures that
> couldn't be read by available GIF-displaying tools, for no good reason
> whatsoever.  The PNG committee, a couple years later, reacted to that
> mess by designing PNG to have no version number whatsoever, and yet
> be extensible in a fine-grained way.  (Basically, a PNG file is made
> up of labeled chunks.  If a reader doesn't recognize a particular
> chunk code, it can still tell whether the chunk is "critical" or not,
> and thereby decide if it must give up or can proceed while ignoring
> that chunk.)
>
> So overall, I have a strong preference for using the _pq_.xxx
> mechanism instead of a protocol version bump.  I do not believe
> the latter has any advantage.

I'm not necessarily super opposed to only using the _pq_.xxx
mechanism. I mainly think it's silly to have a protocol version number
and then never use it. And I feel some of the proposed changes don't
really benefit from being able to be turned on-and-off by themselves.
My rule of thumb would be:
1. Things that a modern client/pooler would always request: version bump
2. Everything else: _pq_.xxx

Of the proposed changes so far on the mailing list the only 2 that
would fall under 1 imho are:
1. The ParameterSet message
2. Longer than 32bit secret in BackendKeyData

I also don't think the GIF situation you describe translates fully to
this discussion. We have active protocol version negotiation, so if a
server doesn't support protocol 3.1 a client is expected to fall back
to the 3.0 protocol when communicating. Of course you can argue that a
badly behaved client will fail to connect when it gets a downgrade
request from the server, but that same argument can be made about a
server not reporting support for a _pq_.xxx parameter that every
modern client/pooler requests. So I don't think there's a practical
difference in the problem you're describing.



But again if I'm alone in this, then I don't




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-25 Thread Jelte Fennema-Nio
On Thu, 23 May 2024 at 20:40, Tom Lane  wrote:
>
> Jacob Champion  writes:
> > Would it be good to expand on that idea of criticality? IIRC one of
> > Jelte's complaints earlier was that middleware has to know all the
> > extension types anyway, to be able to figure out whether it has to do
> > something about them or not. HTTP has the concept of hop-by-hop vs
> > end-to-end headers for related reasons.
>
> Yeah, perhaps.  We'd need to figure out just which classes we need
> to divide protocol parameters into, and then think about a way for
> code to understand which class a parameter falls into even when
> it doesn't specifically know that parameter.

I think this class is so rare that it's not worth complicating the
discussion on new protocol features even more. AFAICT there is only
one proposed protocol change that does not need any pooler support
(apart from syncing the feature value when re-assigning the
connection): automatic binary encoding for a list of types.

All others need some support from poolers, at the very least they need
new message types to not error out. But in many cases more complex
stuff is needed.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-24 Thread Jelte Fennema-Nio
On Fri, 24 May 2024 at 15:28, Robert Haas  wrote:
>
> On Thu, May 23, 2024 at 4:00 PM Tom Lane  wrote:
> > I don't recall exactly what I thought earlier, but now I think we'd
> > be better off with separate infrastructure.  guc.c is unduly complex
> > already.  Perhaps there are bits of it that could be factored out
> > and shared, but I bet not a lot.
>
> OK. That seems fine to me, but I bet Jelte is going to disagree.

I indeed disagree. I think the effort needed to make guc.c handle
protocol parameters is extremely small. The 0011 patch contains all the
changes that are needed to achieve that, and it only adds 65
additional lines.
anything somewhat weird, to be able to handle the transactionality
discrepancy between protocol parameters and other GUCs. The other 50
lines are (imho) really clean and fit perfectly with the way guc.c is
currently structured (i.e. they add PGC_PROTOCOL and PGC_SU_PROTOCOL
in a really obvious way)

Separating it from the GUC infrastructure will mean we need to
duplicate a lot of the infrastructure. Assuming we don't care about
SHOW or pg_settings (which I agree are not super important), the
things that we would want for protocol parameters to have that guc.c
gives us for free are:
1. Reporting the value of the parameter to the client (done using
ParameterStatus)
2. Parsing and validating of the input, bool, int, enum, etc, but also
check_hook and assign_hook.
3. Logic in all connection poolers to change GUC values to the
client's expected values whenever a server connection is handed off to
a client
4. Permission checking, if we want some protocol extensions to only be
configurable by a highly privileged user

All of those things would have to be duplicated/re-implemented if we
make protocol parameters their own dedicated thing. Doing that work
seems like a waste of time to me, and would imho add much more
complexity than the proposed 65 lines of code in 0011.
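
To give an idea of how small the guc.c side is: the core of it is just
two extra values in the existing GucContext enum, roughly like the
sketch below (the exact placement in the enum is an assumption on my
part; the names are the ones mentioned above):

typedef enum
{
    PGC_INTERNAL,
    PGC_POSTMASTER,
    PGC_SIGHUP,
    PGC_SU_BACKEND,
    PGC_BACKEND,
    PGC_SUSET,
    PGC_USERSET,
    PGC_PROTOCOL,       /* only settable via the ParameterSet protocol message */
    PGC_SU_PROTOCOL     /* as above, but restricted to privileged users */
} GucContext;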




Re: libpq compression (part 3)

2024-05-21 Thread Jelte Fennema-Nio
On Mon, 20 May 2024 at 21:42, Jacob Champion
 wrote:
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.


To help get everyone on the same page I wanted to list all the
security concerns in one place:

1. Triggering excessive CPU usage before authentication, by asking for
very high compression levels
2. Triggering excessive memory/CPU usage before authentication, by a
client sending a zipbomb
3. Triggering excessive CPU after authentication, by asking for a very
high compression level
4. Triggering excessive memory/CPU after authentication due to
zipbombs (i.e. small amount of data extracting to lots of data)
5. CRIME style leakage of information about encrypted data

1 & 2 can easily be solved by not allowing any authentication packets
to be compressed. This also has benefits for 5.

3 & 4 are less of a concern than 1 & 2 imho. Once authenticated, a
client deserves some level of trust. But having knobs to limit impact
definitely seems useful.

3 can be solved in two ways afaict:
a. Allow the server to choose the maximum compression level for each
compression method (using some GUC), and downgrade the level
transparently when a higher level is requested
b. Don't allow the client to choose the compression level that the server uses.

I'd prefer option a

4 would require some safety limits on the amount of data that a
(small) compressed message can be decompressed to, and stop
decompression of that message once that limit is hit. What that limit
should be seems hard to choose though. A few ideas:
a. The size of the message reported by the uncompressed header. This
would mean that at most 4GB will be uncompressed, since the maximum
message length is 4GB (limited by the 32-bit message length field)
b. Allow servers to specify maximum client decompressed message length
lower than this 4GB, e.g. messages of more than 100MB of uncompressed
size should not be allowed.

I think 5 is the most complicated to deal with, especially as it
depends on the actual usage to know what is safe. I believe we should
let users have the freedom to make their own security tradeoffs, but
we should protect them against some of the most glaring issues
(especially ones that benefit little from compression anyway). As
already shown by Andrey, sending LDAP passwords in a compressed way
seems extremely dangerous. So I think we should disallow compressing
any authentication related packets. To reduce similar risks further we
can choose to compress only the message types that we expect to
benefit most from compression. IMHO those are the following (marked
with (B)ackend or (F)rontend to show who sends them):
- Query (F)
- Parse (F)
- Describe (F)
- Bind (F)
- RowDescription (B)
- DataRow (B)
- CopyData (B/F)

Then I think we should let users choose how they want to compress and
where they want their compression stream to restart. Something like
this:
a. compression_restart=query: Restart the stream after every query.
Recommended if queries across the same connection are triggered by
different end-users. I think this would be a sane default
b. compression_restart=message: Restart the stream for every message.
Recommended if the amount of correlation between rows of the same
query is a security concern.
c. compression_restart=manual: Don't restart the stream automatically,
but only when the client user calls a specific function. Recommended
only if the user can make trade-offs, or if no encryption is used
anyway.
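
As a concrete (purely hypothetical) example of what that could look like
in a connection string; neither compression nor compression_restart is
an existing libpq option, the names are just the ones proposed above:

psql "host=db.example.com dbname=app compression=zstd compression_restart=query"
psql "host=db.example.com dbname=app compression=lz4 compression_restart=message"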




Re: Multiple startup messages over the same connection

2024-05-19 Thread Jelte Fennema-Nio
On Sat, 18 May 2024 at 23:10, Vladimir Churyukin  wrote:
> I guess the process of passing control from child processes to the parent 
> could be a bit tricky for that one, but doable?
> Is there anything I'm missing that can be a no-go for this?

One seriously difficult/possibly impossible thing is passing SSL
session state between processes using shared memory.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 21:24, Robert Haas  wrote:
> I think the
> fear that we're going to run into cases where such complex handshaking
> is necessary is a major reason why I'm afraid of Jelte's ideas about
> ParameterSet: it seems much more opinionated to me than he seems to
> think it is.

I think that fear is valid, and I agree that we might want to add a
bespoke message for cases where ParameterSet is not enough. But as far
as I can tell ParameterSet would at least cover all the protocol
changes that have been suggested so far. Using an opinionated but
limited message type for 90% of the cases and a bespoke message for
the last 10% seems much better to me than having a bespoke one for
each (especially if currently none of the protocol proposals fall into
the last 10%).

> To the extent that I can wrap my head around what Jelte is proposing,
> and all signs point to that extent being quite limited, I suppose I
> was thinking of something like his option (2). That is, I assumed that
> a client would request all the optional features that said client
> might wish to use at connection startup time. But I can see how that
> assumption might be somewhat problematic in a connection-pooling
> environment.

To be clear, I'd also be totally fine with my option (2). I'm
personally slightly leaning towards my option (1), due to the reasons
listed before. But connection poolers could request all the protocol
extensions at the start just fine (using the default "disabled" value)
to check for support. So I think option (2) would probably be the most
conservative, i.e. we could always decide that option (1) is fine in
some future release.

> Still, at least to me, it seems better than trying to
> rely on GUC_REPORT. My opinion is (1) GUC_REPORT isn't a particularly
> well-designed mechanism so I dislike trying to double down on it

I agree that GUC_REPORT is not particularly well designed,
currently... But even in its current form it's already a very
effective mechanism for connection poolers to find out to which value
a specific GUC is set to, and if something similar to patch 0014 would
be merged my main gripe with GUC_REPORT would be gone. Tracking GUC
settings by using ParameterSet would actually be harder, because if
ParameterSet errors at Postgres then the connection pooler would have
to roll back its cache of that setting. While with the GUC_REPORT
response from Postgres it can simply rely on Postgres telling the
pooler that the GUC has changed, even rollbacks are handled correctly
this way.

> and
> (2) trying to mix these protocol-level parameters and the
> transactional GUCs we have together in a single mechanism seems
> potentially problematic

I don't understand what potential problems you're worried about here.
Could you clarify?

> and (3) I'm still not particularly happy about
> the idea of making protocol parameters into GUCs in the first place.

Similar to the above: Could you clarify why you're not happy about that?

> Perhaps these are all minority positions, but I can't tell you what
> everyone thinks, only what I think.

I'd love to hear some opinions from others on these design choices. So
far it seems like we're the only two that have opinions on this (which
seems hard to believe) and our opinions are clearly conflicting. And
above all I'd like to move forward with this, be it my way or yours
(although I'd prefer my way of course ;) )




Re: libpq compression (part 3)

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 23:10, Jacob Champion
 wrote:
> We're talking about a transport-level option, though -- I thought the
> proposal enabled compression before authentication completed? Or has
> that changed?

I think it would make sense to only compress messages after
authentication has completed. The gain of compressing authentication
related packets seems pretty limited.

> (I'm suspicious of arguments that begin "well you can already do bad
> things"

Once logged in it's really easy to max out a core of the backend
you're connected as. There's many trivial queries you can use to do
that. An example would be:
SELECT sum(i) from generate_series(1, 10) i;

So I don't think it makes sense to worry about an attacker using a
high compression level as a means to DoS the server. Sending a few of
the above queries seems much easier.




Re: libpq compression (part 3)

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 23:40, Robert Haas  wrote:
> To be clear, I am not arguing that it should be the receiver's choice.
> I'm arguing it should be the client's choice, which means the client
> decides what it sends and also tells the server what to send to it.
> I'm open to counter-arguments, but as I've thought about this more,
> I've come to the conclusion that letting the client control the
> behavior is the most likely to be useful and the most consistent with
> existing facilities. I think we're on the same page based on the rest
> of your email: I'm just clarifying.

+1

> I have some quibbles with the syntax but I agree with the concept.
> What I'd probably do is separate the server side thing into two GUCs,
> each with a list of algorithms, comma-separated, like we do for other
> lists in postgresql.conf. Maybe make the default 'all' meaning
> "everything this build of the server supports". On the client side,
> I'd allow both things to be specified using a single option, because
> wanting to do the same thing in both directions will be common, and
> you actually have to type in connection strings sometimes, so
> verbosity matters more.
>
> As far as the format of the value for that keyword, what do you think
> about either compression=DO_THIS_BOTH_WAYS or
> compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
> this" being a specification of the same form already accepted for
> server-side compression e.g. gzip or gzip:level=9? If you don't like
> that, why do you think the proposal you made above is better, and why
> is that one now punctuated with slashes instead of semicolons?

+1




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 17:59, Robert Haas  wrote:
> If they
> get attention, it's much more likely to be because somebody saw their
> email and wrote back than it is to be because somebody went through
> the CommitFest and found their entry and was like "oh, I should review
> this".

I think this is an important insight. I used the commitfest app to
find patches to review when I was just starting out in postgres
development, since I didn't subscribe to all af pgsql-hackers yet. I
do subscribe nowadays, but now I rarely look at the commitfest app,
instead I skim the titles of emails that go into my "Postgres" folder
in my mailbox. Going from such an email to a commitfest entry is near
impossible (at least I don't know how to do this efficiently). I guess
I'm not the only one doing this.

> Honestly, if we get to a situation where a patch author is sad
> because they have to click a link every 2 months to say "yeah, I'm
> still here, please review my patch," we've already lost the game. That
> person isn't sad because we asked them to click a link. They're sad
> it's already been N * 2 months and nothing has happened.

Maybe it wouldn't be so bad for an author to click the 2-months
button if doing so would actually give their patch a higher chance of
being reviewed. And given the previous insight, that
people don't look at the commitfest app that often, it might be good
if pressing this button would also bump the item in people's
mailboxes.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 14:19, Joe Conway  wrote:
>
> On 5/16/24 22:26, Robert Haas wrote:
> > For example, imagine that the CommitFest is FORCIBLY empty
> > until a week before it starts. You can still register patches in the
> > system generally, but that just means they get CI runs, not that
> > they're scheduled to be reviewed. A week before the CommitFest,
> > everyone who has a patch registered in the system that still applies
> > gets an email saying "click here if you think this patch should be
> > reviewed in the upcoming CommitFest -- if you don't care about the
> > patch any more or it needs more work before other people review it,
> > don't click here". Then, the CommitFest ends up containing only the
> > things where the patch author clicked there during that week.
>
> 100% agree. This is in line with what I suggested on an adjacent part of
> the thread.

Such a proposal would basically mean that no one who cares about
their patches getting reviewed can go on holiday and leave work behind
during the week before a commitfest. That seems quite undesirable to
me.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 13:39, Robert Haas  wrote:
>
> On Fri, May 17, 2024 at 7:11 AM Tomas Vondra
>  wrote:
> > Yeah, I 100% agree with this. If a patch bitrots and no one cares enough
> > to rebase it once in a while, then sure - it's probably fine to mark it
> > RwF. But forcing all contributors to do a dance every 2 months just to
> > have a chance someone might take a look, seems ... not great.
>
> I don't think that clicking on a link that someone sends you that asks
> "hey, is this ready to be reviewed' qualifies as a dance.

If there's been any useful response to the patch since the last time
you pressed this button, then it might be okay. But it's definitely
not uncommon for items on the commitfest app to get no actual response
at all for half a year, i.e. multiple commitfests (except for the
odd request for a rebase that an author does within a week). I'd most
certainly get very annoyed if, for those patches where it already seems
as if I'm screaming into the void, I'd also be required to press a
button every two months just for them to even have a chance at
receiving a response.

> So I think the right interpretation of his comment is that
> managing the CommitFest has become about an order of magnitude more
> difficult than what it needs to be for the task to be done well.

+1

> > A long time ago there was a "rule" that people submitting patches are
> > expected to do reviews. Perhaps we should be more strict about this.
>
> This sounds like it's just generating more manual work to add to a
> system that's already suffering from needing too much manual work. Who
> would keep track of how many reviews each person is doing, and how
> many patches they're submitting, and whether those reviews were
> actually any good, and what would they do about it?

+1




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 11:02, Jelte Fennema-Nio  wrote:
>
> On Fri, 17 May 2024 at 10:47, Daniel Gustafsson  wrote:
> > One way to ensure we capture detail could be if the system would send an
> > automated email to the thread summarizing the entry when it's marked as
> > "committed"?
>
> This sounds great! Especially if

(oops pressed send too early)
**Especially if it contains the git commit hash**

> Going back from an archived thread
> to the commitfest entry or git commit is currently very hard. If I'll
> just be able to Ctrl+F on commitf...@postgresql.com that would be so
> helpful. I think it would even be useful to have an email be sent
> whenever a patch gets initially added to the commitfest, so that
> there's a back link to it and it's easy to mark yourself as reviewer.
> Right now, I almost never take the time to do that because if I look
> at my inbox, I have no clue what the interesting email thread is
> called in the commitfest app.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 10:47, Daniel Gustafsson  wrote:
> One way to ensure we capture detail could be if the system would send an
> automated email to the thread summarizing the entry when it's marked as
> "committed"?

This sounds great! Especially if  Going back from an archived thread
to the commitfest entry or git commit is currently very hard. If I'll
just be able to Ctrl+F on commitf...@postgresql.com that would be so
helpful. I think it would even be useful to have an email be sent
whenever a patch gets initially added to the commitfest, so that
there's a back link to it and it's easy to mark yourself as reviewer.
Right now, I almost never take the time to do that because if I look
at my inbox, I have no clue what the interesting email thread is
called in the commitfest app.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Jelte Fennema-Nio
On Fri, 17 May 2024 at 06:58, Peter Eisentraut  wrote:
> Yeah, some fine-tuning might be required.  But I would be wary of
> over-designing too many new states at this point.  I think the key idea
> is that there ought to be a state that communicates "needs attention
> from someone other than author, reviewer, or committer".

+1 on adding a new state like this. I think it would make sense for
the author to request additional input, but I think it would need two
states, something like "Request for new reviewer" and "Request for new
committer". Just like authors disappear sometimes after a driveby
patch submission, it's fairly common too imho for reviewers or
committers to disappear after a driveby review. Having a way for an
author to mark a patch as such would be helpful, both to the author
and to reviewers/committers looking to help some patch out.

On Fri, 17 May 2024 at 09:33, Heikki Linnakangas  wrote:
> Things like "cfbot says
> this has bitrotted" or "Will review this after other patch this depends
> on". On the mailing list, notes like that are both noisy and easily lost
> in the threads. But as a little free-form text box on the commitfest, it
> would be handy.

+many on the free form text box




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-16 Thread Jelte Fennema-Nio
On Thu, 16 May 2024 at 18:57, Robert Haas  wrote:
> Ugh, it's so hard to communicate clearly about this stuff. I didn't
> really have any thought that we'd ever try to handle something as
> complicated as compression using ParameterSet.

Okay, then it's definitely very hard to communicate clearly about
this. Because being able to re-configure compression using
ParameterSet is exactly the type of thing I want to be able to do in
PgBouncer. Otherwise a connection from PgBouncer to Postgres cannot be
handed off to any client, because its compression state cannot be
changed on the fly to the state that a client expects, if the first
one wants lz4 compression, and a second one wants zstd compression.
There's no way the same connection can be reused for both unless you
decompress and recompress in pgbouncer, which is expensive to do.
Being able to reconfigure the stream to compress the messages in the
expected form is much cheaper.

> Does it send a NegotiateCompressionType message after
> the NegotiateProtocolVersion, for example? That seems like it could
> lead to the client having to be prepared for a lot of NegotiateX
> messages somewhere down the road.

Like Jacob explains, you'd want to allow the client to provide a list
of options in order of preference. And then have the server respond
with a ParameterStatus message saying what it ended up being. So no
new NegotiateXXX messages are needed, as long as we make sure any
_pq_.xxx falls back to reporting its default value on failure. This is
exactly why I said I prefer option 1 of the options I listed, because
we need _pq_.xxx messages to report their current value to the client.

To be clear, this is not special to compression. This applies to ALL
proposed protocol parameters. The server should fall back to some
"least common denominator" if it doesn't understand (part of) the
value for the protocol parameter that's provided by the client,
possibly falling back to disabling the protocol extension completely.

> Changing compression algorithms in mid-stream is tricky, too. If I
> tell the server "hey, turn on server-to-client compression!"

Yes it is tricky, but it's something that it would need to support
imho. And Jacob actually implemented it this way, so I feel like we're
discussing a non-problem here.

> I don't know if we want to support changing compression algorithms in
> mid-stream. I don't think there's any reason we can't, but it might be
> a bunch of work for something that nobody really cares about.

Again, I guess I wasn't clear at all in my previous emails and/or
commit messages. Connection poolers care **very much** about this.
Poolers need to be able to re-configure any protocol parameter to be
able to pool the same server connection across clients with
differently configured protocol parameters. Again: This is the primary
reason for me wanting to introduce the ParameterSet message.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-16 Thread Jelte Fennema-Nio
On Thu, 16 May 2024 at 17:29, Robert Haas  wrote:
> You're probably not going to like this answer, but I feel like this is
> another sign that you're trying to use the protocol extensibility
> facilities in the wrong way. In my first reply to the thread, I
> proposed having the client send _pq_.protocol_set=1 in that startup
> message. If the server accepts that message, then you can send
> whatever set of message types are associated with that option, which
> could include messages to list known settings, as well as messages to
> set them. Alternatively, if we used a wire protocol bump for this, you
> could request version 3.1 and everything that I just said still
> applies. In other words, I feel that if you had adopted the design
> that I proposed back in March, or some variant of it, the problem
> you're having now wouldn't exist.

I don't really understand the benefit of your proposal over option 2
that I proposed. Afaict you're proposing that for e.g. compression we
first set _pq_.supports_compression=1 in the StartupMessage and use
that to do feature detection, and then after we get the response we
send ParameterSet("compression", "gzip"). To me this is pretty much
identical to option 2, except that it introduces an extra round trip
for no benefit (as far as I can see). Why not go for option 2 and send
_pq_.compression=gzip in the StartupMessage directly?

> IMHO, we need to negotiate the language that we're going to use to
> communicate before we start communicating. We should find out which
> protocol version we're using, and what protocol options are accepted,
> based on sending a StartupMessage and receiving a reply. Then, after
> that, having established a common language, we can do whatever. I
> think you're trying to meld those two steps into one, which is
> understandable from the point of view of saving a round trip, but I
> just don't see it working out well.

I think not increasing the number of needed round trips in the startup
of a connection is extremely important. I think it's so important that
I honestly don't think we should merge a protocol change that
introduces an extra round trip without a VERY good reason, and this
round trip should only be needed when actually using the feature.

> What we can do in the startup
> message is extremely limited, because any startup messages we generate
> can't break existing servers, and also because of the concerns I
> raised earlier about leaving room for more extension in the future.
> Once we get past the startup message negotiation, the sky's the limit!

Sure, what we can do in the StartupMessage is extremely limited, but
what it does allow is passing arbitrary key value pairs to the server.
But by only using _pq_.feature_name=1, we're effectively only using
the key part of the key value pair. Limiting ourselves even more, by
throwing half of our communication channel away, seems like a bad idea
to me. But maybe I'm just not understanding the problem you're seeing
with using the value too.
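
To make that concrete, here's a rough sketch (not an exact wire dump;
the user/database values are made up) of the kind of StartupMessage
key/value list I have in mind for option 2, where the value half of the
pair carries real information:

StartupMessage (protocol version 3.0)
    "user"              "alice"
    "database"          "app"
    "_pq_.compression"  "gzip"    <- value used directly, no extra round trip

The server then either applies it, or lists "_pq_.compression" as
unsupported in its NegotiateProtocolVersion response.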




Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-16 Thread Jelte Fennema-Nio
On Thu, 16 May 2024 at 03:45, Jonathan S. Katz  wrote:
> Attached is a copy of the PostgreSQL 17 Beta 1 release announcement
> draft.

I think we can quickly mention c4ab7da6061 in the COPY paragraph; in
some benchmarks it improved perf by close to 2x. Something like this:
"has improved performance in PostgreSQL 17 when the source encoding
matches the destination encoding *and when sending large rows from
server to client*"

Also, I think it's a bit weird to put the current COPY paragraph under
Developer Experience. I think if you want to keep it there instead of
moving it to the perf section, we should put the line about IGNORE_ERROR
first instead of the perf improvements. Now the IGNORE_ERROR addition
seems more of an afterthought.

s/IGNORE_ERROR/ON_ERROR

I think it would be good to clarify if the following applies when
upgrading from or to PostgreSQL 17:
"Starting with PostgreSQL 17, you no longer need to drop logical
replication slots when using pg_upgrade"

Finally, I personally would have included a lot more links for the new
items in this document. Some that would benefit from being a link
imho:
- pg_createsubscriber
- JSON_TABLE
- SQL/JSON constructor
- SQL/JSON query functions
- ON_ERROR
- sslnegotiation
- PQchangePassword
- pg_maintain




Re: First draft of PG 17 release notes

2024-05-16 Thread Jelte Fennema-Nio
On Thu, 16 May 2024 at 05:48, Andres Freund  wrote:
> We're having this debate every release. I think the ongoing reticence to note
> performance improvements in the release notes is hurting Postgres.
>
> For one, performance improvements are one of the prime reason users
> upgrade. Without them being noted anywhere more dense than the commit log,
> it's very hard to figure out what improved for users. A halfway widely
> applicable performance improvement is far more impactful than many of the
> feature changes we do list in the release notes.
>
> For another, it's also very frustrating for developers that focus on
> performance. The reticence to note their work, while noting other, far
> smaller, things in the release notes, pretty much tells us that our work isn't
> valued.

+1 to the general gist of listing every perf improvement **and memory
usage reduction** in the release notes. Most of them are already
grouped together in a dedicated "General performance" section anyway;
having that section be big would only be good imho, to show that we're
committed to improving perf.

I think one thing that would make this a lot easier though is if commits
that knowingly impact perf would clearly say so in the commit message,
because now it's sometimes hard to spot as someone not deeply involved
with the specific patch. e.g. c4ab7da606 doesn't mention performance
at all, so I'm not surprised it wasn't listed initially. And while
667e65aac3 states that multiple rounds of heap scanning is now
extremely rare, it doesn't explicitly state what the kind of perf
impact can be expected because of that.

Maybe something like introducing a common "Perf-Improvement: true"
marker in the commit message, and when doing so adding a clear paragraph
explaining the expected perf impact. Another option could
be to add a "User Impact" section to the commit message, where an
author could add their suggestion for a release note entry. So
basically this suggestion boils down to more clearly mentioning user
impact in commit messages, instead of mostly/only including
technical/implementation details.
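
To illustrate, a commit message could then end with something like this
(an entirely made-up example; the ~40% figure is just the one from the
buffering change discussed elsewhere):

    Speed up sending large blocks of data to the client

    <technical/implementation details>

    User Impact: COPY of large rows and base backup streaming are up to
      ~40% faster in some benchmarks.
    Perf-Improvement: true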




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-16 Thread Jelte Fennema-Nio
On Fri, 5 Apr 2024 at 18:30, Dave Cramer  wrote:
> On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio  wrote:
>> I'll take a look at redesigning the protocol parameter stuff. To work
>> with dedicated functions instead.
>
> +1

It's been a while, but I now actually took the time to look into this.
And I ran into a problem that I'd like to get some feedback on before
continuing the implementation:

If we're not setting the protocol parameter in the StartupMessage,
there's currently no way for us to know if the protocol parameter is
supported by the server. If protocol parameters were unchangable then
that would be fine, but the whole point of introducing ParameterSet is
to make it possible to change protocol parameters on an existing
connection. Having the function SupportsProtocolCompression return
false, even though you can enable compression just fine, only because
we didn't ask for compression when connecting seems quite silly and
confusing.

I see five ways around this problem and would love some feedback on
which you think is best (or if you can think of any other/better
ones):
1. Have protocol parameters always be GUC_REPORT, so that the presence
of a ParameterStatus message during connection startup can be used as
a way of detecting support for the protocol parameter.
2. Make libpq always send each known protocol parameter in the
StartupMessage to check for their support, even if the connection
string does not contain the related parameters (set them to their
default value then). Then the non-presence of the parameter in the
NegotiateProtocolVersion message can be used reliably to determine
support for the feature. We could even disallow changing a protocol
parameter at the server side using ParameterSet if it was not
requested in the StartupMessage.
3. Very similar to 1, but require explicit user input in the
connection string to request the feature on connection startup by
having the user explicitly provide its default value. If it's not
requested on connection startup assume its unsupported and disallow
usage of the feature (even if the server might actually support it).
4. Make SupportsProtocolCompression return a tri-state, SUPPORTED,
UNSUPPORTED, UNKNOWN. If it's UNKNOWN people could send a ParameterSet
message themselves to check for feature support after connection
startup. We could even recognize this and change the state that
SupportProtocolCompression function to return SUPPORTED/UNSUPPORTED on
future calls according to the server response.
5. Basically the same as 4 but automatically send a ParameterSet
message internally when calling SupportsProtocolCompression and the
state is UNKNOWN, so we'd only ever return SUPPORTED or UNSUPPORTED.

The above options are listed in my order of preference, below some
reasoning why:

1 and 2 would increase the bandwidth used during connection handshake
slightly for each protocol parameter that we would add, but they have
the best user experience IMHO.

I slightly prefer 1 over 2 because there is another argument to be
made for always having protocol parameters be GUC_REPORT: these
parameters change what message types a client can send or receive. So
it makes sense to me to have the server tell the client what the
current value of such a parameter is. This might not be a strong
argument though, because this value would only ever change due to user
interaction. But still, one might imagine scenarios where the value
that the client sent is not exactly what the server would set the
parameter to on receiving that value from the client. e.g. for
protocol compression, maybe the client sends a list of preferred
compression methods and the server would send a ParameterStatus
containing only the specific compression method that it will use when
sending messages to the client.

3 seems also an acceptable option to me. While having slightly worse
user experience than 2, it allows the user of the client to make the
decision if the extra bandwidth during connection startup is worth it
to be able to enable the feature later.

4 assumes that we want people to be able to trigger sending
ParameterSet messages for every protocol parameter. I'm not sure we'd
want to give that ability in all cases.

5 would require SupportsProtocolCompression to also have a
non-blocking version, which bloats our API more than I'd like. Also as
a user you wouldn't be able to know if SupportsProtocolCompression
will do a network request or not.
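
For reference, options 4 and 5 would roughly amount to an API shape like
this (names are purely illustrative, this is not an existing libpq API):

typedef enum
{
    PQ_FEATURE_SUPPORTED,
    PQ_FEATURE_UNSUPPORTED,
    PQ_FEATURE_UNKNOWN      /* not requested in the StartupMessage */
} PQfeatureSupport;

extern PQfeatureSupport PQsupportsProtocolCompression(PGconn *conn);

Option 5 would additionally need a non-blocking variant of that
function, so that the UNKNOWN state can be resolved without stalling the
caller.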

PS. This is only a problem for feature detection for features relying
on protocol parameters, feature-support relying on protocol version
bumps are easy to detect based on the NegotiateProtocolVersion
message.




Re: Postgres and --config-file option

2024-05-15 Thread Jelte Fennema-Nio
On Wed, 15 May 2024 at 11:49, Peter Eisentraut  wrote:
> Yeah, some of this is becoming quite unwieldy, if we document and
> mention each spelling variant of each option everywhere.
>
> Maybe if the original problem is that the option --config-file is not
> explicitly in the --help output, let's add it to the --help output?

I definitely think it would be useful to list this --config-file variant
in more places; imho it's nicer than the -c variant. Especially in the
PGOPTIONS docs it would be useful. People are already using it in the
wild and I regressed on support for that in PgBouncer by accident:
https://github.com/pgbouncer/pgbouncer/pull/1064
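
For example (paths purely illustrative), these two spellings are
equivalent when starting the server:

postgres -D /data -c config_file=/etc/postgresql/custom.conf
postgres -D /data --config-file=/etc/postgresql/custom.conf

and the same --name=value spelling also works for regular GUCs passed
through PGOPTIONS, e.g. PGOPTIONS="--statement-timeout=5000" psql.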




Re: First draft of PG 17 release notes

2024-05-14 Thread Jelte Fennema-Nio
On Tue, 14 May 2024 at 02:56, Bruce Momjian  wrote:
>
> On Sat, May 11, 2024 at 10:24:39AM -0400, Joe Conway wrote:
> > On 5/11/24 09:57, Jelte Fennema-Nio wrote:
> > > The buffering change improved performance up to ~40% in some of the
> > > benchmarks. The case it improves mostly is COPY of large rows and
> > > streaming a base backup. That sounds user-visible enough to me to
> > > warrant an entry imho.
> >
> > +1
>
> Attached patch applied.

I think we shouldn't list this under the libpq changes and shouldn't
mention libpq in the description, since this patch changes
src/backend/libpq files instead of src/interfaces/libpq files. I think
it should be in the "General performance" section and describe the
change as something like the below:

Improve performance when transferring large blocks of data to a client

PS. I completely understand that this was not clear from the commit message.




Re: JIT compilation per plan node

2024-05-14 Thread Jelte Fennema-Nio
On Tue, 14 May 2024 at 10:19, David Rowley  wrote:
> Here's a plan where the total cost of a subnode is greater than the
> total cost of the top node:

Ah I didn't realize it was possible for that to happen. **reads up on
plan costs**

This actually makes me think that using total_cost of the sub-nodes is
not enough to determine if the node should be JITed.
We wouldn't want to start jitting plans like this, i.e. introducing
all the JIT setup overhead for just a single row:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,100)a;
analyze t0;
explain select a+a*a+a*a+a from t0 limit 1;
                           QUERY PLAN
-----------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Seq Scan on t0  (cost=0.00..26980.00 rows=100 width=4)

An easy way to work around that issue I guess is by using the minimum
total_cost of all the total_costs from the current sub-node up to the
root node. The current minimum could be passed along as part of the
context.




Re: JIT compilation per plan node

2024-05-14 Thread Jelte Fennema-Nio
On Tue, 20 Feb 2024 at 06:38, David Rowley  wrote:
>
> On Tue, 20 Feb 2024 at 18:31, Tom Lane  wrote:
> > FWIW, I seriously doubt that an extra walk of the plan tree is even
> > measurable compared to the number of cycles JIT compilation will
> > expend if it's called.  So I don't buy your argument here.
> > We would be better off to do this in a way that's clean and doesn't
> > add overhead for non-JIT-enabled builds.
>
> The extra walk of the tree would need to be done for every plan, not
> just the ones where we do JIT. I'd rather find a way to not add this
> extra plan tree walk, especially since the vast majority of cases on
> an average instance won't be doing any JIT.

I'm not saying I'd prefer the extra walk, but I don't think you'd need
to do this extra walk for all plans. Afaict you could skip the extra
walk when top_plan->total_cost < jit_above_cost, i.e. only do the
extra walk to determine which exact nodes to JIT in cases where we
currently JIT all nodes. That would limit the extra walk overhead to
cases where we currently already spend significant resources on JITing
stuff.
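
In pseudo-C, the shortcut I mean is roughly this (the walk function is
hypothetical; the fields and GUCs are the existing ones):

if (jit_enabled && top_plan->total_cost >= jit_above_cost)
{
    /* only now pay for the extra tree walk, to decide JIT per node */
    set_plan_node_jit_flags(root, top_plan);
}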




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 18:14, Robert Haas  wrote:
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer. That seems like the kind of pessimism
> that makes people hate security professionals. There have got to be
> some attacks that are foreclosed by encrypting the connection, even if
> you don't stop MITM attacks or other things that are more
> sophisticated than running wireshark and seeing what goes by on the
> wire.

Like Jacob already said, I guess you meant me here. The main point I
was trying to make is that sslmode=require is extremely insecure too,
so if we're changing the default then I'd rather bite the bullet and
actually make the default a secure one this time. No one's browser
trusts self-signed certs by default, but currently lots of people
trust self-signed certs when connecting to their production database
without realizing it.

IMHO the only benefit that sslmode=require brings over sslmode=prefer
is detecting incorrectly configured servers i.e. servers that are
supposed to support ssl but somehow don't due to a misconfigured
GUC/pg_hba. Such "incorrectly configured server" detection avoids
sending data to such a server, which an eavesdropper on the network
could see. Which is definitely a security benefit, but it's an
extremely small one. In all other cases sslmode=prefer brings exactly
the same protection as sslmode=require, because sslmode=prefer
encrypts the connection unless postgres actively tells the client to
downgrade to plaintext (which never happens when the server is
configured correctly).




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 13:07, Heikki Linnakangas  wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

Cool, I didn't realize we had this connection option and it could be
used like this. But I think there's a few security downsides of
channel_binding=require over sslmode=verify-full: If the client relies
on channel_binding to validate server authenticity, a leaked
server-side SCRAM hash is enough for an attacker to impersonate a
server. While normally a leaked scram hash isn't really much of a
security concern (assuming long enough passwords). I also don't know
of many people rotating their scram hashes, even though many rotate
TLS certs.

> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I think that sounds like a great idea, looking forward to the proposal.




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Mon, 13 May 2024 at 15:38, Heikki Linnakangas  wrote:
> Here's a patch to implement that.

+   if (conn->sslnegotiation[0] == 'd' &&
+   conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first
characters. I see this same clever trick is used in the recently added
init_allowed_encryption_methods, and I think that should be changed to
use strcmp too for readability.
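
I.e. something like this (field names taken from the quoted snippet):

if (strcmp(conn->sslnegotiation, "direct") == 0 &&
    strcmp(conn->sslmode, "require") != 0 &&
    strcmp(conn->sslmode, "verify-ca") != 0 &&
    strcmp(conn->sslmode, "verify-full") != 0)

which spells out the intent: direct negotiation was requested, but
sslmode is weaker than require.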




Re: Direct SSL connection with ALPN and HBA rules

2024-05-13 Thread Jelte Fennema-Nio
On Sun, 12 May 2024 at 23:39, Heikki Linnakangas  wrote:
> You might miss that by changing sslnegotiation to 'postgres', or by
> removing it altogether, you not only made it compatible with older
> server versions, but you also allowed falling back to a plaintext
> connection. Maybe you're fine with that, but maybe not. I'd like to
> nudge people to use sslmode=require, not rely on implicit stuff like
> this just to make connection strings a little shorter.

I understand your worry, but I'm not sure that this is actually much
of a security issue in practice. sslmode=prefer and sslmode=require
are the same amount of insecure imho (i.e. extremely insecure). The
only reason sslmode=prefer would connect as non-ssl to a server that
supports ssl is if an attacker has write access to the network in the
middle (i.e. eavesdropping ability alone is not enough). Once an
attacker has this level of network access, it's trivial for this
attacker to read any data sent to Postgres by intercepting the TLS
handshake and doing TLS termination with some arbitrary cert (any cert
is trusted by sslmode=require).

So the only actual case where this is a security issue I can think of
is when an attacker has only eavesdropping ability on the network. And
somehow the Postgres server that the client tries to connect to is
configured incorrectly, so that no ssl is set up at all. Then a client
would drop to plaintext when connecting to this server instead of
erroring, and the attacker could now read the traffic. But I don't
really see this scenario end up any differently when requiring people
to enter sslmode=require. The only action a user can take to connect
to a server that does not have ssl support is to remove
sslmode=require from the connection string. Except if they are also
the server operator, in which case they could enable ssl on the
server. But if ssl is not set up, then it was probably never set up,
and thus providing sslnegotiation=direct would be to test if ssl
works.

But, if you disagree I'm fine with erroring on plain sslnegotiation=direct

> In v18, I'd like to make sslmode=require the default. Or maybe introduce
> a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
> want to encourage encryption, that's the right way to do it. (I'd still
> recommend everyone to use an explicit sslmode=require in their
> connection strings for many years, though, because you might be using an
> older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more
secure. But as described above sslmode=require is still extremely
insecure and I don't think we gain anything substantial by making it
the default. I think the only useful secure default would be to use
sslmode=verify-full (with probably some automatic fallback to
sslmode=prefer when connecting to hardcoded IPs or localhost). Which
probably means that sslrootcert=system should also be made the
default. Which would mean that ~/.postgresql/root.crt would not be the
default anymore, which I personally think is fine but others likely
disagree.




Re: Direct SSL connection with ALPN and HBA rules

2024-05-11 Thread Jelte Fennema-Nio
On Fri, 10 May 2024 at 15:50, Heikki Linnakangas  wrote:
> New proposal:
>
> - Remove the "try both" mode completely, and rename "requiredirect" to
> just "direct". So there would be just two modes: "postgres" and
> "direct". On reflection, the automatic fallback mode doesn't seem very
> useful. It would make sense as the default, because then you would get
> the benefits automatically in most cases but still be compatible with
> old servers. But if it's not the default, you have to fiddle with libpq
> settings anyway to enable it, and then you might as well use the
> "requiredirect" mode when you know the server supports it. There isn't
> anything wrong with it as such, but given how much confusion there's
> been on how this all works, I'd prefer to cut this back to the bare
> minimum now. We can add it back in the future, and perhaps make it the
> default at the same time. This addresses points 2. and 3. above.
>
> and:
>
> - Only allow sslnegotiation=direct with sslmode=require or higher. This
> is what you, Jacob, wanted to do all along, and addresses point 1.
>
> Thoughts?

Sounds mostly good to me. But I think we'd want to automatically
increase sslmode to require if it is unset, but sslnegotiation is set
to direct. Similar to how we bump sslmode to verify-full if
sslrootcert is set to system, but sslmode is unset. i.e. it seems
unnecessary/unwanted to throw an error if the connection string only
contains sslnegotiation=direct
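
Roughly what I have in mind, as a sketch only (this is not the actual
fe-connect.c code; the function name and the flag are made up):

/* If the user asked for direct SSL negotiation but left sslmode unset,
 * silently bump sslmode to require, mirroring how sslrootcert=system
 * bumps an unset sslmode to verify-full. */
static void
bump_sslmode_for_direct_negotiation(PGconn *conn, bool sslmode_given)
{
    if (strcmp(conn->sslnegotiation, "direct") == 0 && !sslmode_given)
    {
        free(conn->sslmode);
        conn->sslmode = strdup("require");
    }
}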




Re: First draft of PG 17 release notes

2024-05-11 Thread Jelte Fennema-Nio
On Fri, 10 May 2024 at 23:31, Tom Lane  wrote:
>
> Bruce Momjian  writes:
> > I looked at both of these.   In both cases I didn't see why the user
> > would need to know these changes were made:
>
> I agree that the buffering change is not likely interesting, but
> the fact that you can now control-C out of a psql "\c" command
> is user-visible.  People might have internalized the fact that
> it didn't work, or created complicated workarounds.

The buffering change improved performance up to ~40% in some of the
benchmarks. The cases it mostly improves are COPY of large rows and
streaming a base backup. That sounds user-visible enough to me to
warrant an entry imho.




Re: First draft of PG 17 release notes

2024-05-10 Thread Jelte Fennema-Nio
On Thu, 9 May 2024 at 06:04, Bruce Momjian  wrote:
>
> I have committed the first draft of the PG 17 release notes;  you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html

Great work!

There are two commits that I think would benefit from being listed
(but maybe they are already listed and I somehow missed them, or they
are left out on purpose for some reason):

- c4ab7da60617f020e8d75b1584d0754005d71830
- cafe1056558fe07cdc52b95205588fcd80870362




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-06 Thread Jelte Fennema-Nio
On Tue, 23 Apr 2024 at 19:39, Jacob Champion
 wrote:
>
> On Mon, Apr 22, 2024 at 2:20 PM Jelte Fennema-Nio  wrote:
> > 1. I strongly believe minor protocol version bumps after the initial
> > 3.1 one can be made painless for clients/poolers (so the ones to
> > 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not
> > having to worry about breaking TLS 1.2 communication.
>
> Apologies for focusing on a single portion of your argument, but this
> claim in particular stuck out to me. To my understanding, IETF worried
> a _lot_ about breaking TLS 1.2 implementations with the TLS 1.3
> change, to the point that TLS 1.3 clients and servers advertise
> themselves as TLS 1.2 and sneak the actual version used into a TLS
> extension (roughly analogous to the _pq_ stuff). I vaguely recall that
> the engineering work done for that update was pretty far from
> painless.

My bad... I guess TLS 1.3 was a bad example, due to it changing the
handshake itself so significantly.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-06 Thread Jelte Fennema-Nio
On Tue, 23 Apr 2024 at 17:03, Robert Haas  wrote:
> If a client doesn't know about encrypted columns and sets that
> bit at random, that will break things, and formally I think that's a
> risk, because I don't believe we document anywhere that you shouldn't
> set unused bits in the format mask. But practically, it's not likely.
> (And also, maybe we should document that you shouldn't do that.)

Currently Postgres errors out when you set anything other than 0 or 1
in the format field. And it's already documented that these are the
only allowed values: "Each must presently be zero (text) or one
(binary)."

> Let's consider a hypothetical country much like Canada except that
> there are three official languages rather than two: English, French,
> and Robertish.

I don't really understand the point you are trying to make with this
analogy. Sure an almost equivalent unused language maybe shouldn't be
put on the signs, but having two different names (aka protocol
versions) for English and Robertish seems quite useful to avoid
confusion.

> And so here. If someone codes a connection pooler in the way you
> suppose, then it will break. But, first of all, they probably won't do
> that, both because it's not particularly likely that someone wants to
> gather that particular set of statistics and also because erroring out
> seems like an overreaction.

Is it really that much of an overreaction? Postgres happily errors on
any weirdness in the protocol including unknown format codes. Why
wouldn't a pooler/proxy in the middle be allowed to do the same? The
pooler has no clue what the new format means, maybe it requires some
session state to be passed on to the server to be interpreted
correctly. The pooler would be responsible for syncing that session
state but might not have synced it because it doesn't know that the
format code requires that. An example of this would be a compressed
value of which the compression algorithm is configured using a GUC.
Thus the pooler being strict in what it accepts would be a good way of
notifying the client that the pooler needs to be updated (or the
feature not used).
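
As a sketch of what that strictness could look like on the pooler side
(all names here are hypothetical, this is not code from any existing
pooler):

/* Reject Bind messages that use parameter format codes the pooler does
 * not understand, instead of blindly forwarding them to the server. */
static void
check_bind_format_codes(PoolerMessage *bind_msg)
{
    uint16_t nformats = msg_read_uint16(bind_msg);

    for (uint16_t i = 0; i < nformats; i++)
    {
        uint16_t format = msg_read_uint16(bind_msg);

        if (format != 0 && format != 1)   /* 0 = text, 1 = binary */
            pooler_error("unsupported parameter format code %u", format);
    }
}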

But I do agree that it seems quite unlikely that a pooler would
implement it like that. However, a pooler logging a warning
seems not that crazy. And in that case the connection might not be
closed, but the logs would be flooded.

> And secondly, let's imagine that we do
> bump the protocol version and think about whether and how that solves
> the problem. A client will request from the pooler a version 3.1
> connection and the pooler will say, sorry, no can do, I only
> understand 3.0. So the client will now say, oh ok, no problem, I'm
> going to refrain from setting that parameter format bit. Cool, right?
>
> Well, no, not really. First, now the client application is probably
> broken. If the client is varying its behavior based on the server's
> protocol version, that must mean that it cares about accessing
> encrypted columns, and that means that the bit in question is not an
> optional feature. So actually, the fact that the pooler can force the
> client to downgrade hasn't fixed anything at all.

It seems quite a lot nicer if the client can safely fall back to not
writing data using the new format code, instead of getting an error
from the pooler/flooding the pooler logs/having the pooler close the
connection.

> Third, applications, drivers, and connection poolers now all need to
> worry about handling downgrades smoothly. If a connection pooler
> requests a v3.1 connection to the server and gets v3.0, it had better
> make sure that it only advertises 3.0 to the client.

This seems quite straightforward to solve from a pooler perspective
(a small sketch follows below):
1. Connect to the server first to find out what the maximum version is
that it supports
2. If a client asks for a higher version, advertise the server version
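
In code that could look roughly like this (the types and helper names
are hypothetical, purely to illustrate the two steps):

/* Never grant a client a higher protocol version than the pooled server
 * connections were negotiated with. */
static uint32_t
negotiate_client_version(Pool *pool, Client *client)
{
    uint32_t server_max = pool->server_protocol_version;   /* e.g. 3.0 or 3.1 */
    uint32_t requested = client_startup_version(client);
    uint32_t granted = requested < server_max ? requested : server_max;

    if (granted < requested)
        send_negotiate_protocol_version(client, granted);
    return granted;
}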

> If the client
> requests v3.0, the pooler had better make sure to either request v3.0
> from the server. Or alternatively, the pooler can be prepared to
> translate between 3.0 and 3.1 wherever that's needed, in either
> direction. But it's not at all clear what that would look like for
> something like TCE. Will the pooler arrange to encrypt parameters
> destined for encrypted tables if the client doesn't do so? Will it
> arrange to decrypt values coming from encrypted tables if the client
> doesn't understand encryption?

This is a harder problem, and indeed one I hadn't considered much.
From a pooler perspective I really would like to be able to have just
one pool of connections to the server all initially connected with the
3.1 protocol and use those 3.1 server connections for clients that use
the 3.1 protocol but also for clients that use the 3.0 protocol.
Creating separate pools of server connections for different protocol
versions is super annoying to manage for operators (e.g. how should
these pools be sized). One of the reasons I'm proposing the
ParameterSet message is to avoid this problem for protocol 

Re: Reducing the log spam

2024-05-02 Thread Jelte Fennema-Nio
On Thu, 2 May 2024 at 13:08, Jelte Fennema-Nio  wrote:
> 2. Change the newly added check in errcode() to only set
> output_to_server to false when IsLogicalWorker() returns false.

Actually a third, and probably even better solution would be to only
apply this new GUC to non-backgroundworker processes. That seems quite
reasonable, since often the only way to access background worker
errors is through the logs.




Re: Reducing the log spam

2024-05-02 Thread Jelte Fennema-Nio
On Thu, 2 May 2024 at 12:47, Laurenz Albe  wrote:
> Yes.  But I'd argue that that is a shortcoming of logical replication:
> there should be a ways to get this information via SQL.  Having to look into
> the log file is not a very useful option.

Definitely agreed that accessing the error details using SQL would be
much better. But having no way at all (by default) to find the cause
of the failure is clearly much worse.

> The feature will become much less useful if unique violations keep getting
> logged.

Agreed. How about changing the patch so that the GUC is not applied to
logical replication apply workers (and document that accordingly). I
can think of two ways of achieving that (but there might be
other/better ones):
1. Set the GUC to empty string when an apply worker is started.
2. Change the newly added check in errcode() to only set
output_to_server to false when IsLogicalWorker() returns false.
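
For option 2 that could look roughly like this (a sketch of the idea
only; errcode(), output_to_server and IsLogicalWorker() exist today,
the suppression-list check is the hypothetical part added by the patch):

int
errcode(int sqlerrcode)
{
    ErrorData  *edata = &errordata[errordata_stack_depth];

    edata->sqlerrcode = sqlerrcode;

    /* only suppress server logging for regular backends, so logical
     * replication apply workers keep logging their errors */
    if (!IsLogicalWorker() && errcode_is_in_suppress_list(sqlerrcode))
        edata->output_to_server = false;

    return 0;                   /* return value does not matter */
}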




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-22 Thread Jelte Fennema-Nio
On Mon, 22 Apr 2024 at 16:26, Robert Haas  wrote:
> That's a fair point, but I'm still not seeing much practical
> advantage. It's unlikely that a client is going to set a random bit in
> a format parameter for no reason.

I think you're missing an important point of mine here. The client
wouldn't be "setting a random bit in a format parameter for no
reason". The client would decide it is allowed to set this bit,
because the PG version it connected to supports column encryption
(e.g. PG18). But this completely breaks protocol and application layer
separation.

It doesn't seem completely outside of the realm of possibility for a
pooler to gather some statistics on the amount of Bind messages that
use text vs binary query parameters. That's very easily doable now,
while looking only at the protocol layer. If a client then sets the
new format parameter bit, this pooler could then get confused and
close the connection.

> Perhaps this is the root of our disagreement, or at least part of it.
> I completely agree that it is important for human beings to be able to
> understand whether, and how, the wire protocol has changed from one
> release to another.

I think this is partially the reason for our disagreement, but I think
there are at least two other large reasons:

1. I strongly believe minor protocol version bumps after the initial
3.1 one can be made painless for clients/poolers (so the ones to
3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not
having to worry about breaking TLS 1.2 communication. Once clients and
poolers implement version negotiation support for 3.1, there's no
reason for version negotiation support to work for 3.0 and 3.1 to then
suddenly break on the 3.2 bump. To be clear, I'm talking about the act
of bumping the version here, not the actual protocol changes. So
assuming zero/near-zero client implementation effort for the new
features (like never setting the newly supported bit in a format
parameter), then bumping the protocol version for these new features
can never have negative consequences.

2. I very much want to keep a clear split between the protocol layer
and the application layer of our communication. And these layers merge
whenever (like you say) "the wire protocol has changed from one
release to another", but no protocol version bump or protocol
extension is used to indicate that. When that happens the only way for
a client to know what valid wire protocol messages are according to
the server, is by checking the server version. This completely breaks
the separation between layers. So, while checking the server version
indeed works for direct client to postgres communication, it starts to
break down whenever you put a pooler inbetween (as explained in the
example earlier in this email). And it breaks down even more when
connecting to servers that implement the Postgres wire protocol, but
are not postgres at all, like CockroachDB. Right now libpq and other
postgres drivers can be used to talk to these other servers and
poolers, but if we start mixing protocol and application layer stuff
then eventually that will stop being the case.

Afaict from your responses, you disagree with 1. However, it's not at
all clear to me what exact problems you're worried about. It sounds
like you don't know either, and it's more that you're worried about
things breaking for not yet known reasons. I hoped to take away/reduce
those worries using some arguments in a previous email (quoted below),
but you didn't respond to those arguments, so I'm not sure if they
were able to change your mind.

On Thu, 18 Apr 2024 at 21:34, Jelte Fennema-Nio  wrote:
> When the server supports a lower version than the client, the client
> should disable certain features because it gets the
> ProtocolVersionNegotiation message. This is also true if we don't bump
> the version. Negotiating a lower version actually makes it clearer for
> the client what features to disable. Using the reported postgres
> version for this might not, because a connection pooler in the middle
> might not support the features that the client wants and thus throw an
> error (e.g. due to the client sending unknown messages) even if the
> backing Postgres server would support these features. Not to mention
> non-postgresql servers that implement the PostgreSQL protocol (of
> which there are more and more).
>
> When the server supports a higher version, the client never even
> notices this because the server will silently accept and only enable
> the features of the lower version. So this could never cause breakage,
> as from the client's perspective the server didn't bump their protocol
> version.




Re: Support a wildcard in backtrace_functions

2024-04-21 Thread Jelte Fennema-Nio
On Sat, 20 Apr 2024 at 01:19, Michael Paquier  wrote:
> Removing this GUC and making the backend react by default the same way
> as when this GUC was enabled sounds like a sensible route here.  This
> stuff is useful.

I definitely agree it's useful. But I feel like changing the default
of the GUC and removing the ability to disable it at the same time are
pretty radical changes that we should not be doing after a feature
freeze. I think we should at least have a way to turn this feature off
to be able to avoid log spam, especially given the fact that
extensions use elog much more freely than core, which afaict from
other threads[1] Tom also thinks we should normally be careful about.

Of the options to resolve the open item so far, I think there are only
three somewhat reasonable to do after the feature freeze:
1. Rename the GUC to something else (but keep behaviour the same)
2. Decide to keep the GUC as is
3. Revert a740b213d4

I hoped 1 was possible, but based on the discussion so far it doesn't
seem like we'll be able to get a quick agreement on a name. IMHO 2 is
just a version of 1, but with a GUC name that no-one thinks is a good
one. So I think we are left with option 3.

[1]: 
https://www.postgresql.org/message-id/flat/524751.1707240550%40sss.pgh.pa.us#59710fd4f3f186e642b8e6b886b2fdff




Re: Support a wildcard in backtrace_functions

2024-04-19 Thread Jelte Fennema-Nio
On Fri, 19 Apr 2024 at 10:58, Peter Eisentraut  wrote:
> This presupposes that there is consensus about how the future
> functionality should look like.  This topic has gone through half a
> dozen designs over a few months, and I think it would be premature to
> randomly freeze that discussion now and backport that design.

While there maybe isn't consensus on what a new design exactly looks
like, I do feel like there's consensus on this thread that the
backtrace_on_internal_error GUC is almost certainly not the design
that we want. I guess a more conservative approach to this problem
would be to revert the backtrace_on_internal_error commit and agree on
a better design for PG18. But I didn't think that would be necessary
if we could agree on the name for a more flexibly named GUC, which
seemed quite possible to me (after a little bit of bikeshedding).

> If a better, more comprehensive design arises for PG18, I think it would
> be pretty easy to either remove backtrace_on_internal_error or just
> internally remap it.

I think releasing a GUC (even if it's just meant for developers) in
PG17 and then deprecating it for a newer version in PG18 wouldn't be a
great look. And even if that's not a huge problem, it still seems
better not to have the problem at all. Renaming the GUC now seems to
have only upsides to me: worst case the new design turns out not to be
what we want either, and we have to deprecate the GUC. But in the best
case we don't need to deprecate anything.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 23:36, Jelte Fennema-Nio  wrote:
> To clarify: My proposed approach is to use a protocol extension
> parameter to enable the new messages that the server can send
> (like Peter is doing now). And **in addition to that** gate the new
> Bind format type behind a feature switch.

ugh, correction: gate the new Bind format type behind a **protocol bump**




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 22:17, Robert Haas  wrote:
> If we go with Peter's approach, every driver that supports his feature
> will work perfectly, and every driver that doesn't will work exactly
> as it does today. The risk of breaking anything is as near to zero as
> human developers can reasonably hope to achieve. Nobody who doesn't
> care about the feature will have to lift a single finger, today,
> tomorrow, or ever. That's absolutely brilliant.
>
> If we instead go with your approach, then anyone who wants to support
> 3.2 when it materializes will have to also support 3.1, which means
> they have to support this feature.

To clarify: My proposed approach is to use a protocol extension
parameter to enable the new messages that the server can send
(like Peter is doing now). And **in addition to that** gate the new
Bind format type behind a feature switch. There is literally nothing
clients will have to do to "support" that feature (except for
requesting a higher version protocol). Your suggestion of not bumping
the version but still allowing the new format type on version 3.0
doesn't have any advantage afaict, except secretly hiding from any
pooler in the middle that such a format type might be sent.

> Also, even just 3.1 is going to break
> something for somebody. There's just no way that we've left the
> protocol version unchanged for this long and the first change we make
> doesn't cause some collateral damage.

Sure, but the exact same argument holds for protocol extension
parameters. We've never set them, so they are bound to break something
the first time. My whole point is that once we bite that bullet, the
next protocol parameters and protocol version bumps won't cause such
breakage.

> Sure, those are minor downsides in the grand scheme of things. But
> AFAICS the only downside of Peter's approach that you've alleged is
> that doesn't involve bumping the version number. Of course, if bumping
> the version number is an intrinsic good, then no further justification
> is required, but I don't buy that. I do not believe that users or
> maintainers will throw us a pizza party when they find out that we've
> changed the version number. There's no reason for anyone to be happy
> about that for its own sake.

As a connection pooler maintainer I would definitely love it if every
protocol change required either a protocol version parameter or a
protocol version bump. That way I can easily check every release if
the protocol changed by looking at two things, instead of diffing the
protocol docs to see whether some tiny "supposedly irrelevant" change was made.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-18 Thread Jelte Fennema-Nio
On Mon, 15 Apr 2024 at 21:47, Dave Cramer  wrote:
>> On Mon, 15 Apr 2024 at 19:52, Robert Haas  wrote:
>> > surely it can't be right to use protocol
>> > version 3.0 to refer to a bunch of different things. But at the same
>> > time, surely we don't want clients to start panicking and bailing out
>> > when everything would have been fine.
>>
>> I feel like the ProtocolVersionNegotiation should make sure people
>> don't panic and bail out. And if not, then feature flags won't help
>> with this either. Because clients would then still bail out if some
>> feature is not supported.
>
> I don't think a client should ever bail out. They may not support something 
> but IMO bailing out is not an option.


On Thu, 18 Apr 2024 at 21:01, Robert Haas  wrote:
> On Thu, Apr 18, 2024 at 1:49 PM Jelte Fennema-Nio  wrote:
> > IMHO that means that we should also bump the protocol version for this
> > change, because it's changing the wire protocol by adding a new
> > parameter format code. And it does so in a way that does not depend on
> > the new protocol extension.
>
> I think we're more or less covering the same ground we did on the
> other thread here -- in theory I don't love the fact that we never
> bump the protocol version when we change stuff, but in practice if we
> start bumping it every time we do anything I think it's going to just
> break a bunch of stuff without any real benefit.

(the second quoted message comes from Peter his column encryption
thread, but responding here to keep this discussion in one place)

I really don't understand what exactly you're worried about. What do
you expect will break when bumping the protocol version? As Dave said,
clients should never bail out due to protocol version differences.

When the server supports a lower version than the client, the client
should disable certain features because it gets the
ProtocolVersionNegotiation message. This is also true if we don't bump
the version. Negotiating a lower version actually makes it clearer for
the client what features to disable. Using the reported postgres
version for this might not, because a connection pooler in the middle
might not support the features that the client wants and thus throw an
error (e.g. due to the client sending unknown messages) even if the
backing Postgres server would support these features. Not to mention
non-postgresql servers that implement the PostgreSQL protocol (of
which there are more and more).

When the server supports a higher version, the client never even
notices this because the server will silently accept and only enable
the features of the lower version. So this could never cause breakage,
as from the client's perspective the server didn't bump their protocol
version.
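
To illustrate, the client-side handling could look roughly like this
(the field and helper names are made up; only the message itself is real):

/* Called when the server answers the startup packet with a
 * NegotiateProtocolVersion message. */
static void
handle_negotiate_protocol_version(ClientConn *conn, uint32_t server_max)
{
    /* fall back to the version the server offers... */
    conn->protocol_version =
        conn->requested_version < server_max ? conn->requested_version
                                             : server_max;

    /* ...and disable anything that needs a newer version, e.g. a new
     * Bind parameter format code that only exists from 3.1 onwards */
    conn->allow_new_bind_format =
        conn->protocol_version >= ((3 << 16) | 1);    /* protocol 3.1 */
}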

So, I don't understand why you seem to view bumping the protocol
version with so much negativity. We're also bumping PG versions every
year. Afaik people only like that, partially because it's immediately
clear that certain features (e.g. MERGE) are not supported when
connecting to older servers. To me the same is true for bumping the
protocol version. There are no downsides to bumping it, only upsides.




Re: Transparent column encryption

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 18:46, Robert Haas  wrote:
> With regard to the Bind message, I suggest that we regard the protocol
> change as reserving a currently-unused bit in the message to indicate
> whether the value is pre-encrypted, without reference to the protocol
> extension. It could be legal for a client that can't understand
> encryption message from the server to supply an encrypted value to be
> inserted into a column. And I don't think we would ever want the bit
> that's being reserved here to be used by some other extension for some
> other purpose, even when this extension isn't used. So I don't see a
> need for this to be tied into the protocol extension.

I think this is an interesting idea. I can indeed see use cases for
e.g. inserting a new row based on another row (where the secret is the
same).

IMHO that means that we should also bump the protocol version for this
change, because it's changing the wire protocol by adding a new
parameter format code. And it does so in a way that does not depend on
the new protocol extension.




Re: Transparent column encryption

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 13:25, Peter Eisentraut  wrote:
> Hopefully, the reason for key rotation is mainly that policies require
> key rotation, not that keys get compromised all the time.

These key rotation policies are generally in place to reduce the
impact of a key compromise by limiting the time a compromised key is
valid.

> This
> seems pretty standard to me.  For example, I can change the password on
> my laptop's file system encryption, which somehow wraps a lower-level
> key, but I can't reencrypt the actual file system in place.

I think the threat model for this proposal and a laptop's file system
encryption are different enough that the same choices/tradeoffs don't
automatically translate. Specifically in this proposal the unencrypted
CEK is present on all servers that need to read/write those encrypted
values. And a successful attacker would then be able to read the
encrypted values forever with this key, because it effectively cannot
be rotated. That is a much bigger attack surface and risk than a
laptop's disk encryption. So, I feel quite strongly that shipping the
proposed feature without being able to re-encrypt columns in an online
fashion would be a mistake.

> That's the
> reason for having this two-tier key system in the first place.

If we allow for online-rotation of the actual encryption key, then
maybe we don't even need this two-tier system ;)

Not having this two-tier system would have a few benefits in my opinion:
1. We wouldn't need to be sending encrypted key material from the
server to every client. Which seems nice from a security, bandwidth
and client implementation perspective.
2. Asymmetric encryption of columns is suddenly an option. Allowing
certain clients to enter encrypted data into the database but not read
it.


> Two problems here: One, for deterministic encryption, everyone needs to
> agree on the representation, otherwise equality comparisons won't work.
>   Two, if you give clients the option of storing text or binary, then
> clients also get back a mix of text or binary, and it will be a mess.
> Just giving the option of storing the payload in binary wouldn't be that
> hard, but it's not clear what you can sensibly do with that in the end.

How about defining at column creation time if the underlying value
should be binary or not? Something like:

CREATE TABLE t(
mytime timestamp ENCRYPTED WITH (column_encryption_key = cek1, binary=true)
);

> Even if the identifiers
> somehow were global (but OIDs can also change and are not guaranteed
> unique forever),

OIDs of existing rows can't just change while a connection is active,
right? (all I know is upgrades can change them but that seems fine)
Also they are unique within a catalog table, right?

> the state of which keys have already been sent is
> session state.

I agree that this is the case. But it's state that can be tracked
fairly easily by a transaction pooler. Similar to how prepared
statements can be tracked. And this is easier to do when the IDs of
the same keys are the same across each session to the server, because
if they differ then you need to do re-mapping of IDs.

> This is kind of like SASL or TLS can add new methods dynamically without
> requiring a new version.  I mean, as we are learning, making new
> protocol versions is kind of hard, so the point was to avoid it.

Fair enough

> I guess you could do that, but wouldn't that making the decoding of
> these messages much more complicated?  You would first have to read the
> "short" variant, decode the format, and then decide to read the rest.
> Seems weird.

I see your point. But with the current approach even for queries that
don't return any encrypted columns, these useless fields would be part
of the RowDescription. It seems quite annoying to add extra network
and parsing overhead to all of your queries even if only a small
percentage use the encryption feature. Maybe we should add a new
message type instead like EncryptedRowDescription, or add some flag
field at the start of RowDescription that can be used to indicate that
there is encryption info for some of the columns.

> Yes, that's what would happen, and that's the intention, so that for
> example you can use pg_dump to back up encrypted columns without having
> to decrypt them.

Okay, makes sense. But I think it would be good to document that.

> > A related question to this is that currently libpq throws an error if
> > e.g. a master key realm is not defined but another one is. Is that
> > really what we want? Is not having one of the realms really that
> > different from not providing any realms at all?
>
> Can you provide a more concrete example of what scenario you have a
> concern about?

A server has table A and B. A is encrypted with a master key realm X
and B is encrypted with master key realm Y. If libpq is only given a
key for realm X, and it then tries to read table B, an error is
thrown. While if you don't provide any realm at all, you can read from
table B just fine.

Re: Support a wildcard in backtrace_functions

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 09:02, Michael Paquier  wrote:
>
> On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote:
> > log_backtrace speaks a bit more to me as a name for this stuff because
> > it logs a backtrace.  Now, there is consistency on HEAD as well
> > because these GUCs are all prefixed with "backtrace_".  Would
> > something like a backtrace_mode where we have an enum rather than a
> > boolean be better?

I guess it depends what we want consistency with. If we want naming
consistency with all other LOGGING_WHAT GUCs or if we want naming
consistency with the current backtrace_functions GUC. I personally
like log_backtrace slightly better, but I don't have a super strong
opinion on this either. Another option could be plain "backtrace".

> > One thing would be to redesign the existing GUC as
> > having two values on HEAD as of:
> > - "none", to log nothing.
> > - "internal", to log backtraces for internal errors.

If we go for backtrace_mode or backtrace, then I think I'd prefer
"disabled"/"off" and "internal_error" for these values.


> The rest of the proposals had better happen as a v18 discussion, where
> extending this GUC is a benefit.

agreed




Re: Support a wildcard in backtrace_functions

2024-04-18 Thread Jelte Fennema-Nio
On Thu, 18 Apr 2024 at 10:50, Peter Eisentraut  wrote:
> Why exactly is this an open item?  Is there anything wrong with the
> existing feature?

The name of the GUC backtrace_on_internal_error is so specific that
it's impossible to extend our backtrace behaviour in future releases
without adding yet another backtrace GUC. You started the discussion
on renaming it upthread:

On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut  wrote:
> What is the relationship of these changes with the recently added
> backtrace_on_internal_error?  We had similar discussions there, I feel
> like we are doing similar things here but slightly differently.  Like,
> shouldn't backtrace_functions_min_level also affect
> backtrace_on_internal_error?  Don't you really just want
> backtrace_on_any_error?  You are sneaking that in through the backdoor
> via backtrace_functions.  Can we somehow combine all these use cases
> more elegantly?  backtrace_on_error = {all|internal|none}?




Re: Speed up clean meson builds by ~25%

2024-04-17 Thread Jelte Fennema-Nio
On Wed, 17 Apr 2024 at 16:00, Tom Lane  wrote:
> Break gram.y (say, misspell some token in a production) and
> see what happens.  The behavior's likely to be timing sensitive
> though.

Thanks for clarifying. It took me a little while to break gram.y in
such a way that I was able to consistently reproduce the issue, but I managed in
the end using the attached small diff.
And then running ninja in non-parallel mode:

ninja -C build all -j1

As I expected this problem was indeed fairly easy to address by still
building "backend/parser" before "interfaces". See attached patch.


v1-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch
Description: Binary data


trigger-confusing-build-error.diff
Description: Binary data


Re: Speed up clean meson builds by ~25%

2024-04-17 Thread Jelte Fennema-Nio
On Wed, 17 Apr 2024 at 13:41, Peter Eisentraut  wrote:
>
> On 10.04.24 17:33, Tom Lane wrote:
> > The immediate question then is do we want to take Jelte's patch
> > as a way to ameliorate the pain meanwhile.  I'm kind of down on
> > it, because AFAICS what would happen if you break the core
> > grammar is that (most likely) the failure would be reported
> > against ecpg first.  That seems pretty confusing.
>
> Yeah that would be confusing.

How can I test if this actually happens? Because it sounds like if
that indeed happens it should be fixable fairly easily.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-04-15 Thread Jelte Fennema-Nio
On Mon, 15 Apr 2024 at 19:43, Robert Haas  wrote:
>
> On Sat, Apr 6, 2024 at 6:14 PM Jelte Fennema-Nio  wrote:
> > I think for clients/drivers, the work would generally be pretty
> > minimal. For almost all proposed changes, clients can "support" the
> > protocol version update by simply not using the new features, ...
>
> I mean, I agree if a particular protocol version bump does nothing
> other than signal the presence of some optional, ignorable feature,
> then it doesn't cause a problem if we force clients to support it. But
> that seems a bit like saying that eating wild mushrooms is fine
> because some of them aren't poisonous. The point is that if we roll
> out two protocol changes, A and B, each of which requires the client
> to make some change in order to work with the newer protocol version,
> then using version numbers as the gating mechanism requires that the
> client can't support the newer of those two changes without also
> supporting the older one. Using feature flags doesn't impose that
> constraint, which I think is a plus.

I think we're in agreement here, i.e. it depends on the situation if a
feature flag or version bump is more appropriate. I think the
guidelines could be as follows:
1. For protocol changes that require "extremely minimal" work from
clients & poolers: bump the protocol version.
2. For "niche" features that require some work from clients and/or
poolers: use a protocol parameter feature flag.
3. For anything in between, let's discuss on the thread for that
specific protocol change on the tradeoffs.

On Mon, 15 Apr 2024 at 19:52, Robert Haas  wrote:
> surely it can't be right to use protocol
> version 3.0 to refer to a bunch of different things. But at the same
> time, surely we don't want clients to start panicking and bailing out
> when everything would have been fine.

I feel like the ProtocolVersionNegotiation should make sure people
don't panic and bail out. And if not, then feature flags won't help
with this either. Because clients would then still bail out if some
feature is not supported.

> I'm unconvinced that we should let ParameterSet change
> non-PGC_PROTOCOL GUCs. The pooler can agree on a list of protocol GUCs
> with the end client that differs from what the server agreed with the
> pooler - e.g., it can add pgbouncer.pool_mode to the list. But for
> truly non-protocol GUCs, we have a lot of ways to set those already.

I feel like you're glossing over something fairly important here. How
exactly would the client know about pgbouncer.pool_mode? Are you
envisioning a list of GUCs which can be changed using ParameterSet,
which the server then sends to the client during connection startup
(using presumably some new protocol message)? If so, then I feel this
same problem still exists. How would the client know which of those
GUCs change wire-protocol behaviour and which don't? It still would
need a hardcoded list (now including pgbouncer.pool_mode and maybe
more) of things that a user is allowed to change using ParameterSet.
So I think a well-known prefix would still be applicable.

To be clear, imho the well-known prefix discussion is separate from
the discussion about whether Postgres should throw an ERROR when
ParameterSet is used to change any non-PGC_PROTOCOL GUC. I'd be fine
with disallowing that if that seems better/safer/clearer to you
(although I'd love to hear your exact concerns about this). But I'd
still want a well-known prefix for protocol parameters. Because that
prefix is not for the benefit of the server, it's for the benefit of
the client and pooler. So the client/pooler can error if any dangerous
GUC is being changed, because the server would accept it and change
the wire-protocol accordingly.
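
As a sketch of how a pooler could use such a prefix (the "_pq_." prefix
and the helper names are illustrative here, not a settled design):

/* Refuse ParameterSet for anything the pooler cannot reason about:
 * non-protocol GUCs, and protocol parameters it does not know. */
static void
check_parameter_set(const char *name)
{
    if (strncmp(name, "_pq_.", 5) != 0)
        pooler_error("refusing to change non-protocol parameter \"%s\"", name);
    else if (!pooler_knows_protocol_param(name))
        pooler_error("unknown protocol parameter \"%s\"", name);
}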



