Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi.
I found an interesting case.

CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x, (i % 10)::int8 AS y,
         'abc' || i % 10 AS z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y;
the above part will use:
  ->  Incremental Sort
 Sort Key: x, $, $, $
 Presorted Key: x
 ->  Index Scan using t1_x_y_idx on t1

EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w;
EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z;

these will use:
  ->  Incremental Sort
 Sort Key: x, y, $, $
 Presorted Key: x, y
 ->  Index Scan using t1_x_y_idx on t1

I guess this is fine, but not optimal?
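
For comparison, the plan one might hope for in the first group of queries
(assuming the planner reordered the GROUP BY keys to exploit the full (x, y)
index prefix, as it already does for the second group) would presumably be:

  ->  Incremental Sort
 Sort Key: x, y, $, $
 Presorted Key: x, y
 ->  Index Scan using t1_x_y_idx on t1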




Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread Amit Kapila
On Wed, Mar 13, 2024 at 9:19 AM vignesh C  wrote:
>
> On Tue, 12 Mar 2024 at 09:34, Ajin Cherian  wrote:
> >
> >
> >
> > On Tue, Mar 12, 2024 at 2:59 PM vignesh C  wrote:
> >>
> >>
> >> Thanks, I have created the following Commitfest entry for this:
> >> https://commitfest.postgresql.org/47/4816/
> >>
> >> Regards,
> >> Vignesh
> >
> >
> > Thanks for the patch, I have verified that the fix works well by following 
> > the steps mentioned to reproduce the problem.
> > Reviewing the patch, it seems good and is well documented. Just one minor 
> > comment I had was probably to change the name of the variable 
> > table_states_valid to table_states_validity. The current name made sense 
> > when it was a bool, but now that it is a tri-state enum, it doesn't fit 
> > well.
>
> Thanks for reviewing the patch, the attached v6 patch has the changes
> for the same.
>

v6_0001* looks good to me. This should be backpatched unless you or
others think otherwise.

-- 
With Regards,
Amit Kapila.




Re: Fix parallel vacuum buffer usage reporting

2024-04-24 Thread Masahiko Sawada
On Mon, Apr 22, 2024 at 5:07 PM Anthonin Bonnefoy
 wrote:
>
> On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina  
> wrote:
>>
>> Hi, thank you for your work with this subject.
>>
>> While I was reviewing your code, I noticed that your patch conflicts with 
>> another patch [0] that been committed.
>>
>> [0] 
>> https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
>
>
> I've rebased the patch and also split the changes:

Thank you for updating the patch!

> 1: Use pgBufferUsage in Vacuum and Analyze block reporting

I think that if the analyze command doesn't have the same issue, we
don't need to change it. Making the vacuum and the analyze consistent
is a good point but I'd like to avoid doing unnecessary changes in
back branches. I think the patch set would contain:

(a) make lazy vacuum use BufferUsage instead of
VacuumPage{Hit,Miss,Dirty}. (backpatched down to pg13).
(b) make analyze use BufferUsage and remove VacuumPage{Hit,Miss,Dirty}
variables for consistency and simplicity (only for HEAD, if we agree).

BTW I realized that VACUUM VERBOSE running on a temp table always
shows the number of dirtied buffers being 0, which seems to be a bug.
The patch (a) will resolve it as well.
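
A minimal way to observe this (table name and sizes are arbitrary) is
something like:

    CREATE TEMP TABLE tmp_vac AS
      SELECT g AS id FROM generate_series(1, 100000) g;
    DELETE FROM tmp_vac WHERE id % 10 = 0;
    VACUUM (VERBOSE) tmp_vac;
    -- the "buffer usage" line reports 0 dirtied even though pages were modified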

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Feature request: schema diff tool

2024-04-24 Thread Neszt Tibor
Hello,

A diff tool that would generate CREATE, DROP, ALTER, etc. commands from the
differences between two specified schemas would be very useful. The diff
could even cover data, producing INSERT, DELETE, and UPDATE command output,
although I think the schema diff handling is much more important and
necessary.

Today, all modern applications are version-tracked, including the SQL
schema. Currently, schema changes must be handled twice: on the one hand, the
schema must be modified, and on the other hand, the schema modification
commands must also be written for the upgrade process. A good diff tool
would allow only the schema to be modified.
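
For illustration, a purely hypothetical session (table and column names
invented) could look like this:

    -- old schema:  CREATE TABLE users (id int PRIMARY KEY, name text);
    -- new schema:  CREATE TABLE users (id int PRIMARY KEY, name text, email text);
    -- a built-in diff tool could emit the upgrade command directly:
    ALTER TABLE users ADD COLUMN email text;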

Such tools already exist because the community needed them, e.g. apgdiff. I
think the problem is that the concept itself isn't good: such a tool should
be part of PostgreSQL, because only PostgreSQL always knows the complete
current SQL syntax, while an external program such as apgdiff can only
follow changes after the fact, generating continuous problems. Not to
mention that an external application can be abandoned; apgdiff, for example,
is no longer actively developed, so users who built on that diff tool are
now in trouble.

Furthermore, it is the least amount of work to do this on the PostgreSQL
development side: you have the expertise, the SQL language processor, etc.

What is your opinion on this?

Regards,
 Neszt Tibor


Re: ecpg_config.h symbol missing with meson

2024-04-24 Thread Peter Eisentraut

On 17.04.24 18:15, Tom Lane wrote:

Peter Eisentraut  writes:

I checked the generated ecpg_config.h with make and meson, and the meson
one is missing



#define HAVE_LONG_LONG_INT 1



This is obviously quite uninteresting, since that is required by C99.
But it would be more satisfactory if we didn't have discrepancies like
that.  Note that we also kept ENABLE_THREAD_SAFETY in ecpg_config.h for
compatibility.
...
Alternatively, we could remove the symbol from the make side.


Think I'd vote for removing it, since we use it nowhere.
The ENABLE_THREAD_SAFETY precedent feels a little bit different,
since there's not the C99-requires-the-feature angle.


Ok, fixed by removing instead.





Re: Extend ALTER DEFAULT PRIVILEGES for large objects

2024-04-24 Thread Yugo NAGATA
On Tue, 23 Apr 2024 23:47:38 -0400
Tom Lane  wrote:

> Yugo NAGATA  writes:
> > Currently, ALTER DEFAULT PRIVILEGE doesn't support large objects,
> > so if we want to allow users other than the owner to use the large
> > object, we need to grant a privilege on it every time a large object
> > is created. One of our clients feels that this is annoying, so I would
> > like to propose extending ALTER DEFAULT PRIVILEGES to large objects.
> 
> I wonder how this plays with pg_dump, and in particular whether it
> breaks the optimizations that a45c78e32 installed for large numbers
> of large objects.  The added test cases seem to go out of their way
> to leave no trace behind that the pg_dump/pg_upgrade tests might
> encounter.

Thank you for your comments.

The previous patch did not work with pg_dump since I forgot some fixes.
I attached an updated patch including those fixes.

I believe a45c78e32 is about already-existing large objects and is
not directly related to default privileges, so it will not be affected
by this patch.

> I think you broke psql's \ddp, too.  And some other places; grepping
> for DEFACLOBJ_NAMESPACE finds other oversights.

Yes, I did. The attached patch include fixes for psql, too.
 
> On the whole I find this proposed feature pretty unexciting
> and dubiously worthy of the implementation/maintenance effort.

I believe this feature is beneficial to some users because it allows
them to omit the GRANT that was previously necessary after every large
object creation. It seems to me that the implementation/maintenance cost
is not so high compared to other objects (e.g. default privileges on
schemas), unless I am still missing something.
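
As a usage sketch of the proposed syntax (role names are made up; the
grammar is in the attached patch):

    ALTER DEFAULT PRIVILEGES FOR ROLE alice
        GRANT SELECT, UPDATE ON LARGE OBJECTS TO bob;
    -- large objects created by alice afterwards are readable and writable
    -- by bob without a per-object GRANT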

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
>From 0cfcdc2b297556248cfb64d67779d5fcb8dab227 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Fri, 8 Mar 2024 17:43:43 +0900
Subject: [PATCH v2] Extend ALTER DEFAULT PRIVILEGES for large objects

Original patch by Haruka Takatsuka, some fixes and tests by
Yugo Nagata.
---
 doc/src/sgml/catalogs.sgml|   3 +-
 .../sgml/ref/alter_default_privileges.sgml|  15 ++-
 src/backend/catalog/aclchk.c  |  21 
 src/backend/catalog/objectaddress.c   |  18 ++-
 src/backend/catalog/pg_largeobject.c  |  18 ++-
 src/backend/parser/gram.y |   5 +-
 src/bin/pg_dump/dumputils.c   |   3 +-
 src/bin/pg_dump/pg_dump.c |   3 +
 src/bin/psql/describe.c   |   6 +-
 src/bin/psql/tab-complete.c   |   2 +-
 src/include/catalog/pg_default_acl.h  |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/test/regress/expected/privileges.out  | 104 +-
 src/test/regress/sql/privileges.sql   |  47 
 14 files changed, 235 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2907079e2a..b8cc822aeb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3330,7 +3330,8 @@ SCRAM-SHA-256$iteration count:
S = sequence,
f = function,
T = type,
-   n = schema
+   n = schema,
+   L = large object
   
  
 
diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml
index 1de4c5c1b4..3b358b7a88 100644
--- a/doc/src/sgml/ref/alter_default_privileges.sgml
+++ b/doc/src/sgml/ref/alter_default_privileges.sgml
@@ -50,6 +50,11 @@ GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] }
 ON SCHEMAS
 TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ]
 
+GRANT { { SELECT | UPDATE }
+[, ...] | ALL [ PRIVILEGES ] }
+ON LARGE OBJECTS
+TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ]
+
 REVOKE [ GRANT OPTION FOR ]
 { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER | MAINTAIN }
 [, ...] | ALL [ PRIVILEGES ] }
@@ -81,6 +86,13 @@ REVOKE [ GRANT OPTION FOR ]
 ON SCHEMAS
 FROM { [ GROUP ] role_name | PUBLIC } [, ...]
 [ CASCADE | RESTRICT ]
+
+REVOKE [ GRANT OPTION FOR ]
+{ { SELECT | UPDATE }
+[, ...] | ALL [ PRIVILEGES ] }
+ON LARGE OBJECTS
+FROM { [ GROUP ] role_name | PUBLIC } [, ...]
+[ CASCADE | RESTRICT ]
 
  
 
@@ -159,7 +171,8 @@ REVOKE [ GRANT OPTION FOR ]
   If IN SCHEMA is omitted, the global default privileges
   are altered.
   IN SCHEMA is not allowed when setting privileges
-  for schemas, since schemas can't be nested.
+  for schemas and large objects, since schemas can't be nested and
+  large objects don't belong to a schema.
  
 

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 7abf3c2a74..41baf81a1d 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -1077,6 +1077,10 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s
 			all_privileges = ACL_ALL_RIGHTS_SCHEMA;
 			errormsg = 

Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-04-24 Thread Alexander Korotkov
On Wed, Apr 17, 2024 at 9:38 AM Peter Eisentraut  wrote:
> On 24.10.23 22:13, Alexander Korotkov wrote:
> > On Wed, Sep 28, 2022 at 11:44 AM Aleksander Alekseev
> >  wrote:
> >>> I think, this patch was marked as "Waiting on Author", probably, by 
> >>> mistake. Since recent changes were done without any significant code 
> >>> changes and the CF bot is now happy again.
> >>>
> >>> I'm going to move it to RfC, could I? If not, please tell why.
> >>
> >> I restored the "Ready for Committer" state. I don't think it's a good
> >> practice to change the state every time the patch has a slight
> >> conflict or something. This is not helpful at all. Such things happen
> >> quite regularly and typically are fixed in a couple of days.
> >
> > This patch seems useful to me.  I went through the thread, it seems
> > that all the critics are addressed.
> >
> > I've rebased this patch.   Also, I've run perltidy for tests, split
> > long errmsg() into errmsg(), errdetail() and errhint(), and did other
> > minor enhancements.
> >
> > I think this patch is ready to go.  I'm going to push it if there are
> > no objections.
>
> I just found the new pg_amcheck option --checkunique in PG17-to-be.
> Could we rename this to --check-unique?  Seems friendlier.  Maybe also
> rename the bt_index_check function argument to check_unique.

+1 from me
Let's do so if nobody objects.
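
For reference, the spellings being discussed (index and database names are
placeholders) are:

    pg_amcheck --checkunique mydb
    SELECT bt_index_check('my_index'::regclass, checkunique => true);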

--
Regards,
Alexander Korotkov




Remove unnecessary code from be_lo_put()

2024-04-24 Thread Yugo NAGATA
Hi,

I noticed that a permission check is performed in be_lo_put()
just after returning from inv_open(), but the permission should
already be checked in inv_open(), so I think we can remove this
part of the code. I attached a patch for this fix.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 28ad1c9277..f728d0346c 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -860,17 +860,6 @@ be_lo_put(PG_FUNCTION_ARGS)
 	lo_cleanup_needed = true;
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
 
-	/* Permission check */
-	if (!lo_compat_privileges &&
-		pg_largeobject_aclcheck_snapshot(loDesc->id,
-		 GetUserId(),
-		 ACL_UPDATE,
-		 loDesc->snapshot) != ACLCHECK_OK)
-		ereport(ERROR,
-(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- errmsg("permission denied for large object %u",
-		loDesc->id)));
-
 	inv_seek(loDesc, offset, SEEK_SET);
 	written = inv_write(loDesc, VARDATA_ANY(str), VARSIZE_ANY_EXHDR(str));
 	Assert(written == VARSIZE_ANY_EXHDR(str));


Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-04-24 Thread Andrew Dunstan



On 2024-04-24 We 06:12, Peter Eisentraut wrote:

On 22.04.24 22:28, Tom Lane wrote:

Nathan Bossart  writes:

On Mon, Apr 22, 2024 at 04:08:08PM -0400, Tom Lane wrote:

I think the actual plan now is that we'll sync the in-tree copy
with the buildfarm's results (and then do a tree-wide pgindent)
every so often, probably shortly before beta every year.
Okay.  Is this just to resolve the delta between the manual updates and a
clean autogenerated copy every once in a while?

The main reason there's a delta is that people don't manage to
maintain the in-tree copy perfectly (at least, they certainly
haven't done so for this past year).  So we need to do that
to clean up every now and then.

A secondary reason is that the set of typedefs we absorb from
system include files changes over time.


Is the code to extract typedefs available somewhere independent of the 
buildfarm?  It would be useful sometimes to be able to run this 
locally, like before and after some patch, to keep the in-tree 
typedefs list updated.






There's been talk about it but I don't think anyone's done it. I'd be 
more than happy if the buildfarm client could just call something in the 
core repo (c.f. src/test/perl/Postgres/Test/AdjustUpgrade.pm).


Regarding testing with your patch, some years ago I wrote this blog 
post: 




cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Tarball builds in the new world order

2024-04-24 Thread Greg Sabino Mullane
On Tue, Apr 23, 2024 at 6:06 PM Tom Lane  wrote:

> This change seems like a good thing anyway for anyone who's tempted
> to use "make dist" manually, since they wouldn't necessarily want
> to package HEAD either.  Now, if we just do it exactly like that
> then trying to "make dist" without setting PG_COMMIT_HASH will
> fail, since "git archive" has no default for its 
> argument.  I can't quite decide if that's a good thing, or if we
> should hack the makefile a little further to allow PG_COMMIT_HASH
> to default to HEAD.
>

Just having it fail seems harsh. What if we had plain "make dist" at least
output a friendly hint about "please specify a hash"? That seems better
than an implicit HEAD default, as they can manually set it to HEAD
themselves per the hint.

Cheers,
Greg


Re: Cleanup: remove unused fields from nodes

2024-04-24 Thread Matthias van de Meent
On Wed, 24 Apr 2024 at 09:28, Michael Paquier  wrote:
>
> On Tue, Apr 23, 2024 at 11:03:40PM -0400, Tom Lane wrote:
> > Hah.  Seems like the comment for isall needs to explain that it
> > exists for this purpose, so we don't make this mistake again.
>
> How about something like the attached?

LGTM.

Thanks,

Matthias




Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread vignesh C
On Wed, 24 Apr 2024 at 11:59, Amit Kapila  wrote:
>
> On Wed, Mar 13, 2024 at 9:19 AM vignesh C  wrote:
> >
> > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian  wrote:
> > >
> > >
> > >
> > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C  wrote:
> > >>
> > >>
> > >> Thanks, I have created the following Commitfest entry for this:
> > >> https://commitfest.postgresql.org/47/4816/
> > >>
> > >> Regards,
> > >> Vignesh
> > >
> > >
> > > Thanks for the patch, I have verified that the fix works well by 
> > > following the steps mentioned to reproduce the problem.
> > > Reviewing the patch, it seems good and is well documented. Just one minor 
> > > comment I had was probably to change the name of the variable 
> > > table_states_valid to table_states_validity. The current name made sense 
> > > when it was a bool, but now that it is a tri-state enum, it doesn't fit 
> > > well.
> >
> > Thanks for reviewing the patch, the attached v6 patch has the changes
> > for the same.
> >
>
> v6_0001* looks good to me. This should be backpatched unless you or
> others think otherwise.

This patch needs to be committed in master, PG16 and PG15.
It is not required for PG14 and earlier because those branches don't have
the HasSubscriptionRelations call before updating table_states_valid:
/*
 * Does the subscription have tables?
 *
 * If there were not-READY relations found then we know it does. But
 * if table_states_not_ready was empty we still need to check again to
 * see if there are 0 tables.
 */
has_subrels = (table_states_not_ready != NIL) ||
  HasSubscriptionRelations(MySubscription->oid);

So the invalidation function will not be called here.

Whereas for PG15 and newer versions this is applicable:
HasSubscriptionRelations calls table_open function which will get the
invalidate callback like in:
HasSubscriptionRelations -> table_open -> relation_open ->
LockRelationOid -> AcceptInvalidationMessages ->
ReceiveSharedInvalidMessages -> LocalExecuteInvalidationMessage ->
CallSyscacheCallbacks -> invalidate_syncing_table_states

The attached patch
v7-0001-Table-sync-missed-due-to-race-condition-in-subscr.patch
applies for master and PG16 branch,
v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch
applies for PG15 branch.

Regards,
Vignesh
From 93116320bd9ffbcddc83a1fc253cba90c56cb47e Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 1 Feb 2024 09:46:40 +0530
Subject: [PATCH v7] Table sync missed due to race condition in subscription
 cache handling.

Table sync was missed if the invalidation of table sync occurred while
the not-ready tables were being prepared. This occurs because
the table state was being set to valid at the end of not-ready table
list preparation irrespective of any invalidations that occurred while
preparing the list. Fixed it by changing the boolean variable to a
tri-state enum and by setting the table state to valid only if no
invalidations have occurred while the list is being prepared.
---
 src/backend/replication/logical/tablesync.c | 25 +
 src/tools/pgindent/typedefs.list|  1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 81fff1919d..7d4ee48206 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -122,7 +122,14 @@
 #include "utils/syscache.h"
 #include "utils/usercontext.h"
 
-static bool table_states_valid = false;
+typedef enum
+{
+	SYNC_TABLE_STATE_NEEDS_REBUILD,
+	SYNC_TABLE_STATE_REBUILD_STARTED,
+	SYNC_TABLE_STATE_VALID,
+} SyncingTablesState;
+
+static SyncingTablesState table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 static List *table_states_not_ready = NIL;
 static bool FetchTableStates(bool *started_tx);
 
@@ -272,7 +279,7 @@ wait_for_worker_state_change(char expected_state)
 void
 invalidate_syncing_table_states(Datum arg, int cacheid, uint32 hashvalue)
 {
-	table_states_valid = false;
+	table_states_validity = SYNC_TABLE_STATE_NEEDS_REBUILD;
 }
 
 /*
@@ -1556,13 +1563,15 @@ FetchTableStates(bool *started_tx)
 
 	*started_tx = false;
 
-	if (!table_states_valid)
+	if (table_states_validity != SYNC_TABLE_STATE_VALID)
 	{
 		MemoryContext oldctx;
 		List	   *rstates;
 		ListCell   *lc;
 		SubscriptionRelState *rstate;
 
+		table_states_validity = SYNC_TABLE_STATE_REBUILD_STARTED;
+
 		/* Clean the old lists. */
 		list_free_deep(table_states_not_ready);
 		table_states_not_ready = NIL;
@@ -1596,7 +1605,15 @@ FetchTableStates(bool *started_tx)
 		has_subrels = (table_states_not_ready != NIL) ||
 			HasSubscriptionRelations(MySubscription->oid);
 
-		table_states_valid = true;
+		/*
+		 * If the subscription relation cache has been invalidated since we
+		 * entered this routine, we still use and return the relations we just
+		 * finished constructing, to avoid infinite loops, but we leave the
+		 * table states marked as stale so that we'll rebuild 

Re: POC: GROUP BY optimization

2024-04-24 Thread jian he
hi
one more question (maybe a dumb one)

drop table if exists t1;
CREATE TABLE t1 AS
  SELECT (i % 10)::numeric AS x, (i % 10)::int8 AS y,
         'abc' || i % 10 AS z, i::int4 AS w
  FROM generate_series(1, 100) AS i;
CREATE INDEX t1_x_y_idx ON t1 (x, y);
ANALYZE t1;
SET enable_hashagg = off;
SET enable_seqscan = off;


EXPLAIN (COSTS OFF, verbose) SELECT count(*) FROM t1 GROUP BY z,x,y,w
order by w;


  QUERY PLAN
--
 GroupAggregate
   Output: count(*), w, z, x, y
   Group Key: t1.w, t1.x, t1.y, t1.z
   ->  Sort
 Output: w, z, x, y
 Sort Key: t1.w, t1.x, t1.y, t1.z
 ->  Index Scan using t1_x_y_idx on public.t1
   Output: w, z, x, y
(8 rows)


if you do
` Sort Key: t1.w, t1.x, t1.y, t1.z`

then the output is supposed to be:

Output: w, x, y, z

?




Re: doc: create table improvements

2024-04-24 Thread David G. Johnston
On Wed, Apr 24, 2024 at 3:30 AM Peter Eisentraut 
wrote:

>  > +   The reliability characteristics of a table are governed by its
>  > +   persistence mode.  The default mode is described
>  > +   here
>  > +   There are two alternative modes that can be specified during
>  > +   table creation:
>  > +   temporary and
>  > +   unlogged.
>
> Not sure reliability is the best word here.  I mean, a temporary table
> isn't any less reliable than any other table.  It just does different
> things.
>
>
Given the name of the section where this is all discussed I'm having
trouble going with a different word.  But better framing and phrasing I can
do:

A table may be opted out of certain storage aspects of reliability, as
described [here], by specifying either of the alternate persistence modes:
[temporary] or [unlogged]. The specific trade-offs and implications are
detailed below.

David J.


Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-04-24 Thread Andrew Dunstan


On 2024-04-23 Tu 06:23, Alvaro Herrera wrote:

But there are others: InjectionPointEntry, ResourceOwnerData,
JsonNonTerminal, JsonParserSem, ...



The last two are down to me. Let's just get rid of them like the 
attached (no need for a typedef at all)



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index 12fabcaccf..fc0cb36974 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -50,16 +50,16 @@ typedef enum	/* contexts of JSON parser */
  * tokens, non-terminals, and semantic action markers.
  */
 
-typedef enum
+enum JsonNonTerminal
 {
 	JSON_NT_JSON = 32,
 	JSON_NT_ARRAY_ELEMENTS,
 	JSON_NT_MORE_ARRAY_ELEMENTS,
 	JSON_NT_KEY_PAIRS,
 	JSON_NT_MORE_KEY_PAIRS,
-} JsonNonTerminal;
+};
 
-typedef enum
+enum JsonParserSem
 {
 	JSON_SEM_OSTART = 64,
 	JSON_SEM_OEND,
@@ -72,7 +72,7 @@ typedef enum
 	JSON_SEM_AELEM_END,
 	JSON_SEM_SCALAR_INIT,
 	JSON_SEM_SCALAR_CALL,
-} JsonParserSem;
+};
 
 /*
  * struct containing the 3 stacks used in non-recursive parsing,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d551ada325..ba48a3371d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1312,14 +1312,12 @@ JsonManifestParseIncrementalState
 JsonManifestParseState
 JsonManifestSemanticState
 JsonManifestWALRangeField
-JsonNonTerminal
 JsonObjectAgg
 JsonObjectConstructor
 JsonOutput
 JsonParseExpr
 JsonParseContext
 JsonParseErrorType
-JsonParserSem
 JsonParserStack
 JsonPath
 JsonPathBool


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-24 Thread Daniel Gustafsson
> On 24 Apr 2024, at 00:20, Michael Paquier  wrote:
> 
> On Tue, Apr 23, 2024 at 10:08:13PM +0200, Daniel Gustafsson wrote:
>> Hearing no objections to this plan (and the posted v10), I'll go ahead with
>> 0001, 0003 and 0004 into v17 tomorrow.
> 
> WFM, thanks.

Done.  Attached are the two remaining patches, rebased over HEAD, for removing
support for OpenSSL 1.0.2 in v18. Parking them in the commitfest for now.

--
Daniel Gustafsson



v11-0002-Remove-pg_strong_random-initialization.patch
Description: Binary data


v11-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-04-24 Thread Bharath Rupireddy
On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis  wrote:
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.

I added table_modify_begin, table_modify_buffer_insert,
table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
authors can now define their own buffering strategy and flushing decisions
based on their tuple storage kinds and various other AM-specific factors. I
also added a default implementation that falls back to single inserts when
no implementation is provided for these callbacks by AM authors. See the
attached v19-0001 patch.

> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)

I added a flush callback named TableModifyBufferFlushCallback; when
provided by the caller, it is invoked after tuples are flushed to disk from
the buffers but before the AM frees them up. Index insertions and AFTER ROW
INSERT triggers can be executed in this callback. See the v19-0001 patch
for how the AM invokes the flush callback, and see either v19-0003, v19-0004
or v19-0005 for how a caller can supply the callback and the context required
to execute index insertions and AR triggers.
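
To make the calling pattern concrete, here is a pseudocode-level sketch; the
real function signatures are in v19-0001, and my_flush_callback,
my_flush_context and get_next_slot() are placeholders rather than names from
the patch:

    TableModifyState *mstate = table_modify_begin(rel, ...,
                                                  my_flush_callback,
                                                  my_flush_context);
    TupleTableSlot *slot;

    while ((slot = get_next_slot()) != NULL)
        table_modify_buffer_insert(mstate, slot);  /* AM may buffer and flush in batches */

    /* flush remaining buffered tuples; the flush callback runs here, before
     * the AM frees them, so index insertions and AR triggers can be done */
    table_modify_buffer_flush(mstate);
    table_modify_end(mstate);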

> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).

I haven't thought about these things yet. I can only focus on them after
seeing how the attached patches go from here.

> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.

The heap implementation for these new Table AMs uses a separate memory
context for all of the operations. Please have a look and let me know if we
need anything more.

> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.

I used multi inserts for all of these including TABLE REWRITE commands such
as ALTER TABLE. See the attached v19-0002 patch. Check the testing section
below for benefits.

FWIW, the following are some of the TABLE REWRITE commands that can
benefit:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTER COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 1)::int;
and so on.

> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.

I modified the COPY FROM code to use the new Table AMs, and performed some
tests which show no signs of regression. Check the testing section below
for more details. See the attached v19-0005 patch. With this,
table_multi_insert can be deprecated.

> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.

I did that in v19-0003. I did place the table_modify_end call in multiple
places including ExecEndModifyTable. I didn't find any issues with it.
Please have a look and let me know if we need the end call in more places.
Check the testing section below for benefits.

> 10. Use these new methods for logical apply.

I used multi inserts for logical replication apply in v19-0004. Check the
testing section below for benefits.

FWIW, open-source pglogical does have multi insert support, check code
around

Re: Small filx on the documentation of ALTER DEFAULT PRIVILEGES

2024-04-24 Thread Tom Lane
Yugo NAGATA  writes:
> We can specify more than one privilege type in 
> "ALTER DEFAULT PRIVILEGES GRANT/REVOKE ON SCHEMAS",
> for example,

>   ALTER DEFAULT PRIVILEGES GRANT USAGE,CREATE ON SCHEMAS TO PUBLIC;

> However, the syntax described in the documentation looks to
> be allowing only one,

>  GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] }
> ON SCHEMAS
> TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ]

> while the syntaxes for tables and sequences are described correctly.

Yup, you're right.  I'll push this shortly.

regards, tom lane




Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread Peter Eisentraut

On 17.03.24 20:12, Erik Wienhold wrote:

Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't.  I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right.  The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.

The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262:https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely.  The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.


I have committed this patch, and backpatched it, as a bug fix, because 
the existing description was wrong.  To keep the patch minimal for 
backpatching, I didn't do the conversion to a list.  I'm not sure I like 
that anyway, because it tends to draw more attention to that part over 
the surrounding parts, which didn't seem appropriate in this case.  But 
anyway, if you have any more non-bug-fix editing in this area, which 
would then target PG18, please send more patches.






Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:51, Peter Eisentraut  wrote:

>A <JSON path identifier> is classified as follows.
> 
>Case:
> 
>a) A <JSON path identifier> that is a <dollar sign> is a <JSON path
>   context variable>.
> 
>b) A <JSON path identifier> that begins with <dollar sign> is a
>   <JSON path named variable>.
> 
>c) Otherwise, a <JSON path identifier> is a <JSON path key name>.
> 
> Does this help?  I wasn't following all the discussion to see if there is 
> anything wrong with the implementation.

Yes, it does, as it ties the special meaning of the dollar sign to the 
*beginning* of an expression. So it makes sense that this would be an error:

david=# select '$.$foo'::jsonpath;
ERROR: syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.$foo'::jsonpath;
   ^

But I’m less sure when a dollar sign is used in the *middle* (or end) of a json 
path identifier:

david=# select '$.xx$foo'::jsonpath;
ERROR:  syntax error at or near "$foo" of jsonpath input
LINE 1: select '$.xx$foo'::jsonpath;
   ^

Perhaps that should be valid?
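
For comparison, the quoted form should already be accepted, since quoted key
names may contain arbitrary characters (a sketch, not taken from the session
above):

    select '$."xx$foo"'::jsonpath;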

Best,

David





Re: Avoid orphaned objects dependencies, take 3

2024-04-24 Thread Alexander Lakhin

Hi Bertrand,

24.04.2024 11:38, Bertrand Drouvot wrote:

Please find attached v2 that should not produce the issue anymore (I launched a
lot of attempts without any issues). v1 was not strong enough as it was not
always checking for the dependent object existence. v2 now always checks if the
object still exists after the additional lock acquisition attempt while
recording the dependency.

I still need to think about v2 but in the meantime could you please also give
v2 a try on you side?

I gave more thought to v2 and the approach seems reasonable to me. Basically
what it does is that in case the object is already dropped before we take the
new lock (while recording the dependency) then the error message is a
"generic" one (meaning it does not provide the description of the "already"
dropped object). I think it makes sense to write the patch that way by
abandoning the patch's ambition to tell the description of the dropped object
in all the cases.

Of course, I would be happy to hear others thought about it.

Please find v3 attached (which is v2 with more comments).


Thank you for the improved version!

I can confirm that it fixes that case.
I've also tested other cases that fail on master (most of them also fail
with v1), please try/look at the attached script. (There could be other
broken-dependency cases, of course, but I think I've covered all the
representative ones.)

All tested cases work correctly with v3 applied — I couldn't get broken
dependencies, though concurrent create/drop operations can still produce
the "cache lookup failed" error, which is probably okay, except that it is
an INTERNAL_ERROR, which is assumed not to be easily triggered by users.

Best regards,
Alexander

if [ "$1" == "function-schema" ]; then
res=1
for ((i=1;i<=300;i++)); do
  ( { for ((n=1;n<=20;n++)); do echo "DROP SCHEMA s;"; done } | psql ) 
>psql1.log 2>&1 & 
  echo "
CREATE SCHEMA s;
CREATE FUNCTION s.func1() RETURNS int LANGUAGE SQL AS 'SELECT 1;';
CREATE FUNCTION s.func2() RETURNS int LANGUAGE SQL AS 'SELECT 2;';
CREATE FUNCTION s.func3() RETURNS int LANGUAGE SQL AS 'SELECT 3;';
CREATE FUNCTION s.func4() RETURNS int LANGUAGE SQL AS 'SELECT 4;';
CREATE FUNCTION s.func5() RETURNS int LANGUAGE SQL AS 'SELECT 5;';
  "  | psql >psql2.log 2>&1 &
  wait
  psql -c "DROP SCHEMA s CASCADE" >psql3.log 2>&1
done
echo "
SELECT pg_identify_object('pg_proc'::regclass, pp.oid, 0), pp.oid FROM pg_proc pp
  LEFT JOIN pg_namespace pn ON pp.pronamespace = pn.oid WHERE pn.oid IS NULL" | psql
sleep 1
grep 'was terminated' server.log || res=0
fi


if [ "$1" == "function-function" ]; then
res=1
for ((i=1;i<=1000;i++)); do
( { for ((n=1;n<=20;n++)); do echo "DROP FUNCTION f();"; done } | psql ) 
>psql1.log 2>&1 &
( echo "
CREATE FUNCTION f() RETURNS int LANGUAGE SQL RETURN 1;
CREATE FUNCTION f1() RETURNS int LANGUAGE SQL RETURN f() + 1;
CREATE FUNCTION f2() RETURNS int LANGUAGE SQL RETURN f() + 2;
CREATE FUNCTION f3() RETURNS int LANGUAGE SQL RETURN f() + 3;
CREATE FUNCTION f4() RETURNS int LANGUAGE SQL RETURN f() + 4;
CREATE FUNCTION f5() RETURNS int LANGUAGE SQL RETURN f() + 5;
" | psql ) >psql2.log 2>&1 &
wait
echo "ALTER FUNCTION f1() RENAME TO f01; SELECT f01()" | psql 2>&1 | grep 
'cache lookup failed' && { echo "on iteration $i"; res=1; break; }

psql -c "DROP FUNCTION f() CASCADE" >psql3.log 2>&1
done

grep -A2 'cache lookup failed' server.log || res=0
fi


if [ "$1" == "function-rettype" ]; then
res=0
for ((i=1;i<=5000;i++)); do
( echo "
CREATE DOMAIN id AS int;
CREATE FUNCTION f1() RETURNS id LANGUAGE SQL RETURN 1;
CREATE FUNCTION f2() RETURNS id LANGUAGE SQL RETURN 2;
CREATE FUNCTION f3() RETURNS id LANGUAGE SQL RETURN 3;
CREATE FUNCTION f4() RETURNS id LANGUAGE SQL RETURN 4;
CREATE FUNCTION f5() RETURNS id LANGUAGE SQL RETURN 5;
" | psql ) >psql1.log 2>&1 &
( echo "SELECT pg_sleep(random() / 100); DROP DOMAIN id"  | psql ) >psql2.log 
2>&1 &
wait

psql -c "SELECT f1()" 2>&1 | grep 'cache lookup failed' && { echo "on iteration 
$i"; res=1; break; }
psql -c "DROP DOMAIN id CASCADE" >psql3.log 2>&1
done

[ $res == 1 ] && psql -c "SELECT pn.oid, proname, pronamespace, proowner, 
prolang, prorettype FROM pg_proc pp INNER JOIN pg_namespace pn ON 
(pp.pronamespace = pn.oid) WHERE pn.nspname='public'"
fi


if [ "$1" == "function-argtype" ]; then
res=0
for ((i=1;i<=5000;i++)); do
( echo "
CREATE DOMAIN id AS int;
CREATE FUNCTION f1(i id) RETURNS int LANGUAGE SQL RETURN 1;
CREATE FUNCTION f2(i id) RETURNS int LANGUAGE SQL RETURN 2;
CREATE FUNCTION f3(i id) RETURNS int LANGUAGE SQL RETURN 3;
CREATE FUNCTION f4(i id) RETURNS int LANGUAGE SQL RETURN 4;
CREATE FUNCTION f5(i id) RETURNS int LANGUAGE SQL RETURN 5;
" | psql ) >psql1.log 2>&1 &
( echo "SELECT pg_sleep(random() / 1000); DROP DOMAIN id" | psql ) >psql2.log 
2>&1 &
wait

psql -c "SELECT f1(1)" 2>&1 | grep 'cache lookup failed' && { echo "on 
iteration $i"; res=1; break; }
psql -q -c "DROP DOMAIN id CASCADE" >/dev/null 2>&1
done

grep -A1 'cache lookup failed' server.log || res=0
fi

Re: Tarball builds in the new world order

2024-04-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 24.04.24 00:05, Tom Lane wrote:
>> # Export the selected git ref
>> git archive ${gitref} | tar xf - -C pgsql

> Where does ${gitref} come from?  Why doesn't this line use git archive 
> HEAD | ... ?

${gitref} is an argument to the script, specifying the commit
to be packaged.  HEAD would certainly not work when trying to
package a back-branch release, and in general it would hardly
ever be what you want if your goal is to make a reproducible
package.

>> What I suggest is doing this in mk-one-release:
>> -make dist
>> +make dist PG_COMMIT_HASH=${gitref}

> I suppose we could do something like that, but we'd also need to come up 
> with a meson version.

Packaging via meson is years away yet IMO, so I'm unconcerned
about that for now.  See below.

> (Let's not use "hash" though, since other ways to commit specify a 
> commit can be used.)

OK, do you have a different term in mind?

>> This change seems like a good thing anyway for anyone who's tempted
>> to use "make dist" manually, since they wouldn't necessarily want
>> to package HEAD either.

> A tin-foil-hat argument is that we might not want to encourage that, 
> because for reproducibility, we need a known git commit and also a known 
> implementation of make dist.  If in the future someone uses the make 
> dist implementation of PG19 to build a tarball for PG17, it might not 
> come out the same way as using the make dist implementation of PG17.

Of course.  The entire reason why this script invokes "make dist",
rather than implementing the behavior for itself, is so that
branch-specific behaviors can be accounted for in the branches
not here.  (To be clear, the script has no idea which branch
it's packaging --- that's implicit in the commit ID.)

Because of that, I really don't want to rely on some equivalent
meson infrastructure until it's available in all the live branches.
v15 will be EOL in 3.5 years, and that's more or less the time frame
that we've spoken of for dropping the makefile infrastructure, so
I don't think that's an unreasonable plan.

regards, tom lane




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-04-24 Thread Peter Eisentraut

On 22.04.24 22:28, Tom Lane wrote:

Nathan Bossart  writes:

On Mon, Apr 22, 2024 at 04:08:08PM -0400, Tom Lane wrote:

I think the actual plan now is that we'll sync the in-tree copy
with the buildfarm's results (and then do a tree-wide pgindent)
every so often, probably shortly before beta every year.

Okay.  Is this just to resolve the delta between the manual updates and a
clean autogenerated copy every once in a while?

The main reason there's a delta is that people don't manage to
maintain the in-tree copy perfectly (at least, they certainly
haven't done so for this past year).  So we need to do that
to clean up every now and then.

A secondary reason is that the set of typedefs we absorb from
system include files changes over time.


Is the code to extract typedefs available somewhere independent of the 
buildfarm?  It would be useful sometimes to be able to run this locally, 
like before and after some patch, to keep the in-tree typedefs list updated.






Re: Rename libpq trace internal functions

2024-04-24 Thread Yugo NAGATA
On Wed, 24 Apr 2024 09:39:02 +0200
Peter Eisentraut  wrote:

> libpq's pqTraceOutputMessage() used to look like this:
> 
>  case 'Z':   /* Ready For Query */
>  pqTraceOutputZ(conn->Pfdebug, message, );
>  break;
> 
> Commit f4b54e1ed98 introduced macros for protocol characters, so now
> it looks like this:
> 
>  case PqMsg_ReadyForQuery:
>  pqTraceOutputZ(conn->Pfdebug, message, );
>  break;
> 
> But this introduced a disconnect between the symbol in the switch case
> and the function name to be called, so this made the manageability of
> this file a bit worse.
> 
> This patch changes the function names to match, so now it looks like
> this:
> 
>  case PqMsg_ReadyForQuery:
>  pqTraceOutput_ReadyForQuery(conn->Pfdebug, message, );
>  break;

+1

I prefer the new function names since it seems more natural and easier to read.

I noticed pqTraceOutputNR() is left as is; is this intentional? Or should this
be changed to pqTraceOutput_NoticeResponse()?

Regards,
Yugo Nagata

> (This also improves the readability of the file in general, since some
> function names like "pqTraceOutputt" were a little hard to read
> accurately.)
>
> Some protocol characters have different meanings to and from the
> server.  The old code structure had a common function for both, for
> example, pqTraceOutputD().  The new structure splits this up into
> separate ones to match the protocol message name, like
> pqTraceOutput_Describe() and pqTraceOutput_DataRow().


-- 
Yugo NAGATA 




Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-04-24 Thread Tom Lane
Andrew Dunstan  writes:
> On 2024-04-24 We 06:12, Peter Eisentraut wrote:
>> Is the code to extract typedefs available somewhere independent of the 
>> buildfarm?  It would be useful sometimes to be able to run this 
>> locally, like before and after some patch, to keep the in-tree 
>> typedefs list updated.

> There's been talk about it but I don't think anyone's done it. I'd be 
> more than happy if the buildfarm client could just call something in the 
> core repo (c.f. src/test/perl/Postgres/Test/AdjustUpgrade.pm).

There is already src/tools/find_typedef, which looks like it might
be an ancestral version of the current buildfarm code (which is sub
find_typedefs in run_build.pl of the client distribution).  Perhaps
it'd be useful to bring that up to speed with the current BF code.

The main problem with this though is that a local run can only
give you the system-supplied typedefs for your own platform and
build options.  The value-add that the buildfarm brings is to
merge the results from several different platforms.

I suppose you could set up some merging process that would add
symbols from a local run to src/tools/pgindent/typedefs.list
but never remove any.  But that hardly removes the need for
an occasional cleanup pass.
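
For instance, a minimal sketch of such a merge step (the name of the locally
generated list is made up) might be:

    # add locally collected symbols to the in-tree list, never removing any
    LC_ALL=C sort -u src/tools/pgindent/typedefs.list local-typedefs.list \
        > typedefs.merged
    mv typedefs.merged src/tools/pgindent/typedefs.list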

regards, tom lane




Re: Remove unnecessary code from be_lo_put()

2024-04-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 24.04.24 11:59, Yugo NAGATA wrote:
>> I noticed that a permission check is performed in be_lo_put()
>> just after returning from inv_open(), but the permission should
>> already be checked in inv_open(), so I think we can remove this
>> part of the code. I attached a patch for this fix.

> Yes, I think you are right.

I agree.  Do you want to do the honors?

regards, tom lane




Re: Fix parallel vacuum buffer usage reporting

2024-04-24 Thread Alena Rybakina

On 22.04.2024 11:07, Anthonin Bonnefoy wrote:
On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina 
 wrote:


Hi, thank you for your work with this subject.

While I was reviewing your code, I noticed that your patch
conflicts with another patch [0] that been committed.

[0]

https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com


I've rebased the patch and also split the changes:
1: Use pgBufferUsage in Vacuum and Analyze block reporting
2: Stop tracking buffer usage in the now unused Vacuum{Hit,Miss,Dirty} 
variables

3: Remove declarations of Vacuum{Hit,Miss,Dirty}


Hi! Thank you for your work, and I have reviewed your corrections.

I tested the main postgres branch with and without your fix using a
script I wrote. It consists of five scenarios, and I compared the logs
between the original master branch and the master branch with your patch:


1. I added 1 million rows to the table and deleted 10% of them. I ran 
vacuum verbose and didn't see any differences.

buffer usage: 12585 hits, 0 misses, 4 dirtied
2. I opened another connection with a repeatable read transaction 
through the dblink extension and launched a query updating the records 
in the table under test. Later, I ran vacuum verbose again and also 
didn't see any differences.

buffer usage: 19424 hits, 0 misses, 6 dirtied
3. I committed the transaction and ran vacuum verbose again. Everything is 
fine in the logs.

buffer usage: 22960 hits, 0 misses, 11456 dirtied
4. I deleted all the data from the table and later ran vacuum verbose 
again. The number of buffer pages matched with your patch too.
5. I inserted 1 million rows into the table and updated them, and I found 
a difference between the original master version and the version with 
your patch:

with your patch: buffer usage: 32315 hits, 606 misses, 1547 dirtied
original version: buffer usage: 32348 hits, 573 misses, 1456 dirtied
I suppose something wasn't being counted.

The same script was run, but using vacuum verbose analyze, and I saw the 
difference again in the fifth step:

with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
master: buffer usage: 32346 hits, 573 misses, 1360 dirtied

I have attached a test file (vacuum_check_logs.sql) and four log files: 
two with the vacuum verbose command (vacuum_log and 
vacuum_log_with_patch files) and two others with the vacuum verbose 
analyze command (vacuum_analyze_test_with_patch and vacuum_analyze_test 
files). Both test approaches show logs with and without your changes.

1: Use pgBufferUsage in Vacuum and Analyze block reporting

I think that if the analyze command doesn't have the same issue, we
don't need to change it. Making the vacuum and the analyze consistent
is a good point but I'd like to avoid doing unnecessary changes in
back branches. I think the patch set would contain:

(a) make lazy vacuum use BufferUsage instead of
VacuumPage{Hit,Miss,Dirty}. (backpatched down to pg13).
(b) make analyze use BufferUsage and remove VacuumPage{Hit,Miss,Dirty}
variables for consistency and simplicity (only for HEAD, if we agree).

BTW I realized that VACUUM VERBOSE running on a temp table always
shows the number of dirtied buffers being 0, which seems to be a bug.
The patch (a) will resolve it as well.

I agree with that. I think we can leave the changes to the analyze 
command for master only, and I also doubt the need to backport these 
changes to back branches.


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
postgres=# \i ~/vacuum_check_logs.sql 
psql:/home/alena/vacuum_check_logs.sql:2: ERROR:  extension "dblink" does not 
exist
CREATE EXTENSION
 dblink_connect 

 OK
(1 row)

psql:/home/alena/vacuum_check_logs.sql:5: ERROR:  table "vestat" does not exist
SET
SET
CREATE TABLE
CREATE INDEX
psql:/home/alena/vacuum_check_logs.sql:15: INFO:  vacuuming 
"postgres.public.vestat"
psql:/home/alena/vacuum_check_logs.sql:15: INFO:  finished vacuuming 
"postgres.public.vestat": index scans: 0
pages: 0 removed, 0 remain, 0 scanned (100.00% of total)
tuples: 0 removed, 0 remain, 0 are dead but not yet removable
removable cutoff: 741, which was 0 XIDs old when operation ended
new relfrozenxid: 741, which is 2 XIDs ahead of previous value
frozen: 0 pages from table (100.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (100.00% of total) had 0 dead item 
identifiers removed
I/O timings: read: 0.100 ms, write: 0.000 ms
avg read rate: 29.260 MB/s, avg write rate: 0.000 MB/s
buffer usage: 13 hits, 1 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 237 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
 pg_sleep 
--
 
(1 row)

INSERT 0 100
DELETE 10
psql:/home/alena/vacuum_check_logs.sql:24: INFO:  vacuuming 
"postgres.public.vestat"

Re: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread Amit Kapila
On Tue, Apr 23, 2024 at 4:53 PM Amit Kapila  wrote:
>
> On Wed, Mar 13, 2024 at 11:59 AM vignesh C  wrote:
> >
> > On Wed, 13 Mar 2024 at 10:12, Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > >
> > > For 0002, instead of avoid resetting the latch, is it possible to let the
> > > logical rep worker wake up the launcher once after attaching ?
> >
> > Waking up of the launch process uses the same latch that is used for
> > subscription creation/modification and apply worker process exit. As
> > the handling of this latch for subscription creation/modification and
> > worker process exit can be done only by ApplyLauncherMain, we will not
> > be able to reset the latch in WaitForReplicationWorkerAttach. I feel
> > waking up the launcher process might not help in this case as
> > currently we will not be able to differentiate between worker
> > attached, subscription creation/modification and apply worker process
> > exit.
> >
>
> IIUC, even if we set the latch once the worker attaches, the other
> set_latch by subscription creation/modification or apply_worker_exit
> could also be consumed due to reset of latch in
> WaitForReplicationWorkerAttach(). Is that understanding correct? If
> so, can we use some other way to wake up
> WaitForReplicationWorkerAttach() say condition variable?
>

The other possibility is to have a GUC launcher_retry_time or
something like that instead of using a DEFAULT_NAPTIME_PER_CYCLE. This
still may not resolve the issue if launcher_retry_time is longer but
at least users would be able to configure it. I am not sure if this is
a good idea or not but just trying to brainstorm different ideas to
solve this problem.

BTW, as far as I understand, this is an improvement in the existing
code, so should be done only for HEAD (probably PG18) and should be
discussed in a separate thread.

-- 
With Regards,
Amit Kapila.




Re: doc: create table improvements

2024-04-24 Thread Peter Eisentraut

> +   The reliability characteristics of a table are governed by its
> +   persistence mode.  The default mode is described
> +   here
> +   There are two alternative modes that can be specified during
> +   table creation:
> +   temporary and
> +   unlogged.

Not sure reliability is the best word here.  I mean, a temporary table 
isn't any less reliable than any other table.  It just does different 
things.






Re: pg_combinebackup does not detect missing files

2024-04-24 Thread Robert Haas
On Tue, Apr 23, 2024 at 7:23 PM David Steele  wrote:
> > I don't understand what you mean here. I thought we were in agreement
> > that verifying contents would cost a lot more. The verification that
> > we can actually do without much cost can only check for missing files
> > in the most recent backup, which is quite weak. pg_verifybackup is
> > available if you want more comprehensive verification and you're
> > willing to pay the cost of it.
>
> I simply meant that it is *possible* to verify the output of
> pg_combinebackup without explicitly verifying all the backups. There
> would be overhead, yes, but it would be less than verifying each backup
> individually. For my 2c that efficiency would make it worth doing
> verification in pg_combinebackup, with perhaps a switch to turn it off
> if the user is confident in their sources.

Hmm, can you outline the algorithm that you have in mind? I feel we've
misunderstood each other a time or two already on this topic, and I'd
like to avoid more of that. Unless you just mean what the patch I
posted does (check if anything from the final manifest is missing from
the corresponding directory), but that doesn't seem like verifying the
output.

> >> I think it is a worthwhile change and we are still a month away from
> >> beta1. We'll see if anyone disagrees.
> >
> > I don't plan to press forward with this in this release unless we get
> > a couple of +1s from disinterested parties. We're now two weeks after
> > feature freeze and this is design behavior, not a bug. Perhaps the
> > design should have been otherwise, but two weeks after feature freeze
> > is not the time to debate that.
>
> It doesn't appear that anyone but me is terribly concerned about
> verification, even in this weak form, so probably best to hold this
> patch until the next release. As you say, it is late in the game.

Added https://commitfest.postgresql.org/48/4951/

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Tarball builds in the new world order

2024-04-24 Thread Peter Eisentraut

On 24.04.24 00:05, Tom Lane wrote:

It makes tarballs all right, but whatever commit ID you specify
is semi-ignored, and you get a tarball corresponding to HEAD
of master.  (The PDFs come from the right version, though!)

The reason for that is that the mk-one-release script does this
(shorn of not-relevant-here details):

export BASE=/home/pgsql
export GIT_DIR=$BASE/postgresql.git

mkdir pgsql

# Export the selected git ref
git archive ${gitref} | tar xf - -C pgsql


Where does ${gitref} come from?  Why doesn't this line use git archive 
HEAD | ... ?



What I suggest is doing this in mk-one-release:

-make dist
+make dist PG_COMMIT_HASH=${gitref}

and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
HEAD.  The extra make variable will have no effect in the back
branches, while it should cause the right thing to happen with
the new implementation of "make dist".


I suppose we could do something like that, but we'd also need to come up 
with a meson version.


(Let's not use "hash" though, since other ways to commit specify a 
commit can be used.)



This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.


A tin-foil-hat argument is that we might not want to encourage that, 
because for reproducibility, we need a known git commit and also a known 
implementation of make dist.  If in the future someone uses the make 
dist implementation of PG19 to build a tarball for PG17, it might not 
come out the same way as using the make dist implementation of PG17.






Proposal: Early providing of PGDG repositories for the major Linux distributions like Fedora or Debian

2024-04-24 Thread Hans Buschmann
Yesterday Fedora 40 was released as a generally available (GA) release.


According to the project website (download for Linux/Fedora):


" the PostgreSQL project provides a 
repository of packages 
of all supported versions for the most common distributions."

Today (24.4.2024) I upgraded my laptop to Fedora 40, but there where no 
repository available, so I ended with a mix of Fedora 40 and Fedora 39 
installation.

To mitigate this situation, I propose to provide these repositories much 
earlier in the beta phase of the distributions:

Fedora easily switches from a beta to a full installation by simply dnf 
upgrading after the GA release, so it should be possible to test the standard 
repositories and installations during the beta phase.
This could also give broader coverage of the new standard tools involved 
(e.g. the gcc compiler), which changed in the current case from major version 
13 to 14.

Instead of following every beta introduction of Linux distros, I propose to 
provide the repositories with (or shortly after) the standard dates for minor 
releases, especially in February and August of the year, which seem to fit well 
with the rhythm of distribution releases.

The download website should also list the new upcoming distributions to 
avoid the current 404 error during installation/upgrade.

Should any serious compiler problems arise, the publication can be delayed 
until they are solved.

This proposal also follows the idea of getting reproducible builds and 
integrating build scripts into the source tree. [1]

I think this should not be much extra work, since the same minor versions of 
the Postgres packages have to be built anyway at release time of the Linux 
distribution.

I think for easy installation it is important to have a standard installation 
experience for all potential users so that manual installation or self 
compilation can be avoided.

To complete this message, I would kindly ask whether somebody has taken a look 
at bug report #18339, which addresses another problem with Fedora installations.

(CC Devrim according to the thread Security lessons from libzlma - libsystemd)

regards

Hans Buschmann

[1] https://www.postgresql.org/message-id/flat/ZgdCpFThi9ODcCsJ%40momjian.us

[2] 
https://www.postgresql.org/message-id/18339-f9dda01a46c0432f%40postgresql.org


Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 05:46, Peter Eisentraut  wrote:

> I have committed this patch, and backpatched it, as a bug fix, because the 
> existing description was wrong.  To keep the patch minimal for backpatching, 
> I didn't do the conversion to a list.  I'm not sure I like that anyway, 
> because it tends to draw more attention to that part over the surrounding 
> parts, which didn't seem appropriate in this case.  But anyway, if you have 
> any more non-bug-fix editing in this area, which would then target PG18, 
> please send more patches.

Makes sense, that level of detail gets into the weeds so maybe doesn’t need to 
be quite so prominent as a list. Thank you!

David





Re: Cleanup: remove unused fields from nodes

2024-04-24 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Apr 23, 2024 at 11:03:40PM -0400, Tom Lane wrote:
>> Hah.  Seems like the comment for isall needs to explain that it
>> exists for this purpose, so we don't make this mistake again.

> How about something like the attached?

I was thinking about wording like

 * True if DEALLOCATE ALL.  This is redundant with "name == NULL",
 * but we make it a separate field so that exactly this condition
 * (and not the precise name) will be accounted for in query jumbling.

regards, tom lane




Re: minor error message inconsistency in make_pathkey_from_sortinfo

2024-04-24 Thread Yugo NAGATA
On Wed, 24 Apr 2024 15:05:00 +0800
jian he  wrote:

> hi.
> 
> in make_pathkey_from_sortinfo
> 
> equality_op = get_opfamily_member(opfamily,
>   opcintype,
>   opcintype,
>   BTEqualStrategyNumber);
> if (!OidIsValid(equality_op)) /* shouldn't happen */
> elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
> BTEqualStrategyNumber, opcintype, opcintype, opfamily);
> 
> the error message seems not right?

This message was introduced by 278cb434110 which was aiming to
standardize the wording for similar errors. We can find the pattern

 "missing {support function | operator} %d(%u,%u) in opfamily %u"

in several places.

Regards,
Yugo Nagata

> 
> maybe
> if (!OidIsValid(equality_op)) /* shouldn't happen */
> elog(ERROR, "missing operator =(%u,%u) in opfamily %u",opcintype,
> opcintype, opfamily);
> 
> or
> 
> if (!OidIsValid(equality_op)) /* shouldn't happen */
> elog(ERROR, "missing equality operator for type %u in opfamily
> %u",opcintype, opcintype, opfamily);
> 
> 


-- 
Yugo NAGATA 




Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

2024-04-24 Thread Alexander Korotkov
On Wed, Apr 17, 2024 at 6:41 PM Pavel Borisov  wrote:
>>  I did notice (I meant to point out) that I have concerns about this
>> part of the new uniqueness check code:
>> "
>> if (P_IGNORE(topaque) || !P_ISLEAF(topaque))
>> break;
>> "
>>
>> My concern here is with the !P_ISLEAF(topaque) test -- it shouldn't be
>> required
>
> I agree. But I didn't see the need to check uniqueness constraints violations 
> in internal pages. Furthermore, it doesn't mean only a violation of 
> constraint, but a major index corruption. I agree that checking and reporting 
> this type of corruption separately is a possible thing.

I think we could just throw an error in case of an unexpected internal
page.  It doesn't seem reasonable to continue the check with this type
of corruption detected.  If the tree linkage is corrupted we may enter
an endless loop or something.

>> Separately, I dislike the way the target block changes within
>> bt_target_page_check(). The general idea behind verify_nbtree.c's
>> target block is that every block becomes the target exactly once, in a
>> clearly defined place. All corruption (in the index structure itself)
>> is formally considered to be a problem with that particular target
>> block. I want to be able to clearly distinguish between the target and
>> target's right sibling here, to explain my concerns, but they're kinda
>> both the target, so that's a lot harder than it should be. (Admittedly
>> directly blaming the target block has always been a little bit
>> arbitrary, at least in certain cases, but even there it provides
>> structure that makes things much easier to describe unambiguously.)
>
> The possible way to load the target block only once is to get rid of the 
> cross-page uniqueness violation check. I introduced it to catch more possible 
> cases of uniqueness violations. Though they are expected to be extremely 
> rare, and anyway the algorithm doesn't get any warranty, just does its best 
> to catch what is possible. I don't object to this change.

I think we could probably just avoid setting state->target during
cross-page check.  Just save that into a local variable and pass as an
argument where needed.

Skipping the visibility checks for the "only one tuple for scan key" case
looks like a very valuable optimization [1].

I also think we should wrap lVis_* variables into struct.  That would
make the way we pass them to functions more elegant.

Links.
1. https://www.postgresql.org/message-id/20240325020323.fd.nmisch%40google.com

--
Regards,
Alexander Korotkov




New GUC autovacuum_max_threshold ?

2024-04-24 Thread Frédéric Yhuel

Hello,

I would like to suggest a new parameter, autovacuum_max_threshold, which 
would set an upper limit on the number of tuples to delete/update/insert 
prior to vacuum/analyze.


A good default might be 50.

The idea would be to replace the following calculation :

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

with this one :

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1 
+ vac_scale_factor * reltuples / autovacuum_max_threshold)


(and the same for the others, vacinsthresh and anlthresh).

The attached graph plots vacthresh against pgclass.reltuples, with 
default settings :


autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

and

autovacuum_max_threshold = 50 (the suggested default)

Thus, for small tables, vacthresh is only slightly smaller than 0.2 * 
pgclass.reltuples, but it grows towards 50 when reltuples → ∞
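
To make that saturating shape concrete, here is a quick SQL sketch 
(illustration only, not part of the patch) that evaluates the proposed formula 
for a few table sizes, using the stock defaults and 500000 purely as an 
assumed value for autovacuum_max_threshold:

-- Illustration only: the proposed threshold as a function of reltuples,
-- assuming vac_base_thresh = 50, vac_scale_factor = 0.2 and an assumed
-- autovacuum_max_threshold of 500000.
SELECT reltuples::bigint,
       round(50 + 0.2 * reltuples / (1 + 0.2 * reltuples / 500000)) AS vacthresh
FROM unnest(ARRAY[1e4, 1e6, 1e8, 1e10]) AS t(reltuples);

For small tables the scale factor dominates, while the computed threshold 
levels off as reltuples grows.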


The idea is to reduce the need for autovacuum tuning.

The attached (draft) patch further illustrates the idea.

My guess is that a similar proposal has already been submitted... and 
rejected  If so, I'm very sorry for the useless noise.


Best regards,
FrédéricFrom 9027d857e3426f327a2a5f61aec11a7604bb48a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Yhuel?= 
Date: Fri, 19 Apr 2024 14:05:37 +0200
Subject: [PATCH] Add new GUC autovacuum_max_threshold

---
 src/backend/access/common/reloptions.c | 11 +++
 src/backend/postmaster/autovacuum.c| 18 +++---
 src/backend/utils/misc/guc_tables.c|  9 +
 src/include/postmaster/autovacuum.h|  1 +
 src/include/utils/rel.h|  1 +
 5 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/common/reloptions.c 
b/src/backend/access/common/reloptions.c
index 469de9bb49..11a6423aff 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -233,6 +233,15 @@ static relopt_int intRelOpts[] =
},
-1, 0, INT_MAX
},
+   {
+   {
+   "autovacuum_max_threshold",
+   "Maximum number of tuple XXX prior to vacuum/analyze",
+   RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+   ShareUpdateExclusiveLock
+   },
+   -1, 0, INT_MAX
+   },
{
{
"autovacuum_vacuum_insert_threshold",
@@ -1845,6 +1854,8 @@ default_reloptions(Datum reloptions, bool validate, 
relopt_kind kind)
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, 
enabled)},
{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, 
vacuum_threshold)},
+   {"autovacuum_max_threshold", RELOPT_TYPE_INT,
+   offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, 
vacuum_max_threshold)},
{"autovacuum_vacuum_insert_threshold", RELOPT_TYPE_INT,
offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, 
vacuum_ins_threshold)},
{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
diff --git a/src/backend/postmaster/autovacuum.c 
b/src/backend/postmaster/autovacuum.c
index 7dd9345c61..e453c7c824 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -119,6 +119,7 @@ int autovacuum_max_workers;
 intautovacuum_work_mem = -1;
 intautovacuum_naptime;
 intautovacuum_vac_thresh;
+intautovacuum_max_thresh;
 double autovacuum_vac_scale;
 intautovacuum_vac_ins_thresh;
 double autovacuum_vac_ins_scale;
@@ -3017,6 +3018,12 @@ recheck_relation_needs_vacanalyze(Oid relid,
*doanalyze = false;
 }
 
+#define COMPUTE_TRESHOLD(res, scale_factor, base_thresh) \
+do { \
+   res = (float4) base_thresh + scale_factor * reltuples / \
+   (1 + scale_factor * reltuples / vac_max_thresh); \
+} while(0)
+
 /*
  * relation_needs_vacanalyze
  *
@@ -3071,6 +3078,7 @@ relation_needs_vacanalyze(Oid relid,
 
/* constants from reloptions or GUC variables */
int vac_base_thresh,
+   vac_max_thresh,
vac_ins_base_thresh,
anl_base_thresh;
float4  vac_scale_factor,
@@ -3111,6 +3119,10 @@ relation_needs_vacanalyze(Oid relid,
? relopts->vacuum_threshold
: autovacuum_vac_thresh;
 
+   vac_max_thresh = (relopts && relopts->vacuum_max_threshold >= 0)
+   ? relopts->vacuum_max_threshold
+   : autovacuum_max_thresh;
+
vac_ins_scale_factor = (relopts && relopts->vacuum_ins_scale_factor >= 
0)
? relopts->vacuum_ins_scale_factor
: 

Re: WIP Incremental JSON Parser

2024-04-24 Thread Andrew Dunstan



On 2024-04-24 We 04:56, Michael Paquier wrote:

On Fri, Apr 19, 2024 at 02:04:40PM -0400, Robert Haas wrote:

Yeah, I think this patch invented a new solution to a problem that
we've solved in a different way everywhere else. I think we should
change it to match what we do in general.

As of ba3e6e2bca97, did you notice that test_json_parser_perf
generates two core files because progname is not set, failing an
assertion when using the frontend logging?



No, it didn't for me. Thanks for noticing, I've pushed a fix.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread Peter Eisentraut

On 18.03.24 01:09, Erik Wienhold wrote:

The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of continuing the scan of the identifier (f$oo)
for the member accessor.  Looks like a bug to me because a variable
doesn't even make sense in that place.

Right. Maybe the docs should be updated to say that a literal dollar
sign isn’t supported in identifiers, unlike in JavaScript, except
through escapes like this:

Unfortunately, I don't have access to that part of the SQL spec.  So I
don't know how the jsonpath grammar is specified.


The SQL spec says that <JSON path key name> corresponds to Identifier 
in ECMAScript.


But it also says,

A <JSON path identifier> is classified as follows.

Case:

a) A <JSON path identifier> that is a <dollar sign> is a <JSON path
   context variable>.

b) A <JSON path identifier> that begins with <dollar sign> is a
   <JSON path named variable>.

c) Otherwise, a <JSON path identifier> is a <JSON path key name>.

Does this help?  I wasn't following all the discussion to see if there 
is anything wrong with the implementation.






Re: Remove unnecessary code from be_lo_put()

2024-04-24 Thread Peter Eisentraut

On 24.04.24 11:59, Yugo NAGATA wrote:

I noticed that a permission check is performed in be_lo_put()
just after returning from inv_open(), but the permission should
already be checked in inv_open(), so I think we can remove this
part of the code. I attached a patch for this fix.


Yes, I think you are right.

This check was added in 8d9881911f0, but then the refactoring in 
ae20b23a9e7 should probably have removed it.






Re: Tarball builds in the new world order

2024-04-24 Thread Tom Lane
Greg Sabino Mullane  writes:
> On Tue, Apr 23, 2024 at 6:06 PM Tom Lane  wrote:
>> Now, if we just do it exactly like that
>> then trying to "make dist" without setting PG_COMMIT_HASH will
>> fail, since "git archive" has no default for its 
>> argument.  I can't quite decide if that's a good thing, or if we
>> should hack the makefile a little further to allow PG_COMMIT_HASH
>> to default to HEAD.

> Just having it fail seems harsh. What if we had plain "make dist" at least
> output a friendly hint about "please specify a hash"? That seems better
> than an implicit HEAD default, as they can manually set it to HEAD
> themselves per the hint.

Yeah, it would be easy to do something like

ifneq ($(PG_COMMIT_HASH),)
$(GIT) ...
else
@echo "Please specify PG_COMMIT_HASH." && exit 1
endif

I'm just debating whether that's better than inserting a default
value.

regards, tom lane




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-04-24 Thread Bharath Rupireddy
On Thu, Apr 11, 2024 at 6:31 AM Michael Paquier  wrote:
>
> On Tue, Apr 09, 2024 at 09:33:49AM +0300, Andrey M. Borodin wrote:
> > As far as I understand CF entry [0] is committed? I understand that
> > there are some open followups, but I just want to determine correct
> > CF item status...
>
> So much work has happened on this thread with things that has been
> committed, so switching the entry to committed makes sense to me.  I
> have just done that.
>
> Bharath, could you create a new thread with the new things you are
> proposing?  All that should be v18 work, particularly v27-0002:
> https://www.postgresql.org/message-id/CALj2ACWCibnX2jcnRreBHFesFeQ6vbKiFstML=w-jvtvukd...@mail.gmail.com

Thanks. I started a new thread
https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread Erik Wienhold
On 2024-04-24 13:52 +0200, David E. Wheeler wrote:
> On Apr 24, 2024, at 05:51, Peter Eisentraut  wrote:
> 
> >A <JSON path identifier> is classified as follows.
> > 
> >Case:
> > 
> >a) A <JSON path identifier> that is a <dollar sign> is a <JSON
> >   path context variable>.
> > 
> >b) A <JSON path identifier> that begins with <dollar sign> is a
> >   <JSON path named variable>.
> > 
> >c) Otherwise, a <JSON path identifier> is a <JSON path key name>.
> > 
> > Does this help?  I wasn't following all the discussion to see if
> > there is anything wrong with the implementation.

Thanks Peter!  But what is the definition of the entire path expression?
Perhaps something like:

 <JSON path> ::= <JSON path identifier> { "." <JSON path identifier> }

That would imply that "$.$foo" is a valid path that accesses a variable
member (but I guess the path evaluation is also specified somewhere).

Does it say anything about double-quoted accessors?  In table 8.25[1] we
allow member accessor ."$varname" and it says "If the key name matches
some named variable starting with $ or does not meet the JavaScript
rules for an identifier, it must be enclosed in double quotes to make it
a string literal."

What bugs me about this description, after reading it a couple of times,
is that it's not clear what is meant by ."$varname".  It could mean two
things: (1) the double-quoting masks $varname in order to not interpret
those characters as a variable or (2) an interpolated string that
resolves $varname and yields a dynamic member accessor.

The current implementation supports (1), i.e., ."$foo" does not refer to
variable foo but the actual property "$foo":

=> select jsonb_path_query('{"$foo":123,"bar":456}', '$."$foo"', 
'{"foo":"bar"}');
 jsonb_path_query
--
 123
(1 row)

Under case (2) I'd expect that query to return 456 (because $foo
resolves to "bar").  (Similar to how psql would resolve :'foo' to
'bar'.)

Variables already work in array accessors and table 8.25 says that "The
specified index can be an integer, as well as an expression returning a
single numeric value [...]".  A variable is such an expression.

=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
 jsonb_path_query
--
 3
(1 row)

So I'd expect a similar behavior for member accessors as well when
seeing ."$varname" in the same table.

> Yes, it does, as it ties the special meaning of the dollar sign to the
> *beginning* of an expression. So it makes sense that this would be an
> error:
> 
> david=# select '$.$foo'::jsonpath;
> ERROR: syntax error at or near "$foo" of jsonpath input
> LINE 1: select '$.$foo'::jsonpath;
>^
> But I’m less sure when a dollar sign is used in the *middle* (or end)
> of a json path identifier:
> 
> david=# select '$.xx$foo'::jsonpath;
> ERROR:  syntax error at or near "$foo" of jsonpath input
> LINE 1: select '$.xx$foo'::jsonpath;
>^
> Perhaps that should be valid?

Yes, I think so.  That would be case C in the spec excerpt provided by
Peter.  So it's just a key name that happens to contain (but not start
with) the dollar sign.
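
For reference, a small sketch of the behaviour in question (the unquoted
form is what currently errors out, while double-quoting the key name
already works):

-- currently rejected, although case c) arguably allows a key name that
-- merely contains a dollar sign:
--   SELECT jsonb_path_query('{"xx$foo": 1}', '$.xx$foo');
--   ERROR:  syntax error at or near "$foo" of jsonpath input

-- double-quoting the key name works:
SELECT jsonb_path_query('{"xx$foo": 1}', '$."xx$foo"');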

[1] 
https://www.postgresql.org/docs/current/datatype-json.html#TYPE-JSONPATH-ACCESSORS

-- 
Erik




Re: Statistics Import and Export

2024-04-24 Thread Bruce Momjian
On Tue, Apr 23, 2024 at 06:33:48PM +0200, Matthias van de Meent wrote:
> I've heard of use cases where dumping stats without data would help
> with production database planner debugging on a non-prod system.
> 
> Sure, some planner inputs would have to be taken into account too, but
> having an exact copy of production stats is at least a start and can
> help build models and alerts for what'll happen when the tables grow
> larger with the current stats.
> 
> As for other planner inputs: table size is relatively easy to shim
> with sparse files; cumulative statistics can be copied from a donor
> replica if needed, and btree indexes only really really need to
> contain their highest and lowest values (and need their height set
> correctly).

Is it possible to prevent stats from being updated by autovacuum and
other methods?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread David E. Wheeler
On Apr 24, 2024, at 3:22 PM, Erik Wienhold  wrote:

> Thanks Peter!  But what is the definition of the entire path expression?
> Perhaps something like:
> 
> ::=  { "."  }
> 
> That would imply that "$.$foo" is a valid path that accesses a variable
> member (but I guess the path evaluation is also specified somewhere).

I read it as “if it starts with a dollar sign, it’s a variable and not a path 
identifier”, and I assume any `.foo` expression is a path identifier.

> What bugs me about this description, after reading it a couple of times,
> is that it's not clear what is meant by ."$varname".  It could mean two
> things: (1) the double-quoting masks $varname in order to not interpret
> those characters as a variable or (2) an interpolated string that
> resolves $varname and yields a dynamic member accessor.

My understanding is that if it’s in double quotes it’s never anything other 
than a string (whether a string literal or a path identifier string literal). 
IOW, variables don’t interpolate inside strings.

> Under case (2) I'd expect that query to return 456 (because $foo
> resolves to "bar").  (Similar to how psql would resolve :'foo' to
> 'bar'.)

Yes, I suspect this is the correct interpretation, but agree the wording could 
use some massaging, especially since path identifiers cannot start with a 
dollar sign anyway. Perhaps:

"If the key name starts with $ or does not meet the JavaScript rules for an 
identifier, it must be enclosed in double quotes to make it a string literal."

> Variables already work in array accessors and table 8.25 says that "The
> specified index can be an integer, as well as an expression returning a
> single numeric value [...]".  A variable is such an expression.
> 
>=> select jsonb_path_query('[2,3,5]', '$[$i]', '{"i":1}');
> jsonb_path_query
>--
> 3
>(1 row)
> 
> So I'd expect a similar behavior for member accessors as well when
> seeing ."$varname" in the same table.

Oh, interesting point! Now I wonder if the standard has this inconsistency (and 
is aware of it).

> Yes, I think so.  That would be case C in the spec excerpt provided by
> Peter.  So it's just a key name that happens to contain (but not start
> with) the dollar sign.

Exactly. It also matches the doc you quote above. Something would have to 
change in src/backend/utils/adt/jsonpath_scan.l to fix that, but that file 
makes my eyes water, so I’m not gonna take a stab at it. :-)

D





Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 01.03.24 22:49, Jacob Champion wrote:

If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.

If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.


I've been reading up on ALPN.  There is another thread that is 
discussing PostgreSQL protocol version negotiation, and ALPN also has 
"protocol negotiation" in the name and there is some discussion in this 
thread about the granularity oft the protocol names.


I'm concerned that there appears to be some confusion over whether ALPN 
is a performance feature or a security feature.  RFC 7301 appears to be 
pretty clear that it's for performance, not for security.


Looking at the ALPACA attack, I'm not convinced that it's very relevant 
for PostgreSQL.  It's basically just a case of, you connected to the 
wrong server.  And web browsers routinely open additional connections 
based on what data they have previously received, and they liberally 
send along session cookies to those new connections, so I understand 
that this can be a problem.  But I don't see how ALPN is a good defense. 
 It can help only if all other possible services other than http 
implement it and say, you're a web browser, go away.  And what if the 
rogue server is in fact a web server, then it doesn't help at all.  I 
guess there could be some common configurations where there is a web 
server, and ftp server, and some mail servers running on the same TLS 
end point.  But in how many cases is there also a PostgreSQL server 
running on the same end point?  The page about ALPACA also suggests SNI 
as a mitigation, which seems more sensible, because the burden is then 
on the client to do the right thing, and not on all other servers to 
send away clients doing the wrong thing.  And of course libpq already 
supports SNI.


For the protocol negotiation aspect, how does this work if the wrapped 
protocol already has a version negotiation system?  For example, various 
HTTP versions are registered as separate protocols for ALPN.  What if 
ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or 
vice versa?  What is the actual mechanism where the performance benefits 
(saving round-trips) are created?  I haven't caught up with HTTP 2 and 
so on, so maybe there are additional things at play there, but it is not 
fully explained in the RFCs.  I suppose PostgreSQL would keep its 
internal protocol version negotiation in any case, but then what do we 
need ALPN on top for?






Use WALReadFromBuffers in more places

2024-04-24 Thread Bharath Rupireddy
Hi,

Commit 91f2cae7a4e that introduced WALReadFromBuffers only used it for
physical walsenders. It can also be used in more places benefitting
logical walsenders, backends running pg_walinspect and logical
decoding functions if the WAL is available in WAL buffers. I'm
attaching a 0001 patch for this.

While at it, I've also added a test module in 0002 patch to
demonstrate 2 things: 1) how the caller can ensure the requested WAL
is fully copied to WAL buffers using WaitXLogInsertionsToFinish before
reading from WAL buffers. 2) how one can implement an xlogreader
page_read callback to read unflushed/not-yet-flushed WAL directly from
WAL buffers. FWIW, a separate test module to explicitly test the new
function is suggested here -
https://www.postgresql.org/message-id/CAFiTN-sE7CJn-ZFj%2B-0Wv6TNytv_fp4n%2BeCszspxJ3mt77t5ig%40mail.gmail.com.

Please have a look at the attached patches.

I will register this for the next commit fest.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Use-WALReadFromBuffers-in-more-places.patch
Description: Binary data


v1-0002-Add-test-module-to-demonstrate-reading-from-WAL-b.patch
Description: Binary data


Re: doc: create table improvements

2024-04-24 Thread David G. Johnston
On Wed, Apr 24, 2024 at 7:45 AM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Wed, Apr 24, 2024 at 3:30 AM Peter Eisentraut 
> wrote:
>
>>  > +   The reliability characteristics of a table are governed by its
>>  > +   persistence mode.  The default mode is described
>>  > +   here
>>  > +   There are two alternative modes that can be specified during
>>  > +   table creation:
>>  > +   temporary and
>>  > +   unlogged.
>>
>> Not sure reliability is the best word here.  I mean, a temporary table
>> isn't any less reliable than any other table.  It just does different
>> things.
>>
>>
> Given the name of the section where this is all discussed I'm having
> trouble going with a different word.  But better framing and phrasing I can
> do:
>
> A table may be opted out of certain storage aspects of reliability, as
> described [here], by specifying either of the alternate persistence modes:
> [temporary] or [logged]. The specific trade-offs and implications are
> detailed below.
>
>
Or maybe:

A table operates in one of three persistence modes (default, [temporary],
and [unlogged]) described in [Chapter 28]. --point to the intro page for
the chapter as expanded as below, not the reliability page.

diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 05e2a8f8be..102cfeca68 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -5,8 +5,17 @@

  
   This chapter explains how to control the reliability of
-  PostgreSQL, including details about the
-  Write-Ahead Log.
+  PostgreSQL.  At its core this
+  involves writing all changes to disk twice - first to a
+  journal of changes called the write-ahead-log (WAL) and
+  then to the physical pages that comprise permanent tables
+  on disk (heap).  This results in four high-level
+  persistence modes for tables.
+  The default mode results in both these features being
+  enabled.  Temporary tables forgo both of these options,
+  while unlogged tables only forgo WAL.  There is no WAL-only
+  operating mode.  The rest of this chapter discusses
+  implementation details related to these two options.
  

David J.


Re: Add notes to pg_combinebackup docs

2024-04-24 Thread Robert Haas
On Mon, Apr 22, 2024 at 2:52 PM Robert Haas  wrote:
> On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson  wrote:
> > > On 18 Apr 2024, at 20:11, Robert Haas  wrote:
> > > 2. As (1), but make check_control_files() emit a warning message when
> > > the problem case is detected.
> >
> > Being in the post-freeze part of the cycle, this seems like the best option.
>
> Here is a patch for that.

Is someone willing to review this patch? cfbot seems happy enough with
it, and I'd like to get it committed since it's an open item, but I'd
be happy to have some review before I do that.

If nobody reviews or indicates an intent to do so in the next couple
of days, I'll go ahead and commit this on my own.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




some additional (small) problems with pg_combinebackup and tablespaces

2024-04-24 Thread Robert Haas
Tomas Vondra pointed out to me a couple of mistakes that I made with
regard to pg_combinebackup and tablespaces.

One is that I screwed up the long_options array by specifying
tablespace-mapping as no_argument rather than required_argument. That
doesn't break the tests I just committed because, in the actual string
passed to getopt_long(), I wrote "T:", which means the short form of
the option works; only the long form does not.

The other is that the documentation says that --tablespace-mapping is
applied to the pathnames from the first backup specified on the
command line. It should have said "final" rather than "first".

Here is a very small patch correcting these regrettable errors.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Fix-small-mistakes-around-pg_combinebackup-with-t.patch
Description: Binary data


Re: Statistics Import and Export

2024-04-24 Thread Matthias van de Meent
On Wed, 24 Apr 2024 at 21:31, Bruce Momjian  wrote:
>
> On Tue, Apr 23, 2024 at 06:33:48PM +0200, Matthias van de Meent wrote:
> > I've heard of use cases where dumping stats without data would help
> > with production database planner debugging on a non-prod system.
> >
> > Sure, some planner inputs would have to be taken into account too, but
> > having an exact copy of production stats is at least a start and can
> > help build models and alerts for what'll happen when the tables grow
> > larger with the current stats.
> >
> > As for other planner inputs: table size is relatively easy to shim
> > with sparse files; cumulative statistics can be copied from a donor
> > replica if needed, and btree indexes only really really need to
> > contain their highest and lowest values (and need their height set
> > correctly).
>
> Is it possible to prevent stats from being updated by autovacuum

You can set autovacuum_analyze_threshold and *_scale_factor to
excessively high values, which has the effect of disabling autoanalyze
until it has had similarly excessive tuple churn. But that won't
guarantee autoanalyze won't run; that guarantee only exists with
autovacuum = off.
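
A minimal sketch of that workaround (the table name is made up):

-- Raise the per-table analyze thresholds so high that autoanalyze
-- effectively never fires for this relation.  Anti-wraparound autovacuum
-- can still run, and a manual ANALYZE still works.
ALTER TABLE stats_frozen_copy SET (
    autovacuum_analyze_threshold = 2000000000,
    autovacuum_analyze_scale_factor = 100
);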

> and other methods?

No nice ways. AFAIK there is no command (or command sequence) that can
"disable" only ANALYZE and which also guarantee statistics won't be
updated until ANALYZE is manually "re-enabled" for that table. An
extension could maybe do this, but I'm not aware of any extension
points where this would hook into PostgreSQL in a nice way.

You can limit maintenance access on the table to only trusted roles
that you know won't go in and run ANALYZE for those tables, or even
only your superuser (so only they can run ANALYZE, and have them
promise they won't). Alternatively, you can also constantly keep a
lock on the table that conflicts with ANALYZE. The last few are just
workarounds though, and not all something I'd suggest running on a
production database.

Kind regards,

Matthias van de Meent




Re: Partitioned tables and [un]loggedness

2024-04-24 Thread Nathan Bossart
On Wed, Apr 24, 2024 at 04:17:44PM +0900, Michael Paquier wrote:
> - Support ALTER TABLE .. SET LOGGED/UNLOGGED for partitioned tables,
> where the command only works on partitioned tables so that's only a
> catalog switch.

I'm not following what this means.  Does SET [UN]LOGGED on a partitioned
table recurse to its partitions?  Does this mean that you cannot change
whether a single partition is [UN]LOGGED?  How does this work with
sub-partitioning?

> - CREATE TABLE PARTITION OF would make a new partition inherit the
> loggedness of the parent if UNLOGGED is not directly specified.

This one seems generally reasonable to me, provided it's properly noted in
the docs.
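
For clarity, a sketch of the case this targets (if I'm reading the current
behavior correctly, the partition today comes out as a plain logged table
unless UNLOGGED is spelled out explicitly):

-- Partition created without an explicit persistence clause under an
-- UNLOGGED partitioned parent; compare relpersistence of both relations.
CREATE UNLOGGED TABLE pt (a int) PARTITION BY RANGE (a);
CREATE TABLE pt_1 PARTITION OF pt FOR VALUES FROM (1) TO (10);
SELECT relname, relpersistence FROM pg_class WHERE relname IN ('pt', 'pt_1');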

> - How about ONLY?  Would it make sense to support it so as ALTER TABLE
> ONLY on a partitioned table does not touch any of its partitions?
> Would not specifying ONLY mean that the loggedness of the partitioned
> table and all its partitions is changed?  That would mean a burst in
> WAL when switching to LOGGED, which was a concern when this feature
> was first implemented even for a single table, so for potentially
> hundreds of them, that would really hurt if a DBA is not careful.

I guess ONLY could be a way of changing the default for new partitions
without changing whether existing ones were logged.  I'm not tremendously
concerned about the burst-of-WAL problem.  Or, at least, I'm not any more
concerned about it for partitioned tables than I am for regular tables.  I
do wonder if we can improve the performance of setting tables LOGGED, but
that's for a separate thread...

> - CREATE TABLE does not have a LOGGED keyword, hence it is not
> possible through the parser to force a partition to have a permanent
> persistence even if its partitioned table uses UNLOGGED.

Could we add LOGGED for CREATE TABLE?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-04-24 Thread Justin Pryzby
On Mon, Apr 22, 2024 at 01:31:48PM +0300, Alexander Korotkov wrote:
> Hi!
> 
> On Fri, Apr 19, 2024 at 4:29 PM Alexander Korotkov  
> wrote:
> > On Fri, Apr 19, 2024 at 2:26 AM Dmitry Koval  wrote:
> > > 18.04.2024 19:00, Alexander Lakhin wrote:
> > > > leaves a strange constraint:
> > > > \d+ t*
> > > >Table "public.tp_0"
> > > > ...
> > > > Not-null constraints:
> > > >  "merge-16385-26BCB0-tmp_i_not_null" NOT NULL "i"
> > >
> > > Thanks!
> > > Attached fix (with test) for this case.
> > > The patch should be applied after patches
> > > v6-0001- ... .patch ... v6-0004- ... .patch
> >
> > I've incorporated this fix with 0001 patch.
> >
> > Also added to the patchset
> > 005 – tab completion by Dagfinn [1]
> > 006 – draft fix for table AM issue spotted by Alexander Lakhin [2]
> > 007 – doc review by Justin [3]
> >
> > I'm continuing work on this.
> >
> > Links
> > 1. https://www.postgresql.org/message-id/87plumiox2.fsf%40wibble.ilmari.org
> > 2. 
> > https://www.postgresql.org/message-id/84ada05b-be5c-473e-6d1c-ebe5dd21b190%40gmail.com
> > 3. https://www.postgresql.org/message-id/ZiGH0xc1lxJ71ZfB%40pryzbyj2023
> 
> 0001
> The way we handle name collisions during MERGE PARTITIONS operation is
> reworked by integration of patch [3].  This makes note about commit in
> [2] not relevant.

This patch also/already fixes the schema issue I reported.  Thanks.

If you wanted to include a test case for that:

begin;
CREATE SCHEMA s;
CREATE SCHEMA t;
CREATE TABLE p(i int) PARTITION BY RANGE(i);
CREATE TABLE s.c1 PARTITION OF p FOR VALUES FROM (1)TO(2);
CREATE TABLE s.c2 PARTITION OF p FOR VALUES FROM (2)TO(3);
ALTER TABLE p MERGE PARTITIONS (s.c1, s.c2) INTO s.c1; -- misbehaves if merging 
into the same name as an existing partition
\d+ p
...
Partitions: c1 FOR VALUES FROM (1) TO (3)

> 0002
> The persistence of the new partition is copied as suggested in [1].
> But the checks are in-place, because search_path could influence new
> table persistence.  Per review [2], commit message typos are fixed,
> documentation is revised, revised tests to cover schema-qualification,
> usage of search_path.

Subject: [PATCH v8 2/7] Make new partitions with parent's persistence during 
MERGE/SPLIT operations

This patch adds documentation saying:
+  Any indexes, constraints and user-defined row-level triggers that exist
+  in the parent table are cloned on new partitions [...]

Which is good to say, and addresses part of my message [0]
[0] ZiJW1g2nbQs9ekwK@pryzbyj2023

But it doesn't have anything to do with "creating new partitions with
parent's persistence".  Maybe there was a merge conflict and the docs
ended up in the wrong patch ?

Also, defaults, storage options, compression are also copied.  As will
be anything else from LIKE.  And since anything added in the future will
also be copied, maybe it's better to just say that the tables will be
created the same way as "LIKE .. INCLUDING ALL EXCLUDING ..", or
similar.  Otherwise, the next person who adds a new option for LIKE
would have to remember to update this paragraph...

Also, extended stats objects are currently cloned to new child tables.
But I suggested in [0] that they probably shouldn't be.

> 007 – doc review by Justin [3]

I suggest to drop this patch for now.  I'll send some more minor fixes to
docs and code comments once the other patches are settled.

-- 
Justin




Re: Add notes to pg_combinebackup docs

2024-04-24 Thread Daniel Gustafsson
> On 22 Apr 2024, at 20:52, Robert Haas  wrote:
> 
> On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson  wrote:
>>> On 18 Apr 2024, at 20:11, Robert Haas  wrote:
>>> 2. As (1), but make check_control_files() emit a warning message when
>>> the problem case is detected.
>> 
>> Being in the post-freeze part of the cycle, this seems like the best option.
> 
> Here is a patch for that.

LGTM; only one small comment which you can ignore if you feel it's not worth
the extra words.

+pg_combinebackup when the checksum status of the
+cluster has been changed; see

I would have preferred that this sentence included the problematic period for
the change, perhaps "..has been changed after the initial backup." or ideally
something even better.  In other words, clarifying that if checksums were
enabled before any backups were taken this limitation is not in play.  It's not
critical as the link aptly documents this, it just seems like the sentence is
cut short.

--
Daniel Gustafsson





Re: some additional (small) problems with pg_combinebackup and tablespaces

2024-04-24 Thread Daniel Gustafsson
> On 24 Apr 2024, at 19:59, Robert Haas  wrote:

> Here is a very small patch correcting these regrettable errors.

Patch LGTM.

In addition to those, unless I'm reading it wrong the current coding seems to
include a "-P" short option which is missing in the command parsing switch
statement (or in the help output):

while ((c = getopt_long(argc, argv, "dnNPo:T:",

Should that be removed?

A tiny nit-pick in the same area: while not wrong, it reads a bit odd to have
"don't" and "do not" on lines directly next to each other:

printf(_("  -n, --dry-run don't actually do anything\n"));
printf(_("  -N, --no-sync do not wait for changes to be written 
safely to disk\n"));

--
Daniel Gustafsson





Re: cataloguing NOT NULL constraints

2024-04-24 Thread Alvaro Herrera
On 2024-Apr-22, Alvaro Herrera wrote:

> > On d9f686a72~1 this script results in:
> > ERROR:  cannot change NO INHERIT status of inherited NOT NULL constraint 
> > "t_a_not_null" on relation "t"
> 
> Right.  Now I'm beginning to wonder if allowing ADD CONSTRAINT to mutate
> a pre-existing NO INHERIT constraint into a inheritable constraint
> (while accepting a constraint name in the command that we don't heed) is
> really what we want.  Maybe we should throw some error when the affected
> constraint is the topmost one, and only accept the inheritance status
> change when we're recursing.

So I added a restriction that we only accept such a change when
recursively adding a constraint, or during binary upgrade.  This should
limit the damage: you're no longer able to change an existing constraint
from NO INHERIT to YES INHERIT merely by doing another ALTER TABLE ADD
CONSTRAINT.

One thing that has me a little nervous about this whole business is
whether we're set up to error out where some child table down the
hierarchy has nulls, and we add a not-null constraint to it but fail to
do a verification scan.  I tried a couple of cases and AFAICS it works
correctly, but maybe there are other cases I haven't thought about where
it doesn't.
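
For reference, the simplest shape of the case I mean looks roughly like this
(names made up); the recursive ADD CONSTRAINT is expected to scan the child
and error out on the pre-existing NULL:

CREATE TABLE np_parent (a int);
CREATE TABLE np_child () INHERITS (np_parent);
INSERT INTO np_child VALUES (NULL);
-- must verify all children and fail, not skip the scan:
ALTER TABLE np_parent ADD CONSTRAINT np_a_not_null NOT NULL a;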

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)
https://postgr.es/m/20050809113420.gd2...@phlogiston.dyndns.org
>From 238bc09bed57dcd0e4887615f3c57a580eb26d9e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 22 Apr 2024 11:32:04 +0200
Subject: [PATCH v2 1/2] Acquire locks on children before recursing

ALTER TABLE ADD CONSTRAINT was missing this, as evidenced by assertion
failures.  While at it, create a small routine to encapsulate the weird
find_all_inheritors() call.

Reported-by: Alexander Lakhin 
Discussion: https://postgr.es/m/5b4cd32f-1d5b-c080-c688-31736bbcd...@gmail.com
---
 src/backend/commands/tablecmds.c | 40 +---
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3556240c8e..9058a0de7a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -427,6 +427,7 @@ static void ATSimplePermissions(AlterTableType cmdtype, Relation rel, int allowe
 static void ATSimpleRecursion(List **wqueue, Relation rel,
 			  AlterTableCmd *cmd, bool recurse, LOCKMODE lockmode,
 			  AlterTableUtilityContext *context);
+static void ATLockAllDescendants(Oid relid, LOCKMODE lockmode);
 static void ATCheckPartitionsNotInUse(Relation rel, LOCKMODE lockmode);
 static void ATTypedTableRecursion(List **wqueue, Relation rel, AlterTableCmd *cmd,
   LOCKMODE lockmode,
@@ -1621,9 +1622,7 @@ RemoveRelations(DropStmt *drop)
 		 * will lock those objects in the other order.
 		 */
 		if (state.actual_relkind == RELKIND_PARTITIONED_INDEX)
-			(void) find_all_inheritors(state.heapOid,
-	   state.heap_lockmode,
-	   NULL);
+			ATLockAllDescendants(state.heapOid, state.heap_lockmode);
 
 		/* OK, we're ready to delete this one */
 		obj.classId = RelationRelationId;
@@ -4979,10 +4978,12 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
 			ATSimplePermissions(cmd->subtype, rel, ATT_TABLE | ATT_FOREIGN_TABLE);
-			/* Recursion occurs during execution phase */
-			/* No command-specific prep needed except saving recurse flag */
 			if (recurse)
+			{
+/* if recursing, set flag and lock all descendants */
 cmd->recurse = true;
+ATLockAllDescendants(RelationGetRelid(rel), lockmode);
+			}
 			pass = AT_PASS_ADD_CONSTR;
 			break;
 		case AT_AddIndexConstraint: /* ADD CONSTRAINT USING INDEX */
@@ -6721,6 +6722,21 @@ ATSimpleRecursion(List **wqueue, Relation rel,
 	}
 }
 
+/*
+ * ATLockAllDescendants
+ *
+ * Acquire lock on all descendant relations of the given relation.
+ */
+static void
+ATLockAllDescendants(Oid relid, LOCKMODE lockmode)
+{
+	/*
+	 * This is only used in DDL code, so it doesn't matter that we leak the
+	 * list storage; it'll be gone soon enough.
+	 */
+	(void) find_all_inheritors(relid, lockmode, NULL);
+}
+
 /*
  * Obtain list of partitions of the given table, locking them all at the given
  * lockmode and ensuring that they all pass CheckTableNotInUse.
@@ -9370,10 +9386,9 @@ ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd,
 
 	/*
 	 * Acquire locks all the way down the hierarchy.  The recursion to lower
-	 * levels occurs at execution time as necessary, so we don't need to do it
-	 * here, and we don't need the returned list either.
+	 * levels occurs at execution time as necessary.
 	 */
-	(void) find_all_inheritors(RelationGetRelid(rel), lockmode, NULL);
+	

Re: New GUC autovacuum_max_threshold ?

2024-04-24 Thread Melanie Plageman
On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
 wrote:
>
> Hello,
>
> I would like to suggest a new parameter, autovacuum_max_threshold, which
> would set an upper limit on the number of tuples to delete/update/insert
> prior to vacuum/analyze.

Hi Frédéric, thanks for the proposal! You are tackling a very tough
problem. I would also find it useful to know more about what led you
to suggest this particular solution. I am very interested in user
stories around difficulties with what tables are autovacuumed and
when.

Am I correct in thinking that one of the major goals here is for a
very large table to be more likely to be vacuumed?

> The idea would be to replace the following calculation :
>
> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>
> with this one :
>
> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
> + vac_scale_factor * reltuples / autovacuum_max_threshold)
>
> (and the same for the others, vacinsthresh and anlthresh).

My first thought when reviewing the GUC and how it is used is
wondering if its description is a bit misleading.

autovacuum_vacuum_threshold is the "minimum number of updated or
deleted tuples needed to trigger a vacuum". That is, if this many
tuples are modified, it *may* trigger a vacuum, but we also may skip
vacuuming the table for other reasons or due to other factors.
autovacuum_max_threshold's proposed definition is the upper
limit/maximum number of tuples to insert/update/delete prior to
vacuum/analyze. This implies that if that many tuples have been
modified or inserted, the table will definitely be vacuumed -- which
isn't true. Maybe that is okay, but I thought I would bring it up.

> The attached (draft) patch further illustrates the idea.

Thanks for including a patch!

> My guess is that a similar proposal has already been submitted... and
> rejected  If so, I'm very sorry for the useless noise.

I rooted around in the hackers archive and couldn't find any threads
on this specific proposal. I copied some other hackers I knew of who
have worked on this problem and thought about it in the past, in case
they know of some existing threads or prior work on this specific
topic.

- Melanie




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-04-24 Thread Melanie Plageman
On Tue, Apr 23, 2024 at 6:43 PM Tomas Vondra
 wrote:
>
> On 4/23/24 18:05, Melanie Plageman wrote:
> > The patch with a fix is attached. I put the test in
> > src/test/regress/sql/join.sql. It isn't the perfect location because
> > it is testing something exercisable with a join but not directly
> > related to the fact that it is a join. I also considered
> > src/test/regress/sql/select.sql, but it also isn't directly related to
> > the query being a SELECT query. If there is a better place for a test
> > of a bitmap heap scan edge case, let me know.
>
> I don't see a problem with adding this to join.sql - why wouldn't this
> count as something related to a join? Sure, it's not like this code
> matters only for joins, but if you look at join.sql that applies to a
> number of other tests (e.g. there are a couple btree tests).

I suppose it's true that other tests in this file use joins to test
other code. I guess if we limited join.sql to containing tests of join
implementation, it would be rather small. I just imagined it would be
nice if tests were grouped by what they were testing -- not how they
were testing it.

> That being said, it'd be good to explain in the comment why we're
> testing this particular plan, not just what the plan looks like.

You mean I should explain in the test comment why I included the
EXPLAIN plan output? (because it needs to contain a bitmapheapscan to
actually be testing anything)

I do have a detailed explanation in my test comment of why this
particular query exercises the code we want to test.

> > One other note: there is some concurrency effect in the parallel
> > schedule group containing "join" where you won't trip the assert if
> > all the tests in that group in the parallel schedule are run. But, if
> > you would like to verify that the test exercises the correct code,
> > just reduce the group containing "join".
> >
>
> That is ... interesting. Doesn't that mean that most test runs won't
> actually detect the problem? That would make the test a bit useless.

Yes, I should really have thought about it more. After further
investigation, I found that the reason it doesn't trip the assert when
the join test is run concurrently with other tests is that the SELECT
query doesn't use the skip fetch optimization because the VACUUM
doesn't set the pages all visible in the VM. In this case, it's
because the tuples' xmins are not before VacuumCutoffs->OldestXmin
(which is derived from GetOldestNonRemovableTransactionId()).

After thinking about it more, I suppose we can't add a test that
relies on the relation being all visible in the VM in a group in the
parallel schedule. I'm not sure this edge case is important enough to
merit its own group or an isolation test. What do you think?

- Melanie




Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM

2024-04-24 Thread Pavel Stehule
st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> napsal:

> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis  wrote:
> >
> > Here's where I think this API should go:
> >
> > 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> > those that are implemented in your patch.
>
> I added table_modify_begin, table_modify_buffer_insert,
> table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
> authors now can define their own buffering strategy and flushing decisions
> based on their tuple storage kinds and various other AM specific factors. I
> also added a default implementation that falls back to single inserts when
> no implementation is provided for these AM by AM authors. See the attached
> v19-0001 patch.
>
> > 2. Add some kind of flush callback that will be called either while the
> > tuples are being flushed or after the tuples are flushed (but before
> > they are freed by the AM). (Aside: do we need to call it while the
> > tuples are being flushed to get the right visibility semantics for
> > after-row triggers?)
>
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER ROW
> INSERT triggers can be executed in this callback. See the v19-0001 patch
> for how AM invokes the flush callback, and see either v19-0003 or v19-0004
> or v19-0005 for how a caller can supply the callback and required context
> to execute index insertions and AR triggers.
>
> > 3. Add table_modify_buffer_{update|delete} APIs.
> >
> > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> > the buffer_insert/update/delete APIs; we don't need a separate merge
> > method. This probably requires that the AM maintain 3 separate buffers
> > to distinguish different kinds of changes at flush time (obviously
> > these can be initialized lazily to avoid overhead when not being used).
>
> I haven't thought about these things yet. I can only focus on them after
> seeing how the attached patches go from here.
>
> > 4. Some kind of API tweaks to help manage memory when modifying
> > pertitioned tables, so that the buffering doesn't get out of control.
> > Perhaps just reporting memory usage and allowing the caller to force
> > flushes would be enough.
>
> Heap implementation for thes new Table AMs uses a separate memory context
> for all of the operations. Please have a look and let me know if we need
> anything more.
>
> > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> > fairly straightforward, I believe, and handled by your patch. Indexes
> > are (re)built afterward, and no triggers are possible.
> >
> > 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> > straightforward, I believe, and handled by your patch. No indexes or
> > triggers are possible.
>
> I used multi inserts for all of these including TABLE REWRITE commands
> such as ALTER TABLE. See the attached v19-0002 patch. Check the testing
> section below for benefits.
>
> FWIW, following are some of the TABLE REWRITE commands that can get
> benefitted:
>
> ALTER TABLE tbl ALTER c1 TYPE bigint;
> ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
> ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
> ALTER TABLE itest3 ALTER COLUMN a TYPE int;
> ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
> ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 1)::int;
> and so on.
>
> > 7. Use these new methods for COPY. We have to be careful to avoid
> > regressions for the heap method, because it's already managing its own
> > buffers. If the AM manages the buffering, then it may require
> > additional copying of slots, which could be a disadvantage. To solve
> > this, we may need some minor API tweaks to avoid copying when the
> > caller guarantees that the memory will not be freed to early, or
> > perhaps expose the AM's memory context to copyfrom.c. Another thing to
> > consider is that the buffering in copyfrom.c is also used for FDWs, so
> > that buffering code path needs to be preserved in copyfrom.c even if
> > not used for AMs.
>
> I modified the COPY FROM code to use the new Table AMs, and performed some
> tests which show no signs of regression. Check the testing section below
> for more details. See the attached v19-0005 patch. With this,
> table_multi_insert can be deprecated.
>
> > 8. Use these new methods for INSERT INTO ... SELECT. One potential
> > challenge here is that execution nodes are not always run to
> > completion, so we need to be sure that the flush isn't forgotten in
> > that case.
>
> I did that in v19-0003. I did place the table_modify_end call in multiple
> places including ExecEndModifyTable. I didn't find any issues with it.
> Please have a look and let me know if we need the end call in more places.
> Check the testing section 

Re: New GUC autovacuum_max_threshold ?

2024-04-24 Thread Nathan Bossart
On Wed, Apr 24, 2024 at 03:10:27PM -0400, Melanie Plageman wrote:
> On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
>  wrote:
>> I would like to suggest a new parameter, autovacuum_max_threshold, which
>> would set an upper limit on the number of tuples to delete/update/insert
>> prior to vacuum/analyze.
> 
> Hi Frédéric, thanks for the proposal! You are tackling a very tough
> problem. I would also find it useful to know more about what led you
> to suggest this particular solution. I am very interested in user
> stories around difficulties with what tables are autovacuumed and
> when.
> 
> Am I correct in thinking that one of the major goals here is for a
> very large table to be more likely to be vacuumed?

If this is indeed the goal, +1 from me for doing something along these
lines.

>> The idea would be to replace the following calculation :
>>
>> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
>>
>> with this one :
>>
>> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
>> + vac_scale_factor * reltuples / autovacuum_max_threshold)
>>
>> (and the same for the others, vacinsthresh and anlthresh).
> 
> My first thought when reviewing the GUC and how it is used is
> wondering if its description is a bit misleading.

Yeah, I'm having trouble following the proposed mechanics for this new GUC,
and it's difficult to understand how users would choose a value.  If we
just want to cap the number of tuples required before autovacuum takes
action, perhaps we could simplify it to something like

vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
vacthresh = Min(vacthresh, vac_max_thresh);

This would effectively cause autovacuum_vacuum_scale_factor to be
overridden for large tables where the scale factor would otherwise cause
the calculated threshold to be extremely high.
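
To make the shapes concrete, here is a throwaway SQL sketch (illustrative
arithmetic only, not autovacuum code) that evaluates the current formula,
the proposed asymptotic formula, and a hard Min() cap for a few table
sizes, using the default base threshold of 50 and scale factor of 0.2,
plus an assumed cap of 500000 tuples:

-- 50 and 0.2 are the default autovacuum_vacuum_threshold and
-- autovacuum_vacuum_scale_factor; the 500000 cap is only an example value.
SELECT reltuples,
       50 + 0.2 * reltuples AS current_thresh,
       50 + 0.2 * reltuples / (1 + 0.2 * reltuples / 500000) AS proposed_thresh,
       least(50 + 0.2 * reltuples, 500000) AS capped_thresh
FROM unnest(array[10000, 1000000, 100000000, 10000000000]) AS reltuples;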

>> My guess is that a similar proposal has already been submitted... and
>> rejected  If so, I'm very sorry for the useless noise.
> 
> I rooted around in the hackers archive and couldn't find any threads
> on this specific proposal. I copied some other hackers I knew of who
> have worked on this problem and thought about it in the past, in case
> they know of some existing threads or prior work on this specific
> topic.

FWIW I have heard about this problem in the past, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: A varint implementation for PG?

2024-04-24 Thread Nathan Bossart
On Thu, Jan 05, 2023 at 06:36:15PM -0500, Robert Haas wrote:
> Andres asked me off-list if I could take another look at this.

I'm curious whether there are plans to pick this up again.  IMHO it seems
like a generally good idea.  AFAICT the newest version of the patch is in a
separate thread [0], which I just wanted to link here for posterity.  If
there are no plans, I might give it a try, but otherwise I'm happy to help
review.

[0] https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Row pattern recognition

2024-04-24 Thread Jacob Champion
On Tue, Apr 23, 2024 at 8:13 PM Tatsuo Ishii  wrote:
> SELECT v.a, count(*) OVER w
> FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a)
> WINDOW w AS (
>   ORDER BY v.a
>   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>   PATTERN (B+)
>   DEFINE B AS a = 'B'
> )
>  a | count
> ---+---
>  A | 0
>  B | 2
>  B |
>  C | 0
> (4 rows)
>
> Here row 3 is skipped because the pattern B matches row 2 and 3. In
> this case I think count(*) should return 0 rather than NULL for row 3.

I think returning zero would match Vik's explanation upthread [1],
yes. Unfortunately I don't have a spec handy to double-check for
myself right now.

--Jacob

[1] 
https://www.postgresql.org/message-id/c9ebc3d0-c3d1-e8eb-4a57-0ec099cbda17%40postgresfriends.org




Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 08.04.24 10:38, Heikki Linnakangas wrote:
> On 08/04/2024 04:25, Heikki Linnakangas wrote:
>> One important open item now is that we need to register a proper ALPN
>> protocol ID with IANA.
>
> I sent a request for that:
> https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/

Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is
"postgres".  This seems confusing.






Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-04-24 Thread Noah Misch
On Mon, Apr 15, 2024 at 04:12:38PM +0700, John Naylor wrote:
> - Some paths for single-value leaves are not covered:
> 
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L904
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L954
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2606
> 
> However, these paths do get regression test coverage on 32-bit
> machines. 64-bit builds only have leaves in the TID store, which
> doesn't (currently) delete entries, and doesn't instantiate the tree
> with the debug option.
> 
> - In RT_SET "if (found)" is not covered:
> 
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L1768
> 
> That's because we don't yet have code that replaces an existing value
> with a value of a different length.

I saw a SIGSEGV there when using tidstore to write a fix for something else.
Patch attached.
Author: Noah Misch 
Commit: Noah Misch 

radixtree: Fix SIGSEGV at update of embeddable value to non-embeddable.

Also, fix a memory leak when updating from non-embeddable to embeddable.
Both were unreachable without adding C code.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index dc4c00d..aa8f44c 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -1749,6 +1749,9 @@ have_slot:
 
if (RT_VALUE_IS_EMBEDDABLE(value_p))
{
+   if (found && !RT_CHILDPTR_IS_VALUE(*slot))
+   RT_FREE_LEAF(tree, *slot);
+
/* store value directly in child pointer slot */
memcpy(slot, value_p, value_sz);
 
@@ -1765,7 +1768,7 @@ have_slot:
{
RT_CHILD_PTR leaf;
 
-   if (found)
+   if (found && !RT_CHILDPTR_IS_VALUE(*slot))
{
Assert(RT_PTR_ALLOC_IS_VALID(*slot));
leaf.alloc = *slot;
diff --git a/src/test/modules/test_tidstore/expected/test_tidstore.out 
b/src/test/modules/test_tidstore/expected/test_tidstore.out
index 06c610e..d116927 100644
--- a/src/test/modules/test_tidstore/expected/test_tidstore.out
+++ b/src/test/modules/test_tidstore/expected/test_tidstore.out
@@ -79,6 +79,96 @@ SELECT test_destroy();
  
 (1 row)
 
+-- Test replacements crossing RT_CHILDPTR_IS_VALUE in both directions
+SELECT test_create(false);
+ test_create 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1,2]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1,2,3]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1,2,3,4]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1,2,3]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1,2]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT do_set_block_offsets(1, array[1]::int2[]); SELECT 
check_set_block_offsets();
+ do_set_block_offsets 
+--
+1
+(1 row)
+
+ check_set_block_offsets 
+-
+ 
+(1 row)
+
+SELECT test_destroy();
+ test_destroy 
+--
+ 
+(1 row)
+
 -- Use shared memory this time. We can't do that in test_radixtree.sql,
 -- because unused static functions would raise warnings there.
 SELECT test_create(true);
diff --git a/src/test/modules/test_tidstore/sql/test_tidstore.sql 
b/src/test/modules/test_tidstore/sql/test_tidstore.sql
index bb31877..704d869 100644
--- a/src/test/modules/test_tidstore/sql/test_tidstore.sql
+++ b/src/test/modules/test_tidstore/sql/test_tidstore.sql
@@ -14,6 +14,7 @@ CREATE TEMP TABLE hideblocks(blockno bigint);
 -- We use a higher number to test tidstore.
 \set maxoffset 512
 
+
 SELECT test_create(false);
 
 -- Test on empty tidstore.
@@ -50,6 +51,20 @@ SELECT 

Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 09:46:20PM +0530, Bharath Rupireddy wrote:
> Thanks. I started a new thread
> https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com.

Cool, thanks.
--
Michael


signature.asc
Description: PGP signature


Re: Support "Right Semi Join" plan shapes

2024-04-24 Thread Richard Guo
Here is another rebase with a commit message to help review.  I also
tweaked some comments.

Thanks
Richard


v5-0001-Support-Right-Semi-Join-plan-shapes.patch
Description: Binary data


Re: Partitioned tables and [un]loggedness

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 03:26:40PM -0500, Nathan Bossart wrote:
> On Wed, Apr 24, 2024 at 04:17:44PM +0900, Michael Paquier wrote:
> > - Support ALTER TABLE .. SET LOGGED/UNLOGGED for partitioned tables,
> > where the command only works on partitioned tables so that's only a
> > catalog switch.
> 
> I'm not following what this means.  Does SET [UN]LOGGED on a partitioned
> table recurse to its partitions?  Does this mean that you cannot change
> whether a single partition is [UN]LOGGED?  How does this work with
> sub-partitioning?

The POC implements ALTER TABLE .. SET LOGGED/UNLOGGED so that the change
only applies to the relation named in the query.  In short, if a
partitioned table is specified as the relation, only its relpersistence
is changed by the patch.  If sub-partitioning is involved, the POC
behaves the same way: the relpersistence of partitioned table A1 attached
to partitioned table A does not change under ALTER TABLE A.
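
To illustrate the POC's intended behaviour (hypothetical, nothing
committed; table names are made up):

CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (10);
-- With the POC, this would switch only the parent's relpersistence:
ALTER TABLE parent SET UNLOGGED;
-- "child" keeps its own persistence unless it is altered explicitly.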

Recursing to all the partitions and partitioned tables attached to a
partitioned table A when using an ALTER TABLE A, when ONLY is not
specified, is OK by me if that's the consensus folks prefer.  I just
wanted to gather some opinions first about what people find the most
natural.

>> - CREATE TABLE PARTITION OF would make a new partition inherit the
>> loggedness of the parent if UNLOGGED is not directly specified.
> 
> This one seems generally reasonable to me, provided it's properly noted in
> the docs.

Noted.  That's a second piece.  This part felt natural to me, but it
would not fly far without a LOGGED keyword and a way to attach a new
"undefined" RELPERSISTENCE_ value in gram.y's OptTemp, likely a '\0'.
That's going to require some tweaks for CTAS as these cannot be
partitioned, but that should be a few lines to fall back to permanent
persistence if the query is parsed with the new "undefined".

>> - How about ONLY?  Would it make sense to support it so as ALTER TABLE
>> ONLY on a partitioned table does not touch any of its partitions?
>> Would not specifying ONLY mean that the loggedness of the partitioned
>> table and all its partitions is changed?  That would mean a burst in
>> WAL when switching to LOGGED, which was a concern when this feature
>> was first implemented even for a single table, so for potentially
>> hundreds of them, that would really hurt if a DBA is not careful.
> 
> I guess ONLY could be a way of changing the default for new partitions
> without changing whether existing ones were logged.  I'm not tremendously
> concerned about the burst-of-WAL problem.  Or, at least, I'm not any more
> concerned about it for partitioned tables than I am for regular tables.  I
> do wonder if we can improve the performance of setting tables LOGGED, but
> that's for a separate thread...

Yeah.  A burst of WAL caused by a switch to LOGGED for a few thousand
partitions would never be fun, but perhaps I'm just worrying too much as
that should not be a regular operation.

>> - CREATE TABLE does not have a LOGGED keyword, hence it is not
>> possible through the parser to force a partition to have a permanent
>> persistence even if its partitioned table uses UNLOGGED.
> 
> Could we add LOGGED for CREATE TABLE?

I don't see why not, if people agree with it.  That's a patch of its own
that would help greatly in making the [un]logged attribute be inherited
for new partitions, because it would then be possible to force a
persistence to be permanent as well as unlogged at query level, easing
the manipulation of partition trees.
--
Michael


signature.asc
Description: PGP signature


Re: Partitioned tables and [un]loggedness

2024-04-24 Thread David G. Johnston
On Wed, Apr 24, 2024 at 4:35 PM Michael Paquier  wrote:

>
> I disagree here, actually.  Temporary tables are a different beast
> because they require automated cleanup which would include interacting
> with the partitioning information if temp and non-temp relations are
> mixed.  That's why the current restrictions are in place: you should
>
[ not ] ?

> be able to mix them.
>

My point is that if you feel that treating logged as a copy-able property
is OK then doing the following should also just work:

postgres=# create temp table parentt ( id integer ) partition by range (id);
CREATE TABLE
postgres=# create table child10t partition of parentt for values from (0)
to (9);
ERROR:  cannot create a permanent relation as partition of temporary
relation "parentt"

i.e., child10t should be created as a temporary partition under parentt.

David J.


Re: Partitioned tables and [un]loggedness

2024-04-24 Thread Michael Paquier
On Thu, Apr 25, 2024 at 08:35:32AM +0900, Michael Paquier wrote:
> That's why the current restrictions are in place: you should
> be able to mix them.

Correction due to a ENOCOFFEE: you should *not* be able to mix them.
--
Michael


signature.asc
Description: PGP signature


Re: Remove unnecessary code rom be_lo_put()

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 09:25:09AM -0400, Tom Lane wrote:
> I agree.  Do you want to do the honors?

Good catch.  The same check happens when the object is opened.  Note
that you should be able to remove utils/acl.h at the top of
be-fsstubs.c as this would remove the last piece of code that does an
ACL check in this file.  No objections to removing this code now.
--
Michael


signature.asc
Description: PGP signature


Re: docs: minor typo fix for "lower(anymultirange)"

2024-04-24 Thread Richard Guo
On Thu, Apr 25, 2024 at 8:40 AM Ian Lawrence Barwick 
wrote:

> Hi
>
> Here:
>
>
> https://www.postgresql.org/docs/current/functions-range.html#MULTIRANGE-FUNCTIONS-TABLE
>
> the description for "lower(anymultirange)":
>
> > (NULL if the multirange is empty has no lower bound).
>
> is missing "or" and should be:
>
> > (NULL if the multirange is empty or has no lower bound).


Good catch!  I checked the descriptions for "upper(anymultirange)",
"lower(anyrange)" and "upper(anyrange)", and they are all correct.  We
should fix this one.

Thanks
Richard


Re: Partitioned tables and [un]loggedness

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 03:36:35PM -0700, David G. Johnston wrote:
> On Wed, Apr 24, 2024 at 1:26 PM Nathan Bossart 
> wrote:
>> On Wed, Apr 24, 2024 at 04:17:44PM +0900, Michael Paquier wrote:
>>> - CREATE TABLE PARTITION OF would make a new partition inherit the
>>> loggedness of the parent if UNLOGGED is not directly specified.
>>
>> This one seems generally reasonable to me, provided it's properly noted in
>> the docs.
> 
> I don't see this being desirable at this point.  And especially not by
> itself.  It is an error to not specify TEMP on the partition create table
> command when the parent is temporary, and that one is a no-brainer for
> having the persistence mode of the child be derived from the parent.  The
> current policy of requiring the persistence of the child to be explicitly
> specified seems perfectly reasonable.  Or, put differently, the specific
> current persistence of the parent doesn't get copied or even considered
> when creating children.

I disagree here, actually.  Temporary tables are a different beast
because they require automated cleanup which would include interacting
with the partitioning information if temp and non-temp relations are
mixed.  That's why the current restrictions are in place: you should
be able to mix them.  Mixing unlogged and logged is OK, they retain a
state in their catalogs.

> In any case we aren't going to be able to do exactly what it means by
> marking a partitioned table unlogged - namely that we execute the truncate
> command on it after a crash.  Forcing the implementation here just so that
> our existing decision to ignore unlogged on the parent table is, IMO,
> letting optics corrupt good design.

It depends on retention policies, for one.  I could imagine an
application where partitioning is used based on a key that classifies
records by their persistency, where one does not care about losing a
portion of them after a crash but still would like some retention as
long as the application is healthy.

> I do agree with having an in-core way for the DBA to communicate that they
> wish for partitions to be created with a known persistence unless the
> create table command specifies something different.  The way I would
> approach this is to add something like the following to the syntax/system:
> 
> CREATE [ persistence_mode ] TABLE ...
> ...
> WITH (partition_default_persistence = { logged, unlogged, temporary }) --
> storage_parameter, logged is effectively the default

While we have keywords to drive that at query level for TEMP and
UNLOGGED?  I am not sure I am on board with this idea.  pg_dump may need
some changes to reflect the loggedness across the partitions, now that
I think about it, even if there should be an ATTACH once the table is
created to link it to its partitioned table.  There should be no
rewrites at restore.

> I do agree with adding LOGGED to the list of optional persistence_mode
> specifiers, possibly regardless of whether any of this happens.  Seems
> desirable to give our default mode a name.

Yeah, at least it looks like Nathan and you are OK with this addition.
--
Michael


signature.asc
Description: PGP signature


Re: Use XLOG_CONTROL_FILE macro everywhere?

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 12:32:46PM +0300, Anton A. Melnikov wrote:
> To ensure backward compatibility we can save the old macro like this:
> 
> #define XLOG_CONTROL_FILE "global/pg_control"
> #define PG_CONTROL_FILE   XLOG_CONTROL_FILE
> 
> With the best wishes,

Not sure that I would bother with a second one.  But, well, why not if
people want to rename it, as long as you keep compatibility.
--
Michael


signature.asc
Description: PGP signature


docs: minor typo fix for "lower(anymultirange)"

2024-04-24 Thread Ian Lawrence Barwick
Hi

Here:

 
https://www.postgresql.org/docs/current/functions-range.html#MULTIRANGE-FUNCTIONS-TABLE

the description for "lower(anymultirange)":

> (NULL if the multirange is empty has no lower bound).

is missing "or" and should be:

> (NULL if the multirange is empty or has no lower bound).

Seems to have been omitted with 7539a1b2 (REL_16_STABLE + master).

Very minor issue, but I have seen it and can no longer unsee it.

Regards

Ian Barwick
From e98ee3934cd8d7360388d02b79b880c328f0 Mon Sep 17 00:00:00 2001
From: Ian Barwick 
Date: Thu, 25 Apr 2024 09:29:30 +0900
Subject: [PATCH v1] doc: fix "lower ( anymultirange )" description

Add missing "or".
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ebcd936acb..a46530abee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21298,7 +21298,7 @@ SELECT NULLIF(value, '(none)') ...


 Extracts the lower bound of the multirange (NULL if the
-multirange is empty has no lower bound).
+multirange is empty or has no lower bound).


 lower('{[1.1,2.2)}'::nummultirange)

base-commit: e51c2a46b54ee294c699499bedf76940ce2f392c
-- 
2.39.3



Re: Extend ALTER DEFAULT PRIVILEGES for large objects

2024-04-24 Thread Nathan Bossart
On Tue, Apr 23, 2024 at 11:47:38PM -0400, Tom Lane wrote:
> On the whole I find this proposed feature pretty unexciting
> and dubiously worthy of the implementation/maintenance effort.

I don't have any particularly strong feelings on $SUBJECT, but I'll admit
I'd be much more interested in resolving any remaining reasons folks are
using large objects over TOAST.  I see a couple of reasons listed in the
docs [0] that might be worth examining.

[0] https://www.postgresql.org/docs/devel/lo-intro.html

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Partitioned tables and [un]loggedness

2024-04-24 Thread David G. Johnston
On Wed, Apr 24, 2024 at 1:26 PM Nathan Bossart 
wrote:

> On Wed, Apr 24, 2024 at 04:17:44PM +0900, Michael Paquier wrote:
> > - Support ALTER TABLE .. SET LOGGED/UNLOGGED for partitioned tables,
> > where the command only works on partitioned tables so that's only a
> > catalog switch.
>
> I'm not following what this means.  Does SET [UN]LOGGED on a partitioned
> table recurse to its partitions?  Does this mean that you cannot change
> whether a single partition is [UN]LOGGED?  How does this work with
> sub-partitioning?
>
> > - CREATE TABLE PARTITION OF would make a new partition inherit the
> > loggedness of the parent if UNLOGGED is not directly specified.
>
> This one seems generally reasonable to me, provided it's properly noted in
> the docs.
>

I don't see this being desirable at this point.  And especially not by
itself.  It is an error to not specify TEMP on the partition create table
command when the parent is temporary, and that one is a no-brainer for
having the persistence mode of the child be derived from the parent.  The
current policy of requiring the persistence of the child to be explicitly
specified seems perfectly reasonable.  Or, put differently, the specific
current persistence of the parent doesn't get copied or even considered
when creating children.

In any case we aren't going to be able to do exactly what it means by
marking a partitioned table unlogged - namely that we execute the truncate
command on it after a crash.  Forcing the implementation here just so that
our existing decision to ignore unlogged on the parent table is, IMO,
letting optics corrupt good design.

I do agree with having an in-core way for the DBA to communicate that they
wish for partitions to be created with a known persistence unless the
create table command specifies something different.  The way I would
approach this is to add something like the following to the syntax/system:

CREATE [ persistence_mode ] TABLE ...
...
WITH (partition_default_persistence = { logged, unlogged, temporary }) --
storage_parameter, logged is effectively the default

This method is more explicit and has zero implications for existing backups
or upgrading.
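
For example (hypothetical syntax, shown only to illustrate the idea;
nothing any released version accepts today):

CREATE TABLE measurement (logdate date, reading int)
  PARTITION BY RANGE (logdate)
  WITH (partition_default_persistence = unlogged);
-- New partitions of "measurement" would then default to UNLOGGED unless
-- CREATE TABLE ... PARTITION OF says otherwise.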


> > - How about ONLY?  Would it make sense to support it so as ALTER TABLE
> > ONLY on a partitioned table does not touch any of its partitions?
>

I'd leave it to the community to develop and maintain scripts that iterate
over the partition hierarchy and toggle the persistence mode between logged
and unlogged on the individual partitions using whatever throttling and
batching strategy each individual user requires for their situation.


> > - CREATE TABLE does not have a LOGGED keyword, hence it is not
> > possible through the parser to force a partition to have a permanent
> > persistence even if its partitioned table uses UNLOGGED.
>
> Could we add LOGGED for CREATE TABLE?
>
>
I do agree with adding LOGGED to the list of optional persistence_mode
specifiers, possibly regardless of whether any of this happens.  Seems
desirable to give our default mode a name.

David J.


Re: Cleanup: remove unused fields from nodes

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 08:31:57AM -0400, Tom Lane wrote:
> I was thinking about wording like
> 
>  * True if DEALLOCATE ALL.  This is redundant with "name == NULL",
>  * but we make it a separate field so that exactly this condition
>  * (and not the precise name) will be accounted for in query jumbling.

Fine by me.  I've just used that and applied a patch to doing so.
Thanks.
--
Michael


signature.asc
Description: PGP signature


RE: Race condition in FetchTableStates() breaks synchronization of subscription tables

2024-04-24 Thread Zhijie Hou (Fujitsu)
On Wednesday, April 24, 2024 6:29 PM vignesh C  wrote:
> 
> On Wed, 24 Apr 2024 at 11:59, Amit Kapila  wrote:
> >
> > On Wed, Mar 13, 2024 at 9:19 AM vignesh C  wrote:
> > >
> > > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian  wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C 
> wrote:
> > > >>
> > > >>
> > > >> Thanks, I have created the following Commitfest entry for this:
> > > >> https://commitfest.postgresql.org/47/4816/
> > > >>
> > > >> Regards,
> > > >> Vignesh
> > > >
> > > >
> > > > Thanks for the patch, I have verified that the fix works well by 
> > > > following
> the steps mentioned to reproduce the problem.
> > > > Reviewing the patch, it seems good and is well documented. Just one
> minor comment I had was probably to change the name of the variable
> table_states_valid to table_states_validity. The current name made sense when
> it was a bool, but now that it is a tri-state enum, it doesn't fit well.
> > >
> > > Thanks for reviewing the patch, the attached v6 patch has the
> > > changes for the same.
> > >
> >
> > v6_0001* looks good to me. This should be backpatched unless you or
> > others think otherwise.
> 
> This patch needs to be committed in master,PG16 and PG15.
> This is not required from PG14 onwards because they don't have
> HasSubscriptionRelations call before updating table_states_valid:
> /*
>  * Does the subscription have tables?
>  *
>  * If there were not-READY relations found then we know it does. But
>  * if table_states_not_ready was empty we still need to check again to
>  * see if there are 0 tables.
>  */
> has_subrels = (table_states_not_ready != NIL) ||
>   HasSubscriptionRelations(MySubscription->oid);
> 
> So the invalidation function will not be called here.
> 
> Whereas for PG15 and newer versions this is applicable:
> HasSubscriptionRelations calls table_open function which will get the
> invalidate callback like in:
> HasSubscriptionRelations -> table_open -> relation_open -> LockRelationOid
> -> AcceptInvalidationMessages -> ReceiveSharedInvalidMessages ->
> LocalExecuteInvalidationMessage -> CallSyscacheCallbacks ->
> invalidate_syncing_table_states
> 
> The attached patch
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr.patch
> applies for master and PG16 branch,
> v7-0001-Table-sync-missed-due-to-race-condition-in-subscr_PG15.patch
> applies for PG15 branch.

Thanks, I have verified that the patches can be applied cleanly and fix the
issue on each branch. The regression test can also pass after applying the patch
on my machine.

Best Regards,
Hou zj


Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-04-24 Thread Masahiko Sawada
On Thu, Apr 25, 2024 at 6:03 AM Noah Misch  wrote:
>
> On Mon, Apr 15, 2024 at 04:12:38PM +0700, John Naylor wrote:
> > - Some paths for single-value leaves are not covered:
> >
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L904
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L954
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2606
> >
> > However, these paths do get regression test coverage on 32-bit
> > machines. 64-bit builds only have leaves in the TID store, which
> > doesn't (currently) delete entries, and doesn't instantiate the tree
> > with the debug option.
> >
> > - In RT_SET "if (found)" is not covered:
> >
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L1768
> >
> > That's because we don't yet have code that replaces an existing value
> > with a value of a different length.
>
> I saw a SIGSEGV there when using tidstore to write a fix for something else.
> Patch attached.

Great find, thank you for the patch!

The fix looks good to me. I think we can improve regression tests for
better coverage. In TidStore on a 64-bit machine, we can store 3
offsets in the header and these values are embedded in the leaf page.
With more than 3 offsets, the value size becomes more than 16 bytes
and it becomes a single-value leaf. Therefore, if we add a test with
array[1,2,3,4,100], we can cover the case of replacing a single-value
leaf with a new single-value leaf of a different size. Now we add 9 pairs
of do_set_block_offsets() and check_set_block_offsets(). If these are
annoying, we can remove the cases of array[1] and array[1,2].
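
Roughly, the extra pair would look like this in test_tidstore.sql (sketch
only, mirroring the existing calls):

-- more than 3 offsets forces a single-value leaf on 64-bit builds, so
-- this exercises replacing a single-value leaf with one of another size.
SELECT do_set_block_offsets(1, array[1,2,3,4,100]::int2[]);
SELECT check_set_block_offsets();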

I've attached a new patch. In addition to the new test case I
mentioned, I've added some new comments and removed an unnecessary
added line in test_tidstore.sql.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


0001-radixtree-Fix-SIGSEGV-at-update-of-embeddable-value-.patch
Description: Binary data


Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-04-24 Thread John Naylor
On Thu, Apr 25, 2024 at 9:50 AM Masahiko Sawada  wrote:
>
> > I saw a SIGSEGV there when using tidstore to write a fix for something else.
> > Patch attached.
>
> Great find, thank you for the patch!

+1

(This occurred to me a few days ago, but I was far from my computer.)

With the purge function that  Noah proposed, I believe we can also get
rid of the comment at the top of the .sql test file warning of a
maintenance hazard:
..."To avoid adding duplicates,
-- each call to do_set_block_offsets() should use different block
-- numbers."

I found that it doesn't add any measurable time to run the test.

> The fix looks good to me. I think we can improve regression tests for
> better coverage. In TidStore on a 64-bit machine, we can store 3
> offsets in the header and these values are embedded to the leaf page.
> With more than 3 offsets, the value size becomes more than 16 bytes
> and a single value leaf. Therefore, if we can add the test with the
> array[1,2,3,4,100], we can cover the case of replacing a single-value
> leaf with a different size new single-value leaf. Now we add 9 pairs

Good idea.

> of do_set_block_offsets() and check_set_block_offsets(). If these are
> annoying, we can remove the cases of array[1] and array[1,2].

Let's keep those -- 32-bit platforms should also exercise this path.




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 01:31:12PM +0200, Daniel Gustafsson wrote:
> Done.  Attached are the two remaining patches, rebased over HEAD, for removing
> support for OpenSSL 1.0.2 in v18. Parking them in the commitfest for now.

You have mentioned once upthread the documentation of PQinitOpenSSL():
   However, this is unnecessary when using OpenSSL
   version 1.1.0 or later, as duplicate initializations are no longer 
problematic.

With 1.0.2's removal in place, this could be simplified more and the
patch does not touch it.  This relates also to pq_init_crypto_lib,
which is gone with 0001.  Of course, it is not possible to just remove
PQinitOpenSSL() or old applications may fail to link.  Removing
pq_init_crypto_lib reduces any confusion around this part of the
initialization.

0002 looks OK here.
--
Michael


signature.asc
Description: PGP signature


Re: Partitioned tables and [un]loggedness

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 04:43:58PM -0700, David G. Johnston wrote:
> My point is that if you feel that treating logged as a copy-able property
> is OK then doing the following should also just work:
> 
> postgres=# create temp table parentt ( id integer ) partition by range (id);
> CREATE TABLE
> postgres=# create table child10t partition of parentt for values from (0)
> to (9);
> ERROR:  cannot create a permanent relation as partition of temporary
> relation "parentt"
> 
> i.e., child10t should be created as a temporary partition under parentt.

Ah, indeed, I've missed your point here.  Lifting the error and
inheriting temporary in this case would make sense.
--
Michael


signature.asc
Description: PGP signature


Re: Experiments with Postgres and SSL

2024-04-24 Thread Jacob Champion
On Wed, Apr 24, 2024 at 1:57 PM Peter Eisentraut  wrote:
> I'm concerned that there appears to be some confusion over whether ALPN
> is a performance feature or a security feature.  RFC 7301 appears to be
> pretty clear that it's for performance, not for security.

It was also designed to give benefits for more complex topologies
(proxies, cert selection, etc.), but yeah, this is a mitigation
technique that just uses what is already widely implemented.

> Looking at the ALPACA attack, I'm not convinced that it's very relevant
> for PostgreSQL.  It's basically just a case of, you connected to the
> wrong server.

I think that's an oversimplification. This prevents active MITM, where
an adversary has connected you to the wrong server.

> But I don't see how ALPN is a good defense.
>   It can help only if all other possible services other than http
> implement it and say, you're a web browser, go away.

Why? An ALPACA-aware client will fail the connection if the server
doesn't advertise the correct protocol. An ALPACA-aware server will
fail the handshake if the client doesn't advertise the correct
protocol. They protect themselves, and their peers, without needing
their peers to understand.

> And what if the
> rogue server is in fact a web server, then it doesn't help at all.

It's not a rogue server; the attack is using other friendly services
against you. If you're able to set up an attacker-controlled server,
using the same certificate as the valid server, on a host covered by
the cert, I think it's game over for many other reasons.

If you mean that you can't prevent an attacker from redirecting one
web server's traffic to another (friendly) web server that's running
on the same host, that's correct. Web admins who care would need to
implement countermeasures, like Origin header filtering or something?
I don't think we have a similar concept to that -- it'd be nice! --
but we don't need to have one in order to provide protection for the
other network protocols we exist next to.

> I
> guess there could be some common configurations where there is a web
> server, and ftp server, and some mail servers running on the same TLS
> end point.  But in how many cases is there also a PostgreSQL server
> running on the same end point?

Not only have I seen those cohosted, I've deployed such setups myself.
Isn't that basically cPanel's MO, and a standard setup for ? (It's been a while and I don't have a setup
handy to double-check, sorry; feel free to push back if they don't do
that anymore.)

A quick search for "running web server and Postgres on the same host"
seems to yield plenty of conversations. Some of those conversations
say "don't do it", but of course others do not :) Some actively
encourage it for simplicity.

> The page about ALPACA also suggests SNI
> as a mitigation, which seems more sensible, because the burden is then
> on the client to do the right thing, and not on all other servers to
> send away clients doing the wrong thing.  And of course libpq already
> supports SNI.

That mitigates a different attack. From the ALPACA site [1]:

> Implementing these [ALPN] countermeasures is effective in preventing 
> cross-protocol attacks irregardless of hostnames and ports used for 
> application servers.
> ...
> Implementing these [SNI] countermeasures is effective in preventing 
> same-protocol attacks on servers with different hostnames, as well as 
> cross-protocol attacks on servers with different hostnames even if the ALPN 
> countermeasures can not be implemented.

SNI is super useful; it's just not always enough. And a strict SNI
check would also be good to do, but it doesn't seem imperative to tie
it to this feature, since same-protocol attacks were already possible
AFAICT. It's the cross-protocol attacks that are new, made possible by
the new handshake.

> For the protocol negotiation aspect, how does this work if the wrapped
> protocol already has a version negotiation system?  For example, various
> HTTP versions are registered as separate protocols for ALPN.  What if
> ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or
> vice versa?

If a client or server incorrectly negotiates a protocol and then
starts speaking a different one, then it's just protocol-dependent
whether that works or not. HTTP/1.0 and HTTP/1.1 would still be
cross-compatible in some cases. The others, not so much.

> What is the actual mechanism where the performance benefits
> (saving round-trips) are created?

The negotiation gets done as part of the TLS handshake, which had to
be done anyway.

> I haven't caught up with HTTP 2 and
> so on, so maybe there are additional things at play there, but it is not
> fully explained in the RFCs.

Practically speaking, HTTP/2 is negotiated via ALPN in the real world,
at least last I checked. I don't think browsers ever supported the
plaintext h2c:// scheme. There's also an in-band `Upgrade: h2c` path
defined that does not use ALPN at all, but again I don't think any

Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-04-24 Thread Masahiko Sawada
On Mon, Apr 15, 2024 at 6:12 PM John Naylor  wrote:
>
> I took a look at the coverage report from [1] and it seems pretty
> good, but there are a couple more tests we could do.

Thank you for checking!

>
> - RT_KEY_GET_SHIFT is not covered for key=0:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L803
>
> That should be fairly simple to add to the tests.

There are two paths to call RT_KEY_GET_SHIFT():

1. RT_SET() -> RT_KEY_GET_SHIFT()
2. RT_SET() -> RT_EXTEND_UP() -> RT_KEY_GET_SHIFT()

In both cases, it's called when key > tree->ctl->max_val. Since the
minimum value of max_val is 255, RT_KEY_GET_SHIFT() is never called
when key=0.

>
> - Some paths for single-value leaves are not covered:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L904
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L954
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2606
>
> However, these paths do get regression test coverage on 32-bit
> machines. 64-bit builds only have leaves in the TID store, which
> doesn't (currently) delete entries, and doesn't instantiate the tree
> with the debug option.

Right.

>
> - In RT_SET "if (found)" is not covered:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L1768
>
> That's because we don't yet have code that replaces an existing value
> with a value of a different length.

Noah reported an issue around that. We should incorporate the patch
and cover this code path.

>
> - RT_FREE_RECURSE isn't well covered:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L1768
>
> The TID store test is pretty simple as far as distribution of block
> keys, and focuses more on the offset bitmaps. We could try to cover
> all branches here, but it would make the test less readable, and it's
> kind of the wrong place to do that anyway. test_radixtree.c does have
> a commented-out option to use shared memory, but that's for local
> testing and won't be reflected in the coverage report. Maybe it's
> enough.

Agreed.

>
> - RT_DELETE: "if (key > tree->ctl->max_val)" is not covered:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2644
>
> That should be easy to add.

Agreed. The patch is attached.

>
> - RT_DUMP_NODE is not covered, and never called by default anyway:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2804
>
> It seems we could just leave it alone since it's debug-only, but it's
> also a lot of lines. One idea is to use elog with DEBUG5 instead of
> commenting out the call sites, but that would cause a lot of noise.

I think we can leave it alone.

>
> - TidStoreCreate* has some memory clamps that are not covered:
>
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L179
> https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L234
>
> Maybe we could experiment with using 1MB for shared, and something
> smaller for local.

I've confirmed that the local and shared tidstore with small max sizes
such as 4kB and 1MB worked. Currently the max size is hard-coded in
test_tidstore.c but if we use work_mem as the max size, we can pass
different max sizes for local and shared in the test script.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


improve_code_coverage_radixtree.patch
Description: Binary data


Re: docs: minor typo fix for "lower(anymultirange)"

2024-04-24 Thread Michael Paquier
On Thu, Apr 25, 2024 at 09:02:34AM +0800, Richard Guo wrote:
> Good catch!  I checked the descriptions for "upper(anymultirange)",
> "lower(anyrange)" and "upper(anyrange)", and they are all correct.  We
> should fix this one.

+1.
--
Michael


signature.asc
Description: PGP signature


Re: AIX support

2024-04-24 Thread Bruce Momjian
On Sat, Apr 20, 2024 at 12:25:47PM -0400, Tom Lane wrote:
> > I can see several ways going forward:
> > 1. We revert the removal of AIX support and carry on with the status quo 
> > ante.  (The removal of AIX is a regression; it is timely and in scope 
> > now to revert the change.)
> > 2. Like (1), but we consider that notice has been given, and we will 
> > remove it early in PG18 (like August) unless the situation improves.
> > 3. We leave it out of PG17 and consider a new AIX port for PG18 on its 
> > own merits.
> 
> Andres has ably summarized the reasons why the status quo ante was
> getting untenable.  The direct-I/O problem could have been tolerable
> on its own, but in reality it was the straw that broke the camel's
> back so far as our willingness to maintain AIX support went.  There
> were just too many hacks and workarounds for too many problems,
> with too few people interested in looking for better answers.
> 
> So I'm totally not in favor of #1, at least not without some hard
> commitments and follow-through on really cleaning up the mess
> (which maybe looks more like your #2).  What's needed here, as
> you said, is for someone with a decent amount of expertise in
> modern AIX to review all the issues.  Maybe framing that as a
> "new port" per #3 would be a good way to think about it.  But
> I don't want to just revert the AIX-ectomy and continue drifting.
> 
> On the whole, it wouldn't be the worst thing in the world if PG 17
> lacks AIX support but that comes back in PG 18.  That approach would
> solve the schedule-crunch aspect and give time for considered review
> of how many of the hacks removed in 0b16bb877 really need to be put
> back, versus being obsolete or amenable to a nicer solution in
> late-model AIX.  If we take a "new port" mindset then it would be
> totally reasonable to say that it only supports very recent AIX
> releases, so I'd hope at least some of the cruft could be removed.

I agree that targeting PG 18 for a new-er AIX port is the reasonable
approach.  If there is huge demand, someone can create an AIX fork for
PG 17 using the reverted patches --- yeah, lots of pain there, but we
have carried the AIX pain for too long with too little support.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: AIX support

2024-04-24 Thread Tom Lane
Michael Paquier  writes:
> Some of the portability changes removed in 0b16bb877 feel indeed
> obsolete, so it may not hurt to start an analysis from scratch to see
> the minimum amount of work that would be really required with the
> latest versions of xlc, using the newest compilers as a supported
> base.

Something I've been mulling over is whether to suggest that the
proposed "new port" should only target building with gcc.

On the one hand, that would (I think) remove a number of annoying
issues, and the average end user is unlikely to care which compiler
their database server was built with.  On the other hand, I'm a strong
proponent of avoiding software monocultures, and xlc is one of the few
C compilers still standing that aren't gcc or clang.

It would definitely make sense for a new port to start by getting
things going with gcc only, and then look at resurrecting xlc
support.

> I'd like to think backporting these to stable branches should
> be OK at some point, once the new port is proving baked enough.

If things go as I expect, the "new port" would effectively drop
support for older AIX and/or older compiler versions.  So back-
porting seems like an unlikely decision.

> Anyway, getting access to such compilers to be able to debug issues
> on hosts that take less than 12h to just compile the code would
> certainly help its adoption.

+many

regards, tom lane




Why don't we support external input/output functions for the composite types

2024-04-24 Thread Dilip Kumar
Hi,

I'm curious about composite types in PostgreSQL. By default, when we
create a composite type, it utilizes the "record_in" and "record_out"
functions for input/output. Do you think it would be beneficial to
expand the syntax to allow users to specify custom input/output
functions when creating composite types? Has anyone attempted this
before, and are there any design challenges associated with it? Or is
it simply not implemented because it's not seen as a valuable
addition?

I believe it would be beneficial because users creating a new type
might prefer to define specific input/output syntax rather than
conforming to what is accepted by the RECORD type.
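
For reference, the default behaviour in question, i.e. the syntax that
record_in/record_out accept and produce today (a minimal example):

CREATE TYPE complex AS (r float8, i float8);
SELECT '(1.5,2.5)'::complex;           -- parsed by record_in
SELECT ROW(1.5, 2.5)::complex::text;   -- record_out prints "(1.5,2.5)"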

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Why don't we support external input/output functions for the composite types

2024-04-24 Thread Dilip Kumar
On Thu, Apr 25, 2024 at 10:14 AM Tom Lane  wrote:
>
> Dilip Kumar  writes:
> > I'm curious about composite types in PostgreSQL. By default, when we
> > create a composite type, it utilizes the "record_in" and "record_out"
> > functions for input/output. Do you think it would be beneficial to
> > expand the syntax to allow users to specify custom input/output
> > functions when creating composite types?
>
> No.
>
> > I believe it would be beneficial because users creating a new type
> > might prefer to define specific input/output syntax rather than
> > conforming to what is accepted by the RECORD type.
>

Thanks for the quick response, Tom.

> The primary outcome would be to require a huge amount of new work
> to be done by a lot of software, much of it not under our control.

Yeah, I agree with that.

> And the impact wouldn't only be to software that would prefer not
> to know about this.  For example, how likely do you think it is
> that these hypothetical user-defined I/O functions would cope
> well with ALTER TABLE/ALTER TYPE commands that change those
> rowtypes?

That's a good point. I was primarily focused on altering the
representation of input and output values, rather than considering
changes to internal storage. However, offering this feature could
indeed allow users to influence how values are stored.  And that can
potentially affect ALTER TYPE because then we do not have control over
how those values are stored internally.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: AIX support

2024-04-24 Thread Michael Paquier
On Wed, Apr 24, 2024 at 11:39:37PM -0400, Bruce Momjian wrote:
> On Sat, Apr 20, 2024 at 12:25:47PM -0400, Tom Lane wrote:
>> So I'm totally not in favor of #1, at least not without some hard
>> commitments and follow-through on really cleaning up the mess
>> (which maybe looks more like your #2).  What's needed here, as
>> you said, is for someone with a decent amount of expertise in
>> modern AIX to review all the issues.  Maybe framing that as a
>> "new port" per #3 would be a good way to think about it.  But
>> I don't want to just revert the AIX-ectomy and continue drifting.
>> 
>> On the whole, it wouldn't be the worst thing in the world if PG 17
>> lacks AIX support but that comes back in PG 18.  That approach would
>> solve the schedule-crunch aspect and give time for considered review
>> of how many of the hacks removed in 0b16bb877 really need to be put
>> back, versus being obsolete or amenable to a nicer solution in
>> late-model AIX.  If we take a "new port" mindset then it would be
>> totally reasonable to say that it only supports very recent AIX
>> releases, so I'd hope at least some of the cruft could be removed.
> 
> I agree that targeting PG 18 for a new-er AIX port is the reasonable
> approach.  If there is huge demand, someone can create an AIX fork for
> PG 17 using the reverted patches --- yeah, lots of pain there, but we
> have carried the AIX pain for too long with too little support.

Some of the portability changes removed in 0b16bb877 feel indeed
obsolete, so it may not hurt to start an analysis from scratch to see
the minimum amount of work that would be really required with the
latest versions of xlc, using the newest compilers as a supported
base.  I'd like to think backporting these to stable branches should
be OK at some point, once the new port is proving baked enough.

Anyway, getting access to such compilers to be able to debug issues
on hosts that take less than 12h to just compile the code would
certainly help its adoption.  So seeing commitment in the form of
patches and access to environments would help a lot.  Overall,
studying that afresh with v18 looks like a good idea, assuming that
anybody who commits such patches has access to hosts to evaluate them,
with buildfarm members running on top, of course.

My 2c.
--
Michael


signature.asc
Description: PGP signature


Re: AIX support

2024-04-24 Thread Michael Paquier
On Thu, Apr 25, 2024 at 12:20:05AM -0400, Tom Lane wrote:
> It would definitely make sense for a new port to start by getting
> things going with gcc only, and then look at resurrecting xlc
> support.

Sriram mentioned upthread that he was looking at both of them.  I'd be
ready to assume that most of the interest is in xlc, not gcc.  But I
may be wrong.

Saying that, dividing the effort into more successive steps is
sensible here (didn't consider that previously, you have a good
point).
--
Michael


signature.asc
Description: PGP signature


Re: Why don't we support external input/output functions for the composite types

2024-04-24 Thread Tom Lane
Dilip Kumar  writes:
> I'm curious about composite types in PostgreSQL. By default, when we
> create a composite type, it utilizes the "record_in" and "record_out"
> functions for input/output. Do you think it would be beneficial to
> expand the syntax to allow users to specify custom input/output
> functions when creating composite types?

No.

> I believe it would be beneficial because users creating a new type
> might prefer to define specific input/output syntax rather than
> conforming to what is accepted by the RECORD type.

The primary outcome would be to require a huge amount of new work
to be done by a lot of software, much of it not under our control.
And the impact wouldn't only be to software that would prefer not
to know about this.  For example, how likely do you think it is
that these hypothetical user-defined I/O functions would cope
well with ALTER TABLE/ALTER TYPE commands that change those
rowtypes?

regards, tom lane




Re: Is it acceptable making COPY format extendable?

2024-04-24 Thread Sutou Kouhei
Hi,

Thanks for replying this.

In <02cccd36-3083-4a50-aae4-f957e96fb...@eisentraut.org>
  "Re: Is it acceptable making COPY format extendable?" on Wed, 24 Apr 2024 
09:57:38 +0200,
  Peter Eisentraut  wrote:

>> I'm proposing a patch that makes COPY format extendable:
>> https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
>> https://commitfest.postgresql.org/48/4681/
>>
>> But my proposal stalled. Is the feature request of making COPY format
>> extendable acceptable? If it's not acceptable, what are the blockers?
>> Performance?
> 
> I think that thread is getting an adequate amount of attention.  Of
> course, you can always wish for more, but "stalled" looks very
> different.

Sorry for "stalled" misuse.

I haven't got any reply since 2024-03-15:
https://www.postgresql.org/message-id/flat/20240315.173754.2049843193122381085.kou%40clear-code.com#07aefc636d8165204ddfba971dc9a490
(I sent some pings.)

So I called this status as "stalled".

I'm not familiar with PostgreSQL's development
style. What should I do in this case? Should I just wait
for a reply from others without doing anything?

> Let's not start a new thread here to discuss the merits of the other
> thread; that discussion belongs there.

I wanted to discuss the possibility of this feature request in
this thread, and to use my existing thread for how
to implement this feature request.

Should I reuse
https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
for it instead of this thread?


Thanks,
-- 
kou




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-04-24 Thread Bharath Rupireddy
On Mon, Apr 22, 2024 at 7:21 PM Masahiko Sawada  wrote:
>
> > Please find the attached v35 patch.
>
> The documentation says about both 'active' and 'inactive_since'
> columns of pg_replication_slots say:
>
> ---
> active bool
> True if this slot is currently actively being used
>
> inactive_since timestamptz
> The time since the slot has become inactive. NULL if the slot is
> currently being used. Note that for slots on the standby that are
> being synced from a primary server (whose synced field is true), the
> inactive_since indicates the last synchronization (see Section 47.2.3)
> time.
> ---
>
> When reading the description I thought if 'active' is true,
> 'inactive_since' is NULL, but it doesn't seem to apply for temporary
> slots.

Right.

> Since we don't reset the active_pid field of temporary slots
> when they are released, the 'active' is still true in the view but
> 'inactive_since' is not NULL.

Right. inactive_since is reset whenever the temporary slot is acquired
again within the same backend that created the temporary slot.

> Do you think we need to mention it in
> the documentation?

I think that's the reason we dropped "active" from the statement. It
was earlier "NULL if the slot is currently actively being used.". But,
per Bertrand's comment
https://www.postgresql.org/message-id/ZehE2IJcsetSJMHC%40ip-10-97-1-34.eu-west-3.compute.internal
changed it to ""NULL if the slot is currently being used.".

Temporary slots retain active = true and a non-NULL active_pid even when
the slot is not being used, for the lifetime of the backend process. We
haven't tied the active or active_pid flags to inactive_since; doing so
now to represent the temporary slot behaviour for active and active_pid
would confuse users more. As far as the inactive_since of a slot is
concerned, it is set
to 0 when the slot is being used (acquired) and set to current
timestamp when the slot is not being used (released).
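
For instance, this can be observed directly from the view (illustrative
query; the values of course depend on the slots in the instance):

SELECT slot_name, temporary, active, active_pid, inactive_since
FROM pg_replication_slots;
-- For an unused temporary slot: active = t and active_pid stay set, while
-- inactive_since is non-NULL until the slot is acquired again.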

> As for the timeout-based slot invalidation feature, we could end up
> invalidating the temporary slots even if they are shown as active,
> which could confuse users. Do we want to somehow deal with it?

Yes. As long as the temporary slot is lying unused, holding up
resources for more than the specified
replication_slot_inactive_timeout, it is bound to get invalidated.
This keeps the behaviour consistent and less confusing to users.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-04-24 Thread Masahiko Sawada
On Thu, Apr 25, 2024 at 12:17 PM John Naylor  wrote:
>
> On Thu, Apr 25, 2024 at 9:50 AM Masahiko Sawada  wrote:
> >
> > > I saw a SIGSEGV there when using tidstore to write a fix for something 
> > > else.
> > > Patch attached.
> >
> > Great find, thank you for the patch!
>
> +1
>
> (This occurred to me a few days ago, but I was far from my computer.)
>
> With the purge function that  Noah proposed, I believe we can also get
> rid of the comment at the top of the .sql test file warning of a
> maintenance hazard:
> ..."To avoid adding duplicates,
> -- each call to do_set_block_offsets() should use different block
> -- numbers."

Good point. Removed.

>
> > of do_gset_block_offset() and check_set_block_offsets(). If these are
> > annoying, we can remove the cases of array[1] and array[1,2].
>
> Let's keep those -- 32-bit platforms should also exercise this path.

Agreed.

I've attached a new patch. I'll push it tonight, if there is no further comment.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v2-0001-radixtree-Fix-SIGSEGV-at-update-of-embeddable-val.patch
Description: Binary data


Re: Avoid orphaned objects dependencies, take 3

2024-04-24 Thread Bertrand Drouvot
Hi,

On Wed, Apr 24, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote:
> 24.04.2024 11:38, Bertrand Drouvot wrote:
> > I gave more thought to v2 and the approach seems reasonable to me. 
> > Basically what
> > it does is that in case the object is already dropped before we take the 
> > new lock
> > (while recording the dependency) then the error message is a "generic" one 
> > (means
> > it does not provide the description of the "already" dropped object). I 
> > think it
> > makes sense to write the patch that way by abandoning the patch's ambition 
> > to
> > tell the description of the dropped object in all the cases.
> > 
> > Of course, I would be happy to hear others' thoughts about it.
> > 
> > Please find v3 attached (which is v2 with more comments).
> 
> Thank you for the improved version!
> 
> I can confirm that it fixes that case.

Great, thanks for the test!

> I've also tested other cases that fail on master (most of them also fail
> with v1), please try/look at the attached script.

Thanks for all those tests!

> (There could be other broken-dependency cases, of course, but I think I've
> covered all the representative ones.)

Agree. Furthermore, the way the patch is written should be agnostic to
the kinds of objects that are part of the dependency. Having said that,
it does not hurt to add more tests in this patch, so the attached v4
adds some of your tests (ones that would fail on the master branch
without the patch applied).

The way the tests are written in the patch is less "racy" than when
triggered with your script. Indeed, I think that in the patch the
dependent object cannot be removed before the new lock is taken while
recording the dependency. We may want to bring injection points into
the game if we feel the need.

> All tested cases work correctly with v3 applied —

Yeah, same on my side, I did run them too and did not find any issues.

> I couldn't get broken
> dependencies,

Same here.

> though concurrent create/drop operations can still produce
> the "cache lookup failed" error, which is probably okay, except that it is
> an INTERNAL_ERROR, which is assumed to be not easily triggered by users.

I did not see any of those "cache lookup failed" during my testing with/without
your script. During which test(s) did you see those with v3 applied?

Attached v4, simply adding more tests to v3.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From a9b34955fab0351b7b5a7ba6eb36f199f5a5822c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v4] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existing schema.
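
For concreteness, scenario 1 plays out like this in psql (a sketch;
the object names are illustrative only):

-- setup
CREATE SCHEMA schem;

-- session 1
BEGIN;
DROP SCHEMA schem;              -- not yet committed

-- session 2 (does not block on master)
CREATE FUNCTION schem.f() RETURNS int LANGUAGE sql AS 'SELECT 1';

-- session 1
COMMIT;                         -- schem is gone, schem.f() is now orphaned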

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after the new lock attempt, the patch checks that the object still exists:
with this in place, session 2 in scenario 1 would be locked and would report an
error once session 1 commits (that would not be the case should session 1 abort
the transaction).

If the object is dropped before the new lock attempt is triggered, then the patch
would also report an error (but with fewer details).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper
---
 src/backend/catalog/dependency.c  |  54 +
 src/backend/catalog/objectaddress.c   |  70 +++
 src/backend/catalog/pg_depend.c   |   6 +
 src/include/catalog/dependency.h  |   1 +
 src/include/catalog/objectaddress.h   |   1 +
 src/test/modules/Makefile |   1 +
 src/test/modules/meson.build  |   1 +
 .../test_dependencies_locks/.gitignore|   3 +
 .../modules/test_dependencies_locks/Makefile  |  14 +++
 .../expected/test_dependencies_locks.out  | 113 ++
 .../test_dependencies_locks/meson.build   |  12 ++
 .../specs/test_dependencies_locks.spec|  78 
 12 files changed, 354 insertions(+)
  24.6% src/backend/catalog/
  42.7% src/test/modules/test_dependencies_locks/expected/
  25.9% src/test/modules/test_dependencies_locks/specs/
   5.0% 

Re: cataloguing NOT NULL constraints

2024-04-24 Thread Alexander Lakhin

24.04.2024 20:36, Alvaro Herrera wrote:

So I added a restriction that we only accept such a change when
recursively adding a constraint, or during binary upgrade.  This should
limit the damage: you're no longer able to change an existing constraint
from NO INHERIT to YES INHERIT merely by doing another ALTER TABLE ADD
CONSTRAINT.

One thing that has me a little nervous about this whole business is
whether we're set up to error out where some child table down the
hierarchy has nulls, and we add a not-null constraint to it but fail to
do a verification scan.  I tried a couple of cases and AFAICS it works
correctly, but maybe there are other cases I haven't thought about where
it doesn't.



Thank you for the fix!

While studying the NO INHERIT option, I've noticed that the documentation
probably misses its specification for NOT NULL:
https://www.postgresql.org/docs/devel/sql-createtable.html

where column_constraint is:
...
[ CONSTRAINT constraint_name ]
{ NOT NULL |
  NULL |
  CHECK ( expression ) [ NO INHERIT ] |
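
Presumably the synopsis should show the option for NOT NULL as well,
something like (a sketch only):
{ NOT NULL [ NO INHERIT ] |
  NULL |
  CHECK ( expression ) [ NO INHERIT ] |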

Also, I've found a weird behaviour with a non-inherited NOT NULL
constraint for a partitioned table:
CREATE TABLE pt(a int NOT NULL NO INHERIT) PARTITION BY LIST (a);
CREATE TABLE dp(a int NOT NULL);
ALTER TABLE pt ATTACH PARTITION dp DEFAULT;
ALTER TABLE pt DETACH PARTITION dp;
fails with:
ERROR:  relation 16389 has non-inherited constraint "dp_a_not_null"

Though with an analogous check constraint, I get:
CREATE TABLE pt(a int, CONSTRAINT nna CHECK (a IS NOT NULL) NO INHERIT) 
PARTITION BY LIST (a);
ERROR:  cannot add NO INHERIT constraint to partitioned table "pt"

Best regards,
Alexander




Re: Avoid orphaned objects dependencies, take 3

2024-04-24 Thread Bertrand Drouvot
Hi,

On Tue, Apr 23, 2024 at 04:20:46PM +, Bertrand Drouvot wrote:
> Please find attached v2 that should not produce the issue anymore (I launched 
> a
> lot of attempts without any issues). v1 was not strong enough as it was not
> always checking for the dependent object existence. v2 now always checks if 
> the
> object still exists after the additional lock acquisition attempt while 
> recording
> the dependency.
> 
> I still need to think about v2 but in the meantime could you please also give
> v2 a try on you side?

I gave more thought to v2 and the approach seems reasonable to me. Basically,
what it does is that in case the object is already dropped before we take the
new lock (while recording the dependency), the error message is a "generic" one
(meaning it does not provide the description of the "already" dropped object).
I think it makes sense to write the patch that way, abandoning the patch's
ambition to report the description of the dropped object in all cases.

Of course, I would be happy to hear others' thoughts about it.

Please find v3 attached (which is v2 with more comments).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 32945912ceddad6b171fd8b345adefa4432af357 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 29 Mar 2024 15:43:26 +
Subject: [PATCH v3] Avoid orphaned objects dependencies

It's currently possible to create orphaned objects dependencies, for example:

Scenario 1:

session 1: begin; drop schema schem;
session 2: create a function in the schema schem
session 1: commit;

With the above, the function created in session 2 would be linked to a
non-existing schema.

Scenario 2:

session 1: begin; create a function in the schema schem
session 2: drop schema schem;
session 1: commit;

With the above, the function created in session 1 would be linked to a
non-existing schema.

To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP)
has been put in place when the dependencies are being recorded. With this in
place, the drop schema in scenario 2 would be locked.

Also, after locking the object, the patch checks that the object still exists:
with this in place, session 2 in scenario 1 would be locked and would report an
error once session 1 commits (that would not be the case should session 1 abort
the transaction).

The patch adds a few tests for some dependency cases (that would currently produce
orphaned objects):

- schema and function (as the above scenarios)
- function and type
- table and type
---
 src/backend/catalog/dependency.c  | 54 ++
 src/backend/catalog/objectaddress.c   | 70 +++
 src/backend/catalog/pg_depend.c   |  6 ++
 src/include/catalog/dependency.h  |  1 +
 src/include/catalog/objectaddress.h   |  1 +
 src/test/modules/Makefile |  1 +
 src/test/modules/meson.build  |  1 +
 .../test_dependencies_locks/.gitignore|  3 +
 .../modules/test_dependencies_locks/Makefile  | 14 
 .../expected/test_dependencies_locks.out  | 49 +
 .../test_dependencies_locks/meson.build   | 12 
 .../specs/test_dependencies_locks.spec| 39 +++
 12 files changed, 251 insertions(+)
  40.5% src/backend/catalog/
  29.6% src/test/modules/test_dependencies_locks/expected/
  18.7% src/test/modules/test_dependencies_locks/specs/
   8.3% src/test/modules/test_dependencies_locks/

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index d4b5b2ade1..a49357bbe2 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1517,6 +1517,60 @@ AcquireDeletionLock(const ObjectAddress *object, int flags)
 	}
 }
 
+/*
+ * depLockAndCheckObject
+ *
+ * Lock the object that we are about to record a dependency on.
+ * After it's locked, verify that it hasn't been dropped while we
+ * weren't looking.  If the object has been dropped, this function
+ * does not return!
+ */
+void
+depLockAndCheckObject(const ObjectAddress *object)
+{
+	char	   *object_description;
+
+	/*
+	 * Those don't rely on LockDatabaseObject() when being dropped (see
+	 * AcquireDeletionLock()). Also it looks like they cannot produce
+	 * orphaned dependent objects when being dropped.
+	 */
+	if (object->classId == RelationRelationId || object->classId == AuthMemRelationId)
+		return;
+
+	object_description = getObjectDescription(object, true);
+
+	/* assume we should lock the whole object not a sub-object */
+	LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock);
+
+	/* check if object still exists */
+	if (!ObjectByIdExist(object, false))
+	{
+		/*
+		 * It might be possible that we are creating it (for example creating
+		 * a composite type while creating a relation), so bypass the syscache
+		 * lookup and use a dirty snapshot instead to cover this scenario.
+		 

Re: Show WAL write and fsync stats in pg_stat_io

2024-04-24 Thread Nazir Bilal Yavuz
Hi,

On Fri, 19 Apr 2024 at 11:01, Nazir Bilal Yavuz  wrote:
> > On Thu, 18 Jan 2024 at 04:22, Michael Paquier  wrote:
> > >
> > > On Wed, Jan 17, 2024 at 03:20:39PM +0300, Nazir Bilal Yavuz wrote:
> > > > I agree with your points. While the other I/O related work is
> > > > happening we can discuss what we should do in the variable op_bytes
> > > > cases. Also, this is happening only for WAL right now but if we try to
> > > > extend pg_stat_io in the future, that problem possibly will rise
> > > > again. So, it could be good to come up with a general solution, not
> > > > only for WAL.
> > >
> > > Okay, I've marked the patch as RwF for this CF.

Since the last commitfest entry was returned with feedback, I created
a new commitfest entry: https://commitfest.postgresql.org/48/4950/

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: WIP Incremental JSON Parser

2024-04-24 Thread Michael Paquier
On Fri, Apr 19, 2024 at 02:04:40PM -0400, Robert Haas wrote:
> Yeah, I think this patch invented a new solution to a problem that
> we've solved in a different way everywhere else. I think we should
> change it to match what we do in general.

As of ba3e6e2bca97, did you notice that test_json_parser_perf
generates two core files because progname is not set, failing an
assertion when using the frontend logging?
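
(For reference, the usual way frontend programs avoid that assertion
is to initialize the logging subsystem before anything can log; a
sketch only, not necessarily the right fix for this test driver:)

#include "postgres_fe.h"
#include "common/logging.h"

int
main(int argc, char **argv)
{
	/* sets progname, which the frontend logging assertion checks */
	pg_logging_init(argv[0]);

	/* ... rest of the perf test driver ... */
	return 0;
}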
--
Michael


signature.asc
Description: PGP signature


  1   2   >