Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-20 Thread John Naylor
On Mon, Mar 20, 2023 at 9:34 PM Masahiko Sawada 
wrote:
>
> On Mon, Mar 20, 2023 at 9:34 PM John Naylor
>  wrote:
> > That's an interesting idea, and the analogous behavior to aset could be
> > a good thing for readability and maintainability. Worth seeing if it's
> > workable.
>
> I've attached a quick hack patch. It can be applied on top of v32
> patches. The changes to dsa.c are straightforward since it makes the
> initial and max block sizes configurable.

Good to hear -- this should probably be proposed in a separate thread for
wider visibility.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Eliminating SPI from RI triggers - take 2

2023-03-20 Thread Amit Langote
Hi Greg,

On Tue, Mar 21, 2023 at 3:54 AM Gregory Stark (as CFM)
 wrote:
> On Mon, 17 Oct 2022 at 14:59, Robert Haas  wrote:
> > But I think the bigger problem for this patch set is that the
> > design-level feedback from
> > https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> > hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> > is still trivial in v7, and that still seems wrong to me. And I still
> > don't know how we're going to avoid changing the semantics in ways
> > that are undesirable, or even knowing precisely what we did change. If
> > we don't have answers to those questions, then I suspect that this
> > patch set isn't going anywhere.
>
> Amit, do you plan to work on this patch for this commitfest (and
> therefore this release?). And do you think it has a realistic chance
> of being ready for commit this month?

Unfortunately, I don't think so.

> It looks to me like you have some good feedback and can progress and
> are unlikely to finish this patch for this release. In which case
> maybe we can move it forward to the next release?

Yes, that's what I am thinking too at this point.

I agree with Robert's point that changing the implementation from an
SQL query plan to a hand-rolled C function is going to change the
semantics in some known and perhaps many unknown ways.  Until I have
enumerated all those semantic changes, it's hard to judge whether the
hand-rolled implementation is correct to begin with.  I had started
doing that a few months back but couldn't keep up due to some other
work.

An example I had found of a thing that would be broken by taking out
the executor out of the equation, as the patch does, is the behavior
of an update under READ COMMITTED isolation, whereby a PK tuple being
checked for existence is concurrently updated and thus needs to
rechecked whether it still satisfies the RI query's conditions.  The
executor has the EvalPlanQual() mechanism to do that, but while the
hand-rolled implementation did refactor ExecLockRows() to allow doing
the tuple-locking without a PlanState, it gave no consideration to
handling rechecking under READ COMMITTED isolation.
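
To make that concrete, here is a minimal sketch of the kind of
interaction at issue (table names are made up):

CREATE TABLE pk (a int PRIMARY KEY);
CREATE TABLE fk (a int REFERENCES pk);
INSERT INTO pk VALUES (1);

-- Session 1 (READ COMMITTED): the RI trigger locks the PK row,
-- roughly SELECT 1 FROM pk WHERE a = 1 FOR KEY SHARE;
INSERT INTO fk VALUES (1);

-- Session 2, concurrently, updates the row session 1 is checking:
UPDATE pk SET a = 2 WHERE a = 1;

-- Once session 2 commits, session 1 must recheck the updated tuple
-- version (EvalPlanQual) to decide whether a = 1 still exists; taking
-- the executor out of the picture loses that recheck.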

There may be other such things and I think I'd better look for them
carefully in the next cycle than in the next couple of weeks for this
release.  My apologies that I didn't withdraw the patch sooner.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: psql: Add role's membership options to the \du+ command

2023-03-20 Thread Pavel Luzanov

I would suggest tweaking the test output to include regress_du_admin ...


I found (with the help of cfbot) a difficulty with this. The problem is
the bootstrap superuser name (oid = 10).
This name depends on the OS username: in my case it's pal, but in most
cases it's postgres or something else.
And the output of \du regress_du_admin can't be predicted:

\du regress_du_admin
                                List of roles
    Role name     | Attributes  |              Member of
------------------+-------------+--------------------------------------
 regress_du_admin | Create role | regress_du_role0 from pal (a, i, s)+
                  |             | regress_du_role1 from pal (a, i, s)+
                  |             | regress_du_role2 from pal (a, i, s)
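
(The grantor shown is simply whatever the bootstrap superuser is named,
which can be checked directly:)

SELECT rolname FROM pg_roles WHERE oid = 10;
-- 'pal' here; 'postgres' on most installations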


So, I decided not to include regress_du_admin in the test output.

Please, see version 5 attached. Only tests changed.

-
Pavel Luzanov
From b8f35733126a843edd47a1f89da0d9f8babeec93 Mon Sep 17 00:00:00 2001
From: Pavel Luzanov 
Date: Tue, 21 Mar 2023 05:58:31 +0300
Subject: [PATCH v5] psql: show membership options in the \du command

Format for the "Member of" column completely redesigned.
Shown within each row, in newline-separated format, are the memberships granted to
the role.  The presentation includes both the name of the grantor
as well as the membership permissions (in an abbreviated format:
a for admin option, i for inherit option,
s for set option.) The word 'empty' is printed in
the case that none of those permissions are granted.
---
 doc/src/sgml/ref/psql-ref.sgml | 30 ---
 src/bin/psql/describe.c| 61 --
 src/test/regress/expected/psql.out | 49 
 src/test/regress/sql/psql.sql  | 30 +++
 4 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..03e7da93de 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1727,9 +1727,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \dg+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role.  The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \dg+ is used the comment attached to the role is shown.
 
 
   
@@ -1969,9 +1978,18 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
 S modifier to include system roles.
 If pattern is specified,
 only those roles whose names match the pattern are listed.
-If the form \du+ is used, additional information
-is shown about each role; currently this adds the comment for each
-role.
+
+
+Shown within each row, in newline-separated format, are the memberships granted to
+the role.  The presentation includes both the name of the grantor
+as well as the membership permissions (in an abbreviated format:
+a for admin option, i for inherit option,
+s for set option.) The word empty is printed in
+the case that none of those permissions are granted.
+See the GRANT command for their meaning.
+
+
+If the form \du+ is used the comment attached to the role is shown.
 
 
   
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 99e28f607e..9f7b7326e9 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3631,24 +3631,42 @@ describeRoles(const char *pattern, bool verbose, bool showSystem)
 	printfPQExpBuffer(&buf,
 	  "SELECT r.rolname, r.rolsuper, r.rolinherit,\n"
 	  "  r.rolcreaterole, r.rolcreatedb, r.rolcanlogin,\n"
-	  "  r.rolconnlimit, r.rolvaliduntil,\n"
-	  "  ARRAY(SELECT b.rolname\n"
-	  "FROM pg_catalog.pg_auth_members m\n"
-	  "JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid)\n"
-	  "WHERE m.member = r.oid) as memberof");
+	  "  r.rolconnlimit, r.rolvaliduntil, r.rolreplication,\n");
 
-	if (verbose)
-	{
-		appendPQExpBufferStr(&buf, "\n, pg_catalog.shobj_description(r.oid, 'pg_authid') AS description");
-		ncols++;
-	}
-	appendPQExpBufferStr(&buf, "\n, r.rolreplication");
+	if (pset.sversion >= 16)
+		appendPQExpBufferStr(&buf,
+			 "  (SELECT 

Re: Track IO times in pg_stat_io

2023-03-20 Thread Andres Freund
Hi,

On 2023-03-16 17:19:16 -0400, Melanie Plageman wrote:
> > > > I wonder if we should get rid of pgStatBlockReadTime, 
> > > > pgStatBlockWriteTime,
> > >
> > > And then have pg_stat_reset_shared('io') reset pg_stat_database IO
> > > stats?
> >
> > Yes.
> 
> I think this makes sense but I am hesitant to do it in this patchset,
> because it feels a bit hidden...maybe?

I'd not do it in the same commit, but I don't see a problem with doing it in
the same patchset.

Now that I think about it again, this wouldn't make pg_stat_reset_shared('io')
affect pg_stat_database - I was thinking we should use pgstat_io.c stats to
provide the information for pgstat_database.c, using its own pending counter.


> > No, I don't think I am suggesting that. What I am trying to suggest is that
> > PendingIOStats should contain instr_time, but that PgStat_IO should contain
> > PgStat_Counter as microseconds, as before.
> 
> So, I've modified the code to make a union of instr_time and
> PgStat_Counter in PgStat_BktypeIO. I am not quite sure if this is okay.
> I store in microsec and then in pg_stat_io, I multiply to get
> milliseconds for display.

Not a fan - what do we gain by having this union? It seems considerably
cleaner to have a struct local to pgstat_io.c that uses instr_time and have a
clean type in PgStat_BktypeIO.  In fact, the code worked after just changing
that.

I don't think it makes sense to have pgstat_io_start()/end() as well as
pgstat_count_io*. For one, the name seems in a "too general namespace" - why
not a pgstat_count*?


> I considered refactoring pgstat_io_end() to use INSTR_TIME_ACCUM_DIFF()
> like [1], but, in the end I actually think I would end up with more
> operations because of the various different counters needing to be
> updated. As it is now, I do a single subtract and a few adds (one for
> each of the different statistics objects tracking IO times
> (pgBufferUsage, pgStatBlockWrite/ReadTime). Whereas, I would need to do
> an accum diff for every one of those.

Right - INSTR_TIME_ACCUM_DIFF() only makes sense if there's just a
single counter to update.


WRT:
	/* TODO: AFAICT, pgstat_count_buffer_write_time is only called */
	/* for shared buffers whereas pgstat_count_buffer_read_time is */
	/* called for temp relations and shared buffers. */
	/*
	 * is this intentional and should I match current behavior or
	 * not?
	 */

It's hard to see how that behaviour could be intentional.  Probably worth
fixing in a separate patch. I don't think we're going to backpatch, but it
would make this clearer nonetheless.


Incremental patch with some of the above changed attached.



Btw, it's quite nice how one now can attribute time more easily:

20 connections copying an 8MB file 50 times each:
SELECT reuses, evictions, writes, write_time, extends, extend_time FROM 
pg_stat_io WHERE backend_type = 'client backend' AND io_object = 'relation' AND 
io_context='bulkwrite';
┌┬───┬┬┬─┬─┐
│ reuses │ evictions │ writes │ write_time │ extends │ extend_time │
├┼───┼┼┼─┼─┤
│  36112 │ 0 │  36112 │141 │ 1523176 │8676 │
└┴───┴┴┴─┴─┘

20 connections copying an 80MB file 5 times each:
┌─┬───┬─┬┬─┬─┐
│ reuses  │ evictions │ writes  │ write_time │ extends │ extend_time │
├─┼───┼─┼┼─┼─┤
│ 1318539 │ 0 │ 1318539 │   5013 │ 1523339 │7873 │
└─┴───┴─┴┴─┴─┘
(1 row)


Greetings,

Andres
>From 5d4aa3f6c651006f1ec960f59e24ebc8b5a8ca25 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 6 Mar 2023 10:41:51 -0500
Subject: [PATCH v6 1/2] Track IO times in pg_stat_io

Add IO timing for reads, writes, extends, and fsyncs to pg_stat_io.

Reviewed-by: Bertrand Drouvot 
Reviewed-by: Andres Freund 
Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com
---
 src/include/catalog/pg_proc.dat|   6 +-
 src/include/pgstat.h   |  14 +++-
 src/backend/catalog/system_views.sql   |   4 +
 src/backend/storage/buffer/bufmgr.c|  56 ++
 src/backend/storage/buffer/localbuf.c  |   6 +-
 src/backend/storage/smgr/md.c  |  27 ---
 src/backend/utils/activity/pgstat.c|  77 ++-
 src/backend/utils/activity/pgstat_io.c | 101 -
 src/backend/utils/adt/pgstatfuncs.c|  48 ++--
 doc/src/sgml/monitoring.sgml   |  59 +++
 src/test/regress/expected/rules.out|   6 

Re: Assertion failure with barriers in parallel hash join

2023-03-20 Thread Thomas Munro
Pushed and back-patched, with minor comment tweaks.  Apologies for
taking so long.




Re: Initial Schema Sync for Logical Replication

2023-03-20 Thread Euler Taveira
On Mon, Mar 20, 2023, at 10:10 PM, Kumar, Sachin wrote:
> > From: Alvaro Herrera 
> > Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> > On 2023-Mar-15, Kumar, Sachin wrote:
> > 
> > > 1. In  CreateSubscription()  when we create replication
> > > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> > can use this snapshot later in the pg_dump.
> > >
> > > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> > 
> > Overall I'm not on board with the idea that logical replication would 
> > depend on
> > pg_dump; that seems like it could run into all sorts of trouble (what if 
> > calling
> > external binaries requires additional security setup?  what about pg_hba
> > connection requirements? what about max_connections in tight
> > circumstances?).
> > what if calling external binaries requires additional security setup
> I am not sure what kind of security restriction would apply in this case;
> maybe the pg_dump binary could be changed?
Using pg_dump as part of this implementation is not acceptable because we
expect the backend to be decoupled from the client. Besides that, pg_dump
provides all table dependencies (such as tablespaces, privileges, security
labels, comments), and not all of those dependencies should be replicated.
You would have to exclude them, either by removing these objects from the
TOC before running pg_restore or by adding a few pg_dump options. Another
issue is related to version differences: if the publisher is on a version
ahead of the subscriber, new table syntax can easily break your logical
replication setup. IMO pg_dump doesn't seem like a good solution for
initial synchronization.

Instead, the backend should provide infrastructure to obtain the required DDL
commands for the specific (set of) tables. This can work around the issues from
the previous paragraph:

* you can selectively choose dependencies;
* don't require additional client packages;
* don't need to worry about different versions.

This infrastructure can also be useful for other use cases such as:

* client tools that provide create commands (such as psql, pgAdmin);
* other logical replication solutions;
* other logical backup solutions.
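
As a sketch of what such an interface could look like (the function name
below is purely illustrative, not an existing API):

SELECT pg_get_tabledef('public.mytable'::regclass);
-- hypothetical: returns the CREATE TABLE command for one table, letting
-- the caller decide which dependencies to include

The subscriber could then fetch these commands over its existing
replication connection, with no extra client binaries involved.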


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Allow logical replication to copy tables in binary format

2023-03-20 Thread Peter Smith
Here are my review comments for v18-0001

==
doc/src/sgml/logical-replication.sgml

1.
+   target table. However, logical replication in binary format is more
+   restrictive. See the binary option of
+   CREATE
SUBSCRIPTION
+   for details.
   

Because you've changed the linkend to be the binary option, IMO now
the link part also needs to be modified. Otherwise, this page has
multiple "CREATE SUBSCRIPTION" links which jump to different places,
which just seems wrong to me.

SUGGESTION (for the "See the" sentence)

See the binary
option of CREATE SUBSCRIPTION for details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
+ 
+  See the binary option of
+  CREATE
SUBSCRIPTION
+  for details about copying pre-existing data in binary format.
+ 

(Same as review comment #1 above)

SUGGESTION
See the binary
option of CREATE SUBSCRIPTION for details
about copying pre-existing data in binary format.


==
src/backend/replication/logical/tablesync.c

3.
+ /*
+ * Prior to v16, initial table synchronization will use text format even
+ * if the binary option is enabled for a subscription.
+ */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 16 &&
+ MySubscription->binary)
+ {
+ appendStringInfoString(, " WITH (FORMAT binary)");
+ options = lappend(options,
+   makeDefElem("format",
+   (Node *) makeString("binary"), -1));
+ }

I think there can only be 0 or 1 list element in 'options'.

So, why does the code here use lappend(options,...) instead of just
using list_make1(...)?

==
src/test/subscription/t/014_binary.pl

4.
# -
# Test mismatched column types with/without binary mode
# -

# Test syncing tables with mismatching column types
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE public.test_mismatching_types (
a bigint PRIMARY KEY
);
INSERT INTO public.test_mismatching_types (a)
VALUES (1), (2);
));

# Check the subscriber log from now on.
$offset = -s $node_subscriber->logfile;

# Ensure the subscription is enabled. disable_on_error is still true,
# so the subscription can be disabled due to missing realtion until
# the test_mismatching_types table is created.
$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE public.test_mismatching_types (
a int PRIMARY KEY
);
ALTER SUBSCRIPTION tsub ENABLE;
ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
));

~~

I found the "Ensure the subscription is enabled..." comment and the
necessity for enabling the subscription to be confusing.

Can't some complications all be eliminated just by creating the table
on the subscriber side first?

For example, I rearranged that test (above fragment) like below and it
still works OK for me:

# -
# Test mismatched column types with/without binary mode
# -

# Create the table on the subscriber side
$node_subscriber->safe_psql(
'postgres', qq(
CREATE TABLE public.test_mismatching_types (
a int PRIMARY KEY
)));

# Check the subscriber log from now on.
$offset = -s $node_subscriber->logfile;

# Test syncing tables with mismatching column types
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE public.test_mismatching_types (
a bigint PRIMARY KEY
);
INSERT INTO public.test_mismatching_types (a)
VALUES (1), (2);
));

# Refresh the publication to trigger the tablesync
$node_subscriber->safe_psql(
'postgres', qq(
ALTER SUBSCRIPTION tsub REFRESH PUBLICATION;
));

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Fix typo plgsql to plpgsql.

2023-03-20 Thread Michael Paquier
On Tue, Mar 21, 2023 at 09:04:32AM +0800, Richard Guo wrote:
> +1. I believe these are typos. And a grep search shows that all typos of
> this kind are addressed by the patch.

Yes, you are right.  The comments fixed here are related to plpgsql.
Will fix..
--
Michael


signature.asc
Description: PGP signature


RE: Initial Schema Sync for Logical Replication

2023-03-20 Thread Kumar, Sachin
Hi Alvaro,

> From: Alvaro Herrera 
> Subject: RE: [EXTERNAL]Initial Schema Sync for Logical Replication
> On 2023-Mar-15, Kumar, Sachin wrote:
> 
> > 1. In  CreateSubscription()  when we create replication
> > slot(walrcv_create_slot()), should use CRS_EXPORT_SNAPSHOT, So that we
> can use this snapshot later in the pg_dump.
> >
> > 2.  Now we can call pg_dump with above snapshot from CreateSubscription.
> 
> Overall I'm not on board with the idea that logical replication would depend 
> on
> pg_dump; that seems like it could run into all sorts of trouble (what if 
> calling
> external binaries requires additional security setup?  what about pg_hba
> connection requirements? what about max_connections in tight
> circumstances?).
> what if calling external binaries requires additional security setup
I am not sure what kind of security restriction would apply in this case;
maybe the pg_dump binary could be changed?
> what about pg_hba connection requirements?
We will use the same connection string which subscriber process uses to connect 
to
the publisher.
>what about max_connections in tight circumstances?
Right, that might be an issue, but I don't think it will be a big one. We
will create the dump of the database in the CreateSubscription() function
itself, so if we have already reached max_connections by the time pg_dump
is called, before the tableSync process even starts, tableSync wouldn't
have been successful anyway.
> It would be much better, I think, to handle this internally in the publisher 
> instead:
> similar to how DDL sync would work, except it'd somehow generate the CREATE
> statements from the existing tables instead of waiting for DDL events to 
> occur.  I
> grant that this does require writing a bunch of new code for each object 
> type, a
> lot of which would duplicate the pg_dump logic, but it would probably be a lot
> more robust.
Agree, but we might have lots of code duplication; essentially almost all
of the pg_dump code would need to be duplicated, which might cause issues
when modifying or adding new DDLs.
I am not sure, but perhaps it's possible to move the code pg_dump depends
on into the common/ folder to avoid duplication.

Regards
Sachin


Re: [BUG] pg_stat_statements and extended query protocol

2023-03-20 Thread Michael Paquier
On Mon, Mar 20, 2023 at 09:41:12PM +, Imseih (AWS), Sami wrote:
> I was thinking about this and it seems to me we can avoid
> adding new fields to Estate. I think a better place to track
> rows and calls is in the Instrumentation struct.
> 
> If this is more palatable, I can prepare the patch.

This indeed feels a bit more natural seen from here, after looking at
the code paths using an Instrumentation in the executor and explain,
for example.  At least, this stresses me much less than adding 16
bytes to EState for something restricted to the extended protocol when
it comes to monitoring capabilities.
--
Michael


signature.asc
Description: PGP signature


Re: Fix typo plgsql to plpgsql.

2023-03-20 Thread Richard Guo
On Tue, Mar 21, 2023 at 12:26 AM Zhang Mingli  wrote:

> Found several typos like plgsql, I think it should be plpgsql.
>

+1. I believe these are typos. And a grep search shows that all typos of
this kind are addressed by the patch.

Thanks
Richard


Re: Question about pull_up_sublinks_qual_recurse

2023-03-20 Thread Andy Fan
On Sat, Oct 15, 2022 at 3:27 AM Tom Lane  wrote:

> Andy Fan  writes:
> > After some more self review,  I find my proposal has the following side
> > effects.
>
> Yeah, I do not think this works at all.  The discussion of outer join
> reordering in optimizer/README says that that doesn't work, and while
> I'm too lazy to construct an example right now, I believe it's true.
>

I came to this topic again recently and have finally figured out the
reason.  It looks to me that semi join is slightly different from outer
join in this case.

The following test case can show why we have to
pull_up_sublinks_qual_recurse to either LHS or RHS rather than the
JoinExpr.

create table t1(a int, b int, c int);
create table t2(a int, b int, c int);
create table t3(a int, b int, c int);

insert into t1 select 1, 1, 2;
insert into t2 select 1, 2, 1;
insert into t2 select 1, 1, 2;
insert into t3 select 1, 1, 2;

select * from t1
where exists (select 1 from t2
-- below references to t1 and t2 at the same time
where exists (select 1 from t3
   where t1.c = t2.c and t2.b = t3.b)
and t1.a = t2.a);

which can be transformed to

SELECT * FROM t1 SEMI JOIN t2
ON t1.a = t2.a
AND exists (select 1 from t3
where t1.c = t2.c
and t2.b = t3.b)

Here the semantics of the query are: return the rows in t1 iff there is a
row in t2 that matches the whole clause (t1.a = t2.a AND EXISTS ...);

But If we transform it to

SELECT * FROM (t1 SEMI JOIN t2
ON t1.a = t2.a) SEMI JOIN t3
on t1.c = t2.c and t2.b = t3.b;

The scan on t2 would stop as soon as (t1.a = t2.a) alone matches, and the
following rows would be ignored. However, the matched row may not satisfy
the EXISTS clause! So in the above example the correct result set is 1
row, but if we pull up the sublink above the JoinExpr, no row would be
found.
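
For reference, with the data above the original query returns exactly
that one row:

 a | b | c
---+---+---
 1 | 1 | 2
(1 row)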

The attached is just a comment and a test case to help understand why we
have to do things like this.

-- 
Best Regards
Andy Fan


v1-0001-test-case-and-comments-to-help-understand-why-we-.patch
Description: Binary data


Re: Save a few bytes in pg_attribute

2023-03-20 Thread Andres Freund
Hi,

On 2023-03-20 11:00:12 +0100, Peter Eisentraut wrote:
> After the discussion in [0] ff., I was looking around in pg_attribute and
> noticed that we could possibly save 4 bytes.  We could change both
> attstattarget and attinhcount from int4 to int2, which together with some
> reordering would save 4 bytes from the fixed portion.

attndims seems like another good candidate to shrink.

Greetings,

Andres Freund




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Dave Cramer
On Mon, 20 Mar 2023 at 15:09, Jeff Davis  wrote:

> On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote:
> > Thanks for the review. I'm curious what system you are running on as
> > I don't see any of these errors.
>
> Are asserts enabled?
>
> > Well I'm guessing psql doesn't know how to read date or timestamptz
> > in binary. This is not a failing of the code.
>
> It seems strange, and potentially dangerous, to send binary data to a
> client that's not expecting it. It feels too easy to cause confusion by
> changing the GUC mid-session.
>
> Also, it seems like DISCARD ALL is not resetting it, which I think is a
> bug.
>
Thanks yes, this is a bug

>
> >
> > This is an interesting question. If the type isn't visible then it's
> > not visible to the query so
>
> I don't think that's true -- the type could be in a different schema
> from the table.


Good point. This seems to be the very difficult part.

>


> > >
> > > 5. There's a theoretical invalidation problem. It might also be a
> > > practical problem in some testing setups with long-lived
> > > connections
> > > that are recreating user-defined types.
> > >
> >
> > UDT's seem to be a problem here which candidly have very little use
> > case for binary output.
>
> I mostly agree with that, but it also might not be hard to support
> UDTs. Is there a design problem here or is it "just a matter of code"?
>
> >
> > I didn't try to solve it as Tom was OK with using a GUC. Using a
> > startup GUC is interesting,
> > but how would that work with pools where we want to reset the
> > connection when we return it and then
> > set the binary format on borrow ? By using a GUC when a client
> > borrows a connection from a pool the client
> > can reconfigure the oids it wants formatted in binary.
>
> That's a good point. How common is it to share a connection pool
> between different clients (some of which might support a binary format,
> and others which don't)? And would the connection pool put connections
> with and without the property in different pools?
>

For JAVA pools it's probably OK, but for pools like pgbouncer we have no
control of who is going to get the connection next.


>
> >
> > I really hadn't considered supporting type names. I have asked Paul
> > Ramsey  about PostGIS and he doesn't see PostGIS using this.
>
> One of the things I like about Postgres is that the features all work
> together, and that user-defined objects are generally as good as built-
> in ones. Sometimes there's a reason to make a special case (e.g. syntax
> support or something), but in this case it seems like we could support
> user-defined types just fine, right? It's also just more friendly and
> readable to use type names, especially if it's a GUC.
>
> Regards,
> Jeff Davis
>
>


Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean

2023-03-20 Thread Michael Paquier
On Mon, Mar 20, 2023 at 10:21:28AM -0400, Tom Lane wrote:
> Is similar knowledge missing in the meson build files?

src/backend/nodes/meson.build and src/include/nodes/meson.build are
the two meson files that have the knowledge about the files generated
by gen_node_support.pl, and the query jumbling files are consistent
with that since 0e681cf.  Perhaps I've missed an extra spot?
--
Michael


signature.asc
Description: PGP signature


Re: Request for comment on setting binary format output per session

2023-03-20 Thread Dave Cramer
Dave Cramer


On Mon, 20 Mar 2023 at 15:09, Tom Lane  wrote:

> Dave Cramer  writes:
> > On Mon, 20 Mar 2023 at 13:05, Jeff Davis  wrote:
> >> 2. Easy to confuse psql:
> >>
> >> CREATE TABLE a(d date, t timestamptz);
> >> SET format_binary='25,1082,1184';
> >> SELECT * FROM a;
> >> d | t
> >> ---+---
> >> ! |
> >> (1 row)
> >>
> >> Well I'm guessing psql doesn't know how to read date or timestamptz in
> >> binary. This is not a failing of the code.
>
> What it is is a strong suggestion that controlling this via a GUC is
> not a great choice.  There are many inappropriate (wrong abstraction
> level) ways to change a GUC and thereby break a client that's not
> expecting binary output.  I think Jeff's suggestion that we should
> treat this as a protocol extension might be a good idea.
>
> If I recall the protocol-extension design correctly, such a setting
> could only be set at session start, which could be annoying --- at the
> very least we'd have to tolerate entries for unrecognized data types,
> since clients couldn't be expected to have checked the list against
> the current server in advance.
>

As mentioned, for connection pools we need to be able to set these after
the session starts, so I'm not sure how useful the protocol extension
mechanism is given that it can only be used at startup.

>
> regards, tom lane
>


Re: Request for comment on setting binary format output per session

2023-03-20 Thread Dave Cramer
On Mon, 20 Mar 2023 at 19:10, Merlin Moncure  wrote:

>
>
> On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer  wrote:
>
>>
>> Dave Cramer
>>
>>
>> On Sat, 4 Mar 2023 at 19:39, Dave Cramer  wrote:
>>
>>>
>>>
>>> On Sat, 4 Mar 2023 at 19:06, Tom Lane  wrote:
>>>
 Jeff Davis  writes:
 > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
 >> Most of the clients know how to decode the builtin types. I'm not
 >> sure there is a use case for binary encode types that the clients
 >> don't have a priori knowledge of.

 > The client could, in theory, have a priori knowledge of a non-builtin
 > type.

 I don't see what's "in theory" about that.  There seems plenty of
 use for binary I/O of, say, PostGIS types.  Even for built-in types,
 do we really want to encourage people to hard-wire their OIDs into
 applications?

>>>
>>> How does a client read these? I'm pretty narrowly focussed. The JDBC API
>>> doesn't really have a way to read a non built-in type.  There is a facility
>>> to read a UDT, but the user would have to provide that transcoder. I guess
>>> I'm curious how other clients read binary UDT's ?
>>>

 I don't see a big problem with driving this off a GUC, but I think
 it should be a list of type names not OIDs.  We already have plenty
 of precedent for dealing with that sort of thing; see search_path
 for the canonical example.  IIRC, there's similar caching logic
 for temp_tablespaces.

>>>
>>> I have no issue with allowing names, OID's were compact, but we could
>>> easily support both
>>>
>>
>> Attached is a preliminary patch that takes a list of OID's. I'd like to
>> know if this is going in the right direction.
>>
>> Next step would be to deal with type names as opposed to OID's.
>> This will be a bit more challenging as type names are schema specific.
>>
>
> OIDs are a pain to deal with IMO.   They will not survive a dump style
> restore, and are hard to keep synchronized between databases...type names
> don't have this problem.   OIDs are an implementation artifact that ought
> not need any extra dependency.
>
AFAIK, OID's for built-in types don't change.
Clearly we need more thought on how to deal with UDT's

>
>
> This seems like a protocol or even a driver issue rather than a GUC issue.
> Why does the server need to care what format the client might want to
> prefer on a query by query basis?
>

Actually, this isn't on a query-by-query basis. The point of this is that
the client wants all the results for the given OIDs in binary.
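
For example, with the preliminary patch (OIDs 25, 1082 and 1184 are
text, date and timestamptz):

SET format_binary = '25,1082,1184';

after which results of those types are returned in binary for every
query on the session, not per query.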


> I just don't see it. The resultformat switch in libpq works pretty well,
> except that it's "all in" on getting data from the server, with the dead
> simple workaround of casting to text which might even be able to be managed
> from within the driver itself.
>
> merlin
>
>
>


Re: Add pg_walinspect function with block info columns

2023-03-20 Thread Peter Geoghegan
On Mon, Mar 20, 2023 at 4:51 PM Peter Geoghegan  wrote:
> The new pg_get_wal_block_info outputs columns in an order that doesn't
> seem like the most useful possible order to me. This gives us another
> reason to have separate GetWALRecordInfo and GetWALBlockInfo utility
> functions rather than sharing logic for building output tuples.

One more piece of feedback for Bharath:

I think that we should also make the description output column display
NULLs for those records that don't output any description string. This
at least includes the "FPI" record type from the "XLOG" rmgr.
Alternatively, we could find a way of making it show a description.

-- 
Peter Geoghegan




Re: Add pg_walinspect function with block info columns

2023-03-20 Thread Peter Geoghegan
On Mon, Mar 20, 2023 at 4:34 PM Peter Geoghegan  wrote:
> I agree. A little redundancy is better when the alternative is fragile
> code, and I'm pretty sure that that applies here -- there won't be
> very many duplicated lines, and the final code will be significantly
> clearer. There can be a comment about keeping GetWALRecordInfo and
> GetWALBlockInfo in sync.

The new pg_get_wal_block_info outputs columns in an order that doesn't
seem like the most useful possible order to me. This gives us another
reason to have separate GetWALRecordInfo and GetWALBlockInfo utility
functions rather than sharing logic for building output tuples.

Specifically, I think that pg_get_wal_block_info should output the
"primary key" columns first:

reltablespace, reldatabase, relfilenode, blockid, start_lsn, end_lsn

Next comes the columns that duplicate the columns output by
pg_get_wal_records_info, in the same order as they appear in
pg_get_wal_records_info. (Obviously this won't include block_ref).

-- 
Peter Geoghegan




Re: Save a few bytes in pg_attribute

2023-03-20 Thread Tom Lane
Andres Freund  writes:
> On 2023-03-20 10:37:36 -0400, Tom Lane wrote:
>> I agree that attinhcount could be narrowed, but I have some concern
>> about attstattarget.  IIRC, the limit on attstattarget was once 1000
>> and then we raised it to 1.  Is it inconceivable that we might
>> want to raise it to 10 someday?

> Hard to believe that'd happen in a minor version - and I don't think there'd
> an issue with widening it again in a major version?

True.  However, I think Tomas' idea of making these columns nullable
is even better than narrowing them.

regards, tom lane




Re: Save a few bytes in pg_attribute

2023-03-20 Thread Andres Freund
Hi,

On 2023-03-20 10:37:36 -0400, Tom Lane wrote:
> I agree that attinhcount could be narrowed, but I have some concern
> about attstattarget.  IIRC, the limit on attstattarget was once 1000
> and then we raised it to 1.  Is it inconceivable that we might
> want to raise it to 10 someday?

Hard to believe that'd happen in a minor version - and I don't think there'd
an issue with widening it again in a major version?

I doubt we'll ever go to 100k without a major redesign of stats storage/access
- the size of the stats datums would make that pretty impractical right now.

Greetings,

Andres Freund




Re: Add pg_walinspect function with block info columns

2023-03-20 Thread Peter Geoghegan
On Sun, Mar 19, 2023 at 8:21 PM Kyotaro Horiguchi
 wrote:
> The documentation has just one section titled "General Functions"
> which directly contains detailed explation of four functions, making
> it hard to get clear understanding of the available functions.  I
> considered breaking it down into a few subsections, but that wouldn't
> look great since most of them would only contain one function.
> However, I feel it would be helpful to add a list of all functions at
> the beginning of the section.

I like the idea of sections, even if there is only one function per
section in some cases.

I also think that we should add a "Tip" that advises users that they
may use an "end LSN" that is the largest possible LSN,
'FFFFFFFF/FFFFFFFF', to get information about records up until the
current LSN of the cluster (per commit 5c1b6628).

Is there a straightforward way to get a usable LSN constant for this
purpose? The simplest way I could come up with quickly is "SELECT
pg_lsn(2^64.-1)" -- which still isn't very simple. Actually, it might
be even worse than 'FFFFFFFF/FFFFFFFF', so perhaps we should just use
that in the docs' new "Tip".
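
For reference, the constant form is accepted directly:

SELECT 'FFFFFFFF/FFFFFFFF'::pg_lsn;
      pg_lsn
-------------------
 FFFFFFFF/FFFFFFFF
(1 row)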

> I agree that adding a note about the characteristics would be helpful to
> avoid the misuse of pg_get_wal_block_info(). How about something like,
> "Note that pg_get_wal_block_info() omits records that contain no
> block references."?

This should be a strict invariant. In other words, it should be part
of the documented contract of pg_get_wal_block_info and
pg_get_wal_records_info. The two functions should be defined in terms
of each other. Their relationship is important.

Users should be able to safely assume that the records that have a
NULL block_ref according to pg_get_wal_records_info are *precisely*
those records that won't have any entries within pg_get_wal_block_info
(assuming that the same LSN range is used with both functions).
pg_walinspect should explicitly promise this, and promise the
corollary condition around non-NULL block_ref records. It is a useful
promise from the point of view of users. It also makes it easier to
understand what's really going on here without any ambiguity.

I don't completely disagree with Michael about the redundancy. I just
think that it's worth it on performance grounds. We might want to say
that directly in the docs, too.

> > Attaching v3 patch set - 0001 optimizations around block references,
> > 0002 enables pg_get_wal_block_info() to emit per-record info. Any
> > thoughts?
>
> +   /* Get block references, if any, otherwise continue. */
> +   if (!XLogRecHasAnyBlockRefs(xlogreader))
> +   continue;
>
> I'm not sure, but the "continue" might be confusing since the code
> "continue"s if the condition is true and continues the process
> otherwise..  And it seems like a kind of "explanation of what the
> code does". I feel we don't need the comment there.

+1.

Also, if GetWALBlockInfo() is now supposed to be called only when
XLogRecHasAnyBlockRefs() is true, then it should probably have an
assertion to verify that precondition.

> It is not an issue with this patch, but as I look at this version, I'm
> starting to feel uneasy about the subtle differences between what
> GetWALRecordsInfo and GetWALBlockInfo do. One solution might be to
> have GetWALBlockInfo return a values array for each block, but that
> could make things more complex than needed. Alternatively, could we
> get GetWALRecordsInfo to call tuplestore_putvalues() internally? This
> way, both functions can manage the temporary memory context within
> themselves.

 Agreed. I'm also not sure what to do about it, though.

> This means GetWALBlockInfo overwrites the last two columns generated
> by GetWalRecordInfo, but I don't think this approach is clean and
> stable. I agree we don't want the final columns in a block info tuple
> but we don't want to duplicate the common code path.

> I initially thought we could devide the function into
> GetWALCommonInfo(), GetWALRecordInfo() and GetWALBlockInfo(), but it
> doesn't seem that simple.. In the end, I think we should have separate
> GetWALRecordInfo() and GetWALBlockInfo() that have duplicate
> "values[i++] = .." lines.

I agree. A little redundancy is better when the alternative is fragile
code, and I'm pretty sure that that applies here -- there won't be
very many duplicated lines, and the final code will be significantly
clearer. There can be a comment about keeping GetWALRecordInfo and
GetWALBlockInfo in sync.

-- 
Peter Geoghegan




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Merlin Moncure
On Mon, Mar 13, 2023 at 3:33 PM Dave Cramer  wrote:

>
> Dave Cramer
>
>
> On Sat, 4 Mar 2023 at 19:39, Dave Cramer  wrote:
>
>>
>>
>> On Sat, 4 Mar 2023 at 19:06, Tom Lane  wrote:
>>
>>> Jeff Davis  writes:
>>> > On Sat, 2023-03-04 at 18:04 -0500, Dave Cramer wrote:
>>> >> Most of the clients know how to decode the builtin types. I'm not
>>> >> sure there is a use case for binary encode types that the clients
>>> >> don't have a priori knowledge of.
>>>
>>> > The client could, in theory, have a priori knowledge of a non-builtin
>>> > type.
>>>
>>> I don't see what's "in theory" about that.  There seems plenty of
>>> use for binary I/O of, say, PostGIS types.  Even for built-in types,
>>> do we really want to encourage people to hard-wire their OIDs into
>>> applications?
>>>
>>
>> How does a client read these? I'm pretty narrowly focussed. The JDBC API
>> doesn't really have a way to read a non built-in type.  There is a facility
>> to read a UDT, but the user would have to provide that transcoder. I guess
>> I'm curious how other clients read binary UDT's ?
>>
>>>
>>> I don't see a big problem with driving this off a GUC, but I think
>>> it should be a list of type names not OIDs.  We already have plenty
>>> of precedent for dealing with that sort of thing; see search_path
>>> for the canonical example.  IIRC, there's similar caching logic
>>> for temp_tablespaces.
>>>
>>
>> I have no issue with allowing names, OID's were compact, but we could
>> easily support both
>>
>
> Attached is a preliminary patch that takes a list of OID's. I'd like to
> know if this is going in the right direction.
>
> Next step would be to deal with type names as opposed to OID's.
> This will be a bit more challenging as type names are schema specific.
>

OIDs are a pain to deal with IMO.   They will not survive a dump style
restore, and are hard to keep synchronized between databases...type names
don't have this problem.   OIDs are an implementation artifact that ought
not need any extra dependency.

This seems like a protocol or even a driver issue rather than a GUC issue.
Why does the server need to care what format the client might want to
prefer on a query by query basis?  I just don't see it. The resultformat
switch in libpq works pretty well, except that it's "all in" on getting
data from the server, with the dead simple workaround of casting to text
which might even be able to be managed from within the driver itself.

merlin


Re: Ability to reference other extensions by schema in extension scripts

2023-03-20 Thread Tom Lane
Sandro Santilli  writes:
> On Mon, Mar 13, 2023 at 05:57:57PM -0400, Regina Obe wrote:
>> Attached is a slightly revised patch to fix the extra whitespace in the
>> extend.gml document that Sandro noted to me.

> Thanks Regina.
> I've tested attached patch (md5 0b652a8271fc7e71ed5f712ac162a0ef)
> against current master (hash 4ef1be5a0b676a9f030cc2e4837f4b5650ecb069).
> The patch applies cleanly, builds cleanly, regresses cleanly.

Pushed with some mostly-cosmetic adjustments (in particular I tried
to make the docs and tests neater).

I did not commit the changes in get_available_versions_for_extension
to add no_relocate as an output column.  Those were dead code because
you hadn't done anything to connect them up to an actual output parameter
of pg_available_extension_versions().  While I'm not necessarily averse
to making the no_relocate values visible somehow, I'm not convinced that
pg_available_extension_versions should be the place to do it.  ISTM what's
relevant is the no_relocate values of *installed* extensions, not those of
potentially-installable extensions.  If we had a view over pg_extension
then that might be a place to add this, but we don't.  On the whole it
didn't seem important enough to pursue, so I just left it out.

regards, tom lane




Re: Add LZ4 compression in pg_dump

2023-03-20 Thread Justin Pryzby
On Fri, Mar 17, 2023 at 03:43:58PM +, gkokola...@pm.me wrote:
> From a174cdff4ec8aad59f5bcc7e8d52218a14fe56fc Mon Sep 17 00:00:00 2001
> From: Georgios Kokolatos 
> Date: Fri, 17 Mar 2023 14:45:58 +
> Subject: [PATCH v3 1/3] Improve type handling in pg_dump's compress file API

> -int
> +bool
>  EndCompressFileHandle(CompressFileHandle *CFH)
>  {
> -	int			ret = 0;
> +	bool		ret = 0;

Should say "= false" ?

>   /*
>* Write 'size' bytes of data into the file from 'ptr'.
> +  *
> +  * Returns true on success and false on error.
> +  */
> +   bool(*write_func) (const void *ptr, size_t size,

> -* Get a pointer to a string that describes an error that occurred 
> during a
> -* compress file handle operation.
> +* Get a pointer to a string that describes an error that occurred 
> during
> +* a compress file handle operation.
>  */
> const char *(*get_error_func) (CompressFileHandle *CFH);

This should mention that the error accessible in error_func() applies (only) to
write_func() ?

As long as this touches pg_backup_directory.c you could update the
header comment to refer to "compressed extensions", not just .gz.

I noticed that EndCompressorLZ4() tests "if (LZ4cs)", but that should
always be true.

I was able to convert the zstd patch to this new API with no issue.

-- 
Justin




Re: [PATCH] Fix alter subscription concurrency errors

2023-03-20 Thread Jelte Fennema
> "Gregory Stark (as CFM)"  writes:
> > Hm. This patch is still waiting on updates. But it's a bug fix and it
> > would be good to get this in. Is someone else interested in finishing
> > this if Jeite isn't available?
>
> I think the patch as-submitted is pretty uninteresting, mainly because the
> design of adding bespoke lock code for subscription objects doesn't scale.
> I'm not excited about doing this just for subscriptions without at least a
> clear plan for working on other object types.
>
> Alvaro suggested switching it to use get_object_address() instead, which'd
> be better; but before going in that direction we might have to do more
> work on get_object_address's error reporting (note the para in its header
> comments saying it's pretty weak on that).

Sorry for not responding earlier in this thread. I'll be honest in
saying this was a small annoyance to me, so I ignored the responses more
than I should have. It caused some test flakiness in the Citus test
suite, and it seemed that fixing the underlying issue in Postgres was
most appropriate. I addressed this in Citus' test suite by
disabling the relevant test (which was an edge case test anyway). So
my immediate problem was fixed, and I stopped caring about this patch
very much. Definitely not enough to address this for all other DDLs
with the same issue.

All in all I'm having a hard time feeling motivated to work on a patch
that I don't care much about. Especially since I have two other
patches open for a few commit fests that I actually care about, but
those patches have received (imho) very little input. Which makes it
hard to justify to myself to spend time on this patch, given the
knowledge that if I would spend time on it, it might take away the
precious reviewer time from the patches I do care about.




Re: [BUG] pg_stat_statements and extended query protocol

2023-03-20 Thread Imseih (AWS), Sami
Sorry about the delay in response about this.

I was thinking about this and it seems to me we can avoid
adding new fields to Estate. I think a better place to track
rows and calls is in the Instrumentation struct.

--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -88,6 +88,8 @@ typedef struct Instrumentation
        double          nfiltered2;     /* # of tuples removed by "other" quals */
BufferUsage bufusage;   /* total buffer usage */
WalUsagewalusage;   /* total WAL usage */
+   int64   calls;
+   int64   rows_processed;
 } Instrumentation; 


If this is more palatable, I can prepare the patch.

Thanks for your feedback.

Regards.

Sami Imseih
Amazon Web Services (AWS)









Re: Add LZ4 compression in pg_dump

2023-03-20 Thread Tomas Vondra
Hi,

I was preparing to get the 3 cleanup patches pushed, so I
updated/reworded the commit messages a bit (attached, please check).

But I noticed the commit message for 0001 said:

  In passing save the appropriate errno in LZ4File_open_write in
  case that the caller is not using the API's get_error_func.

I think that's far too low-level for a commit message; it'd be much more
appropriate as a comment at the function.

However, do we even need this behavior? I was looking for code calling
this function without using get_error_func(), but no luck. And if there
is such caller, shouldn't we fix it to use get_error_func()?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom 6a3d5d743f022ffcd0fcaf3d6e9ba711e2e785e7 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Fri, 17 Mar 2023 14:45:58 +
Subject: [PATCH v4 1/3] Improve type handling in pg_dump's compress file API
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After 0da243fed0 got committed, we've received a report about a compiler
warning, related to the new LZ4File_gets() function:

  compress_lz4.c: In function ‘LZ4File_gets’:
  compress_lz4.c:492:19: warning: comparison of unsigned expression in
  ‘< 0’ is always false [-Wtype-limits]
492 | if (dsize < 0)

The reason is very simple - dsize is declared as size_t, which is an
unsigned integer, and thus the check is pointless and we might fail to
notice an error in some cases (or fail in a strange way a bit later).

The warning could have been silenced by simply changing the type, but we
realized the API mostly assumes all the libraries use the same types and
report errors the same way (e.g. by returning 0 and/or negative value).

But we can't make this assumption - the gzip/lz4 libraries already
disagree on some of this, and even if they did a library added in the
future might not.

The right solution is to define what the API does, and translate the
library-specific behavior in consistent way (so that the internal errors
are not exposed to users of our compression API).

For that reason, this commit adjusts the data types in a couple places,
so that we don't miss library errors, and unifies the error reporting to
simply return true/false (instead of e.g. size_t).

Author: Georgios Kokolatos
Reported-by: Alexander Lakhin
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/33496f7c-3449-1426-d568-63f6bca2a...@gmail.com
---
 src/bin/pg_dump/compress_gzip.c   | 37 ++--
 src/bin/pg_dump/compress_io.c |  8 +--
 src/bin/pg_dump/compress_io.h | 38 +---
 src/bin/pg_dump/compress_lz4.c| 85 +++
 src/bin/pg_dump/compress_none.c   | 41 -
 src/bin/pg_dump/pg_backup_archiver.c  | 18 ++
 src/bin/pg_dump/pg_backup_directory.c | 36 ++--
 7 files changed, 148 insertions(+), 115 deletions(-)

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index 0af65afeb4..d9c3969332 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -233,14 +233,14 @@ InitCompressorGzip(CompressorState *cs,
  *--
  */
 
-static size_t
-Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
+static bool
+Gzip_read(void *ptr, size_t size, size_t *rsize, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
-	size_t		ret;
+	int			gzret;
 
-	ret = gzread(gzfp, ptr, size);
-	if (ret != size && !gzeof(gzfp))
+	gzret = gzread(gzfp, ptr, size);
+	if (gzret <= 0 && !gzeof(gzfp))
 	{
 		int			errnum;
 		const char *errmsg = gzerror(gzfp, &errnum);
@@ -249,15 +249,18 @@ Gzip_read(void *ptr, size_t size, CompressFileHandle *CFH)
  errnum == Z_ERRNO ? strerror(errno) : errmsg);
 	}
 
-	return ret;
+	if (rsize)
+		*rsize = (size_t) gzret;
+
+	return true;
 }
 
-static size_t
+static bool
 Gzip_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
-	return gzwrite(gzfp, ptr, size);
+	return gzwrite(gzfp, ptr, size) > 0;
 }
 
 static int
@@ -287,22 +290,22 @@ Gzip_gets(char *ptr, int size, CompressFileHandle *CFH)
 	return gzgets(gzfp, ptr, size);
 }
 
-static int
+static bool
 Gzip_close(CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
 	CFH->private_data = NULL;
 
-	return gzclose(gzfp);
+	return gzclose(gzfp) == Z_OK;
 }
 
-static int
+static bool
 Gzip_eof(CompressFileHandle *CFH)
 {
 	gzFile		gzfp = (gzFile) CFH->private_data;
 
-	return gzeof(gzfp);
+	return gzeof(gzfp) == 1;
 }
 
 static const char *
@@ -319,7 +322,7 @@ Gzip_get_error(CompressFileHandle *CFH)
 	return errmsg;
 }
 
-static int
+static bool
 Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH)
 {
 	gzFile		gzfp;
@@ -342,18 +345,18 @@ Gzip_open(const char *path, int fd, const char *mode, CompressFileHandle *CFH)
 		gzfp = 

Adjust Memoize hit_ratio calculation

2023-03-20 Thread David Rowley
Yesterday, in 785f70957, I adjusted the Memoize costing code to
account for the size of the cache key when estimating how many cache
entries can exist at once in the cache.  That effectively makes
Memoize a less likely choice as fewer entries will be expected to fit
in work_mem now.

Because that's being changed in v16, I think it might also be a good
idea to fix the hit_ratio calculation problem reported by David
Johnston in [1].  In the attached, I've adjusted David's calculation
slightly so that we divide by Max(ndistinct, est_cache_entries)
instead of ndistinct.  This saves from overestimating when ndistinct
is smaller than est_cache_entries.  I'd rather fix this now for v16
than wait until v17 and further adjust the Memoize costing.
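
For reference, the adjusted calculation is roughly (a sketch in terms of
the costing variables):

    hit_ratio = ((calls - ndistinct) / calls)
              * (est_cache_entries / Max(ndistinct, est_cache_entries))

i.e. dividing by Max(ndistinct, est_cache_entries) caps the second
factor at 1.0 when ndistinct is smaller than est_cache_entries.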

I've attached a spreadsheet showing the new and old hit_ratio
calculations. Cells C1 - C3 can be adjusted to show what the hit ratio
is for both the old and new methods.

Any objections?

David

[1] 
https://postgr.es/m/cakfquwzemcnk3yqo2xj4eduody6qakad31rod1vc4q1_s68...@mail.gmail.com


adjust_memoize_hit_ratio_calculation.patch
Description: Binary data


memoize_cache_hits.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.
From 4b6e01c7f569a919d660cd80ce64cb913bc9f220 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v4 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok becuase this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || *outlen == 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -61,6 +61,9 @@ bool		SSLPreferServerCiphers;
 int			ssl_min_protocol_version = PG_TLS1_2_VERSION;
 int			ssl_max_protocol_version = PG_TLS_ANY;
 
+/* GUC variable: if false ignore ALPN negotiation */
+bool		ssl_enable_alpn = true;
+
 /*  */
 /*			 Procedures common to all secure sessions			*/
 /*  */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ec1d895a23..2640b69fed 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1934,6 +1934,8 @@ ServerLoop(void)

Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Thomas Munro
On Tue, Mar 21, 2023 at 9:22 AM Greg Stark  wrote:
> Yeah, even aside from flappers there's the problem that it's about as
> common for real commits to break some test as it is for patches to
> start failing tests. So often it's a real failure but it's nothing to
> do with the patch, it has to do with the commit that went into git.

That is true.  The best solution to that problem, I think, is to teach
cfbot that it should test on top of the most recent master commit that
succeeded in CI here
https://github.com/postgres/postgres/commits/master .  It's on my
list...




Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Here's a first cut at ALPN support.

Currently it's using a hard coded "Postgres/3.0" protocol (hard coded
both in the client and the server...). And it's hard coded to be
required for direct connections and supported but not required for
regular connections.

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.

The other patches are unchanged (modulo a free() that I missed in the
client before). They still have the semi-open issues I mentioned in
the previous email.




--
greg
From 32185020927824c4b57af900100a37f92ae6a040 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v3 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok because this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || *outlen == 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -61,6 +61,9 @@ bool		SSLPreferServerCiphers;
 int		

Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Greg Stark
On Mon, 20 Mar 2023 at 16:08, Thomas Munro  wrote:
>
> On Tue, Mar 21, 2023 at 3:15 AM Greg Stark  wrote:
> > The next level of this would be something like notifying the committer
> > with a list of patches in the CF that a commit broke. I don't
> > immediately see how to integrate that with our workflow but I have
> > seen something like this work well in a previous job. When committing
> > code you often went and updated other unrelated projects to adapt to
> > the new API (or could adjust the code you were committing to cause
> > less breakage).
>
> I've been hesitant to make it send email.  The most obvious message to
> send would be "hello, you posted a patch, but it fails on CI" to the
> submitter.

Yeah, even aside from flappers there's the problem that it's about as
common for real commits to break some test as it is for patches to
start failing tests. So often it's a real failure but it's nothing to
do with the patch, it has to do with the commit that went into git.

What I'm interested in is something like "hey, your commit caused 17
patches to start failing tests". And it doesn't necessarily have to be
an email. Having a historical page so when I look at a patch I can go
check, "hey did this start failing at the same time as 17 other
patches on the same commit?" would be the same question.


-- 
greg




Re: Add SHELL_EXIT_CODE to psql

2023-03-20 Thread Corey Huinker
On Mon, Mar 20, 2023 at 1:01 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > 128+N is implemented.
>
> I think this mostly looks OK, but:
>
> * I still say there is no basis whatever for believing that the result
> of ferror() is an exit code, errno code, or anything else with
> significance beyond zero-or-not.  Feeding it to wait_result_to_exit_code
> as you've done here is not going to do anything but mislead people in
> a platform-dependent way.  Probably should set exit_code to -1 if
> ferror reports trouble.
>

Agreed. Sorry, I read your comment wrong the last time or I would have
already made it so.


>
> * Why do you have wait_result_to_exit_code defaulting to return 0
> if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED?
> That seems pretty misleading; again -1 would be a better idea.
>

That makes sense as well. Given that WIFSIGNALED is currently defined as
the negation of WIFEXITED, whatever default result we have here is
basically a this-should-never-happen. That might be something we want to
catch with an assert, but I'm fine with a -1 for now.
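
To make the intended behavior concrete, here's roughly what a session
should look like with all three patches applied (the output values are my
expectation from the 128+N convention discussed above, not a captured
transcript):

  \! false
  \echo :SHELL_EXIT_CODE :SHELL_ERROR
  1 true

  \! sh -c 'kill -TERM $$'
  \echo :SHELL_EXIT_CODE
  143

(143 being 128 + 15, SIGTERM's signal number.)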
From 1ff2b5ba4b82efca0edf60a9fb1cdf8307471eee Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 17 Mar 2023 06:43:24 -0400
Subject: [PATCH v11 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 7b23cc80dc..333aaab139 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.2

From 3d9c43cfec7f60785c1f1fa1aaa3e1b48224ef98 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 20 Mar 2023 16:05:10 -0400
Subject: [PATCH v11 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..80b94cae51 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return 128 + WTERMSIG(exit_status);
+	return -1;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.2

From bf65f36fd57977acaac57972425d1717a205cb72 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 20 Mar 2023 16:05:23 -0400
Subject: [PATCH v11 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 15 +
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 33 +
 src/test/regress/expected/psql.out | 34 ++
 src/test/regress/sql/psql.sql  | 30 ++
 6 files changed, 133 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..e6e307fd18 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4267,6 +4267,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell 

Re: [PATCH] Add <> support to sepgsql_restorecon

2023-03-20 Thread Ted Toth
Not this month, unfortunately; other work has taken precedence. I'll need to
look at what it's going to take to create a test. Hopefully I can piggyback
on an existing test.

Ted

On Mon, Mar 20, 2023 at 3:05 PM Gregory Stark (as CFM) 
wrote:

> > Ok, sounds reasonable. Maybe just add a comment to that effect.
>
> > This needs regression test support for the feature and some minimal
> documentation that shows how to make use of it.
>
> Hm. It sounds like this patch is uncontroversial but is missing
> documentation and tests? Has this been addressed? Do you think you'll
> get a chance to resolve those issues this month in time for this
> release?
>
> --
> Gregory Stark
> As Commitfest Manager
>


Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Thomas Munro
On Tue, Mar 21, 2023 at 3:15 AM Greg Stark  wrote:
> The next level of this would be something like notifying the committer
> with a list of patches in the CF that a commit broke. I don't
> immediately see how to integrate that with our workflow but I have
> seen something like this work well in a previous job. When committing
> code you often went and updated other unrelated projects to adapt to
> the new API (or could adjust the code you were committing to cause
> less breakage).

I've been hesitant to make it send email.  The most obvious message to
send would be "hello, you posted a patch, but it fails on CI" to the
submitter.  Cfbot has been running for about 5 years now, and I'd say
the rate of spurious/bogus failures has come down a lot over that time
as we've chased down the flappers in master, but it's still enough
that you would quickly become desensitised/annoyed by the emails, I
guess (one of the reasons I recently started keeping history is to be
able to do some analysis of that so we can direct attention to chasing
down the rest, or have some smart detection of known but not yet
resolved flappers, I dunno, something like that).  Speaking as someone
who occasionally sends patches to other projects, it's confusing and
unsettling when you get automated emails from github PRs telling you
that this broke, that broke and the other broke, but only the project
insiders know which things are newsworthy and which things are "oh
yeah that test is a bit noisy, ignore that one".




Re: [PATCH] Add <> support to sepgsql_restorecon

2023-03-20 Thread Gregory Stark (as CFM)
> Ok, sounds reasonable. Maybe just add a comment to that effect.

> This needs regression test support for the feature and some minimal 
> documentation that shows how to make use of it.

Hm. It sounds like this patch is uncontroversial but is missing
documentation and tests? Has this been addressed? Do you think you'll
get a chance to resolve those issues this month in time for this
release?

-- 
Gregory Stark
As Commitfest Manager




Re: Inconsistency in reporting checkpointer stats

2023-03-20 Thread Gregory Stark (as CFM)
On Sun, 19 Feb 2023 at 04:58, Nitin Jadhav
 wrote:
>
> > This doesn't pass the tests, because the regression tests weren't adjusted:
> > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs
>
> Thanks for sharing this. I have fixed this in the patch attached.

AFAICS this patch was marked Waiting on Author due to this feedback on
Feb 14 but the patch was updated Feb 19 and is still passing today.
I've marked it Needs Review.

I'm not sure if all of the feedback from Bharath Rupireddy and Kyotaro
Horiguchi was addressed.

If there's any specific questions remaining you need feedback on it
would be good to ask explicitly. Otherwise if you think it's ready you
could mark it Ready for Commit.




-- 
Gregory Stark
As Commitfest Manager




Re: Optimizing Node Files Support

2023-03-20 Thread Gregory Stark (as CFM)
I think it's clear we aren't going to be taking this patch in its
current form. Perhaps there are better ways to do what these files do
(I sure think there are!) but I don't think microoptimizing the
copying is something people are super excited about. It sounds like
rethinking how to make these functions more convenient for programmers
to maintain reliably would be more valuable.

I guess I'll mark it Returned with Feedback -- if there are
significant performance gains to show without making the code harder
to maintain and/or a nicer way to structure this code in general then
we can revisit this.

-- 
Gregory Stark
As Commitfest Manager




Re: pg_stats and range statistics

2023-03-20 Thread Egor Rogov

On 20.03.2023 22:27, Gregory Stark (as CFM) wrote:

> On Sun, 22 Jan 2023 at 18:22, Tomas Vondra
>  wrote:
>
>> I wonder if we have other functions doing something similar, i.e.
>> accepting a polymorphic type and then imposing additional restrictions
>> on it.
>
> Meh, there's things like array comparison functions that require both
> arguments to be the same kind of arrays. And array_agg that requires
> the elements to be the same type as the state array (ie, same type as
> the first element). Not sure there are any taking just one specific
> type though.
>
>>> Shouldn't this add some sql tests ?
>>
>> Yeah, I guess we should have a couple tests calling these functions on
>> different range arrays.
>>
>> This reminds me lower()/upper() have some extra rules about handling
>> empty ranges / infinite boundaries etc. These functions should behave
>> consistently (as if we called lower() in a loop) and I'm pretty sure
>> that's not the current state.
>
> Are we still waiting on these two items? Egor, do you think you'll
> have a chance to work on it this month?



I can try to tidy things up, but I'm not sure if we reached a consensus.

Do we stick with the ranges_upper(anyarray) and ranges_lower(anyarray) 
functions? This approach is okay with me. Tomas, have you made up your mind?
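
If we do keep them, a couple of regression tests along these lines would
exercise them on different range arrays (the expected outputs assume the
functions simply return the array of lower/upper bounds element by
element, which is my reading of the proposal):

  SELECT ranges_lower(ARRAY[int4range(1,4), int4range(5,7)]);
  -- expected: {1,5}
  SELECT ranges_upper(ARRAY[numrange(0.5,1.5), numrange(2.0,3.0)]);
  -- expected: {1.5,3.0}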


Do we want to document these functions? They are very 
pg_statistic-specific and won't be useful for end users imo.







Re: Save a few bytes in pg_attribute

2023-03-20 Thread David Rowley
On Mon, 20 Mar 2023, 11:00 pm Peter Eisentraut,
 wrote:
> After the discussion in [0] ff., I was looking around in pg_attribute
> and noticed that we could possibly save 4 bytes.  We could change both
> attstattarget and attinhcount from int4 to int2, which together with
> some reordering would save 4 bytes from the fixed portion.

I just want to highlight 1ef61ddce9, which fixed a very long-standing
bug that meant that pg_inherits.inhseqno was effectively just 16-bit.
Perhaps, because nobody seemed to report that as a limitation, 16 bits
is enough space. I only noticed it as a bug from code reading.

David




Re: [PATCH] Introduce array_shuffle() and array_sample()

2023-03-20 Thread Gregory Stark (as CFM)
On Thu, 29 Sept 2022 at 15:34, Tom Lane  wrote:
>
> Martin Kalcher  writes:
> > New patch: array_shuffle() and array_sample() use pg_global_prng_state now.
>
> I took a closer look at the patch today.  I find this behavior a bit
> surprising:
>

It looks like this patch received useful feedback and it wouldn't take
much to push it over the line. But it's been Waiting On Author since
last September.

Martin, any chance of getting these last bits of feedback resolved so
it can be Ready for Commit?

-- 
Gregory Stark
As Commitfest Manager




Re: [PATCH] Add initial xid/mxid/mxoff to initdb

2023-03-20 Thread Gregory Stark (as CFM)
On Fri, 25 Nov 2022 at 07:22, Peter Eisentraut
 wrote:
>
> Just for completeness, over in the other thread the feedback was that
> this functionality is better put into pg_resetwal.

So is that other thread tracked in a different commitfest entry and
this one completely redundant? I'll mark it Rejected then?

-- 
Gregory Stark
As Commitfest Manager




Re: pg_stats and range statistics

2023-03-20 Thread Gregory Stark (as CFM)
On Sun, 22 Jan 2023 at 18:22, Tomas Vondra
 wrote:
>
> I wonder if we have other functions doing something similar, i.e.
> accepting a polymorphic type and then imposing additional restrictions
> on it.

Meh, there's things like array comparison functions that require both
arguments to be the same kind of arrays. And array_agg that requires
the elements to be the same type as the state array (ie, same type as
the first element). Not sure there are any taking just one specific
type though.

> > Shouldn't this add some sql tests ?
>
> Yeah, I guess we should have a couple tests calling these functions on
> different range arrays.
>
> This reminds me lower()/upper() have some extra rules about handling
> empty ranges / infinite boundaries etc. These functions should behave
> consistently (as if we called lower() in a loop) and I'm pretty sure
> that's not the current state.

Are we still waiting on these two items? Egor, do you think you'll
have a chance to work on it this month?

-- 
Gregory Stark
As Commitfest Manager




Re: meson documentation build open issues

2023-03-20 Thread Andres Freund
Hi,

On 2023-03-19 19:33:38 -0700, Andres Freund wrote:
> I think we can make the docs build in parallel and incrementally, by building
> the different parts of the docs in parallel, using --stringparam rootid,
> e.g. building each 'part' separately.
> 
> A very very rough draft attached:
> 
> parallel with parts:
> real  0m10.831s
> user  0m58.295s
> sys   0m1.402s
> 
> normal:
> real  0m32.215s
> user  0m31.876s
> sys   0m0.328s
> 
> 1/3 of the build time at 2x the cost is nothing to sneeze at.

I could not make myself stop trying to figure out where the big constant time
factor comes from. Every invocation costs about 2s, even if not much is
rendered. Turns out, that's solely spent building all the <xsl:key>s. The
first time *any* key() is invoked for a document, all the keys are computed in
a single pass over the document.

A single reasonable key doesn't take that much time, even for the size of our
docs. But there are several redundant keys being built, some of them somewhat
expensive. E.g. each copy of one duplicated <xsl:key> definition takes about
300ms; there's one copy in chunk-common and one in docbook-no-doctype.xsl.

I'm going to cry now.

Greetings,

Andres Freund




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Jeff Davis
On Mon, 2023-03-20 at 14:36 -0400, Dave Cramer wrote:
> Thanks for the review. I'm curious what system you are running on as
> I don't see any of these errors. 

Are asserts enabled?

> Well I'm guessing psql doesn't know how to read date or timestamptz
> in binary. This is not a failing of the code.

It seems strange, and potentially dangerous, to send binary data to a
client that's not expecting it. It feels too easy to cause confusion by
changing the GUC mid-session.

Also, it seems like DISCARD ALL is not resetting it, which I think is a
bug.
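
For example, I'd expect the following to leave the session back in the
default text-output state, but it doesn't seem to (sketch, using the GUC
name from the patch):

  SET format_binary = '25,1082,1184';
  DISCARD ALL;
  SHOW format_binary;   -- I'd expect the default (empty) value here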

> 
> This is an interesting question. If the type isn't visible then it's
> not visible to the query so 

I don't think that's true -- the type could be in a different schema
from the table.

> > 
> > 5. There's a theoretical invalidation problem. It might also be a
> > practical problem in some testing setups with long-lived
> > connections
> > that are recreating user-defined types.
> > 
> 
> UDT's seem to be a problem here which candidly have very little use
> case for binary output. 

I mostly agree with that, but it also might not be hard to support
UDTs. Is there a design problem here or is it "just a matter of code"?

> 
> I didn't try to solve it as Tom was OK with using a GUC. Using a
> startup GUC is interesting, 
> but how would that work with pools where we want to reset the
> connection when we return it and then
> set the binary format on borrow ? By using a GUC when a client
> borrows a connection from a pool the client
> can reconfigure the oids it wants formatted in binary.

That's a good point. How common is it to share a connection pool
between different clients (some of which might support a binary format,
and others which don't)? And would the connection pool put connections
with and without the property in different pools?

> 
> I really hadn't considered supporting type names. I have asked Paul
> Ramsey  about PostGIS and he doesn't see PostGIS using this.

One of the things I like about Postgres is that the features all work
together, and that user-defined objects are generally as good as built-
in ones. Sometimes there's a reason to make a special case (e.g. syntax
support or something), but in this case it seems like we could support
user-defined types just fine, right? It's also just more friendly and
readable to use type names, especially if it's a GUC.

Regards,
Jeff Davis





Re: Request for comment on setting binary format output per session

2023-03-20 Thread Tom Lane
Dave Cramer  writes:
> On Mon, 20 Mar 2023 at 13:05, Jeff Davis  wrote:
>> 2. Easy to confuse psql:
>> 
>> CREATE TABLE a(d date, t timestamptz);
>> SET format_binary='25,1082,1184';
>> SELECT * FROM a;
>> d | t
>> ---+---
>> ! |
>> (1 row)
>> 
> Well I'm guessing psql doesn't know how to read date or timestamptz in
> binary. This is not a failing of the code.

What it is is a strong suggestion that controlling this via a GUC is
not a great choice.  There are many inappropriate (wrong abstraction
level) ways to change a GUC and thereby break a client that's not
expecting binary output.  I think Jeff's suggestion that we should
treat this as a protocol extension might be a good idea.

If I recall the protocol-extension design correctly, such a setting
could only be set at session start, which could be annoying --- at the
very least we'd have to tolerate entries for unrecognized data types,
since clients couldn't be expected to have checked the list against
the current server in advance.

regards, tom lane




Re: Eliminating SPI from RI triggers - take 2

2023-03-20 Thread Gregory Stark (as CFM)
On Mon, 17 Oct 2022 at 14:59, Robert Haas  wrote:
>
> On Sat, Oct 15, 2022 at 1:47 AM Amit Langote  wrote:
>
> But I think the bigger problem for this patch set is that the
> design-level feedback from
> https://www.postgresql.org/message-id/CA%2BTgmoaiTNj4DgQy42OT9JmTTP1NWcMV%2Bke0i%3D%2Ba7%3DVgnzqGXw%40mail.gmail.com
> hasn't really been addressed, AFAICS. ri_LookupKeyInPkRelPlanIsValid
> is still trivial in v7, and that still seems wrong to me. And I still
> don't know how we're going to avoid changing the semantics in ways
> that are undesirable, or even knowing precisely what we did change. If
> we don't have answers to those questions, then I suspect that this
> patch set isn't going anywhere.

Amit, do you plan to work on this patch for this commitfest (and
therefore this release?). And do you think it has a realistic chance
of being ready for commit this month?

It looks to me like you have some good feedback and can progress and
are unlikely to finish this patch for this release. In which case
maybe we can move it forward to the next release?

-- 
Gregory Stark
As Commitfest Manager




Re: doc: add missing "id" attributes to extension packaging page

2023-03-20 Thread Gregory Stark (as CFM)
On Thu, 26 Jan 2023 at 15:55, Brar Piening  wrote:
>
> On 18.01.2023 at 06:50, Brar Piening wrote:
>
> > I'll give it a proper look this weekend.
>
> It turns out I didn't get a round tuit.
>
> ... and I'm afraid I probably will not be able to work on this until
> mid/end February so we'll have to move this to the next commitfest until
> somebody wants to take it over and push it through.


Looks like a lot of good work was happening on this patch right up
until mid-February. Is there a lot of work left? Do you think you'll
have a chance to wrap it up this commitfest for this release?


-- 
Gregory Stark
As Commitfest Manager




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-03-20 Thread Gregory Stark (as CFM)
On Sun, 29 Jan 2023 at 05:20, Mikhail Gribkov  wrote:
>
> The problem is obviously in the newly added second line of the following 
> clause:
> COMPLETE_WITH("ALTER COLUMN", "OWNER TO", "RENAME",
>   "SET SCHEMA", "SET (", "RESET (");
>
> "set schema" and "set (" alternatives are competing, while completion of the 
> common part "set" leads to a string composition which does not have 
> the check branch (Matches("ALTER", "VIEW", MatchAny, "SET")).
>
> I think it may worth looking at "alter materialized view"  completion tree 
> and making "alter view" the same way.
>
> The new status of this patch is: Waiting on Author

I think this patch received real feedback and it looks like it's clear
where there's more work needed. I'll move this to the next commitfest.
If you plan to work on it this month we can always move it back.


-- 
Gregory Stark
As Commitfest Manager




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Jeff Davis
On Mon, 2023-03-20 at 10:04 -0700, Jeff Davis wrote:
>   CREATE TABLE a(d date, t timestamptz);
>   SET format_binary='25,1082,1184';
>   SELECT * FROM a;
>    d | t 
>   ---+---
>    ! | 
>   (1 row)

Oops, missing the following statement after the CREATE TABLE:

  INSERT INTO a VALUES('1234-01-01', '2023-03-20 09:00:00');

Regards,
Jeff Davis





Re: Make ON_ERROR_STOP stop on shell script failure

2023-03-20 Thread Gregory Stark (as CFM)
On Mon, 20 Mar 2023 at 14:34, Gregory Stark (as CFM)
 wrote:
>
> Pruning bouncing email address -- please respond from this point in
> thread, not previous messages.

Oh for heaven's sake. Trying again to prune bouncing email address.
Please respond from *here* on the thread Sorry


-- 
Gregory Stark
As Commitfest Manager




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Dave Cramer
+Paul Ramsey

On Mon, 20 Mar 2023 at 13:05, Jeff Davis  wrote:

> On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote:
> > Attached is a preliminary patch that takes a list of OID's. I'd like
> > to know if this is going in the right direction.
>
>
Thanks for the review. I'm curious what system you are running on as I
don't see any of these errors.

> I found a few issues:
>
> 1. Some kind of memory error:
>
>   SET format_binary='25,1082,1184';
>   WARNING:  problem in alloc set PortalContext: detected write past
> chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48
>   ...
>   SET
>
> 2. Easy to confuse psql:
>
>   CREATE TABLE a(d date, t timestamptz);
>   SET format_binary='25,1082,1184';
>   SELECT * FROM a;
>d | t
>   ---+---
>! |
>   (1 row)
>
Well I'm guessing psql doesn't know how to read date or timestamptz in
binary. This is not a failing of the code.


> 3. Some style issues
>   - use of "//" comments
>   - findOid should return bool, not int
>
Sure, will fix; see attached patch.

> When you add support for user-defined types, that introduces a couple
> other issues:
>
> 4. The format_binary GUC would depend on the search_path GUC, which
> isn't great.
>
This is an interesting question. If the type isn't visible then it's not
visible to the query so

>
> 5. There's a theoretical invalidation problem. It might also be a
> practical problem in some testing setups with long-lived connections
> that are recreating user-defined types.
>
UDT's seem to be a problem here which candidly have very little use case
for binary output.

>
>
> We've had this problem with binary for a long time, and it seems
> desirable to solve it. But I'm not sure GUCs are the right way.
>
> How hard did you try to solve it in the protocol rather than with a
> GUC? I see that the startup message allows protocol extensions by
> prefixing a parameter name with "_pq_". Are protocol extensions
> documented somewhere and would that be a reasonable thing to do here?
>

I didn't try to solve it as Tom was OK with using a GUC. Using a startup
GUC is interesting, but how would that work with pools where we want to
reset the connection when we return it and then set the binary format on
borrow? By using a GUC, when a client borrows a connection from a pool the
client can reconfigure the OIDs it wants formatted in binary.

>
> Also, if we're going to make the binary format more practical to use,
> can we document the expectations better?

Yes we can do that.

> It seems the expectation is
> that the binary format just never changes, and that if it does, that's
> a new type name.

I really hadn't considered supporting type names. I have asked Paul
Ramsey about PostGIS and he doesn't see PostGIS using this.


> Regards,
> Jeff Davis
>
>


0002-style-issues-with.patch
Description: Binary data


Re: Make ON_ERROR_STOP stop on shell script failure

2023-03-20 Thread Gregory Stark (as CFM)
Pruning bouncing email address -- please respond from this point in
thread, not previous messages.


-- 
Gregory Stark
As Commitfest Manager




Re: Make ON_ERROR_STOP stop on shell script failure

2023-03-20 Thread Greg Stark
So I took a look at this patch. The conflict is with 2fe3bdbd691
committed by Peter Eisentraut which added error checks for pipes.
AFAICS the behaviour is now that pipe commands returning non-zero
cause an error *always*, regardless of the setting of ON_ERROR_STOP.

I'm not entirely sure that's sensible actually. If you write to a pipe
that ends in grep and it happens to produce no matching rows you may
actually be quite surprised when that causes your script to fail...
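
For instance (sketch), a script like this would now report an error (and
abort under ON_ERROR_STOP) just because grep exits with status 1 when it
finds nothing to match:

  \o | grep ERROR
  SELECT 'all good' AS status;
  \o
  -- no "ERROR" in the output, so grep exits 1 and the script fails here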

But if you remove that failing hunk the resulting patch does apply. I
don't see any tests so ... I don't know if the behaviour is still
sensible. A quick read gives me the impression that now it's actually
inconsistent in the other direction where it stops sometimes more
often than the user might expect.

I also don't understand the difference between ON_ERROR_STOP_ON and
ON_ERROR_STOP_ALL. Nor why we would want ON_ERROR_STOP_SHELL which
stops only on shell errors, rather than, say, ON_ERROR_STOP_SQL to do
the converse which would at least match the historical behaviour?
From 1f0feb9daf106721cb7fcba39ef7d663178f8ed1 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:25:22 -0400
Subject: [PATCH v5] on error stop for shell

---
 doc/src/sgml/ref/psql-ref.sgml |  7 +++-
 src/bin/psql/command.c | 20 ++--
 src/bin/psql/common.c  | 58 +++---
 src/bin/psql/psqlscanslash.l   | 29 +++--
 src/bin/psql/settings.h| 10 +-
 src/bin/psql/startup.c | 29 +
 6 files changed, 134 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 7b8ae9fac3..856bb5716f 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4194,7 +4194,12 @@ bar
 
 By default, command processing continues after an error.  When this
 variable is set to on, processing will instead stop
-immediately.  In interactive mode,
+immediately upon an error in an SQL query. When this variable is set to
+shell, a nonzero exit status of a shell command,
+which indicates failure, is interpreted as an error that stops the processing.
+When this variable is set to all, errors from both
+SQL queries and shell commands can stop the processing.
+In interactive mode,
 psql will return to the command prompt;
 otherwise, psql will exit, returning
 error code 3 to distinguish this case from fatal error
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 61ec049f05..9fbf2522ea 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -34,6 +34,7 @@
 #include "describe.h"
 #include "fe_utils/cancel.h"
 #include "fe_utils/print.h"
+#include "fe_utils/psqlscan_int.h"
 #include "fe_utils/string_utils.h"
 #include "help.h"
 #include "input.h"
@@ -2394,9 +2395,13 @@ exec_command_set(PsqlScanState scan_state, bool active_branch)
 			 */
 			char	   *newval;
 			char	   *opt;
+			PQExpBuffer output_buf = scan_state->output_buf;
 
 			opt = psql_scan_slash_option(scan_state,
 		 OT_NORMAL, NULL, false);
+			if (output_buf->len >= output_buf->maxlen
+&& (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL))
+success = false;
 			newval = pg_strdup(opt ? opt : "");
 			free(opt);
 
@@ -5041,10 +5046,19 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
-	if (result == 127 || result == -1)
+	if (result != 0)
 	{
-		pg_log_error("\\!: failed");
-		return false;
+		if (result < 0)
+			pg_log_error("could not execute command: %m");
+		else
+		{
+			char *reason = wait_result_to_str(result);
+
+			pg_log_error("%s: %s", command, reason ? reason : "");
+			free(reason);
+		}
+		if (pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL)
+			return false;
 	}
 	return true;
 }
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index f907f5d4e8..c87090a75d 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -94,6 +94,7 @@ setQFout(const char *fname)
 {
 	FILE	   *fout;
 	bool		is_pipe;
+	bool		status = true;
 
 	/* First make sure we can open the new output file/pipe */
 	if (!openQueryOutputFile(fname, , _pipe))
@@ -103,7 +104,24 @@ setQFout(const char *fname)
 	if (pset.queryFout && pset.queryFout != stdout && pset.queryFout != stderr)
 	{
 		if (pset.queryFoutPipe)
-			pclose(pset.queryFout);
+		{
+			int pclose_rc = pclose(pset.queryFout);
+			if (pclose_rc != 0)
+			{
+if (pclose_rc < 0)
+	pg_log_error("could not close pipe to external command: %m");
+else
+{
+	char *reason = wait_result_to_str(pclose_rc);
+
+	pg_log_error("command failure: %s",
+ reason ? reason : "");
+	free(reason);
+}
+if ((pset.on_error_stop == PSQL_ERROR_STOP_SHELL || pset.on_error_stop == PSQL_ERROR_STOP_ALL))
+	status = false;
+			}
+		}
 		else
 		

Re: Save a few bytes in pg_attribute

2023-03-20 Thread Tom Lane
Tomas Vondra  writes:
> IMHO it'd be much better to just not store the statistics target for
> attributes that have it default (which we now identify by -1), or for
> system attributes (where we store 0). I'd bet vast majority of systems
> will just use the default / GUC value. So if we're interested in saving
> these bytes, we could just store NULL in these cases, no?

Hmm, we'd have to move it to the nullable part of the row and expend
more code to fetch it; but I don't think it's touched in many places,
so that might be a good tradeoff.  Couple of notes:

* As things stand I think we have a null bitmap in every row of
pg_attribute already (surely attfdwoptions and attmissingval are never
both filled), so there's no extra cost there.

* Putting it in the variable part of the row means it wouldn't appear
in tuple descriptors, but that seems fine.

I wonder if the same is true of attinhcount.  Since it's nonzero for
partitions' attributes, it might be non-null in a fairly large fraction
of pg_attribute rows in some use-cases, but it still seems like we'd not
be losing anything.  It wouldn't need to be touched in any
high-performance code paths AFAICS.

regards, tom lane




Re: [PATCH] Fix alter subscription concurrency errors

2023-03-20 Thread Tom Lane
"Gregory Stark (as CFM)"  writes:
> Hm. This patch is still waiting on updates. But it's a bug fix and it
> would be good to get this in. Is someone else interested in finishing
> this if Jeite isn't available?

I think the patch as-submitted is pretty uninteresting, mainly because the
design of adding bespoke lock code for subscription objects doesn't scale.
I'm not excited about doing this just for subscriptions without at least a
clear plan for working on other object types.

Alvaro suggested switching it to use get_object_address() instead, which'd
be better; but before going in that direction we might have to do more
work on get_object_address's error reporting (note the para in its header
comments saying it's pretty weak on that).

regards, tom lane




Re: Fix order of checking ICU options in initdb and create database

2023-03-20 Thread Gregory Stark (as CFM)
Is this feedback enough to focus the work on the right things?

I feel like Marina Polyakova pointed out some really confusing behaviours,
and perhaps there's a way to solve them by focusing on one at a time
without making large changes in the code.

Perhaps an idea would be to have each module provide two functions, one
which is called early and signals an error if that module's parameters
are provided when it's not compiled in, and a second which verifies
that the parameters are consistent at the point in time where that's
appropriate. (Not entirely unlike what we do for GUCs, though simpler)

If every module did that consistently then it would avoid making the
changes "unprincipled" or "spaghetti" though frankly I find words like
that not very helpful to someone receiving that feedback.

The patch is obviously not ready for commit now but it also seems like
the feedback has not been really sufficient for Marina Polyakova to
make progress either.


--
Gregory Stark
As Commitfest Manager




Re: Data is copied twice when specifying both child and parent table in publication

2023-03-20 Thread Jacob Champion
On Fri, Mar 17, 2023 at 9:45 PM Amit Kapila  wrote:
> > There are a bunch of moving parts and hidden subtleties here, and I fell
> > into a few traps when I was working on my patch, so it'd be nice to have
> > additional coverage. I'm happy to contribute effort in that area if it's
> > helpful.
>
> I think it depends on what tests you have in mind.

Just the ones I mentioned, to start with.

> I suggest you can
> propose a patch to cover tests for this are in a separate thread. We
> can then evaluate those separately.

To confirm -- you want me to start a new thread for tests for this
patchset? (Tests written against HEAD would likely be obsoleted by
this change.)

--Jacob




Re: Error "initial slot snapshot too large" in create replication slot

2023-03-20 Thread Gregory Stark (as CFM)
On Wed, 12 Oct 2022 at 01:10, Michael Paquier  wrote:
>
> On Tue, Sep 13, 2022 at 05:31:05PM +0900, Kyotaro Horiguchi wrote:
> > And ExportSnapshot repalces oversized subxip with overflowed.
> >
> > So even when GetSnapshotData() returns a snapshot with oversized
> > subxip, it is saved as just "overflowed" on exporting. I don't think
> > this is the expected behavior since such (no xip and overflowed)
> > snapshot no longer works.
> >
> > On the other hand, it seems to me that snapbuild doesn't like
> > takenDuringRecovery snapshots.
> >
> > So the snapshot needs an additional flag signaling that xip is oversized and
> > all xids are held there. And we also need to let takenDuringRecovery
> > suggest subxip is oversized.
>
> The discussion seems to have stopped here.  As this is classified as a
> bug fix, I have moved this patch to next CF, waiting on author for the
> moment.

Kyotaro Horiguchi, any chance you'll be able to work on this for this
commitfest? If so, shout (or is anyone else planning to push it over
the line, Andres?); otherwise I'll move it on to the next release.


-- 
Gregory Stark
As Commitfest Manager




Re: Save a few bytes in pg_attribute

2023-03-20 Thread Tomas Vondra



On 3/20/23 15:37, Tom Lane wrote:
> Peter Eisentraut  writes:
>> After the discussion in [0] ff., I was looking around in pg_attribute 
>> and noticed that we could possibly save 4 bytes.  We could change both 
>> attstattarget and attinhcount from int4 to int2, which together with 
>> some reordering would save 4 bytes from the fixed portion.
> 
>> attstattarget is already limited to 10000, so this wouldn't lose 
>> anything.  For attinhcount, I don't see any documented limits.  But it 
>> seems unlikely to me that someone would need more than 32k immediate 
>> inheritance parents on a column.  (Maybe an overflow check would be 
>> useful, though, to prevent shenanigans.)
> 
>> The attached patch seems to work.  Thoughts?
> 
> I agree that attinhcount could be narrowed, but I have some concern
> about attstattarget.  IIRC, the limit on attstattarget was once 1000
> and then we raised it to 10000.  Is it inconceivable that we might
> want to raise it to 100000 someday?
> 

Yeah, I don't think it'd be wise to make it harder to increase the
statistics target limit.

IMHO it'd be much better to just not store the statistics target for
attributes that have it default (which we now identify by -1), or for
system attributes (where we store 0). I'd bet vast majority of systems
will just use the default / GUC value. So if we're interested in saving
these bytes, we could just store NULL in these cases, no?
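
As a quick sanity check on any given cluster (sketch; -1 is the default
for user columns and 0 is what we store for system columns), I'd expect
the non-default count to be zero almost everywhere:

  SELECT count(*) FILTER (WHERE attnum > 0 AND attstattarget <> -1) AS non_default,
         count(*) AS total
  FROM pg_attribute;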

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Fix alter subscription concurrency errors

2023-03-20 Thread Gregory Stark (as CFM)
On Mon, 16 Jan 2023 at 09:47, vignesh C  wrote:
>
> Jeite, please post an updated version with the fixes. As CommitFest
> 2023-01 is currently underway, this would be an excellent time to
> update the patch.

Hm. This patch is still waiting on updates. But it's a bug fix and it
would be good to get this in. Is someone else interested in finishing
this if Jeite isn't available?

-- 
Gregory Stark
As Commitfest Manager




Re: real/float example for testlibpq3

2023-03-20 Thread Gregory Stark (as CFM)
On Mon, 23 Jan 2023 at 11:54, Mark Wong  wrote:
> [...]fficient way to communicate useful information.
>
> Yeah, I will try to cover all the data types we ship. :)  I'll keep at
> it and drop the code examples.

I assume this isn't going to happen for this commitfest? If you think
it is then shout otherwise I'll mark it Returned with Feedback and
move it on to the next release.

-- 
Gregory Stark
As Commitfest Manager




Re: meson documentation build open issues

2023-03-20 Thread Andres Freund
Hi,

On 2023-03-20 11:58:08 +0100, Peter Eisentraut wrote:
> Oh, this patch set grew quite quickly. ;-)

Yep :)


> [PATCH v2 5/8] docs: html: copy images to output as part of xslt build
> 
> Making the XSLT stylesheets do the copying has some appeal.  I think it
> would only work for SVG (or other XML) files, which I guess is okay, but
> maybe the templates should have a filter on format="SVG" or something. Also,
> this copying actually modifies the files in some XML-equivalent way.  Also
> okay, I think, but worth noting.

I think it can be made to work for non-XML files with xinclude too. But the
restriction around only working in top-level stylesheets (vs everywhere for
documents) is quite annoying.


> [PATCH v2 6/8] wip: docs: copy or inline css
> 
> This seems pretty complicated compared to just copying a file?

Mainly that it works correctly for the standalone file.

Greetings,

Andres Freund




Re: Request for comment on setting binary format output per session

2023-03-20 Thread Jeff Davis
On Mon, 2023-03-13 at 16:33 -0400, Dave Cramer wrote:
> Attached is a preliminary patch that takes a list of OID's. I'd like
> to know if this is going in the right direction.

I found a few issues:

1. Some kind of memory error:

  SET format_binary='25,1082,1184';
  WARNING:  problem in alloc set PortalContext: detected write past
chunk end in block 0x55ba7b5f7610, chunk 0x55ba7b5f7a48
  ...
  SET

2. Easy to confuse psql:

  CREATE TABLE a(d date, t timestamptz);
  SET format_binary='25,1082,1184';
  SELECT * FROM a;
   d | t 
  ---+---
   ! | 
  (1 row)

3. Some style issues
  - use of "//" comments
  - findOid should return bool, not int

When you add support for user-defined types, that introduces a couple
other issues:

4. The format_binary GUC would depend on the search_path GUC, which
isn't great.

5. There's a theoretical invalidation problem. It might also be a
practical problem in some testing setups with long-lived connections
that are recreating user-defined types.


We've had this problem with binary for a long time, and it seems
desirable to solve it. But I'm not sure GUCs are the right way.

How hard did you try to solve it in the protocol rather than with a
GUC? I see that the startup message allows protocol extensions by
prefixing a parameter name with "_pq_". Are protocol extensions
documented somewhere and would that be a reasonable thing to do here?

Also, if we're going to make the binary format more practical to use,
can we document the expectations better? It seems the expectation is
that the binary format just never changes, and that if it does, that's
a new type name.

Regards,
Jeff Davis





Re: logical decoding and replication of sequences, take 2

2023-03-20 Thread Tomas Vondra



On 3/20/23 13:26, Amit Kapila wrote:
> On Mon, Mar 20, 2023 at 5:13 PM Tomas Vondra
>  wrote:
>>
>> On 3/20/23 12:00, Amit Kapila wrote:
>>> On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra
>>>  wrote:


>>>> I don't understand why we'd need WAL from before the slot is created,
>>>> which happens before copy_sequence so the sync will see a more recent
>>>> state (reflecting all changes up to the slot LSN).

>>>
>>> Imagine the following sequence of events:
>>> 1. Operation on a sequence seq-1 which requires WAL. Say, this is done
>>> at LSN 1000.
>>> 2. Some other random operations on unrelated objects. This would
>>> increase LSN to 2000.
>>> 3. Create a slot that uses current LSN 2000.
>>> 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then
>>> you will use LSN 1000 as a starting point to start replication in
>>> sequence sync worker.
>>>
>>> It is quite possible that WAL from LSN 1000 may not be present. Now,
>>> it may be possible that we use the slot's LSN in this case but
>>> currently, it may not be possible without some changes in the slot
>>> machinery. Even, if we somehow solve this, we have the below problem
>>> where we can miss some concurrent activity.
>>>
>>
>> I think the question is what would be the WAL-requiring operation at LSN
>> 1000. If it's just regular nextval(), then we *will* see it during
>> copy_sequence - sequences are not transactional in the MVCC sense.
>>
>> If it's an ALTER SEQUENCE, I guess it might create a new relfilenode,
>> and then we might fail to apply this - that'd be bad.
>>
>> I wonder if we'd allow actually discarding the WAL while building the
>> consistent snapshot, though.
>>
> 
> No, as soon as we reserve the WAL location, we update the slot's
> minLSN (replicationSlotMinLSN) which would prevent the required WAL
> from being removed.
> 
>> You're however right we can't just decide
>> this based on LSN, we'd probably need to compare the relfilenodes too or
>> something like that ...
>>
>>>> I think the only "issue" are the WAL records after the slot LSN, or more
>>>> precisely deciding which of the decoded changes to apply.


>>>>> Now, for the second idea which is to directly use
>>>>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that
>>>>> the changes covered by in-progress transactions like the one with
>>>>> Alter Sequence I have given example would be streamed later after the
>>>>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
>>>>> could be an LSN after the LSN associated with Alter Sequence but
>>>>> before the corresponding xact's commit.

>>>> Yeah, I think you're right - the locking itself is not sufficient to
>>>> prevent this ordering of operations. copy_sequence would have to lock
>>>> the sequence exclusively, which seems bit disruptive.

>>>
>>> Right, that doesn't sound like a good idea.
>>>
>>
>> Although, maybe we could use a less strict lock level? I mean, one that
>> allows nextval() to continue, but would conflict with ALTER SEQUENCE.
>>
> 
> I don't know if that is a good idea but are you imagining a special
> interface/mechanism just for logical replication because as far as I
> can see you have used SELECT to fetch the sequence values?
> 

Not sure what the special mechanism would be? I don't think it could
read the sequence from somewhere else, and due to the lack of MVCC we'd
just read the same sequence data from the current relfilenode. Or what
else would it do?

The one thing we can't quite do at the moment is locking the sequence,
because LOCK is only supported for tables. So we could either provide a
function to lock a sequence, or one that locks it and then returns the
current state (as if we did a SELECT).
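
Something like this is what I have in mind (the function name is
hypothetical; ShareUpdateExclusiveLock should let nextval()'s
RowExclusiveLock through while conflicting with the AccessExclusiveLock
taken by ALTER SEQUENCE):

  BEGIN;
  -- hypothetical helper: lock the sequence, then read its state
  SELECT pg_sequence_lock('seq1', 'ShareUpdateExclusiveLock');
  SELECT last_value, log_cnt, is_called FROM seq1;
  COMMIT;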


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add SHELL_EXIT_CODE to psql

2023-03-20 Thread Tom Lane
Corey Huinker  writes:
> 128+N is implemented.

I think this mostly looks OK, but:

* I still say there is no basis whatever for believing that the result
of ferror() is an exit code, errno code, or anything else with
significance beyond zero-or-not.  Feeding it to wait_result_to_exit_code
as you've done here is not going to do anything but mislead people in
a platform-dependent way.  Probably should set exit_code to -1 if
ferror reports trouble.
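
That is, something like (sketch; variable names illustrative):

    if (ferror(fout))
        exit_code = -1;    /* zero-or-not is all ferror() tells us */
    else
        exit_code = wait_result_to_exit_code(pclose_rc);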

* Why do you have wait_result_to_exit_code defaulting to return 0
if it doesn't recognize the code as either WIFEXITED or WIFSIGNALED?
That seems pretty misleading; again -1 would be a better idea.

regards, tom lane




Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-03-20 Thread Robert Haas
On Fri, Mar 10, 2023 at 7:00 PM Jacob Champion  wrote:
> On Thu, Mar 9, 2023 at 6:17 AM Robert Haas  wrote:
> > That seems like a circular argument. If you call the problem the
> > confused deputy problem then the issue must indeed be that the deputy
> > is confused, and needs to talk to someone else to get un-confused. But
> > why is the deputy necessarily confused in the first place? Our deputy
> > is confused because our code to decide whether to proxy a connection
> > or not is super-dumb,
>
> No, I think our proxy is confused because it doesn't know what power
> it has, and it can't tell the server what power it wants to use. That
> problem is independent of the decision to proxy. You're suggesting
> strengthening the code that makes that decision -- adding an oracle
> (in the form of a DBA) that knows about the confusion and actively
> mitigates it. That's guaranteed to work if the oracle is perfect,
> because "perfect" is somewhat tautologically defined as "whatever
> ensures secure operation". But the oracle doesn't reduce the
> confusion, and DBAs aren't perfect.

I think this is the root of our disagreement. My understanding of the
previous discussion is that people think that the major problem here
is the wraparound-to-superuser attack. That is, in general, we expect
that when we connect to a database over the network, we expect it to
do some kind of active authentication, like asking us for a password,
or asking us for an SSL certificate that isn't just lying around for
anyone to use. However, in the specific case of a local connection, we
have a reliable way of knowing who the remote user is without any kind
of active authentication, namely 'peer' authentication or perhaps even
'trust' if we trust all the local users, and so we don't judge it
unreasonable to allow local connections without any form of active
authentication. There can be some scenarios where even over a network
we can know the identity of the person connecting with complete
certainty, e.g. if endpoints are locked down such that the source IP
address is a reliable indicator of who is initiating the connection,
but in general when there's a network involved you don't know who the
person making the connection is and need to do something extra to
figure it out.

If you accept this characterization of the problem, then I don't think
the oracle is that hard to design. We simply set it up not to allow
wraparound connections, or maybe even more narrowly to not allow
wraparound connections to superuser. If the DBA has some weird network
topology where that's not the correct rule, either because they want
to allow wraparound connections or they want to disallow other things,
then yeah they have to tell us what to allow, but I don't really see
why that's an unreasonable expectation. I'd expect the correct
configuration of the proxy facility to fall naturally out of what's
allowed in pg_hba.conf. If machine A is configured to accept
connections from machines B and C based on environmental factors, then
machines B and C should be configured not to proxy connections to A.
If machines B and C aren't under our control such that we can
configure them that way, then the configuration is fundamentally
insecure in a way that we can't really fix.
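
As a purely illustrative sketch, the simple form of that oracle is barely
more than the following (every name here is hypothetical, not an existing
API):

/* Refuse to proxy a connection that would "wrap around" to ourselves
 * as superuser; address_is_local() and user_is_superuser() are
 * hypothetical helpers. */
static bool
proxy_connection_allowed(const char *target_host, const char *target_user)
{
	if (address_is_local(target_host) && user_is_superuser(target_user))
		return false;
	return true;
}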

I think that what you're proposing is that B and C can just be allowed
to proxy to A and A can say "hey, by the way, I'm just gonna let you
in without asking for anything else" and B and C can, when proxying,
react to that by disconnecting before the connection actually goes
through. That's simpler, in a sense. It doesn't require us to set up
the proxy configuration on B and C in a way that matches what
pg_hba.conf allows on A. Instead, B and C can automatically deduce
what connections they should refuse to proxy. I guess that's nice, but
it feels pretty magical to me. It encourages the DBA not to think
about what B and C should actually be allowed to proxy, and instead
just trust that the automatics are going to prevent any security
disasters. I'm not sure that they always will, and I fear cultivating
too much reliance on them. I think that if you're setting up a network
topology where the correct rule is something more complex than "don't
allow wraparound connections to superuser," maybe you ought to be
forced to spell that rule out instead of letting the system deduce one
that you hope will be right.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Fix typo plgsql to plpgsql.

2023-03-20 Thread Zhang Mingli
Hi, all

Found several typos like plgsql, I think it should be plpgsql.


Regards,
Zhang Mingli


0001-Fix-typo-plgsql-to-plpgsql.patch
Description: Binary data


Re: Add SHELL_EXIT_CODE to psql

2023-03-20 Thread Maxim Orlov
I took a look at the patch, and it seems to be in good condition.
It implements the declared functionality with no visible defects.

As for tests, I think it is possible to implement the "signals" case in TAP
tests. But let the actual committer decide whether it is worth it or not.

-- 
Best regards,
Maxim Orlov.


Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-03-20 Thread Maxim Orlov
Hi!

As suggested before by Heikki Linnakangas, I've added a patch for making
2PC transaction state 64-bit.
At first, my intention was to rebuild the whole twophase interface to use
FullTransactionId. But doing this in a proper manner would lead to switching
from TransactionId to FullTransactionId in PGPROC, and the patch became too
big to handle here.

So I decided to keep it simple for now and use the wraparound-logic trick:
calculate the FullTransactionId based on the current epoch, since the span of
active xids cannot exceed one epoch at any given time.
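
For reference, the trick is roughly the following sketch, built from the
existing transam.h helpers (the function name is made up; this is not the
patch itself):

static FullTransactionId
widen_to_full_xid(TransactionId xid)
{
	FullTransactionId next_fxid = ReadNextFullTransactionId();
	uint32		epoch = EpochFromFullTransactionId(next_fxid);

	/* an xid numerically ahead of the next xid is from the previous epoch */
	if (xid > XidFromFullTransactionId(next_fxid))
		epoch--;

	return FullTransactionIdFromEpochAndXid(epoch, xid);
}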


Patches 1 and 2 are the same as above.

-- 
Best regards,
Maxim Orlov.


v57-0002-Use-larger-segment-file-names-for-pg_notify.patch
Description: Binary data


v57-0003-Make-use-FullTransactionId-in-2PC-filenames.patch
Description: Binary data


v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch
Description: Binary data


Re: Allow parallel plan for referential integrity checks?

2023-03-20 Thread Frédéric Yhuel




On 3/20/23 15:58, Gregory Stark (as CFM) wrote:

On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel  wrote:


I've planned to work on it full time on week 10 (6-10 March), if you
agree to bear with me. The idea would be to bootstrap my brain on it,
and then continue to work on it from time to time.


Any updates on this patch? 


I had the opportunity to discuss it with Melanie Plageman when she came
to Paris last month and taught us (Dalibo folks) about the patch review
process.


She advised me to provide a good test case first, and to explain better 
how it would be useful, which I'm going to do.



Should we move it to next release at this
point? Even if you get time to work on it this commitfest do you think
it's likely to be committable in the next few weeks?



It is very unlikely. Maybe it's better to remove it from the CF and put it 
back later if the test case I will provide does a better job at 
convincing the Postgres folks that RI checks should be parallelized.





Re: Implement missing join selectivity estimation for range types

2023-03-20 Thread Schoemans Maxime
Hi Tomas,

As a quick update, the paper related to this work has finally been published in 
Mathematics (https://www.mdpi.com/2227-7390/11/6/1383).
During revision we also added a figure showing a comparison of our algorithm vs 
the existing algorithms in Oracle, SQL Server, MySQL and PostgreSQL, which can 
be found in the experiments section of the paper.
As can be seen, our algorithm outperforms even Oracle and SQL Server.

During this revision we also found a small bug, so we are working on a new
version of the patch, which fixes this.


Also, calc_hist_selectivity_contains in multirangetypes_selfuncs.c needs
a proper comment, not just "this is a copy from rangetypes".


Right, the comment should explain in more detail that the collected statistics
are currently the same as for range types but may potentially deviate.



However, it seems the two functions are exactly the same. Would the
functions diverge in the future? If not, maybe there should be just a
single shared function?


Indeed, it is possible that the two functions will deviate if the statistics
of multirange types are refined.



Right, but are there any such plans? Also, what's the likelihood we'll
add new statistics to only one of the places (e.g. for multiranges but
not plain ranges)?

I'd keep a single function until we actually need two. That's also
easier for maintenance - with two it's easy to fix a bug in one place
and forget about the other, etc.

Regarding our previous discussion about the duplication of 
calc_hist_join_selectivity in rangetypes_selfuncs.c and 
multirangetypes_selfuncs.c, we can also remove this duplication in the revision 
if needed.
Note that currently, there are no external functions shared between 
rangetypes_selfuncs.c and multirangetypes_selfuncs.c.
Any function that was used in both files was duplicated as a static function.
The functions calc_hist_selectivity_scalar, calc_length_hist_frac, 
calc_hist_selectivity_contained and calc_hist_selectivity_contains are examples 
of this, where the function is identical but has been declared static in both 
files.
That said, we can remove the duplication of calc_hist_join_selectivity if it
is still needed.
We would, however, require some guidance as to where to put the external 
definition of this function, as there does not appear to be a 
rangetypes_selfuncs.h header.
Should it simply go into utils/selfuncs.h or should we create a new header file?

Best regards,
Maxime Schoemans


Re: Allow parallel plan for referential integrity checks?

2023-03-20 Thread Gregory Stark (as CFM)
On Mon, 12 Dec 2022 at 11:37, Frédéric Yhuel  wrote:
>
> I've planned to work on it full time on week 10 (6-10 March), if you
> agree to bear with me. The idea would be to bootstrap my brain on it,
> and then continue to work on it from time to time.

Any updates on this patch? Should we move it to next release at this
point? Even if you get time to work on it this commitfest do you think
it's likely to be committable in the next few weeks?

-- 
Gregory Stark
As Commitfest Manager




Re: explain analyze rows=%.0f

2023-03-20 Thread Gregory Stark (as CFM)
On Wed, 4 Jan 2023 at 10:05, Ibrar Ahmed  wrote:
>
> Thanks, I have modified everything as suggested, except one point
>
> > Don't use variable format strings. They're hard to read and they
> > probably defeat compile-time checks that the arguments match the
> > format string. You could use a "*" field width instead, ie
...
> > * Another thought is that the non-text formats tend to prize output
> > consistency over readability, so maybe we should just always use 2
> > fractional digits there, rather than trying to minimize visible changes.
>
> In that, we need to adjust a lot of test case outputs.

> > * We need some doc adjustments, surely, to explain what the heck this
> > means.

That sounds like three points :) But they seem like pretty good
arguments to me and straightforward if a little tedious to adjust.
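
For the record, the "*" field-width suggestion is a small change, roughly
like this sketch (the variable names are illustrative, not the actual patch):

int		ndigits = 2;		/* the patch would pick 0 or 2 per case */

appendStringInfo(es->str, " rows=%.*f loops=%.0f",
				 ndigits, rows, nloops);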

This patch was marked Returned with Feedback and then later Waiting on
Author. And it hasn't had any updates since January. What is the state
on this feedback? If it's already done we can set the patch to Ready
for Commit and if not do you disagree with the proposed changes?

It's actually the kind of code cleanup change I'm reluctant to bump a
patch for. It's not like a committer can't make these kinds of changes
when committing. But there are so many patches they're likely to just
focus on a different patch when there are adjustments like this
pending.

-- 
Gregory Stark
As Commitfest Manager




Re: Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?

2023-03-20 Thread Daniel Gustafsson
> On 20 Mar 2023, at 15:31, Tom Lane  wrote:
> 
> "Hayato Kuroda (Fujitsu)"  writes:
>> While checking the documentation, I found that one line notes our product as
>> "<productname>PostgreSQL</productname>", whereas another line notes it as just
>> "PostgreSQL".
> 
> IMO the convention is to use the <productname> tag everywhere that we
> spell out "PostgreSQL".  I don't think it's actually rendered differently
> with our current stylesheets, but maybe someday it will be.

IIRC the main use in DocBook is for automatically decorating productnames with
trademark signs etc, and to generate lists of trademarks, but also that they
can be rendered differently.

This reminded me that I was planning to apply the below to make the markup of
PostgreSQL consistent:

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 4df8bd1b64..9ff6b08f5a 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2667,7 +2667,7 @@ TIMESTAMP WITH TIME ZONE '2004-10-19 10:23:54+02'
  To complicate matters, some jurisdictions have used the same timezone
  abbreviation to mean different UTC offsets at different times; for
  example, in Moscow <literal>MSK</literal> has meant UTC+3 in some years and
- UTC+4 in others.  PostgreSQL interprets such
+ UTC+4 in others.  <productname>PostgreSQL</productname> interprets such
  abbreviations according to whatever they meant (or had most recently
  meant) on the specified date; but, as with the <literal>EST</literal> example
  above, this is not necessarily the same as local civil time on that date.

--
Daniel Gustafsson





Re: Save a few bytes in pg_attribute

2023-03-20 Thread Tom Lane
Peter Eisentraut  writes:
> After the discussion in [0] ff., I was looking around in pg_attribute 
> and noticed that we could possibly save 4 bytes.  We could change both 
> attstattarget and attinhcount from int4 to int2, which together with 
> some reordering would save 4 bytes from the fixed portion.

> attstattarget is already limited to 10000, so this wouldn't lose 
> anything.  For attinhcount, I don't see any documented limits.  But it 
> seems unlikely to me that someone would need more than 32k immediate 
> inheritance parents on a column.  (Maybe an overflow check would be 
> useful, though, to prevent shenanigans.)

> The attached patch seems to work.  Thoughts?

I agree that attinhcount could be narrowed, but I have some concern
about attstattarget.  IIRC, the limit on attstattarget was once 1000
and then we raised it to 10000.  Is it inconceivable that we might
want to raise it to 100000 someday?

regards, tom lane




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-20 Thread Masahiko Sawada
On Mon, Mar 20, 2023 at 9:34 PM John Naylor
 wrote:
>
>
> On Mon, Mar 20, 2023 at 12:25 PM Masahiko Sawada  
> wrote:
> >
> > On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Mar 17, 2023 at 4:03 PM John Naylor
> > >  wrote:
> > > >
> > > > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> > > > >  wrote:
> > > > > >
> > > > > > I wrote:
> > > > > >
> > > > > > > > > Since the block-level measurement is likely overestimating 
> > > > > > > > > quite a bit, I propose to simply reverse the order of the 
> > > > > > > > > actions here, effectively reporting progress for the *last 
> > > > > > > > > page* and not the current one: First update progress with the 
> > > > > > > > > current memory usage, then add tids for this page. If this 
> > > > > > > > > allocated a new block, only a small bit of that will be 
> > > > > > > > > written to. If this block pushes it over the limit, we will 
> > > > > > > > > detect that up at the top of the loop. It's kind of like our 
> > > > > > > > > earlier attempts at a "fudge factor", but simpler and less 
> > > > > > > > > brittle. And, as far as OS pages we have actually written to, 
> > > > > > > > > I think it'll effectively respect the memory limit, at least 
> > > > > > > > > in the local mem case. And the numbers will make sense.
> > > >
> > > > > > I still like my idea at the top of the page -- at least for vacuum 
> > > > > > and m_w_m. It's still not completely clear if it's right but I've 
> > > > > > got nothing better. It also ignores the work_mem issue, but I've 
> > > > > > given up anticipating all future cases at the moment.
> > > >
> > > > > IIUC you suggested measuring memory usage by tracking how much memory
> > > > > chunks are allocated within a block. If your idea at the top of the
> > > > > page follows this method, it still doesn't deal with the point Andres
> > > > > mentioned.
> > > >
> > > > Right, but that idea was orthogonal to how we measure memory use, and 
> > > > in fact mentions blocks specifically. The re-ordering was just to make 
> > > > sure that progress reporting didn't show current-use > max-use.
> > >
> > > Right. I still like your re-ordering idea. It's true that most of the
> > > area of the last allocated block before heap scanning stops is not
> > > actually used yet. I'm guessing we can just check if the context
> > > memory has gone over the limit. But I'm concerned it might not work
> > > well in systems where overcommit memory is disabled.
> > >
> > > >
> > > > However, the big question remains DSA, since a new segment can be as 
> > > > large as the entire previous set of allocations. It seems it just 
> > > > wasn't designed for things where memory growth is unpredictable.
> >
> > aset.c also has a similar characteristic; allocates an 8K block upon
> > the first allocation in a context, and doubles that size for each
> > successive block request. But we can specify the initial block size
> > and max blocksize. This made me think of another idea to specify both
> > to DSA and both values are calculated based on m_w_m. For example, we
>
> That's an interesting idea, and the analogous behavior to aset could be a 
> good thing for readability and maintainability. Worth seeing if it's workable.

I've attached a quick hack patch. It can be applied on top of v32
patches. The changes to dsa.c are straightforward since it makes the
initial and max block sizes configurable. The patch includes a test
function, test_memory_usage(), to simulate how DSA segments grow behind
the shared radix tree. If we set the first argument to true, it
calculates both initial and maximum block size based on work_mem (I
used work_mem here just because its value range is larger than m_w_m):

postgres(1:833654)=# select test_memory_usage(true);
NOTICE:  memory limit 134217728
NOTICE:  init 1048576 max 16777216
NOTICE:  initial: 1048576
NOTICE:  rt_create: 1048576
NOTICE:  allocate new DSM [1] 1048576
NOTICE:  allocate new DSM [2] 2097152
NOTICE:  allocate new DSM [3] 2097152
NOTICE:  allocate new DSM [4] 4194304
NOTICE:  allocate new DSM [5] 4194304
NOTICE:  allocate new DSM [6] 8388608
NOTICE:  allocate new DSM [7] 8388608
NOTICE:  allocate new DSM [8] 16777216
NOTICE:  allocate new DSM [9] 16777216
NOTICE:  allocate new DSM [10] 16777216
NOTICE:  allocate new DSM [11] 16777216
NOTICE:  allocate new DSM [12] 16777216
NOTICE:  allocate new DSM [13] 16777216
NOTICE:  allocate new DSM [14] 16777216
NOTICE:  reached: 148897792 (+14680064)
NOTICE:  12718205 keys inserted: 148897792
 test_memory_usage
---

(1 row)

Time: 7195.664 ms (00:07.196)
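
Reading those numbers back, the sizing heuristic appears to be roughly the
following sketch (the dsa_create_ext() signature is assumed from the hack
patch, not an existing API; tranche_id is an assumed name):

Size		limit = (Size) work_mem * 1024;	/* memory limit in bytes */
Size		init_seg_size = 1024 * 1024;	/* 1MB initial segment */
Size		max_seg_size = Max(init_seg_size, limit / 8);	/* 16MB at 128MB */

dsa_area   *area = dsa_create_ext(tranche_id, init_seg_size, max_seg_size);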

Setting the first argument to false, we can specify both manually in
second and third arguments:

postgres(1:833654)=# select test_memory_usage(false, 1024 * 1024, 1024
* 1024 * 1024 * 10::bigint);
NOTICE:  memory limit 134217728
NOTICE:  init 1048576 max 10737418240
NOTICE:  initial: 1048576

Re: Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?

2023-03-20 Thread Tom Lane
"Hayato Kuroda (Fujitsu)"  writes:
> While checking the documentation, I found that one line notes our product as
> "<productname>PostgreSQL</productname>", whereas another line notes it as just
> "PostgreSQL".

IMO the convention is to use the <productname> tag everywhere that we
spell out "PostgreSQL".  I don't think it's actually rendered differently
with our current stylesheets, but maybe someday it will be.

regards, tom lane




Re: Missing rules for queryjumblefuncs.{funcs,switch}.c for maintainer-clean

2023-03-20 Thread Tom Lane
Michael Paquier  writes:
> Nathan has reported to me offlist that maintainer-clean was not doing
> its job for the files generated by gen_node_support.pl in
> src/backend/nodes/ for the query jumbling.  Attached is a patch to
> take care of this issue.
> While on it, I have found a comment in the related README that was
> missing a refresh.
> Any objections or comments?

Is similar knowledge missing in the meson build files?

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-03-20 Thread Greg Stark
The next level of this would be something like notifying the committer
with a list of patches in the CF that a commit broke. I don't
immediately see how to integrate that with our workflow but I have
seen something like this work well in a previous job. When committing
code you often went and updated other unrelated projects to adapt to
the new API (or could adjust the code you were committing to cause
less breakage).




Re: Memory leak from ExecutorState context?

2023-03-20 Thread Jehan-Guillaume de Rorthais
On Mon, 20 Mar 2023 09:32:17 +0100
Tomas Vondra  wrote:

> >> * Patch 1 could be rebased/applied/backpatched  
> > 
> > Would it help if I rebase Patch 1 ("move BufFile stuff into separate
> > context")? 
> 
> Yeah, I think this is something we'd want to do. It doesn't change the
> behavior, but it makes it easier to track the memory consumption etc.

Will do this week.

> I think focusing on the backpatchability is not particularly good
> approach. It's probably be better to fist check if there's any hope of
> restarting the work on BNLJ, which seems like a "proper" solution for
> the future.

Backpatching is worth some consideration. Balancing is almost there and could
help in 16. As time is flying, it might well miss release 16, but maybe it
could be backpatched in 16.1 or a later minor release? But what if it is not
eligible for backpatching? We would stall another year without having at least
an imperfect stop-gap solution (sorry for picking your words ;)).

BNLJ and/or other considerations are for 17 or even later. In the meantime,
Melanie, who authored BNLJ, +1'd the balancing patch as it can coexist with
other discussed solutions. No one has down-voted it since then. Melanie, what
is your opinion today on this patch? Did you change your mind as you worked
for many months on BNLJ since then?

> On 3/19/23 20:31, Justin Pryzby wrote:
> > On Fri, Mar 17, 2023 at 05:41:11PM +0100, Tomas Vondra wrote:  
>  * Patch 2 is worth considering to backpatch  
> >>
> >> I'm not quite sure what exactly are the numbered patches, as some of the
> >> threads had a number of different patch ideas, and I'm not sure which
> >> one was/is the most promising one.  
> > 
> > patch 2 is referring to the list of patches that was compiled
> > https://www.postgresql.org/message-id/20230310195114.6d0c5406%40karst  
> 
> Ah, I see - it's just the "rebalancing" patch which minimizes the total
> amount of memory used (i.e. grow work_mem a bit, so that we don't
> allocate too many files).
> 
> Yeah, I think that's the best we can do without reworking how we spill
> data (slicing or whatever).
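
For context, the rebalancing boils down to something like this sketch
(nbatch and hash_table_bytes are assumed names, not the patch's actual code):

/* Each batch needs two BufFile buffers (inner and outer side), so once
 * that per-batch overhead dominates, halving nbatch and doubling the
 * in-memory hash table actually lowers total memory use. */
while (nbatch > 1 &&
	   hash_table_bytes + 2 * nbatch * (size_t) BLCKSZ >
	   2 * hash_table_bytes + nbatch * (size_t) BLCKSZ)
{
	nbatch /= 2;
	hash_table_bytes *= 2;
}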

Indeed, that's why I was speaking about backpatching for this one:

* it surely helps 16 (and maybe previous releases) in such skewed situations
* it's constrained
* it's not too invasive; it doesn't shake the whole algorithm all over the place
* Melanie was +1 for it, and no one has down-voted it.

What need to be discussed/worked:

* any regressions for existing queries currently running fine or without OOM?
* adding the BufFile memory consumption to the planner considerations?

> I think the spilling is "nicer" in that it actually enforces work_mem
> more strictly than (a), 

Sure it enforces work_mem more strictly, but this is a discussion for 17 or
later in my humble opinion.

> but we should probably spend a bit more time on
> the exact spilling strategy. I only really tried two trivial ideas, but
> maybe we can be smarter about allocating / sizing the files? Maybe
> instead of slices of constant size we could/should make them larger,
> similarly to what log-structured storage does.
> 
> For example we might double the number of batches a file represents, so
> the first file would be for batch 1, the next one for batches 2+3, then
> 4+5+6+7, then 8-15, then 16-31, ...
> 
> We should have some model calculating (i) amount of disk space we would
> need for the spilled files, and (ii) the amount of I/O we do when
> writing the data between temp files.

And:

* what about Robert's discussion on uneven batch distribution? Why is it
  ignored? Maybe there were some IRL or off-list discussions? Or did I miss
  some mails?
* what about dealing with all the normal batches first, then revamping the
  skewed ones in the freshly emptied batches, splitting them if needed, then
  rinse & repeat? At some point, we would probably still need something like
  slicing and/or BNLJ though...
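
Incidentally, the doubling-slices mapping quoted above (a file for batch 1,
then 2+3, then 4-7, ...) is cheap to compute; a sketch:

#include "port/pg_bitutils.h"

/* batch 1 -> file 0, batches 2-3 -> file 1, 4-7 -> file 2, 8-15 -> file 3 */
static int
batch_to_file(uint32 batchno)
{
	Assert(batchno >= 1);
	return pg_leftmost_one_pos32(batchno);	/* floor(log2(batchno)) */
}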

> let's revive the rebalance and/or spilling patches. Imperfect but better than
> nothing.

+1 for rebalance. I'm not even sure it could make it to 16 as we are running
out of time, but it's worth trying as it's the closest one that could be
reviewed and marked ready-for-committer.

I might lack ambition, but the spilling patches seem too much to make it for
16. They seem to belong with the other larger patches/ideas (patches 4 and 5
in my sum-up). But this is just my humble feeling.

> And then in the end we can talk about if/what can be backpatched.
> 
> FWIW I don't think there's a lot of rush, considering this is clearly a
> matter for PG17. So the summer CF at the earliest, people are going to
> be busy until then.

100% agree, there's no rush for patches 3, 4 ... and 5.

Thanks!




Re: Kerberos delegation support in libpq and postgres_fdw

2023-03-20 Thread Stephen Frost
Greetings,

* Stephen Frost (sfr...@snowman.net) wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
> > On Mon, Sep 19, 2022 at 02:05:39PM -0700, Jacob Champion wrote:
> > > It's not prevented, because a password is being used. In my tests I'm
> > > connecting as an unprivileged user.
> > > 
> > > You're claiming that the middlebox shouldn't be doing this. If this new
> > > default behavior were the historical behavior, then I would have agreed.
> > > But the cat's already out of the bag on that, right? It's safe today.
> > > And if it's not safe today for some other reason, please share why, and
> > > maybe I can work on a patch to try to prevent people from doing it.
> > 
> > Please note that this has been marked as returned with feedback in the
> > current CF, as this has remained unanswered for a bit more than three
> > weeks.
> 
> There's some ongoing discussion about how to handle outbound connections
> from the server ending up picking up credentials from the server's
> environment (that really shouldn't be allowed unless specifically asked
> for..), that's ultimately an independent change from what this patch is
> doing.

That got committed, which is great, though it didn't go quite as far as
I had been hoping regarding dealing with outbound connections from the
server. Perhaps we should make it clear at least for postgres_fdw that
it might be good for administrators to explicitly say which options are
allowed for a given user-map when it comes to how authentication is
done to the remote server?  Seems like mostly a documentation
improvement, I think?  Or should we have some special handling around
that option for postgres_fdw/dblink?

> Here's an updated version which does address Robert's concerns around
> having this disabled by default and having options on both the server
> and client side saying if it is to be enabled or not.  Also added to
> pg_stat_gssapi a field that indicates if credentials were proxied or not
> and made some other improvements and added additional regression tests
> to test out various combinations.

I've done some self-review and also reworked how the security checks are
done to be sure that we're not ending up pulling credentials from the
environment (with added regression tests to check for it too).  If
there's remaining concerns around that, please let me know.  Of course,
other review would be great also.  Presently though:

- Rebased up to today
- Requires explicitly being enabled on client and server
- Authentication to a remote server via dblink or postgres_fdw with
  GSSAPI requires that credentials were proxied by the client to the
  server, except if the superuser set 'password_required' to false on
  the postgres_fdw (which has lots of caveats around it in the
  documentation because it's inherently un-safe to do).
- Includes updated documentation
- Quite a few additional regression tests to check for unrelated
  credentials coming from the environment in either cases where
  credentials have been proxied and in cases where they haven't.
- Only changes to existing regression tests for dblink/postgres_fdw are
  in the error message wording updates.

Thanks!

Stephen
From 03a799699b42d3f9d9ed017bd448f606b7a2761d Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 28 Feb 2022 20:17:55 -0500
Subject: [PATCH] Add support for Kerberos credential delegation

Support GSSAPI/Kerberos credentials being delegated to the server by a
client.  With this, a user authenticating to PostgreSQL using Kerberos
(GSSAPI) credentials can choose to delegate their credentials to the
PostgreSQL server (which can choose to accept them, or not), allowing
the server to then use those delegated credentials to connect to
another service, such as with postgres_fdw or dblink or theoretically
any other service which is able to be authenticated using Kerberos.

Both postgres_fdw and dblink are changed to allow non-superuser
password-less connections but only when GSSAPI credentials have been
proxied to the server by the client and GSSAPI is used to authenticate
to the remote system.

Authors: Stephen Frost, Peifeng Qiu
Discussion: https://postgr.es/m/co1pr05mb8023cc2cb575e0faad7df4f8a8...@co1pr05mb8023.namprd05.prod.outlook.com
---
 contrib/dblink/dblink.c   | 118 +++---
 contrib/dblink/expected/dblink.out|   4 +-
 contrib/postgres_fdw/connection.c |  68 +++-
 .../postgres_fdw/expected/postgres_fdw.out|  19 +-
 contrib/postgres_fdw/option.c |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql |   3 +-
 doc/src/sgml/config.sgml  |  17 +
 doc/src/sgml/dblink.sgml  |   5 +-
 doc/src/sgml/libpq.sgml   |  41 +++
 doc/src/sgml/monitoring.sgml  |   9 +
 doc/src/sgml/postgres-fdw.sgml|   5 +-
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/foreign/foreign.c |   1 +
 src/backend/libpq/auth.c  

RE: Data is copied twice when specifying both child and parent table in publication

2023-03-20 Thread Hayato Kuroda (Fujitsu)
Dear Wang,

I have tested multilevel partitions, and it worked well.
Here are my comments for v18-0001.

01. pg_get_publication_tables

```
+   ListCell   *lc;
```

This definition can be moved inside the "for (i = 0; i < nelems; i++)" loop.

02. pg_get_publication_tables

```
-	 * If the publication publishes partition changes via their
-	 * respective root partitioned tables, we must exclude partitions
-	 * in favor of including the root partitioned tables. Otherwise,
-	 * the function could return both the child and parent tables
-	 * which could cause data of the child table to be
-	 * double-published on the subscriber side.
+	 * Publications support partitioned tables. If
+	 * publish_via_partition_root is false, all changes are replicated
+	 * using leaf partition identity and schema, so we only need those.
+	 * Otherwise, get the partitioned table itself.
```

The comments can be moved inside the "else" branch.

03. pg_get_publication_tables

```
+   pfree(elems);
```

Only elems is pfree()'d here, but how about other variables like pub_elem and 
pub_elem_tables?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

2023-03-20 Thread Amit Kapila
On Mon, Mar 20, 2023 at 2:58 PM Önder Kalacı  wrote:
>
>
> I applied all patches for all the versions, and re-run the subscription tests,
> all looks good to me.
>

LGTM. I'll push this tomorrow unless there are more comments.

-- 
With Regards,
Amit Kapila.




RE: Initial Schema Sync for Logical Replication

2023-03-20 Thread Kumar, Sachin
Hi Amit,

> From: Amit Kapila 
> > > Hi,
> > >
> > > I have a couple of questions.
> > >
> > > Q1.
> > >
> > > What happens if the subscriber already has some tables present? For
> > > example, I did not see the post saying anything like "Only if the
> > > table does not already exist then it will be created".
> > >
> > My assumption was that if the subscriber is doing initial schema sync, it
> > does not have any conflicting database objects.
> >
> 
> Can't we simply error out in such a case with "obj already exists"?
> This would be similar to how we deal with conflicting rows with unique/primary
> keys.
Right, this is the default behaviour. We will run pg_restore with
--single-transaction, so if we get an error while executing a CREATE TABLE,
the whole pg_restore will fail and the user will be notified.
Regards
Sachin


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-20 Thread John Naylor
On Mon, Mar 20, 2023 at 12:25 PM Masahiko Sawada 
wrote:
>
> On Fri, Mar 17, 2023 at 4:49 PM Masahiko Sawada 
wrote:
> >
> > On Fri, Mar 17, 2023 at 4:03 PM John Naylor
> >  wrote:
> > >
> > > On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada 
wrote:
> > > >
> > > > On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> > > >  wrote:
> > > > >
> > > > > I wrote:
> > > > >
> > > > > > > > Since the block-level measurement is likely overestimating
quite a bit, I propose to simply reverse the order of the actions here,
effectively reporting progress for the *last page* and not the current one:
First update progress with the current memory usage, then add tids for this
page. If this allocated a new block, only a small bit of that will be
written to. If this block pushes it over the limit, we will detect that up
at the top of the loop. It's kind of like our earlier attempts at a "fudge
factor", but simpler and less brittle. And, as far as OS pages we have
actually written to, I think it'll effectively respect the memory limit, at
least in the local mem case. And the numbers will make sense.
> > >
> > > > > I still like my idea at the top of the page -- at least for
vacuum and m_w_m. It's still not completely clear if it's right but I've
got nothing better. It also ignores the work_mem issue, but I've given up
anticipating all future cases at the moment.
> > >
> > > > IIUC you suggested measuring memory usage by tracking how much
memory
> > > > chunks are allocated within a block. If your idea at the top of the
> > > > page follows this method, it still doesn't deal with the point
Andres
> > > > mentioned.
> > >
> > > Right, but that idea was orthogonal to how we measure memory use, and
in fact mentions blocks specifically. The re-ordering was just to make sure
that progress reporting didn't show current-use > max-use.
> >
> > Right. I still like your re-ordering idea. It's true that most of the
> > area of the last allocated block before heap scanning stops is not
> > actually used yet. I'm guessing we can just check if the context
> > memory has gone over the limit. But I'm concerned it might not work
> > well in systems where overcommit memory is disabled.
> >
> > >
> > > However, the big question remains DSA, since a new segment can be as
large as the entire previous set of allocations. It seems it just wasn't
designed for things where memory growth is unpredictable.
>
> aset.c also has a similar characteristic; allocates an 8K block upon
> the first allocation in a context, and doubles that size for each
> successive block request. But we can specify the initial block size
> and max blocksize. This made me think of another idea to specify both
> to DSA and both values are calculated based on m_w_m. For example, we

That's an interesting idea, and the analogous behavior to aset could be a
good thing for readability and maintainability. Worth seeing if it's
workable.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: logical decoding and replication of sequences, take 2

2023-03-20 Thread Amit Kapila
On Mon, Mar 20, 2023 at 5:13 PM Tomas Vondra
 wrote:
>
> On 3/20/23 12:00, Amit Kapila wrote:
> > On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra
> >  wrote:
> >>
> >>
> >> I don't understand why we'd need WAL from before the slot is created,
> >> which happens before copy_sequence so the sync will see a more recent
> >> state (reflecting all changes up to the slot LSN).
> >>
> >
> > Imagine the following sequence of events:
> > 1. Operation on a sequence seq-1 which requires WAL. Say, this is done
> > at LSN 1000.
> > 2. Some other random operations on unrelated objects. This would
> > increase LSN to 2000.
> > 3. Create a slot that uses current LSN 2000.
> > 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then
> > you will use LSN 1000 as a starting point to start replication in
> > sequence sync worker.
> >
> > It is quite possible that WAL from LSN 1000 may not be present. Now,
> > it may be possible that we use the slot's LSN in this case but
> > currently, it may not be possible without some changes in the slot
> > machinery. Even, if we somehow solve this, we have the below problem
> > where we can miss some concurrent activity.
> >
>
> I think the question is what would be the WAL-requiring operation at LSN
> 1000. If it's just regular nextval(), then we *will* see it during
> copy_sequence - sequences are not transactional in the MVCC sense.
>
> If it's an ALTER SEQUENCE, I guess it might create a new relfilenode,
> and then we might fail to apply this - that'd be bad.
>
> I wonder if we'd allow actually discarding the WAL while building the
> consistent snapshot, though.
>

No, as soon as we reserve the WAL location, we update the slot's
minLSN (replicationSlotMinLSN) which would prevent the required WAL
from being removed.

> You're however right we can't just decide
> this based on LSN, we'd probably need to compare the relfilenodes too or
> something like that ...
>
> >> I think the only "issue" are the WAL records after the slot LSN, or more
> >> precisely deciding which of the decoded changes to apply.
> >>
> >>
> >>> Now, for the second idea which is to directly use
> >>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that
> >>> the changes covered by in-progress transactions like the one with
> >>> Alter Sequence I have given example would be streamed later after the
> >>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
> >>> could be an LSN after the LSN associated with Alter Sequence but
> >>> before the corresponding xact's commit.
> >>
> >> Yeah, I think you're right - the locking itself is not sufficient to
> >> prevent this ordering of operations. copy_sequence would have to lock
> >> the sequence exclusively, which seems a bit disruptive.
> >>
> >
> > Right, that doesn't sound like a good idea.
> >
>
> Although, maybe we could use a less strict lock level? I mean, one that
> allows nextval() to continue, but would conflict with ALTER SEQUENCE.
>

I don't know if that is a good idea but are you imagining a special
interface/mechanism just for logical replication because as far as I
can see you have used SELECT to fetch the sequence values?

-- 
With Regards,
Amit Kapila.




Question: Do we have a rule to use "<productname>PostgreSQL</productname>" and "PostgreSQL" separately?

2023-03-20 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

While checking the documentation, I found that one line notes our product as
"<productname>PostgreSQL</productname>", whereas another line notes it as just
"PostgreSQL". For example, in bgworker.sgml:

```
PostgreSQL can be extended to run user-supplied code in separate processes.
...
These processes are attached to <productname>PostgreSQL</productname>'s
shared memory area and have the option to connect to databases internally; ...
```

It seems that the <productname> tag is not used when the string is used as a
link or title, but I cannot find any other rule for using it. Could you please
point me to discussions or a wiki page about it?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: allow_in_place_tablespaces vs. pg_basebackup

2023-03-20 Thread Robert Haas
On Thu, Mar 9, 2023 at 4:15 PM Robert Haas  wrote:
> Now that I'm done grumbling, here's a patch.

Anyone want to comment on this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences, take 2

2023-03-20 Thread Tomas Vondra
On 3/20/23 12:00, Amit Kapila wrote:
> On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra
>  wrote:
>>
>>
>> On 3/20/23 04:42, Amit Kapila wrote:
>>> On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra
>>>  wrote:

 On 3/18/23 06:35, Amit Kapila wrote:
> On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
>  wrote:
>>
>> ...
>>
>> Clearly, for sequences we can't quite rely on snapshots/slots, we need
>> to get the LSN to decide what changes to apply/skip from somewhere else.
>> I wonder if we can just ignore the queued changes in tablesync, but I
>> guess not - there can be queued increments after reading the sequence
>> state, and we need to apply those. But maybe we could use the page LSN
>> from the relfilenode - that should be the LSN of the last WAL record.
>>
>> Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
>> use to read the sequence state ...
>>
>
> What if some Alter Sequence is performed before the copy starts and
> after the copy is finished, the containing transaction rolled back?
> Won't it copy something which shouldn't have been copied?
>

 That shouldn't be possible - the alter creates a new relfilenode and
 it's invisible until commit. So either it gets committed (and then
 replicated), or it remains invisible to the SELECT during sync.

>>>
>>> Okay, however, we need to ensure that such a change will later be
>>> replicated and also need to ensure that the required WAL doesn't get
>>> removed.
>>>
>>> Say, if we use your first idea of page LSN from the relfilenode, then
>>> how do we ensure that the corresponding WAL doesn't get removed when
>>> later the sync worker tries to start replication from that LSN? I am
>>> imagining here the sync_sequence_slot will be created before
>>> copy_sequence but even then it is possible that the sequence has not
>>> been updated for a long time and the LSN location will be in the past
>>> (as compared to the slot's LSN) which means the corresponding WAL
>>> could be removed. Now, here we can't directly start using the slot's
>>> LSN to stream changes because there is no correlation of it with the
>>> LSN (page LSN of sequence's relfilenode) where we want to start
>>> streaming.
>>>
>>
>> I don't understand why we'd need WAL from before the slot is created,
>> which happens before copy_sequence so the sync will see a more recent
>> state (reflecting all changes up to the slot LSN).
>>
> 
> Imagine the following sequence of events:
> 1. Operation on a sequence seq-1 which requires WAL. Say, this is done
> at LSN 1000.
> 2. Some other random operations on unrelated objects. This would
> increase LSN to 2000.
> 3. Create a slot that uses current LSN 2000.
> 4. Copy sequence seq-1 where you will get the LSN value as 1000. Then
> you will use LSN 1000 as a starting point to start replication in
> sequence sync worker.
> 
> It is quite possible that WAL from LSN 1000 may not be present. Now,
> it may be possible that we use the slot's LSN in this case but
> currently, it may not be possible without some changes in the slot
> machinery. Even, if we somehow solve this, we have the below problem
> where we can miss some concurrent activity.
> 

I think the question is what would be the WAL-requiring operation at LSN
1000. If it's just regular nextval(), then we *will* see it during
copy_sequence - sequences are not transactional in the MVCC sense.

If it's an ALTER SEQUENCE, I guess it might create a new relfilenode,
and then we might fail to apply this - that'd be bad.

I wonder if we'd allow actually discarding the WAL while building the
consistent snapshot, though. You're however right we can't just decide
this based on LSN, we'd probably need to compare the relfilenodes too or
something like that ...

>> I think the only "issue" are the WAL records after the slot LSN, or more
>> precisely deciding which of the decoded changes to apply.
>>
>>
>>> Now, for the second idea which is to directly use
>>> pg_current_wal_insert_lsn(), I think we won't be able to ensure that
>>> the changes covered by in-progress transactions like the one with
>>> Alter Sequence I have given example would be streamed later after the
>>> initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
>>> could be an LSN after the LSN associated with Alter Sequence but
>>> before the corresponding xact's commit.
>>
>> Yeah, I think you're right - the locking itself is not sufficient to
>> prevent this ordering of operations. copy_sequence would have to lock
>> the sequence exclusively, which seems a bit disruptive.
>>
> 
> Right, that doesn't sound like a good idea.
> 

Although, maybe we could use a less strict lock level? I mean, one that
allows nextval() to continue, but would conflict with ALTER SEQUENCE.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls

2023-03-20 Thread Jeevan Ladhe
Thanks, Ashutosh for the reply.

> I think those messages are useful when debugging logical replication
> problems (imagine missing transaction or inconsistent data between
> publisher and subscriber). I don't think pg_logical_slot_get_changes()
> or pg_logical_slot_peek_changes() are expected to be called frequently
> in a loop.


Yeah right. But can you please shed some light on when these functions
should be called, or are they used only for testing purposes?

> Instead you should open a replication connection to
> continue to receive logical changes ... forever.
>

Yes, this is what I have decided to resort to now.

> Why do you need to call pg_logical_slot_peek_changes() and
> pg_logical_slot_get_changes() frequently?
>

I was just playing around with something for logical replication and thought
of doing this quick test where, at every time interval, I read the changes
using pg_logical_slot_peek_changes(), make sure to consume them to a
consistent state, and only then use pg_logical_slot_get_changes() to advance
the slot.

Regards,
Jeevan Ladhe


Re: logical decoding and replication of sequences, take 2

2023-03-20 Thread Amit Kapila
On Mon, Mar 20, 2023 at 1:49 PM Tomas Vondra
 wrote:
>
>
> On 3/20/23 04:42, Amit Kapila wrote:
> > On Sat, Mar 18, 2023 at 8:49 PM Tomas Vondra
> >  wrote:
> >>
> >> On 3/18/23 06:35, Amit Kapila wrote:
> >>> On Sat, Mar 18, 2023 at 3:13 AM Tomas Vondra
> >>>  wrote:
> 
>  ...
> 
>  Clearly, for sequences we can't quite rely on snapshots/slots, we need
>  to get the LSN to decide what changes to apply/skip from somewhere else.
>  I wonder if we can just ignore the queued changes in tablesync, but I
>  guess not - there can be queued increments after reading the sequence
>  state, and we need to apply those. But maybe we could use the page LSN
>  from the relfilenode - that should be the LSN of the last WAL record.
> 
>  Or maybe we could simply add pg_current_wal_insert_lsn() into the SQL we
>  use to read the sequence state ...
> 
> >>>
> >>> What if some Alter Sequence is performed before the copy starts and
> >>> after the copy is finished, the containing transaction rolled back?
> >>> Won't it copy something which shouldn't have been copied?
> >>>
> >>
> >> That shouldn't be possible - the alter creates a new relfilenode and
> >> it's invisible until commit. So either it gets committed (and then
> >> replicated), or it remains invisible to the SELECT during sync.
> >>
> >
> > Okay, however, we need to ensure that such a change will later be
> > replicated and also need to ensure that the required WAL doesn't get
> > removed.
> >
> > Say, if we use your first idea of page LSN from the relfilenode, then
> > how do we ensure that the corresponding WAL doesn't get removed when
> > later the sync worker tries to start replication from that LSN? I am
> > imagining here the sync_sequence_slot will be created before
> > copy_sequence but even then it is possible that the sequence has not
> > been updated for a long time and the LSN location will be in the past
> > (as compared to the slot's LSN) which means the corresponding WAL
> > could be removed. Now, here we can't directly start using the slot's
> > LSN to stream changes because there is no correlation of it with the
> > LSN (page LSN of sequence's relfilenode) where we want to start
> > streaming.
> >
>
> I don't understand why we'd need WAL from before the slot is created,
> which happens before copy_sequence so the sync will see a more recent
> state (reflecting all changes up to the slot LSN).
>

Imagine the following sequence of events:
1. Operation on a sequence seq-1 which requires WAL. Say, this is done
at LSN 1000.
2. Some other random operations on unrelated objects. This would
increase LSN to 2000.
3. Create a slot that uses current LSN 2000.
4. Copy sequence seq-1 where you will get the LSN value as 1000. Then
you will use LSN 1000 as a starting point to start replication in
sequence sync worker.

It is quite possible that WAL from LSN 1000 may not be present. Now,
it may be possible that we use the slot's LSN in this case but
currently, it may not be possible without some changes in the slot
machinery. Even, if we somehow solve this, we have the below problem
where we can miss some concurrent activity.

> I think the only "issue" are the WAL records after the slot LSN, or more
> precisely deciding which of the decoded changes to apply.
>
>
> > Now, for the second idea which is to directly use
> > pg_current_wal_insert_lsn(), I think we won't be able to ensure that
> > the changes covered by in-progress transactions like the one with
> > Alter Sequence I have given example would be streamed later after the
> > initial copy. Because the LSN returned by pg_current_wal_insert_lsn()
> > could be an LSN after the LSN associated with Alter Sequence but
> > before the corresponding xact's commit.
>
> Yeah, I think you're right - the locking itself is not sufficient to
> prevent this ordering of operations. copy_sequence would have to lock
> the sequence exclusively, which seems a bit disruptive.
>

Right, that doesn't sound like a good idea.

-- 
With Regards,
Amit Kapila.




Re: Allow logical replication to copy tables in binary format

2023-03-20 Thread Melih Mutlu
Hi,

Please see the attached patch.

vignesh C, 18 Mar 2023 Sat, 12:03 wrote:

> On Fri, 17 Mar 2023 at 17:55, Melih Mutlu  wrote:
> 1) Currently we refer the link to the beginning of create subscription
> page, this can be changed to refer to binary option contents in create
> subscription:
>

Done.


> 2) Running pgperltidy shows the test script 014_binary.pl could be
> slightly improved as in the attachment.
>

Couldn't apply the patch you attached but I ran pgperltidy myself and I
guess the result should be similar.


Peter Smith, 18 Mar 2023 Sat, 12:41 wrote:

> Commit message
>
> 1.
> Binary copy is supported for v16 or later.
>

Done as you suggested.

Amit Kapila, 18 Mar 2023 Sat, 13:11 wrote:

> On Sat, Mar 18, 2023 at 3:11 PM Peter Smith  wrote:
> >
> > SUGGESTION
> > If the publisher is v16 or later, then any initial table
> > synchronization will use the same format as specified by the
> > subscription binary mode. If the publisher is before v16, then any
> > initial table synchronization will use text format regardless of the
> > subscription binary mode.
> >
>
> I agree that the previous comment should be updated but I would prefer
> something along the lines: "Prior to v16, initial table
> synchronization will use text format even if the binary option is
> enabled for a subscription."
>

Done.

Amit Kapila, 20 Mar 2023 Mon, 05:13 wrote:

> On Mon, Mar 20, 2023 at 3:37 AM Peter Smith  wrote:
> >
> > There are a couple of TAP tests where the copy binary is expected to
> > fail. And when it fails, you do binary=false (changing the format back
> > to 'text') so the test is then expected to be able to proceed.
> >
> > I don't know if this happens in practice, but IIUC in theory, if the
> > timing is extremely bad, the tablesync could relaunch in binary mode
> > multiple times (any fail multiple times?) before your binary=false
> > change takes effect.
> >
> > So, I was wondering if it could help to use the subscription
> > 'disable_on_error=true' parameter for those cases so that the
> > tablesync won't needlessly attempt to relaunch until you have set
> > binary=false and then re-enabled the subscription.
> >
>
> +1. That would make tests more reliable.
>

Done.

Hayato Kuroda (Fujitsu), 20 Mar 2023 Mon, 07:13 wrote:

> Dear Melih,
>
> Thank you for updating the patch.
> I checked your added description about initial data sync and I think it's
> OK.
>
> Few minor comments:
>

Fixed both comments.

Thanks,
-- 
Melih Mutlu
Microsoft


v18-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry

2023-03-20 Thread Drouvot, Bertrand

Hi,

On 3/20/23 12:43 AM, Michael Paquier wrote:

At the
end, documenting both still sounds like the best move to me.


Agree.

Please find attached v1-0001-pg_stat_get_xact_blocks_fetched-and_hit-doc.patch 
doing so.

I did not put the exact same wording as the one being removed in ddfc2d9, as:

- For pg_stat_get_xact_blocks_hit(), I think it's better to stay closer to the
pg_statio_all_tables.heap_blks_hit definition.

- For pg_stat_get_xact_blocks_fetched(), I think that using "buffer" is better
(than "block") since in the end it's related to pgstat_count_buffer_read().

In the end there is a wording choice to be made between "block" and "buffer"
for both. Indeed, their counters get incremented in "buffer" macros while
being retrieved in those "blocks" functions.

"Buffer" sounds more appropriate to me, so the attached has been done that way.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d0..fba8e892cf 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5697,6 +5697,32 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i

   
 
+  
+   
+
+ pg_stat_get_xact_blocks_fetched
+
+pg_stat_get_xact_blocks_fetched ( oid )
+bigint
+   
+   
+Returns the number of buffer fetches for table or index, in the current transaction
+   
+  
+
+  
+   
+
+ pg_stat_get_xact_blocks_hit
+
+pg_stat_get_xact_blocks_hit ( oid )
+bigint
+   
+   
+Returns the number of buffer hits for table or index, in the current transaction
+   
+  
+
   

 


Re: meson documentation build open issues

2023-03-20 Thread Peter Eisentraut

On 20.03.23 03:33, Andres Freund wrote:

I did end up getting stuck when hacking on this, and ended up adding css
support for nochunk and support for the website style for htmlhelp and
nochunk, as well as obsoleting the need for copying the css files... But
perhaps that's a bit too much.

Updated set of patches attached. This one works in older meson versions too
and adds install-world and install-quiet targets.


Oh, this patch set grew quite quickly. ;-)

[PATCH v2 1/8] meson: rename html_help target to htmlhelp

This is obvious.


[PATCH v2 5/8] docs: html: copy images to output as part of xslt build

Making the XSLT stylesheets do the copying has some appeal.  I think it 
would only work for SVG (or other XML) files, which I guess is okay, but 
maybe the templates should have a filter on format="SVG" or something. 
Also, this copying actually modifies the files in some XML-equivalent 
way.  Also okay, I think, but worth noting.


Not sure why you removed this comment

-

since the code still exists.


[PATCH v2 6/8] wip: docs: copy or inline css

This seems pretty complicated compared to just copying a file?





Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)

2023-03-20 Thread Josef Šimánek
On Mon, 20 Mar 2023 at 11:24, Matthias van de Meent wrote:
>
> On Mon, 20 Mar 2023 at 11:11, Tomas Vondra
>  wrote:
> >
> > On 3/14/23 15:41, Matthias van de Meent wrote:
> > > On Tue, 14 Mar 2023 at 14:49, Tomas Vondra
> > >  wrote:
> > >>
> > >>> ...
> > >
> > >> If you agree with these changes, I'll get it committed.
> > >
> > > Yes, thanks!
> > >
> >
> > I've tweaked the patch per the last round of comments, cleaned up the
> > commit message a bit (it still talked about unused bit in tuple header
> > and so on), and pushed it.
> >
> > Thanks for fixing the issues that got the patch reverted last year!
>
> Thanks for helping get this in!

Thanks for fixing the problems!

>
> Kind regards,
>
> Matthias van de Meent.
>
>




Re: server log inflates due to pg_logical_slot_peek_changes/pg_logical_slot_get_changes calls

2023-03-20 Thread Ashutosh Bapat
On Sun, Mar 19, 2023 at 4:59 PM Jeevan Ladhe  wrote:
>
> Hi,
>
> I observed absurd behaviour while using pg_logical_slot_peek_changes()
> and pg_logical_slot_get_changes(). Whenever any of these two functions
> are called to read the changes using a decoder plugin, the following
> messages are printed in the log for every single such call.
>
> 2023-03-19 16:36:06.040 IST [30099] LOG:  starting logical decoding for slot 
> "test_slot1"
> 2023-03-19 16:36:06.040 IST [30099] DETAIL:  Streaming transactions 
> committing after 0/851DFD8, reading WAL from 0/851DFA0.
> 2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM 
> pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version', '2');
> 2023-03-19 16:36:06.040 IST [30099] LOG:  logical decoding found consistent 
> point at 0/851DFA0
> 2023-03-19 16:36:06.040 IST [30099] DETAIL:  There are no running 
> transactions.
> 2023-03-19 16:36:06.040 IST [30099] STATEMENT:  SELECT data FROM 
> pg_logical_slot_get_changes('test_slot1', NULL, NULL, 'format-version', '2');
>
> This log is printed on every single call to peek/get functions and bloats
> the server log file by a huge amount when called in the loop for reading
> the changes.
>
> IMHO, printing the message every time we create the context for
> decoding a slot using pg_logical_slot_get_changes() seems like overkill.
> Wondering if instead of LOG messages, should we mark these as
> DEBUG1 in SnapBuildFindSnapshot() and CreateDecodingContext()
> respectively? I can produce a patch for the same if we agree.
>

I think those messages are useful when debugging logical replication
problems (imagine missing transaction or inconsistent data between
publisher and subscriber). I don't think pg_logical_slot_get_changes()
or pg_logical_slot_peek_changes() are expected to be called frequently
in a loop. Instead you should open a replication connection to
continue to receive logical changes ... forever.

Why do you need to call pg_logical_slot_peek_changes() and
pg_logical_slot_get_changes() frequently?

-- 
Best Wishes,
Ashutosh Bapat




Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)

2023-03-20 Thread Matthias van de Meent
On Mon, 20 Mar 2023 at 11:11, Tomas Vondra
 wrote:
>
> On 3/14/23 15:41, Matthias van de Meent wrote:
> > On Tue, 14 Mar 2023 at 14:49, Tomas Vondra
> >  wrote:
> >>
> >>> ...
> >
> >> If you agree with these changes, I'll get it committed.
> >
> > Yes, thanks!
> >
>
> I've tweaked the patch per the last round of comments, cleaned up the
> commit message a bit (it still talked about unused bit in tuple header
> and so on), and pushed it.
>
> Thanks for fixing the issues that got the patch reverted last year!

Thanks for helping get this in!


Kind regards,

Matthias van de Meent.




Re: Data is copied twice when specifying both child and parent table in publication

2023-03-20 Thread Amit Kapila
On Mon, Mar 20, 2023 at 1:02 PM Peter Smith  wrote:
>
>
> ==
> src/include/catalog/pg_proc.dat
>
> 4.
> +{ oid => '6119',
> +  descr => 'get information of the tables in the given publication array',
>
> Should that be worded to make it clearer that the "publication array"
> is really an "array of publication names"?
>

I don't know how important it is to say that the array is an array of
publication names, but the current description can be improved. How
about something like: "get information of the tables that are part of
the specified publications"?
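
Applied to the entry quoted above, only the description line would
change (a sketch; the rest of the entry stays as-is):

{ oid => '6119',
  descr => 'get information of the tables that are part of the specified publications',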

A few other comments:
=
1.
  foreach(lc2, ancestors)
  {
  Oid ancestor = lfirst_oid(lc2);
+ ListCell   *lc3;

  /* Check if the parent table exists in the published table list. */
- if (list_member_oid(relids, ancestor))
+ foreach(lc3, table_infos)
  {
- skip = true;
- break;
+ Oid relid = ((published_rel *) lfirst(lc3))->relid;
+
+ if (relid == ancestor)
+ {
+ skip = true;
+ break;
+ }
  }
+
+ if (skip)
+ break;
  }

- if (!skip)
- result = lappend_oid(result, relid);
+ if (skip)
+ table_infos = foreach_delete_current(table_infos, lc);

The usage of the skip flag looks a bit ugly to me. Can we move the code
for the inner loop to a separate function (like
is_ancestor_member_tableinfos()) and remove the current cell if it
returns true?
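
A rough sketch of what that could look like (the name follows the
suggestion above; this is not code from the patch):

/*
 * Return true if any OID in 'ancestors' appears as the relid of an
 * entry in 'table_infos' (a list of published_rel pointers).
 */
static bool
is_ancestor_member_tableinfos(List *ancestors, List *table_infos)
{
	ListCell   *lc;

	foreach(lc, ancestors)
	{
		Oid			ancestor = lfirst_oid(lc);
		ListCell   *lc2;

		foreach(lc2, table_infos)
		{
			if (((published_rel *) lfirst(lc2))->relid == ancestor)
				return true;
		}
	}

	return false;
}

The caller would then reduce to:

	if (is_ancestor_member_tableinfos(ancestors, table_infos))
		table_infos = foreach_delete_current(table_infos, lc);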

2.
  * Filter out the partitions whose parent tables were also specified in
  * the publication.
  */
-static List *
-filter_partitions(List *relids)
+static void
+filter_partitions(List *table_infos)

The comment atop filter_partitions is no longer valid. Can we slightly
change it to: "Filter out the partitions whose parent tables are also
present in the list."?
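
That is, the header would become something like:

/*
 * Filter out the partitions whose parent tables are also present in
 * the list.
 */
static void
filter_partitions(List *table_infos)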

3.
-# Note: We create two separate tables, not a partitioned one, so that we can
-# easily identity through which relation were the changes replicated.
+# Note: We only create one table (tab4) here. We specified
+# publish_via_partition_root = true (see pub_all and pub_lower_level above), so
+# all data will be replicated to that table.
 $node_subscriber2->safe_psql('postgres',
  "CREATE TABLE tab4 (a int PRIMARY KEY)");
-$node_subscriber2->safe_psql('postgres',
- "CREATE TABLE tab4_1 (a int PRIMARY KEY)");

I am not sure if it is a good idea to remove tab4_1 here. It is
testing something different, as mentioned in the comments. Also, I
don't see any data in tab4 for the initial sync, so I am not sure
whether this tests the behavior changed by this patch.

4.
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -959,7 +959,8 @@ $node_publisher->safe_psql(
  CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO (10);
  CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO (20);

- CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
(publish_via_partition_root = true);
+ CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
(publish_via_partition_root = true);
+ CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
(publish_via_partition_root = true);

  -- initial data
  INSERT INTO test_root VALUES (1, 2, 3);
@@ -968,7 +969,7 @@ $node_publisher->safe_psql(

 $node_subscriber->safe_psql(
  'postgres', qq(
- CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
pub_root_true;
+ CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
pub_root_true_1, pub_root_true_2;

It is not clear to me what exactly you want to test here. Please add
some comments.

5. I think you can merge the 0001 and 0003 patches.

Apart from the above, attached is a patch that adjusts some of the
comments in your patch.

-- 
With Regards,
Amit Kapila.
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 82941a0ce7..44fc371d2c 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -1949,22 +1949,15 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 
initStringInfo();
 
-   /*
-* Get namespace, relname and column list (if supported) of the tables
-* belonging to the specified publications.
-*
-* Get the list of tables from the publisher. The partition table whose
-* ancestor is also in this list will be ignored, otherwise the initial
-* data in the partition table would be replicated twice.
-*
-* From version 16, the parameter of the function
-* pg_get_publication_tables can be an array of publications. The
-* partition table whose ancestor is also published in this publication
-* array will be filtered out in this function.
-*/
+   /* Get the list of tables from the publisher. */
if (server_version >= 16)
{
/*
+* From version 16, we allowed passing multiple publications to the
+* function pg_get_publication_tables. This helped to filter out the
+
