RE: Replica Identity check of partition table on subscriber

2022-06-16 Thread shiy.f...@fujitsu.com
On Fri Jun 17, 2022 11:06 AM shiy.f...@fujitsu.com wrote:
> 
> Attached the new version of patch set. I also moved the partitioned table check
> in logicalrep_rel_mark_updatable() to check_relation_updatable() as discussed
> [2].
> 

Attached back-branch patches of the first patch.

Regards,
Shi yu


v10-pg13-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch

v10-pg14-0001-Fix-partition-table-s-RI-checking-on-the-subsc_patch

v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch


Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread Andres Freund
Hi,

On 2022-06-17 16:53:31 +1200, David Rowley wrote:
> On Fri, 17 Jun 2022 at 15:33, Peter Geoghegan  wrote:
> > Have you tried this with the insert benchmark [1]?
> 
> I was mostly focusing on the performance of the hashed saop feature
> after having removed the additional fields that pushed ExprEvalStep
> over 64 bytes in 14.
> 
> I agree it would be good to do further benchmarking to see if there's
> anything else that's snuck into 15 that's slowed that benchmark down,
> but we can likely work on that after we get the ExprEvalStep size back
> to 64 bytes again.

I did reproduce a regression between 14 and 15, using both pgbench -Mprepared
-S (scale 1) and TPC-H Q01 (scale 5). It is between 7-10% - not good, particularly
given that it hasn't been found until now. Fixing the json size issue gets that
down to ~2%; I'm not sure yet what the remainder is caused by.
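(For anyone wanting to reproduce the read-only pgbench case, roughly the
following should do; the scale, duration and client counts here are assumed,
not copied from the run above:

pgbench -i -s 1 postgres
pgbench -n -S -M prepared -c 1 -j 1 -T 60 postgres
)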

Greetings,

Andres Freund
From 8c1ec2f6dce304ae600177a4b5bdb936e8fb65b0 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 16 Jun 2022 18:28:39 -0700
Subject: [PATCH v1 1/2] wip: fix EEOP_HASHED_SCALARARRAYOP state width.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/20220616233130.rparivafipt6d...@alap3.anarazel.de
Backpatch:
---
 src/include/executor/execExpr.h   |  5 -
 src/backend/executor/execExpr.c   | 10 --
 src/backend/executor/execExprInterp.c | 18 ++
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index e34db8c93cb..2fba1450c65 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -582,12 +582,7 @@ typedef struct ExprEvalStep
 			struct ScalarArrayOpExprHashTable *elements_tab;
 			FmgrInfo   *finfo;	/* function's lookup data */
 			FunctionCallInfo fcinfo_data;	/* arguments etc */
-			/* faster to access without additional indirection: */
-			PGFunction	fn_addr;	/* actual call address */
 			FmgrInfo   *hash_finfo; /* function's lookup data */
-			FunctionCallInfo hash_fcinfo_data;	/* arguments etc */
-			/* faster to access without additional indirection: */
-			PGFunction	hash_fn_addr;	/* actual call address */
 		}			hashedscalararrayop;
 
 		/* for EEOP_XMLEXPR */
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2831e7978b5..4128248be40 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1204,7 +1204,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
 FunctionCallInfo fcinfo;
 AclResult	aclresult;
 FmgrInfo   *hash_finfo;
-FunctionCallInfo hash_fcinfo;
 Oid			cmpfuncid;
 
 /*
@@ -1263,16 +1262,10 @@ ExecInitExprRec(Expr *node, ExprState *state,
 if (OidIsValid(opexpr->hashfuncid))
 {
 	hash_finfo = palloc0(sizeof(FmgrInfo));
-	hash_fcinfo = palloc0(SizeForFunctionCallInfo(1));
 	fmgr_info(opexpr->hashfuncid, hash_finfo);
 	fmgr_info_set_expr((Node *) node, hash_finfo);
-	InitFunctionCallInfoData(*hash_fcinfo, hash_finfo,
-			 1, opexpr->inputcollid, NULL,
-			 NULL);
 
 	scratch.d.hashedscalararrayop.hash_finfo = hash_finfo;
-	scratch.d.hashedscalararrayop.hash_fcinfo_data = hash_fcinfo;
-	scratch.d.hashedscalararrayop.hash_fn_addr = hash_finfo->fn_addr;
 
 	/* Evaluate scalar directly into left function argument */
 	ExecInitExprRec(scalararg, state,
@@ -1292,11 +1285,8 @@ ExecInitExprRec(Expr *node, ExprState *state,
 	scratch.d.hashedscalararrayop.inclause = opexpr->useOr;
 	scratch.d.hashedscalararrayop.finfo = finfo;
 	scratch.d.hashedscalararrayop.fcinfo_data = fcinfo;
-	scratch.d.hashedscalararrayop.fn_addr = finfo->fn_addr;
 
 	scratch.d.hashedscalararrayop.hash_finfo = hash_finfo;
-	scratch.d.hashedscalararrayop.hash_fcinfo_data = hash_fcinfo;
-	scratch.d.hashedscalararrayop.hash_fn_addr = hash_finfo->fn_addr;
 
 	ExprEvalPushStep(state, &scratch);
 }
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index eaec697bb38..77e1cb1b7b4 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -217,6 +217,7 @@ typedef struct ScalarArrayOpExprHashTable
 {
 	saophash_hash *hashtab;		/* underlying hash table */
 	struct ExprEvalStep *op;
+	FunctionCallInfo hash_fcinfo_data;	/* arguments etc */
 } ScalarArrayOpExprHashTable;
 
 /* Define parameters for ScalarArrayOpExpr hash table code generation. */
@@ -3474,13 +3475,13 @@ static uint32
 saop_element_hash(struct saophash_hash *tb, Datum key)
 {
 	ScalarArrayOpExprHashTable *elements_tab = (ScalarArrayOpExprHashTable *) tb->private_data;
-	FunctionCallInfo fcinfo = elements_tab->op->d.hashedscalararrayop.hash_fcinfo_data;
+	FunctionCallInfo fcinfo = elements_tab->hash_fcinfo_data;
 	Datum		hash;
 
 	fcinfo->args[0].value = key;
 	fcinfo->args[0].isnull = false;
 
-	hash = elements_tab->op->d.hashedscalararrayop.hash_fn_addr(fcinfo);
+	hash = 

Re: libpq: Remove redundant null pointer checks before free()

2022-06-16 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Jun 16, 2022 at 10:07:33PM +0200, Peter Eisentraut wrote:
>> calls, where the "if" part is unnecessary.  This is of course pretty
>> harmless, but some functions like scram_free() and freePGconn() have become
>> so bulky that it becomes annoying.  So while I was doing some work in that
>> area I undertook to simplify this.

> Seems fine.  Would some of the buildfarm dinosaurs hiccup on that?
> gaur is one that comes to mind.

Doubt it.  (In any case, gaur/pademelon are unlikely to be seen
again after a hardware failure --- I'm working on resurrecting that
machine using modern NetBSD on an external drive, but its HPUX
installation probably isn't coming back.)

POSIX has required free(NULL) to be a no-op since at least SUSv2 (1997).
Even back then, the machines that failed on it were legacy devices,
like then-decade-old SunOS versions.  So I don't think that Peter's
proposal has any portability risk today.

Having said that, the pattern "if (x) free(x);" is absolutely
ubiquitous across our code, and so I'm not sure that I'm on
board with undoing it only in libpq.  I'd be happier if we made
a push to get rid of it everywhere.  Notably, I think the choice
that pfree(NULL) is disallowed traces directly to worries about
coding-pattern-compatibility with pre-POSIX free().  Should we
revisit that?
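(Illustrative aside, not something proposed in this thread: making pfree()
tolerate NULL would essentially amount to an early-return guard, as in the
hypothetical wrapper below; pfree_if_set is a made-up name, not an existing
API.)

#include "postgres.h"

/* Hypothetical helper: gives pfree() the NULL tolerance POSIX requires of free(). */
static inline void
pfree_if_set(void *pointer)
{
	if (pointer != NULL)
		pfree(pointer);
}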

Independently of that concern, how much of a back-patch hazard
might we create with such changes?

regards, tom lane




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread David Rowley
On Fri, 17 Jun 2022 at 15:33, Peter Geoghegan  wrote:
> Have you tried this with the insert benchmark [1]?

I was mostly focusing on the performance of the hashed saop feature
after having removed the additional fields that pushed ExprEvalStep
over 64 bytes in 14.

I agree it would be good to do further benchmarking to see if there's
anything else that's snuck into 15 that's slowed that benchmark down,
but we can likely work on that after we get the ExprEvalStep size back
to 64 bytes again.

David




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread Peter Geoghegan
On Thu, Jun 16, 2022 at 7:15 PM David Rowley  wrote:
> So, there appears to be no performance regression due to the extra
> indirection. There's maybe even some gains due to the smaller step
> size.

Have you tried this with the insert benchmark [1]?

I've run it myself in the past (when working on B-Tree deduplication).
It's quite straightforward to set up and run.

[1] http://smalldatum.blogspot.com/2017/06/the-insert-benchmark.html
-- 
Peter Geoghegan




Re: libpq: Remove redundant null pointer checks before free()

2022-06-16 Thread Michael Paquier
On Thu, Jun 16, 2022 at 10:07:33PM +0200, Peter Eisentraut wrote:
> calls, where the "if" part is unnecessary.  This is of course pretty
> harmless, but some functions like scram_free() and freePGconn() have become
> so bulky that it becomes annoying.  So while I was doing some work in that
> area I undertook to simplify this.

Seems fine.  Would some of the buildfarm dinosaurs hiccup on that?
gaur is one that comes to mind.
--
Michael




RE: Replica Identity check of partition table on subscriber

2022-06-16 Thread shiy.f...@fujitsu.com
On Thu, Jun 16, 2022 2:13 PM Amit Langote  wrote:
> 
> Hi,
> 
> On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com
>  wrote:
> > On Wed, Jun 15, 2022 8:30 PM Amit Kapila 
> wrote:
> > > I have pushed the first bug-fix patch today.
> >
> > Attached the remaining patches which are rebased.
> 
> Thanks.
> 
> Comments on v9-0001:

Thanks for your comments.

> 
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
> 
> I know you're simply copying the old comment, but I think we can
> rewrite it to be slightly more useful:
> 
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient and leave it to
> check_relation_updatable() to throw the actual error if needed.
> 

Modified as you suggested in another mail [1].

> +   /* Check that replica identity matches. */
> +   logicalrep_rel_mark_updatable(entry);
> 
> Maybe the comment (there are 2 instances) should say:
> 
> Set if the table's replica identity is enough to apply update/delete.
> 

Modified as suggested.

> Finally,
> 
> +# Alter REPLICA IDENTITY on subscriber.
> +# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
> +# is the partition, so it works fine.
> 
> For consistency with other recently added comments, I'd suggest the
> following wording:
> 
> Test that replication works correctly as long as the leaf partition
> has the necessary REPLICA IDENTITY, even though the actual target
> partitioned table does not.
> 

Modified as suggested.

> On v9-0002:
> 
> +   /* cleanup the invalid attrmap */
> 
> It seems that "invalid" here really means no-longer-useful, so we
> should use that phrase as a nearby comment does:
> 
> Release the no-longer-useful attrmap, if any.
> 

Modified as suggested.

Attached the new version of patch set. I also moved the partitioned table check
in logicalrep_rel_mark_updatable() to check_relation_updatable() as discussed
[2].

[1] 
https://www.postgresql.org/message-id/CA%2BHiwqG3Xi%3DwH4rBHm61ku-j0gm%2B-rc5VmDHxf%3DTeFkUsHtooA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BHiwqHfN789ekiYVE%2B0xsLswMosMrWBwv4cPvYgWREWejw7HA%40mail.gmail.com

Regards,
Shi yu



v10-0001-Fix-partition-table-s-RI-checking-on-the-subscri.patch

v10-0002-Fix-memory-leak-about-attrmap.patch


Re: CREATE TABLE ( .. STORAGE ..)

2022-06-16 Thread Kyotaro Horiguchi
Thanks! I have been annoyed sometimes by the lack of this feature.

At Thu, 16 Jun 2022 16:40:55 +0300, Aleksander Alekseev 
 wrote in 
> Hi Matthias,
> 
> > Apart from this comment on the format of the patch, the result seems solid.
> 
> Many thanks.
> 
> > When updating a patchset generally we try to keep the patches
> > self-contained, and update patches as opposed to adding incremental
> > patches to the set.
> 
> My reasoning was to separate my changes from the ones originally
> proposed by Teodor. After doing `git am` locally a reviewer can see
> them separately, or together with `git diff origin/master`, whatever
> he or she prefers. The committer can choose between committing two
> patches one by one, or rebasing them to a single commit.
> 
> I will avoid the "patch for the patch" practice from now on. Sorry for
> the inconvenience.

0001 contains one trailing whitespace error (which "git diff --check"
can detect).

The modified doc line seems too long to me. Maybe we should wrap it as
is done in other lines of the same page.

I think we should avoid descriptions copied verbatim between pages. In
this case, I think we should remove the duplicated part of the
description of ALTER TABLE and replace it with something like "See
CREATE TABLE for details".

As a result of copying in the description, SET STORAGE and
COMPRESSION on the CREATE TABLE page use different articles in the
same context.

> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
>   This form sets the storage mode for *a* column.

> COMPRESSION compression_method
>   The COMPRESSION clause sets the compression method for *the* column.

FWIW I feel "the" is better here, but anyway we should unify them.

  
 static char GetAttributeCompression(Oid atttypid, char *compression);
+static char	GetAttributeStorage(const char *storagemode);

The whitespace after "char" is a TAB, which differs from the spaces used
in neighbouring lines.

In the grammar, COMPRESSION uses ColId, but STORAGE uses name. It
seems to me that STORAGE is correct here, though. (So, do we need to
fix the COMPRESSION syntax?)


This adds support for "ADD COLUMN SET STORAGE" but it is not described
in the doc.  COMPRESSION is not described either.  Shouldn't we add
both this time?  Or the fix for COMPRESSION could be a separate patch.

Now we have three column options, COMPRESSION, COLLATE and STORAGE,
which have a strict order in the syntax.  I wonder if that could be
relaxed, but it might be too much.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread David Rowley
On Fri, 17 Jun 2022 at 11:31, Andres Freund  wrote:
> hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
> limit is 40 bytes.

> commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
> Author: David Rowley 
> Date:   2021-04-08 23:51:22 +1200
>
> Speedup ScalarArrayOpExpr evaluation

I've put together the attached patch which removes 4 fields from the
hashedscalararrayop portion of the struct which, once the JSON part is
fixed, will put sizeof(ExprEvalStep) back down to 64 bytes again.

The attached patch causes some extra pointer dereferencing to perform
a hashed saop step, so I tested the performance on f4fb45d15 (prior to
the JSON patch that pushed the sizeof(ExprEvalStep) up further. I
found:

setup:
create table a (a int);
insert into a select x from generate_series(100,200) x;

bench.sql
select * from a where a in(1,2,3,4,5,6,7,8,9,10);

f4fb45d15 + reduce_sizeof_hashedsaop_ExprEvalStep.patch
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.841851 (without initial connection time)
tps = 44.986472 (without initial connection time)
tps = 44.944315 (without initial connection time)

f4fb45d15
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 -M prepared postgres
tps = 44.446127 (without initial connection time)
tps = 44.614134 (without initial connection time)
tps = 44.895011 (without initial connection time)

(Patched is ~0.61% faster here)

So, there appears to be no performance regression due to the extra
indirection. There's maybe even some gains due to the smaller step
size.

David
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 2831e7978b..82bf47ddb9 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -1270,10 +1270,6 @@ ExecInitExprRec(Expr *node, ExprState *state,
 										 1, opexpr->inputcollid, NULL,
 										 NULL);
 
-				scratch.d.hashedscalararrayop.hash_finfo = hash_finfo;
-				scratch.d.hashedscalararrayop.hash_fcinfo_data = hash_fcinfo;
-				scratch.d.hashedscalararrayop.hash_fn_addr = hash_finfo->fn_addr;
-
 				/* Evaluate scalar directly into left function argument */
 				ExecInitExprRec(scalararg, state,
 								&fcinfo->args[0].value, &fcinfo->args[0].isnull);
@@ -1290,13 +1286,8 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				/* And perform the operation */
 				scratch.opcode = EEOP_HASHED_SCALARARRAYOP;
 				scratch.d.hashedscalararrayop.inclause = opexpr->useOr;
-				scratch.d.hashedscalararrayop.finfo = finfo;
 				scratch.d.hashedscalararrayop.fcinfo_data = fcinfo;
-				scratch.d.hashedscalararrayop.fn_addr = finfo->fn_addr;
-
-				scratch.d.hashedscalararrayop.hash_finfo = hash_finfo;
 				scratch.d.hashedscalararrayop.hash_fcinfo_data = hash_fcinfo;
-				scratch.d.hashedscalararrayop.hash_fn_addr = hash_finfo->fn_addr;
 
 				ExprEvalPushStep(state, &scratch);
 			}
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index eaec697bb3..c8dc859e1d 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3480,7 +3480,7 @@ saop_element_hash(struct saophash_hash *tb, Datum key)
 	fcinfo->args[0].value = key;
 	fcinfo->args[0].isnull = false;
 
-	hash = elements_tab->op->d.hashedscalararrayop.hash_fn_addr(fcinfo);
+	hash = elements_tab->op->d.hashedscalararrayop.hash_fcinfo_data->flinfo->fn_addr(fcinfo);
 
 	return DatumGetUInt32(hash);
 }
@@ -3502,7 +3502,7 @@ saop_hash_element_match(struct saophash_hash *tb, Datum key1, Datum key2)
 	fcinfo->args[1].value = key2;
 	fcinfo->args[1].isnull = false;
 
-	result = elements_tab->op->d.hashedscalararrayop.fn_addr(fcinfo);
+	result = elements_tab->op->d.hashedscalararrayop.fcinfo_data->flinfo->fn_addr(fcinfo);
 
 	return DatumGetBool(result);
 }
@@ -3526,7 +3526,7 @@ ExecEvalHashedScalarArrayOp(ExprState *state, ExprEvalStep *op, ExprContext *eco
 	ScalarArrayOpExprHashTable *elements_tab = op->d.hashedscalararrayop.elements_tab;
 	FunctionCallInfo fcinfo = op->d.hashedscalararrayop.fcinfo_data;
 	bool		inclause = op->d.hashedscalararrayop.inclause;
-	bool

Re: pg_upgrade (12->14) fails on aggregate

2022-06-16 Thread Justin Pryzby
On Wed, Jun 15, 2022 at 03:32:04PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > But Petr has a point - pg_upgrade should aspire to catch errors in --check,
> > rather than starting and then leaving a mess behind for the user to clean up
> 
> Agreed; pg_upgrade has historically tried to find problems similar to
> this.  However, it's not just aggregates that are at risk.  People
> might also have built user-defined plain functions, or operators,
> atop these functions.  How far do we want to go in looking?

I wasn't yet able to construct a user-defined function which fails to reload.

> As for the query, I think it could be simplified quite a bit by
> relying on regprocedure literals, that is something like

Yes, thanks.
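(For illustration of that suggestion: on a pre-14 server, where these
signatures still exist, a regprocedure literal resolves a specific signature to
its OID, which is what the check can compare against directly, e.g.:

SELECT 'array_append(anyarray,anyelement)'::regprocedure::oid;
)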

> Also, I think you need to check aggfinalfn too.

Done but maybe needs more cleanup.

> Also, I'd be inclined to reject system-provided objects by checking
> for OID >= 16384 rather than hard-wiring assumptions about things
> being in pg_catalog or not.

To me, oid>=16384 seems more hard-wired than namespace!='pg_catalog'.

This patch also resolves an issue with PQfinish()/dangling connections.

-- 
Justin
From d8c3a2d4e4971b671347cc1c73c70e67049599c9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 10 Jun 2022 11:17:36 -0500
Subject: [PATCH] WIP: pg_upgrade --check: detect old polymorphics from pre-14

These fail when upgrading from pre-14 (as expected), but the failure should be
detected by pg_upgrade --check.

CREATE AGGREGATE array_accum(anyelement) (sfunc=array_append, stype=anyarray, initcond='{}');
CREATE OPERATOR !@# (PROCEDURE = array_append, LEFTARG=anyarray, rightarg=anyelement);

See also:
9e38c2bb5093ceb0c04d6315ccd8975bd17add66
97f73a978fc1aca59c6ad765548ce0096d95a923
---
 src/bin/pg_upgrade/check.c | 123 +
 1 file changed, 123 insertions(+)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index ace7387edaf..bfdadc4d685 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -31,6 +31,7 @@ static void check_for_jsonb_9_4_usage(ClusterInfo *cluster);
 static void check_for_pg_role_prefix(ClusterInfo *cluster);
 static void check_for_new_tablespace_dir(ClusterInfo *new_cluster);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
+static void check_for_old_polymorphics(ClusterInfo *cluster);
 static char *get_canonical_locale_name(int category, const char *locale);
 
 
@@ -122,6 +123,12 @@ check_and_dump_old_cluster(bool live_check)
 	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
 		check_for_user_defined_postfix_ops(_cluster);
 
+	/*
+	 * Pre-PG 14 changed from anyarray to anycompatiblearray.
+	 */
+	if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1300)
+		check_for_old_polymorphics(_cluster);
+
 	/*
 	 * Pre-PG 12 allowed tables to be declared WITH OIDS, which is not
 	 * supported anymore. Verify there are none, iff applicable.
@@ -775,6 +782,122 @@ check_proper_datallowconn(ClusterInfo *cluster)
 }
 
 
+/*
+ *	check_for_old_polymorphics()
+ *
+ *	Make sure nothing is using old polymorphic functions with
+ *	anyarray/anyelement rather than the new anycompatible variants.
+ */
+static void
+check_for_old_polymorphics(ClusterInfo *cluster)
+{
+	PGresult   *res;
+	FILE	   *script = NULL;
+	char		output_path[MAXPGPATH];
+
+	prep_status("Checking for use of old polymorphic functions");
+
+	snprintf(output_path, sizeof(output_path), "%s/%s",
+			 log_opts.basedir,
+			 "databases_with_old_polymorphics.txt");
+
+#define OLD_POLYMORPHICS \
+	"'array_append(anyarray,anyelement)', " \
+	"'array_cat(anyarray,anyarray)', " \
+	"'array_position(anyarray,anyelement)', " \
+	"'array_position(anyarray,anyelement,integer)', " \
+	"'array_positions(anyarray,anyelement)', " \
+	"'array_prepend(anyelement,anyarray)', " \
+	"'array_remove(anyarray,anyelement)', " \
+	"'array_replace(anyarray,anyelement,anyelement)', " \
+	"'width_bucket(anyelement,anyarray)' "
+
+	for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		bool		db_used = false;
+		DbInfo	   *active_db = >dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, active_db->db_name);
+		int			ntups;
+		int			i_what,
+	i_objname;
+
+		res = executeQueryOrDie(conn,
+/* Aggregate transition functions */
+"SELECT 'aggregate' AS what, p.oid::regprocedure::text AS objname "
+"FROM pg_proc p "
+"JOIN pg_aggregate a ON a.aggfnoid=p.oid "
+"JOIN pg_proc transfn ON transfn.oid=a.aggtransfn "
+"JOIN pg_namespace pn ON pn.oid=p.pronamespace "
+"JOIN pg_namespace transnsp ON transnsp.oid=transfn.pronamespace "
+"WHERE pn.nspname != 'pg_catalog' "
+/* Before v11, used proisagg=true, and afterwards uses prokind='a' */
+"AND transnsp.nspname = 'pg_catalog' "
+"AND a.aggtransfn = ANY(ARRAY[" OLD_POLYMORPHICS "]::regprocedure[]) "
+// "AND aggtranstype='anyarray'::regtype
+
+/* Aggregate final functions */

Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-06-16 Thread Kyotaro Horiguchi
At Thu, 16 Jun 2022 08:23:07 -0500, Justin Pryzby  wrote 
in 
> On Thu, Jun 16, 2022 at 05:19:46PM +0900, Kyotaro Horiguchi wrote:
> > At Sat, 11 Jun 2022 09:41:37 -0500, Justin Pryzby  
> > wrote in 
> > > Note that this gives:
> > > 
> > > guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function 
> > > [-Wmaybe-uninitialized]
> > 
> > Mmm. I don't have an idea where the 'dst' came from...
> 
> Well, in your latest patch, you've renamed it.
> 
> guc.c:7586:19: warning: ‘result’ may be used uninitialized in this function 
> [-Wmaybe-uninitialized]
>  7586 |  PG_RETURN_TEXT_P(cstring_to_text(result));

Ooo. I found that the patch on my side was different from the one on this
list, for some reason unclear to me.  I now understand what's
happening.

At Sat, 11 Jun 2022 09:41:37 -0500, Justin Pryzby  wrote 
in 
> with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)

My compiler (gcc 8.5.0, with -Wswitch) is satisfied once it finds that
the switch() covers all enum values.  I don't know why the newer
compiler complains about this, but compilers in such environments should
be silenced by the following change.


-   char   *result;
+   char   *result = "";

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From d67c13712ab1d47956c1c311d7ed80342fa51e2f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 16 Jun 2022 17:08:02 +0900
Subject: [PATCH v3] Add fileval-bootval consistency check of GUC parameters

We should keep GUC default values consistent between what is written in
postgresql.conf.sample and what is defined in guc.c. Add an automated check
for this consistency to the TAP test suite. Some variables are still excluded
since they cannot be checked in a simple way.
---
 src/backend/utils/misc/guc.c  | 53 
 src/include/catalog/pg_proc.dat   |  5 ++
 src/test/modules/test_misc/t/003_check_guc.pl | 60 +--
 3 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a7cc49898b..136a21ba88 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -7520,6 +7520,59 @@ parse_and_validate_value(struct config_generic *record,
 	return true;
 }
 
+/*
+ * Convert value to unitless value according to the specified GUC variable
+ */
+Datum
+pg_config_unitless_value(PG_FUNCTION_ARGS)
+{
+	char   *name = "";
+	char   *value = "";
+	struct config_generic *record;
+	char   *result = "";
+	void   *extra;
+	union config_var_val val;
+	const char *p;
+	char   buffer[256];
+
+	if (!PG_ARGISNULL(0))
+		name = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	if (!PG_ARGISNULL(1))
+		value = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+	record = find_option(name, true, false, ERROR);
+
+	parse_and_validate_value(record, name, value, PGC_S_TEST, WARNING,
+			 , );
+
+	switch (record->vartype)
+	{
+		case PGC_BOOL:
+			result = (val.boolval ? "on" : "off");
+			break;
+		case PGC_INT:
+			snprintf(buffer, sizeof(buffer), "%d", val.intval);
+			result = pstrdup(buffer);
+			break;
+		case PGC_REAL:
+			snprintf(buffer, sizeof(buffer), "%g", val.realval);
+			result = pstrdup(buffer);
+			break;
+		case PGC_STRING:
+			p = val.stringval;
+			if (p == NULL)
+p = "";
+			result = pstrdup(p);
+			break;
+		case PGC_ENUM:
+			p = config_enum_lookup_by_value((struct config_enum *)record,
+			val.boolval);
+			result = pstrdup(p);
+			break;
+	}
+
+	PG_RETURN_TEXT_P(cstring_to_text(result));
+}
 
 /*
  * Sets option `name' to given value.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 87aa571a33..9eb2a584e1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6118,6 +6118,11 @@
   proname => 'pg_settings_get_flags', provolatile => 's', prorettype => '_text',
   proargtypes => 'text', prosrc => 'pg_settings_get_flags' },
 
+{ oid => '9956', descr => 'normalize value to the unit of specified GUC',
+  proname => 'pg_config_unitless_value', proisstrict => 'f',
+  provolatile => 's', prorettype => 'text', proargtypes => 'text text',
+  proargnames => '{varname,value}', prosrc => 'pg_config_unitless_value' },
+
 { oid => '3329', descr => 'show config file settings',
   proname => 'pg_show_all_file_settings', prorows => '1000', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/test/modules/test_misc/t/003_check_guc.pl b/src/test/modules/test_misc/t/003_check_guc.pl
index 60459ef759..9de6a386de 100644
--- a/src/test/modules/test_misc/t/003_check_guc.pl
+++ b/src/test/modules/test_misc/t/003_check_guc.pl
@@ -11,18 +11,41 @@ my $node = PostgreSQL::Test::Cluster->new('main');
 $node->init;
 $node->start;
 
+# parameter names that cannot get consistency check performed
+my @ignored_parameters = (
+	'data_directory',
+	'hba_file',
+	'ident_file',
+	'krb_server_keyfile',
+	'max_stack_depth',
+	'bgwriter_flush_after',
+	'wal_sync_method',
+	

Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread Andres Freund
Hi,

On 2022-06-16 16:31:30 -0700, Andres Freund wrote:
> The EEOP_JSONEXPR stuff was added during 15 development in:
>
> commit 1a36bc9dba8eae90963a586d37b6457b32b2fed4
> Author: Andrew Dunstan 
> Date:   2022-03-03 13:11:14 -0500
>
> SQL/JSON query functions

I'm quite confused about part of the struct definition of this:

struct JsonCoercionsState
{
    struct JsonCoercionState
    {
        JsonCoercion *coercion; /* coercion expression */
        ExprState  *estate;     /* coercion expression state */
    }           null,
                string,
                numeric,
                boolean,
                date,
                time,
                timetz,
                timestamp,
                timestamptz,
                composite;
}           coercions;          /* states for coercion from SQL/JSON item
                                 * types directly to the output type */

Why on earth do we have coercion state for all these different types? That
really adds up:

        struct {
                JsonExpr *         jsexpr;                /*    24     8 */
                struct {
                        FmgrInfo   func;                  /*    32    48 */
                        /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
                        Oid        typioparam;            /*    80     4 */
                } input;                                  /*    32    56 */

                /* XXX last struct has 4 bytes of padding */

                NullableDatum *    formatted_expr;        /*    88     8 */
                NullableDatum *    res_expr;              /*    96     8 */
                NullableDatum *    coercion_expr;         /*   104     8 */
                NullableDatum *    pathspec;              /*   112     8 */
                ExprState *        result_expr;           /*   120     8 */
                /* --- cacheline 2 boundary (128 bytes) --- */
                ExprState *        default_on_empty;      /*   128     8 */
                ExprState *        default_on_error;      /*   136     8 */
                List *             args;                  /*   144     8 */
                void *             cache;                 /*   152     8 */
                struct JsonCoercionsState coercions;      /*   160   160 */
        } jsonexpr;                                       /*    24   296 */
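(Layouts like the above can be dumped with pahole from the dwarves package,
given a build with debug info, e.g.:

    pahole -C ExprEvalStep src/backend/executor/execExprInterp.o
)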

And why is FmgrInfo stored inline in the struct? Everything else just stores
pointers to FmgrInfo.


Now that I look at this: It's a *bad* idea to have subsidiary ExprState inside
an ExprState. Nearly always the correct thing to do is to build those
expressions. There's plenty memory and evaluation overhead in jumping to a
different expression. And I see no reason for doing it that way here?

This stuff doesn't look ready.

Greetings,

Andres Freund




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread Tom Lane
Andres Freund  writes:
> However, jsonexpr/EEOP_JSONEXPR is 296 bytes, and
> hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
> limit is 40 bytes.

Oops.

> Maybe it's worth sticking a StaticAssert() for the struct size
> somewhere.

Indeed.  I thought we had one already.

> I'm a bit wary about that being too noisy, there are some machines with
> odd alignment requirements. Perhaps worth restricting the assertion to
> x86-64 + armv8 or such?

I'd put it in first and only reconsider if it shows unfixable problems.

regards, tom lane




PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-16 Thread Andres Freund
Hi,

Mark Callaghan reported a regression in 15, in the post linked here (with
comments in the twitter thread, hence linked here)
https://twitter.com/MarkCallaghanDB/status/1537475430227161098

A differential flame graph shows increased time spent doing memory
allocations, below ExecInitExpr():
https://mdcallag.github.io/reports/22_06_06_ibench.20m.pg.all/l.i0.0.143.15b1.n.svg

Quite the useful comparison.


That led me to look at the size of ExprEvalStep - and indeed, it has
*exploded* in 15. 10-13: 64 bytes, 14: 320 bytes.

The comment above the union for data fields says:
/*
 * Inline data for the operation.  Inline data is faster to access, but
 * also bloats the size of all instructions.  The union should be kept to
 * no more than 40 bytes on 64-bit systems (so that the entire struct is
 * no more than 64 bytes, a single cacheline on common systems).
 */

However, jsonexpr/EEOP_JSONEXPR is 296 bytes, and
hashedscalararrayop/EEOP_HASHED_SCALARARRAYOP is 64 bytes, even though the
limit is 40 bytes.

The EEOP_JSONEXPR stuff was added during 15 development in:

commit 1a36bc9dba8eae90963a586d37b6457b32b2fed4
Author: Andrew Dunstan 
Date:   2022-03-03 13:11:14 -0500

SQL/JSON query functions


the EEOP_HASHED_SCALARARRAYOP stuff was added during 14 development in:

commit 50e17ad281b8d1c1b410c9833955bc80fbad4078
Author: David Rowley 
Date:   2021-04-08 23:51:22 +1200

Speedup ScalarArrayOpExpr evaluation


Unfortunately ExprEvalStep is public, so I don't think we can fix the 14
regression. If somebody installed updated server packages while the server is
running, we could end up loading extensions referencing ExprEvalStep (at least
plpgsql and LLVMJIT).

It's not great to have an ABI break at this point of the 15 cycle, but I don't
think we have a choice. Exploding the size of ExprEvalStep by ~4x is bad -
both for memory overhead (expressions often have dozens to hundreds of steps)
and expression evaluation performance.


The Hashed SAO case can perhaps be squeezed sufficiently to fit inline, but
clearly that's not going to happen for the json case. So we should just move
that out of line.


Maybe it's worth sticking a StaticAssert() for the struct size somewhere. I'm
a bit wary about that being too noisy, there are some machines with odd
alignment requirements. Perhaps worth restricting the assertion to x86-64 +
armv8 or such?
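For illustration, one possible form of such an assertion, using the
StaticAssertDecl macro from c.h (exact placement and wording are open; this is
just a sketch of the idea, not a committed change):

StaticAssertDecl(sizeof(ExprEvalStep) <= 64,
				 "size of ExprEvalStep exceeds 64 bytes");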


It very well might be that this isn't the full explanation of the regression
Mark observed. E.g. the difference in DecodeDateTime() looks more likely to be
caused by 591e088dd5b - but we need to fix the above issue, whether it's the
cause of the regression Mark observed or not.

Greetings,

Andres Freund




Re: better page-level checksums

2022-06-16 Thread Bruce Momjian
On Tue, Jun 14, 2022 at 01:42:55PM -0400, Robert Haas wrote:
> Hmm, but on the other hand, if you imagine a scenario in which the
> "storage system extra blob" is actually a nonce for TDE, you need to
> be able to find it before you've decrypted the rest of the page. If
> pd_checksum gives you the offset of that data, you need to exclude it
> from what gets encrypted, which means that you need encrypt three
> separate non-contiguous areas of the page whose combined size is
> unlikely to be a multiple of the encryption algorithm's block size.
> That kind of sucks (and putting it at the end of the page makes it way
> better).

I continue to believe that a nonce is not needed for XTS encryption
mode, and that adding a tamper-detection GCM hash is of limited
usefulness, since malicious writes can be done to other critical files
and can be used to find the cluster or encryption keys.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





Re: real/float example for testlibpq3

2022-06-16 Thread Mark Wong
On Thu, Jun 16, 2022 at 03:41:50PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Jun 14, 2022 at 1:40 PM Mark Wong  wrote:
> >> I've created a function for each data type with the idea that an example
> >> for handling a specific data type can be more easily reviewed by looking
> >> in a single place.
> >> I've added examples for REAL, TIMESTAMP WITHOUT TIME ZONE, and BOOLEAN
> >> to try to illustrate how testlibpq3.sql and testlibpq3.c will grow if
> >> this is a good way to go.
> 
> > I'm not sure that we want to let these test programs grow very large.
> > In particular, I don't think we should let ourselves get sucked into
> > adding an example for every data type under the sun -- if that's
> > wanted, the solution is perhaps to add documentation for the binary
> > formats, not hide impromptu documentation inside a test program.  But
> > doing this much seems OK to me.
> 
> Yeah, "hiding impromptu documentation inside a test program" is what
> this looks like, and I'm not sure that's a reasonable way to go.
> 
> (1) Who's going to think to look in src/test/examples/testlibpq3.c for
> documentation of binary formats?
> 
> (2) The useful details are likely to get buried in notational and
> portability concerns, as I think your build failure illustrates.
> 
> (3) I bet few if any packagers install these files, so that the new
> info would be unavailable to many people.
> 
> I think some new appendix in the main SGML docs would be the appropriate
> place if we want to provide real documentation.

OK, I'll leave testlibpq3.c as it was, then.  If it's worth keeping
any of those changes, I can remove the timestamp example because of
the ntohll() portability issue, since that is trivial.

I'll start a new appendix and share again when I have something to show.

Regards,
Mark

--
Mark Wong
EDB: http://www.enterprisedb.com




Re: Nothing is using StrategyNotifyBgWriter() anymore

2022-06-16 Thread Nasby, Jim
Answering my own question... I now see that the wakeup does in fact happen in 
StrategyGetBuffer(). Sorry for the noise.

On 6/16/22, 5:32 PM, "Jim Nasby"  wrote:

While browsing through some of the clock-sweep code I noticed that the 
only place StrategyNotifyBgWriter() is called now is in 
BackgroundWriterMain()[1]. Presumably this isn't what's desired. If 
nothing else, it means the function's description comment is wrong, as 
are comments in BackgroundWriterMain(). This isn't new; 9.2 shows the 
same thing and that's when the function was added. I'm not sure what the 
right fix here is, since ISTM joggling bgwriter for every call to 
BufferAlloc() would be overkill.


1: 

https://doxygen.postgresql.org/freelist_8c.html#aabbd7d3891afc1d8531c3871d08d4b28






Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

2022-06-16 Thread Imseih (AWS), Sami
> AFAICS you're proposing to add an identifier for a specific plan, but no way to
> know what that plan was?  How are users supposed to use the information if they
> know something changed but don't know what changed exactly?

I see this as a start towards doing more useful things with plans.

The patch right out of the gate exposes the plan_id in EXPLAIN output
and auto_explain. This will also be useful for extensions that will
provide the plan text. It is also conceivable that pg_stat_statements
could provide an option to store the plan text?

> - In the POC, the compute_query_id GUC determines if a
>   plan_id is to be computed. Should this be a separate GUC?

> Probably, as computing it will likely be quite expensive.  Some benchmark on
> various workloads would be needed here.

Yes, more benchmarks will be needed here with more complex plans. I have
only benchmarked with pgbench at this point.
However, separating this into its own GUC is what I am leaning towards as
well, and I will update the patch.

> I only had a quick look at the patch, but I see that you have some code to
> avoid storing the query text multiple times with different planid.  How does it
> work exactly, and does it ensure that the query text is only removed once the
> last entry that uses it is removed?  It seems that you identify a specific
> query text by queryid, but that seems wrong as collision can (easily?) happen
> in different databases.  The real identifier of a query text should be (dbid,
> queryid).

The idea is to look up the offsets and length of the text in the external file
by queryid only. Therefore we can store similar query text for multiple
pgss_hash entries only once.

When a new entry in pgss is not found, the new qtext_hash is consulted to
see if it has information about the offsets/length for that queryid. If found in
qtext_hash, the new pgss_hash entry is created with the offsets found.

If not found in qtext_hash, the query text will be normalized and stored in
the external file. Then, a new entry will be created in qtext_hash and
an entry in pgss_hash.
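A rough sketch of that flow follows; every identifier below is hypothetical,
meant only to illustrate the described behaviour, not the patch's actual code.

#include "postgres.h"
#include "utils/hsearch.h"

/* Hypothetical entry: maps a queryid to where its text lives in the file. */
typedef struct QtextEntry
{
	uint64		queryid;		/* hash key; must be first */
	Size		text_offset;	/* offset of normalized text in external file */
	int			text_len;		/* length of that text */
} QtextEntry;

static QtextEntry *
qtext_lookup_or_create(HTAB *qtext_hash, uint64 queryid, const char *ntext)
{
	bool		found;
	QtextEntry *entry;

	entry = (QtextEntry *) hash_search(qtext_hash, &queryid, HASH_ENTER, &found);
	if (!found)
	{
		/* First time this queryid is seen: store its text exactly once. */
		entry->text_offset = qtext_append(ntext);	/* hypothetical helper */
		entry->text_len = strlen(ntext);
	}

	/* pgss_hash entries for any plan_id can now reuse these offsets. */
	return entry;
}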

Of course we need to also handle the gc_qtext cleanups, entry_reset, startup
and shutdown scenarios. The patch does this, but I will go back and do more
testing.

> Note that this problem already exists, as the query texts are now stored per
> (userid, dbid, queryid, istoplevel).  Maybe this part could be split in a
> different commit as it could already be useful without a planid.

Good point. I will separate this patch.

Regards, 

Sami Imseih
Amazon Web Services





Re: Typo in ro.po file?

2022-06-16 Thread Bruce Momjian
On Wed, Jun 15, 2022 at 09:29:02AM +0200, Peter Eisentraut wrote:
> On 14.06.22 05:34, Peter Smith wrote:
> > FWIW, I stumbled on this obscure possible typo (?) in 
> > src/pl/plperl/po/ro.po:
> > 
> > ~~~
> > 
> > #: plperl.c:788
> > msgid "while parsing Perl initialization"
> > msgstr "în timpul parsing inițializării Perl"
> > #: plperl.c:793
> > msgid "while running Perl initialization"
> > msgstr "în timpul rulării intializării Perl"
> > 
> > ~~~
> > 
> > (Notice the missing 'i' -  "inițializării" versus "intializării")
> 
> Fixed in translations repository.  Thanks.

What email list should such fixes be posted to?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson





libpq: Remove redundant null pointer checks before free()

2022-06-16 Thread Peter Eisentraut

libpq contains a lot of

if (foo)
free(foo);

calls, where the "if" part is unnecessary.  This is of course pretty 
harmless, but some functions like scram_free() and freePGconn() have 
become so bulky that it becomes annoying.  So while I was doing some 
work in that area I undertook to simplify this.

From f7a10c1fca10c76b89dbf402ad9418ed8f0675d7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 16 Jun 2022 21:50:56 +0200
Subject: [PATCH] libpq: Remove redundant null pointer checks before free()

---
 src/interfaces/libpq/fe-auth-scram.c|  33 ++--
 src/interfaces/libpq/fe-auth.c  |  18 +-
 src/interfaces/libpq/fe-connect.c   | 211 
 src/interfaces/libpq/fe-exec.c  |   6 +-
 src/interfaces/libpq/fe-print.c |  23 +--
 src/interfaces/libpq/fe-secure-common.c |   3 +-
 6 files changed, 97 insertions(+), 197 deletions(-)

diff --git a/src/interfaces/libpq/fe-auth-scram.c 
b/src/interfaces/libpq/fe-auth-scram.c
index e616200704..5012806fa5 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -174,30 +174,21 @@ scram_free(void *opaq)
 {
fe_scram_state *state = (fe_scram_state *) opaq;
 
-   if (state->password)
-   free(state->password);
-   if (state->sasl_mechanism)
-   free(state->sasl_mechanism);
+   free(state->password);
+   free(state->sasl_mechanism);
 
/* client messages */
-   if (state->client_nonce)
-   free(state->client_nonce);
-   if (state->client_first_message_bare)
-   free(state->client_first_message_bare);
-   if (state->client_final_message_without_proof)
-   free(state->client_final_message_without_proof);
+   free(state->client_nonce);
+   free(state->client_first_message_bare);
+   free(state->client_final_message_without_proof);
 
/* first message from server */
-   if (state->server_first_message)
-   free(state->server_first_message);
-   if (state->salt)
-   free(state->salt);
-   if (state->nonce)
-   free(state->nonce);
+   free(state->server_first_message);
+   free(state->salt);
+   free(state->nonce);
 
/* final message from server */
-   if (state->server_final_message)
-   free(state->server_final_message);
+   free(state->server_final_message);
 
free(state);
 }
@@ -941,8 +932,7 @@ pg_fe_scram_build_secret(const char *password, const char 
**errstr)
if (!pg_strong_random(saltbuf, SCRAM_DEFAULT_SALT_LEN))
{
*errstr = _("failed to generate random salt");
-   if (prep_password)
-   free(prep_password);
+   free(prep_password);
return NULL;
}
 
@@ -950,8 +940,7 @@ pg_fe_scram_build_secret(const char *password, const char 
**errstr)

SCRAM_DEFAULT_ITERATIONS, password,
errstr);
 
-   if (prep_password)
-   free(prep_password);
+   free(prep_password);
 
return result;
 }
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 0a072a36dc..49a1c626f6 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -107,8 +107,7 @@ pg_GSS_continue(PGconn *conn, int payloadlen)
NULL,
NULL);
 
-   if (ginbuf.value)
-   free(ginbuf.value);
+   free(ginbuf.value);
 
if (goutbuf.length != 0)
{
@@ -270,8 +269,7 @@ pg_SSPI_continue(PGconn *conn, int payloadlen)
  NULL);
 
/* we don't need the input anymore */
-   if (inputbuf)
-   free(inputbuf);
+   free(inputbuf);
 
if (r != SEC_E_OK && r != SEC_I_CONTINUE_NEEDED)
{
@@ -604,21 +602,18 @@ pg_SASL_init(PGconn *conn, int payloadlen)
goto error;
 
termPQExpBuffer(_buf);
-   if (initialresponse)
-   free(initialresponse);
+   free(initialresponse);
 
return STATUS_OK;
 
 error:
termPQExpBuffer(_buf);
-   if (initialresponse)
-   free(initialresponse);
+   free(initialresponse);
return STATUS_ERROR;
 
 oom_error:
termPQExpBuffer(_buf);
-   if (initialresponse)
-   free(initialresponse);
+   free(initialresponse);
appendPQExpBufferStr(>errorMessage,
 libpq_gettext("out of 
memory\n"));
return STATUS_ERROR;
@@ -831,8 +826,7 @@ pg_password_sendauth(PGconn *conn, const char *password, 
AuthRequest areq)
return STATUS_ERROR;
}
  

Re: SGML doc file references

2022-06-16 Thread Peter Eisentraut

On 16.06.22 19:30, Josh Soref wrote:

I'm reading the docs (I'm trying to figure out some replication
things) and I was wondering why the file references [1] don't match
the file names.


I think it was never a goal to absolutely make them match all the time, 
so a lot of the differences might be accidental.  There are also some 
tooling restrictions for what characters can be in the output file names.





Re: real/float example for testlibpq3

2022-06-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 14, 2022 at 1:40 PM Mark Wong  wrote:
>> I've created a function for each data type with the idea that an example
>> for handling a specific data type can be more easily reviewed by looking
>> in a single place.
>> I've added examples for REAL, TIMESTAMP WITHOUT TIME ZONE, and BOOLEAN
>> to try to illustrate how testlibpq3.sql and testlibpq3.c will grow if
>> this is a good way to go.

> I'm not sure that we want to let these test programs grow very large.
> In particular, I don't think we should let ourselves get sucked into
> adding an example for every data type under the sun -- if that's
> wanted, the solution is perhaps to add documentation for the binary
> formats, not hide impromptu documentation inside a test program.  But
> doing this much seems OK to me.

Yeah, "hiding impromptu documentation inside a test program" is what
this looks like, and I'm not sure that's a reasonable way to go.

(1) Who's going to think to look in src/test/examples/testlibpq3.c for
documentation of binary formats?

(2) The useful details are likely to get buried in notational and
portability concerns, as I think your build failure illustrates.

(3) I bet few if any packagers install these files, so that the new
info would be unavailable to many people.

I think some new appendix in the main SGML docs would be the appropriate
place if we want to provide real documentation.

regards, tom lane




Re: real/float example for testlibpq3

2022-06-16 Thread Robert Haas
On Tue, Jun 14, 2022 at 1:40 PM Mark Wong  wrote:
> Checking in for quick feedback to see if this refactor makes sense.
>
> I've created a function for each data type with the idea that an example
> for handling a specific data type can be more easily reviewed by looking
> in a single place.
>
> I've added examples for REAL, TIMESTAMP WITHOUT TIME ZONE, and BOOLEAN
> to try to illustrate how testlibpq3.sql and testlibpq3.c will grow if
> this is a good way to go.

I think this could be a reasonable kind of thing to do, but I think
you left the "ts" output out of the example output in the comments,
and also your code's apparently not portable, because my compiler is
OK with testlibpq3 right now, but with your patch it emits lengthy
unhappy moaning, starting with:

testlibpq3.c:88:10: error: expected ')'
uint64_t ntohll(uint64_t);
 ^
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include/sys/_endian.h:140:25:
note: expanded from macro 'ntohll'
#define ntohll(x)   __DARWIN_OSSwapInt64(x)
^

Apparently macOS defines ntohll as a macro, and it's not amused by
your attempt to include a function prototype for it.
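For what it's worth, a portable way to do the 64-bit conversion without
colliding with a system-provided ntohll is to build it from two 32-bit ntohl()
calls, roughly as sketched below (the helper name is made up; server-side code
has pg_ntoh64() in port/pg_bswap.h, though whether to use that in an example
client program is a separate question):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>		/* ntohl(); winsock2.h on Windows */

static uint64_t
example_ntoh64(uint64_t netval)
{
	uint32_t	hi;
	uint32_t	lo;

	/* netval holds the on-the-wire (big-endian) byte sequence */
	memcpy(&hi, (char *) &netval, 4);
	memcpy(&lo, (char *) &netval + 4, 4);
	return ((uint64_t) ntohl(hi) << 32) | ntohl(lo);
}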

I'm not sure that we want to let these test programs grow very large.
In particular, I don't think we should let ourselves get sucked into
adding an example for every data type under the sun -- if that's
wanted, the solution is perhaps to add documentation for the binary
formats, not hide impromptu documentation inside a test program.  But
doing this much seems OK to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-06-16 Thread Andrew Dunstan


On 2022-06-16 Th 00:56, Masahiko Sawada wrote:
>
> I've attached an updated version patch that changes the configure
> script. I'm still studying how to support AVX2 on msvc build. Also,
> added more regression tests.


I think you would need to add '/arch:AVX2' to the compiler flags in
MSBuildProject.pm.


See



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: better page-level checksums

2022-06-16 Thread Robert Haas
On Wed, Jun 15, 2022 at 5:53 PM Peter Geoghegan  wrote:
> I think that it's worth doing the following exercise (humor me): Why
> wouldn't it be okay to just encrypt the tuple space and the line
> pointer array, leaving both the page header and page special area
> unencrypted? What kind of user would find that trade-off to be
> unacceptable, and why? What's the nuance of it?

Let's consider a continuum where, on the one end, you encrypt the
entire disk. Then, consider a solution where you encrypt each
individual file, block by block. Next, let's imagine that we don't
encrypt some kinds of files at all, if we think the data in them isn't
sensitive enough. CLOG, maybe. Perhaps pg_class, because that'd be
useful for debugging, and how sensitive can the names of the database
tables be? Then, let's adopt your proposal here and leave some parts
of each block unencrypted for debuggability. As a next step, we could
take the further step of separately encrypting each tuple, but only
the data, leaving the tuple header unencrypted. Then, going further,
we could encrypt each individual column value within the tuple
separately, rather than encrypting the tuple together. Then, let's
additionally decide that we're not going to encrypt all the columns,
but just the ones the user says are sensitive. Now I think we've
pretty much reached the other end of the continuum, unless someone is
going to propose something like encrypting only part of each column,
or storing some unencrypted data along with each encrypted column that
is somehow dependent on the column contents.

I think it is undeniable that every step along that continuum has
weakened security in some way. The worst case scenario for an attacker
must be that the entire disk is encrypted and they can gain no
meaningful information at all without having to break that encryption.
As the encryption touches fewer things, it becomes easier and easier
to make inferences about the unseen data based on the data that you
can see. One can sit around and argue about whether the amount of
information that is leaked at any given step is enough for anyone to
care, but to some extent that's an opinion question where any position
can be defended by someone. I would argue that even leaking the
lengths of the files is not great at all. Imagine that the table is
scheduled_nuclear_missile_launches. I definitely do not want my
adversaries to know even as much as whether that table is zero-length
or non-zero-length. In fact I would prefer that they be unable to
infer that I have such a table at all. Back in 2019 I constructed a
similar example for how access to pg_clog could leak meaningful
information:

http://postgr.es/m/ca+tgmozhbeymroaccj1ocn03jz2uak18qn4afx4wd7g+j7s...@mail.gmail.com

Now, obviously, anyone can debate how realistic such cases are, but
they definitely exist. If you can read btpo_prev, btpo_next,
btpo_level, and btpo_flags for every page in the btree, you can
probably infer some things about the distribution of keys in the table
-- especially if you can read all the pages at time T1 and then read
them all again later at time T2 (and maybe further times T3..Tn). You
can make inference about which parts of the keyspace are receiving new
index insertions and which are not. If that's the index on the
current_secret_missions.country_code column, well then that sucks.
Your adversary may be able to infer where in the world your secret
organization is operating and round up all your agents.

Now, I do realize that if we're ever going to get TDE in PostgreSQL,
we will probably have to make some compromises. Actually concealing
file lengths would require a redesign of the entire storage system,
and so is probably not practical in the short term. Concealing SLRU
contents would require significant changes too, some of which I think
are things Thomas wants to do anyway, but we might have to punt that
goal for the first version of a TDE feature, too. Surely, that weakens
security, but if it gets us to a feature that some people can use
before the heat death of the universe, there's a reasonable argument
that that's better than nothing. Still, conceding that we may not
realistically may be able to conceal all the information in v1 is
different from arguing that concealing it isn't desirable, and I think
the latter argument is pretty hard to defend.

People who want to break into computers have gotten incredibly good at
exploiting incredibly subtle bits of information in order to infer the
contents of unseen data.
https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) is a
good example: somebody figured out that the branch prediction hardware
could initiate speculative accesses to RAM that the user doesn't
actually have permission to execute, and thus a JavaScript program
running in your browser can read out the entire contents of RAM by
measuring exactly how long mis-predicted code takes to execute.
There's got to be at least one chip designer out there somewhere who
was 

SGML doc file references

2022-06-16 Thread Josh Soref
Hi,
I'm reading the docs (I'm trying to figure out some replication
things) and I was wondering why the file references [1] don't match
the file names.

Most of the inconsistent items are for `obsolete-*` where the filename
is actually `appendix-obsolete-*`. But, oddly, afaict, they were
introduced with these inconsistent names.

In one of those cases, the base of the file is also wrong (pgxlogdump
[2] vs. pgreceivexlog [3]). I believe this was an api change between
9.3 and 9.4. I know that there are `id=` tags designed to catch old
references, but the comments don't seem to serve that purpose, if they
are, I was wondering if an additional comment explaining their
discrepancies would be warranted.

In one case, it's just a missing `-` (`backupmanifest.sgml` vs
`backup-manifest.sgml`) which feels accidental.

(I do have more technical questions about the docs, but I think I may
try a different venue to ask them.)

Thanks,

[1] https://github.com/jsoref/postgres/commit/sgml-doc-file-refs
[2] https://www.postgresql.org/docs/9.3/pgxlogdump.html
[3] https://www.postgresql.org/docs/9.4/app-pgreceivexlog.html


sgml-doc-file-refs.patch


Re: SLRUs in the main buffer pool, redux

2022-06-16 Thread Konstantin Knizhnik



On 28.05.2022 04:13, Thomas Munro wrote:

On Fri, May 27, 2022 at 11:24 PM Thomas Munro  wrote:

Rebased, debugged and fleshed out a tiny bit more, but still with
plenty of TODO notes and questions.  I will talk about this idea at
PGCon, so I figured it'd help to have a patch that actually applies,
even if it doesn't work quite right yet.  It's quite a large patch but
that's partly because it removes a lot of lines...

FWIW, here are my PGCon slides about this:
https://speakerdeck.com/macdice/improving-the-slru-subsystem

There was a little bit of discussion on #pgcon-stream2 which I could
summarise as: can we figure out a way to keep parts of the CLOG pinned
so that backends don't have to do that for each lookup?  Then CLOG
checks become simple reads.  There may be some relation to the idea of
'nailing' btree root pages that I've heard of from a couple of people
now (with ProcSignalBarrier or something more fine grained along those
lines if you need to unnail anything).  Something to think about.

I'm also wondering if it would be possible to do "optimistic" pinning
instead for reads that normally need only a pin, using some kind of
counter scheme with read barriers to tell you if the page might have
been evicted after you read the data...





I wonder if there are some tests which can illustrate the advantages of
storing SLRU pages in shared buffers?
In PgPro we had a customer that ran PL/pgSQL code with a recursively
called function containing exception handling code. Each exception block
creates a subtransaction, and the subxids SLRU becomes a bottleneck.
I have simulated this workload with a large number of subxids using the
following function:


create or replace function do_update(id integer, level integer) returns void as $$
begin
    begin
        if level > 0 then
            perform do_update(id, level-1);
        else
            update pgbench_accounts SET abalance = abalance + 1 WHERE aid = id;
        end if;
    exception WHEN OTHERS THEN
        raise notice '% %', SQLERRM, SQLSTATE;
    end;
end; $$ language plpgsql;

With the following test script:

\set aid random(1, 1000)
select do_update(:aid,100)

I got the following results:

knizhnik@xps:~/db$ pgbench postgres -f update.sql -c 10 -T 100 -P 1 -M 
prepared

pgbench (15beta1)
starting vacuum...end.
progress: 1.0 s, 3030.8 tps, lat 3.238 ms stddev 1.110, 0 failed
progress: 2.0 s, 3018.0 tps, lat 3.303 ms stddev 1.088, 0 failed
progress: 3.0 s, 3000.4 tps, lat 3.329 ms stddev 1.063, 0 failed
progress: 4.0 s, 2855.6 tps, lat 3.494 ms stddev 1.152, 0 failed
progress: 5.0 s, 2747.0 tps, lat 3.631 ms stddev 1.306, 0 failed
progress: 6.0 s, 2664.0 tps, lat 3.743 ms stddev 1.410, 0 failed
progress: 7.0 s, 2498.0 tps, lat 3.992 ms stddev 1.659, 0 failed
...
progress: 93.0 s, 670.0 tps, lat 14.964 ms stddev 10.555, 0 failed
progress: 94.0 s, 615.0 tps, lat 16.222 ms stddev 11.419, 0 failed
progress: 95.0 s, 580.0 tps, lat 17.251 ms stddev 11.622, 0 failed
progress: 96.0 s, 568.0 tps, lat 17.582 ms stddev 11.679, 0 failed
progress: 97.0 s, 573.0 tps, lat 17.389 ms stddev 11.771, 0 failed
progress: 98.0 s, 611.0 tps, lat 16.428 ms stddev 11.768, 0 failed
progress: 99.0 s, 568.0 tps, lat 17.622 ms stddev 11.912, 0 failed
progress: 100.0 s, 568.0 tps, lat 17.631 ms stddev 11.672, 0 failed
tps = 1035.566054 (without initial connection time)

With Thomas patch results are the following:

progress: 1.0 s, 2949.8 tps, lat 3.332 ms stddev 1.285, 0 failed
progress: 2.0 s, 3009.1 tps, lat 3.317 ms stddev 1.077, 0 failed
progress: 3.0 s, 2993.6 tps, lat 3.338 ms stddev 1.099, 0 failed
progress: 4.0 s, 3034.4 tps, lat 3.291 ms stddev 1.056, 0 failed
...
progress: 97.0 s, 1113.0 tps, lat 8.972 ms stddev 3.885, 0 failed
progress: 98.0 s, 1138.0 tps, lat 8.803 ms stddev 3.496, 0 failed
progress: 99.0 s, 1174.8 tps, lat 8.471 ms stddev 3.875, 0 failed
progress: 100.0 s, 1094.1 tps, lat 9.123 ms stddev 3.842, 0 failed
tps = 2133.240094 (without initial connection time)

So there is still a performance degradation, but it is smaller than with
vanilla, and the total TPS is almost two times higher.


And here is another example demonstrating the performance degradation, from
a presentation by Alexander Korotkov:

pgbench script:

\set aid random(1, 10 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
SAVEPOINT s1;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

SAVEPOINT sN;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
SELECT pg_sleep(1.0);
END;





I wonder which workload can cause CLOG to become a bottleneck?
Usually Postgres uses hint bits to avoid CLOG access, so standard
pgbench doesn't demonstrate any performance degradation even in the
presence of long-living 

Re: Modest proposal to extend TableAM API for controlling cluster commands

2022-06-16 Thread Mark Dilger



> On Jun 16, 2022, at 12:27 AM, Andres Freund  wrote:
> 
>> I don't think I should have to do so.  It's like saying, "I think I should
>> have freedom of speech", and you say, "well, I'm not sure about that; tell
>> me what you want to say, and I'll decide if I'm going to let you say it".'
>> That's not freedom.  I think TAM authors should have broad discretion over
>> anything that the core system doesn't have a compelling interest in
>> controlling.
> 
> That's insultingly ridiculous. You can say, do whatever you want, but that
> doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
> compelled speech, to go with your image...

Indeed it would be compelled speech, and I'm not trying to compel you, only to 
convince you.  And my apologies if it came across as insulting.  I have a lot 
of respect for you, as do others at EDB, per the invariably complimentary comments 
I've heard others express.

> It's utterly normal to be asked what the use case for a new API is when
> proposing one.

It seems like we're talking on two different levels.  I've said what the use 
case is, which is to implement a TAM that doesn't benefit from cluster or 
vacuum full, without the overhead of needlessly copying itself, and without 
causing argumentless VACUUM FULL commands to fail.  I'm *emphatically* not 
asking the community to accept the TAM back as a patch.  The freedom I'm 
talking about is the freedom to design and implement such a third-party TAM 
without seeking community approval of the TAM's merits.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Skipping logical replication transactions on subscriber side

2022-06-16 Thread Robert Haas
On Thu, Jun 16, 2022 at 3:26 AM Masahiko Sawada  wrote:
> FWIW, looking at the manual, there might have
> been a solution for AIX to specify -qalign=natural compiler option in
> order to enforce the alignment of double to 8.

Well if that can work it sure seems better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Modest proposal to extend TableAM API for controlling cluster commands

2022-06-16 Thread Mark Dilger



> On Jun 16, 2022, at 12:28 AM, David G. Johnston  
> wrote:
> 
> But you are basically proposing a reworking of the existing system into one 
> that makes pretty much any SQL Command something that a TAM can treat as 
> being an optional request by the user;

Yes, and I think I'm perfectly correct in asking for that.  If the standard you 
are proposing (albeit as Devil's Advocate) were applied to filesystems, nobody 
could ever implement /dev/null, on the argument that users have a reasonable 
expectation that data they write to a file will be there for them to read 
later.  Yet Michael Paquier wrote a blackhole TAM, and although I don't find it 
terribly useful, I do think it's a reasonable thing for somebody to be able to 
write. 

> whereas today the system presumes that the implementations will respond to 
> these commands.

That depends on what you mean by "respond to".  A TAM which implements a tamper 
resistant audit log responds to update and delete commands with a "permission 
denied" error.  A TAM which implements running aggregates implements insert 
commands by computing and inserting a new running aggregate value and 
reclaiming space from old running aggregate values when no transaction could 
any longer see them.  You can do this stuff at a higher level with hooks, 
functions, triggers, and rules, inserting into a heap, and having to 
periodically vacuum, but why would you want to?  That's almost guaranteed to be 
slower, maybe even orders of magnitude slower. 

> And to make this change without any core code having such a need.

The core code won't have any such need, because the core code is content with 
heap, and the API already accommodates heap.  It seems Andres moved the project 
in the direction of allowing custom TAMs when he created the Table AM 
interface, and I'm quite pleased that he did so, but it doesn't allow nearly 
enough flexibility to do all the interesting things a TAM could otherwise do.  
Consider for example that the multi_insert hook uses a BulkInsertStateData 
parameter, defined as: 

typedef struct BulkInsertStateData
{
    BufferAccessStrategy strategy;  /* our BULKWRITE strategy object */
    Buffer      current_buf;        /* current insertion target page */
} BulkInsertStateData;

which is just the structure heap would want, but what about a TAM that wants to 
route different tuples to different pages?  The "current_buf" isn't enough 
information, and there's no void *extra field, so you're just sunk.
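
To make that concrete, here is a rough sketch (this struct does not exist
anywhere and the names are purely illustrative) of the kind of opaque
extension point a TAM author would need:

typedef struct ExtensibleBulkInsertStateData
{
    /* the two fields heap uses today */
    BufferAccessStrategy strategy;   /* bulk-write buffer strategy */
    Buffer      current_buf;         /* current insertion target page */

    /* hypothetical addition: room for AM-private routing state */
    void       *extra;
} ExtensibleBulkInsertStateData;

With something like the "extra" pointer, a TAM could carry, say, several
candidate target buffers for routing tuples to different pages, without the
core code having to know anything about it.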

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Tightening behaviour for non-immutable behaviour in immutable functions

2022-06-16 Thread Greg Stark
On Mon, 13 Jun 2022 at 16:50, Greg Stark  wrote:
>
> For that matter the gotchas are a bit  convoluted
>
> Perhaps we should automatically fix up the current search path and
> attach it to functions where necessary for users instead of just
> whingeing at them

So I reviewed my own patch and it was completely broken... I fixed
it to actually check the right variables.

I also implemented the other idea above of actually fixing up
search_path in proconfig for the user by default. I think this is
going to be more practical.

The problem with expecting the user to provide search_path is that
they probably aren't doing so today, so the warnings would be firing for
everything...

Providing a fixed up search_path for users would give them a smoother
upgrade process where we can give a warning only if the search_path is
changed substantively, which is much less likely.
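
For illustration, the effect I'm after is roughly what a user can already
get today by pinning the path by hand at CREATE FUNCTION time (a minimal
sketch, not code from the attached patch):

CREATE FUNCTION add_one(i integer) RETURNS integer
    IMMUTABLE
    SET search_path = public, pg_catalog   -- ends up in pg_proc.proconfig
    LANGUAGE sql
    AS 'SELECT i + 1';

The patch would effectively attach such a setting for the user
automatically, based on the search_path in effect at creation time.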

I'm still quite concerned about the performance impact of having
search_path on so many functions. It causes a cache flush which could
be pretty painful on a frequently called function such as one in an
index expression during a data load.

The other issue is that having proconfig set does prevent these
functions from being inlined which can be seen in the regression test
as seen below. I'm not sure how big a problem this is for users.
Inlining is important for many use cases I think. Maybe we can go
ahead and inline things even if they have a proconfig if it matches
the proconfig on the caller? Or maybe even check if we get the same
objects from both search_paths?


Of course this patch is still very WIP. Only one or the other function
makes sense to keep. And I'm not opposed to having a GUC to
enable/disable the enforcement or warnings. And the code itself needs
to be cleaned up with parts of it moving to guc.c and/or namespace.c.


Example of regression tests noticing that immutable functions with
proconfig set become non-inlineable:

diff -U3 /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out
/home/stark/src/postgresql/src/test/regress/results/rangefuncs.out
--- /home/stark/src/postgresql/src/test/regress/expected/rangefuncs.out
2022-01-17 12:01:54.958628564 -0500
+++ /home/stark/src/postgresql/src/test/regress/results/rangefuncs.out
2022-06-16 02:16:47.589703966 -0400
@@ -1924,14 +1924,14 @@
 select * from array_to_set(array['one', 'two']) as t(f1 point,f2 text);
 ERROR:  return type mismatch in function declared to return record
 DETAIL:  Final statement returns integer instead of point at column 1.
-CONTEXT:  SQL function "array_to_set" during inlining
+CONTEXT:  SQL function "array_to_set" during startup
 explain (verbose, costs off)
   select * from array_to_set(array['one', 'two']) as t(f1
numeric(4,2),f2 text);
-  QUERY PLAN
---
- Function Scan on pg_catalog.generate_subscripts i
-   Output: i.i, ('{one,two}'::text[])[i.i]
-   Function Call: generate_subscripts('{one,two}'::text[], 1)
+ QUERY PLAN
+
+ Function Scan on public.array_to_set t
+   Output: f1, f2
+   Function Call: array_to_set('{one,two}'::text[])
 (3 rows)

 create temp table rngfunc(f1 int8, f2 int8);
@@ -2064,11 +2064,12 @@

 explain (verbose, costs off)
 select * from testrngfunc();
-   QUERY PLAN
-
- Result
-   Output: 7.136178::numeric(35,6), 7.14::numeric(35,2)
-(2 rows)
+ QUERY PLAN
+-
+ Function Scan on public.testrngfunc
+   Output: f1, f2
+   Function Call: testrngfunc()
+(3 rows)

 select * from testrngfunc();
 f1|  f2


--
greg
From 02f16014c3458e549719f4bb6c338e55b6e2 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 16 Jun 2022 01:40:49 -0400
Subject: [PATCH 2/2] Instead of checks actually force search_path on immutable
 and security definer functions

---
 src/backend/commands/functioncmds.c | 79 -
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 9347ed70e2..20da751202 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -41,6 +41,7 @@
 #include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
 #include "catalog/pg_aggregate.h"
+#include "catalog/pg_authid.h"
 #include "catalog/pg_cast.h"
 #include "catalog/pg_language.h"
 #include "catalog/pg_namespace.h"
@@ -683,9 +684,83 @@ update_proconfig_value(ArrayType *a, List *set_items)
  * https://www.postgresql.org/docs/current/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY
  */ 
 
-/* XXX This logic should perhaps be moved to namespace.c XXX */
+/* XXX Some or all of this logic should perhaps be moved to namespace.c XXX */
 #include "utils/varlena.h"
 
+static ArrayType *
+fixup_proconfig_value(ArrayType *a, char 

Re: CREATE TABLE ( .. STORAGE ..)

2022-06-16 Thread Aleksander Alekseev
Hi Matthias,

> Apart from this comment on the format of the patch, the result seems solid.

Many thanks.

> When updating a patchset generally we try to keep the patches
> self-contained, and update patches as opposed to adding incremental
> patches to the set.

My reasoning was to separate my changes from the ones originally
proposed by Teodor. After doing `git am` locally a reviewer can see
them separately, or together with `git diff origin/master`, whatever
he or she prefers. The committer can choose between committing two
patches one by one, or rebasing them into a single commit.

I will avoid the "patch for the patch" practice from now on. Sorry for
the inconvenience.

-- 
Best regards,
Aleksander Alekseev




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-06-16 Thread Justin Pryzby
On Thu, Jun 16, 2022 at 05:19:46PM +0900, Kyotaro Horiguchi wrote:
> At Sat, 11 Jun 2022 09:41:37 -0500, Justin Pryzby  
> wrote in 
> > Note that this gives:
> > 
> > guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function 
> > [-Wmaybe-uninitialized]
> 
> Mmm. I don't have an idea where the 'dst' came from...

Well, in your latest patch, you've renamed it.

guc.c:7586:19: warning: ‘result’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 7586 |  PG_RETURN_TEXT_P(cstring_to_text(result));

-- 
Justin




Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Langote
On Thu, Jun 16, 2022 at 9:28 PM Amit Kapila  wrote:
> On Thu, Jun 16, 2022 at 5:24 PM Amit Langote  wrote:
> > On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila  wrote:
> > > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote  
> > > wrote:
> > > > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData 
> > > > *edata,
> > > >  static void
> > > >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> > > >  {
> > > > +   /*
> > > > +* If it is a partitioned table, we don't check it, we will check 
> > > > its
> > > > +* partition later.
> > > > +*/
> > > > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > > > +   return;
> > > >
> > > > Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> > > > if the relation is partitioned or not -- it does all the work
> > > > regardless.
> > > >
> > > > I suggest we don't add this check in check_relation_updatable().
> > >
> > > I think based on this suggestion patch has moved this check to
> > > logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> > > even validate whether it can mark updatable as false which seems odd
> > > to me even though there might not be any bug due to that. Was your
> > > suggestion actually intended to move it to
> > > logicalrep_rel_mark_updatable?
> >
> > No, I didn't intend to suggest that we move this check to
> > logicalrep_rel_mark_updatable(); didn't notice that that's what the
> > latest patch did.
> >
> > What I said is that we shouldn't ignore the updatable flag for a
> > partitioned table in check_relation_updatable(), because
> > logicalrep_rel_mark_updatable() would have set the updatable flag
> > correctly even for partitioned tables.  IOW, we should not
> > special-case partitioned tables anywhere.
> >
> > I guess the point of adding the check is to allow the case where a
> > leaf partition's replica identity can be used to apply an update
> > originally targeting its ancestor that doesn't itself have one.
> >
> > I wonder if it wouldn't be better to move the
> > check_relation_updatable() call to
> > apply_handle_{update|delete}_internal()?  We know for sure that we
> > only ever get there for leaf tables.  If we do that, we won't need the
> > relkind check.
>
> I think this won't work for updates via apply_handle_tuple_routing()
> unless we call it from some other place(s) as well. It will do
> FindReplTupleInLocalRel() before doing update/delete for CMD_UPDATE
> case and will lead to assertion failure.

You're right.  I guess it's fine then to check the relkind in
check_relation_updatable() the way the original patch did, even though
it would've been nice if it didn't need to.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Ignoring BRIN for HOT udpates seems broken

2022-06-16 Thread Tomas Vondra
On 6/6/22 09:28, Michael Paquier wrote:
> On Mon, Jun 06, 2022 at 09:08:08AM +0200, Tomas Vondra wrote:
>> Attached is a patch reverting both commits (5753d4ee32 and fe60b67250).
>> This changes the IndexAmRoutine struct, so it's an ABI break. That's not
>> great post-beta :-( In principle we might also leave amhotblocking in
>> the struct but ignore it in the code (and treat it as false), but that
>> seems weird and it's going to be a pain when backpatching. Opinions?
> 
> I don't think that you need to worry about ABI breakages now in beta,
> because that's the period of time where we can still change things and
> shape the code in its best way for prime time.  It depends on the
> change, of course, but what you are doing, by removing the field,
> looks right to me here.

I've pushed the revert. Let's try again for PG16.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Kapila
On Thu, Jun 16, 2022 at 5:24 PM Amit Langote  wrote:
>
> On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila  wrote:
> > On Fri, Jun 10, 2022 at 2:26 PM Amit Langote  
> > wrote:
> > > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData 
> > > *edata,
> > >  static void
> > >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> > >  {
> > > +   /*
> > > +* If it is a partitioned table, we don't check it, we will check its
> > > +* partition later.
> > > +*/
> > > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > > +   return;
> > >
> > > Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> > > if the relation is partitioned or not -- it does all the work
> > > regardless.
> > >
> > > I suggest we don't add this check in check_relation_updatable().
> >
> > I think based on this suggestion patch has moved this check to
> > logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> > even validate whether it can mark updatable as false which seems odd
> > to me even though there might not be any bug due to that. Was your
> > suggestion actually intended to move it to
> > logicalrep_rel_mark_updatable?
>
> No, I didn't intend to suggest that we move this check to
> logicalrep_rel_mark_updatable(); didn't notice that that's what the
> latest patch did.
>
> What I said is that we shouldn't ignore the updatable flag for a
> partitioned table in check_relation_updatable(), because
> logicalrep_rel_mark_updatable() would have set the updatable flag
> correctly even for partitioned tables.  IOW, we should not
> special-case partitioned tables anywhere.
>
> I guess the point of adding the check is to allow the case where a
> leaf partition's replica identity can be used to apply an update
> originally targeting its ancestor that doesn't itself have one.
>
> I wonder if it wouldn't be better to move the
> check_relation_updatable() call to
> apply_handle_{update|delete}_internal()?  We know for sure that we
> only ever get there for leaf tables.  If we do that, we won't need the
> relkind check.
>

I think this won't work for updates via apply_handle_tuple_routing()
unless we call it from some other place(s) as well. It will do
FindReplTupleInLocalRel() before doing update/delete for CMD_UPDATE
case and will lead to assertion failure.

-- 
With Regards,
Amit Kapila.




Re: PGDOCS - Integer configuration parameters should say "(integer)"

2022-06-16 Thread Michael Paquier
On Thu, Jun 16, 2022 at 07:22:15PM +1000, Peter Smith wrote:
> It looked like a small mistake to me; there are only 3 of (int),
> versus 148 of (integer).

Grepping around, that's correct.  Will fix.
--
Michael


signature.asc
Description: PGP signature


Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Langote
On Thu, Jun 16, 2022 at 6:42 PM Amit Kapila  wrote:
> On Fri, Jun 10, 2022 at 2:26 PM Amit Langote  wrote:
> > @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData 
> > *edata,
> >  static void
> >  check_relation_updatable(LogicalRepRelMapEntry *rel)
> >  {
> > +   /*
> > +* If it is a partitioned table, we don't check it, we will check its
> > +* partition later.
> > +*/
> > +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> > +   return;
> >
> > Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> > if the relation is partitioned or not -- it does all the work
> > regardless.
> >
> > I suggest we don't add this check in check_relation_updatable().
>
> I think based on this suggestion patch has moved this check to
> logicalrep_rel_mark_updatable(). For a partitioned table, it won't
> even validate whether it can mark updatable as false which seems odd
> to me even though there might not be any bug due to that. Was your
> suggestion actually intended to move it to
> logicalrep_rel_mark_updatable?

No, I didn't intend to suggest that we move this check to
logicalrep_rel_mark_updatable(); didn't notice that that's what the
latest patch did.

What I said is that we shouldn't ignore the updatable flag for a
partitioned table in check_relation_updatable(), because
logicalrep_rel_mark_updatable() would have set the updatable flag
correctly even for partitioned tables.  IOW, we should not
special-case partitioned tables anywhere.

I guess the point of adding the check is to allow the case where a
leaf partition's replica identity can be used to apply an update
originally targeting its ancestor that doesn't itself have one.

I wonder if it wouldn't be better to move the
check_relation_updatable() call to
apply_handle_{update|delete}_internal()?  We know for sure that we
only ever get there for leaf tables.  If we do that, we won't need the
relkind check.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-06-16 Thread vignesh C
On Wed, Jun 15, 2022 at 12:13 PM Peter Smith  wrote:
>
> Here are some review comments for patch v20-0004.
>
> ==
>
> 1. General
>
> I thought that it is better to refer to the
> subscription/publications/table "on" the node, rather than "in" the
> node. Most of the review comments below are related to this point.

Modified

> ==
>
> 2. Commit message
>
> a) Creating a two-node bidirectional replication when there is no data
> in both nodes.
> b) Adding a new node when there is no data in any of the nodes.
> c) Adding a new node when data is present in the existing nodes.
>
> "in both nodes" -> "on both nodes"
> "in any of the nodes" -> "on any of the nodes"
> "in the existing nodes" -> "on the existing nodes"

Modified

> ==
>
> 3. doc/src/sgml/logical-replication.sgml - Setting bidirectional
> replication between two nodes
>
> 3a.
> +   
> +The following steps demonstrate how to create a two-node bidirectional
> +replication when there is no table data present in both nodes
> +node1 and node2:
> +   
> -> "on both nodes"

Modified

> 3b.
> +Create a publication in node1:
> -> "on"

Modified

> 3c.
> +Create a publication in node2:
> -> "on"

Modified

> 3d.
> +   
> +Lock the table t1 in node1 and
> +node2 in EXCLUSIVE mode until the
> +setup is completed.
> +   
> -> "on node1"

Modified

> 3e.
> +Create a subscription in node2 to subscribe to
> -> "on"

Modified

> 3f.
> +Create a subscription in node1 to subscribe to
> +node2:
> -> "on"

Modified

> ~~~
>
> 4. doc/src/sgml/logical-replication.sgml - Adding a new node when
> there is no data in any of the nodes
>
> 4a.
> +   Adding a new node when there is no data in any of the nodes
> SUGGESTION
> Adding a new node when there is no table data on any of the nodes

Modified

> 4b.
> +   
> +The following steps demonstrate adding a new node 
> node3
> +to the existing node1 and node2 
> when
> +there is no t1 data in any of the nodes. This requires
> +creating subscriptions in node1 and
> +node2 to replicate the data from
> +node3 and creating subscriptions in
> +node3 to replicate data from node1
> +and node2. Note: These steps assume that the
> +bidirectional logical replication between node1 and
> +node2 is already completed.
> +   
>
> "data in any of the nodes" -> "data on any of the nodes"
> "creating subscriptions in node1" -> "creating
> subscriptions on node1"
> "creating subscriptions in node3" -> "creating
> subscriptions on node3"

Modified

> 4c.
> +Create a publication in node3:
> -> "on"

Modified

> 4d.
> +Lock table t1 in all the nodes
> -> "on"

Modified

> 4e.
> +Create a subscription in node1 to subscribe to
> +node3:
> -> "on"

Modified

> 4f.
> +Create a subscription in node2 to subscribe to
> +node3:
> -> "on"

Modified

> 4g.
> +Create a subscription in node3 to subscribe to
> +node1:
> -> "on"

Modified

> 4h.
> +Create a subscription in node3 to subscribe to
> +node2:

Modified

> 4i.
> +node3. Incremental changes made in any node will be
> +replicated to the other two nodes.
> "in any node" -> "on any node"

Modified

> ~~~
>
> 5. doc/src/sgml/logical-replication.sgml - Adding a new node when data
> is present in the existing nodes
>
> 5a.
> +   Adding a new node when data is present in the existing 
> nodes
> SUGGESTION
> Adding a new node when table data is present on the existing nodes

Modified

> 5b.
> + during initial data synchronization. Note: These steps assume that the
> + bidirectional logical replication between node1 and
> + node2 is already completed, and the pre-existing data
> + in table t1 is already synchronized in both those
> + nodes.
> +   
> "in both those nodes" -> "on both those nodes"

Modified

> 5c.
> +Create a publication in node3
> -> "on"

Modified

> 5d.
> +Lock table t1 in node2 and
> -> "on"

Modified

> 5e.
> +Create a subscription in node1 to subscribe to
> +node3:
> -> "on"

Modified

> 5f.
> +Create a subscription in node2 to subscribe to
> +node3:
> -> "on"

Modified

> 5g.
> +Create a subscription in node3 to subscribe to
> +node1. Use copy_data = force  so 
> that
> +the existing table data is copied during initial sync:
> -> "on"

Modified

>
> 5h.
> +Create a subscription in node3 to subscribe to
> +node2. Use copy_data = off
> -> "on"

Modified

> 5i.
> +node3. Incremental changes made in any node will be
> +replicated to the other two nodes.
> "in any node" -> "on any node"

Modified

> ~~~
>
> 6. doc/src/sgml/logical-replication.sgml - Adding a new node when data
> is present in the new node
>
> +   Adding a new node when data is present in the new node
> SUGGESTION
> Adding a new node when table data is present on the new node

Modified

> ~~~
>
> 7. doc/src/sgml/logical-replication.sgml - Generic steps for adding a
> new node to an existing set of nodes
>
> 7a.
> +   

Re: Handle infinite recursion in logical replication setup

2022-06-16 Thread vignesh C
On Wed, Jun 15, 2022 at 12:11 PM Peter Smith  wrote:
>
> Here are some review comments for patch v20-0003.
>
> ==
>
> 1. Commit message
>
> In case, table t1 has a unique key, it will lead to a unique key
> violation and replication won't proceed.
>
> SUGGESTION
> If table t1 has a unique key, this will lead to a unique key
> violation, and replication won't proceed.

Modified

> ~~~
>
> 2. Commit message
>
> This problem can be solved by using...
>
> SUGGESTION
> This problem can be avoided by using...

Modified

> ~~~
>
> 3. Commit message
>
> step 3: Create a subscription in node1 to subscribe to node2. Use
> 'copy_data = on' so that the existing table data is copied during
> initial sync:
> node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION ''
> node1-# PUBLICATION pub_node2 WITH (copy_data = off, origin = local);
> CREATE SUBSCRIPTION
>
> This is wrong information. The table on node2 has no data, so talking
> about 'copy_data = on' is inappropriate here.

Modified

> ==
>
> 4. Commit message
>
> IMO it might be better to refer to subscription/publication/table "on"
> nodeXXX, instead of saying "in" nodeXXX.
>
> 4a.
> "the publication tables were also subscribing data in the publisher
> from other publishers." -> "the publication tables were also
> subscribing from other publishers.

Modified

> 4b.
> "After the subscription is created in node2" -> "After the
> subscription is created on node2"

Modified

> 4c.
> "step 3: Create a subscription in node1 to subscribe to node2." ->
> "step 3: Create a subscription on node1 to subscribe to node2."

Modified

> 4d.
> "step 4: Create a subscription in node2 to subscribe to node1." ->
> "step 4: Create a subscription on node2 to subscribe to node1."

Modified

> ==
>
> 5. doc/src/sgml/ref/create_subscription.sgml
>
> @@ -383,6 +397,15 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name can have non-existent publications.
>
>
> +  
> +   If the subscription is created with origin = local and
> +   copy_data = on, it will check if the publisher tables 
> are
> +   being subscribed to any other publisher and, if so, then throw an error to
> +   prevent possible non-local data from being copied. The user can override
> +   this check and continue with the copy operation by specifying
> +   copy_data = force.
> +  
>
> Perhaps it is better here to say 'copy_data = true' instead of
> 'copy_data = on', simply because the value 'true' was mentioned
> earlier on this page (but this would be the first mention of 'on').

Modified

> ==
>
> 6. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> + errhint("Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force."));
>
> Saying "off or force" is not consistent with the other message wording
> in this patch, which used "/" for multiple enums.
> (e.g. "connect = false", "copy_data = true/force").
>
> So perhaps this errhint should be worded similarly:
> "Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force."

Modified

Thanks for the comments. The v21 patch attached at [1] includes these
changes.
[1] - 
https://www.postgresql.org/message-id/CALDaNm3%2B6cey0rcDft1ZUCjSUtLDM0xmU_Q%2BYhcsBrqe1RH8%3Dw%40mail.gmail.com

Regards,
Vignesh




Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Kapila
On Fri, Jun 10, 2022 at 2:26 PM Amit Langote  wrote:
>
> @@ -1735,6 +1735,13 @@ apply_handle_insert_internal(ApplyExecutionData *edata,
>  static void
>  check_relation_updatable(LogicalRepRelMapEntry *rel)
>  {
> +   /*
> +* If it is a partitioned table, we don't check it, we will check its
> +* partition later.
> +*/
> +   if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> +   return;
>
> Why do this?  I mean why if logicalrep_check_updatable() doesn't care
> if the relation is partitioned or not -- it does all the work
> regardless.
>
> I suggest we don't add this check in check_relation_updatable().
>

I think based on this suggestion patch has moved this check to
logicalrep_rel_mark_updatable(). For a partitioned table, it won't
even validate whether it can mark updatable as false which seems odd
to me even though there might not be any bug due to that. Was your
suggestion actually intended to move it to
logicalrep_rel_mark_updatable? If so, why do you think that is a
better place?

I think it is important to have this check to avoid giving an error via
check_relation_updatable() when partitioned tables don't have RI, but it
is not clear which is the right place. I think check_relation_updatable()
is a better place than logicalrep_rel_mark_updatable(), but maybe there
is a reason why that is not a good idea.

-- 
With Regards,
Amit Kapila.




PGDOCS - Integer configuration parameters should say "(integer)"

2022-06-16 Thread Peter Smith
FYI, I happened to notice in the PG docs there are a few integer
configuration parameters that describe themselves as type "(int)"
instead of "(integer)".

It looked like a small mistake to me; there are only 3 of (int),
versus 148 of (integer).

~~~

doc\src\sgml\auth-delay.sgml:
  29  
  30:  auth_delay.milliseconds (int)
  31   

doc\src\sgml\config.sgml:
  4918   
  4919:   max_logical_replication_workers
(int)
  4920

doc\src\sgml\pgprewarm.sgml:
  109 
  110:  pg_prewarm.autoprewarm_interval
(int)
  111   

~~~

PSA a small patch to correct those.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Use-type-integer-instead-of-type-int.patch
Description: Binary data


Re: CREATE TABLE ( .. STORAGE ..)

2022-06-16 Thread Matthias van de Meent
On Wed, 15 Jun 2022 at 16:51, Aleksander Alekseev
 wrote:
>
> Hi hackers,
>
> I noticed that cfbot is not entirely happy with the patch, so I rebased it.
>
> > I see that COMPRESSION and STORAGE now are handled slightly
> > differently in the grammar. Maybe we could standardize that a bit
> > more; so that we have only one `STORAGE [kind]` definition in the
> > grammar?
> >
> > As I'm new to the grammar files; would you know the difference between
> > `name` and `ColId`, and why you would change from one to the other in
> > ALTER COLUMN STORAGE?
>
> Good point, Matthias. I addressed this in 0002. Does it look better now?

When updating a patchset generally we try to keep the patches
self-contained, and update patches as opposed to adding incremental
patches to the set.

Apart from this comment on the format of the patch, the result seems solid.

- Matthias




Re: Prevent writes on large objects in read-only transactions

2022-06-16 Thread Michael Paquier
On Thu, Jun 16, 2022 at 03:42:06PM +0900, Yugo NAGATA wrote:
> I am not sure if we should use PreventCommandIfReadOnly in lo_write()
> because there are other public functions that write to catalogs but there
> are no similar restrictions in such functions. I think it is the caller's
> responsibility to prevent the use of such public functions in a read-only
> context.

I'd be really tempted to remove the plug on this one, actually.
However, that would also mean breaking something just for the sake of
breaking it.  So perhaps you are right in the end that it is better
to let this code be, without the new check.

> I also fixed it to raise the error in each of lo_truncate and lo_truncate64
> per Michael's comment in the other post.

Thanks!  That counts for 10 SQL functions blocked with 10 tests.  So
you have all of them covered.

Looking at the docs of large objects, as of "Client Interfaces", we
mention that any action must take place in a transaction block.
Shouldn't we add a note that no write operations are allowed in a
read-only transaction?

+   if (mode & INV_WRITE)
+   PreventCommandIfReadOnly("lo_open() in write mode");
Nit.  This breaks translation.  I think that it could be switched to
"lo_open(INV_WRITE)" instead as the flag name is documented.

The patch is forgetting a refresh for largeobject_1.out.

---   INV_READ  = 0x2
---   INV_WRITE = 0x4
+--   INV_READ  = 0x4
+--   INV_WRITE = 0x2
Good catch!  This one is kind of independent, so I have fixed it
separately, in all the expected output files.
--
Michael


signature.asc
Description: PGP signature


Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Kapila
On Thu, Jun 16, 2022 at 12:30 PM Amit Langote  wrote:
>
> On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila  wrote:
> > On Thu, Jun 16, 2022 at 11:43 AM Amit Langote  
> > wrote:
> > > + * Don't throw any error here just mark the relation entry as not 
> > > updatable,
> > > + * as replica identity is only for updates and deletes but inserts can be
> > > + * replicated even without it.
> > >
> > > I know you're simply copying the old comment, but I think we can
> > > rewrite it to be slightly more useful:
> > >
> > > We just mark the relation entry as not updatable here if the local
> > > replica identity is found to be insufficient and leave it to
> > > check_relation_updatable() to throw the actual error if needed.
> >
> > I am fine with improving this comment but it would be better if in
> > some way we keep the following part of the comment: "as replica
> > identity is only for updates and deletes but inserts can be replicated
> > even without it." as that makes it more clear why it is okay to just
> > mark the entry as not updatable. One idea could be: "We just mark the
> > relation entry as not updatable here if the local replica identity is
> > found to be insufficient and leave it to check_relation_updatable() to
> > throw the actual error if needed. This is because replica identity is
> > only for updates and deletes but inserts can be replicated even
> > without it.". Feel free to suggest if you have any better ideas?
>
> I thought mentioning check_relation_updatable() would make it clear
> that only updates (and deletes) care about a valid local replica
> identity, because only apply_handle_{update|delete}() call that
> function.  Anyway, how about this:
>
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient for applying
> updates/deletes (inserts don't care!) and leave it to
> check_relation_updatable() to throw the actual error if needed.
>

This sounds better to me than the previous text.

-- 
With Regards,
Amit Kapila.




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-06-16 Thread Kyotaro Horiguchi
At Thu, 16 Jun 2022 12:07:03 +0900, Michael Paquier  wrote 
in 
> On Sat, Jun 11, 2022 at 09:41:37AM -0500, Justin Pryzby wrote:
> > Note that this gives:
> > 
> > guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function 
> > [-Wmaybe-uninitialized]
> > 
> > with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)
> > 
> > I wonder whether you'd consider renaming pg_normalize_config_value() to
> > pg_pretty_config_value() or similar.
> 
> I have looked at the patch, and I am not convinced that we need a
> function that does a integer -> integer-with-unit conversion for the
> purpose of this test.  One thing is that it can be unstable with the
> unit in the conversion where values are close to a given threshold
> (aka for cases like 2048kB/2MB).  On top of that, this overlaps with

I agree that needing the conversion to a value with a unit is a bit
clumsy.  One of the reasons is that I didn't want to add a function that
has no use other than testing.

> the existing system function in charge of converting values with bytes
> as size unit, while this stuff handles more unit types and all GUC
> types.  I think that there could be some room in doing the opposite
> conversion, feeding the value from postgresql.conf.sample to something
> and compare it directly with boot_val.  That's solvable at SQL level,
> still a system function may be more user-friendly.

The output value must be the same as what pg_settings shows, so it
needs to take in some code from GetConfigOptionByNum() (and needs to
stay in sync with it), which is what I didn't want to do. Anyway, it is
done in the attached.

This method has a problem for wal_buffers. parse_and_validate_value()
returns 512 for -1 input since check_wal_buffers() converts it to 512.
It is added to the exclusion list.  (Conversely the previous method
regarded "-1" and "512" as identical.)

> Extending the tests to check after the values is something worth
> doing, but I think that I would limit the checks to the parameters  
> that do not have units for now, until we figure out which interface
> would be more adapted for doing the normalization of the parameter
> values.

The attached second patch is that.  FWIW, I'd like to support integer/real
values since I think they are the ones that most need this kind of check.

> -#syslog_facility = 'LOCAL0'
> +#syslog_facility = 'local0'
> Those changes should not be necessary in postgresql.conf.sample.  The
> test should be in charge of applying the lower() conversion, in the
> same way as guc.c does internally, and that's a mode supported by the
> parameter parsing.  Using an upper-case value in the sample file is
> actually meaningful sometimes (for example, syslog can use upper-case
> strings to refer to LOCAL0~7).

I hadn't noticed, but now I know that parse_and_validate_value() converts
values the same way as boot_val, so the case-unification is not needed
after all.

=# select pg_config_unitless_value('datestyle', 'iso, mdy');
 pg_config_unitless_value 
--
 ISO, MDY

However, the "datestyle" variable is shown as "DateStyle" in the
pg_settings view. So the name in the view needs to be lower-cased
instead. The same can be said to "TimeZone" and "IntervalStyle".  The
old query missed the case where there's no variable with the names
appear in the config file. Fixed it.

At Sat, 11 Jun 2022 09:41:37 -0500, Justin Pryzby  wrote 
in 
> Note that this gives:
> 
> guc.c:7573:9: warning: ‘dst’ may be used uninitialized in this function 
> [-Wmaybe-uninitialized]

Mmm. I don't have an idea where the 'dst' came from...


> I wonder whether you'd consider renaming pg_normalize_config_value() to
> pg_pretty_config_value() or similar.

Yeah, that's sensible; the function is now changed (not renamed) to
pg_config_unitless_value().  This name also doesn't satisfy me at
all... :(


So, the attached are:

v2-0001-Add-fileval-bootval-consistency-check-of-GUC-para.patch:

  New version of the previous patch.  It is changed after Michael's
  suggestions.
  

0001-Add-fileval-bootval-consistency-check-of-GUC-paramet-simple.patch

  Another version that doesn't need a new C function.  It ignores
  variables that have units, but I didn't count how many variables are
  ignored by this change.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 7ffa4f5f59564880107ffc3fa0bbd631e020054e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 16 Jun 2022 17:08:02 +0900
Subject: [PATCH v2] Add fileval-bootval consistency check of GUC parameters

We should keep GUC variables consistent between the default values
written in postgresql.conf.sample and defined in guc.c. Add an
automated way to check for the consistency to the TAP test suite. Some
variables are still excluded since they cannot be checked in a simple way.
---
 src/backend/utils/misc/guc.c  | 53 
 src/include/catalog/pg_proc.dat   |  5 ++
 src/test/modules/test_misc/t/003_check_guc.pl | 60 +--
 3 files changed, 112 

Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-06-16 Thread John Naylor
On Thu, Jun 16, 2022 at 11:57 AM Masahiko Sawada  wrote:
> I've attached an updated version patch that changes the configure
> script. I'm still studying how to support AVX2 on msvc build. Also,
> added more regression tests.

Thanks for the update, I will take a closer look at the patch in the
near future, possibly next week. For now, though, I'd like to question
why we even need to use 32-byte registers in the first place. For one,
the paper referenced has 16-pointer nodes, but none for 32 (next level
is 48 and uses a different method to find the index of the next
pointer). Andres' prototype has 32-pointer nodes, but in a quick read
of his patch a couple weeks ago I don't recall a reason mentioned for
it. Even if 32-pointer nodes are better from a memory perspective, I
imagine it should be possible to use two SSE2 registers to find the
index. It'd be locally slightly more complex, but not much. It might
not even cost much more in cycles since AVX2 would require indirecting
through a function pointer. It's much more convenient if we don't need
a runtime check. There are also thermal and power disadvantages when
using AXV2 in some workloads. I'm not sure that's the case here, but
if it is, we'd better be getting something in return.
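
To sketch what I mean by the two-register idea (the node layout, names,
and helper below are assumptions for illustration, not code from the
patch):

#include <stdint.h>
#include <emmintrin.h>          /* SSE2 only, no runtime check needed */

/* Return the position of "chunk" among the 32 child keys, or -1. */
static int
node32_find_index(const uint8_t keys[32], uint8_t chunk)
{
    __m128i     needle = _mm_set1_epi8((char) chunk);
    __m128i     lo = _mm_loadu_si128((const __m128i *) keys);
    __m128i     hi = _mm_loadu_si128((const __m128i *) (keys + 16));
    uint32_t    mask;

    mask = (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, lo));
    mask |= (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(needle, hi)) << 16;

    /* __builtin_ctz here; pg_rightmost_one_pos32() would be the PG way */
    return mask ? __builtin_ctz(mask) : -1;
}

Two loads, two compares, two movemasks and a count-trailing-zeros; locally
a bit more code than one AVX2 compare, but not much.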

One more thing in general: In an earlier version, I noticed that
Andres used the slab allocator and documented why. The last version of
your patch that I saw had the same allocator, but not the "why".
Especially in early stages of review, we want to document design
decisions so it's more clear for the reader.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Modest proposal to extend TableAM API for controlling cluster commands

2022-06-16 Thread David G. Johnston
On Wed, Jun 15, 2022 at 11:23 PM Mark Dilger 
wrote:

>
> > On Jun 15, 2022, at 8:50 PM, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 8:18 PM Andres Freund 
> wrote:
> > > If a simple callback like
> > > relation_supports_cluster(Relation rel) is too simplistic
> >
> > Seems like it should be called:
> relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]
>
> Ok.
>
> > Basically, if the user tells the table to make itself smaller on disk by
> removing dead tuples, should we support the case where the Table AM says:
> "Sorry, I cannot do that"?
>
> I submit that's the only sane thing to do if the table AM already
> guarantees that the table will always be fully compacted.  There is no
> justification for forcing the table contents to be copied without benefit.
>

I accept that this is a valid outcome that should be accommodated.

>
> > If yes, then naming the table explicitly should elicit an error.  Having
> the table chosen implicitly should provoke a warning.  For ALTER TABLE
> CLUSTER there should be an error: which makes the implicit CLUSTER command
> a non-factor.
>
> I'm basically fine with how you would design the TAM, but I'm going to
> argue again that the core project should not dictate these decisions.  The
> TAM's relation_supports_compaction() function can return true/false, or
> raise an error.  If raising an error is the right action, the TAM can do
> that.



> If the core code makes that decision, the TAM can't override, and that
> paints TAM authors into a corner.


The core code has to decide what the template pattern code looks like,
including what things it will provide and what it requires extensions to
provide.  To a large extent, providing a consistent end-user experience is
the template's, and thus core code's, job.


> How about a TAM that implements a write-once, read-many logic.  You get
> one multi-insert, and forever after you can't modify it (other than to drop
> the table, or perhaps to truncate it).


So now the AM wants to ignore ALTER TABLE, INSERT, and DELETE commands.

> That's a completely made-up-on-the-spot example, but it's not entirely
> without merit.
>
> In what sense does this made-up TAM conflict with mvcc and wal?  It
> doesn't have all the features of heap, but that's not the same thing as
> violating mvcc or breaking wal.
>
>
I am nowhere near informed enough to speak to the implementation details
here, and my imagination is probably lacking too, but I'll accept that the
current system does indeed make assumptions in the template design that are
now being seen as incorrect in light of new algorithms.

But you are basically proposing a reworking of the existing system into one
that makes pretty much any SQL Command something that a TAM can treat as
being an optional request by the user; whereas today the system presumes
that the implementations will respond to these commands.  And to make this
change without any core code having such a need. Or even a working
extension that can be incorporated during development.  And, as per the
above, all of this requires coming to some kind of agreement on the desired
user experience (I don't immediately accept the "let the AM decide" option).

Anyway, that was mostly my attempt at Devil's Advocate.
I was going to originally post that the template simply inspect whether the
new physical relation file, after the copy was requested, had a non-zero
size, and if so finish performing the swap the way we do today, otherwise
basically abort (or otherwise perform the minimal amount of catalog
changes) so the existing relation file continues to be pointed at.
Something to consider with a smaller API footprint than a gatekeeper hook.

I think that all boils down to - it seems preferable to simply continue
assuming all these commands are accepted, but figure out whether a "no-op"
is a valid outcome and, if so, ensure there is a way to identify that no-op
meaningfully.  While hopefully designing the surrounding code so that
unnecessary work is not performed in front of a no-op.  This seems
preferable to spreading hooks throughout the code that basically ask "do
you handle this SQL command?".  The specifics of the existing code may
dictate otherwise.

David J.


Re: Modest proposal to extend TableAM API for controlling cluster commands

2022-06-16 Thread Andres Freund
Hi,

On 2022-06-15 22:23:36 -0700, Mark Dilger wrote:
> I'm not entirely against you on that, but it makes me cringe that we impose
> design decisions like that on any and all future TAMs.  It seems better to
> me to let the TAM author decide to emit an error, warning, notice, or
> whatever, as they see fit.

The tradeoff is that that pushes down complexity and makes the overall system
harder to understand. I'm not saying that there's no possible use for such
callbacks / configurability, I'm just not convinced it's worth the cost.


> >> But I can't really complete my work with the interface as it stands
> >> now.
> > 
> > Since you've not described that work to a meaningful degree...
> 
> I don't think I should have to do so.  It's like saying, "I think I should
> have freedom of speech", and you say, "well, I'm not sure about that; tell
> me what you want to say, and I'll decide if I'm going to let you say it".'
> That's not freedom.  I think TAM authors should have broad discretion over
> anything that the core system doesn't have a compelling interest in
> controlling.

That's insultingly ridiculous. You can say, do whatever you want, but that
doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
compelled speech, to go with your image...

It's utterly normal to be asked what the use case for a new API is when
proposing one.


> You've not yet said why a TAM should be prohibited from opting
> out of cluster/vacfull.

API / behavioural complexity. If we make every nook and cranny configurable,
we'll have an ever harder to use / administer system (from a user's POV) and
have difficulty understanding the state of the system when writing patches
(from a core PG developer's POV). It might be the right thing in this case -
hence me asking for what the motivation is.

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2022-06-16 Thread Masahiko Sawada
On Thu, Jun 16, 2022 at 2:27 AM Robert Haas  wrote:
>
> On Tue, Jun 14, 2022 at 3:54 AM Masahiko Sawada  wrote:
> > > AFAICS, we could do that by:
> > >
> > > 1. De-supporting platforms that have this problem, or
> > > 2. Introducing new typalign values, as Noah proposed back on April 2, or
> > > 3. Somehow forcing values that are sometimes 4-byte aligned and
> > > sometimes 8-byte aligned to be 8-byte alignment on all platforms
> >
> > Introducing new typalign values seems a good idea to me as it's more
> > future-proof. Will this item be for PG16, right? The main concern
> > seems that what this test case enforces would be nuisance when
> > introducing a new system catalog or a new column to the existing
> > catalog but given we're in post PG15-beta1 it is unlikely to happen in
> > PG15.
>
> I agree that we're not likely to introduce a new typalign value any
> sooner than v16. There are a couple of things that bother me about
> that solution. One is that I don't know how many different behaviors
> exist out there in the wild. If we distinguish the alignment of double
> from the alignment of int8, is that good enough, or are there other
> data types whose properties aren't necessarily the same as either of
> those?

Yeah, there might be.

> The other is that 32-bit systems are already relatively rare
> and probably will become more rare until they disappear completely. It
> doesn't seem like a ton of fun to engineer solutions to problems that
> may go away by themselves with the passage of time.

IIUC the system affected by this problem is not necessarily a 32-bit
system. For instance, hoverfly on the buildfarm is a 64-bit system but
was affected by this problem. According to the XLC manual[1], there is
no difference between 32-bit systems and 64-bit systems in terms of
alignment for double. FWIW, looking at the manual, there might have
been a solution for AIX to specify -qalign=natural compiler option in
order to enforce the alignment of double to 8.

Regards,

[1] https://support.scinet.utoronto.ca/Manuals/xlC++-proguide.pdf;
Table 11 on page 10.

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Add header support to text format and matching feature

2022-06-16 Thread Peter Eisentraut

On 15.06.22 13:50, Daniel Verite wrote:

The other errors in the list above are more about the format itself,
with options that make sense only for one format. So the way we're
supposed to understand ERRCODE_FEATURE_NOT_SUPPORTED in these
other cases is that such format does not support such feature,
but without implying that it's a limitation.


I don't feel very strongly about this.  It makes sense to stay 
consistent with the existing COPY code.





Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Langote
On Thu, Jun 16, 2022 at 3:45 PM Amit Kapila  wrote:
> On Thu, Jun 16, 2022 at 11:43 AM Amit Langote  wrote:
> > + * Don't throw any error here just mark the relation entry as not 
> > updatable,
> > + * as replica identity is only for updates and deletes but inserts can be
> > + * replicated even without it.
> >
> > I know you're simply copying the old comment, but I think we can
> > rewrite it to be slightly more useful:
> >
> > We just mark the relation entry as not updatable here if the local
> > replica identity is found to be insufficient and leave it to
> > check_relation_updatable() to throw the actual error if needed.
>
> I am fine with improving this comment but it would be better if in
> some way we keep the following part of the comment: "as replica
> identity is only for updates and deletes but inserts can be replicated
> even without it." as that makes it more clear why it is okay to just
> mark the entry as not updatable. One idea could be: "We just mark the
> relation entry as not updatable here if the local replica identity is
> found to be insufficient and leave it to check_relation_updatable() to
> throw the actual error if needed. This is because replica identity is
> only for updates and deletes but inserts can be replicated even
> without it.". Feel free to suggest if you have any better ideas?

I thought mentioning check_relation_updatable() would make it clear
that only updates (and deletes) care about a valid local replica
identity, because only apply_handle_{update|delete}() call that
function.  Anyway, how about this:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient for applying
updates/deletes (inserts don't care!) and leave it to
check_relation_updatable() to throw the actual error if needed.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Kapila
On Thu, Jun 16, 2022 at 11:43 AM Amit Langote  wrote:
>
> On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com
>  wrote:
> > On Wed, Jun 15, 2022 8:30 PM Amit Kapila  wrote:
> > > I have pushed the first bug-fix patch today.
> >
> > Attached the remaining patches which are rebased.
>
> Thanks.
>
> Comments on v9-0001:
>
> + * Don't throw any error here just mark the relation entry as not updatable,
> + * as replica identity is only for updates and deletes but inserts can be
> + * replicated even without it.
>
> I know you're simply copying the old comment, but I think we can
> rewrite it to be slightly more useful:
>
> We just mark the relation entry as not updatable here if the local
> replica identity is found to be insufficient and leave it to
> check_relation_updatable() to throw the actual error if needed.
>

I am fine with improving this comment but it would be better if in
some way we keep the following part of the comment: "as replica
identity is only for updates and deletes but inserts can be replicated
even without it." as that makes it more clear why it is okay to just
mark the entry as not updatable. One idea could be: "We just mark the
relation entry as not updatable here if the local replica identity is
found to be insufficient and leave it to check_relation_updatable() to
throw the actual error if needed. This is because replica identity is
only for updates and deletes but inserts can be replicated even
without it.". Feel free to suggest if you have any better ideas?

-- 
With Regards,
Amit Kapila.




Re: Prevent writes on large objects in read-only transactions

2022-06-16 Thread Yugo NAGATA
On Thu, 2 Jun 2022 07:43:06 +0900
Michael Paquier  wrote:

> On Wed, Jun 01, 2022 at 10:15:17AM -0400, Tom Lane wrote:
> > It's always appropriate to consider backwards compatibility, and we
> > frequently don't back-patch a change because of worries about that.
> > However, if someone complains because we start rejecting this as of
> > v15 or v16, I don't think they have good grounds for that.  It's just
> > obviously wrong ... unless someone can come up with a plausible
> > definition of read-only-ness that excludes large objects.  I don't
> > say that that's impossible, but it sure seems like it'd be contorted
> > reasoning.  They're definitely inside-the-database entities.
> 
> FWIW, I find the removal of error paths to authorize new behaviors
> easy to think about in terms of compatibility, because nobody is going
> to complain about that as long as it works as intended.  The opposite,
> aka enforcing an error in a code path is much harder to reason about.
> Anyway, if I am outnumbered on this one that's fine by me :)

I attached the updated patch.

Per the discussion above, I have undone the change so that large object
writes are prevented in read-only transactions again.
 
> There are a couple of things in the original patch that may require to
> be adjusted though:
> 1) Should we complain in lo_open() when using the write mode for a
> read-only transaction?  My answer to that would be yes.

I fixed lo_open() to raise the error when it is opened in write mode.

> 2) We still publish two non-fmgr-callable routines, lo_read() and
> lo_write().  Perhaps we'd better make them static to be-fsstubs.c or
> put the same restriction to the write routine as its SQL flavor?

I am not sure we should use PreventCommandIfReadOnly in lo_write(),
because there are other public functions that write to catalogs and they
have no similar restriction. I think it is the caller's responsibility to
avoid using such public functions in a read-only context.

I also fixed each of lo_truncate and lo_truncate64 to raise the error, per
Michael's comment in the other post.
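
For completeness, the guard added in all these places is the standard
one; its effect is roughly the following (sketched from memory of
utility.c, so treat the details as approximate):

void
PreventCommandIfReadOnly(const char *cmdname)
{
	if (XactReadOnly)
		ereport(ERROR,
				(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
		/* translator: %s is name of a SQL command, eg CREATE */
				 errmsg("cannot execute %s in a read-only transaction",
						cmdname)));
}

So, for example, lo_creat() called inside BEGIN READ ONLY should now fail
with "cannot execute lo_creat() in a read-only transaction".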

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/backend/libpq/be-fsstubs.c b/src/backend/libpq/be-fsstubs.c
index 5804532881..19c0a0c5de 100644
--- a/src/backend/libpq/be-fsstubs.c
+++ b/src/backend/libpq/be-fsstubs.c
@@ -93,6 +93,9 @@ be_lo_open(PG_FUNCTION_ARGS)
 	elog(DEBUG4, "lo_open(%u,%d)", lobjId, mode);
 #endif
 
+	if (mode & INV_WRITE)
+		PreventCommandIfReadOnly("lo_open() in write mode");
+
 	/*
 	 * Allocate a large object descriptor first.  This will also create
 	 * 'fscxt' if this is the first LO opened in this transaction.
@@ -179,6 +182,8 @@ lo_write(int fd, const char *buf, int len)
 	int			status;
 	LargeObjectDesc *lobj;
 
+	PreventCommandIfReadOnly("lo_write");
+
 	if (fd < 0 || fd >= cookies_size || cookies[fd] == NULL)
 		ereport(ERROR,
 (errcode(ERRCODE_UNDEFINED_OBJECT),
@@ -245,6 +250,8 @@ be_lo_creat(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId;
 
+	PreventCommandIfReadOnly("lo_creat()");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(InvalidOid);
 
@@ -256,6 +263,8 @@ be_lo_create(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_create()");
+
 	lo_cleanup_needed = true;
 	lobjId = inv_create(lobjId);
 
@@ -306,6 +315,8 @@ be_lo_unlink(PG_FUNCTION_ARGS)
 {
 	Oid			lobjId = PG_GETARG_OID(0);
 
+	PreventCommandIfReadOnly("lo_unlink()");
+
 	/*
 	 * Must be owner of the large object.  It would be cleaner to check this
 	 * in inv_drop(), but we want to throw the error before not after closing
@@ -368,6 +379,8 @@ be_lowrite(PG_FUNCTION_ARGS)
 	int			bytestowrite;
 	int			totalwritten;
 
+	PreventCommandIfReadOnly("lowrite()");
+
 	bytestowrite = VARSIZE_ANY_EXHDR(wbuf);
 	totalwritten = lo_write(fd, VARDATA_ANY(wbuf), bytestowrite);
 	PG_RETURN_INT32(totalwritten);
@@ -413,6 +426,8 @@ lo_import_internal(text *filename, Oid lobjOid)
 	LargeObjectDesc *lobj;
 	Oid			oid;
 
+	PreventCommandIfReadOnly("lo_import()");
+
 	/*
 	 * open the file to be read in
 	 */
@@ -561,6 +576,8 @@ be_lo_truncate(PG_FUNCTION_ARGS)
 	int32		fd = PG_GETARG_INT32(0);
 	int32		len = PG_GETARG_INT32(1);
 
+	PreventCommandIfReadOnly("lo_truncate()");
+
 	lo_truncate_internal(fd, len);
 	PG_RETURN_INT32(0);
 }
@@ -571,6 +588,8 @@ be_lo_truncate64(PG_FUNCTION_ARGS)
 	int32		fd = PG_GETARG_INT32(0);
 	int64		len = PG_GETARG_INT64(1);
 
+	PreventCommandIfReadOnly("lo_truncate64()");
+
 	lo_truncate_internal(fd, len);
 	PG_RETURN_INT32(0);
 }
@@ -815,6 +834,8 @@ be_lo_from_bytea(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_from_bytea()");
+
 	lo_cleanup_needed = true;
 	loOid = inv_create(loOid);
 	loDesc = inv_open(loOid, INV_WRITE, CurrentMemoryContext);
@@ -837,6 +858,8 @@ be_lo_put(PG_FUNCTION_ARGS)
 	LargeObjectDesc *loDesc;
 	int			written PG_USED_FOR_ASSERTS_ONLY;
 
+	PreventCommandIfReadOnly("lo_put()");
+
 	lo_cleanup_needed = 

Re: Modest proposal to extend TableAM API for controlling cluster commands

2022-06-16 Thread Mark Dilger



> On Jun 15, 2022, at 8:50 PM, David G. Johnston  
> wrote:
> 
> On Wed, Jun 15, 2022 at 8:18 PM Andres Freund  wrote:
> > If a simple callback like
> > relation_supports_cluster(Relation rel) is too simplistic
> 
> Seems like it should be called: 
> relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

> Basically, if the user tells the table to make itself smaller on disk by 
> removing dead tuples, should we support the case where the Table AM says: 
> "Sorry, I cannot do that"?

I submit that's the only sane thing to do if the table AM already guarantees 
that the table will always be fully compacted.  There is no justification for 
forcing the table contents to be copied without benefit.

> If yes, then naming the table explicitly should elicit an error.  Having the 
> table chosen implicitly should provoke a warning.  For ALTER TABLE CLUSTER 
> there should be an error: which makes the implicit CLUSTER command a 
> non-factor.

I'm basically fine with how you would design the TAM, but I'm going to argue 
again that the core project should not dictate these decisions.  The TAM's 
relation_supports_compaction() function can return true/false, or raise an 
error.  If raising an error is the right action, the TAM can do that.  If the 
core code makes that decision, the TAM can't override, and that paints TAM 
authors into a corner.
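
To make that concrete, here is a hypothetical sketch of how cluster.c
could consult such a callback if we went this way (the callback name is
from this thread; the wiring is invented here, not from any posted
patch):

	/* hypothetical: skip rewriting tables whose AM says it cannot help */
	if (OldHeap->rd_tableam->relation_supports_compaction != NULL &&
		!OldHeap->rd_tableam->relation_supports_compaction(OldHeap))
	{
		ereport(NOTICE,
				(errmsg("skipping \"%s\" --- table AM does not support compaction",
						RelationGetRelationName(OldHeap))));
		return;
	}

The TAM remains free to ereport(ERROR, ...) from inside the callback
instead of returning false, which is exactly the flexibility argued for
above.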

> However, given that should the table structure change it is imperative that 
> the Table AM be capable of producing a new physical relation with the correct 
> data, which will have been compacted as a side-effect, it seems like, 
> explicit or implicit, expecting any Table AM to do that when faced with 
> Vacuum Full is reasonable.  Which leaves deciding how to allow a table with a 
> given TAM to prevent itself from being added to the CLUSTER roster.  And 
> decide whether an opt-out feature for implicit VACUUM FULL is something we 
> should offer as well.
> 
> I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture 
> of PostgreSQL could avoid this basic contract between the system and its 
> users.

How about a TAM that implements write-once, read-many logic?  You get one 
multi-insert, and forever after you can't modify it (other than to drop the 
table, or perhaps to truncate it).  That's a completely made-up-on-the-spot 
example, but it's not entirely without merit.  You could avoid a lot of locking 
overhead when using such a table, since you'd know a priori that nobody else is 
modifying it.  It could also be implemented with a smaller tuple header, since 
a lot of the header bytes in heap tuples are dedicated to tracking updates.  
You wouldn't need a per-row inserting transaction-Id either, since you could 
just store one per table, knowing that all the rows were inserted in the same 
transaction.

In what sense does this made-up TAM conflict with mvcc and wal?  It doesn't 
have all the features of heap, but that's not the same thing as violating mvcc 
or breaking wal.
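
As a toy illustration of why such an AM could get away with a smaller
per-row header (plain C, purely hypothetical; no such AM exists in core):

#include <stdbool.h>
#include <stdint.h>

/*
 * One inserting transaction ID kept per table instead of per row, and a
 * per-row header with no update-tracking fields at all.
 */
typedef struct WormRowHeader
{
	uint16_t	len;			/* row length; no xmax, ctid, or infomask */
} WormRowHeader;

typedef struct WormTableMeta
{
	uint32_t	insert_xid;		/* single XID covering every row */
	bool		sealed;			/* set after the one-and-only multi-insert */
} WormTableMeta;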

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Bump MIN_WINNT to 0x0600 (Vista) as minimal runtime in 16~

2022-06-16 Thread Michael Paquier
On Thu, Jun 09, 2022 at 02:47:36PM +0900, Michael Paquier wrote:
> On Thu, Jun 09, 2022 at 04:55:45PM +1200, Thomas Munro wrote:
>> I think we can drop mention of Itanium (RIP): the ancient versions of
>> Windows that could run on that arch are desupported with your patch.
>> It might be more relevant to say that we can't yet run on ARM, and
>> Windows 11 is untested by us, but let's fix those problems instead of
>> documenting them :-)
> 
> Okay to remove the Itanium part for me.

install-windows.sgml has one extra spot mentioning Windows 7 and
Server 2008 that can be simplified on top of that.

>> There are more mentions of older Windows releases near the compiler
>> stuff but perhaps your MSVC version vacuuming work will take care of
>> those.
> 
> Yes, I have a few changes like the one in main.c for _M_AMD64.  Are
> you referring to something else?

Actually, this can go with the bump of MIN_WINNT as it uses one of the
IsWindows*OrGreater() macros as a runtime check.  And there are two
more places in pg_ctl.c that can be similarly cleaned up.

It is possible that I have missed some spots, of course.
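
For readers not opening the diff, the pattern being removed looks roughly
like this (the helper names are placeholders, not actual pg_ctl.c code):

	if (IsWindows7OrGreater())
		use_modern_code_path();	/* the only path possible on Windows 10+ */
	else
		use_legacy_code_path();	/* dead code once MIN_WINNT is 0x0A00 */

With Windows 10 as the floor, both the runtime check and the legacy
branch can simply be deleted.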
--
Michael
From 2d51b7dd04e44ace0294531878676ff89e465c0b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 16 Jun 2022 15:11:53 +0900
Subject: [PATCH v3] Bump WIN_MINNT to 0x0A00 everywhere (Windows 10~)

---
 src/include/port/win32.h  | 11 +++
 src/backend/main/main.c   | 17 -
 src/backend/utils/adt/pg_locale.c |  4 ++--
 src/bin/pg_ctl/pg_ctl.c   | 26 ++
 doc/src/sgml/install-windows.sgml |  9 ++---
 doc/src/sgml/installation.sgml|  9 -
 6 files changed, 13 insertions(+), 63 deletions(-)

diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index c6213c77c3..fe0829cedc 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -11,15 +11,10 @@
 
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
- * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
- * the minimum version is Windows XP (0x0501).
+ * Leave a higher value in place.  The minimum requirement is Windows 10.
  */
-#if defined(_MSC_VER) && _MSC_VER >= 1900
-#define MIN_WINNT 0x0600
-#else
-#define MIN_WINNT 0x0501
+#if defined(_MSC_VER)
+#define MIN_WINNT 0x0A00
 #endif
 
 #if defined(_WIN32_WINNT) && _WIN32_WINNT < MIN_WINNT
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c43a527d3f..dd82722ee3 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -290,23 +290,6 @@ startup_hacks(const char *progname)
 		_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
 		_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
 		_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
-
-#if defined(_M_AMD64) && _MSC_VER == 1800
-
-		/*--
-		 * Avoid crashing in certain floating-point operations if we were
-		 * compiled for x64 with MS Visual Studio 2013 and are running on
-		 * Windows prior to 7/2008R2 SP1 on an AVX2-capable CPU.
-		 *
-		 * Ref: https://connect.microsoft.com/VisualStudio/feedback/details/811093/visual-studio-2013-rtm-c-x64-code-generation-bug-for-avx2-instructions
-		 *--
-		 */
-		if (!IsWindows7SP1OrGreater())
-		{
-			_set_FMA3_enable(0);
-		}
-#endif			/* defined(_M_AMD64) && _MSC_VER == 1800 */
-
 	}
 #endif			/* WIN32 */
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index a0490a7522..4c39841b99 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1729,7 +1729,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 		else
 			ereport(ERROR,
 	(errmsg("could not load locale \"%s\"", collcollate)));
-#elif defined(WIN32) && _WIN32_WINNT >= 0x0600
+#elif defined(WIN32)
 		/*
 		 * If we are targeting Windows Vista and above, we can ask for a name
 		 * given a collation name (earlier versions required a location code
@@ -1757,7 +1757,7 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 			collcollate,
 			GetLastError())));
 		}
-		collversion = psprintf("%d.%d,%d.%d",
+		collversion = psprintf("%ld.%ld,%ld.%ld",
 			   (version.dwNLSVersion >> 8) & 0xFFFF,
 			   version.dwNLSVersion & 0xFF,
 			   (version.dwDefinedVersion >> 8) & 0xFFFF,
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index dd78e5bc66..ef58883a5c 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1896,17 +1896,8 @@ CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_ser
 	/* Verify that we found all functions */
 	if (_IsProcessInJob == NULL || _CreateJobObject == NULL || _SetInformationJobObject == NULL || _AssignProcessToJobObject == NULL || _QueryInformationJobObject == NULL)
 	{
-		/*
-		 * 

Re: Replica Identity check of partition table on subscriber

2022-06-16 Thread Amit Langote
Hi,

On Thu, Jun 16, 2022 at 2:07 PM shiy.f...@fujitsu.com
 wrote:
> On Wed, Jun 15, 2022 8:30 PM Amit Kapila  wrote:
> > I have pushed the first bug-fix patch today.
>
> Attached the remaining patches which are rebased.

Thanks.

Comments on v9-0001:

+ * Don't throw any error here just mark the relation entry as not updatable,
+ * as replica identity is only for updates and deletes but inserts can be
+ * replicated even without it.

I know you're simply copying the old comment, but I think we can
rewrite it to be slightly more useful:

We just mark the relation entry as not updatable here if the local
replica identity is found to be insufficient and leave it to
check_relation_updatable() to throw the actual error if needed.

+   /* Check that replica identity matches. */
+   logicalrep_rel_mark_updatable(entry);

Maybe the comment (there are 2 instances) should say:

Set if the table's replica identity is enough to apply update/delete.

Finally,

+# Alter REPLICA IDENTITY on subscriber.
+# No REPLICA IDENTITY in the partitioned table on subscriber, but what we check
+# is the partition, so it works fine.

For consistency with other recently added comments, I'd suggest the
following wording:

Test that replication works correctly as long as the leaf partition
has the necessary REPLICA IDENTITY, even though the actual target
partitioned table does not.

On v9-0002:

+   /* cleanup the invalid attrmap */

It seems that "invalid" here really means no-longer-useful, so we
should use that phrase as a nearby comment does:

Release the no-longer-useful attrmap, if any.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com