Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-03 Thread Dilip Kumar
On Fri, Jul 2, 2021 at 8:16 PM Robert Haas  wrote:
>
> On Wed, Jun 30, 2021 at 11:46 PM Greg Nancarrow  wrote:
> > I personally think "(b) provide an option to the user to specify
> > whether inserts can be parallelized on a relation" is the preferable
> > option.
> > There seems to be too many issues with the alternative of trying to
> > determine the parallel-safety of a partitioned table automatically.
> > I think (b) is the simplest and most consistent approach, working the
> > same way for all table types, and without the overhead of (a).
> > Also, I don't think (b) is difficult for the user. At worst, the user
> > can use the provided utility-functions at development-time to verify
> > the intended declared table parallel-safety.
> > I can't really see some mixture of (a) and (b) being acceptable.
>
> Yeah, I'd like to have it be automatic, but I don't have a clear idea
> how to make that work nicely. It's possible somebody (Tom?) can
> suggest something that I'm overlooking, though.

In general, for a non-partitioned table, where there is not much
overhead in checking parallel safety and invalidation is also not a
big problem, I am tempted to provide an automatic parallel-safety
check.  This would enable parallelism in more cases, wherever it is
suitable, without user intervention.  OTOH, I understand that providing
automatic checking might be very costly if the number of partitions is
large.  Can't we provide some middle way where parallelism is enabled
by default for a normal table, but for a partitioned table it is
disabled by default and the user has to mark it safe to enable
parallelism?  I agree that such behavior might sound a bit hackish.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Using COPY FREEZE in pgbench

2021-07-03 Thread Fabien COELHO



Hello Tatsuo-san,

> So overall gain by the patch is around 15%, whereas the last test before
> the commit was 14%.  It seems the patch is still beneficial after the
> commit.


Yes, that's good!

I had a quick look again, and about the comment:

 /*
  * If partitioning is not enabled and server version is 14.0 or later, we
  * can take account of freeze option of copy.
  */

I'd suggest instead the shorter:

 /* use COPY with FREEZE on v14 and later without partitioning */

Or maybe even to fully drop the comment, because the code is clear enough 
and the doc already says it.


--
Fabien.




Re: A qsort template

2021-07-03 Thread Thomas Munro
On Fri, Jul 2, 2021 at 2:32 PM John Naylor  wrote:
> I suspect if we experiment on two extremes of type "heaviness" (accessing and 
> comparing trivial or not), such as uint32 and tuplesort, we'll have a pretty 
> good idea what the parameters should be, if anything different. I'll do some 
> testing along those lines.

Cool.

Since you are experimenting with tuplesort and likely thinking similar
thoughts, here's a patch I've been using to explore that area.  I've
seen it get, for example, ~1.18x speedup for simple index builds in
favourable winds (YMMV, early hacking results only).  Currently, it
kicks in when the leading column is of type int4, int8, timestamp,
timestamptz, date or text + friends (when abbreviatable, currently
that means "C" and ICU collations only), while increasing the
executable by only 8.5kB (Clang, amd64, -O2, no debug).

These types are handled with just three specialisations.  Their custom
"fast" comparators all boiled down to comparisons of datum bits,
varying only in signedness and width, so I tried throwing them away
and using 3 new common routines.  Then I extended
tuplesort_sort_memtuples()'s pre-existing specialisation dispatch to
recognise qualifying users of those and select 3 corresponding sort
specialisations.

It might turn out to be worth burning some more executable size on
extra variants (for example, see XXX notes in the code comments for
opportunities; one could also go nuts trying smaller things like
special cases for not-null, nulls first, reverse sort, ... to kill all
those branches), or not.
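
For context: the "three specialisations" above dispatch on well-known
comparator functions whose work is just a signed or unsigned comparison of
the datum bits.  A minimal sketch of what the signed 64-bit routine might
look like (ssup_datum_signed_cmp is named in the patch below, but its
definition is in the sortsupport changes not fully shown in this excerpt,
so treat this as illustrative only):

#include "postgres.h"
#include "utils/sortsupport.h"

/* sketch: common comparator for by-value signed 64-bit datums */
static int
ssup_datum_signed_cmp(Datum x, Datum y, SortSupport ssup)
{
	int64		a = DatumGetInt64(x);
	int64		b = DatumGetInt64(y);

	if (a < b)
		return -1;
	else if (a > b)
		return 1;
	else
		return 0;
}

tuplesort.c can then compare ssup->comparator against these well-known
routines when picking a sort specialisation, instead of hard-wiring one
comparator per data type.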
From 62a8acbc745f3b4a90c9a14b6b61989a9d83bece Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 3 Jul 2021 19:02:10 +1200
Subject: [PATCH] WIP: Accelerate tuple sorting for common types.

Several data types have their own fast comparator functions that can be
replaced with common binary or signed comparator functions.  These
functions can then be recognized by tuplesort.c and used to dispatch to
corresponding specialized sort functions, to accelerate sorting.

XXX WIP, experiment grade only, many open questions...
---
 src/backend/access/nbtree/nbtcompare.c |  22 ++--
 src/backend/utils/adt/date.c   |  15 +--
 src/backend/utils/adt/timestamp.c  |  11 ++
 src/backend/utils/adt/varlena.c|  26 +---
 src/backend/utils/sort/tuplesort.c | 161 -
 src/include/utils/sortsupport.h| 115 ++
 6 files changed, 295 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 7ac73cb8c2..204cf778fb 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -119,26 +119,12 @@ btint4cmp(PG_FUNCTION_ARGS)
PG_RETURN_INT32(A_LESS_THAN_B);
 }
 
-static int
-btint4fastcmp(Datum x, Datum y, SortSupport ssup)
-{
-   int32   a = DatumGetInt32(x);
-   int32   b = DatumGetInt32(y);
-
-   if (a > b)
-   return A_GREATER_THAN_B;
-   else if (a == b)
-   return 0;
-   else
-   return A_LESS_THAN_B;
-}
-
 Datum
 btint4sortsupport(PG_FUNCTION_ARGS)
 {
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 
-   ssup->comparator = btint4fastcmp;
+   ssup->comparator = ssup_datum_int32_cmp;
PG_RETURN_VOID();
 }
 
@@ -156,6 +142,7 @@ btint8cmp(PG_FUNCTION_ARGS)
PG_RETURN_INT32(A_LESS_THAN_B);
 }
 
+#ifndef USE_FLOAT8_BYVAL
 static int
 btint8fastcmp(Datum x, Datum y, SortSupport ssup)
 {
@@ -169,13 +156,18 @@ btint8fastcmp(Datum x, Datum y, SortSupport ssup)
else
return A_LESS_THAN_B;
 }
+#endif
 
 Datum
 btint8sortsupport(PG_FUNCTION_ARGS)
 {
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 
+#ifdef USE_FLOAT8_BYVAL
+   ssup->comparator = ssup_datum_signed_cmp;
+#else
ssup->comparator = btint8fastcmp;
+#endif
PG_RETURN_VOID();
 }
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 0bea16cb67..350c662d50 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -438,25 +438,12 @@ date_cmp(PG_FUNCTION_ARGS)
PG_RETURN_INT32(0);
 }
 
-static int
-date_fastcmp(Datum x, Datum y, SortSupport ssup)
-{
-   DateADT a = DatumGetDateADT(x);
-   DateADT b = DatumGetDateADT(y);
-
-   if (a < b)
-   return -1;
-   else if (a > b)
-   return 1;
-   return 0;
-}
-
 Datum
 date_sortsupport(PG_FUNCTION_ARGS)
 {
SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 
-   ssup->comparator = date_fastcmp;
+   ssup->comparator = ssup_datum_int32_cmp;
PG_RETURN_VOID();
 }
 
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 79761f809c..c678517db6 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -37,6 +37,7 @@
 #include "utils/datetime.h"
 

Re: logical replication worker accesses catalogs in error context callback

2021-07-03 Thread Bharath Rupireddy
On Sat, Jul 3, 2021 at 10:03 PM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > The patch basically looks good to me except a minor comment to have a
> > local variable for var->varattno which makes the code shorter.
>
> Here's a restructured version that uses rangetable data for the
> simple-relation case too.  I also modified the relevant test cases
> so that it's visible that we're reporting aliases not true names.

How about making the below else if statement and the attname
assignment each a single line?  They still fall within the 80 char
limit.
else if (colno > 0 && colno <= list_length(rte->eref->colnames))
attname = strVal(list_nth(rte->eref->colnames, colno - 1));

Otherwise the v8 patch looks good to me.

Regards,
Bharath Rupireddy.




Re: Re[3]: On login trigger: take three

2021-07-03 Thread vignesh C
On Thu, Jun 3, 2021 at 8:36 AM Greg Nancarrow  wrote:
>
> On Fri, May 21, 2021 at 2:46 PM Greg Nancarrow  wrote:
> >
> > On Thu, May 20, 2021 at 2:45 PM Ivan Panchenko  wrote:
> > >
> > > I have upgraded the patch for the 14th version.
> > >
> >
> > I have some feedback on the patch:
> >
>
> I've attached an updated version of the patch.
> I've applied my review comments and done a bit more tidying up of
> documentation and comments.
> Also, I found that the previously-posted patch was broken by
> snapshot-handling changes in commit 84f5c290 (with patch applied,
> resulting in a coredump during regression tests) so I've added a fix
> for that too.

CFBot shows the following failure:
# poll_query_until timed out executing this query:
# SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM
pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
# expecting this output:
# t
# last actual query output:
# t
# with stderr:
# NOTICE: You are welcome!
# Looks like your test exited with 29 before it could output anything.
t/001_stream_rep.pl ..
Dubious, test returned 29 (wstat 7424, 0x1d00)

Regards,
Vignesh




Re: Disable WAL logging to speed up data loading

2021-07-03 Thread vignesh C
On Wed, Apr 7, 2021 at 12:13 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi
>
>
> Prompted mainly by commit 9de9294, I've fixed minor things to rebase the
> patch.  All the modifications I made are cosmetic changes and a little
> bit of documentation updates.  Please have a look at the attached v09.
>

The patch is not applying on HEAD; kindly post a rebased version.  As this
patch is in the Ready for Committer state, a rebase will help one of the
committers to pick it up during the commitfest.

Regards,
Vignesh




Re: Using COPY FREEZE in pgbench

2021-07-03 Thread Tatsuo Ishii
After this commit:
https://git.postgresql.org/pg/commitdiff/8e03eb92e9ad54e2f1ed8b5a73617601f6262f81
I was worried that the benefit of the COPY FREEZE patch was somewhat
reduced or gone, so I ran a pgbench test again.

Current master:

$ pgbench -i -s 100 test
:
:
done in 20.23 s (drop tables 0.00 s, create tables 0.02 s, client-side generate 13.54 s, vacuum 2.34 s, primary keys 4.33 s).

With v5 patch:
done in 16.92 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 12.68 s, vacuum 0.24 s, primary keys 3.77 s).

So overall gain by the patch is around 15%, whereas the last test
before the commit was 14%.  It seems the patch is still beneficial
after the commit.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: pgbench using COPY FREEZE

2021-07-03 Thread Tatsuo Ishii
Hi Simon,

>> Hello Simon,
>>
>> Indeed.
>>
>> There is already a "ready" patch in the queue, see:
>>
>> https://commitfest.postgresql.org/33/3034/
> 
> Ah, my bad. I withdraw this patch, apologies Tatsuo-san.

No problem at all. Thank you for looking into the issue.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Removing redundant check for transaction in progress in check_safe_enum_use

2021-07-03 Thread Zhihong Yu
Hi,
I was looking at :
Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux).

In check_safe_enum_use():

+   if (!TransactionIdIsInProgress(xmin) &&
+   TransactionIdDidCommit(xmin))
+   return;

Since the condition would be true only when TransactionIdDidCommit()
returns true, I think the call to TransactionIdIsInProgress() is not needed.
If the transaction for xmin has committed, it cannot be in progress
at the same time.

Please see the simple patch for removing the redundant check.

Thanks


txn-in-progress-enum.patch
Description: Binary data


Re: Reducing the cycle time for CLOBBER_CACHE_ALWAYS buildfarm members

2021-07-03 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> Seems reasonable. I don't have a CCA animal any more, but I guess I
>> could set up a test.

> I can run a test here --- I'll commandeer sifaka for awhile,
> since that's the fastest animal I have.

Done, and here's the results:

Traditional way (-DCLOBBER_CACHE_ALWAYS): 32:53:24
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2021-07-01%2018%3A06%3A09

New way: 16:15:43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sifaka&dt=2021-07-03%2004%3A02%3A16

That's running sifaka's full test schedule including TAP tests,
which ordinarily takes it a shade under 10 minutes.  The savings
on a non-TAP run would be a lot less, of course, thanks to
fewer initdb invocations.

Although I lacked the patience to run a full back-branch cycle
with -DCLOBBER_CACHE_ALWAYS, I did verify that the patch
correctly injects that #define when running an old branch.
So I think it's ready to go into the buildfarm, modulo any
cosmetic work you might want to do.

regards, tom lane




Re: Preventing abort() and exit() calls in libpq

2021-07-03 Thread Tom Lane
Noah Misch  writes:
> On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote:
>> What I'm now thinking about is restricting the test to only be run on
>> platforms where use of foo.a libraries is deprecated, so that we can
>> be pretty sure that we won't hit this situation.  Even if we only
>> run the test on Linux, that'd be plenty to catch any mistakes.

> Hmm.  Static libraries are the rarer case on both AIX and Linux, but I'm not
> aware of a relevant deprecation on either platform.  If it comes to this, I'd
> be more inclined to control the Makefile rule with an environment variable
> (e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.

That'd require buildfarm owner intervention, as well as intervention
by users.  Which seems like exporting our problems onto them.  I'd
really rather not go that way if we can avoid it.

regards, tom lane




Re: Preventing abort() and exit() calls in libpq

2021-07-03 Thread Noah Misch
On Sat, Jul 03, 2021 at 10:45:59AM -0400, Tom Lane wrote:
> I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a.
> But if someone mistakenly introduced a psprintf call into libpq,
> it'd still compile just fine; the symbol would be resolved against
> psprintf in the calling application's code.

I think that would fail to compile on Windows, where such references need
exported symbols.  We don't make an exports file for applications other than
postgres.exe.  So the strategy that inspired this may work.

> What I'm now thinking about is restricting the test to only be run on
> platforms where use of foo.a libraries is deprecated, so that we can
> be pretty sure that we won't hit this situation.  Even if we only
> run the test on Linux, that'd be plenty to catch any mistakes.

Hmm.  Static libraries are the rarer case on both AIX and Linux, but I'm not
aware of a relevant deprecation on either platform.  If it comes to this, I'd
be more inclined to control the Makefile rule with an environment variable
(e.g. ENFORCE_LIBC_CALL_RESTRICTIONS) instead of reacting to the platform.




Re: Unbounded %s in sscanf

2021-07-03 Thread Daniel Gustafsson
> On 28 Jun 2021, at 16:45, Daniel Gustafsson  wrote:
> 
>> On 28 Jun 2021, at 16:02, Tom Lane  wrote:
> 
>> Ugh.  Shouldn't we instead modify the format to read not more than
>> two characters?  Even if this is safe on non-malicious input, it
>> doesn't seem like good style.
> 
> No disagreement, I was only basing it on what is in the tree.  I would propose
> that we change the sscanf in _LoadBlobs() too though to eliminate all such
> callsites, even though that one is even safer.  I'll prepare a patch once more
> caffeine has been ingested.

Returning to this, attached is a patchset which amends the two sscanf()
callsites with their respective buffer sizes for %s format parsing.  In pg_dump
we need to inject the MAXPGPATH limit with the preprocessor, and thus the buffer
needs to be increased by one to account for the terminator (which is yet more
hygiene coding, since the fname buffer is now larger than the input buffer).
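
To sketch the approach (CppAsString2() is PostgreSQL's stringify macro from
c.h, redefined here only to keep the example self-contained; details may
differ from the actual patch):

#include <stdio.h>

#define CppAsString(identifier) #identifier
#define CppAsString2(x) CppAsString(x)
#define MAXPGPATH 1024			/* PostgreSQL's standard path buffer size */

static int
parse_toc_line(const char *line)
{
	unsigned int oid;
	char		fname[MAXPGPATH + 1];	/* one extra byte for the terminator */

	/* expands to "%u %1024s\n", so the %s conversion cannot overflow fname */
	if (sscanf(line, "%u %" CppAsString2(MAXPGPATH) "s\n", &oid, fname) != 2)
		return -1;
	return 0;
}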

While in here, I noticed that the fname variable is shadowed in the loop
parsing the blobs TOC which yields a broken error message on parse errors.  The
attached 0003 fixes that.

--
Daniel Gustafsson   https://vmware.com/



v3-0003-Fix-bug-in-TOC-file-error-message-printing.patch
Description: Binary data


v3-0002-Fix-sscanf-limit-in-pg_dump.patch
Description: Binary data


v3-0001-Fix-sscanf-limit-in-pg_basebackup.patch
Description: Binary data


Re: Synchronous commit behavior during network outage

2021-07-03 Thread Jeff Davis
On Sat, 2021-07-03 at 14:06 +0500, Andrey Borodin wrote:
> > But until you've disabled sync rep, the primary will essentially be
> > down for writes whether using this new feature or not. Even if you
> > can
> > terminate some backends to try to free space, the application will
> > just
> > make new connections that will get stuck the same way.
> 
> Surely I'm talking about terminating postmaster, not individual
> backends. But postmaster will need to terminate each running query.
> We surely need to have a way to stop whole instance without making
> any single query. And I do not like kill -9 for this purpose.

kill -6 would suffice.

I see the point that you don't want this to interfere with an
administrative shutdown. But it seems like most shutdowns will need to
escalate to SIGABRT for cases where things are going badly wrong (low
memory, etc.) anyway. I don't see a better solution here.

I don't fully understand why you'd be concerned about cancellation but
not concerned about similar problems with termination, but if you think
two GUCs are important I can do that.

Regards,
Jeff Davis







Re: cutting down the TODO list thread

2021-07-03 Thread John Naylor
On Thu, Jul 1, 2021 at 9:23 PM Bruce Momjian  wrote:

> Agreed.  Please remove them or I can do it.

Done, and also changed next release to "15".

--
John Naylor
EDB: http://www.enterprisedb.com


Re: logical replication worker accesses catalogs in error context callback

2021-07-03 Thread Tom Lane
Bharath Rupireddy  writes:
> The patch basically looks good to me except a minor comment to have a
> local variable for var->varattno which makes the code shorter.

Here's a restructured version that uses rangetable data for the
simple-relation case too.  I also modified the relevant test cases
so that it's visible that we're reporting aliases not true names.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 31b5de91ad..25112df916 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4096,15 +4096,17 @@ DROP TABLE reind_fdw_parent;
 -- conversion error
 -- ===
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
-SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
+SELECT * FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8) WHERE x1 = 1;  -- ERROR
 ERROR:  invalid input syntax for type integer: "foo"
-CONTEXT:  column "c8" of foreign table "ft1"
-SELECT  ft1.c1,  ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
+CONTEXT:  column "x8" of foreign table "ftx"
+SELECT ftx.x1, ft2.c2, ftx.x8 FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8), ft2
+  WHERE ftx.x1 = ft2.c1 AND ftx.x1 = 1; -- ERROR
 ERROR:  invalid input syntax for type integer: "foo"
-CONTEXT:  column "c8" of foreign table "ft1"
-SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
+CONTEXT:  column "x8" of foreign table "ftx"
+SELECT ftx.x1, ft2.c2, ftx FROM ft1 ftx(x1,x2,x3,x4,x5,x6,x7,x8), ft2
+  WHERE ftx.x1 = ft2.c1 AND ftx.x1 = 1; -- ERROR
 ERROR:  invalid input syntax for type integer: "foo"
-CONTEXT:  whole-row reference to foreign table "ft1"
+CONTEXT:  whole-row reference to foreign table "ftx"
 SELECT sum(c2), array_agg(c8) FROM ft1 GROUP BY c8; -- ERROR
 ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  processing expression at position 2 in select list
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fafbab6b02..696277ba10 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -302,16 +302,8 @@ typedef struct
  */
 typedef struct ConversionLocation
 {
-	Relation	rel;			/* foreign table's relcache entry. */
 	AttrNumber	cur_attno;		/* attribute number being processed, or 0 */
-
-	/*
-	 * In case of foreign join push down, fdw_scan_tlist is used to identify
-	 * the Var node corresponding to the error location and
-	 * fsstate->ss.ps.state gives access to the RTEs of corresponding relation
-	 * to get the relation name and attribute name.
-	 */
-	ForeignScanState *fsstate;
+	ForeignScanState *fsstate;	/* plan node being processed */
 } ConversionLocation;
 
 /* Callback argument for ec_member_matches_foreign */
@@ -7082,7 +7074,6 @@ make_tuple_from_result_row(PGresult *res,
 	/*
 	 * Set up and install callback to report where conversion error occurs.
 	 */
-	errpos.rel = rel;
 	errpos.cur_attno = 0;
 	errpos.fsstate = fsstate;
 	errcallback.callback = conversion_error_callback;
@@ -7185,34 +7176,32 @@ make_tuple_from_result_row(PGresult *res,
 /*
  * Callback function which is called when error occurs during column value
  * conversion.  Print names of column and relation.
+ *
+ * Note that this function mustn't do any catalog lookups, since we are in
+ * an already-failed transaction.  Fortunately, we can get the needed info
+ * from the query's rangetable instead.
  */
 static void
 conversion_error_callback(void *arg)
 {
+	ConversionLocation *errpos = (ConversionLocation *) arg;
+	ForeignScanState *fsstate = errpos->fsstate;
+	ForeignScan *fsplan = castNode(ForeignScan, fsstate->ss.ps.plan);
+	int			varno = 0;
+	AttrNumber	colno = 0;
 	const char *attname = NULL;
 	const char *relname = NULL;
 	bool		is_wholerow = false;
-	ConversionLocation *errpos = (ConversionLocation *) arg;
 
-	if (errpos->rel)
+	if (fsplan->scan.scanrelid > 0)
 	{
 		/* error occurred in a scan against a foreign table */
-		TupleDesc	tupdesc = RelationGetDescr(errpos->rel);
-		Form_pg_attribute attr = TupleDescAttr(tupdesc, errpos->cur_attno - 1);
-
-		if (errpos->cur_attno > 0 && errpos->cur_attno <= tupdesc->natts)
-			attname = NameStr(attr->attname);
-		else if (errpos->cur_attno == SelfItemPointerAttributeNumber)
-			attname = "ctid";
-
-		relname = RelationGetRelationName(errpos->rel);
+		varno = fsplan->scan.scanrelid;
+		colno = errpos->cur_attno;
 	}
 	else
 	{
 		/* error occurred in a scan against a foreign join */
-		ForeignScanState *fsstate = errpos->fsstate;
-		ForeignScan *fsplan = castNode(ForeignScan, fsstate->ss.ps.plan);
-		EState	   *estate = fsstate->ss.ps.state;
 		TargetEntry *tle;
 
 		tle = list_nth_node(TargetEntry, fsplan->fdw_scan_tlist,
@@ -7220,35 +7209,42 @@ conversion_error_callback(void *arg)
 
 		/*
 		 * Target list can have Vars and expressions.  For Vars, we can get
-		

Re: logical replication worker accesses catalogs in error context callback

2021-07-03 Thread Tom Lane
Bharath Rupireddy  writes:
> Isn't it better to have the below comment before
> slot_store_error_callback, similar to what's there before
> conversion_error_callback in v7-0002.
>  * Note that this function mustn't do any catalog lookups, since we are in
>  * an already-failed transaction.

Not really, as there's not much temptation to do so in the current form
of that function.  I have no desire to go around and plaster every
errcontext callback with such comments.

> Maybe it's worth considering
> avoid_sys_cache_lookup_in_error_callback_v2.patch from [1] as another
> way to enforce the above statement.

That looks fundamentally broken from here.  Wouldn't it forbid
perfectly OK code like this randomly-chosen example from tablecmds.c?

    if (list_member_oid(inheritOids, parentOid))
        ereport(ERROR,
                (errcode(ERRCODE_DUPLICATE_TABLE),
                 errmsg("relation \"%s\" would be inherited from more than once",
                        get_rel_name(parentOid))));

That is, it's a bit hard to say exactly where in the error processing
sequence we should start deeming it unsafe to do a catalog lookup;
but the mere presence of an error stack entry can't mean that.

In a lot of situations, there wouldn't be any need to consider the
transaction broken until we've done a longjmp, so that catalog
lookups in errcontext callbacks would be perfectly safe.  (Which
indeed is why we've not yet seen an actual problem in either of
the spots discussed in this thread.)  The reason for being paranoid
about what an errcontext callback can do is that the callback cannot
know what the current failure's cause is.  If we're trying to report
some internal error that means that the transaction really is pretty
broken, then it'd be problematic to initiate new catalog accesses.

regards, tom lane




Re: [PATCH] Hooks at XactCommand level

2021-07-03 Thread Gilles Darold
On 01/07/2021 at 18:47, Tom Lane wrote:
> Nicolas CHAHWEKILIAN  writes:
>> As far as I am concerned, I am totally awaiting for this kind of feature
>> exposed here, for one single reason at this time : the extension
>> pg_statement_rollback will be much more valuable with the ability of
>> processing "rollback to savepoint" without the need for explicit
>> instruction from client side (and this patch is giving this option).
> What exactly do these hooks do that isn't done as well or better
> by the RegisterXactCallback and RegisterSubXactCallback mechanisms?
> Perhaps we need to define some additional event types for those?
> Or pass more data to the callback functions?


Sorry it took me some time to recall the reason for the hooks.  The
problem is that the callbacks are not called when a statement is
executed after an error, so we fall through to the error:

    ERROR:  current transaction is aborted, commands ignored until end
of transaction block

For example with the rollback at statement level extension:


BEGIN;
UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- > will fail
LOG:  statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
ERROR:  invalid input syntax for type integer: "two"
LINE 1: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
    ^
UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- > will
fail again
LOG:  statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
ERROR:  current transaction is aborted, commands ignored until end
of transaction block
SELECT * FROM tbl_rsl; -- Should show records id 1 + 2
LOG:  statement: SELECT * FROM tbl_rsl;
ERROR:  current transaction is aborted, commands ignored until end
of transaction block


With the extension and the hook on start_xact_command() we can continue
and execute all the following statements.


I have updated the patch to only keep the hook on start_xact_command();
as you've suggested, the other hooks can be replaced by the use of the
xact callback.  The extension has also been updated for testing the
feature, available here: https://github.com/darold/pg_statement_rollbackv2
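
For reference, an extension would install the proposed hook from its
_PG_init(), chaining to any previously installed hook as usual (a sketch
against the patch below; my_xact_command_start is a hypothetical name):

#include "postgres.h"
#include "fmgr.h"
#include "tcop/pquery.h"

PG_MODULE_MAGIC;

static XactCommandStart_hook_type prev_start_xact_command_hook = NULL;

static void
my_xact_command_start(void)
{
	/*
	 * Runs before each transaction command, even when the transaction is
	 * in an aborted state; this is where an implicit ROLLBACK TO SAVEPOINT
	 * could be injected.
	 */

	if (prev_start_xact_command_hook)
		prev_start_xact_command_hook();
}

void
_PG_init(void)
{
	prev_start_xact_command_hook = start_xact_command_hook;
	start_xact_command_hook = my_xact_command_start;
}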


> I quite dislike inventing a hook that's defined as "run during
> start_xact_command", because there is basically nothing that's
> not ad-hoc about that function: it's internal to postgres.c
> and both its responsibilities and its call sites have changed
> over time.  I think anyone hooking into that will be displeased
> by the stability of their results.

Unfortunately I have not found a better solution, but I just tried
placing the hook in function BeginCommand() in src/backend/tcop/dest.c
and the extension works as expected.  Do you think that would be a
better place?  In that case I can update the patch.  For this feature we
need a hook that is executed before any command, even if the transaction
is in an aborted state, to be able to inject the rollback to savepoint;
maybe I'm not looking at the right place to do that.


Thanks

-- 
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..4b9f8eeb3c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -207,6 +207,8 @@ static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
+/* Hooks for plugins to get control at end of start_xact_command() */
+XactCommandStart_hook_type start_xact_command_hook = NULL;
 
 /* 
  *		routines to obtain user input
@@ -2708,6 +2710,13 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Now give loadable modules a chance to execute code before a transaction
+	 * command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
 }
 
 static void
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index 2318f04ff0..540ede42fd 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -48,4 +48,8 @@ extern bool PlannedStmtRequiresSnapshot(struct PlannedStmt *pstmt);
 
 extern void EnsurePortalSnapshotExists(void);
 
+/* Hook for plugins to get control in start_xact_command() and finish_xact_command() */
+typedef void (*XactCommandStart_hook_type) (void);
+extern PGDLLIMPORT XactCommandStart_hook_type start_xact_command_hook;
+
 #endif			/* PQUERY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 64c06cf952..f473c2dc39 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2934,6 +2934,7 @@ XPVIV
 XPVMG
 XactCallback
 XactCallbackItem
+XactCommandStart_hook_type
 XactEvent
 XactLockTableWaitInfo
 XidBoundsViolation


Re: rand48 replacement

2021-07-03 Thread Fabien COELHO


> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`

Indeed you are right.  Attached v6 does that as well.

--
Fabien.

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..146b524076 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
  * Found a suitable tuple, so save it, replacing one old tuple
  * at random
  */
-int			k = (int) (targrows * sampler_random_fract(rstate.randstate));
+int			k = (int) (targrows * sampler_random_fract(&rstate.randstate));
 
 Assert(k >= 0 && k < targrows);
 heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fafbab6b02..3009861e45 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5152,7 +5152,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		if (astate->rowstoskip <= 0)
 		{
 			/* Choose a random reservoir element to replace. */
-			pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
+			pos = (int) (targrows * sampler_random_fract(&astate->rstate.randstate));
 			Assert(pos >= 0 && pos < targrows);
 			heap_freetuple(astate->rows[pos]);
 		}
diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
index 4996612902..1a46d4b143 100644
--- a/contrib/tsm_system_rows/tsm_system_rows.c
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe
 static OffsetNumber system_rows_nextsampletuple(SampleScanState *node,
 BlockNumber blockno,
 OffsetNumber maxoffset);
-static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate);
 
 
 /*
@@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 		if (sampler->step == 0)
 		{
 			/* Initialize now that we have scan descriptor */
-			SamplerRandomState randstate;
+			pg_prng_state randstate;
 
 			/* If relation is empty, there's nothing to scan */
 			if (nblocks == 0)
 return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
-			sampler_random_init_state(sampler->seed, randstate);
+			sampler_random_init_state(sampler->seed, &randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
 			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
-			sampler->firstblock = sampler_random_fract(randstate) *
+			sampler->firstblock = sampler_random_fract(&randstate) *
 sampler->nblocks;
 
 			/* Find relative prime as step size for linear probing */
-			sampler->step = random_relative_prime(sampler->nblocks, randstate);
+			sampler->step = random_relative_prime(sampler->nblocks, &randstate);
 		}
 
 		/* Reinitialize lb */
@@ -317,7 +317,7 @@ gcd(uint32 a, uint32 b)
  * (else return 1).
  */
 static uint32
-random_relative_prime(uint32 n, SamplerRandomState randstate)
+random_relative_prime(uint32 n, pg_prng_state *randstate)
 {
 	uint32		r;
 
diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c
index 788d8f9a68..36acc6c106 100644
--- a/contrib/tsm_system_time/tsm_system_time.c
+++ b/contrib/tsm_system_time/tsm_system_time.c
@@ -69,7 +69,7 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumbe
 static OffsetNumber system_time_nextsampletuple(SampleScanState *node,
 BlockNumber blockno,
 OffsetNumber maxoffset);
-static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate);
 
 
 /*
@@ -224,25 +224,25 @@ system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 		if (sampler->step == 0)
 		{
 			/* Initialize now that we have scan descriptor */
-			SamplerRandomState randstate;
+			pg_prng_state randstate;
 
 			/* If relation is empty, there's nothing to scan */
 			if (nblocks == 0)
 return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
-			sampler_random_init_state(sampler->seed, randstate);
+			sampler_random_init_state(sampler->seed, &randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
 			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
-			sampler->firstblock = sampler_random_fract(randstate) *
+			sampler->firstblock = sampler_random_fract(&randstate) *
 sampler->nblocks;
 
 			/* Find relative prime as step size for linear probing */
-			sampler->step = random_relative_prime(sampler->nblocks, randstate);
+			sampler->step = random_relative_prime(sampler->nblocks, &randstate);

Re: rand48 replacement

2021-07-03 Thread Fabien COELHO



Hello Yura,

> 1. PostgreSQL source uses `uint64` and `uint32`, but not
> `uint64_t`/`uint32_t`
> 2. I don't see why pg_prng_state could not be `typedef uint64
> pg_prng_state[2];`

It could, but I do not see that as desirable.  From an API design point of
view we want something clean and abstract, and for me a struct looks
better for that.  It would be a struct with an array inside it; I think that
the code is more readable by avoiding constant index accesses (s[0] vs
s0), and we do not need actual indexes.



> 3. Then SamplerRandomState and pgbench RandomState could stay.
>    Patch will be a lot shorter.

You mean "typedef pg_prng_state SamplerRandomState"?  One point of the
patch is to have "one" standard PRNG commonly used where one is needed, so
I'd say we want the name to be used, hence the substitutions.


Also, I have a thing against objects being named "Random" which are not 
random, which is highly misleading. A PRNG is purely deterministic. 
Removing misleading names is also a benefit.


So if people want to keep the old name it can be done.  But I see these
name changes as desirable.


> I don't like mix of semantic refactoring and syntactic refactoring in
> the same patch. While I could agree with replacing `SamplerRandomState
> => pg_prng_state`, I'd rather see it in separate commit. And that
> separate commit could contain transition: `typedef uint64
> pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`

I would tend to agree on principle, but separating into two phases here
looks pointless: why implement a cleaner rand48 interface, which would
then NOT be the rand48 standard, just to upgrade it to something else in
the next commit?  And the other path is as painful and pointless.


So I think that the new feature better comes with its associated
refactoring, which is an integral part of it.



> 4. There is no need in ReservoirStateData->randstate_initialized. There could
>    be macros/function:
>    `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`

It would work for this particular implementation, but not necessarily for
others that we may want to substitute later, as it would mean either
breaking the interface or adding a boolean to the structure if there is no
special uninitialized state that can be detected, which would impact memory
usage and alignment.


So I think it is better to keep it that way: usually the user knows
whether their structure has been initialized, and the special case of the
reservoir code, where the user does not seem to know that, can handle its
own boolean without impacting the common API or the data structure.



> 5. Is there need for 128 bit prng at all?


This is a 64 bit PRNG with a 128 bit state.  We are generating 64 bit
values, so we want a 64 bit PRNG.  A PRNG state must be larger than its
generated value, so we need sizeof(state) > 64 bits, hence at least 128
bits once we add 128 bit memory alignment.
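
For concreteness, a 64-bit-output generator with a 128 bit state can be as
small as the reference xoroshiro128++ step below (public-domain code by
Blackman & Vigna; shown purely for illustration, not necessarily the exact
variant used in the patch):

#include <stdint.h>

static uint64_t s[2];			/* the 128 bits of state */

static inline uint64_t
rotl(uint64_t x, int k)
{
	return (x << k) | (x >> (64 - k));
}

/* xoroshiro128++: returns 64 bits and advances the 128-bit state once */
static uint64_t
next(void)
{
	uint64_t	s0 = s[0];
	uint64_t	s1 = s[1];
	uint64_t	result = rotl(s0 + s1, 17) + s0;

	s1 ^= s0;
	s[0] = rotl(s0, 49) ^ s1 ^ (s1 << 21);
	s[1] = rotl(s1, 28);

	return result;
}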


>    And there is 4*32bit xoshiro128:
>    https://prng.di.unimi.it/xoshiro128plusplus.c
>    32bit operations are faster on 32bit platforms.
>    But 32bit platforms are quite rare in production these days.
>    Therefore I don't have strong opinion on this.

I think that 99.9% of the hardware running postgres is 64 bits, so 64 bits
is the right choice.


--
Fabien.





Re: Preventing abort() and exit() calls in libpq

2021-07-03 Thread Tom Lane
I wrote:
> Hmmm ... mine is 8.4.1.
> I'm about to go out to dinner, but will check into this with some
> newer gcc versions later.

Tried --enable-coverage on Fedora 34 (with gcc 11.1.1) and sure
enough there's an exit() call being inserted.  I've pushed a fix
to just disable the check altogether in --enable-coverage builds.

regards, tom lane




Re: OpenSSL 3.0.0 compatibility

2021-07-03 Thread Peter Eisentraut

On 12.03.21 08:51, Peter Eisentraut wrote:


> On 11.03.21 11:41, Daniel Gustafsson wrote:
>> ... and apply the padding changes as proposed in a patch upthread like
>> this (these work for all OpenSSL versions I've tested, and I'm rather
>> more puzzled as to why we got away with not having them in the past):
>
> Yes, before proceeding with this, we should probably understand why
> these changes are effective and why they haven't been required in the past.


I took another look at this with openssl-3.0.0-beta1.  The issue with 
the garbled padding output is still there.  What I found is that 
pgcrypto has been using the encryption and decryption API slightly 
incorrectly.  You are supposed to call EVP_DecryptUpdate() followed by 
EVP_DecryptFinal_ex() (and similarly for encryption), but pgcrypto 
doesn't do the second one.  (To be fair, this API was added to OpenSSL 
after pgcrypto first appeared.)  The "final" functions take care of the 
padding.  We have been getting away with it like this because we do the 
padding manually elsewhere.  But apparently, something has changed in 
OpenSSL 3.0.0 in that if padding is enabled in OpenSSL, 
EVP_DecryptUpdate() doesn't flush the last normal block until the 
"final" function is called.


Your proposed fix was to explicitly disable padding, and then this 
problem goes away.  You can still call the "final" functions, but they 
won't do anything, except check that there is no more data left, but we 
already check that elsewhere.
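
For reference, the Update-then-Final calling pattern the API expects looks
roughly like this (error handling elided; a minimal sketch of the OpenSSL
calls, not pgcrypto's actual code):

#include <openssl/evp.h>

static int
decrypt_all(EVP_CIPHER_CTX *ctx,
			const unsigned char *key, const unsigned char *iv,
			const unsigned char *in, int inlen, unsigned char *out)
{
	int			outlen = 0;
	int			finlen = 0;

	EVP_DecryptInit_ex(ctx, EVP_aes_128_cbc(), NULL, key, iv);
	EVP_DecryptUpdate(ctx, out, &outlen, in, inlen);
	/* flushes the last block and validates/strips the padding */
	EVP_DecryptFinal_ex(ctx, out + outlen, &finlen);

	return outlen + finlen;		/* total plaintext length */
}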


Another option is to throw out our own padding code and let OpenSSL 
handle it.  See attached demo patch.  But that breaks the non-OpenSSL 
code in internal.c, so we'd have to re-add the padding code there.  So 
this isn't quite as straightforward an option.  (At least, with the 
patch we can confirm that the OpenSSL padding works consistently with 
our own implementation.)


So I think your proposed patch is sound and a good short-term and 
low-risk solution.
From be5882c7e7b80a6d213eee88a8960f6c9773f958 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Sat, 3 Jul 2021 15:39:30 +0200
Subject: [PATCH] Use EVP_EncryptFinal_ex() and EVP_DecryptFinal_ex()

---
 contrib/pgcrypto/openssl.c |  22 +--
 contrib/pgcrypto/pgp-cfb.c |   4 +-
 contrib/pgcrypto/px.c  | 119 +
 contrib/pgcrypto/px.h  |  12 ++--
 4 files changed, 27 insertions(+), 130 deletions(-)

diff --git a/contrib/pgcrypto/openssl.c b/contrib/pgcrypto/openssl.c
index ed8e74a2b9..68fd61b716 100644
--- a/contrib/pgcrypto/openssl.c
+++ b/contrib/pgcrypto/openssl.c
@@ -369,16 +369,18 @@ gen_ossl_free(PX_Cipher *c)
 }
 
 static int
-gen_ossl_decrypt(PX_Cipher *c, const uint8 *data, unsigned dlen,
-uint8 *res)
+gen_ossl_decrypt(PX_Cipher *c, int padding, const uint8 *data, unsigned dlen,
+uint8 *res, unsigned *rlen)
 {
OSSLCipher *od = c->ptr;
-   int outlen;
+   int outlen, outlen2;
 
if (!od->init)
{
if (!EVP_DecryptInit_ex(od->evp_ctx, od->evp_ciph, NULL, NULL, 
NULL))
return PXE_CIPHER_INIT;
+   if (!EVP_CIPHER_CTX_set_padding(od->evp_ctx, padding))
+   return PXE_CIPHER_INIT;
if (!EVP_CIPHER_CTX_set_key_length(od->evp_ctx, od->klen))
return PXE_CIPHER_INIT;
if (!EVP_DecryptInit_ex(od->evp_ctx, NULL, NULL, od->key, 
od->iv))
@@ -388,21 +390,26 @@ gen_ossl_decrypt(PX_Cipher *c, const uint8 *data, unsigned dlen,
 
if (!EVP_DecryptUpdate(od->evp_ctx, res, &outlen, data, dlen))
return PXE_DECRYPT_FAILED;
+   if (!EVP_DecryptFinal_ex(od->evp_ctx, res + outlen, &outlen2))
+   return PXE_DECRYPT_FAILED;
+   *rlen = outlen + outlen2;
 
return 0;
 }
 
 static int
-gen_ossl_encrypt(PX_Cipher *c, const uint8 *data, unsigned dlen,
-uint8 *res)
+gen_ossl_encrypt(PX_Cipher *c, int padding, const uint8 *data, unsigned dlen,
+uint8 *res, unsigned *rlen)
 {
OSSLCipher *od = c->ptr;
-   int outlen;
+   int outlen, outlen2;
 
if (!od->init)
{
if (!EVP_EncryptInit_ex(od->evp_ctx, od->evp_ciph, NULL, NULL, 
NULL))
return PXE_CIPHER_INIT;
+   if (!EVP_CIPHER_CTX_set_padding(od->evp_ctx, padding))
+   return PXE_CIPHER_INIT;
if (!EVP_CIPHER_CTX_set_key_length(od->evp_ctx, od->klen))
return PXE_CIPHER_INIT;
if (!EVP_EncryptInit_ex(od->evp_ctx, NULL, NULL, od->key, 
od->iv))
@@ -412,6 +419,9 @@ gen_ossl_encrypt(PX_Cipher *c, const uint8 *data, unsigned dlen,
 
if (!EVP_EncryptUpdate(od->evp_ctx, res, &outlen, data, dlen))
return PXE_ENCRYPT_FAILED;
+   if (!EVP_EncryptFinal_ex(od->evp_ctx, res + outlen, &outlen2))

Re: Preventing abort() and exit() calls in libpq

2021-07-03 Thread Tom Lane
I wrote:
> I'm now wondering about applying the test to *.o in libpq,
> as well as libpgport_shlib.a and libpgcommon_shlib.a.
> The latter would require some code changes, and it would make
> the prohibition extend further than libpq alone.  On the bright
> side, we could reinstate the check for abort().

After consuming a bit more caffeine, I'm afraid that won't work.
I'd imagined leaving, e.g., psprintf.c out of libpgcommon_shlib.a.
But if someone mistakenly introduced a psprintf call into libpq,
it'd still compile just fine; the symbol would be resolved against
psprintf in the calling application's code.  We'd only detect a
failure when trying to use libpq with an app that didn't contain
that function, which feels like something that our own testing
could miss.

What I'm now thinking about is restricting the test to only be run on
platforms where use of foo.a libraries is deprecated, so that we can
be pretty sure that we won't hit this situation.  Even if we only
run the test on Linux, that'd be plenty to catch any mistakes.

regards, tom lane




Re: rand48 replacement

2021-07-03 Thread Tom Lane
Yura Sokolov  writes:
> 2. I don't see why pg_prng_state could not be `typedef uint64 
> pg_prng_state[2];`

Please no.  That sort of typedef behaves so weirdly that it's
a foot-gun.

regards, tom lane




Re: Preventing abort() and exit() calls in libpq

2021-07-03 Thread Tom Lane
Noah Misch  writes:
> On Fri, Jul 02, 2021 at 11:20:17AM -0400, Tom Lane wrote:
>> Ugh.  What in the world is producing those references?

> Those come from a statically-linked libldap_r:

Blech!  I wonder if there is some way to avoid counting that.
It's not really hard to imagine that such a library might
contain an exit() call, for example, thus negating our test
altogether.

I'm now wondering about applying the test to *.o in libpq,
as well as libpgport_shlib.a and libpgcommon_shlib.a.
The latter would require some code changes, and it would make
the prohibition extend further than libpq alone.  On the bright
side, we could reinstate the check for abort().

regards, tom lane




Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

2021-07-03 Thread Andy Fan
Hi:
 I'd like to start working on UniqueKey again; it would be great if we can
target it for PG 15.  The attached patch is just for the notnull_attrs.
Since we can't say whether a column is nullable or not without saying in
which result set, I think attaching it to RelOptInfo is unavoidable.  Here
is how my patch works.

@@ -686,6 +686,12 @@ typedef struct RelOptInfo
  /* default result targetlist for Paths scanning this relation */
  struct PathTarget *reltarget; /* list of Vars/Exprs, cost, width */

+ Bitmapset **notnull_attrs; /* The attnos which are not null after evaluating
+  * all the quals on this relation.  For a baserel
+  * the array length is always 1, and for others
+  * the array index is the relid from relids.
+  */
+

For a baserel, it records the notnull attrs as a Bitmapset and stores it in
RelOptInfo->notnull_attrs[0].  For a joinrel, suppose the relids are
{1, 3, 5}; then notnull_attrs[1/3/5] will be used to store the notnull-attrs
Bitmapsets for relations 1, 3 and 5 separately.  I don't handle this for all
kinds of upper relations and subqueries so far, since UniqueKey doesn't rely
on it and more stuff looks like it would need to be handled there.
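
To illustrate the intended access pattern (a sketch under the assumption
that attnos are stored directly in the per-relid sets, as described above;
the helper name is made up):

/* is attno of base relation varno known NOT NULL within this joinrel? */
static bool
rel_attr_is_notnull(RelOptInfo *joinrel, Index varno, AttrNumber attno)
{
	return bms_is_member(attno, joinrel->notnull_attrs[varno]);
}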

The patch also includes some debug messages in
set_baserel/joinrel_notnullattrs, and test.sql is attached for easier
review.  Any feedback is welcome, and I hope this implementation will not
block the UniqueKey stuff.


v1-0001-add-the-not-null-attrs-for-RelOptInfo.-Here-is-ho.patch
Description: Binary data


Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.

2021-07-03 Thread Peter Eisentraut

On 21.06.21 07:22, Thomas Munro wrote:
> I'm not personally against the proposed change.  I'll admit there is
> something annoying about Apple's environment working in a way that
> doesn't suit traditional configure macros that have been the basis of
> portable software for a few decades, but when all's said and done,
> configure is a Unix wars era way to make things work across all the
> Unixes, and most of them are long gone, configure itself is on the way
> out, and Apple's still here, so...


I think this change is perfectly appropriate (modulo some small cleanups).

The objection was that you cannot reliably use AC_CHECK_FUNCS (and
therefore AC_REPLACE_FUNCS) anymore, but that has always been true,
since AC_CHECK_FUNCS doesn't handle macros, compiler built-ins, and
functions that are not declared, and any other situation where looking
for a symbol in a library is not the same as checking whether the symbol
actually works for your purpose.  This is not too different from the long
transition from "does this header file exist" to "can I compile this
header file".


So in fact the correct way forward would be to get rid of all uses of 
AC_CHECK_FUNCS and related, and then this problem goes away by itself.






Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values

2021-07-03 Thread Peter Eisentraut

On 21.06.21 13:32, Greg Nancarrow wrote:
> On Mon, Jun 21, 2021 at 8:32 PM David Rowley  wrote:
>> It might be worth putting in a comment to mention that the check is
>> not needed.  Just in case someone looks again one day and thinks the
>> checks are missing.
>>
>> Probably best to put this in the July commitfest so it does not get missed.
>
> Updated the patch, and will add it to the Commitfest, thanks.


I don't think this is a good change.  It replaces one perfectly solid, 
harmless, and readable line of code with six lines of comment explaining 
why the code isn't necessary (times two).  And the code is now less 
robust against changes elsewhere.  To maintain this robustness, you'd 
have to add assertions that prove that what the comment is saying is 
actually true, thus adding even more code.


I think we should leave it as is.




Re: Add ZSON extension to /contrib/

2021-07-03 Thread Peter Eisentraut

On 04.06.21 17:09, Aleksander Alekseev wrote:
> I decided to add the patch to the nearest commitfest.


With respect to the commit fest submission, I don't think there is 
consensus right now to add this.  I think people would prefer that this 
dictionary facility be somehow made available in the existing JSON 
types.  Also, I sense that there is still some volatility about some of 
the details of how this extension should work and its scope.  I think 
this is served best as an external extension for now.





Re: rand48 replacement

2021-07-03 Thread Yura Sokolov

Fabien COELHO wrote 2021-07-03 11:45:
> And a v5 where an unused test file does also compile if we insist.


About patch:
1. PostgreSQL source uses `uint64` and `uint32`, but not
   `uint64_t`/`uint32_t`
2. I don't see why pg_prng_state could not be `typedef uint64
   pg_prng_state[2];`
3. Then SamplerRandomState and pgbench RandomState could stay.
   Patch will be a lot shorter.
   I don't like mix of semantic refactoring and syntactic refactoring in
   the same patch.
   While I could agree with replacing `SamplerRandomState => pg_prng_state`,
   I'd rather see it in a separate commit.
   And that separate commit could contain the transition:
   `typedef uint64 pg_prng_state[2];` => `typedef struct { uint64 s0, s1 } pg_prng_state;`
4. There is no need in ReservoirStateData->randstate_initialized.  There
   could be a macro/function:
   `bool pg_prng_initiated(state) { return (state[0]|state[1]) != 0; }`
5. Is there need for a 128 bit prng at all?  At least 2*64bit.
   There is a 2*32bit xoroshiro64:
   https://prng.di.unimi.it/xoroshiro64starstar.c
   And there is a 4*32bit xoshiro128:
   https://prng.di.unimi.it/xoshiro128plusplus.c
   32bit operations are faster on 32bit platforms.
   But 32bit platforms are quite rare in production these days.
   Therefore I don't have a strong opinion on this.

regards,
Sokolov Yura.




Re: WIP: Relaxing the constraints on numeric scale

2021-07-03 Thread Dean Rasheed
Attached is a more complete patch, with updated docs and tests.

I chose to allow the scale to be in the range -1000 to 1000, which, to
some extent, is quite arbitrary. The upper limit of 1000 makes sense,
because nearly all numeric computations (other than multiply, add and
subtract) have that as their upper scale limit (that's the maximum
display scale). It also has to be at least 1000 for SQL compliance,
since the precision can be up to 1000.

The lower limit, on the other hand, really is quite arbitrary. -1000
is a nice round number, giving it a certain symmetry, and is almost
certainly sufficient for any realistic use case (-1000 means numbers
are rounded to the nearest multiple of 10^1000).

Also, keeping some spare bits in the typemod might come in handy one
day for something else (e.g., rounding mode choice).
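
As a quick round-trip illustration of the typmod packing in the patch
below (the int16 cast in TYPMOD_SCALE is what sign-extends a negative
scale back out):

	int32		typmod = MAKE_TYPMOD(2, -3);	/* as for NUMERIC(2, -3) */

	Assert(TYPMOD_PRECISION(typmod) == 2);
	Assert(TYPMOD_SCALE(typmod) == -3);	/* negative scale survives unpacking */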

Regards,
Dean
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index c473d6a..97e4cdf
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -545,8 +545,8 @@
 
 NUMERIC(precision, scale)
 
- The precision must be positive, the scale zero or positive.
- Alternatively:
+ The precision must be positive, the scale may be positive or negative
+ (see below).  Alternatively:
 
 NUMERIC(precision)
 
@@ -578,11 +578,41 @@ NUMERIC
 
  If the scale of a value to be stored is greater than the declared
  scale of the column, the system will round the value to the specified
- number of fractional digits.  Then, if the number of digits to the
- left of the decimal point exceeds the declared precision minus the
- declared scale, an error is raised.
+ number of fractional digits.  If the declared scale of the column is
+ negative, the value will be rounded to the left of the decimal point.
+ If, after rounding, the number of digits to the left of the decimal point
+ exceeds the declared precision minus the declared scale, an error is
+ raised.  Similarly, if the declared scale exceeds the declared precision
+ and the number of zero digits to the right of the decimal point is less
+ than the declared scale minus the declared precision, an error is raised.
+ For example, a column declared as
+
+NUMERIC(3, 1)
+
+ will round values to 1 decimal place and be able to store values between
+ -99.9 and 99.9, inclusive.  A column declared as
+
+NUMERIC(2, -3)
+
+ will round values to the nearest thousand and be able to store values
+ between -99000 and 99000, inclusive.  A column declared as
+
+NUMERIC(3, 5)
+
+ will round values to 5 decimal places and be able to store values between
+ -0.00999 and 0.00999, inclusive.
 
 
+
+ 
+  The scale in a NUMERIC type declaration may be any value in
+  the range -1000 to 1000.  (The SQL standard requires
+  the scale to be in the range 0 to precision.
+  Using values outside this range may not be portable to other database
+  systems.)
+ 
+
+
 
  Numeric values are physically stored without any extra leading or
  trailing zeroes.  Thus, the declared precision and scale of a column
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
new file mode 100644
index eb78f0b..2001d75
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -250,6 +250,17 @@ struct NumericData
 	 | ((n)->choice.n_short.n_header & NUMERIC_SHORT_WEIGHT_MASK)) \
 	: ((n)->choice.n_long.n_weight))
 
+/*
+ * Pack the numeric precision and scale in the typmod value.  The upper 16
+ * bits are used for the precision, and the lower 16 bits for the scale.  Note
+ * that the scale may be negative, so use sign extension when unpacking it.
+ */
+
+#define MAKE_TYPMOD(p, s) ((((p) << 16) | ((s) & 0xffff)) + VARHDRSZ)
+
+#define TYPMOD_PRECISION(t) ((((t) - VARHDRSZ) >> 16) & 0xffff)
+#define TYPMOD_SCALE(t) ((int32) ((int16) (((t) - VARHDRSZ) & 0xffff)))
+
 /* --
  * NumericVar is the format we use for arithmetic.  The digit-array part
  * is the same as the NumericData storage format, but the header is more
@@ -826,7 +837,7 @@ numeric_maximum_size(int32 typmod)
 		return -1;
 
 	/* precision (ie, max # of digits) is in upper bits of typmod */
-	precision = ((typmod - VARHDRSZ) >> 16) & 0xffff;
+	precision = TYPMOD_PRECISION(typmod);
 
 	/*
 	 * This formula computes the maximum number of NumericDigits we could need
@@ -1080,10 +1091,10 @@ numeric_support(PG_FUNCTION_ARGS)
 			Node	   *source = (Node *) linitial(expr->args);
 			int32		old_typmod = exprTypmod(source);
 			int32		new_typmod = DatumGetInt32(((Const *) typmod)->constvalue);
-			int32		old_scale = (old_typmod - VARHDRSZ) & 0xffff;
-			int32		new_scale = (new_typmod - VARHDRSZ) & 0xffff;
-			int32		old_precision = (old_typmod - VARHDRSZ) >> 16 & 0xffff;
-			int32		new_precision = (new_typmod - VARHDRSZ) >> 16 & 0xffff;
+			int32		old_scale = TYPMOD_SCALE(old_typmod);
+			int32		new_scale = TYPMOD_SCALE(new_typmod);

Re: Synchronous commit behavior during network outage

2021-07-03 Thread Andrey Borodin



> On 3 Jul 2021, at 01:15, Jeff Davis  wrote:
> 
> On Fri, 2021-07-02 at 11:39 +0500, Andrey Borodin wrote:
>> If the failover happens due to unresponsive node we cannot just turn
>> off sync rep. We need to have some spare connections for that (number
>> of stuck backends will skyrocket during network partitioning). We
>> need available descriptors and some memory to fork new backend. We
>> will need to re-read config. We need time to try after all.
>> At some failures we may lack some of these.
> 
> I think it's a good point that, when things start to go wrong, they can
> go very wrong very quickly.
> 
> But until you've disabled sync rep, the primary will essentially be
> down for writes whether using this new feature or not. Even if you can
> terminate some backends to try to free space, the application will just
> make new connections that will get stuck the same way.
Surely I'm talking about terminating the postmaster, not individual backends.
But the postmaster will need to terminate each running query.
We surely need a way to stop the whole instance without running any single
query.  And I do not like kill -9 for this purpose.

> 
> You can avoid the "fork backend" problem by keeping a connection always
> open from the HA tool, or by editing the conf to disable sync rep and
> issuing SIGHUP instead. Granted, that still takes some memory.
> 
>> Partial degradation is already a hard task. Without the ability to easily
>> terminate a running Postgres, the HA tool will often resort to SIGKILL.
> 
> When the system is really wedged as you describe (waiting on sync rep,
> tons of connections, and low memory), what information do you expect
> the HA tool to be able to collect, and what actions do you expect it to
> take?
The HA tool is not going to collect anything. It just calls pg_ctl stop [0] or
its equivalent.

> 
> Presumably, you'd want it to disable sync rep at some point to get back
> online. Where does SIGTERM fit into the picture?

The HA tool is going to terminate the running instance, rewind it, switch it
to the new timeline, and enroll it into the cluster again as a standby.

> 
>>> If you don't handle the termination case, then there's still a chance
>>> for the transaction to become visible to other clients before it's
>>> replicated.
>> 
>> Termination is an admin command; they know what they are doing.
>> Cancellation is part of the user protocol.
> 
> From the pg_terminate_backend() docs: "This is also allowed if the
> calling role is a member of the role whose backend is being terminated
> or the calling role has been granted pg_signal_backend", so it's not
> really an admin command. Even for an admin, it might be hard to
> understand why terminating a backend could result in losing a visible
> transaction.
OK, I see that backend termination is not described as an admin command.
We cannot prevent users from doing stupid things; they are able to delete
their data anyway.

> I'm not really seeing two use cases here for two GUCs. Are you sure you
> want to disable only cancels but allow termination to proceed?

Yes, I'm sure. I had been running production with termination disabled for
some weeks. Cluster repair was much slower: for some reason, kill-9-ed
instances were successfully rewound much less often. But maybe I did something
wrong.

If we can stop the whole instance the same way as we could without the
proposed GUC activated, then there is no problem at all.

Thanks!

Best regards, Andrey Borodin.

[0] 
https://github.com/zalando/patroni/blob/master/patroni/postgresql/postmaster.py#L155



Re: rand48 replacement

2021-07-03 Thread Fabien COELHO



Here is a v4, which:

- moves the stuff to common and fully removes random/srandom (Tom)
- includes a range generation function based on the bitmask method (Dean)
  but iterates with splitmix so that the state always advances once (Me);
  a sketch of the bitmask method follows below


And a v5 where an unused test file also compiles, if we insist.
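
For readers unfamiliar with the bitmask method mentioned above, the idea is to
mask each random draw down to the smallest all-ones mask covering the range
and to reject draws that land outside it, which avoids modulo bias. Below is a
minimal sketch, purely illustrative: the generator stub and all names are
mine, not the patch's API, and the splitmix iteration trick that keeps the
caller-visible state advancing exactly once regardless of rejections is not
reproduced here.

#include <stdint.h>

/* stand-in splitmix64 generator, illustrative only */
static uint64_t demo_state = 42;

static uint64_t
demo_next(void)
{
	uint64_t	z = (demo_state += UINT64_C(0x9e3779b97f4a7c15));

	z = (z ^ (z >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94d049bb133111eb);
	return z ^ (z >> 31);
}

/* unbiased draw in [0, range) by the bitmask (rejection) method */
static uint64_t
demo_in_range(uint64_t range)	/* requires range > 0 */
{
	/* smallest 2^k - 1 covering range - 1 (gcc/clang builtin) */
	uint64_t	mask = ~UINT64_C(0) >> __builtin_clzll((range - 1) | 1);
	uint64_t	v;

	do
	{
		v = demo_next() & mask;
	} while (v >= range);

	return v;
}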

--
Fabien.

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 2c2f149fb0..146b524076 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -1188,7 +1188,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
  * Found a suitable tuple, so save it, replacing one old tuple
  * at random
  */
-int			k = (int) (targrows * sampler_random_fract(rstate.randstate));
+int			k = (int) (targrows * sampler_random_fract(&rstate.randstate));
 
 Assert(k >= 0 && k < targrows);
 heap_freetuple(rows[k]);
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fafbab6b02..3009861e45 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5152,7 +5152,7 @@ analyze_row_processor(PGresult *res, int row, PgFdwAnalyzeState *astate)
 		if (astate->rowstoskip <= 0)
 		{
 			/* Choose a random reservoir element to replace. */
-			pos = (int) (targrows * sampler_random_fract(astate->rstate.randstate));
+			pos = (int) (targrows * sampler_random_fract(&astate->rstate.randstate));
 			Assert(pos >= 0 && pos < targrows);
 			heap_freetuple(astate->rows[pos]);
 		}
diff --git a/contrib/tsm_system_rows/tsm_system_rows.c b/contrib/tsm_system_rows/tsm_system_rows.c
index 4996612902..1a46d4b143 100644
--- a/contrib/tsm_system_rows/tsm_system_rows.c
+++ b/contrib/tsm_system_rows/tsm_system_rows.c
@@ -69,7 +69,7 @@ static BlockNumber system_rows_nextsampleblock(SampleScanState *node, BlockNumbe
 static OffsetNumber system_rows_nextsampletuple(SampleScanState *node,
 BlockNumber blockno,
 OffsetNumber maxoffset);
-static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate);
 
 
 /*
@@ -213,25 +213,25 @@ system_rows_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 		if (sampler->step == 0)
 		{
 			/* Initialize now that we have scan descriptor */
-			SamplerRandomState randstate;
+			pg_prng_state randstate;
 
 			/* If relation is empty, there's nothing to scan */
 			if (nblocks == 0)
 return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
-			sampler_random_init_state(sampler->seed, randstate);
+			sampler_random_init_state(sampler->seed, &randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
 			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
-			sampler->firstblock = sampler_random_fract(randstate) *
+			sampler->firstblock = sampler_random_fract(&randstate) *
 sampler->nblocks;
 
 			/* Find relative prime as step size for linear probing */
-			sampler->step = random_relative_prime(sampler->nblocks, randstate);
+			sampler->step = random_relative_prime(sampler->nblocks, &randstate);
 		}
 
 		/* Reinitialize lb */
@@ -317,7 +317,7 @@ gcd(uint32 a, uint32 b)
  * (else return 1).
  */
 static uint32
-random_relative_prime(uint32 n, SamplerRandomState randstate)
+random_relative_prime(uint32 n, pg_prng_state *randstate)
 {
 	uint32		r;
 
diff --git a/contrib/tsm_system_time/tsm_system_time.c b/contrib/tsm_system_time/tsm_system_time.c
index 788d8f9a68..36acc6c106 100644
--- a/contrib/tsm_system_time/tsm_system_time.c
+++ b/contrib/tsm_system_time/tsm_system_time.c
@@ -69,7 +69,7 @@ static BlockNumber system_time_nextsampleblock(SampleScanState *node, BlockNumbe
 static OffsetNumber system_time_nextsampletuple(SampleScanState *node,
 BlockNumber blockno,
 OffsetNumber maxoffset);
-static uint32 random_relative_prime(uint32 n, SamplerRandomState randstate);
+static uint32 random_relative_prime(uint32 n, pg_prng_state *randstate);
 
 
 /*
@@ -224,25 +224,25 @@ system_time_nextsampleblock(SampleScanState *node, BlockNumber nblocks)
 		if (sampler->step == 0)
 		{
 			/* Initialize now that we have scan descriptor */
-			SamplerRandomState randstate;
+			pg_prng_state randstate;
 
 			/* If relation is empty, there's nothing to scan */
 			if (nblocks == 0)
 return InvalidBlockNumber;
 
 			/* We only need an RNG during this setup step */
-			sampler_random_init_state(sampler->seed, randstate);
+			sampler_random_init_state(sampler->seed, &randstate);
 
 			/* Compute nblocks/firstblock/step only once per query */
 			sampler->nblocks = nblocks;
 
 			/* Choose random starting block within the relation */
 			/* (Actually this is the predecessor of the first block visited) */
-			sampler->firstblock = sampler_random_fract(randstate) *
+			sampler->firstblock = sampler_random_fract(&randstate) *
 sampler->nblocks;
 
			/* Find relative prime as step size for linear probing */
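
As context for the hunks above: these sampling methods want a step that is
relatively prime to nblocks because linear probing with such a step visits
every block exactly once before the sequence repeats. A minimal demonstration,
independent of the patch:

#include <stdio.h>

int
main(void)
{
	unsigned int n = 10;		/* number of blocks */
	unsigned int step = 3;		/* gcd(3, 10) == 1 */
	unsigned int b = 7;			/* arbitrary starting block */
	unsigned int i;

	/* prints "0 3 6 9 2 5 8 1 4 7": each block exactly once */
	for (i = 0; i < n; i++)
	{
		b = (b + step) % n;
		printf("%u ", b);
	}
	printf("\n");

	return 0;
}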

Re: rand48 replacement

2021-07-03 Thread Fabien COELHO


Hello Dean & Tom,

Here is a v4, which:

 - moves the stuff to common and fully removes random/srandom (Tom)
 - includes a range generation function based on the bitmask method (Dean)
   but iterates with splitmix so that the state always advances once (Me)

--
Fabien.