RE: logical replication empty transactions

2021-07-14 Thread osumi.takami...@fujitsu.com
On Wednesday, July 14, 2021 9:30 PM Ajin Cherian  wrote:
> I've had to rebase the patch after a recent commit by Amit Kapila of 
> supporting
> two-phase commits in pub-sub [1].
> Also I've modified the patch to also skip replicating empty prepared
> transactions. Do let me know if you have any comments.
Hi

I started to test this patch, but first let me give you some really minor, quick
feedback.

(1) pg_logical_slot_get_binary_changes() params.

Technically, wouldn't it be better to use proto_version 3 and the two_phase option for
this function when testing empty prepares?  I believe proto_version 1 doesn't support 2PC.
[1] says "The following messages (Begin Prepare, Prepare, Commit Prepared, 
Rollback Prepared)
are available since protocol version 3."  So, if the test is meant to skip empty 
*prepares*,
I suggest updating the proto_version and setting two_phase to 'on'.

+##
+# Test empty prepares
+##
...
+# peek at the contents of the slot
+$result = $node_publisher->safe_psql(
+   'postgres', qq(
+   SELECT get_byte(data, 0)
+   FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL,
+   'proto_version', '1',
+   'publication_names', 'tap_pub')
+));

(2) The following message should start with a lowercase letter, per the message
style guidelines.  There are other similar messages in the code.

+   elog(DEBUG1, "Skipping replication of an empty transaction");

[1] - 
https://www.postgresql.org/docs/devel/protocol-logicalrep-message-formats.html


Best Regards,
Takamichi Osumi



Re: speed up verifying UTF-8

2021-07-14 Thread Amit Khandekar
On Tue, 13 Jul 2021 at 01:15, John Naylor  wrote:
> > It seems like it would be easy to have pg_utf8_verify_one in my proposed 
> > pg_utf8.h header and replace the body of pg_utf8_verifychar with it.
>
> 0001: I went ahead and tried this for v15, and also attempted some clean-up:
>
> - Rename pg_utf8_verify_one to pg_utf8_verifychar_internal.
> - Have pg_utf8_verifychar_internal return -1 for invalid input to match other 
> functions in the file. We could also do this for check_ascii, but it's not 
> quite the same thing, because the string could still have valid bytes in it, 
> just not enough to advance the pointer by the stride length.
> - Remove hard-coded numbers (not wedded to this).
>
> - Use a call to pg_utf8_verifychar in the slow path.
> - Reduce pg_utf8_verifychar to thin wrapper around 
> pg_utf8_verifychar_internal.

- check_ascii() seems to be used only for 64-bit chunks. So why not
remove the len argument and the len <= sizeof(int64) checks inside the
function? We can rename it to check_ascii64() for clarity (a standalone
sketch of that idea appears after the code below).

- I was thinking, why not have a pg_utf8_verify64() that processes
64-bit chunks (or a 32-bit version)? In check_ascii(), we extract a
64-bit chunk from the string anyway. We can use the same chunk to
extract the required bits from a two-byte or four-byte char. This
way we can avoid extracting separate bytes like b1 = *s; b2 = s[1],
etc. More importantly, we can avoid the separate continuation-char
checks for each individual byte. Additionally, we can try to simplify
the subsequent overlong and surrogate char checks. Something like
this:

int pg_utf8_verifychar_32(uint32 chunk)
{
    int     len, l;

    for (len = sizeof(chunk); len > 0; (len -= l), (chunk = chunk << (l * 8)))
    {
        /* Is 2-byte lead */
        if ((chunk & 0xE0000000) == 0xC0000000)
        {
            l = 2;
            /* ... check the continuation byte, overlong range, etc. ... */
        }
        /* Is 3-byte lead */
        else if ((chunk & 0xF0000000) == 0xE0000000)
        {
            l = 3;
            if (len < l)
                break;

            /* b2 and b3 should be continuation bytes */
            if ((chunk & 0x00C0C000) != 0x00808000)
                return sizeof(chunk) - len;

            switch (chunk & 0xFF200000)
            {
                /*
                 * check 3-byte overlong: 1110.0000 100x.xxxx 10xx.xxxx,
                 * i.e. (b1 == 0xE0 && b2 < 0xA0).  We already know b2 is of
                 * the form 10xx.xxxx since it's a continuation char.  The
                 * additional condition b2 <= 0x9F means it is of the form
                 * 100x.xxxx, i.e. either 1000.xxxx or 1001.xxxx.  So just
                 * verify that it is xx0x.xxxx.
                 */
                case 0xE0000000:
                    return sizeof(chunk) - len;

                /*
                 * check surrogate: 1110.1101 101x.xxxx 10xx.xxxx,
                 * i.e. (b1 == 0xED && b2 > 0x9F).  Here, > 0x9F would allow
                 * 1010.xxxx, 1011.xxxx, 1100.xxxx, etc., but anything not of
                 * the form 10xx.xxxx is impossible because b2 is a
                 * continuation char, so it has to be 1010.xxxx or 1011.xxxx.
                 * So just verify that it is xx1x.xxxx.
                 */
                case 0xED200000:
                    return sizeof(chunk) - len;

                default:
                    ;
            }
        }
        /* Is 4-byte lead */
        else if ((chunk & 0xF8000000) == 0xF0000000)
        {
            /* ... same idea for the 4-byte checks ... */
            l = 4;
        }
        else
            return sizeof(chunk) - len;
    }
    return sizeof(chunk) - len;
}
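
For illustration, here is a minimal standalone sketch of the check_ascii64()
idea from the first point above (an illustration only, not the actual patch
code: it assumes the caller always hands in exactly one 64-bit chunk, which is
why no len argument is needed):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return true if all eight bytes of the chunk are non-zero ASCII. */
static bool
check_ascii64(const unsigned char *s)
{
    uint64_t    chunk;

    memcpy(&chunk, s, sizeof(chunk));

    /* any byte with its high bit set is not ASCII */
    if (chunk & UINT64_C(0x8080808080808080))
        return false;

    /* reject embedded zero bytes (the usual "has a zero byte" bit trick) */
    if ((chunk - UINT64_C(0x0101010101010101)) & ~chunk &
        UINT64_C(0x8080808080808080))
        return false;

    return true;
}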




Re: ERROR: "ft1" is of the wrong type.

2021-07-14 Thread Kyotaro Horiguchi
At Wed, 14 Jul 2021 19:55:18 +0900, Michael Paquier  wrote 
in 
> On Fri, Jul 09, 2021 at 09:00:31PM +0900, Kyotaro Horiguchi wrote:
> > Mmm. Ok, I distributed the mother regression test into each version.
> 
> Thanks, my apologies for the late reply.  It took me some time to
> analyze the whole.
..
> > PG13:
> >Of course this works fine but doesn't seem clean, but it is
> >apparently a matter of the master branch.
> > 
> >  - ATT_TABLE | ATT_MATVIEW | ATT_INDEX | ATT_PARTITIONED_INDEX | 
> > ATT_FOREIGN_TABLE
> >Added and works as expected.
> 
> HEAD had its own improvements, and what you have here closes some
> holes of their own, so applied.  Thanks!

Thank you for committing this!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Remove page-read callback from XLogReaderState.

2021-07-14 Thread Kyotaro Horiguchi
At Thu, 15 Jul 2021 00:39:52 +0500, Ibrar Ahmed  wrote 
in 
> On Wed, Jun 30, 2021 at 12:54 PM Kyotaro Horiguchi 
> wrote:
> > Maybe I will rebase it soon.
> >
> > Yes, rebase is required, therefore I am changing the status to "Waiting On
> Author"
> http://cfbot.cputube.org/patch_33_2113.log

Gah... Thank you for letting me know.  I thought I had sent the
rebased version. This is the rebased version, on the current master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From cc36aefa3943e67d357be989044c951b5e6c0702 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 7 Jul 2021 13:10:34 +0900
Subject: [PATCH v21] Remove read_page callback from XLogReadRecord.

Previously, the XLogReader module would fetch new input data using a
callback function.  Redesign the interface so that it tells the caller
to insert more data with a return value instead.  This API suits later
patches for prefetching, encryption and probably other future features
that would otherwise require extending the callback interface for new
projects.

As incidental cleanup work, move global variables readOff, readLen and
readSegNo inside XlogReaderState.

Author: Kyotaro HORIGUCHI 
Author: Heikki Linnakangas  (earlier version)
Reviewed-by: Antonin Houska 
Reviewed-by: Alvaro Herrera 
Reviewed-by: Takashi Menjo 
Reviewed-by: Andres Freund 
Reviewed-by: Thomas Munro 
Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
---
 src/backend/access/transam/twophase.c |  14 +-
 src/backend/access/transam/xlog.c | 125 +--
 src/backend/access/transam/xlogreader.c   | 968 +++---
 src/backend/access/transam/xlogutils.c|  25 +-
 src/backend/replication/logical/logical.c |  26 +-
 .../replication/logical/logicalfuncs.c|  13 +-
 src/backend/replication/slotfuncs.c   |  18 +-
 src/backend/replication/walsender.c   |  42 +-
 src/bin/pg_rewind/parsexlog.c | 105 +-
 src/bin/pg_waldump/pg_waldump.c   | 141 +--
 src/include/access/xlogreader.h   | 157 +--
 src/include/access/xlogutils.h|   4 +-
 src/include/pg_config_manual.h|   2 +-
 src/include/replication/logical.h |  11 +-
 14 files changed, 955 insertions(+), 696 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 6d3efb49a4..e0c1e8f334 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1330,11 +1330,8 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
 	char	   *errormsg;
 	TimeLineID	save_currtli = ThisTimeLineID;
 
-	xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
-									XL_ROUTINE(.page_read = &read_local_xlog_page,
-											   .segment_open = &wal_segment_open,
-											   .segment_close = &wal_segment_close),
-									NULL);
+	xlogreader = XLogReaderAllocate(wal_segment_size, NULL, wal_segment_close);
+
 	if (!xlogreader)
 		ereport(ERROR,
 (errcode(ERRCODE_OUT_OF_MEMORY),
@@ -1342,7 +1339,12 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len)
  errdetail("Failed while allocating a WAL reading processor.")));
 
 	XLogBeginRead(xlogreader, lsn);
-	record = XLogReadRecord(xlogreader, &errormsg);
+	while (XLogReadRecord(xlogreader, &record, &errormsg) ==
+		   XLREAD_NEED_DATA)
+	{
+		if (!read_local_xlog_page(xlogreader))
+			break;
+	}
 
 	/*
 	 * Restore immediately the timeline where it was previously, as
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c7c928f50b..f9aea2a689 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -819,17 +819,13 @@ static XLogSegNo openLogSegNo = 0;
 
 /*
  * These variables are used similarly to the ones above, but for reading
- * the XLOG.  readOff is the offset of the page just read, readLen
- * indicates how much of it has been read into readBuf, and readSource
+ * the XLOG.  readOff is the offset of the page just read, and readSource
  * indicates where we got the currently open file from.
  * Note: we could use Reserve/ReleaseExternalFD to track consumption of
  * this FD too; but it doesn't currently seem worthwhile, since the XLOG is
  * not read by general-purpose sessions.
  */
 static int	readFile = -1;
-static XLogSegNo readSegNo = 0;
-static uint32 readOff = 0;
-static uint32 readLen = 0;
 static XLogSource readSource = XLOG_FROM_ANY;
 
 /*
@@ -846,13 +842,6 @@ static XLogSource currentSource = XLOG_FROM_ANY;
 static bool lastSourceFailed = false;
 static bool pendingWalRcvRestart = false;
 
-typedef struct XLogPageReadPrivate
-{
-	int			emode;
-	bool		fetching_ckpt;	/* are we fetching a checkpoint record? */
-	bool		randAccess;
-} XLogPageReadPrivate;
-
 /*
  * These variables track when we last obtained some WAL data to process,
  * and where we got it from.  (XLogReceiptSource is initially the same as
@@ -927,10 +916,12 @@ static bool 

Re: row filtering for logical replication

2021-07-14 Thread Dilip Kumar
On Thu, Jul 15, 2021 at 7:37 AM Amit Kapila  wrote:
>
> On Wed, Jul 14, 2021 at 10:55 PM Euler Taveira  wrote:
> >
> > On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
> >
> > Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps
> > we could ignore this for subscriptions without DELETE.
> >
> > ... and UPDATE. It seems we have a consensus to use old row in the row 
> > filter
> > for UPDATEs. I think you meant publication.
> >
>
> If I read correctly people are suggesting to use a new row for updates

Right

> but I still suggest completing the analysis (or at least spend some
> more time) Tomas and I requested in the few emails above and then
> conclude on this point.

+1

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: SI messages sent when executing ROLLBACK PREPARED command

2021-07-14 Thread liuhuail...@fujitsu.com
Hi, tom

Thanks for your reply.

>Hmmm, yeah, I think you're right.  It probably doesn't make a big difference 
>in the real world --- anyone who's dependent on the performance of 2PC 
>rollbacks is Doing It Wrong. 
> But we'd have already done LocalExecuteInvalidationMessage when getting out 
> of the prepared transaction, so no other SI invals should be needed.
Yes, it does not cause any error.

But for a beginner reading the code, it may be confusing.
And for a developer building on this code, it may lead to unnecessary
handling being added.

So, I think it is better to optimize the code.

Here is the patch.

Regards, liuhl

-Original Message-
From: Tom Lane  
Sent: Thursday, July 15, 2021 1:36 AM
To: Liu, Huailing/刘 怀玲 
Cc: pgsql-hack...@postgresql.org
Subject: Re: SI messages sent when executing ROLLBACK PREPARED command

"liuhuail...@fujitsu.com"  writes:
> So, I think we needn't send SI messages when rolling back the two-phase 
> transaction.
> Or Does it has something special because of two-phase transaction?

Hmmm, yeah, I think you're right.  It probably doesn't make a big difference in 
the real world --- anyone who's dependent on the performance of 2PC rollbacks 
is Doing It Wrong.  But we'd have already done LocalExecuteInvalidationMessage 
when getting out of the prepared transaction, so no other SI invals should be 
needed.

regards, tom lane


twophase.patch
Description: twophase.patch


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread Ranier Vilela
On Wed, 14 Jul 2021 at 22:22, David Rowley  wrote:

> On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
> >
> > On Wed, 14 Jul 2021 at 21:21, David Rowley  wrote:
> >> But, in v8 there is no additional branch, so no branch to mispredict.
> >> I don't really see how your explanation fits.
> >
> > In v8 the branch occurs at :
> > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
>
> You do know that branch is in a function that's only executed once
> during executor initialization, right?
>
The branch prediction should work better.
I have no idea why it works worse.

I redid all tests:
notebook 8GB RAM 256GB SSD
ubuntu 64 bits (20.04)
clang-12
power: high (charger on)
configuration: none (all defaults)


        HEAD        v6          v7b          v8          v6 vs head  v7b vs v6  v8 vs v7b
Test1   576,868013  940,947236  1090,253859  1016,0443   163,11%     115,87%    93,19%
Test2   184,748363  177,6254    177,346229   178,230258  96,14%      99,84%     100,50%
Test3   410,030055  541,889704  605,843924   534,946166  132,16%     111,80%    88,30%
Test4   153,331752  147,98418   148,010894   147,771155  96,51%      100,02%    99,84%
Test5   268,97555   301,979647  316,928492   300,94932   112,27%     104,95%    94,96%
Test6   234,910125  259,71483   269,851427   260,567637  110,56%     103,90%    96,56%
Test7   142,704153  136,09163   136,802695   136,935709  95,37%      100,52%    100,10%
Test8   498,634855  763,482151  867,350046   804,833884  153,11%     113,60%    92,79%

The values are higher here because the tests are now run with the CPU at
full power for all patches!
I think more testing is needed with v7b and v8.

Anyway, the two functions (ExecSortTuple and ExecSortDatum) are almost
identical, which may not be a good idea.

file results attached.

regards,
Ranier Vilela
Benchmarks datumSort:

1) head

a)
Test1
tps = 569.209858 (without initial connection time)
tps = 563.220614 (without initial connection time)
tps = 576.868013 (without initial connection time)
Test2
tps = 181.738827 (without initial connection time)
tps = 181.846192 (without initial connection time)
tps = 184.748363 (without initial connection time)
Test3
tps = 403.548507 (without initial connection time)
tps = 403.089439 (without initial connection time)
tps = 402.613085 (without initial connection time)
Test4
tps = 149.144474 (without initial connection time)
tps = 149.399761 (without initial connection time)
tps = 149.154955 (without initial connection time)
Test5
tps = 267.874525 (without initial connection time)
tps = 268.411193 (without initial connection time)
tps = 268.975550 (without initial connection time)
Test6
tps = 233.527790 (without initial connection time)
tps = 233.633064 (without initial connection time)
tps = 234.910125 (without initial connection time)
Test7
tps = 141.286126 (without initial connection time)
tps = 142.704153 (without initial connection time)
tps = 141.205352 (without initial connection time)
Test8
tps = 487.544464 (without initial connection time)
tps = 491.117533 (without initial connection time)
tps = 488.632738 (without initial connection time)

b)
Test1
tps = 566.830371 (without initial connection time)
tps = 570.918955 (without initial connection time)
tps = 568.750619 (without initial connection time)
Test2
tps = 183.407740 (without initial connection time)
tps = 180.912646 (without initial connection time)
tps = 180.520571 (without initial connection time)
Test3
tps = 407.706768 (without initial connection time)
tps = 410.030055 (without initial connection time)
tps = 408.764070 (without initial connection time)
Test4
tps = 153.331752 (without initial connection time)
tps = 150.824075 (without initial connection time)
tps = 151.791484 (without initial connection time)
Test5
tps = 267.737001 (without initial connection time)
tps = 268.400295 (without initial connection time)
tps = 267.696627 (without initial connection time)
Test6
tps = 232.793249 (without initial connection time)
tps = 232.996156 (without initial connection time)
tps = 233.452329 (without initial connection time)
Test7
tps = 142.180635 (without initial connection time)
tps = 141.762886 (without initial connection time)
tps = 141.488497 (without initial connection time)
Test8
tps = 489.623012 (without initial connection time)
tps = 498.634855 (without initial connection time)
tps = 491.009624 (without initial connection time)


2) v6 Ronan

a)
Test1
tps = 937.913563 (without initial connection time)
tps = 940.947236 (without initial connection time)
tps = 934.811056 (without initial connection time)
Test2
tps = 176.990496 (without initial connection time)
tps = 177.242999 (without initial connection time)
tps = 177.625400 (without initial connection time)
Test3
tps = 539.970412 (without initial connection time)
tps = 540.415402 (without initial connection time)
tps = 541.889704 (without initial connection time)
Test4
tps = 147.006777 (without initial connection time)
tps = 147.984180 (without initial connection time)
tps = 147.605122 (without initial connection time)
Test5
tps = 301.528886 (without 

RE: Avoid repeated PQfnumber() in pg_dump

2021-07-14 Thread houzj.f...@fujitsu.com
On July 15, 2021 5:35 AM Daniel Gustafsson  wrote:
> > On 14 Jul 2021, at 10:54, houzj.f...@fujitsu.com wrote:
> 
> > Since PQfnumber() is not a cheap function, I think we'd better invoke
> > PQfnumber() outside the loop, as in the attached patch.
> 
> Looks good on a quick readthrough, and I didn't see any other similar
> codepaths in pg_dump on top of what you've fixed.

Thanks for reviewing the patch.
Added to the CF: https://commitfest.postgresql.org/34/3254/

> > After applying this change, I can see about 8% performance gain in my
> > test environment when dumping table definitions which have many columns.
> 
> Out of curiosity, how many columns are "many columns"?

I tried dumping 10 table definitions, where each table has 1000 columns
(maybe not a real-world case).
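
For illustration, a rough sketch of the pattern being applied (not the actual
pg_dump code; the query and variable names here are made up): resolve each
column number once with PQfnumber() before the loop, then fetch values by
number inside it.

#include <stdio.h>
#include <libpq-fe.h>

static void
print_attributes(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "SELECT attname, atttypid FROM pg_attribute");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        /* hoisted out of the loop: one PQfnumber() lookup per column */
        int         i_attname = PQfnumber(res, "attname");
        int         i_atttypid = PQfnumber(res, "atttypid");
        int         ntups = PQntuples(res);

        for (int i = 0; i < ntups; i++)
            printf("%s : %s\n",
                   PQgetvalue(res, i, i_attname),
                   PQgetvalue(res, i, i_atttypid));
    }

    PQclear(res);
}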

Best regards,
houzj




Re: Remove redundant Assert(PgArchPID == 0); in PostmasterStateMachine

2021-07-14 Thread Fujii Masao




On 2021/07/15 11:21, Michael Paquier wrote:

On Wed, Jul 14, 2021 at 11:38:59PM +0530, Bharath Rupireddy wrote:

It looks like the commit d75288fb [1] added an unnecessary
Assert(PgArchPID == 0); in PostmasterStateMachine as the if block code
gets hit only when PgArchPID == 0. PSA small patch.


Good catch, Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

2021-07-14 Thread Greg Nancarrow
On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
 wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>

Unfortunately there is currently no test, code-comment, README or
developer-discussion that definitively determines which approach (v2
vs  v3/v4) is a valid fix for this issue.
We don't know if having both the transaction and active snapshots in a
parallel worker is intentional or not, and if so, why so?
(certainly in the non-parallel case of the same statement execution,
there is only one snapshot in question here - the obtained transaction
snapshot is pushed as the active snapshot, as it is done in 95% of
cases in the code)
It seems that only the original code authors know how the snapshot
handling in parallel-workers is MEANT to work, and they have yet to
speak up about it here.
At this point, we can only all agree that there is a problem to be fixed here.

My concern with the v3/v4 patch approach is that because the
parallel-workers use a later snapshot to what is actually used in the
execution context for the statement in the parallel leader, then it is
possible for the parallel leader and parallel workers to have
different transaction visibility, and surely this cannot be correct.
For example, suppose a transaction that deletes a row, completes in
the window between these two snapshots.
Couldn't the row be visible to the parallel workers but not to the
parallel leader?
My guess is that currently there are not enough
concurrent-transactions tests to expose such a problem, and the window
here is fairly small.

So we can fiddle xmin values to avoid the immediate Assert issue here,
but it's not addressing potential xmax-related issues.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: [HACKERS] Preserving param location

2021-07-14 Thread Julien Rouhaud
On Wed, Jul 14, 2021 at 02:02:18PM -0400, Tom Lane wrote:
> 
> Hm.  I guess this point (i.e. that the Param substitution should result
> in the identical plan tree as writing a literal constant would have)
> has some force.  I still feel like your application is pretty shaky,
> but I don't really have any ideas about feasible ways to make it better.

I totally agree that it's quite shaky, but I also don't have a better idea on
how to improve it.  In the meantime it has already saved my former colleagues and me
(and, I'm assuming, other people) many hours of boring and tedious work.

> Will push the patch.

Thanks a lot!




Re: Remove redundant Assert(PgArchPID == 0); in PostmasterStateMachine

2021-07-14 Thread Michael Paquier
On Wed, Jul 14, 2021 at 11:38:59PM +0530, Bharath Rupireddy wrote:
> It looks like the commit d75288fb [1] added an unnecessary
> Assert(PgArchPID == 0); in PostmasterStateMachine as the if block code
> gets hit only when PgArchPID == 0. PSA small patch.

Agreed that there is no need to keep that around.  Will fix.
--
Michael




Using a stock openssl BIO

2021-07-14 Thread Andres Freund
Hi,

Thomas^WA bad person recently nerdsniped me (with the help of an accidental use
of an SSL connection in a benchmark leading to poor results) into checking what
would be needed to benefit from SSL/TLS hardware acceleration (available with
suitable hardware, OS support (linux and freebsd) and openssl 3).  One problem
turns out to be the custom BIO we use.

Which made me look at why we use those.

In the backend the first reason is:

 * Private substitute BIO: this does the sending and receiving using send() and
 * recv() instead. This is so that we can enable and disable interrupts
 * just while calling recv(). We cannot have interrupts occurring while
 * the bulk of OpenSSL runs, because it uses malloc() and possibly other
 * non-reentrant libc facilities.

I think this part has been obsolete for a while now (primarily [1]). These days
we always operate the backend sockets in nonblocking mode, and handle blocking
at a higher level. Which is where we then can handle interrupts etc correctly
(which we couldn't really before, because it still wasn't ok to jump out of
openssl).

The second part is
 * We also need to call send() and recv()
 * directly so it gets passed through the socket/signals layer on Win32.

And the not stated need to set/unset pgwin32_noblock around the recv/send
calls.

I don't think the signal handling stuff is still needed with nonblocking
sockets? It seems we could just ensure that there's a pgwin32_poll_signals()
somewhere higher up in secure_read/write()? E.g. in
ProcessClientReadInterrupt()/ProcessClientWriteInterrupt() or with an explicit
call.

And the pgwin32_noblock handling could just be done outside the 
SSL_read/write().



On the client side it looks like things would be a bit harder. The main problem
seems to be dealing with SIGPIPE. We could possibly deal with that by moving
the handling of that a layer up. That's a thick nest of ugly stuff :(.


A problem shared by FE & BE openssl is that openssl internally uses
BIO_set_data() inside the BIO routines we reuse. Oops. I hacked around that
using BIO_set_callback_arg()/BIO_get_callback_arg(), but that's not
particularly pretty either, although it probably is more reliable, since
callbacks are intended to be set separately from the BIO implementation.
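
For illustration, a rough sketch of that workaround (assuming OpenSSL 1.1 or
later; "AppState" and the function names are placeholders, not the actual
patch): keep the application pointer in the callback-arg slot, which OpenSSL
reserves for applications, and let the custom read routine delegate to the
stock socket BIO method.

#include <openssl/bio.h>

typedef struct AppState AppState;   /* placeholder for per-connection state */

static void
bio_attach_app_state(BIO *bio, AppState *state)
{
    BIO_set_callback_arg(bio, (char *) state);
}

static AppState *
bio_get_app_state(BIO *bio)
{
    return (AppState *) BIO_get_callback_arg(bio);
}

static int
my_sock_read(BIO *bio, char *buf, int len)
{
    AppState   *state = bio_get_app_state(bio);

    /* ... adjust signal/nonblocking state via "state" here ... */
    (void) state;

    /* delegate the real I/O to the stock socket BIO implementation */
    return BIO_meth_get_read(BIO_s_socket())(bio, buf, len);
}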

A better approach might be to move the code using per-bio custom data from
pqsecure_raw_read() up to pqsecure_read() etc.


If we wanted to be able to use TLS acceleration while continuing to use our
custom socket routines we'd have to copy a good bit more functionality from
openssl into our BIO implementations:(.


FWIW, I don't think hardware tls acceleration is a particularly crucial thing
for now. Outside of backups it's rare to have the symmetric encryption part of
tls be the problem these days, thanks to the AES etc. functions in most of the
common CPUs.


I don't plan to work on this, but Thomas encouraged me to mention this on the
list when I mentioned it to him.

Greetings,

Andres Freund


[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=387da18874afa17156ee3af63766f17efb53c4b9




Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 10:55 PM Euler Taveira  wrote:
>
> On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
>
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps
> we could ignore this for subscriptions without DELETE.
>
> ... and UPDATE. It seems we have a consensus to use old row in the row filter
> for UPDATEs. I think you meant publication.
>

If I read correctly people are suggesting to use a new row for updates
but I still suggest completing the analysis (or at least spend some
more time) Tomas and I requested in the few emails above and then
conclude on this point.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 8:43 PM Alvaro Herrera  wrote:
>
> On 2021-Jul-14, Tomas Vondra wrote:
>
> > The other question is when to check/enforce this. I guess we'll have to do
> > that during decoding, not just when the publication is being created,
> > because the user can do ALTER TABLE later.
>
> ... if you're saying the user can change the replica identity after we
> have some publications with filters defined, then I think we should
> verify during ALTER TABLE and not allow the change if there's a
> publication that requires it.  I mean, during decoding we should be able
> to simply assume that the tuple is correct for what we need at that
> point.
>

+1.

-- 
With Regards,
Amit Kapila.




Re: automatically generating node support functions

2021-07-14 Thread Andres Freund
Hi,

On 2021-07-14 17:42:10 -0400, Tom Lane wrote:
> I think the main reason that the previous patch went nowhere was general
> resistance to making developers install something as complicated as
> libclang --- that could be a big lift on non-mainstream platforms.

I'm still not particularly convinced it's and issue - I was suggesting
we commit the resulting metadata, so libclang would only be needed when
modifying node types. And even in case one needs to desperately modify
node types on a system without access to libclang, for an occasionally
small change one could just modify the committed metadata structs
manually.


> So IMO it's a feature not a bug that Peter's approach just uses a perl
> script.  OTOH, the downstream aspects of your patch did seem appealing.
> So I'd like to see a merger of the two approaches, using perl for the
> data extraction and then something like what you'd done.  Maybe that's
> the same thing you're saying.

Yes, that's what I was trying to say. I'm still doubtful it's a great
idea to go further down the "weird subset of C parsed by regexes" road,
but I can live with it.  If Peter could generate something roughly like
the metadata I emitted, I'd rebase my node functions ontop of that.


> I also see Peter's point that committing what he has here might be
> a reasonable first step on that road.  Getting the data extraction
> right is a big chunk of the job, and what we do with it afterward
> could be improved later.

To me that seems likely to just cause churn without saving much
effort. The needed information isn't really the same between generating
the node functions as text and collecting the metadata for "generic node
functions", and none of the output is the same.

Greetings,

Andres Freund




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread David Rowley
On Thu, 15 Jul 2021 at 12:30, Ranier Vilela  wrote:
>
> On Wed, 14 Jul 2021 at 21:21, David Rowley  wrote:
>> But, in v8 there is no additional branch, so no branch to mispredict.
>> I don't really see how your explanation fits.
>
> In v8 the branch occurs at :
> + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)

You do know that branch is in a function that's only executed once
during executor initialization, right?

David




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread Ranier Vilela
On Wed, 14 Jul 2021 at 21:21, David Rowley  wrote:

> On Thu, 15 Jul 2021 at 12:10, Ranier Vilela  wrote:
> >
> > On Wed, 14 Jul 2021 at 20:43, David Rowley  wrote:
> >>
> >> On Thu, 15 Jul 2021 at 05:55, Ranier Vilela 
> wrote:
> >> > I repeated (3 times) the benchmark with v8 here,
> >> > and the results were not good.
> >>
> >> Do you have any good theories on why the additional branching that's
> >> done in v7b vs v8 might cause it to run faster?
> >
> >
> > Branch prediction works best with the *more* probable path;
> > otherwise a penalty occurs and the CPU must discard the speculated work.
>
> But, in v8 there is no additional branch, so no branch to mispredict.
> I don't really see how your explanation fits.
>
In v8 the branch occurs at :
+ if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)

datumSort is tested first.

CPU time is a more expensive resource.
Two branches are always involved; if the predicted path is the right one, we win,
otherwise a time penalty occurs.


> It seems much more likely to me that the results were just noisy.  It
> would be good to see if you can recreate them consistently.
>
I do.
Can you please share results with v7b?

regards,
Ranier Vilela


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread David Rowley
On Thu, 15 Jul 2021 at 12:10, Ranier Vilela  wrote:
>
> On Wed, 14 Jul 2021 at 20:43, David Rowley  wrote:
>>
>> On Thu, 15 Jul 2021 at 05:55, Ranier Vilela  wrote:
>> > I repeated (3 times) the benchmark with v8 here,
>> > and the results were not good.
>>
>> Do you have any good theories on why the additional branching that's
>> done in v7b vs v8 might cause it to run faster?
>
>
> Branch prediction works best with the *more* probable path;
> otherwise a penalty occurs and the CPU must discard the speculated work.

But, in v8 there is no additional branch, so no branch to mispredict.
I don't really see how your explanation fits.

It seems much more likely to me that the results were just noisy.  It
would be good to see if you can recreate them consistently.

David




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread Ranier Vilela
On Wed, 14 Jul 2021 at 20:43, David Rowley  wrote:

> On Thu, 15 Jul 2021 at 05:55, Ranier Vilela  wrote:
> > I repeated (3 times) the benchmark with v8 here,
> > and the results were not good.
>
> Do you have any good theories on why the additional branching that's
> done in v7b vs v8 might cause it to run faster?


Branch prediction works best with the *more* probable path;
otherwise a penalty occurs and the CPU must discard the speculated work.

In this case it seems to me that, most of the time, tuple sort is the path taken.
So if *datumSort* is tested first and the *prediction* fails,
the CPU has more work to do to unwind the wrong path.

To help the branch predictor, test the more probable case first, anywhere:
if, switch, etc.

Another gain is the local variable tupleSort, which is obviously faster
than going through node.

regards,
Ranier Vilela


Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2021-07-14 Thread Alvaro Herrera
Looking over the tests added by your (Justin's) patch, there was a
problem checking for non-existence of the CREATE TRIGGER line: it
doesn't work to use "like => {}" (empty), you need to use
"unlike => { everything }".  So your test was not catching the fact that
we were, in fact, emitting the undesirable line.  I have fixed that
here, and verified that the tests are doing what we wanted them to do.

I also verified the new test in 0001.  I was about to add a test for the
legacy inheritance case, but I eventually realized that it was already
being tested by the lines I added in commit bbb927b4db9b.


I have one vote for backpatching 0001 to pg11.  Any others?  To be
clear: the issue is that if a partitioned table has a disabled trigger,
and a partition is created, the trigger is created as enabled in the
partition.  With this patch, the trigger would be created as disabled.
Do people think that that behavior change would be unwelcome?

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)
>From 00708a471aabc3687862bf7151bea75c821c59ee Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 14 Jul 2021 16:27:40 -0400
Subject: [PATCH v4 1/2] Preserve firing-on state when cloning row triggers to
 partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When triggers are cloned from partitioned tables to their partitions,
the 'tgenabled' flag (origin/replica/always/disable) was not propagated.
Make it so that the flag on the trigger on partition is initially set to
the same value as on the partitioned table.

Add a test case to verify the behavior.

Backpatch to 14.  The original behavior, which appeared in pg11 with
commit 86f575948c77 doesn't really make much sense, but it seems risky
to change it on established branches.

Author: Álvaro Herrera 
Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20200930223450.ga14...@telsasoft.com
---
 src/backend/commands/tablecmds.c   |  4 +-
 src/backend/commands/trigger.c | 30 +++---
 src/include/commands/trigger.h |  5 +++
 src/test/regress/expected/triggers.out | 56 ++
 src/test/regress/sql/triggers.sql  | 32 +++
 5 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96375814a8..dd2aefe1f3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17736,10 +17736,10 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 		trigStmt->initdeferred = trigForm->tginitdeferred;
 		trigStmt->constrrel = NULL; /* passed separately */
 
-		CreateTrigger(trigStmt, NULL, RelationGetRelid(partition),
+		CreateTriggerFiringOn(trigStmt, NULL, RelationGetRelid(partition),
 	  trigForm->tgconstrrelid, InvalidOid, InvalidOid,
 	  trigForm->tgfoid, trigForm->oid, qual,
-	  false, true);
+	  false, true, trigForm->tgenabled);
 
 		MemoryContextSwitchTo(oldcxt);
 		MemoryContextReset(perTupCxt);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 952c8d582a..6d4b7ee92a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -151,6 +151,24 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			  Oid relOid, Oid refRelOid, Oid constraintOid, Oid indexOid,
 			  Oid funcoid, Oid parentTriggerOid, Node *whenClause,
 			  bool isInternal, bool in_partition)
+{
+	return
+		CreateTriggerFiringOn(stmt, queryString, relOid, refRelOid,
+			  constraintOid, indexOid, funcoid,
+			  parentTriggerOid, whenClause, isInternal,
+			  in_partition, TRIGGER_FIRES_ON_ORIGIN);
+}
+
+/*
+ * Like the above; additionally the firing condition
+ * (always/origin/replica/disabled) can be specified.
+ */
+ObjectAddress
+CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
+	  Oid relOid, Oid refRelOid, Oid constraintOid,
+	  Oid indexOid, Oid funcoid, Oid parentTriggerOid,
+	  Node *whenClause, bool isInternal, bool in_partition,
+	  char trigger_fires_when)
 {
 	int16		tgtype;
 	int			ncolumns;
@@ -849,7 +867,7 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			 CStringGetDatum(trigname));
 	values[Anum_pg_trigger_tgfoid - 1] = ObjectIdGetDatum(funcoid);
 	values[Anum_pg_trigger_tgtype - 1] = Int16GetDatum(tgtype);
-	values[Anum_pg_trigger_tgenabled - 1] = CharGetDatum(TRIGGER_FIRES_ON_ORIGIN);
+	values[Anum_pg_trigger_tgenabled - 1] = trigger_fires_when;
 	values[Anum_pg_trigger_tgisinternal - 1] = BoolGetDatum(isInternal || in_partition);
 	values[Anum_pg_trigger_tgconstrrelid - 1] = ObjectIdGetDatum(constrrelid);
 	values[Anum_pg_trigger_tgconstrindid - 1] = ObjectIdGetDatum(indexOid);
@@ -1196,11 +1214,11 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 	

Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread David Rowley
On Thu, 15 Jul 2021 at 05:55, Ranier Vilela  wrote:
> I repeated (3 times) the benchmark with v8 here,
> and the results were not good.

Do you have any good theories on why the additional branching that's
done in v7b vs v8 might cause it to run faster?

David




Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread John Naylor
On Wed, Jul 14, 2021 at 6:14 AM David Rowley  wrote:

> It would be good to get a 2nd opinion about this idea.  Also, more
> benchmark results with v6 and v8 would be good too.

I tested this on an older Xeon, gcc 8.4 (here median of each test, full
results attached):

test HEAD v6 v8

Test1 588 1007 998
Test2 198 202 197
Test3 374 516 512
Test4 172 165 166
Test5 255 279 283
Test6 227 251 251
Test7 145 147 146
Test8 474 783 770

Test4 could be a regression, but 2 and 7 look fine here.

--
John Naylor
EDB: http://www.enterprisedb.com
HEAD

Test1
tps = 576.619577 (without initial connection time)
tps = 588.508521 (without initial connection time)
tps = 591.358071 (without initial connection time)
Test2
tps = 193.739697 (without initial connection time)
tps = 198.485550 (without initial connection time)
tps = 212.013611 (without initial connection time)
Test3
tps = 374.865428 (without initial connection time)
tps = 365.930271 (without initial connection time)
tps = 373.662057 (without initial connection time)
Test4
tps = 170.663828 (without initial connection time)
tps = 174.067103 (without initial connection time)
tps = 172.869990 (without initial connection time)
Test5
tps = 255.232210 (without initial connection time)
tps = 254.580880 (without initial connection time)
tps = 260.868383 (without initial connection time)
Test6
tps = 227.235812 (without initial connection time)
tps = 227.012394 (without initial connection time)
tps = 232.099234 (without initial connection time)
Test7
tps = 144.924898 (without initial connection time)
tps = 141.170317 (without initial connection time)
tps = 152.338359 (without initial connection time)
Test8
tps = 473.734920 (without initial connection time)
tps = 485.953163 (without initial connection time)
tps = 474.231567 (without initial connection time)

v6

Test1
tps = 985.531367 (without initial connection time)
tps = 1006.531956 (without initial connection time)
tps = 1006.928283 (without initial connection time)
Test2
tps = 206.325749 (without initial connection time)
tps = 199.169152 (without initial connection time)
tps = 202.264704 (without initial connection time)
Test3
tps = 515.819017 (without initial connection time)
tps = 514.739251 (without initial connection time)
tps = 526.872129 (without initial connection time)
Test4
tps = 172.372655 (without initial connection time)
tps = 161.658650 (without initial connection time)
tps = 165.141814 (without initial connection time)
Test5
tps = 278.929619 (without initial connection time)
tps = 276.071475 (without initial connection time)
tps = 282.395453 (without initial connection time)
Test6
tps = 251.143322 (without initial connection time)
tps = 248.366339 (without initial connection time)
tps = 254.129250 (without initial connection time)
Test7
tps = 144.616732 (without initial connection time)
tps = 154.306604 (without initial connection time)
tps = 146.636703 (without initial connection time)
Test8
tps = 782.767588 (without initial connection time)
tps = 778.912500 (without initial connection time)
tps = 796.911010 (without initial connection time)

v8

Test1
tps = 976.674468 (without initial connection time)
tps = 997.846584 (without initial connection time)
tps = 997.877722 (without initial connection time)
Test2
tps = 199.376800 (without initial connection time)
tps = 197.235818 (without initial connection time)
tps = 201.615343 (without initial connection time)
Test3
tps = 461.241555 (without initial connection time)
tps = 511.929037 (without initial connection time)
tps = 516.238460 (without initial connection time)
Test4
tps = 166.201945 (without initial connection time)
tps = 166.315151 (without initial connection time)
tps = 169.887666 (without initial connection time)
Test5
tps = 279.323293 (without initial connection time)
tps = 285.120557 (without initial connection time)
tps = 282.674135 (without initial connection time)
Test6
tps = 250.446057 (without initial connection time)
tps = 256.509619 (without initial connection time)
tps = 250.906259 (without initial connection time)
Test7
tps = 155.097565 (without initial connection time)
tps = 145.446077 (without initial connection time)
tps = 145.750081 (without initial connection time)
Test8
tps = 769.952946 (without initial connection time)
tps = 767.403431 (without initial connection time)
tps = 787.201077 (without initial connection time)


Re: Add missing function abs (interval)

2021-07-14 Thread Tom Lane
Isaac Morland  writes:
> I've attached a patch for this. Turns out there was a comment in the source
> explaining that there is no interval_abs because it's not clear what to
> return; but I think it's clear that if i is an interval the larger of i and
> -i should be considered to be the absolute value, the same as would be done
> for any type; essentially, if the type is orderable and has a meaningful
> definition of unary minus, the definition of abs follows from those.

The problem with that blithe summary is the hidden assumption that
values that compare "equal" aren't interesting to distinguish.  As
the discussion back in 2009 pointed out, this doesn't help you decide
what to do with cases like '1 month -30 days'::interval.  Either answer
you might choose seems pretty arbitrary --- and we've got more than
enough arbitrariness in type interval :-(

For similar reasons, I find myself mighty suspicious of your proposal
to change how max(interval) and min(interval) work.  That cannot make
things any better overall --- it will only move the undesirable results
from one set of cases to some other set.  Moreover your argument for
it seems based on a false assumption, that the input values can be
expected to arrive in a particular order.  So I'm inclined to think
that backwards compatibility is sufficient reason to leave that alone.

If we wanted to make some actual progress here, maybe we should
reconsider the definition of equality/ordering for interval, with
an eye to not allowing two intervals to be considered "equal" unless
they really are identical.  That is, for two values that are currently
reported as "equal", apply some more-or-less-arbitrary tiebreak rule,
say by sorting on the individual fields left-to-right.  This would be
very similar to type text's rule that only bitwise-equal strings are
really equal, even if strcoll() claims otherwise.  I am not sure how
feasible this idea is from a compatibility standpoint, but it's
something to think about.
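
For illustration, a minimal sketch of such a tiebreak (assuming the backend's
Interval struct with month, day and time fields; this is only an illustration,
not a concrete proposal for interval_cmp): when two values compare equal under
the current rules, fall back to comparing the stored fields left-to-right.

#include "postgres.h"
#include "datatype/timestamp.h"

static int
interval_cmp_tiebreak(const Interval *a, const Interval *b)
{
    /* compare the stored fields individually: months, then days, then time */
    if (a->month != b->month)
        return (a->month < b->month) ? -1 : 1;
    if (a->day != b->day)
        return (a->day < b->day) ? -1 : 1;
    if (a->time != b->time)
        return (a->time < b->time) ? -1 : 1;
    return 0;
}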

regards, tom lane




Re: postgres_fdw - make cached connection functions tests meaningful

2021-07-14 Thread Tom Lane
Bharath Rupireddy  writes:
> While working on [1], I got to know that there is a new GUC
> debug_invalidate_system_caches_always that has been introduced in v14.
> It can be used to switch off cache invalidation in
> CLOBBER_CACHE_ALWAYS builds which makes cache sensitive tests stable.
> Using this GUC, it is quite possible to make cached connection
> management function tests more meaningful by returning original
> values(true/false, all the output columns) instead of SELECT 1.

Note that this needs an update in the wake of d68a00391.

More generally, though, I am not sure that I believe the premise of
this patch.  AFAICS it's assuming that forcing debug_discard_caches
off guarantees zero cache flushes, which it does not.  (If it could,
we wouldn't need the whole thing; the point of that variable is to
deterministically force flushes which would otherwise be
nondeterministic, not nonexistent.)  Even in a contrib test that
seemingly has nothing else running, background activity such as
autovacuum could result in surprises.  So I fear that what you have
got here is a patch that will work 99% of the time; which is not
good enough for the buildfarm.

regards, tom lane




Re: automatically generating node support functions

2021-07-14 Thread Tom Lane
Andres Freund  writes:
> On 2021-06-08 19:45:58 +0200, Peter Eisentraut wrote:
>> On 08.06.21 15:40, David Rowley wrote:
>>> It's almost 2 years ago now, but I'm wondering if you saw what Andres
>>> proposed in [1]?

>> That project was technologically impressive, but it seemed to have
>> significant hurdles to overcome before it can be useful.  My proposal is
>> usable and useful today.  And it doesn't prevent anyone from working on a
>> more sophisticated solution.

> I think it's short-sighted to further and further go down the path of
> parsing "kind of C" without just using a proper C parser. But leaving
> that aside, a big part of the promise of the approach in that thread
> isn't actually tied to the specific way the type information is
> collected: The perl script could output something like the "node type
> metadata" I generated in that patchset, and then we don't need the large
> amount of generated code and can much more economically add additional
> operations handling node types.

I think the main reason that the previous patch went nowhere was general
resistance to making developers install something as complicated as
libclang --- that could be a big lift on non-mainstream platforms.
So IMO it's a feature not a bug that Peter's approach just uses a perl
script.  OTOH, the downstream aspects of your patch did seem appealing.
So I'd like to see a merger of the two approaches, using perl for the
data extraction and then something like what you'd done.  Maybe that's
the same thing you're saying.

I also see Peter's point that committing what he has here might be
a reasonable first step on that road.  Getting the data extraction
right is a big chunk of the job, and what we do with it afterward
could be improved later.

regards, tom lane




Re: Avoid repeated PQfnumber() in pg_dump

2021-07-14 Thread Daniel Gustafsson
> On 14 Jul 2021, at 10:54, houzj.f...@fujitsu.com wrote:

> Since PQfnumber() is not a cheap function, I think we'd better invoke
> PQfnumber() outside the loop, as in the attached patch.

Looks good on a quick readthrough, and I didn't see any other similar codepaths
in pg_dump on top of what you've fixed.

> After applying this change, I can see about 8% performance gain in my test 
> environment
> when dumping table definitions which have many columns.

Out of curiosity, how many columns are "many columns"?

--
Daniel Gustafsson   https://vmware.com/





Re: Quick tip on building pg with Visual Studio Build Tools 2019 + small patches

2021-07-14 Thread Andrew Dunstan


On 7/13/21 2:10 AM, Craig Ringer wrote:
>
>
> If you don't have the toolchain installed, you can install Chocolatey
> (there's a one-liner on their website) then:
>
>     choco install -y visualstudio2019buildtools
>
>     choco install -y visualstudio2019-vc++ --packageparameters "--add
> Microsoft.VisualStudio.Component.VC.140"


The first of these is probably redundant, and the second might install
more than required. Here's my recipe for what I use in testing patches
with MSVC:


choco install -y --no-progress --limit-output
visualstudio2019-workload-vctools --install-args="--add
Microsoft.VisualStudio.Component.VC.CLI.Support"


That gives you the normal command line compilers. After that these
packages are installed:


vcredist140 14.29.30037
visualstudio-installer 2.0.1
visualstudio2019-workload-vctools 1.0.1
visualstudio2019buildtools 16.10.1.0


>
> You may also want
>
>     choco install -y winflexbison
>
> (I've attached a patch that teaches pgflex.pl and
> pgbison.pl to use win_flex.exe and win_bison.exe
> if they're found, and to accept full paths for these tools in
> buildenv.pl).



A simpler alternative is just to rename the chocolatey shims. Here's a
ps1 fragment I use:


    $cbin = "c:\ProgramData\chocolatey\bin"
    Rename-Item -Path $cbin\win_bison.exe -NewName bison.exe
    Rename-Item -Path $cbin\win_flex.exe -NewName flex.exe


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Replacing pg_depend PIN entries with a fixed range check

2021-07-14 Thread John Naylor
On Wed, Jul 14, 2021 at 3:34 PM Tom Lane  wrote:

> Hm, I'm not following?  setup_depend runs in initdb, that is on the
> client side.  It can't invoke backend-internal functions any other
> way than via SQL, AFAICS.

Ah, brainfade on my part.

I was also curious about the test case where Andres fixed a regression in
the parent thread [1], and there is a noticeable improvement (lowest of 10
measurements):

HEAD: 623ms
patch: 567ms

If no one else has anything, I think this is ready for commit.

[1]
https://www.postgresql.org/message-id/20210406043521.lopeo7bbigad3n6t%40alap3.anarazel.de

--
John Naylor
EDB: http://www.enterprisedb.com


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Tom Lane
Dave Cramer  writes:
> On Wed, 14 Jul 2021 at 15:09, David G. Johnston 
> wrote:
>> "Install ... files used by the old cluster" (which must be binary
>> compatible with the new cluster as noted elsewhere on that page) supports
>> the claim that it is the old cluster's version that is going to result.
>> But I agree that saying "because these will be upgraded from the old
>> cluster" is poorly worded and should be fixed to be more precise here.
>> 
>> Something like, "... because the installed extensions will be copied from
>> the old cluster during the upgrade."

> This is still rather opaque. Without intimate knowledge of what changes
> have occurred in each extension I have installed, how would I know what I
> have to fix after the upgrade?

That's exactly why we don't force upgrades of extensions.  It is on the
user's head to upgrade their extensions from time to time, but we don't
make them do it as part of pg_upgrade.  (There are also some
implementation-level reasons to avoid this, IIRC, but the overall
choice is intentional.)

I agree this documentation could be worded better.  Another idea
is that possibly pg_upgrade could produce a list of extensions
that are not the latest version, so people know what's left to
be addressed.

regards, tom lane




Re: Remove page-read callback from XLogReaderState.

2021-07-14 Thread Ibrar Ahmed
On Wed, Jun 30, 2021 at 12:54 PM Kyotaro Horiguchi 
wrote:

> At Fri, 09 Apr 2021 09:36:59 +0900 (JST), Kyotaro Horiguchi <
> horikyota@gmail.com> wrote in
> > I'm surprised to see this pushed this soon.  Thanks for pushing this!
>
> Then this has been reverted. I'm not sure how to check for the
> possible defect happens on that platform, but, anyways I reverted the
> CF item to "Needs Review" then moved to the next CF.
>
> Maybe I will rebase it soon.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
> Yes, rebase is required, therefore I am changing the status to "Waiting On
Author"
http://cfbot.cputube.org/patch_33_2113.log


-- 
Ibrar Ahmed


Re: free C string

2021-07-14 Thread Tom Lane
Zhihong Yu  writes:
> On Wed, Jul 14, 2021 at 10:17 AM Tom Lane  wrote:
>> There's really very little point in adding such code.  Our memory
>> context mechanisms take care of minor leaks like this, with less
>> code and fewer cycles expended than explicit pfree calls require.
>> It's worth trying to clean up explicitly in code that might get
>> executed many times in a row, or might be allocating very big
>> temporary chunks; but fmgr_internal_validator hardly falls in
>> that category.

> How about this occurrence which is in a loop ?

I'd say the burden is on you to prove that it's worth worrying
about, not vice versa.  If we added pfree everywhere we possibly
could, the code would be larger and slower, not faster.
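
For illustration, a minimal sketch of the memory-context point quoted above
(the context name and workload are made up, assuming backend code): anything
allocated while a short-lived context is current goes away when that context
is deleted, so an explicit pfree() of each small allocation buys nothing.

#include "postgres.h"
#include "utils/memutils.h"

static void
transient_work(void)
{
    MemoryContext tmpcxt;
    MemoryContext oldcxt;
    char         *s;

    tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
                                   "transient work",
                                   ALLOCSET_DEFAULT_SIZES);
    oldcxt = MemoryContextSwitchTo(tmpcxt);

    s = pstrdup("temporary C string");  /* no matching pfree needed */
    (void) s;                           /* ... use s ... */

    MemoryContextSwitchTo(oldcxt);
    MemoryContextDelete(tmpcxt);        /* frees s and everything else in it */
}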

regards, tom lane




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-07-14 Thread Ibrar Ahmed
On Wed, Jul 14, 2021 at 6:51 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

>
> On 05.01.21 22:40, Paul Martinez wrote:
> > I've created a patch to better support referential integrity constraints
> when
> > using composite primary and foreign keys. This patch allows creating a
> foreign
> > key using the syntax:
> >
> >FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL
> (fk_id)
> >
> > which means that only the fk_id column will be set to NULL when the
> referenced
> > row is deleted, rather than both the tenant_id and fk_id columns.
>
> I think this is an interesting feature with a legitimate use case.
>
> I'm wondering a bit about what the ON UPDATE side of this is supposed to
> mean.  Consider your example:
>
> > CREATE TABLE tenants (id serial PRIMARY KEY);
> > CREATE TABLE users (
> >tenant_id int REFERENCES tenants ON DELETE CASCADE,
> >id serial,
> >PRIMARY KEY (tenant_id, id),
> > );
> > CREATE TABLE posts (
> >  tenant_id int REFERENCES tenants ON DELETE CASCADE,
> >  id serial,
> >  author_id int,
> >  PRIMARY KEY (tenant_id, id),
> >  FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET
> NULL
> > );
> >
> > INSERT INTO tenants VALUES (1);
> > INSERT INTO users VALUES (1, 101);
> > INSERT INTO posts VALUES (1, 201, 101);
> > DELETE FROM users WHERE id = 101;
> > ERROR:  null value in column "tenant_id" violates not-null constraint
> > DETAIL:  Failing row contains (null, 201, null).
>
> Consider what should happen when you update users.id.  Per SQL standard,
> for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the
> referencing column that corresponds to the referenced column actually
> updated, not all of them.  PostgreSQL doesn't do this, but if it did,
> then this would work just fine.
>
> Your feature requires specifying a fixed column or columns to update, so
> it cannot react differently to what column actually updated.  In fact,
> you might want different referential actions depending on what columns
> are updated, like what you can do with general triggers.
>
> So, unless I'm missing an angle here, I would suggest leaving out the ON
> UPDATE variant of this feature.
>
>
>
Patch does not apply on head, I am marking the status "Waiting on author"
http://cfbot.cputube.org/patch_33_2932.log

-- 
Ibrar Ahmed


Re: Replacing pg_depend PIN entries with a fixed range check

2021-07-14 Thread Tom Lane
John Naylor  writes:
> On Thu, May 27, 2021 at 6:53 PM Tom Lane  wrote:
>> Attached is a rebase over a4390abec.

> Looks good to me overall, I just had a couple questions/comments:

Thanks for looking!

> isObjectPinned and isSharedObjectPinned are now thin wrappers around
> IsPinnedObject. Is keeping those functions a matter of future-proofing in
> case something needs to be handled differently someday, or reducing
> unnecessary code churn?

Yeah, it was mostly a matter of reducing code churn.  We could probably
drop isSharedObjectPinned altogether, but isObjectPinned seems to have
some notational value in providing an API that takes an ObjectAddress.

> setup_depend now doesn't really need to execute any SQL (unless third-party
> forks have extra steps here?), and could be replaced with a direct call
> to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
> that would allow shortening this comment:

Hm, I'm not following?  setup_depend runs in initdb, that is on the
client side.  It can't invoke backend-internal functions any other
way than via SQL, AFAICS.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 12:21 PM Dave Cramer  wrote:

>
> On Wed, 14 Jul 2021 at 15:09, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>>
>> Something like, "... because the installed extensions will be copied from
>> the old cluster during the upgrade."
>>
>
> This is still rather opaque. Without intimate knowledge of what changes
> have occurred in each extension I have installed, how would I know what I
> have to fix after the upgrade?
>

The point of this behavior is that you don't have to fix anything after an
upgrade - so long as your current extension version works on the new
cluster.  If you are upgrading in such a way that the current extension and
new cluster are not compatible you need to not do that.  Upgrade instead to
a lesser version where they are compatible.  Then upgrade your extension to
its newer version, changing any required user code that such an upgrade
requires, then upgrade the server again.
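To make the sequence concrete, a minimal sketch of the post-upgrade step; the
extension and version numbers come from the report quoted elsewhere in this
thread, and ALTER EXTENSION ... UPDATE is standard syntax:

-- run in each database after pg_upgrade; the extension is still at its old version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';
ALTER EXTENSION pg_stat_statements UPDATE;
-- or pin an explicit target version:
-- ALTER EXTENSION pg_stat_statements UPDATE TO '1.8';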

David J.


Re: [PATCH] Hooks at XactCommand level

2021-07-14 Thread Tom Lane
Gilles Darold  writes:
> I have renamed the patch and the title of this proposal registered in 
> the commitfest "Xact/SubXact event callback at command start" to reflect 
> the last changes that do not include new hooks anymore.

Hmm, it doesn't seem like this has addressed my concern at all.
The callbacks are still effectively defined as "run during
start_xact_command", so they're not any less squishy semantically
than they were before.  The point of my criticism was that you
should move the call site to someplace that's more organically
connected to execution of commands.

Another thing I'm not too pleased with in this formulation is that it's
very unclear what the distinction is between XACT_EVENT_COMMAND_START
and SUBXACT_EVENT_COMMAND_START.  AFAICS, *every* potential use-case
for this would have to hook into both callback chains, and most likely
would treat the two events alike.  Plus, as you note, the goalposts
have suddenly been moved for the amount of overhead it's okay to have
in an XactCallback or SubXactCallback function.  So that might cause
problems for unrelated code.  It's probably better to not try to
re-use that infrastructure.



> The objective of this new callback is to be able to call user-defined 
> code before any new statement is executed. For example it can call a 
> rollback to savepoint if there was an error in the previous transaction 
> statement, which allows implementing Rollback at Statement Level on the
> server side using a PostgreSQL extension; see [1].

Urgh.  Isn't this re-making the same mistake we made years ago, namely
trying to implement autocommit on the server side?  I fear this will
be a disaster even larger than that was, because if it's an extension
then pretty much no applications will be prepared for the new semantics.
I strongly urge you to read the discussions that led up to f85f43dfb,
and to search the commit history before that for mentions of
"autocommit", to see just how extensive the mess was.

I spent a little time trying to locate said discussions; it's harder
than it should be because we didn't have the practice of citing email
threads in the commit log at the time.  I did find

https://www.postgresql.org/message-id/flat/Pine.LNX.4.44.0303172059170.1975-10%40peter.localdomain#7ae26ed5c1bfbf9b22a420dfd8b8e69f

which seems to have been the proximate decision, and here are
a few threads talking about all the messes that were created
for JDBC etc:

https://www.postgresql.org/message-id/flat/3D793A93.703%40xythos.com#4a2e2d9bdf2967906a6e0a75815d6636
https://www.postgresql.org/message-id/flat/3383060E-272E-11D7-BA14-000502E740BA%40wellsgaming.com
https://www.postgresql.org/message-id/flat/Law14-F37PIje6n0ssr0bc1%40hotmail.com

Basically, changing transactional semantics underneath clients is
a horrid idea.  Having such behavior in an extension rather than
the core doesn't make it less horrid.  If we'd designed it to act
that way from day one, maybe it'd have been fine.  But as things
stand, we are quite locked into the position that this has to be
managed on the client side.
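For illustration, the client-side pattern in question (what psql's \set
ON_ERROR_ROLLBACK and several drivers already implement) looks roughly like
this; the table name is made up:

BEGIN;
SAVEPOINT s;
INSERT INTO t VALUES (1);      -- suppose this statement fails
ROLLBACK TO SAVEPOINT s;       -- the client undoes just that statement
INSERT INTO t VALUES (2);      -- and the transaction continues
COMMIT;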



That point doesn't necessarily invalidate the value of having
some sort of hook in this general area.  But I would kind of like
to see another use-case, because I don't believe in this one.

regards, tom lane




Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
On Wed, 14 Jul 2021 at 15:09, David G. Johnston 
wrote:

> On Wed, Jul 14, 2021 at 11:59 AM Dave Cramer  wrote:
>
>>
>>
>> On Wed, 14 Jul 2021 at 14:47, David G. Johnston <
>> david.g.johns...@gmail.com> wrote:
>>
>>> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>>>


 Notice the upgraded version is 1.5 and the new version is 1.8

 I would think somewhere in the upgrade of the schema there should have
 been a create extension pg_stat_statements ?

>>>
>>> That would be a faulty assumption.  Modules do not get upgraded during a
>>> server version upgrade.  This is a good thing, IMO.
>>>
>>
>> This is from the documentation of pg_upgrade
>>
>> Install any custom shared object files (or DLLs) used by the old cluster
>> into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
>> some other source. Do not install the schema definitions, e.g., CREATE
>> EXTENSION pgcrypto, because these will be upgraded from the old cluster.
>> Also, any custom full text search files (dictionary, synonym, thesaurus,
>> stop words) must also be copied to the new cluster.
>>
>> If indeed modules do not get upgraded then the above is confusing at
>> best, and misleading at worst.
>>
>>
> "Install ... files used by the old cluster" (which must be binary
> compatible with the new cluster as noted elsewhere on that page) supports
> the claim that it is the old cluster's version that is going to result.
> But I agree that saying "because these will be upgraded from the old
> cluster" is poorly worded and should be fixed to be more precise here.
>
> Something like, "... because the installed extensions will be copied from
> the old cluster during the upgrade."
>

This is still rather opaque. Without intimate knowledge of what changes
have occurred in each extension I have installed, how would I know what I
have to fix after the upgrade?

Seems to me extensions should either store some information in pg_extension
to indicate compatibility, or they should have some sort of upgrade script
which pg_upgrade would call to fix any problems (yes, I realize this is
hand waving at the moment)

In this example the older version of pg_stat_statements works fine; it only
fails when I do a dump/restore of the new database, and then the error is
rather obtuse. IIRC pg_dump wanted to revoke all from public on the
function pg_stat_statements_reset() and that could not be found, yet the
function is there. I don't believe we should be surprising our users like
this.

Dave

>
> David J.
>
>


Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

2021-07-14 Thread Ibrar Ahmed
On Wed, Jul 14, 2021 at 1:41 AM Tom Lane  wrote:

> Ibrar Ahmed  writes:
> > The patch is failing the regression, @Tom Lane  can
> you
> > please take a look at that.
>
> Seems to just need an update of the expected-file to account for test
> cases added recently.  (I take no position on whether the new results
> are desirable; some of these might be breaking the intent of the case.
> But this should quiet the cfbot anyway.)
>
> regards, tom lane
>
>
Thanks for the update.

The test case was added by commit "Add support for asynchronous execution"
"27e1f14563cf982f1f4d71e21ef24782a052" by Etsuro Fujita. He can comment
whether the new results are desirable or not.



-- 
Ibrar Ahmed


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wed, Jul 14, 2021 at 11:59 AM Dave Cramer  wrote:

>
>
> On Wed, 14 Jul 2021 at 14:47, David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>>
>>>
>>>
>>> Notice the upgraded version is 1.5 and the new version is 1.8
>>>
>>> I would think somewhere in the upgrade of the schema there should have
>>> been a create extension pg_stat_statements ?
>>>
>>
>> That would be a faulty assumption.  Modules do not get upgraded during a
>> server version upgrade.  This is a good thing, IMO.
>>
>
> This is from the documentation of pg_upgrade
>
> Install any custom shared object files (or DLLs) used by the old cluster
> into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
> some other source. Do not install the schema definitions, e.g., CREATE
> EXTENSION pgcrypto, because these will be upgraded from the old cluster.
> Also, any custom full text search files (dictionary, synonym, thesaurus,
> stop words) must also be copied to the new cluster.
>
> If indeed modules do not get upgraded then the above is confusing at best,
> and misleading at worst.
>
>
"Install ... files used by the old cluster" (which must be binary
compatible with the new cluster as noted elsewhere on that page) supports
the claim that it is the old cluster's version that is going to result.
But I agree that saying "because these will be upgraded from the old
cluster" is poorly worded and should be fixed to be more precise here.

Something like, "... because the installed extensions will be copied from
the old cluster during the upgrade."

David J.


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
On Wed, 14 Jul 2021 at 14:47, David G. Johnston 
wrote:

> On Wednesday, July 14, 2021, Dave Cramer  wrote:
>
>>
>>
>> Notice the upgraded version is 1.5 and the new version is 1.8
>>
>> I would think somewhere in the upgrade of the schema there should have
>> been a create extension pg_stat_statements ?
>>
>
> That would be a faulty assumption.  Modules do not get upgraded during a
> server version upgrade.  This is a good thing, IMO.
>

This is from the documentation of pg_upgrade

Install any custom shared object files (or DLLs) used by the old cluster
into the new cluster, e.g., pgcrypto.so, whether they are from contrib or
some other source. Do not install the schema definitions, e.g., CREATE
EXTENSION pgcrypto, because these will be upgraded from the old cluster.
Also, any custom full text search files (dictionary, synonym, thesaurus,
stop words) must also be copied to the new cluster.

If indeed modules do not get upgraded then the above is confusing at best,
and misleading at worst.

Dave


> David J.
>
>


Re: pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread David G. Johnston
On Wednesday, July 14, 2021, Dave Cramer  wrote:

>
>
> Notice the upgraded version is 1.5 and the new version is 1.8
>
> I would think somewhere in the upgrade of the schema there should have
> been a create extension pg_stat_statements ?
>

That would be a faulty assumption.  Modules do not get upgraded during a
server version upgrade.  This is a good thing, IMO.

David J.


pg_upgrade does not upgrade pg_stat_statements properly

2021-07-14 Thread Dave Cramer
Upgrading from
10.5 to 13.3 using pg_upgrade -k

The following is the result of an upgrade

select * from pg_extension ;
  oid  |      extname       | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+--------------------+----------+--------------+----------------+------------+-----------+--------------
 12910 | plpgsql            |       10 |           11 | f              | 1.0        |           |
 16403 | pg_stat_statements |       10 |         2200 | t              | 1.5        |           |
(2 rows)

test=# \df+ pg_stat_statements_reset

                                                                         List of functions
 Schema |           Name           | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security |     Access privileges      | Language |       Source code        | Description
--------+--------------------------+------------------+---------------------+------+------------+----------+-------+----------+----------------------------+----------+--------------------------+-------------
 public | pg_stat_statements_reset | void             |                     | func | volatile   | safe     | davec | invoker  | davec=X/davec             +| c        | pg_stat_statements_reset |
        |                          |                  |                     |      |            |          |       |          | pg_read_all_stats=X/davec  |          |                          |
(1 row)

And this is from creating the extension in a new db on the same instance

foo=# select * from pg_extension ;
  oid  |      extname       | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
-------+--------------------+----------+--------------+----------------+------------+-----------+--------------
 12910 | plpgsql            |       10 |           11 | f              | 1.0        |           |
 16393 | pg_stat_statements |       10 |         2200 | t              | 1.8        |           |
(2 rows)

foo=# \df+ pg_stat_statements_reset

                                                                                 List of functions
 Schema |           Name           | Result data type |                        Argument data types                         | Type | Volatility | Parallel | Owner | Security | Access privileges | Language |         Source code          | Description
--------+--------------------------+------------------+---------------------------------------------------------------------+------+------------+----------+-------+----------+-------------------+----------+------------------------------+-------------
 public | pg_stat_statements_reset | void             | userid oid DEFAULT 0, dbid oid DEFAULT 0, queryid bigint DEFAULT 0 | func | volatile   | safe     | davec | invoker  | davec=X/davec     | c        | pg_stat_statements_reset_1_7 |
(1 row)

Notice the upgraded version is 1.5 and the new version is 1.8

I would think somewhere in the upgrade of the schema there should have been
a create extension pg_stat_statements ?

Dave
Dave Cramer


Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 11:02 AM Alvaro Herrera 
wrote:

> On 2020-Oct-27, Justin Pryzby wrote:
>
> > I think either way could be ok - if you assume that the trigger was
> disabled
> > with ONLY, then it'd make sense to restore it with ONLY, but I think
> it's at
> > least as common to ALTER TABLE [*].  It might look weird to the user if
> they
> > used ALTER TABLE ONLY and the pg_dump output uses ALTER TABLE for that
> table,
> > and then again for all its children (or vice versa).  But it's fine as
> long as
> > the state is correctly restored.
> >
> > There are several issues:
> >  - fail to preserve childs' tgenabled in CREATE TABLE PARTITION OF;
> >  - fail to preserve childs' tgenabled in pg_dump;
> >  - fail to preserve childs' comments in pg_dump;
> >
> > I'm going to step away from this patch at least for a while, so I'm
> > attaching my
> > latest in case it's useful.
>
> Here's a new cut of this series.  I used your pg_dump patch, but I blank
> out the CREATE TRIGGER query before stashing the ALTER TRIGGER;
> otherwise the dump has an error at restore time (because the trigger is
> created again on the partition, but it already exists because it's been
> created for the parent).  Also, the new query has the "OR tgenabled <>"
> test only if the table is a partition; and apply this new query only in
> 11 and 12; keep 9.x-10 unchanged, because it cannot possibly match
> anything.
>
> I tested this by creating 10k tables with one trigger each (no
> partitioned tables).  Total time to dump is the same as before.  I was
> concerned that because the query now has two LEFT JOINs it would be
> slower; but it seems to be only marginally so.
>
> I'm thinking to apply my patch that changes the server behavior only to
> 14 and up.  I could be persuaded to backpatch all the way to 11, if
> anybody supports the idea.
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W  —
> https://www.EnterpriseDB.com/
> "Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
>

Hi, Alvaro:
It would be nice if this is backported to PG 11+

Thanks


Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 8:21 AM, Greg Nancarrow wrote:
> Some minor v19 patch review points you might consider for your next
> patch version:
Greg, thanks for another review. I agree with all of these changes. It will be
in the next patch.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: free C string

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 10:17 AM Tom Lane  wrote:

> Zhihong Yu  writes:
> > I was looking at fmgr_internal_validator().
> > It seems prosrc is only used internally.
> > The patch frees the C string prosrc points to, prior to returning.
>
> There's really very little point in adding such code.  Our memory
> context mechanisms take care of minor leaks like this, with less
> code and fewer cycles expended than explicit pfree calls require.
> It's worth trying to clean up explicitly in code that might get
> executed many times in a row, or might be allocating very big
> temporary chunks; but fmgr_internal_validator hardly falls in
> that category.
>
> regards, tom lane
>
Hi,
How about this occurrence which is in a loop ?

Thanks


c-str-free.patch
Description: Binary data


Re: PROXY protocol support

2021-07-14 Thread Jacob Champion
On Mon, 2021-07-12 at 18:28 +0200, Magnus Hagander wrote:
> Yeah, I have no problem being stricter than necessary, unless that
> actually causes any interop problems. It's a lot worse to not be
> strict enough..

Agreed. Haven't heard back from the HAProxy mailing list yet, so
staying strict seems reasonable in the meantime. That could always be
rolled back later.

> > I've been thinking more about your earlier comment:
> > 
> > > An interesting thing is what to do about
> > > inet_server_addr/inet_server_port. That sort of loops back up to the
> > > original question of where/how to expose the information about the
> > > proxy in general (since right now it just logs). Right now you can
> > > actually use inet_server_port() to see if the connection was proxied
> > > (as long as it was over tcp).
> > 
> > IMO these should return the "virtual" dst_addr/port, instead of
> > exposing the physical connection information to the client. That way,
> > if you intend for your proxy to be transparent, you're not advertising
> > your network internals to connected clients. It also means that clients
> > can reasonably expect to be able to reconnect to the addr:port that we
> > give them, and prevents confusion if the proxy is using an address
> > family that the client doesn't even support (e.g. the client is IPv4-
> > only but the proxy connects via IPv6).
> 
> That reasoning I think makes a lot of sense, especially with the
> comment about being able to connect back to it.
> 
> The question at that point extends to, would we also add extra
> functions to get the data on the proxy connection itself? Maybe add a
> inet_proxy_addr()/inet_proxy_port()? Better names?

What's the intended use case? I have trouble viewing those as anything
but information disclosure vectors, but I'm jaded. :)

If the goal is to give a last-ditch debugging tool to someone whose
proxy isn't behaving properly -- though I'd hope the proxy in question
has its own ways to debug its behavior -- maybe they could be locked
behind one of the pg_monitor roles, so that they're only available to
someone who could get that information anyway?
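For reference, the introspection that exists today can be exercised as below;
the proxy-specific functions mentioned above are only proposals at this point,
and with the patch the server-side functions would report the "virtual"
address and port as discussed:

SELECT inet_server_addr() AS server_addr,
       inet_server_port() AS server_port,
       inet_client_addr() AS client_addr,
       inet_client_port() AS client_port;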

> PFA a patch that fixes the above errors, and changes
> inet_server_addr()/inet_server_port(). Does not yet add anything to
> receive the actual local port in this case.

Looking good in local testing. I'm going to reread the spec with fresh
eyes and do a full review pass, but this is shaping up nicely IMO.

Something that I haven't thought about very hard yet is proxy
authentication, but I think the simple IP authentication will be enough
for a first version. For the Unix socket case, it looks like anyone
currently relying on peer auth will need to switch to a
unix_socket_group/_permissions model. For now, that sounds like a
reasonable v1 restriction, though I think not being able to set the
proxy socket's permissions separately from the "normal" one might lead
to some complications in more advanced setups.

--Jacob


Remove redundant Assert(PgArchPID == 0); in PostmasterStateMachine

2021-07-14 Thread Bharath Rupireddy
Hi,

It looks like the commit d75288fb [1] added an unnecessary
Assert(PgArchPID == 0); in PostmasterStateMachine as the if block code
gets hit only when PgArchPID == 0. PSA small patch.

[1]
commit d75288fb27b8fe0a926aaab7d75816f091ecdc27
Author: Fujii Masao 
Date:   Mon Mar 15 13:13:14 2021 +0900

Make archiver process an auxiliary process.

Regards,
Bharath Rupireddy.


v1-0001-Remove-redundant-Assert-PgArchPID-0-inPostmasterS.patch
Description: Binary data


Re: [HACKERS] Preserving param location

2021-07-14 Thread Tom Lane
Julien Rouhaud  writes:
> On Tue, Jul 13, 2021 at 07:01:24PM -0400, Tom Lane wrote:
>> So I'm not really convinced that there's a fully-baked use case
>> here, and would like more detail about how you hope to use the
>> location value.

> As I said I have no doubt that there are other cases which are currently not
> correctly handled, but that's not what I'm trying to fix.  I just want to 
> treat
> parameterized queries the same way as regular queries, thus fixing one obvious
> and frequent case (for which some user did complain), hoping that it will on
> its own already help many users.

Hm.  I guess this point (i.e. that the Param substitution should result
in the identical plan tree as writing a literal constant would have)
has some force.  I still feel like your application is pretty shaky,
but I don't really have any ideas about feasible ways to make it better.

Will push the patch.

regards, tom lane




Re: CREATE TABLE .. PARTITION OF fails to preserve tgenabled for inherited row triggers

2021-07-14 Thread Alvaro Herrera
On 2020-Oct-27, Justin Pryzby wrote:

> I think either way could be ok - if you assume that the trigger was disabled
> with ONLY, then it'd make sense to restore it with ONLY, but I think it's at
> least as common to ALTER TABLE [*].  It might look weird to the user if they
> used ALTER TABLE ONLY and the pg_dump output uses ALTER TABLE for that table,
> and then again for all its children (or vice versa).  But it's fine as long as
> the state is correctly restored.
> 
> There are several issues:
>  - fail to preserve childs' tgenabled in CREATE TABLE PARTITION OF;
>  - fail to preserve childs' tgenabled in pg_dump;
>  - fail to preserve childs' comments in pg_dump;
> 
> I'm going to step away from this patch at least for a while, so I'm attaching my
> latest in case it's useful.

Here's a new cut of this series.  I used your pg_dump patch, but I blank
out the CREATE TRIGGER query before stashing the ALTER TRIGGER;
otherwise the dump has an error at restore time (because the trigger is
created again on the partition, but it already exists because it's been
created for the parent).  Also, the new query has the "OR tgenabled <>"
test only if the table is a partition; and apply this new query only in
11 and 12; keep 9.x-10 unchanged, because it cannot possibly match
anything.

I tested this by creating 10k tables with one trigger each (no
partitioned tables).  Total time to dump is the same as before.  I was
concerned that because the query now has two LEFT JOINs it would be
slower; but it seems to be only marginally so.

I'm thinking to apply my patch that changes the server behavior only to
14 and up.  I could be persuaded to backpatch all the way to 11, if
anybody supports the idea.
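For anyone who wants to reproduce the behavior being fixed, a minimal sketch
(object names are invented):

CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE FUNCTION trg_fn() RETURNS trigger LANGUAGE plpgsql
    AS $$ BEGIN RETURN NEW; END $$;
CREATE TRIGGER trg AFTER INSERT ON parent FOR EACH ROW EXECUTE FUNCTION trg_fn();
ALTER TABLE parent DISABLE TRIGGER trg;
CREATE TABLE parent_1 PARTITION OF parent FOR VALUES IN (1);
-- Without the fix, the cloned trigger on parent_1 shows tgenabled = 'O'
-- (fires on origin) even though the parent's trigger shows 'D' (disabled).
SELECT tgrelid::regclass, tgname, tgenabled
FROM pg_trigger WHERE tgname = 'trg';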

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
>From 29b595d02af124900d9f1e9655f6fe3d2725bfd1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 16 Oct 2020 10:58:54 -0300
Subject: [PATCH v3 1/2] Preserve firing-on state when cloning row triggers to
 partitions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When triggers are cloned from partitioned tables to their partitions,
the 'tgenabled' flag (origin/replica/always/disable) was not propagated.
Make it so that the flag on the trigger on partition is initially set to
the same value as on the partitioned table.

Add a test case to verify the behavior.

Backpatch to 14.  The original behavior, which appeared in pg11 with
commit 86f575948c77 doesn't really make much sense, but it seems risky
to change it on established branches.

Author: Álvaro Herrera 
Reported-by: Justin Pryzby 
Discussion: https://postgr.es/m/20200930223450.ga14...@telsasoft.com
---
 src/backend/commands/tablecmds.c   |  4 +-
 src/backend/commands/trigger.c | 30 +++---
 src/include/commands/trigger.h |  5 +++
 src/test/regress/expected/triggers.out | 56 ++
 src/test/regress/sql/triggers.sql  | 32 +++
 5 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96375814a8..dd2aefe1f3 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17736,10 +17736,10 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 		trigStmt->initdeferred = trigForm->tginitdeferred;
 		trigStmt->constrrel = NULL; /* passed separately */
 
-		CreateTrigger(trigStmt, NULL, RelationGetRelid(partition),
+		CreateTriggerFiringOn(trigStmt, NULL, RelationGetRelid(partition),
 	  trigForm->tgconstrrelid, InvalidOid, InvalidOid,
 	  trigForm->tgfoid, trigForm->oid, qual,
-	  false, true);
+	  false, true, trigForm->tgenabled);
 
 		MemoryContextSwitchTo(oldcxt);
 		MemoryContextReset(perTupCxt);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 952c8d582a..6d4b7ee92a 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -151,6 +151,24 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			  Oid relOid, Oid refRelOid, Oid constraintOid, Oid indexOid,
 			  Oid funcoid, Oid parentTriggerOid, Node *whenClause,
 			  bool isInternal, bool in_partition)
+{
+	return
+		CreateTriggerFiringOn(stmt, queryString, relOid, refRelOid,
+			  constraintOid, indexOid, funcoid,
+			  parentTriggerOid, whenClause, isInternal,
+			  in_partition, TRIGGER_FIRES_ON_ORIGIN);
+}
+
+/*
+ * Like the above; additionally the firing condition
+ * (always/origin/replica/disabled) can be specified.
+ */
+ObjectAddress
+CreateTriggerFiringOn(CreateTrigStmt *stmt, const char *queryString,
+	  Oid relOid, Oid refRelOid, Oid constraintOid,
+	  Oid indexOid, Oid funcoid, Oid parentTriggerOid,
+	  Node *whenClause, bool isInternal, bool in_partition,
+	  char trigger_fires_when)
 {
 	int16		tgtype;
 	int			

Re: Replacing pg_depend PIN entries with a fixed range check

2021-07-14 Thread John Naylor
On Thu, May 27, 2021 at 6:53 PM Tom Lane  wrote:

> Attached is a rebase over a4390abec.

Looks good to me overall, I just had a couple questions/comments:

isObjectPinned and isSharedObjectPinned are now thin wrappers around
IsPinnedObject. Is keeping those functions a matter of future-proofing in
case something needs to be handled differently someday, or reducing
unnecessary code churn?

setup_depend now doesn't really need to execute any SQL (unless third-party
forks have extra steps here?), and could be replaced with a direct call
to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
that would allow shortening this comment:

 /*
* Note that no objects created after setup_depend() will be "pinned".
* They are all droppable at the whim of the DBA.
*/

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PATCH] Use optimized single-datum tuplesort in ExecSort

2021-07-14 Thread Ranier Vilela
On Wed, Jul 14, 2021 at 07:14 David Rowley 
wrote:

> On Tue, 13 Jul 2021 at 15:15, David Rowley  wrote:
> > In theory, we likely could get rid of the small regression by having
> > two versions of ExecSort() and setting the correct one during
> > ExecInitSort() by setting the function pointer to the version we want
> > to use in sortstate->ss.ps.ExecProcNode.
>
> Just to see how it would perform, I tried what I mentioned above. I've
> included what I ended up with in the attached POC patch.
>
> I got the following results on my AMD hardware.
>
> Test master v8 patch comparison
> Test1   448.0   671.7   149.9%
> Test2   316.4   317.5   100.3%
> Test3   299.5   381.6   127.4%
> Test4   219.7   229.5   104.5%
> Test5   226.3   254.6   112.5%
> Test6   197.9   217.9   110.1%
> Test7   179.2   185.3   103.4%
> Test8   389.2   544.8   140.0%
>
I'm a little surprised by your results.
Test1 and Test8 look pretty good to me.
What is the compiler and environment?

I repeated (3 times) the benchmark with v8 here,
and the results were not good.


        HEAD        v6          v7b         v8          v6 vs head  v8 vs v6  v8 vs v7b
Test1   288,149636  449,018541  550,48505   468,168165  155,83%     104,26%   85,05%
Test2   94,766955   95,451406   94,718982   94,800275   100,72%     99,32%    100,09%
Test3   190,521319  260,279802  278,115296  262,538383  136,61%     100,87%   94,40%
Test4   78,779344   78,253455   77,941482   78,471546   99,33%      100,28%   100,68%
Test5   131,362614  142,662223  149,639041  144,849303  108,60%     101,53%   96,80%
Test6   112,884298  124,181671  127,58497   124,29376   110,01%     100,09%   97,42%
Test7   69,308587   68,643067   69,087544   69,437312   99,04%      101,16%   100,51%
Test8   243,674171  364,681142  419,259703  369,239176  149,66%     101,25%   88,07%



> This time I saw no regression on tests 2, 4 and 7.
>
> I looked to see if there was anywhere else in the executor that
> conditionally uses a different exec function in this way and found
> nothing, so I'm not too sure if it's a good idea to start doing this.
>
Specialized functions can be a way to optimize. The compilers themselves do
it.
But ExecSortTuple and ExecSortDatum would be very similar to each other,
which can cause maintenance problems.
I don't think it would be a good idea in this case.
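For reference, the dispatch being debated amounts to roughly the following in
ExecInitSort(); this is only a sketch, ExecSortTuple/ExecSortDatum stand for
the two specializations from the patch, and the single-column test is an
assumption about how the choice would be made:

/* after the outer plan has been initialized */
if (ExecGetResultType(outerPlanState(sortstate))->natts == 1)
    sortstate->ss.ps.ExecProcNode = ExecSortDatum;
else
    sortstate->ss.ps.ExecProcNode = ExecSortTuple;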


>
> It would be good to get a 2nd opinion about this idea.  Also, more
> benchmark results with v6 and v8 would be good too.
>
Yeah, another different machine.
I would like to see other results with v7b.

Attached the file with all results from v8.

regards,
Ranier Vilela
Benchmarks datumSort:

6) v8 David

a)
Test1
tps = 426.606950 (without initial connection time)
tps = 420.964492 (without initial connection time)
tps = 429.016435 (without initial connection time)
Test2
tps = 93.388625 (without initial connection time)
tps = 94.571572 (without initial connection time)
tps = 94.581301 (without initial connection time)
Test3
tps = 251.625641 (without initial connection time)
tps = 251.769007 (without initial connection time)
tps = 251.576880 (without initial connection time)
Test4
tps = 77.892592 (without initial connection time)
tps = 77.664981 (without initial connection time)
tps = 77.618023 (without initial connection time)
Test5
tps = 141.801858 (without initial connection time)
tps = 141.957810 (without initial connection time)
tps = 141.849105 (without initial connection time)
Test6
tps = 122.650449 (without initial connection time)
tps = 122.603506 (without initial connection time)
tps = 122.786432 (without initial connection time)
Test7
tps = 68.602538 (without initial connection time)
tps = 68.940470 (without initial connection time)
tps = 68.770827 (without initial connection time)
Test8
tps = 350.593188 (without initial connection time)
tps = 349.741689 (without initial connection time)
tps = 349.544567 (without initial connection time)

b)
Test1
tps = 430.025697 (without initial connection time)
tps = 427.884165 (without initial connection time)
tps = 428.708592 (without initial connection time)
Test2
tps = 94.207150 (without initial connection time)
tps = 93.821936 (without initial connection time)
tps = 93.647174 (without initial connection time)
Test3
tps = 251.784817 (without initial connection time)
tps = 251.336243 (without initial connection time)
tps = 251.431278 (without initial connection time)
Test4
tps = 77.884797 (without initial connection time)
tps = 77.413191 (without initial connection time)
tps = 77.569484 (without initial connection time)
Test5
tps = 141.787480 (without initial connection time)
tps = 142.344187 (without initial connection time)
tps = 141.819273 (without initial connection time)
Test6
tps = 122.848858 (without initial connection time)
tps = 122.935840 (without initial connection time)
tps = 123.559398 (without initial connection time)
Test7
tps = 68.854804 (without initial connection time)
tps = 68.929120 (without initial connection time)
tps = 68.779992 (without initial connection time)
Test8
tps = 349.630138 (without initial connection time)
tps = 

Re: SI messages sent when executing ROLLBACK PREPARED command

2021-07-14 Thread Tom Lane
"liuhuail...@fujitsu.com"  writes:
> So, I think we needn't send SI messages when rolling back the two-phase 
> transaction.
> Or does it have something special because of the two-phase transaction?

Hmmm, yeah, I think you're right.  It probably doesn't make a big
difference in the real world --- anyone who's dependent on the
performance of 2PC rollbacks is Doing It Wrong.  But we'd have
already done LocalExecuteInvalidationMessage when getting out of
the prepared transaction, so no other SI invals should be needed.

regards, tom lane




Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 12:08 PM, Tomas Vondra wrote:
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
> we could ignore this for subscriptions without DELETE.
... and UPDATE. It seems we have a consensus to use old row in the row filter
for UPDATEs. I think you meant publication.

> The other question is when to check/enforce this. I guess we'll have to 
> do that during decoding, not just when the publication is being created, 
> because the user can do ALTER TABLE later.
I'm afraid this check during decoding has a considerable cost. If we want to
enforce this condition, I suggest that we add it to CREATE PUBLICATION, ALTER
PUBLICATION ... ADD|SET TABLE and ALTER TABLE ... REPLICA IDENTITY. Data are
being constantly modified; schema is not.
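To make that concrete, the checks would run at statements like the ones below;
the WHERE clause attached to FOR TABLE / SET TABLE is the syntax proposed by
this patch set, and the object names are invented:

CREATE PUBLICATION pub_active FOR TABLE items WHERE (active IS TRUE);
ALTER PUBLICATION pub_active SET TABLE items WHERE (region = 'emea');
ALTER TABLE items REPLICA IDENTITY USING INDEX items_region_id_idx;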


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: free C string

2021-07-14 Thread Tom Lane
Zhihong Yu  writes:
> I was looking at fmgr_internal_validator().
> It seems prosrc is only used internally.
> The patch frees the C string prosrc points to, prior to returning.

There's really very little point in adding such code.  Our memory
context mechanisms take care of minor leaks like this, with less
code and fewer cycles expended than explicit pfree calls require.
It's worth trying to clean up explicitly in code that might get
executed many times in a row, or might be allocating very big
temporary chunks; but fmgr_internal_validator hardly falls in
that category.

regards, tom lane




free C string

2021-07-14 Thread Zhihong Yu
Hi,
I was looking at fmgr_internal_validator().

It seems prosrc is only used internally.

The patch frees the C string prosrc points to, prior to returning.

Please take a look.

Thanks


free-c-str.patch
Description: Binary data


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-14 Thread Tom Lane
I wrote:
> Interesting question, so I took a look:
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593
> case 's':
>   a = arg.p ? arg.p : "(null)";

BTW, the adjacent code shows that musl is also supporting glibc's
"%m" extension, so I imagine that they are endeavoring to be
compatible with glibc, and this goes along with that.  But that
just supports my larger point: printing "(null)" is clearly the
de facto standard now, whether or not POSIX has caught up with it.

regards, tom lane




Re: row filtering for logical replication

2021-07-14 Thread Euler Taveira
On Wed, Jul 14, 2021, at 11:48 AM, Dilip Kumar wrote:
> On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
>  wrote:
> >
> 
> > Perhaps the best way forward is to stick to the approach that INSERT
> > uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> > and leave anything fancier (like being able to reference both versions
> > of the row) for a future patch.
> 
> If UPDATE works as DELETE+ INSERT, does that mean both the OLD row and
> the NEW row should satisfy the filter, then only it will be sent?
> That means if we insert a row that is not satisfying the condition
> (which is not sent to the subscriber) and later if we update that row
> and change the values such that the modified value matches the filter
> then we will not send it because only the NEW row is satisfying the
> condition but OLD row doesn't.  I am just trying to understand your
> idea.  Or you are saying that in this case, we will not send anything
> for the OLD row as it was not satisfying the condition but the
> modified row will be sent as an INSERT operation because this is
> satisfying the condition?
That's a fair argument for the default UPDATE behavior. It seems we have a
consensus that the UPDATE operation will use the old row. If there are no
objections, I will change it in the next version.

We can certainly discuss the possibilities for UPDATE operations. It can choose
which row to use: old, new or both (an additional publication argument, or OLD
and NEW placeholders to reference the old and new rows, are feasible ideas).


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: printf %s with NULL pointer (was Re: BUG #17098: Assert failed on composing an error message when adding a type to an extension being dropped)

2021-07-14 Thread Tom Lane
Peter Eisentraut  writes:
> In this particular case, I would for example be quite curious how those 
> alternative minimal C libraries such as musl-libc handle this.

Interesting question, so I took a look:

https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n593

case 's':
a = arg.p ? arg.p : "(null)";
...

Any others you'd like to consider?

regards, tom lane




Re: VACUUM (DISABLE_PAGE_SKIPPING on)

2021-07-14 Thread vignesh C
On Thu, Apr 8, 2021 at 11:40 PM Simon Riggs  wrote:
>
> On Thu, 8 Apr 2021 at 18:15, Alvaro Herrera  wrote:
> >
> > On 2021-Apr-08, Simon Riggs wrote:
> >
> > > On Thu, 8 Apr 2021 at 16:58, David Steele  wrote:
> >
> > > > It's not clear to me which patch is which, so perhaps move one CF entry
> > > > to next CF and clarify which patch is current?
> > >
> > > Entry: Maximize page freezing
> > > has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov.
> > > one_freeze_then_max_freeze.v7.patch
> >
> > Oh dear, was this waiting on me?  It was not in my to-do, so I didn't
> > pay close attention.
>
> No problem, if I felt the idea deserved priority attention I would
> have pinged you.
>
> I think I'll open a new thread later to allow it to be discussed
> without confusion.

The patch no longer applies on HEAD; please post a rebased patch.

Regards,
Vignesh




Re: storing an explicit nonce

2021-07-14 Thread vignesh C
On Sat, Jun 26, 2021 at 2:52 AM Bruce Momjian  wrote:
>
> On Wed, May 26, 2021 at 05:02:01PM -0400, Bruce Momjian wrote:
> > For these reasons, if we decide to go in the direction of using a
> > non-LSN nonce, I no longer plan to continue working on this feature. I
> > would rather work on things that have a more positive impact.  Maybe a
> > non-LSN nonce is a better long-term plan, but there are too many
> > unknowns and complexity for me to feel comfortable with it.
>
> As stated above, I have no plans to continue working on this feature.  I
> am attaching my final patches here in case anyone wants to make use of
> them;  it passes check-world and all my private tests.  I have removed
> my patches from the feature wiki page:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
>
> and replaced it with a link to this email.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, vignesh C wrote:

> The patch does not apply on HEAD anymore; could you rebase and post an
> updated patch? I'm changing the status to "Waiting for Author".

BTW I gave a look at this patch in the March commitfest and concluded it
still requires some major surgery that I didn't have time for.  I did so
by re-reading early in the thread to understand what the actual
requirements were for this feature to work, and it seemed to me that the
patch started to derail at some point.  I suggest that somebody needs to
write up exactly what we need, lest the patches end up implementing
something else.

I don't have time for this patch during the current commitfest, and I'm
not sure that I will during the next one.  If somebody else wants to
spend time with it, ... be my guest.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)




Re: Release SPI plans for referential integrity with DISCARD ALL

2021-07-14 Thread vignesh C
On Wed, Mar 10, 2021 at 1:49 PM yuzuko  wrote:
>
> Hello,
>
> I thought about this suggestion again.
>
> Amit's patch suggested in the thread [1] can eliminate SPI plans from
> INSERT/UPDATE triggers, so our memory pressure issue would be solved.
> But as far as I can see in that thread, Amit's patch doesn't cover the DELETE case.
> It is not a common case, but there is a risk of putting extreme pressure on
> memory capacity.
> Considering that, this suggestion might still have good value as Corey
> said before.
>
> I improved a patch according to Peter's following comment :
> > but I think the
> > solution of dropping all cached plans as part of DISCARD ALL seems a bit
> > too extreme of a solution.  In the context of connection pooling,
> > getting a new session with pre-cached plans seems like a good thing, and
> > this change could potentially have a performance impact without a
> > practical way to opt out.
>
> In a new patch, I separated discarding SPI Plan caches for RI from DISCARD ALL
> and added a new option "RIPLANS" to DISCARD command to do that.  Also a new
> function, ResetRIPlanCache() which clears SPI plan caches is called by
> DISCARD ALL
> or DISCARD RIPLANS.  The amount of modification is small and this option can 
> be
> removed instantly when it is no longer needed, so I think we can use
> this patch as
> a provisional solution.
>
> Any thoughts?

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-14 Thread vignesh C
On Mon, Apr 19, 2021 at 5:18 PM David Rowley  wrote:
>
> On Wed, 3 Mar 2021 at 22:37, David Rowley  wrote:
> > I've attached a rebased patch.
>
> I've rebased this again.
>
> I also moved away from using hash tables for storing references and
> libraries.  I was having some problems getting psql to compile due to
> the order of the dependencies being reversed due to the order being at
> the mercy of Perl's hash function. There's mention of this in
> Makefile.global.in:
>
> # libpq_pgport is for use by client executables (not libraries) that use 
> libpq.
> # We force clients to pull symbols from the non-shared libraries libpgport
> # and libpgcommon rather than pulling some libpgport symbols from libpq just
> # because libpq uses those functions too.  This makes applications less
> # dependent on changes in libpq's usage of pgport (on platforms where we
> # don't have symbol export control for libpq).  To do this we link to
> # pgport before libpq.  This does cause duplicate -lpgport's to appear
> # on client link lines, since that also appears in $(LIBS).
> # libpq_pgport_shlib is the same idea, but for use in client shared libraries.
>
> I switched these back to arrays but added an additional check to only
> add new items to the array if we don't already have an element with
> the same value.
>
> I've attached the diffs in the *.vcxproj files between patched and unpatched.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [PATCH] New default role allowing to change per-role/database settings

2021-07-14 Thread vignesh C
On Wed, Apr 7, 2021 at 5:23 PM Michael Banck  wrote:
>
> Hi,
>
> Am Dienstag, den 06.04.2021, 15:37 +0200 schrieb Michael Banck:
> > Am Montag, den 05.04.2021, 14:33 -0400 schrieb Stephen Frost:
> > > Should drop the 'DEFAULT_' to match the others since the rename to
> > > 'predefined' roles went in.
> >
> > Right, will send a rebased patch ASAP.
>
> Here's a rebased version that also removes the ALTER DATABASE SET
> capability from this new predefined role and adds some documentation.
>

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Have I found an interval arithmetic bug?

2021-07-14 Thread Zhihong Yu
On Thu, Jul 8, 2021 at 10:22 AM Zhihong Yu  wrote:

>
>
> On Wed, Jun 30, 2021 at 9:35 AM Bruce Momjian  wrote:
>
>> On Tue, Jun 29, 2021 at 06:49:45PM +0200, Daniel Gustafsson wrote:
>> > > On 29 Jun 2021, at 18:50, Zhihong Yu  wrote:
>> >
>> > > Now that PG 15 is open for commit, do you think the patch can land ?
>> >
>> > Adding it to the commitfest patch tracker is a good way to ensure it's
>> not
>> > forgotten about:
>> >
>> >   https://commitfest.postgresql.org/33/
>>
>> OK, I have been keeping it in my git tree since I wrote it and will
>> apply it in the next few days.
>>
>> --
>>   Bruce Momjian  https://momjian.us
>>   EDB  https://enterprisedb.com
>>
>>   If only the physical world exists, free will is an illusion.
>>
>> Thanks, Bruce.
>
> Hopefully you can get to this soon.
>

Bruce:
Please see if the patch can be integrated now.

Cheers


Re: Make Append Cost aware of some run time partition prune case

2021-07-14 Thread vignesh C
On Thu, Mar 4, 2021 at 9:51 AM Andy Fan  wrote:
>
>
>>
>> I have implemented a new one, which only handles 1 level of partitioned 
>> table, and
>> only 1 partition key.  and only handle the eq operators like partkey = $1 / 
>> partkey in ($1, $2)
>> / parkey = $1 or partkey = $2;  The patch works well in my user case.  I can 
>> send
>> one on the latest master shortly, and hope it is helpful for you as well.
>>
>
> Hi:
>
> Here is the new patch for this topic, which has proven to work in my limited
> use case (applying similar logic on pg11). It is incomplete since more cases
> should be covered but are not yet.  Uploading this patch now is just for discussing 
> and
> testing.
>
> Design principle:
>
> 1). the cost of AppendPath should be reduced for either init partition prune 
> or run
> time partition prune. All of the startup_cost, total_cost, rows should be
> adjusted. As for the startup_cost, we should adjust it carefully, or else we
> can end up with a run_time_cost less than 0.
>
> 2). When we merge the subpath from sub-partitioned rel via
> accumulate_append_subpath, currently we just merge the subpaths and discard 
> the
> cost/rows in AppendPath.  After this feature is involved, we may consider to 
> use
> the AppendPath's cost as well during this stage.
>
> 3). When join is involved, AppendPath is not enough since the estimated rows 
> for
> a join relation looks at rel1->rows and rel2->rows only; the Path.rows is not
> considered.  So we need to adjust rel->rows as well, and only for the init
> partition prune case.  (appendrel->rows for planning time partition prune is
> handled already).
>
> The biggest problem of the above is I don't know how to adjust the cost for
> Parallel Append Path. Currently I just ignore it, which would cause some
> queries that should use a Parallel Append Path not to.
>
> Things I don't really want to handle unless we can establish their value:
> 1. Operators like >, <, BETWEEN ... AND.
> 2. An incomplete partition key appearing in the prunequals, e.g. the partition
> key is (a, b) but only a = 1 is used as a restrictinfo.
>
> The main reasons I don't want to handle them are: a) they would be uncommon;
> b) they introduce extra complexity; c) lastly, we can't estimate them well:
> for something like partkey > $1, what would the prune ratio be?
>
> Something I don't handle so far are: 1). accumulate_append_subpath
> stuff. 2). MergeAppend.  3). Multi Partition key.
>

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Implementing Incremental View Maintenance

2021-07-14 Thread vignesh C
On Mon, May 17, 2021 at 10:08 AM Yugo NAGATA  wrote:
>
> On Fri, 7 May 2021 14:14:16 +0900
> Yugo NAGATA  wrote:
>
> > On Mon, 26 Apr 2021 16:03:48 +0900
> > Yugo NAGATA  wrote:
> >
> > > On Mon, 26 Apr 2021 15:46:21 +0900
> > > Yugo NAGATA  wrote:
> > >
> > > > On Tue, 20 Apr 2021 09:51:34 +0900
> > > > Yugo NAGATA  wrote:
> > > >
> > > > > On Mon, 19 Apr 2021 17:40:31 -0400
> > > > > Tom Lane  wrote:
> > > > >
> > > > > > Andrew Dunstan  writes:
> > > > > > > This patch (v22c) just crashed for me with an assertion failure on
> > > > > > > Fedora 31. Here's the stack trace:
> > > > > >
> > > > > > > #2  0x0094a54a in ExceptionalCondition
> > > > > > > (conditionName=conditionName@entry=0xa91dae 
> > > > > > > "queryDesc->sourceText !=
> > > > > > > NULL", errorType=errorType@entry=0x99b468 "FailedAssertion",
> > > > > > > fileName=fileName@entry=0xa91468
> > > > > > > "/home/andrew/pgl/pg_head/src/backend/executor/execMain.c",
> > > > > > > lineNumber=lineNumber@entry=199) at
> > > > > > > /home/andrew/pgl/pg_head/src/backend/utils/error/assert.c:69
> > > > > >
> > > > > > That assert just got added a few days ago, so that's why the patch
> > > > > > seemed OK before.
> > > > >
> > > > > Thank you for letting me know. I'll fix it.
> > > >
> > > > Attached is the fixed patch.
> > > >
> > > > queryDesc->sourceText cannot be NULL after commit b2668d8,
> > > > so now we pass an empty string "" for refresh_matview_datafill() 
> > > > instead NULL
> > > > when maintaining views incrementally.
> > >
> > > I am sorry, I forgot to include a fix for 8aba9322511.
> > > Attached is the fixed version.
> >
> > Attached is the rebased patch (for 6b8d29419d).
>
> I attached a rebased patch.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2021-07-14 Thread vignesh C
On Tue, Mar 30, 2021 at 2:14 AM Mark Rofail  wrote:
>
> Hey Alvaro
>
>> Yes, we should do that.
>
> I have attached v12 with more tests in “ src/test/regress/sql/gin.sql”
>
> Changelog:
> - v12 (compatible with current master 2021/03/29, commit 
> 6d7a6feac48b1970c4cd127ee65d4c487acbb5e9)
> * add more tests to “ src/test/regress/sql/gin.sql”
> * merge Andreas' edits to the GIN patch
>
> Also, @Andreas Karlsson, waiting for your edits to "pg_constraint"

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: DROP INDEX CONCURRENTLY on partitioned index

2021-07-14 Thread vignesh C
On Wed, Oct 28, 2020 at 6:14 AM Justin Pryzby  wrote:
>
> Forking this thread, since the existing CFs have been closed.
> https://www.postgresql.org/message-id/flat/20200914143102.GX18552%40telsasoft.com#58b1056488451f8594b0f0ba40996afd
>
> On Mon, Sep 14, 2020 at 09:31:03AM -0500, Justin Pryzby wrote:
> > On Sat, Sep 12, 2020 at 10:35:34AM +0900, Michael Paquier wrote:
> > > On Fri, Sep 11, 2020 at 07:13:01PM -0500, Justin Pryzby wrote:
> > > > On Tue, Sep 08, 2020 at 01:31:05PM +0900, Michael Paquier wrote:
> > > >> - CIC on partitioned relations.  (Should we also care about DROP INDEX
> > > >> CONCURRENTLY as well?)
> > > >
> > > > Do you have any idea what you think that should look like for DROP INDEX
> > > > CONCURRENTLY ?
> > >
> > > Making the maintenance of the partition tree consistent to the user is
> > > the critical part here, so my guess on this matter is:
> > > 1) Remove each index from the partition tree and mark the indexes as
> > > invalid in the same transaction.  This makes sure that after commit no
> > > indexes would get used for scans, and the partition dependency tree
> > > is completely removed with the parent table.  That's close to what we
> > > do in index_concurrently_swap() except that we just want to remove the
> > > dependencies with the partitions, and not just swap them of course.
> > > 2) Switch each index to INDEX_DROP_SET_DEAD, one per transaction
> > > should be fine as that prevents inserts.
> > > 3) Finish the index drop.
> > >
> > > Step 2) and 3) could be completely done for each index as part of
> > > index_drop().  The tricky part is to integrate 1) cleanly within the
> > > existing dependency machinery while still knowing about the list of
> > > partitions that can be removed.  I think that this part is not that
> > > straight-forward, but perhaps we could just make this logic part of
> > > RemoveRelations() when listing all the objects to remove.
> >
> > Thanks.
> >
> > I see three implementation ideas..
> >
> > 1. I think your way has an issue that the dependencies are lost.  If 
> > there's an
> > interruption, the user is maybe left with hundreds or thousands of detached
> > indexes to clean up.  This is strange since there's actually no detach 
> > command
> > for indexes (but they're implicitly "attached" when a matching parent index 
> > is
> > created).  A 2nd issue is that DETACH currently requires an exclusive lock 
> > (but
> > see Alvaro's WIP patch).
>
> I think this is a critical problem with this approach.  It's not ok if a
> failure leaves behind N partition indexes not attached to any parent.
> They may have pretty different names, which is a mess to clean up.
>
> > 2. Maybe the easiest way is to mark all indexes invalid and then drop all
> > partitions (concurrently) and then the partitioned parent.  If interrupted,
> > this would leave a parent index marked "invalid", and some child tables 
> > with no
> > indexes.  I think this may be "ok".  The same thing is possible if a 
> > concurrent
> > build is interrupted, right ?
>
> I think adding the "invalid" mark in the simple/naive way isn't enough - it 
> has
> to do everything DROP INDEX CONCURRENTLY does (of course).
>
> > 3. I have a patch which changes index_drop() to "expand" a partitioned 
> > index into
> > its list of children.  Each of these becomes a List:
> > | indexId, heapId, userIndexRelation, userHeapRelation, heaplocktag, 
> > heaprelid, indexrelid
> > The same process is followed as for a single index, but handling all 
> > partitions
> > at once in two transactions total.  Arguably, this is bad since that 
> > function
> > currently takes a single Oid but would now ends up operating on a list of 
> > indexes.
>
> This is what's implemented in the attached.  It's very possible I've missed
> opportunities for better simplification/integration.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: [PATCH] Allow multiple recursive self-references

2021-07-14 Thread vignesh C
On Wed, Mar 31, 2021 at 7:28 PM Denis Hirn  wrote:
>
> Sorry, I didn't append the patch properly.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: libpq compression

2021-07-14 Thread vignesh C
On Wed, Jul 14, 2021 at 6:31 PM Daniil Zakhlystov
 wrote:
>
> **sorry for the noise, but I need to re-send the message because one of the 
> recipients is blocked on the pgsql-hackers for some reason**
>
> Hi!
>
> Done, the patch should apply to the current master now.
>
> Actually, I have an almost-finished version of the patch with the previously 
> requested asymmetric compression negotiation. I plan to attach it soon.

Thanks for providing the patch quickly, I have changed the status to
"Need Review".

Regards,
Vignesh




Re: Extending amcheck to check toast size and compression

2021-07-14 Thread Mark Dilger



> On Jul 14, 2021, at 7:57 AM, Mark Dilger  wrote:
> 
> so no valid toast pointer should contain a va_rawsize field greater than 
> MaxAllocSize

... nor should any valid toast pointer contain a va_extinfo field encoding a 
va_extsize greater than va_rawsize - VARHDRSZ.

Violations of either of these properties suggest either a bug in the code which 
wrote the toast pointer, or that the toast pointer has been corrupted since 
being written, or that the page of data being read is being interpreted 
incorrectly, perhaps due to catalog corruption, or because the page is just 
random noise and not part of a valid table, etc.  The amcheck code is not 
focused specifically on whether the toasted value can be detoasted so much as 
deducing that the data cannot be correct.
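Expressed as code, the two invariants amount to roughly the following sketch;
it assumes the PostgreSQL 14 toast-pointer layout, where
VARATT_EXTERNAL_GET_EXTSIZE() extracts the external size from va_extinfo, and
it is not the actual amcheck implementation:

/* returns true when the toast pointer is self-consistent */
static bool
toast_pointer_plausible(struct varatt_external toast_pointer)
{
    uint32      extsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);

    if (toast_pointer.va_rawsize < VARHDRSZ ||
        toast_pointer.va_rawsize > MaxAllocSize)
        return false;           /* rawsize outside the sane range */
    if (extsize > toast_pointer.va_rawsize - VARHDRSZ)
        return false;           /* external size larger than the data could be */
    return true;
}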

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: ResourceOwner refactoring

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 8:26 AM Heikki Linnakangas  wrote:

> Thanks for having a look!
>
> On 14/07/2021 18:18, Zhihong Yu wrote:
> > For the loop over the hash:
> >
> > +   for (int idx = 0; idx < capacity; idx++)
> >  {
> > -   if (olditemsarr[i] != resarr->invalidval)
> > -   ResourceArrayAdd(resarr, olditemsarr[i]);
> > +   while (owner->hash[idx].kind != NULL &&
> > +  owner->hash[idx].kind->phase == phase)
> > ...
> > +   } while (capacity != owner->capacity);
> >
> > Since the phase variable doesn't seem to change for the while loop, I
> > wonder what benefit the while loop has (since the release is governed by
> > phase).
>
> Hmm, the phase variable doesn't change, but could the element at
> 'owner->hash[idx]' change? I'm not sure about that. The loop is supposed
> to handle the case that the hash table grows; could that replace the
> element at 'owner->hash[idx]' with something else, with different phase?
> The check is very cheap, so I'm inclined to keep it to be sure.
>
> - Heikki
>
Hi,
Agreed that ```owner->hash[idx].kind->phase == phase``` can be kept.

I just wonder if we should put a limit on the number of iterations of the
while loop.
Maybe add a bool variable indicating whether kind->ReleaseResource(value) was
called in the inner loop.
If nothing was released by the time the inner loop finishes, we can come out of
the while loop.
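
Something like this rough sketch, perhaps (field and callback names are taken
loosely from the quoted hunk, so they may not match the patch exactly; it only
makes the termination condition explicit):

static void
ResourceOwnerReleaseAllOfPhase(ResourceOwner owner, ResourceReleasePhase phase)
{
    bool        released_any;
    int         capacity;

    do
    {
        capacity = owner->capacity;
        released_any = false;

        for (int idx = 0; idx < capacity; idx++)
        {
            while (owner->hash[idx].kind != NULL &&
                   owner->hash[idx].kind->phase == phase)
            {
                /* releasing an entry also removes it from the hash */
                owner->hash[idx].kind->ReleaseResource(owner->hash[idx].item);
                released_any = true;
            }
        }
        /* stop once a full pass released nothing, even if the hash grew */
    } while (released_any && capacity != owner->capacity);
}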

Cheers


Re: ResourceOwner refactoring

2021-07-14 Thread Heikki Linnakangas

Thanks for having a look!

On 14/07/2021 18:18, Zhihong Yu wrote:

For the loop over the hash:

+       for (int idx = 0; idx < capacity; idx++)
         {
-           if (olditemsarr[i] != resarr->invalidval)
-               ResourceArrayAdd(resarr, olditemsarr[i]);
+           while (owner->hash[idx].kind != NULL &&
+                  owner->hash[idx].kind->phase == phase)
...
+   } while (capacity != owner->capacity);

Since the phase variable doesn't seem to change for the while loop, I 
wonder what benefit the while loop has (since the release is governed by 
phase).


Hmm, the phase variable doesn't change, but could the element at 
'owner->hash[idx]' change? I'm not sure about that. The loop is supposed 
to handle the case that the hash table grows; could that replace the 
element at 'owner->hash[idx]' with something else, with different phase? 
The check is very cheap, so I'm inclined to keep it to be sure.


- Heikki




Re: Avoid repeated PQfnumber() in pg_dump

2021-07-14 Thread Justin Pryzby
On Wed, Jul 14, 2021 at 08:54:32AM +, houzj.f...@fujitsu.com wrote:
> Since PQfnumber() is not a cheap function, I think we'd better move the
> PQfnumber() calls out of the loop, as in the attached patch.

+1
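
For anyone skimming the thread, the pattern being proposed is simply to hoist
the column-number lookups out of the per-row loop; a standalone libpq
illustration (query and column names made up, not taken from the patch):

#include <stdio.h>
#include "libpq-fe.h"

static void
print_relations(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "SELECT oid, relname FROM pg_class");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        int         ntups = PQntuples(res);
        int         i_oid = PQfnumber(res, "oid");      /* looked up once... */
        int         i_relname = PQfnumber(res, "relname");

        for (int i = 0; i < ntups; i++)  /* ...instead of once per row */
            printf("%s %s\n",
                   PQgetvalue(res, i, i_oid),
                   PQgetvalue(res, i, i_relname));
    }
    PQclear(res);
}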

Please add to the next CF

-- 
Justin




Re: ResourceOwner refactoring

2021-07-14 Thread Zhihong Yu
On Wed, Jul 14, 2021 at 7:40 AM Heikki Linnakangas  wrote:

> On 14/07/2021 17:07, Alvaro Herrera wrote:
> > On 2021-Jul-14, vignesh C wrote:
> >
> >> On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas 
> wrote:
> >
> >>> Here you go.
> >>
> >> The patch does not apply on HEAD anymore; could you rebase and post an
> >> updated patch? I'm changing the status to "Waiting for Author".
> >
> > Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.
>
> Yeah, needed some manual fixing, but here you go.
>
> - Heikki
>
Hi,
For the loop over the hash:

+   for (int idx = 0; idx < capacity; idx++)
{
-   if (olditemsarr[i] != resarr->invalidval)
-   ResourceArrayAdd(resarr, olditemsarr[i]);
+   while (owner->hash[idx].kind != NULL &&
+  owner->hash[idx].kind->phase == phase)
...
+   } while (capacity != owner->capacity);

Since the phase variable doesn't seem to change for the while loop, I
wonder what benefit the while loop has (since the release is governed by
phase).

Cheers


Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Tomas Vondra wrote:

> On 7/14/21 4:52 PM, Alvaro Herrera wrote:

> > In any case, it seems to me that the condition expression should be
> > scanned to see which columns are used in Vars (pull_varattnos?), and
> > verify if those columns are in the REPLICA IDENTITY; and if they are
> > not, raise an error.  Most of the time the REPLICA IDENTITY is going to
> > be the primary key; but if the user wants to use other columns in the
> > expression, we can HINT that they can set REPLICA IDENTITY FULL.
> 
> Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps we
> could ignore this for subscriptions without DELETE.

Yeah, I said that too in my older reply :-)

> The other question is when to check/enforce this. I guess we'll have to do
> that during decoding, not just when the publication is being created,
> because the user can do ALTER TABLE later.

... if you're saying the user can change the replica identity after we
have some publications with filters defined, then I think we should
verify during ALTER TABLE and not allow the change if there's a
publication that requires it.  I mean, during decoding we should be able
to simply assume that the tuple is correct for what we need at that
point.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:52 PM, Alvaro Herrera wrote:

On 2021-Jul-14, Tomas Vondra wrote:


The way I'm thinking about this is that for INSERT and DELETE it's clear
which row version should be used (because there's just one). And for UPDATE
we could see that as DELETE + INSERT, and apply the same rule to each
action.

On the other hand, I can imagine cases where it'd be useful to send the
UPDATE when the old row matches the condition and new row does not.


In any case, it seems to me that the condition expression should be
scanned to see which columns are used in Vars (pull_varattnos?), and
verify if those columns are in the REPLICA IDENTITY; and if they are
not, raise an error.  Most of the time the REPLICA IDENTITY is going to
be the primary key; but if the user wants to use other columns in the
expression, we can HINT that they can set REPLICA IDENTITY FULL.



Yeah, but AFAIK that's needed only when replicating DELETEs, so perhaps 
we could ignore this for subscriptions without DELETE.


The other question is when to check/enforce this. I guess we'll have to 
do that during decoding, not just when the publication is being created, 
because the user can do ALTER TABLE later.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Polyphase merge is obsolete

2021-07-14 Thread Heikki Linnakangas

On 14/07/2021 15:12, vignesh C wrote:

On Sat, Jan 23, 2021 at 3:49 AM Heikki Linnakangas  wrote:

Here's an updated version that fixes one bug:

The CFBot was reporting a failure on the FreeBSD system [1]. It turned
out to be an out-of-memory issue caused by an underflow bug in the
calculation of the tape read buffer size. With a small
work_mem size, the memory left for tape buffers was negative, and that
wrapped around to a very large number. I believe that was not caught by
the other systems, because the other ones had enough memory for the
incorrectly-sized buffers anyway. That was the case on my laptop at
least. It did cause a big slowdown in the 'tuplesort' regression test
though, which I hadn't noticed.

The fix for that bug is here as a separate patch for easier review, but
I'll squash it before committing.


The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".


Here's a rebased version. I also squashed that little bug fix from the 
previous patch set.
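
For readers skimming the thread, the underflow mentioned above is the usual
unsigned wraparound; a standalone illustration with made-up numbers:

#include <stdio.h>
#include <stddef.h>

int
main(void)
{
    size_t      mem_left = 64 * 1024;    /* a small work_mem-style budget */
    size_t      overhead = 256 * 1024;   /* per-tape buffer overhead */
    size_t      buffer_size = mem_left - overhead;  /* wraps, never negative */

    printf("%zu\n", buffer_size);   /* prints an enormous number */
    return 0;
}

The "negative" amount of memory left shows up as a huge buffer size instead.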


- Heikki
From 027b20e74b3d8b09144cffe6d071706231e496d7 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 21 Oct 2020 21:57:42 +0300
Subject: [PATCH 1/2] Refactor LogicalTapeSet/LogicalTape interface.

All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.

This makes the tape management in hash agg spilling in nodeAgg.c simpler.
---
 src/backend/executor/nodeAgg.c | 187 
 src/backend/utils/sort/logtape.c   | 457 -
 src/backend/utils/sort/tuplesort.c | 229 +++
 src/include/nodes/execnodes.h  |   3 +-
 src/include/utils/logtape.h|  37 ++-
 5 files changed, 360 insertions(+), 553 deletions(-)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 914b02ceee4..80025e5f1f3 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -208,7 +208,16 @@
  *
  *	  Spilled data is written to logical tapes. These provide better control
  *	  over memory usage, disk space, and the number of files than if we were
- *	  to use a BufFile for each spill.
+ *	  to use a BufFile for each spill.  We don't know the number of tapes needed
+ *	  at the start of the algorithm (because it can recurse), so a tape set is
+ *	  allocated at the beginning, and individual tapes are created as needed.
+ *	  As a particular tape is read, logtape.c recycles its disk space. When a
+ *	  tape is read to completion, it is destroyed entirely.
+ *
+ *	  Tapes' buffers can take up substantial memory when many tapes are open at
+ *	  once. We only need one tape open at a time in read mode (using a buffer
+ *	  that's a multiple of BLCKSZ); but we need one tape open in write mode (each
+ *	  requiring a buffer of size BLCKSZ) for each partition.
  *
  *	  Note that it's possible for transition states to start small but then
  *	  grow very large; for instance in the case of ARRAY_AGG. In such cases,
@@ -311,27 +320,6 @@
  */
 #define CHUNKHDRSZ 16
 
-/*
- * Track all tapes needed for a HashAgg that spills. We don't know the maximum
- * number of tapes needed at the start of the algorithm (because it can
- * recurse), so one tape set is allocated and extended as needed for new
- * tapes. When a particular tape is already read, rewind it for write mode and
- * put it in the free list.
- *
- * Tapes' buffers can take up substantial memory when many tapes are open at
- * once. We only need one tape open at a time in read mode (using a buffer
- * that's a multiple of BLCKSZ); but we need one tape open in write mode (each
- * requiring a buffer of size BLCKSZ) for each partition.
- */
-typedef struct HashTapeInfo
-{
-	LogicalTapeSet *tapeset;
-	int			ntapes;
-	int		   *freetapes;
-	int			nfreetapes;
-	int			freetapes_alloc;
-} HashTapeInfo;
-
 /*
  * Represents partitioned spill data for a single hashtable. Contains the
  * necessary information to route tuples to the correct partition, and to
@@ -343,9 +331,8 @@ typedef struct HashTapeInfo
  */
 typedef struct HashAggSpill
 {
-	LogicalTapeSet *tapeset;	/* borrowed reference to tape set */
 	int			npartitions;	/* number of partitions */
-	int		   *partitions;		/* spill partition tape numbers */
+	LogicalTape **partitions;	/* spill partition tapes */
 	int64	   *ntuples;		/* number of tuples in each partition */
 	uint32		mask;			/* mask to find partition from hash value */
 	int			shift;			/* after masking, shift by this amount */
@@ -365,8 +352,7 @@ typedef struct HashAggBatch
 {
 	int			setno;			/* grouping set */
 	int			used_bits;		/* number of bits of hash already used */
-	LogicalTapeSet *tapeset;	/* borrowed reference to tape set */
-	int			input_tapenum;	/* input partition tape */
+	

Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Kyotaro Horiguchi wrote:

> > >  pg_log_error("extra_float_digits must be in range 
> > > -15..3");
> > >  exit_nicely(1);
> > 
> > Should we take this occasion to reduce the burden of translators and
> > reword that as "%s must be in range %d..%d"?  That could be a separate
> > patch.

Yes, please, let's do it here.

> The first %s is not always an invariable symbol name so it could
> result in concatenating several phrases into one, like this.
> 
>  pg_log..("%s must be in range %s..%s", _("compression level"), "0", "9"))
> 
> It is translatable at least into Japanese but I'm not sure about other
> languages.

No, this doesn't work.  When the first word is something that is
not to be translated (a literal parameter name), let's use a %s (but of
course the values must be taken out of the phrase too).  But when it is
a translatable phrase, it must be presented as a full phrase, with no
substitution:

pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", 
"3");
pg_log..("compression level must be in range %s..%s", "0", "9"))

I think the purity test is whether you want to use a _() around the
argument; if so, you have to include it in the original message.

Thanks

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Extending amcheck to check toast size and compression

2021-07-14 Thread Mark Dilger



> On Jul 14, 2021, at 3:33 AM, Heikki Linnakangas  wrote:
> 
>> +/* The largest valid toast va_rawsize */
>> +#define VARLENA_SIZE_LIMIT 0x3FFFFFFF
>> +
> 
> Hmm, a toasted datum cannot be larger than MaxAllocSize, because it's 
> reconstituted in a palloc'd datum, right?

No datum size exceeds MaxAllocSize, and no datum expands when compressed 
(because for those that do expand under any particular compression algorithm, 
we opt to instead store the datum uncompressed), so no valid toast pointer 
should contain a va_rawsize field greater than MaxAllocSize.  Any toast 
pointers that have larger va_rawsize fields are therefore corrupt.

VARLENA_SIZE_LIMIT is defined here equal to MaxAllocSize:

src/include/utils/memutils.h:#define MaxAllocSize   ((Size) 0x3fffffff) 
/* 1 gigabyte - 1 */

Earlier versions of the patch used MaxAllocSize rather than defining 
VARLENA_SIZE_LIMIT, but review comments suggested that was less clear.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:48 PM, Dilip Kumar wrote:

On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
 wrote:





Perhaps the best way forward is to stick to the approach that INSERT
uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
and leave anything fancier (like being able to reference both versions
of the row) for a future patch.


If UPDATE works as DELETE + INSERT, does that mean both the OLD row and
the NEW row have to satisfy the filter before the change is sent?
That would mean that if we insert a row that does not satisfy the
condition (and is therefore not sent to the subscriber) and later
update that row so that the modified values do match the filter, we
will still not send it, because only the NEW row satisfies the
condition while the OLD row does not.  I am just trying to understand
your idea.  Or are you saying that in this case we will not send
anything for the OLD row, since it did not satisfy the condition, but
the modified row will be sent as an INSERT operation because it does?



Good questions. I'm not sure, I probably have not thought it through.

So yeah, I think we should probably stick to the principle that what we 
send needs to match the filter condition, which applied to this case 
would mean we should be looking at the new row version.


The more elaborate scenarios can be added later by a patch allowing to 
explicitly reference the old/new row versions.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Tomas Vondra wrote:

> The way I'm thinking about this is that for INSERT and DELETE it's clear
> which row version should be used (because there's just one). And for UPDATE
> we could see that as DELETE + INSERT, and apply the same rule to each
> action.
> 
> On the other hand, I can imagine cases where it'd be useful to send the
> UPDATE when the old row matches the condition and new row does not.

In any case, it seems to me that the condition expression should be
scanned to see which columns are used in Vars (pull_varattnos?), and
verify if those columns are in the REPLICA IDENTITY; and if they are
not, raise an error.  Most of the time the REPLICA IDENTITY is going to
be the primary key; but if the user wants to use other columns in the
expression, we can HINT that they can set REPLICA IDENTITY FULL.
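
A minimal sketch of that check, under the assumption that it runs somewhere
that has both the relation and the filter expression at hand (the varno,
error code, and wording below are illustrative, not taken from any patch):

#include "postgres.h"

#include "catalog/pg_class.h"
#include "nodes/bitmapset.h"
#include "optimizer/optimizer.h"
#include "utils/rel.h"
#include "utils/relcache.h"

static void
check_rowfilter_replident(Relation rel, Node *rowfilter)
{
    Bitmapset  *exprattrs = NULL;
    Bitmapset  *identattrs;

    /* REPLICA IDENTITY FULL covers every column, nothing to check */
    if (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
        return;

    /* columns referenced by the filter expression (varno 1 assumed) */
    pull_varattnos(rowfilter, 1, &exprattrs);

    /* replica identity columns, offset the same way as pull_varattnos */
    identattrs = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_IDENTITY_KEY);

    if (!bms_is_subset(exprattrs, identattrs))
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
                 errmsg("row filter references a column not covered by the replica identity"),
                 errhint("Consider using REPLICA IDENTITY FULL.")));
}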

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: row filtering for logical replication

2021-07-14 Thread Dilip Kumar
On Wed, Jul 14, 2021 at 8:04 PM Tomas Vondra
 wrote:
>

> Perhaps the best way forward is to stick to the approach that INSERT
> uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably),
> and leave anything fancier (like being able to reference both versions
> of the row) for a future patch.

If UPDATE works as DELETE + INSERT, does that mean both the OLD row and
the NEW row have to satisfy the filter before the change is sent?
That would mean that if we insert a row that does not satisfy the
condition (and is therefore not sent to the subscriber) and later
update that row so that the modified values do match the filter, we
will still not send it, because only the NEW row satisfies the
condition while the OLD row does not.  I am just trying to understand
your idea.  Or are you saying that in this case we will not send
anything for the OLD row, since it did not satisfy the condition, but
the modified row will be sent as an INSERT operation because it does?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: ResourceOwner refactoring

2021-07-14 Thread Heikki Linnakangas

On 14/07/2021 17:07, Alvaro Herrera wrote:

On 2021-Jul-14, vignesh C wrote:


On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:



Here you go.


The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".


Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.


Yeah, needed some manual fixing, but here you go.

- Heikki
From 19fb4369e3b7bfb29afc6cfe597af5a89698dd3f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 13 Jan 2021 12:21:28 +0200
Subject: [PATCH v8 1/3] Move a few ResourceOwnerEnlarge() calls for safety and
 clarity.

These are functions where quite a lot of things happen between the
ResourceOwnerEnlarge and ResourceOwnerRemember calls. It's important that
there are no unrelated ResourceOwnerRemember() calls in the code
inbetween, otherwise the entry reserved by the ResourceOwnerEnlarge() call
might be used up by the intervening ResourceOwnerRemember() and not be
available at the intended ResourceOwnerRemember() call anymore. The longer
the code path between them is, the harder it is to verify that.

In bufmgr.c, there is a function similar to ResourceOwnerEnlarge(),
to ensure that the private refcount array has enough space. The
ReservePrivateRefCountEntry() calls, analogous to ResourceOwnerEnlarge(),
were made at different places than the ResourceOwnerEnlarge() calls.
Move the ResourceOwnerEnlarge() calls together with the
ReservePrivateRefCountEntry() calls for consistency.
---
 src/backend/storage/buffer/bufmgr.c   | 39 +++
 src/backend/storage/buffer/localbuf.c |  3 +++
 src/backend/utils/cache/catcache.c| 13 ++---
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 86ef607ff38..2383e4cac08 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -810,9 +810,6 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 	*hit = false;
 
-	/* Make sure we will have room to remember the buffer pin */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	isExtend = (blockNum == P_NEW);
 
 	TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
@@ -1165,9 +1162,11 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	{
 		/*
 		 * Ensure, while the spinlock's not yet held, that there's a free
-		 * refcount entry.
+		 * refcount entry and that the resoure owner has room to remember the
+		 * pin.
 		 */
 		ReservePrivateRefCountEntry();
+		ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
 
 		/*
 		 * Select a victim buffer.  The buffer is returned with its header
@@ -1668,8 +1667,6 @@ ReleaseAndReadBuffer(Buffer buffer,
  * taking the buffer header lock; instead update the state variable in loop of
  * CAS operations. Hopefully it's just a single CAS.
  *
- * Note that ResourceOwnerEnlargeBuffers must have been done already.
- *
  * Returns true if buffer is BM_VALID, else false.  This provision allows
  * some callers to avoid an extra spinlock cycle.
  */
@@ -1680,6 +1677,8 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
 	bool		result;
 	PrivateRefCountEntry *ref;
 
+	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
+
 	ref = GetPrivateRefCountEntry(b, true);
 
 	if (ref == NULL)
@@ -1760,7 +1759,8 @@ PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy)
  * The spinlock is released before return.
  *
  * As this function is called with the spinlock held, the caller has to
- * previously call ReservePrivateRefCountEntry().
+ * previously call ReservePrivateRefCountEntry() and
+ * ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
  *
  * Currently, no callers of this function want to modify the buffer's
  * usage_count at all, so there's no need for a strategy parameter.
@@ -1936,9 +1936,6 @@ BufferSync(int flags)
 	int			mask = BM_DIRTY;
 	WritebackContext wb_context;
 
-	/* Make sure we can handle the pin inside SyncOneBuffer */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	/*
 	 * Unless this is a shutdown checkpoint or we have been explicitly told,
 	 * we write only permanent, dirty buffers.  But at shutdown or end of
@@ -2412,9 +2409,6 @@ BgBufferSync(WritebackContext *wb_context)
 	 * requirements, or hit the bgwriter_lru_maxpages limit.
 	 */
 
-	/* Make sure we can handle the pin inside SyncOneBuffer */
-	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
-
 	num_to_scan = bufs_to_lap;
 	num_written = 0;
 	reusable_buffers = reusable_buffers_est;
@@ -2496,8 +2490,6 @@ BgBufferSync(WritebackContext *wb_context)
  *
  * (BUF_WRITTEN could be set in error if FlushBuffer finds the buffer clean
  * after locking it, but we don't care all that much.)
- *
- * Note: caller must have done ResourceOwnerEnlargeBuffers.
  */
 static int
 SyncOneBuffer(int buf_id, bool skip_recently_used, WritebackContext *wb_context)
@@ -2507,7 +2499,9 @@ 

Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra

On 7/14/21 2:50 PM, Amit Kapila wrote:

On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
 wrote:


On 7/14/21 7:39 AM, Amit Kapila wrote:

On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:


On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)



+1.


I thought about it but came to the conclusion that it isn't worth it.  Even
with REPLICA IDENTITY FULL the expression evaluates to false if the column allows
NULL values. Besides that, REPLICA IDENTITY is changed via another DDL (ALTER
TABLE), and you have to make sure you don't allow changing the REPLICA IDENTITY
when some row filter uses a column you want to remove from it.



Yeah, that is required but is it not feasible to do so?


2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

New tuple. The main reason is that new tuple is always there for UPDATEs.



I am not sure if that is a very good reason to use a new tuple.



True. Perhaps we should look at other places with similar concept of
WHERE conditions and old/new rows, and try to be consistent with those?

I can think of:

1) updatable views with CHECK option

2) row-level security

3) triggers

Is there some reasonable rule which of the old/new tuples (or both) to
use for the WHERE condition? Or maybe it'd be handy to allow referencing
OLD/NEW as in triggers?



I think apart from the above, it might be good if we can find what
some other databases does in this regard?



Yeah, that might tell us what the users would like to do with it. I did 
a quick search, but haven't found much :-( The one thing I found is 
that Debezium [1] allows accessing both the "old" and "new" rows through 
value.before and value.after, and using both for filtering.


I haven't found much about how this works in other databases, sadly.

Perhaps the best way forward is to stick to the approach that INSERT 
uses new, DELETE uses old and UPDATE works as DELETE+INSERT (probably), 
and leave anything fancier (like being able to reference both versions 
of the row) for a future patch.



[1] 
https://wanna-joke.com/wp-content/uploads/2015/01/german-translation-comics-science.jpg


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: row filtering for logical replication

2021-07-14 Thread Tomas Vondra




On 7/14/21 4:01 PM, Alvaro Herrera wrote:

On 2021-Jul-14, Dilip Kumar wrote:


I think for INSERT we only allow those rows which match the filter
conditions to be replicated, so if we update any row we should
maintain that sanity too, right? That means at least to the NEW rows
we should apply the filter, IMHO.  That said, if a row was inserted
that satisfied the filter and was replicated, and we later update it
with new values that do not satisfy the filter, it will not be
replicated.  I think that makes sense, because if an insert does not
send a row that fails the filter to the replica, why should an update
have to, right?


Right, that's a good aspect to think about.



I agree, that seems like a reasonable approach.

The way I'm thinking about this is that for INSERT and DELETE it's clear 
which row version should be used (because there's just one). And for 
UPDATE we could see that as DELETE + INSERT, and apply the same rule to 
each action.


On the other hand, I can imagine cases where it'd be useful to send the 
UPDATE when the old row matches the condition and new row does not.



I think the guiding principle for which tuple to use for the filter is
what is most useful to the potential user of the feature, rather than
what is the easiest to implement.



+1

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Introduce pg_receivewal gzip compression tests

2021-07-14 Thread gkokolatos


‐‐‐ Original Message ‐‐‐

On Wednesday, July 14th, 2021 at 04:17, Michael Paquier  
wrote:

> On Tue, Jul 13, 2021 at 11:16:06AM +, gkokola...@pm.me wrote:
> > Agreed. For the record that is why I said v6 :)
> Okay, thanks.

Please find v6 attached.

Cheers,
//Georgios

> ---
>
> Michael

v6-0001-Introduce-pg_receivewal-gzip-compression-tests.patch
Description: Binary data


Re: ResourceOwner refactoring

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, vignesh C wrote:

> On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:

> > Here you go.
> 
> The patch does not apply on HEAD anymore; could you rebase and post an
> updated patch? I'm changing the status to "Waiting for Author".

Support for hmac was added by e6bdfd9700eb so the rebase is not trivial.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Ellos andaban todos desnudos como su madre los parió, y también las mujeres,
aunque no vi más que una, harto moza, y todos los que yo vi eran todos
mancebos, que ninguno vi de edad de más de XXX años" (Cristóbal Colón)




Re: Unused function parameter in get_qual_from_partbound()

2021-07-14 Thread John Naylor
On Mon, Jul 12, 2021 at 8:46 AM John Naylor 
wrote:
>
> On Tue, Jun 8, 2021 at 10:50 PM Michael Paquier 
wrote:
> >
> > At first glance, this looked to me like breaking something just for
> > the sake of breaking it, but removing the rel argument could be helpful
> > to simplify any external code calling it as there would be no need for
> > this extra Relation.  So that looks like a good idea, no need to rush
> > that into 14 though.
>
> I found no external references in codesearch.debian.net. I plan to commit
this in the next couple of days unless there are objections.

This has been committed.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: row filtering for logical replication

2021-07-14 Thread Alvaro Herrera
On 2021-Jul-14, Dilip Kumar wrote:

> I think for INSERT we only allow those rows which match the filter
> conditions to be replicated, so if we update any row we should
> maintain that sanity too, right? That means at least to the NEW rows
> we should apply the filter, IMHO.  That said, if a row was inserted
> that satisfied the filter and was replicated, and we later update it
> with new values that do not satisfy the filter, it will not be
> replicated.  I think that makes sense, because if an insert does not
> send a row that fails the filter to the replica, why should an update
> have to, right?

Right, that's a good aspect to think about.

I think the guiding principle for which tuple to use for the filter is
what is most useful to the potential user of the feature, rather than
what is the easiest to implement.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Álvarez)




Re: Toast compression method options

2021-07-14 Thread Dilip Kumar
On Tue, Jun 22, 2021 at 1:37 PM Michael Paquier  wrote:
>
> On Tue, Jun 22, 2021 at 11:05:22AM +0530, Dilip Kumar wrote:
> > IMHO there is certainly a use case, basically, if we compress the data
> > so that we can avoid storing it externally.  Now suppose for some
> > data, with default LZ4 compression, the compression ratio is so high
> > that you are able to compress to the size which is way under the
> > limit.  So for such data, the acceleration can be increased such that
> > compression is fast and compression ratio is good enough that it is
> > not going to the external storage.  I agree it will be difficult for
> > the user to make such a decision and select the acceleration value but
> > based on the data pattern and their compressed length the admin can
> > make such a decision.  So in short select the acceleration value such
> > that you can compress it fast and the compression ratio is good enough
> > to keep it from storing externally.
>
> Theoretically, I agree that there could be a use case, and that was
> the point I was trying to outline above.  My point is more from a
> practical point of view.  LZ4 is designed to be fast and cheap in CPU
> with a rather low compression ratio compared to other modern algos.
>
> Could it be possible to think about some worst cases where one may
> want to reduce its compression to save some CPU?  The point, as you
> say, to allow a tuning of the acceleration would be that one may want
> to save a bit of CPU and does not care about the extra disk space it
> takes.  Still, I am wondering why one would not just store the values
> externally in such cases and just save as much compression effort as
> possible.

Well, that actually depends upon the data.  Basically, LZ4 acceleration
searches in wider increments, which may reduce the number of potential
matches but increases the speed.  So depending on the actual data
pattern, it is quite possible to get the speed benefit while losing
little or nothing on the compression ratio.  Since this is a
user-exposed option, why would it not be wise to let the user set the
acceleration when they are sure that, for their data pattern, choosing
a better speed will not hurt the compression ratio?
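
For reference, on the LZ4 side this maps directly onto the last argument of
LZ4_compress_fast(); LZ4_compress_default() behaves like acceleration = 1.
A standalone illustration (not the patch itself):

#include <lz4.h>

/*
 * Returns the compressed size, or 0 if dst is too small.  Higher
 * acceleration trades compression ratio for speed.
 */
static int
compress_with_acceleration(const char *src, int srclen,
                           char *dst, int dstcap, int acceleration)
{
    return LZ4_compress_fast(src, dst, srclen, dstcap, acceleration);
}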

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-07-14 Thread Peter Eisentraut



On 05.01.21 22:40, Paul Martinez wrote:

I've created a patch to better support referential integrity constraints when
using composite primary and foreign keys. This patch allows creating a foreign
key using the syntax:

   FOREIGN KEY (tenant_id, fk_id) REFERENCES fktable ON DELETE SET NULL (fk_id)

which means that only the fk_id column will be set to NULL when the referenced
row is deleted, rather than both the tenant_id and fk_id columns.


I think this is an interesting feature with a legitimate use case.

I'm wondering a bit about what the ON UPDATE side of this is supposed to 
mean.  Consider your example:



CREATE TABLE tenants (id serial PRIMARY KEY);
CREATE TABLE users (
   tenant_id int REFERENCES tenants ON DELETE CASCADE,
   id serial,
   PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
 tenant_id int REFERENCES tenants ON DELETE CASCADE,
 id serial,
 author_id int,
 PRIMARY KEY (tenant_id, id),
 FOREIGN KEY (tenant_id, author_id) REFERENCES users ON DELETE SET NULL
);

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, null).


Consider what should happen when you update users.id.  Per SQL standard, 
for MATCH SIMPLE an ON UPDATE SET NULL should only set to null the 
referencing column that corresponds to the referenced column actually 
updated, not all of them.  PostgreSQL doesn't do this, but if it did, 
then this would work just fine.


Your feature requires specifying a fixed column or columns to update, so 
it cannot react differently depending on which column was actually updated. In fact, 
you might want different referential actions depending on what columns 
are updated, like what you can do with general triggers.


So, unless I'm missing an angle here, I would suggest leaving out the ON 
UPDATE variant of this feature.





Re: [PATCH] Hooks at XactCommand level

2021-07-14 Thread Gilles Darold

Hi,


I have renamed the patch and the title of this proposal registered in 
the commitfest, "Xact/SubXact event callback at command start", to reflect 
the latest changes, which no longer add new hooks.



Here is the new description corresponding to the current patch.


This patch allows executing user-defined code at the start of any 
command through a registered xact callback. It introduces two new events 
in the XactEvent and SubXactEvent enums, called respectively 
XACT_EVENT_COMMAND_START and SUBXACT_EVENT_COMMAND_START. The callback 
is not called if no transaction is started.



The objective of this new callback is to be able to call user-defined 
code before any new statement is executed. For example, it can issue a 
rollback to savepoint if there was an error in the previous transaction 
statement, which allows implementing rollback at statement level on the 
server side using a PostgreSQL extension; see [1].
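
To make the intended usage concrete, an extension would hook the new event
roughly like this (a minimal sketch that assumes the patch is applied; the
callback body is a placeholder for the user-defined code):

#include "postgres.h"

#include "access/xact.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static void
command_start_callback(XactEvent event, void *arg)
{
    if (event != XACT_EVENT_COMMAND_START)
        return;

    /*
     * Runs before the next statement of an already-started transaction,
     * e.g. to issue ROLLBACK TO SAVEPOINT if the previous statement failed.
     */
}

void
_PG_init(void)
{
    RegisterXactCallback(command_start_callback, NULL);
}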



The patch compiles, and the regression tests pass successfully with 
asserts enabled.


There is no regression test for this feature, but the extension at [1] has 
been used to validate the new callback.



The patch itself adds insignificant overhead (it merely checks for an 
existing callback definition), but it is clearly the developer's 
responsibility to evaluate the performance impact of the user-defined 
code for this callback, as it will be called before each statement. Here 
is a very simple test using pgbench -c 20 -j 8 -T 30:


    tps = 669.930274 (without user-defined code)
    tps = 640.718361 (with user-defined code from extension [1])

The overhead for this extension is around 4.5%, which I think is not so 
bad for such a feature (internally it adds RELEASE + SAVEPOINT calls 
before each write statement execution and, in case of error, a 
ROLLBACK TO SAVEPOINT).



[1] https://github.com/darold/pg_statement_rollbackv2

--
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..3b5f6bfc2d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -6118,3 +6118,32 @@ MarkSubTransactionAssigned(void)
 
 	CurrentTransactionState->assigned = true;
 }
+
+/*
+ * CallXactStartCommand
+ *
+ * Wrapper over CallXactCallbacks or CallSubXactCallbacks called in postgres.c
+ * at the end of start_xact_command(). It allows user-defined code to be executed
+ * at the start of any command through a registered xact callback. This function
+ * does nothing if a transaction is not started.
+ *
+ * The related events XactEvent/SubXactEvent are XACT_EVENT_COMMAND_START and
+ * SUBXACT_EVENT_COMMAND_START.
+ */
+void
+CallXactStartCommand(void)
+{
+	TransactionState s = CurrentTransactionState;
+
+	if (s->blockState == TBLOCK_DEFAULT || s->blockState == TBLOCK_STARTED)
+		return;
+
+	/*
+	 * Call start-of-xact callbacks with start command event
+	 */
+	if (s->parent && s->subTransactionId)
+		CallSubXactCallbacks(SUBXACT_EVENT_COMMAND_START, s->subTransactionId,
+			 s->parent->subTransactionId);
+	else
+		CallXactCallbacks(XACT_EVENT_COMMAND_START);
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..a0f4a17c51 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2708,6 +2708,16 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Instruct registered callbacks on xact that a new command is to be
+	 * executed. It allows user-defined code to be executed before any new
+	 * statement is run. It is the responsibility of the user code
+	 * to take care of the state of the transaction, which can be in an ABORT
+	 * state. The related xact events are XACT_EVENT_COMMAND_START and
+	 * SUBXACT_EVENT_COMMAND_START.
+	 */
+	CallXactStartCommand();
 }
 
 static void
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 134f6862da..190fc7151b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -119,7 +119,8 @@ typedef enum
 	XACT_EVENT_PREPARE,
 	XACT_EVENT_PRE_COMMIT,
 	XACT_EVENT_PARALLEL_PRE_COMMIT,
-	XACT_EVENT_PRE_PREPARE
+	XACT_EVENT_PRE_PREPARE,
+	XACT_EVENT_COMMAND_START
 } XactEvent;
 
 typedef void (*XactCallback) (XactEvent event, void *arg);
@@ -129,7 +130,8 @@ typedef enum
 	SUBXACT_EVENT_START_SUB,
 	SUBXACT_EVENT_COMMIT_SUB,
 	SUBXACT_EVENT_ABORT_SUB,
-	SUBXACT_EVENT_PRE_COMMIT_SUB
+	SUBXACT_EVENT_PRE_COMMIT_SUB,
+	SUBXACT_EVENT_COMMAND_START
 } SubXactEvent;
 
 typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
@@ -467,4 +469,6 @@ extern void EnterParallelMode(void);
 extern void ExitParallelMode(void);
 extern bool IsInParallelMode(void);
 
+extern void CallXactStartCommand(void);
+
 #endif			/* XACT_H */


RE: Added schema level support for publication.

2021-07-14 Thread houzj.f...@fujitsu.com
Wednesday, July 14, 2021 6:17 PM vignesh C  wrote:
> On Tue, Jul 13, 2021 at 12:06 PM tanghy.f...@fujitsu.com
>  wrote:
> >
> > On Monday, July 12, 2021 5:36 PM vignesh C  wrote:
> > >
> > > Thanks for reporting this issue, this issue is fixed in the v10
> > > patch attached at [1].
> > > [1] - https://www.postgresql.org/message-id/CALDaNm2%2BtR%2B8R-
> > > sD1CSyMbZcZbkintZE-avefjsp7LCkm6HMmw%40mail.gmail.com
> >
> > Thanks for fixing it.
> >
> > By applying your V10 patch, I saw three problems, please have a look.
> >
> > 1. An issue about pg_dump.
> > When public schema was published, the publication was created in the
> > output file, but public schema was not added to it. (Other schemas
> > could be added as expected.)
> >
> > I looked into it and found that selectDumpableNamespace function marks
> DUMP_COMPONENT_DEFINITION as needless when the schema is public,
> which leads to the public schema being ignored in getPublicationSchemas. So we'd better
> check whether schemas should be dumped in another way.
> >
> > I tried to fix it with the following change, please have a look.
> > (Maybe we also need to add some comments for it.)
> >
> > diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
> > index f6b4f12648..a327d2568b 100644
> > --- a/src/bin/pg_dump/pg_dump.c
> > +++ b/src/bin/pg_dump/pg_dump.c
> > @@ -4206,7 +4206,8 @@ getPublicationSchemas(Archive *fout,
> NamespaceInfo nspinfo[], int numSchemas)
> >  * Ignore publication membership of schemas whose
> definitions are not
> >  * to be dumped.
> >  */
> > -   if (!(nsinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION))
> > +   if (!((nsinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION)
> > +   || (strcmp(nsinfo->dobj.name, "public") == 0
> > + && nsinfo->dobj.dump != DUMP_COMPONENT_NONE)))
> > continue;
> >
> > pg_log_info("reading publication membership for schema
> > \"%s\"",
> 
> I felt it is intentionally done like that, as the public schema is created by
> default and hence does not need to be dumped; otherwise we would get errors
> while restoring. Thoughts?

Thanks for the new patches and I also looked at this issue.

For user defined schema and publication:
--
create schema s1;
create publication pub2 for SCHEMA s1;
--

pg_dump will only generate the following SQLs:

--pg_dump result--
CREATE PUBLICATION pub2 WITH (publish = 'insert, update, delete, truncate');
ALTER PUBLICATION pub2 ADD SCHEMA s1;
--

But for the public schema:
--
create publication pub for SCHEMA public;
--

pg_dump will only generate the following SQL:

--pg_dump result--
CREATE PUBLICATION pub WITH (publish = 'insert, update, delete, truncate');
--

It didn't generate SQL like "ALTER PUBLICATION pub ADD SCHEMA public;", which
means the public schema won't be published after restoring. So I think we'd
better make pg_dump generate the ADD SCHEMA public SQL as well. Thoughts?


Best regards,
Hou zhijie


Re: row filtering for logical replication

2021-07-14 Thread Amit Kapila
On Wed, Jul 14, 2021 at 3:58 PM Tomas Vondra
 wrote:
>
> On 7/14/21 7:39 AM, Amit Kapila wrote:
> > On Wed, Jul 14, 2021 at 6:28 AM Euler Taveira  wrote:
> >>
> >> On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
> >>
> >> 1. if you use REPLICA IDENTITY FULL, then the expressions would work
> >> even if they use any other column with DELETE.  Maybe it would be
> >> reasonable to test for this in the code and raise an error if the
> >> expression requires a column that's not part of the replica identity.
> >> (But that could be relaxed if the publication does not publish
> >> updates/deletes.)
> >>
> >
> > +1.
> >
> >> I thought about it but came to the conclusion that it isn't worth it.  Even
> >> with REPLICA IDENTITY FULL the expression evaluates to false if the column
> >> allows NULL values. Besides that, REPLICA IDENTITY is changed via another DDL
> >> (ALTER TABLE), and you have to make sure you don't allow changing the REPLICA
> >> IDENTITY when some row filter uses a column you want to remove from it.
> >>
> >
> > Yeah, that is required but is it not feasible to do so?
> >
> >> 2. For UPDATE, does the expression apply to the old tuple or to the new
> >> tuple?  You say it's the new tuple, but from the user point of view I
> >> think it would make more sense that it would apply to the old tuple.
> >> (Of course, if you're thinking that the R.I. is the PK and the PK is
> >> never changed, then you don't really care which one it is, but I bet
> >> that some people would not like that assumption.)
> >>
> >> New tuple. The main reason is that new tuple is always there for UPDATEs.
> >>
> >
> > I am not sure if that is a very good reason to use a new tuple.
> >
>
> True. Perhaps we should look at other places with similar concept of
> WHERE conditions and old/new rows, and try to be consistent with those?
>
> I can think of:
>
> 1) updatable views with CHECK option
>
> 2) row-level security
>
> 3) triggers
>
> Is there some reasonable rule which of the old/new tuples (or both) to
> use for the WHERE condition? Or maybe it'd be handy to allow referencing
> OLD/NEW as in triggers?
>

I think apart from the above, it might be good if we can find what
some other databases does in this regard?

-- 
With Regards,
Amit Kapila.




Re: Re: Cache relation sizes?

2021-07-14 Thread Anastasia Lubennikova
On Wed, Jun 16, 2021 at 9:24 AM Thomas Munro  wrote:

> No change yet, just posting a rebase to keep cfbot happy.
>
>
Hi, Thomas

I think that the proposed feature is pretty cool not only because it fixes
some old issues with lseek() performance and reliability, but also because
it opens the door to some new features, such as disk size quota management
[1]. Cheap up-to-date size tracking is one of the major problems for it.

I've only read the 1st patch so far. Here are some thoughts about it:

- SMgrSharedRelation - what about calling it SMgrShmemRelation. It looks
weird too, but "...SharedRelation" makes me think of shared catalog tables.

-  We can exclude temp relations from consideration, as they are never
shared, and use RelFileNode instead of RelFileNodeBackend in
SMgrSharedRelationMapping.
Not only will it save us some space, but it will also help to avoid a
situation where temp relations occupy the whole cache (which may easily
happen in some workloads).
I assume you were trying to make a generic solution, but in practice temp
rels differ from regular ones and may require different optimizations.
Local caching will be enough for them if ever needed; for example, we can
keep smgr_cached_nblocks[] for temp relations.
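
For anyone not looking at the headers, the only difference between the two
key types is the backend field (paraphrased from
src/include/storage/relfilenode.h):

typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

typedef struct RelFileNodeBackend
{
    RelFileNode node;
    BackendId   backend;        /* InvalidBackendId unless the relation
                                 * is backend-local (temp) */
} RelFileNodeBackend;

If temp relations never enter the shared cache, the backend field carries no
information there, so RelFileNode is enough as the mapping key.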

>  int smgr_shared_relations = 1000;
- How bad would it be to keep cache entries for all relations?  Let's test
memory overhead on some realistic datasets. I suppose this default of 1000
was chosen just for the WIP patch.
I wonder if we can use DSM instead of regular shared memory?

- I'd expect that CREATE DATABASE and ALTER DATABASE SET TABLESPACE require
special treatment, because they bypass smgr, just like dropdb. I don't see
that handled in the patch.

[1] https://github.com/greenplum-db/diskquota

-- 
Best regards,
Lubennikova Anastasia


Re: logical replication empty transactions

2021-07-14 Thread Ajin Cherian
On Thu, May 27, 2021 at 8:58 PM vignesh C  wrote:

> Thanks for the updated patch, few comments:
> 1) I'm not sure if we could add some tests for skip empty
> transactions, if possible add a few tests.
>
Added a few tests for prepared transactions; the existing test in
020_messages.pl also covers regular transactions.

> 2) We could add some debug level log messages for the transaction that
> will be skipped.

Added.

>
> 3) You could keep this variable below the other bool variables in the 
> structure:
> +   bool        sent_begin_txn; /* flag indicating whether begin
> +                                * has already been sent */
> +

I've moved this variable around, so this comment no longer is valid.

>
> 4) You can split the comments to multi-line as it exceeds 80 chars
> +   /* output BEGIN if we haven't yet, avoid for streaming and
> non-transactional messages */
> +   if (!data->sent_begin_txn && !in_streaming && transactional)
> +   pgoutput_begin(ctx, txn);

Done.

I've had to rebase the patch after a recent commit by Amit Kapila of
supporting two-phase commits in pub-sub [1].
Also I've modified the patch to also skip replicating empty prepared
transactions. Do let me know if you have any comments.

regards,
Ajin Cherian
Fujitsu Australia
[1]- 
https://www.postgresql.org/message-id/CAHut+PueG6u3vwG8DU=JhJiWa2TwmZ=bdqpchzkbky7ykza...@mail.gmail.com


v7-0001-Skip-empty-transactions-for-logical-replication.patch
Description: Binary data


SI messages sent when excuting ROLLBACK PREPARED command

2021-07-14 Thread liuhuail...@fujitsu.com
Hi

While reading the code of FinishPreparedTransaction, I found that SI messages
are sent when executing the ROLLBACK PREPARED command.

But according to the AtEOXact_Inval function, we send SI messages only when
committing a transaction.

So I think we needn't send SI messages when rolling back a two-phase
transaction.
Or is there something special here because of two-phase transactions?
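
For context, the commit/abort asymmetry has roughly this shape (paraphrased;
the real function in src/backend/utils/cache/inval.c does considerably more
bookkeeping):

void
AtEOXact_Inval(bool isCommit)
{
    if (isCommit)
    {
        /*
         * Fold the last command's messages in and broadcast everything,
         * e.g. via SendSharedInvalidMessages().
         */
    }
    else
    {
        /*
         * On abort: apply the accumulated invalidations locally only
         * (LocalExecuteInvalidationMessage); nothing is sent to other
         * backends.
         */
    }
}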


Regards



Re: ResourceOwner refactoring

2021-07-14 Thread vignesh C
On Tue, Mar 9, 2021 at 6:10 PM Heikki Linnakangas  wrote:
>
> On 08/03/2021 18:47, Ibrar Ahmed wrote:
> > The patchset does not apply successfully, there are some hunk failures.
> >
> > http://cfbot.cputube.org/patch_32_2834.log
> > 
> >
> > v6-0002-Make-resowners-more-easily-extensible.patch
> >
> > 1 out of 6 hunks FAILED -- saving rejects to file
> > src/backend/utils/cache/plancache.c.rej
> > 2 out of 15 hunks FAILED -- saving rejects to file
> > src/backend/utils/resowner/resowner.c.rej
> >
> >
> > Can we get a rebase?
>
> Here you go.

The patch does not apply on HEAD anymore; could you rebase and post an
updated patch? I'm changing the status to "Waiting for Author".

Regards,
Vignesh



