Re: GUC flags
On Tue, Nov 30, 2021 at 03:36:45PM +0900, Michael Paquier wrote:
> -  gettext_noop("Forces a switch to the next WAL file if a "
> -               "new file has not been started within N seconds."),
> +  gettext_noop("Sets the amount of time to wait before forcing a "
> +               "switch to the next WAL file."),
..
> -  gettext_noop("Waits N seconds on connection startup after authentication."),
> +  gettext_noop("Sets the amount of seconds to wait on connection "
> +               "startup after authentication."),

"amount of time", like above.

> -  gettext_noop("Waits N seconds on connection startup before authentication."),
> +  gettext_noop("Sets the amount of seconds to wait on connection "
> +               "startup before authentication."),

same

>  {
>   {"checkpoint_warning", PGC_SIGHUP, WAL_CHECKPOINTS,
> -  gettext_noop("Enables warnings if checkpoint segments are filled more "
> -               "frequently than this."),
> +  gettext_noop("Sets the maximum time before warning if checkpoints "
> +               "triggered by WAL volume happen too frequently."),

>   gettext_noop("Write a message to the server log if checkpoints "
> -               "caused by the filling of checkpoint segment files happens more "
> +               "caused by the filling of WAL segment files happens more "

It should say "happen", since it's referring to "checkpoints".  That was a
pre-existing issue.

>   {"log_parameter_max_length", PGC_SUSET, LOGGING_WHAT,
> -  gettext_noop("When logging statements, limit logged parameter values to first N bytes."),
> +  gettext_noop("Sets the maximum amount of data logged for bind "
> +               "parameter values when logging statements."),

I think this one should actually say "in bytes", or at least say "maximum
length".  It seems unlikely that someone is going to specify this in other
units, and it's confusing to everyone else to refer to "amount of data"
instead of "length in bytes".

>   {"log_parameter_max_length_on_error", PGC_USERSET, LOGGING_WHAT,
> -  gettext_noop("When reporting an error, limit logged parameter values to first N bytes."),
> +  gettext_noop("Sets the maximum amount of data logged for bind "
> +               "parameter values when logging statements, on error."),

same

> -  gettext_noop("Automatic log file rotation will occur after N minutes."),
> +  gettext_noop("Sets the maximum amount of time to wait before "
> +               "forcing log file rotation."),

Should it say "maximum"?  Does that mean anything?

--
Justin
Re: fix a typo in slotfuncs.c
On Wed, Dec 01, 2021 at 12:22:30PM +0530, Bharath Rupireddy wrote:
> It seems like there's a following typo in code comments:
> -  /* determine how many segments slots can be kept by slots */
> +  /* determine how many segments can be kept by slots */
>
> Attaching a tiny patch to fix it. This typo exists all the way until PG 13.

Indeed, thanks.  I'll fix in a bit.

--
Michael
Re: pg_upgrade test for binary compatibility of core data types
On Thu, Nov 18, 2021 at 03:58:18PM +0900, Michael Paquier wrote:
> +  ver >= 905 AND ver <= 1300 AS oldpgversion_95_13,
> +  ver >= 906 AND ver <= 1300 AS oldpgversion_96_13,
> +  ver >= 906 AND ver <= 1000 AS oldpgversion_96_10,
> So here, we have the choice between conditions that play with version
> ranges or we could make those checks simpler but compensate with a set
> of IF EXISTS queries.  I think that your choice is right.  The
> buildfarm mixes both styles to compensate with the cases where the
> objects are created after a drop.

So, I have come back to this part of the patch set, which moves the SQL
queries doing the pre-upgrade cleanups of the old version we upgrade from,
and decided to go with what looks like the simplest approach, relying on a
set of IF EXISTS clauses for the object types that may not exist in some
cases.  While checking the whole thing, I have noticed that some of the
operations were not really necessary.  The result is rather clean now,
with a linear organization of the version logic, so it is a no-brainer to
get that done in back-branches per the backward-compatibility argument.
I'll get that done down to 10 to maximize its influence, then I'll move on
with the buildfarm code and send a patch to plug this in and reduce the
dependencies between core and the buildfarm code.

--
Michael

From 9bfe54d8867c9c05a36976f01ed65e5b8da442f7 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 1 Dec 2021 16:08:01 +0900
Subject: [PATCH] Move SQLs for cleanups before cross-version upgrades into a
 new file

The plan is to make the buildfarm code re-use this code, as test.sh held
duplicated logic for this work.  Separating those SQLs into a new file,
with a set of \if clauses to do version checks against the old version
being upgraded, will allow the buildfarm to reuse that.  An extra
simplification is that committers will be able to control the objects
cleaned up without any need to tweak the buildfarm code, at least for the
main regression test suite.

Backpatch down to 10, to maximize its effects.
---
 src/bin/pg_upgrade/test.sh           | 52 ++--
 src/bin/pg_upgrade/upgrade_adapt.sql | 92
 2 files changed, 97 insertions(+), 47 deletions(-)
 create mode 100644 src/bin/pg_upgrade/upgrade_adapt.sql

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 8593488907..54c02bc65b 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -181,53 +181,11 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then
 	# Before dumping, tweak the database of the old instance depending
 	# on its version.
 	if [ "$newsrc" != "$oldsrc" ]; then
-		fix_sql=""
-		# Get rid of objects not feasible in later versions
-		case $oldpgversion in
-			804??)
-				fix_sql="DROP FUNCTION public.myfunc(integer);"
-				;;
-		esac
-
-		# Last appeared in v9.6
-		if [ $oldpgversion -lt 10 ]; then
-			fix_sql="$fix_sql
-				DROP FUNCTION IF EXISTS
-					public.oldstyle_length(integer, text);"
-		fi
-		# Last appeared in v13
-		if [ $oldpgversion -lt 14 ]; then
-			fix_sql="$fix_sql
-				DROP FUNCTION IF EXISTS
-					public.putenv(text);    -- last in v13
-				DROP OPERATOR IF EXISTS   -- last in v13
-					public.#@# (pg_catalog.int8, NONE),
-					public.#%# (pg_catalog.int8, NONE),
-					public.!=- (pg_catalog.int8, NONE),
-					public.#@%# (pg_catalog.int8, NONE);"
-		fi
-		psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
-
-		# WITH OIDS is not supported anymore in v12, so remove support
-		# for any relations marked as such.
-		if [ $oldpgversion -lt 12 ]; then
-			fix_sql="DO \$stmt\$
-				DECLARE
-					rec text;
-				BEGIN
-				FOR rec in
-					SELECT oid::regclass::text
-					FROM pg_class
-					WHERE relname !~ '^pg_'
-						AND relhasoids
-						AND relkind in ('r','m')
-					ORDER BY 1
-				LOOP
-					execute 'ALTER TABLE ' || rec || ' SET WITHOUT OIDS';
-				END LOOP;
-				END; \$stmt\$;"
-			psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
-		fi
+		# This SQL script has its own idea of the cleanup that needs to be
+		# done and embeds version checks.  Note that this uses the script
+		# stored on the new branch.
+		psql -X -d regression -f "$newsrc/src/bin/pg_upgrade/upgrade_adapt.sql" \
+			|| psql_fix_sql_status=$?

 		# Handling of --extra-float-digits gets messy after v12.
 		# Note that this changes the dumps from the old and new
diff --git a/src/bin/pg_upgrade/upgrade_adapt.sql b/src/bin/pg_upgrade/upgrade_adapt.sql
new file mode 100644
index 00..175b2ebe2e
--- /dev/null
+++ b/src/bin/pg_upgrade/upgrade_adapt.sql
@@ -0,0 +1,92 @@
+--
+-- SQL queries for upgrade tests across different major versions.
+--
+-- This file includes a set of SQL queries to make a cluster to-be-upgraded
+-- compatible with the version this file is based on.  Note that this
+-- requires psql, as per-version queries are controlled with a set of \if
+-- clauses.
+
+-- This script is backward-compatible, so it is
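The old test.sh logic above gates each cleanup on the old server's version number. As an illustrative sketch (not PostgreSQL code), the same gating can be expressed as a function from version to cleanup statements; the 3-4 digit version convention (905 for 9.5, 1300 for 13) follows the quoted `oldpgversion_*` flags, and the exact drop targets are taken from the diff above:

```python
# Hypothetical sketch mirroring the version-gated cleanup in test.sh /
# upgrade_adapt.sql.  Versions use the 905 (= 9.5) / 1300 (= 13) style
# from the quoted query; thresholds are assumptions for illustration.
def cleanup_sql_for(old_version: int) -> list[str]:
    """Return the pre-upgrade cleanup statements for an old cluster."""
    stmts = []
    if old_version < 1000:   # function last appeared in v9.6
        stmts.append("DROP FUNCTION IF EXISTS public.oldstyle_length(integer, text);")
    if old_version < 1400:   # these objects last appeared in v13
        stmts.append("DROP FUNCTION IF EXISTS public.putenv(text);")
    if old_version < 1200:   # WITH OIDS support removed in v12
        stmts.append("-- ALTER TABLE ... SET WITHOUT OIDS for each relation")
    return stmts
```

For example, a v13 cluster only needs the `putenv` drop, while a v14+ cluster needs nothing, which is why a single linear script with version checks stays readable.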
Re: Commitfest 2021-11 Patch Triage - Part 1
On Mon, 29 Nov 2021 09:03:06 -0300 Marcos Pegoraro wrote:
> > > > 2138: Incremental Materialized View Maintenance
> >
> > I've responded to it in the following thread, and described the recent
> > discussions, current status, summary of the IVM feature and design, and
> > past discussions.
>
> IVM is a wonderful feature, but some features were omitted because of its
> complexity, so maybe IVM will need more time to completely solve what it
> wants to do.
> I did not understand, and didn't find in the docs: if a materialized view
> is a table, why can't I change a specific record?
> Because if I had a way to create and refresh the entire view and also
> update a single record, it would give me all the power IVM is trying to
> provide.

I think the reason why we can't update a materialized view directly is
that it is basically a "view", and it should not contain any data
irrelevant to its definition and underlying tables.  If we had a feature
to update a materialized view directly, maybe it should behave as an
updatable view, like normal (virtual) views do, although I am not sure.

--
Yugo NAGATA
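The invariant being described can be shown with a toy model (this is illustrative Python, not the IVM patch itself): a "materialized view" of per-group counts is maintained incrementally as base rows arrive, and must always equal a full recomputation. An arbitrary direct update to the view would break exactly that invariant.

```python
# Toy model: view = SELECT key, count(*) FROM base GROUP BY key,
# maintained incrementally.  Direct updates to `view` would make it
# disagree with recompute(), which is why a matview isn't just a table.
from collections import Counter

base = []            # base table: a list of group keys
view = Counter()     # "materialized view" of counts per key

def insert(key):
    base.append(key)
    view[key] += 1           # incremental maintenance: apply only the delta

def recompute():
    return Counter(base)     # the full-REFRESH equivalent

insert("a"); insert("a"); insert("b")
assert view == recompute()   # invariant holds after each incremental step
```

The point of IVM is to keep that invariant while applying only deltas, instead of recomputing the whole view on every refresh.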
Re: pg_replslotdata - a tool for displaying replication slot information
On Tue, Nov 30, 2021 at 9:47 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
> On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan wrote:
> >
> > On 11/30/21, 6:14 AM, "Peter Eisentraut"
> > <peter.eisentr...@enterprisedb.com> wrote:
> > > On 23.11.21 06:09, Bharath Rupireddy wrote:
> > >> The replication slots data is stored in binary format on the disk under
> > >> the pg_replslot/<> directory which isn't human readable. If the server
> > >> is crashed/down (for whatever reasons) and unable to come up, currently
> > >> there's no way for the user/admin/developer to know what were all the
> > >> replication slots available at the time of server crash/down to figure
> > >> out what's the restart lsn, xid, two phase info or types of slots etc.
> > >
> > > What do you need that for? You can't do anything with a replication
> > > slot while the server is down.
> >
> > One use-case might be to discover the value you need to set for
> > max_replication_slots, although it's pretty trivial to discover the
> > number of replication slots by looking at the folder directly.
>
> Apart from the above use-case, one can do some exploratory analysis on
> the replication slot information after a server crash; this may be
> useful for RCA or debugging purposes, for instance:
> 1) to look at the restart_lsn of the slots to get to know why there
> were many WAL files filled up on the disk (because of the restart_lsn
> being low)

In a disk-full scenario caused by WAL, this tool comes in handy for
identifying which WAL files can be deleted to free up space, and also
helps assess accidental deletion of WAL files.  I am not sure if there is
a tool to help clean up WAL (maybe invoking the archive_command too?)
without impacting physical/logical slots and while respecting the last
checkpoint location, but if one exists, that would be handy.
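The "which WAL files are still needed" question above boils down to mapping each slot's restart_lsn to a WAL segment and keeping everything from the oldest such segment onward. A minimal sketch, assuming the default 16 MB segment size and a simplified segment-number model (not a parser for real pg_wal file names):

```python
# Illustrative sketch only: segment numbering here is simply
# lsn // segment_size, not the real XLogFileName() naming scheme.
WAL_SEG_SIZE = 16 * 1024 * 1024

def oldest_segment_to_keep(slot_restart_lsns):
    """Segment number of the oldest WAL segment any slot still needs."""
    return min(lsn // WAL_SEG_SIZE for lsn in slot_restart_lsns)

# Hypothetical slots with restart_lsn values expressed in bytes.
slots = {"standby1": 5 * WAL_SEG_SIZE + 100, "decoder": 2 * WAL_SEG_SIZE}
keep_from = oldest_segment_to_keep(slots.values())
removable = [seg for seg in range(10) if seg < keep_from]
```

With these numbers, segments 0 and 1 would be removable while segment 2 onward must be kept for the logical-decoding slot, which is the kind of judgment an offline slot-inspection tool would support.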
Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?
Hi,

I recently did a small experiment to see how one can create extensions
properly in an HA (primary-standby) setup. Here are my findings:

1) ALTER SYSTEM SET or GUC (configuration parameter) settings are not
replicated to the standby.
2) CREATE EXTENSION statements are replicated to the standby.
3) If the extension doesn't need to be set up in the
shared_preload_libraries GUC, there is no need to create the extension on
the standby; it just works.
4) If the extension needs to be set up in the shared_preload_libraries
GUC, the correct way to install the extension on both primary and standby
is:
  a) set the shared_preload_libraries GUC on the primary, reload the conf,
     and restart the primary to make the GUC effective.
  b) set the shared_preload_libraries GUC on the standby, and restart the
     standby to make the GUC effective.
  c) create the extension on the primary (we don't need to create the
     extension on the standby, as the CREATE EXTENSION statements are
     replicated).
  d) verify that the extension functions work on both primary and standby.
5) Extensions which perform writes to the database may not work on the
standby, as write transactions are not allowed on the standby. However,
CREATE EXTENSION on the standby works just fine; it is the functions the
extension provides that may not work.

I think I was successful in my experiment; please let me know if anything
is wrong in what I did. Do we have documentation on how to create
extensions correctly in an HA setup? If what I did is correct and we don't
have it documented, can we have it somewhere in the existing HA-related
documentation?

Regards,
Bharath Rupireddy.
Re: pg_get_publication_tables() output duplicate relid
On Mon, Nov 29, 2021 at 2:37 PM houzj.f...@fujitsu.com wrote: > > On Wed, Nov 24, 2021 4:48 PM Amit Kapila > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > wrote: > > > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila > > wrote: > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > wrote: > > > > > As in, > > > > > do we know of any replication (initial/streaming) misbehavior > > > > > caused by the duplicate partition OIDs in this case or is the only > > > > > problem that pg_publication_tables output looks odd? > > > > > > > > The latter one but I think either we should document this or change > > > > it as we can't assume users will follow what subscriber-side code does. > > > > > > On second thought, I agree that de-duplicating partitions from this > > > view is an improvement. > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > > any pending comments including the test case? > > Attach the updated patch which address the comments so far. > > The patch only adds a testcase in publication.sql because the duplicate > output doesn't cause unexpected behavior in pub-sub test. > Thanks, the patch looks good to me. I have slightly changed the commit message in the attached. I would like to commit this only in HEAD as there is no user complaint about this and it might not stop any user's service unless it relies on this view's output for the initial table synchronization. What do you think? Bharath/Amit L., others, do you have any opinion on this matter? -- With Regards, Amit Kapila. v3-0001-De-duplicate-the-result-of-pg_publication_tables-.patch Description: Binary data
fix a typo in slotfuncs.c
Hi,

It seems like there's a following typo in code comments:
-  /* determine how many segments slots can be kept by slots */
+  /* determine how many segments can be kept by slots */

Attaching a tiny patch to fix it. This typo exists all the way until PG 13.

Regards,
Bharath Rupireddy.

Attachment: v1-0001-fix-a-typo-in-slotfuncs.c.patch (binary data)
Re: suboverflowed subtransactions concurrency performance optimize
> On 30 Nov 2021, at 17:19, Simon Riggs wrote:
>
> On Mon, 30 Aug 2021 at 11:25, Andrey Borodin wrote:
>>
>> Hi Pengcheng!
>>
>> You are solving an important problem, thank you!
>>
>>> On 30 Aug 2021, at 13:43, Pengchengliu wrote:
>>>
>>> To resolve this performance problem, we think about a solution which
>>> caches the SubtransSLRU in a local cache.
>>> First we can query the parent transaction id from the SubtransSLRU, and
>>> copy the SLRU page to a local cache page.
>>> After that, if we need to query the parent transaction id again, we can
>>> query it from the local cache directly.
>>
>> A copy of the SLRU in each backend's cache can consume a lot of memory.
>
> Yes, copying the whole SLRU into a local cache seems overkill.
>
>> Why create a copy if we can optimise the shared representation of the SLRU?
>
> transam.c uses a single-item cache to prevent thrashing from repeated
> lookups, which reduces problems with shared access to SLRUs.
> multixact.c also has something similar.
>
> I notice that subtrans.c doesn't have this, but could easily do so.
> Patch attached, which seems separate to other attempts at tuning.

I think this definitely makes sense to do.

> On review, I think it is also possible that we update subtrans ONLY if
> someone uses more than PGPROC_MAX_CACHED_SUBXIDS.
> This would make subtrans much smaller and avoid the one-entry-per-page
> behavior, which is a major source of caching.
> This would mean some light changes in GetSnapshotData().
> Let me know if that seems interesting also?

I'm afraid of unexpected performance degradation. When the system runs
fine, you provision a VM with some amount of vCPU/RAM, and then some
backend uses a little more than 64 subtransactions and the whole system
is stuck. Or will it affect only backends using more than 64
subtransactions?

Best regards,
Andrey Borodin.
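The single-item cache pattern mentioned above (as transam.c does for transaction status lookups) can be sketched as follows; this is illustrative Python, not the attached C patch. Repeated lookups of the same subtransaction's parent skip the expensive shared SLRU access entirely:

```python
# One-entry cache in front of an expensive shared lookup.  The fake
# slru_lookup below (parent = xid - 1) is an assumption for the example.
class SubtransCache:
    def __init__(self, slru_lookup):
        self.slru_lookup = slru_lookup   # expensive shared-SLRU lookup
        self.cached_xid = None           # the single cached key
        self.cached_parent = None
        self.misses = 0

    def parent_of(self, xid):
        if xid != self.cached_xid:       # only the last xid is remembered
            self.misses += 1
            self.cached_parent = self.slru_lookup(xid)
            self.cached_xid = xid
        return self.cached_parent

cache = SubtransCache(slru_lookup=lambda xid: xid - 1)
assert cache.parent_of(100) == 99
assert cache.parent_of(100) == 99   # second call served from the cache
assert cache.misses == 1
```

The design bet is that lookups are strongly clustered on the most recent xid, so even a one-entry cache removes most of the contended shared accesses without the memory cost of copying whole SLRU pages per backend.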
Re: pg_get_publication_tables() output duplicate relid
On Fri, Nov 26, 2021 at 8:38 AM Amit Kapila wrote: > > On Fri, Nov 26, 2021 at 7:10 AM houzj.f...@fujitsu.com > wrote: > > > > On Thursday, November 25, 2021 4:57 PM Amit Kapila > > wrote: > > > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote > > > > > > > > I agree with backpatching the doc fix. I've attached a diff against > > > > master, though it also appears to apply to 13 and 14 branches. > > > > > > > > > > I think we can for publish_via_partition_root, other > > > than that > > > the patch looks good to me. > > > > > > Hou-San, others, do you have any opinion about this patch and whether to > > > backpatch it or not? > > > > I think it makes sense to backpatch the doc fix, and the patch looks good > > to me. > > > > Thanks, I'll push this sometime early next week unless there are any > objections to it. > Pushed. -- With Regards, Amit Kapila.
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Nov 25, 2021 at 10:17 PM Dilip Kumar wrote:
>
> Thanks for the review and many valuable comments, I have fixed all of
> them except this comment (/* If we got a cancel signal during the copy
> of the data, quit */) because this looks fine to me. 0007, I have
> dropped from the patchset for now. I have also included fixes for
> comments given by John.

I found the following issue with the patches applied:
a server crash occurs after the following sequence of commands:

create tablespace tbsp1 location '/tbsp1';
create tablespace tbsp2 location '/tbsp2';
create database test1 tablespace tbsp1;
create database test2 template test1 tablespace tbsp2;
alter database test2 set tablespace tbsp1;
checkpoint;

The following type of message is seen in the server log:

2021-12-01 16:48:26.623 AEDT [67423] PANIC:  could not fsync file
"pg_tblspc/16385/PG_15_202111301/16387/3394": No such file or directory
2021-12-01 16:48:27.228 AEDT [67422] LOG:  checkpointer process (PID
67423) was terminated by signal 6: Aborted
2021-12-01 16:48:27.228 AEDT [67422] LOG:  terminating any other active
server processes
2021-12-01 16:48:27.233 AEDT [67422] LOG:  all server processes
terminated; reinitializing

Also (prior to running the checkpoint command above) I've seen errors like
the following when running pg_dumpall:

pg_dump: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: PANIC: could not open critical system index 2662
pg_dumpall: error: pg_dump failed on database "test2", exiting

Hopefully the above example will help in tracking down the cause.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Skipping logical replication transactions on subscriber side
On Wednesday, December 1, 2021 1:23 PM Masahiko Sawada wrote:
> On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila wrote:
> > On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote:
> > > If so, the result from the second check_sql is unstable and it's
> > > probably better to check the result only once. That is, the first
> > > check_sql includes the command and we exit from the function once we
> > > confirm the error entry is expectedly updated.
> > >
> >
> > Yeah, I think that should be fine.
>
> Okay, I've attached an updated patch. Please review it.

I agree that checking the result only once makes the test more stable.
The patch looks good to me.

Best regards,
Hou zj
Re: Update stale code comment in CheckpointerMain()
On Tue, Nov 30, 2021 at 3:09 PM Daniel Gustafsson wrote: > > > On 30 Nov 2021, at 08:00, Amul Sul wrote: > > > The attached patch updates the code comment which is no longer true > > after commit # 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd > > Agreed, but looking at this shouldn't we also tweak the comment on > RecoveryInProgress() as per the attached v2 diff? > Yes, we should -- diff looks good to me, thanks. Regards, Amul
Re: pg_waldump stucks with options --follow or -f and --stats or -z
On Mon, Nov 29, 2021 at 1:16 PM Bharath Rupireddy wrote: > > On Mon, Nov 29, 2021 at 11:09 AM Michael Paquier wrote: > > > > On Sun, Nov 28, 2021 at 12:13:08PM +0530, Bharath Rupireddy wrote: > > > Thanks. Here's the v5. > > > > By the way, one thing that I completely forgot here is that SIGINT is > > not handled on Windows. If we want to make that work for a WIN32 > > terminal, we would need to do something similar to > > src/fe_utils/cancel.c where we need to use SetConsoleCtrlHandler() and > > handle the stats print when facing CTRL_C_EVENT or CTRL_BREAK_EVENT as > > events. Perhaps we should try to think harder and have a more > > centralized facility for the handler part between a WIN32 terminal and > > SIGINT, as it is not the first time that we need this level of > > handling. Or we could just discard this issue, document its WIN32 > > limitation and paint some "#ifdef WIN32" around all the handler > > portions of the patch. > > > > I would be fine with just doing the latter for now, as this stuff is > > still useful for most users, but that's worth mentioning. Any > > opinions? > > I'm okay to have the same behaviour as pg_receivewal and pg_recvlogical tools. Here's the v6 patch that has the SIGINT handling enabled for non-WIN32 platforms (note that I haven't specified anything in the documentation) much like pg_receivewal and pg_recvlogical. Regards, Bharath Rupireddy. v6-0001-pg_waldump-emit-stats-when-interrupted-with-SIGIN.patch Description: Binary data
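The behavior being added to pg_waldump — print the statistics gathered so far when interrupted, rather than dying silently while following WAL — follows the usual "set a flag in the handler, act on it in the main loop" pattern. A hedged sketch of that pattern (illustrative Python, not pg_waldump's C code; the stats contents are invented):

```python
# Sketch of SIGINT handling: the handler only sets a flag, and the
# summary is printed from normal control flow, mirroring how signal
# handlers should defer non-async-signal-safe work.
import os
import signal

stats = {"records": 3, "fpi": 1}   # hypothetical counters for the example
got_sigint = False

def on_sigint(signum, frame):
    global got_sigint
    got_sigint = True               # just record that we were interrupted

signal.signal(signal.SIGINT, on_sigint)
os.kill(os.getpid(), signal.SIGINT)          # stand-in for Ctrl-C

# CPython runs the Python-level handler at the next interpreter tick,
# so wait (bounded) for the flag before acting on it.
for _ in range(1_000_000):
    if got_sigint:
        break

signal.signal(signal.SIGINT, signal.SIG_DFL)  # restore default behavior
if got_sigint:
    print(f"summary: {stats['records']} records, {stats['fpi']} FPIs")
```

Note the thread's point that plain SIGINT does not exist on a WIN32 terminal, which is why the C patch guards the handler with `#ifdef`/non-WIN32 paths, as pg_receivewal and pg_recvlogical do.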
Re: Optionally automatically disable logical replication subscriptions on error
On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com wrote:
>
> On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow wrote:
> > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com wrote:
> > >
> > > This v7 uses v26 of skip xid patch [1]
> > This patch no longer applies on the latest source.
> > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for
> > the new "subdisableonerr" column of pg_subscription.
> Thanks for your review!
>
> Fixed the documentation accordingly. Further, this comment invoked some
> more refactoring of code, since I wrote some internal code related to
> 'disable_on_error' in an inconsistent order. I fixed this by keeping the
> patch's code after that of the 'two_phase' subscription option as much
> as possible.
>
> I also conducted both pgindent and pgperltidy.
>
> Now, I'll share the v8 that uses PG whose commit id is after 8d74fc9
> (pg_stat_subscription_workers).

Thanks for the updated patch, few small comments:

1) This should be changed:
+       subdisableonerr bool
+
+       If true, the subscription will be disabled when subscription
+       worker detects an error
to:
+       subdisableonerr bool
+
+       If true, the subscription will be disabled when subscription
+       worker detects non-transient errors

2) "Disable On Err" can be changed to "Disable On Error":
+       ", subtwophasestate AS \"%s\"\n"
+       ", subdisableonerr AS \"%s\"\n",
+       gettext_noop("Two phase commit"),
+       gettext_noop("Disable On Err"));

3) Can add a line in the commit message saying "Bump catalog version." as
the patch involves changing the catalog.

4) This prototype is not required, since the function is called after the
function definition:
static inline void set_apply_error_context_xact(TransactionId xid, TimestampTz ts);
static inline void reset_apply_error_context_info(void);
+static bool IsTransientError(ErrorData *edata);

5) We could use the new style here:
+       ereport(LOG,
+               (errmsg("logical replication subscription \"%s\" will be disabled due to error: %s",
+                       MySubscription->name, edata->message)));
change it to:
+       ereport(LOG,
+               errmsg("logical replication subscription \"%s\" will be disabled due to error: %s",
+                      MySubscription->name, edata->message));
Similarly it can be changed in the other ereports added.

Regards,
Vignesh
Re: pg_replslotdata - a tool for displaying replication slot information
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan wrote:
>
> On 11/30/21, 6:14 AM, "Peter Eisentraut" wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<> directory which isn't human readable. If the server
> >> is crashed/down (for whatever reasons) and unable to come up, currently
> >> there's no way for the user/admin/developer to know what were all the
> >> replication slots available at the time of server crash/down to figure
> >> out what's the restart lsn, xid, two phase info or types of slots etc.
> >
> > What do you need that for? You can't do anything with a replication
> > slot while the server is down.
>
> One use-case might be to discover the value you need to set for
> max_replication_slots, although it's pretty trivial to discover the
> number of replication slots by looking at the folder directly.

Apart from the above use-case, one can do some exploratory analysis on the
replication slot information after a server crash; this may be useful for
RCA or debugging purposes, for instance:
1) to look at the restart_lsn of the slots to get to know why many WAL
files filled up the disk (because of a restart_lsn being low)
2) to know how many replication slots existed at the time of the crash;
if required, one can choose to drop selected replication slots, or the
ones that were falling behind, to make the server come up
3) if we persist the active_pid info of the replication slot to disk
(currently we don't have this info on disk), one can get to know the
inactive replication slots at the time of the crash
4) if the primary server is down and a failover were to happen to the
standby, by looking at the replication slot information on the primary,
one can easily recreate the slots on the standby

> However, you also need to know how many replication origins there are,
> and AFAIK there isn't an easy way to read the replorigin_checkpoint
> file at the moment. IMO a utility like this should also show details
> for the replication origins. I don't have any other compelling use-
> cases at the moment, but I will say that it is typically nice from an
> administrative standpoint to be able to inspect things like this
> without logging into a running server.

Yeah, this can be added too, probably as an extra option to the proposed
pg_replslotdata tool. But for now, let's deal with the replication slot
information alone, and once this gets committed, we can extend it further
for replication origin info.

Regards,
Bharath Rupireddy.
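For a sense of what such an offline tool does mechanically, here is a purely illustrative sketch of reading a fixed-layout binary state file while the server is down. The field layout below is invented for the example; the real on-disk `ReplicationSlotOnDisk` struct differs and is version-dependent:

```python
# Illustrative only: pack and re-read a made-up slot state record.
# Magic value, version field, and layout are assumptions, not the real
# pg_replslot/*/state format.
import struct

layout = struct.Struct("<4sI Q")   # magic, version, restart_lsn (invented)
blob = layout.pack(b"SLOT", 1, 0x1_0000_0028)

magic, version, restart_lsn = layout.unpack(blob[:layout.size])
assert magic == b"SLOT" and version == 1
# Format the LSN the way PostgreSQL prints it (high32/low32 in hex).
print(f"restart_lsn: {restart_lsn >> 32:X}/{restart_lsn & 0xFFFFFFFF:X}")
```

A real implementation would also validate a checksum and handle layout changes across major versions, which is part of why shipping this as a maintained tool beats ad-hoc parsing.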
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan wrote: > > On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada wrote: > > Thanks! I'll change my parallel vacuum refactoring patch accordingly. > > Thanks again for working on that. > > > Regarding the commit, I think that there still is one place in > > lazyvacuum.c where we can change "dead tuples” to "dead items”: > > > > /* > > * Allocate the space for dead tuples. Note that this handles parallel > > * VACUUM initialization as part of allocating shared memory space used > > * for dead_items. > > */ > > dead_items_alloc(vacrel, params->nworkers); > > dead_items = vacrel->dead_items; > > Oops. Pushed a fixup for that just now. Thanks! > > > Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES > > and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose? > > That was deliberate. > > It would be a bit strange to alter these constants without also > updating the corresponding column names for the > pg_stat_progress_vacuum system view. But if I kept the definition from > system_views.sql in sync, then I would break user scripts -- for > reasons that users don't care about. That didn't seem like the right > approach. Agreed. > > Also, the system as a whole still assumes "DEAD tuples and LP_DEAD > items are the same, and are just as much of a problem in the table as > they are in each index". As you know, this is not really true, which > is an important problem for us. Fixing it (perhaps as part of adding > something like Robert's conveyor belt design) will likely require > revising this model quite fundamentally (e.g, the vacthresh > calculation in autovacuum.c:relation_needs_vacanalyze() would be > replaced). When this happens, we'll probably need to update system > views that have columns with names like "dead_tuples" -- because maybe > we no longer specifically count dead items/tuples at all. 
I strongly > suspect that the approach to statistics that we take for pg_statistic > optimizer stats just doesn't work for dead items/tuples -- statistical > sampling only produces useful statistics for the optimizer because > certain delicate assumptions are met (even these assumptions only > really work with a properly normalized database schema). > > Maybe revising the model used for autovacuum scheduling wouldn't > include changing pg_stat_progress_vacuum, since that isn't technically > "part of the model" --- I'm not sure. But it's not something that I am > in a hurry to fix. Understood. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila wrote: > > On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote: > > > > On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > > > > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > I have a question about the testcase (I could be wrong here). > > > > > > > > Is it possible that the race condition happen between apply > > > > worker(test_tab1) > > > > and table sync worker(test_tab2) ? If so, it seems the > > > > error("replication > > > > origin with OID") could happen randomly until we resolve the conflict. > > > > Based on this, for the following code: > > > > - > > > > # Wait for the error statistics to be updated. > > > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > > > $node->poll_query_until( > > > > 'postgres', $check_sql, > > > > ) or die "Timed out while waiting for statistics to be updated"; > > > > > > > > * [1] * > > > > > > > > $check_sql = > > > > qq[ > > > > SELECT subname, last_error_command, last_error_relid::regclass, > > > > last_error_count > 0 ] . $part_sql; > > > > my $result = $node->safe_psql('postgres', $check_sql); > > > > is($result, $expected, $msg); > > > > - > > > > > > > > Is it possible that the error("replication origin with OID") happen > > > > again at the > > > > place [1]. In this case, the error message we have checked could be > > > > replaced by > > > > another error("replication origin ...") and then the test fail ? > > > > > > > > > > Once we get the "duplicate key violation ..." error before * [1] * via > > > apply_worker then we shouldn't get replication origin-specific error > > > because the origin set up is done before starting to apply changes. > > > > Right. > > > > > Also, even if that or some other happens after * [1] * because of > > > errmsg_prefix check it should still succeed. > > > > In this case, the old error ("duplicate key violation ...") is > > overwritten by a new error (e.g., connection error. 
not sure how > > possible it is) > > > > Yeah, or probably some memory allocation failure. I think the > probability of such failures is very low but OTOH why take chance. > > > and the test fails because the query returns no > > entries, no? > > > > Right. > > > If so, the result from the second check_sql is unstable > > and it's probably better to check the result only once. That is, the > > first check_sql includes the command and we exit from the function > > once we confirm the error entry is expectedly updated. > > > > Yeah, I think that should be fine. Okay, I've attached an updated patch. Please review it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ v2-0001-Fix-regression-test-failure-caused-by-commit-8d74.patch Description: Binary data
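The fix discussed above amounts to making the polled check itself the only assertion, so a second read cannot observe an entry that was overwritten in between. An illustrative Python version of the TAP framework's poll-until pattern (`poll_query_until`), with a stand-in for the catalog query:

```python
# Poll until a condition holds, then treat that polled check as the
# test's single assertion.  error_entry_updated is a stand-in for the
# pg_stat_subscription_workers query used by the real test.
import time

def poll_until(check, timeout=5.0, interval=0.01):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

state = {"tries": 0}
def error_entry_updated():
    state["tries"] += 1
    return state["tries"] >= 3   # succeeds on the third poll here

assert poll_until(error_entry_updated)
```

Checking once inside the poll avoids the race where the expected error row is replaced by a later, unrelated error between the poll and a second verification query.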
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote: > > On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > > wrote: > > > > > > I have a question about the testcase (I could be wrong here). > > > > > > Is it possible that the race condition happen between apply > > > worker(test_tab1) > > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > > origin with OID") could happen randomly until we resolve the conflict. > > > Based on this, for the following code: > > > - > > > # Wait for the error statistics to be updated. > > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > > $node->poll_query_until( > > > 'postgres', $check_sql, > > > ) or die "Timed out while waiting for statistics to be updated"; > > > > > > * [1] * > > > > > > $check_sql = > > > qq[ > > > SELECT subname, last_error_command, last_error_relid::regclass, > > > last_error_count > 0 ] . $part_sql; > > > my $result = $node->safe_psql('postgres', $check_sql); > > > is($result, $expected, $msg); > > > - > > > > > > Is it possible that the error("replication origin with OID") happen again > > > at the > > > place [1]. In this case, the error message we have checked could be > > > replaced by > > > another error("replication origin ...") and then the test fail ? > > > > > > > Once we get the "duplicate key violation ..." error before * [1] * via > > apply_worker then we shouldn't get replication origin-specific error > > because the origin set up is done before starting to apply changes. > > Right. > > > Also, even if that or some other happens after * [1] * because of > > errmsg_prefix check it should still succeed. > > In this case, the old error ("duplicate key violation ...") is > overwritten by a new error (e.g., connection error. not sure how > possible it is) > Yeah, or probably some memory allocation failure. I think the probability of such failures is very low but OTOH why take chance. 
> and the test fails because the query returns no > entries, no? > Right. > If so, the result from the second check_sql is unstable > and it's probably better to check the result only once. That is, the > first check_sql includes the command and we exit from the function > once we confirm the error entry is expectedly updated. > Yeah, I think that should be fine. With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > wrote: > > > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > > > > > > > > > Shouldn't we someway check that the error message also starts with > > > > > "duplicate key value violates ..."? > > > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > > is, probably we can specify the prefix of the error message and > > > > subrelid in addition to the current conditions: relid and xid. That > > > > way, we can check what error was reported by which workers (tablesync > > > > or apply) for which relations. And both check queries in > > > > test_subscription_error() can have the same WHERE clause. > > > > > > I've attached a patch that fixes this issue. Please review it. > > > > > > > I have a question about the testcase (I could be wrong here). > > > > Is it possible that the race condition happen between apply > > worker(test_tab1) > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > origin with OID") could happen randomly until we resolve the conflict. > > Based on this, for the following code: > > - > > # Wait for the error statistics to be updated. > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > $node->poll_query_until( > > 'postgres', $check_sql, > > ) or die "Timed out while waiting for statistics to be updated"; > > > > * [1] * > > > > $check_sql = > > qq[ > > SELECT subname, last_error_command, last_error_relid::regclass, > > last_error_count > 0 ] . $part_sql; > > my $result = $node->safe_psql('postgres', $check_sql); > > is($result, $expected, $msg); > > - > > > > Is it possible that the error("replication origin with OID") happen again > > at the > > place [1]. In this case, the error message we have checked could be > > replaced by > > another error("replication origin ...") and then the test fail ? > > > > Once we get the "duplicate key violation ..." 
error before * [1] * via > apply_worker then we shouldn't get replication origin-specific error > because the origin set up is done before starting to apply changes. Right. > Also, even if that or some other happens after * [1] * because of > errmsg_prefix check it should still succeed. In this case, the old error ("duplicate key violation ...") is overwritten by a new error (e.g., connection error. not sure how possible it is) and the test fails because the query returns no entries, no? If so, the result from the second check_sql is unstable and it's probably better to check the result only once. That is, the first check_sql includes the command and we exit from the function once we confirm the error entry is expectedly updated. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
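The single-check approach proposed above — polling until the full expected row (including the error command) appears, instead of polling a weaker condition and then querying a second time — can be sketched as follows. This is an illustrative Python stand-in, not the actual TAP test API: `poll_until`, `fake_query`, and the tuples below are hypothetical, and the real test uses `poll_query_until` from the Perl test framework against `pg_stat_subscription_workers`.

```python
import time

def poll_until(run_query, expected, timeout=5.0, interval=0.1):
    """Poll run_query() until it returns `expected`, or time out.

    Checking the full expected row inside the polling loop (rather than
    polling a weaker count(*) condition first and re-querying afterwards)
    removes the window in which a second error could overwrite the entry
    the test is looking for.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if run_query() == expected:
            return True
        time.sleep(interval)
    return False

# Stand-in for the stats view: the first read still shows a transient
# "replication origin ..." error entry, a later read shows the expected
# "duplicate key ..." entry reported by the apply worker.
results = iter([
    ("tap_sub", None, "test_tab1", True),      # transient error entry
    ("tap_sub", "INSERT", "test_tab1", True),  # expected final entry
])
last = [("", None, "", False)]

def fake_query():
    last[0] = next(results, last[0])
    return last[0]

assert poll_until(fake_query, ("tap_sub", "INSERT", "test_tab1", True))
```

The point of the sketch is only the shape of the loop: once the poll condition is the complete expected result, there is no second query whose outcome could be unstable.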
RE: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 11:22 AM Amit Kapila wrote: > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > wrote: > > > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada > wrote: > > > > > > > > > Shouldn't we someway check that the error message also starts with > > > > > "duplicate key value violates ..."? > > > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > > is, probably we can specify the prefix of the error message and > > > > subrelid in addition to the current conditions: relid and xid. That > > > > way, we can check what error was reported by which workers (tablesync > > > > or apply) for which relations. And both check queries in > > > > test_subscription_error() can have the same WHERE clause. > > > > > > I've attached a patch that fixes this issue. Please review it. > > > > > > > I have a question about the testcase (I could be wrong here). > > > > Is it possible that the race condition happen between apply > worker(test_tab1) > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > origin with OID") could happen randomly until we resolve the conflict. > > Based on this, for the following code: > > - > > # Wait for the error statistics to be updated. > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > $node->poll_query_until( > > 'postgres', $check_sql, > > ) or die "Timed out while waiting for statistics to be updated"; > > > > * [1] * > > > > $check_sql = > > qq[ > > SELECT subname, last_error_command, last_error_relid::regclass, > > last_error_count > 0 ] . $part_sql; > > my $result = $node->safe_psql('postgres', $check_sql); > > is($result, $expected, $msg); > > - > > > > Is it possible that the error("replication origin with OID") happen again > > at the > > place [1]. In this case, the error message we have checked could be replaced > by > > another error("replication origin ...") and then the test fail ? > > > > Once we get the "duplicate key violation ..." 
error before * [1] * via > apply_worker then we shouldn't get replication origin-specific error > because the origin set up is done before starting to apply changes. > Also, even if that or some other happens after * [1] * because of > errmsg_prefix check it should still succeed. Does that make sense? Oh, I missed the point that the origin set up is done once we get the expected error. Thanks for the explanation, and I think the patch looks good. Best regards, Hou zj
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com wrote: > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > > > > > > > Shouldn't we someway check that the error message also starts with > > > > "duplicate key value violates ..."? > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > is, probably we can specify the prefix of the error message and > > > subrelid in addition to the current conditions: relid and xid. That > > > way, we can check what error was reported by which workers (tablesync > > > or apply) for which relations. And both check queries in > > > test_subscription_error() can have the same WHERE clause. > > > > I've attached a patch that fixes this issue. Please review it. > > > > I have a question about the testcase (I could be wrong here). > > Is it possible that the race condition happen between apply worker(test_tab1) > and table sync worker(test_tab2) ? If so, it seems the error("replication > origin with OID") could happen randomly until we resolve the conflict. > Based on this, for the following code: > - > # Wait for the error statistics to be updated. > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > $node->poll_query_until( > 'postgres', $check_sql, > ) or die "Timed out while waiting for statistics to be updated"; > > * [1] * > > $check_sql = > qq[ > SELECT subname, last_error_command, last_error_relid::regclass, > last_error_count > 0 ] . $part_sql; > my $result = $node->safe_psql('postgres', $check_sql); > is($result, $expected, $msg); > - > > Is it possible that the error("replication origin with OID") happen again at > the > place [1]. In this case, the error message we have checked could be replaced > by > another error("replication origin ...") and then the test fail ? > Once we get the "duplicate key violation ..." error before * [1] * via apply_worker then we shouldn't get replication origin-specific error because the origin set up is done before starting to apply changes. 
Also, even if that or some other happens after * [1] * because of errmsg_prefix check it should still succeed. Does that make sense? -- With Regards, Amit Kapila.
RE: Skipping logical replication transactions on subscriber side
On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada > wrote: > > > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila > wrote: > > > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C > wrote: > > > > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder > > > t=2021-11-30%2005%3A05%3A25 > > > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] > > > and I am responding here as the fix requires some more discussion. > > > > > > > Looking at the result the test actually got, we had two error > > > > entries for test_tab1 instead of one: > > > > > > > > # Failed test 'check the error reported by the apply worker' > > > > # at t/026_worker_stats.pl line 33. > > > > # got: 'tap_sub|INSERT|test_tab1|t > > > > # tap_sub||test_tab1|t' > > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > > > The possible scenarios are: > > > > > > > > The table sync worker for test_tab1 failed due to an error > > > > unrelated to apply changes: > > > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin > > > > with OID 2 is already active for PID 23706 > > > > > > > > At this time, the view had one error entry for the table sync worker. > > > > After retrying table sync, it succeeded: > > > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication > > > > table synchronization worker for subscription "tap_sub", table > "test_tab1" > > > > has finished > > > > > > > > Then after inserting a row on the publisher, the apply worker > > > > inserted the row but failed due to violating a unique key > > > > violation, which is > > > > expected: > > > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > > violates unique constraint "test_tab1_pkey" > > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already > > > > exists. 
> > > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote > > > > data during "INSERT" for replication target relation > > > > "public.test_tab1" in transaction 721 at 2021-11-30 > > > > 06:24:04.305096+01 > > > > > > > > As a result, we had two error entries for test_tab1: the table > > > > sync worker error and the apply worker error. I didn't expect that > > > > the table sync worker for test_tab1 failed due to "replication > > > > origin with OID 2 is already active for PID 23706” error. > > > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we > > > > have two checks; in the first check, we wait for the view to show > > > > the error entry for the given relation name and xid. This check > > > > was passed since we had the second error (i.g., apply worker > > > > error). In the second check, we get error entries from > > > > pg_stat_subscription_workers by specifying only the relation name. > > > > Therefore, we ended up getting two entries and failed the tests. > > > > > > > > To fix this issue, I think that in the second check, we can get > > > > the error from pg_stat_subscription_workers by specifying the > > > > relation name *and* xid like the first check does. I've attached the > > > > patch. > > > > What do you think? > > > > > > > > > > I think this will fix the reported failure but there is another race > > > condition in the test. Isn't it possible that for table test_tab2, > > > we get an error "replication origin with OID ..." or some other > > > error before copy, in that case also, we will proceed from the > > > second call of test_subscription_error() which is not what we expect in > > > the > test? > > > > Right. > > > > > Shouldn't we someway check that the error message also starts with > > > "duplicate key value violates ..."? > > > > Yeah, I think it's a good idea to make the checks more specific. 
That > > is, probably we can specify the prefix of the error message and > > subrelid in addition to the current conditions: relid and xid. That > > way, we can check what error was reported by which workers (tablesync > > or apply) for which relations. And both check queries in > > test_subscription_error() can have the same WHERE clause. > > I've attached a patch that fixes this issue. Please review it. > I have a question about the testcase (I could be wrong here). Is it possible that the race condition happen between apply worker(test_tab1) and table sync worker(test_tab2) ? If so, it seems the error("replication origin with OID") could happen randomly until we resolve the conflict. Based on this, for the following code: - # Wait for the error statistics to be updated. my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; $node->poll_query_until( 'postgres', $check_sql, ) or die "Timed out while waiting for statistics to be updated"; * [1] * $check_sql = qq[ SELECT subname, last_error_command, last_error_relid::regclass, last_error_count > 0 ] .
Re: row filtering for logical replication
On Wed, Dec 1, 2021 at 6:55 AM Euler Taveira wrote: > > On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote: > > On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote: > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? > > I agree. If you think that it might need a row to apply DML commands (UPDATE, > DELETE) in the future or that due to a row filter that row should be available > in the subscriber (INSERT-only case), it makes sense to send all rows that > satisfies any row filter. > Right and Good point. -- With Regards, Amit Kapila.
Re: [PATCH] DROP tab completion
On Tue, Nov 30, 2021 at 03:17:07PM +, Asif Rehman wrote: > The patch applies cleanly and the functionality seems to work well. (master > e7122548a3) > > The new status of this patch is: Ready for Committer + else if (Matches("DROP", "MATERIALIZED", "VIEW", MatchAny)) + COMPLETE_WITH("CASCADE", "RESTRICT"); [...] + else if (Matches("DROP", "OWNED", "BY", MatchAny)) + COMPLETE_WITH("CASCADE", "RESTRICT"); This stuff is gathered around line 3284 in tab-complete.c as of HEAD at 538724f, but I think that you have done things right as there are already sections for those commands and they have multiple keywords. So, applied. Thanks! -- Michael signature.asc Description: PGP signature
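The `Matches(...)`/`COMPLETE_WITH(...)` pattern quoted above can be illustrated with a minimal Python sketch. This is not the tab-complete.c machinery — `matches`, `complete`, and `MatchAny` here are hypothetical stand-ins that only mimic the rule shape: match the words typed so far against a pattern, with `MatchAny` as a wildcard, and return the completions.

```python
MatchAny = object()  # wildcard: matches any single word

def matches(words, *pattern):
    """True if the typed words match the pattern, position by position."""
    if len(words) != len(pattern):
        return False
    return all(p is MatchAny or w.upper() == p
               for w, p in zip(words, pattern))

def complete(words):
    # Mirrors the two rules added by the patch (illustrative only).
    if matches(words, "DROP", "MATERIALIZED", "VIEW", MatchAny):
        return ["CASCADE", "RESTRICT"]
    if matches(words, "DROP", "OWNED", "BY", MatchAny):
        return ["CASCADE", "RESTRICT"]
    return []

assert complete(["DROP", "MATERIALIZED", "VIEW", "mv1"]) == ["CASCADE", "RESTRICT"]
```

In the real code each rule is an `else if` arm in one large dispatcher, which is why the patch slots its additions into the existing sections for those commands.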
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Nov 25, 2021 at 10:17 PM Dilip Kumar wrote: > > Thanks for the review and many valuable comments, I have fixed all of > them except this comment (/* If we got a cancel signal during the copy > of the data, quit */) because this looks fine to me. 0007, I have > dropped from the patchset for now. I have also included fixes for > comments given by John. > Any progress/results yet on testing against a large database (as suggested by John Naylor) and multiple tablespaces? Thanks for the patch updates. I have some additional minor comments: 0002 (1) Tidy patch comment I suggest minor tidying of the patch comment, as follows: Support new interfaces in relmapper, 1) Support copying the relmap file from one database path to another database path. 2) Like RelationMapOidToFilenode, provide another interface which does the same but, instead of getting it for the database we are connected to, it will get it for the input database path. These interfaces are required for the next patch, for supporting the WAL-logged created database. 0003 src/include/commands/tablecmds.h (1) typedef void (*copy_relation_storage) ... The new typename "copy_relation_storage" needs to be added to src/tools/pgindent/typedefs.list 0006 src/backend/commands/dbcommands.c (1) CreateDirAndVersionFile After writing to the file, you should probably pfree(buf.data), right? Actually, I don't think StringInfo (dynamic string allocation) is needed here, since the version string is so short, so why not just use a local "char buf[16]" buffer and snprintf() the PG_MAJORVERSION+newline into that? Also (as mentioned in my first review) shouldn't the "O_TRUNC" flag be additionally specified in the case when OpenTransientFile() is tried for a 2nd time because of errno==EEXIST on the 1st attempt? (otherwise if the existing file did contain something you'd end up writing after the existing data in the file). src/backend/commands/dbcommands.c (2) typedef struct CreateDBRelInfo ... 
CreateDBRelInfo The new typename "CreateDBRelInfo" needs to be added to src/tools/pgindent/typedefs.list src/bin/pg_rewind/parsexlog.c (3) Include additional header file It seems that the following additional header file is not needed to compile the source file: +#include "utils/relmapper.h" Regards, Greg Nancarrow Fujitsu Australia
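The open-flag pattern suggested in review comment (1) above — try `O_CREAT | O_EXCL` first, and on `EEXIST` retry with `O_TRUNC` so stale contents cannot survive the write — can be sketched in Python. This is only an illustration under assumptions: the real code is C using `OpenTransientFile()`, and `create_version_file` plus the hard-coded version value are hypothetical names for this sketch.

```python
import errno
import os
import tempfile

PG_MAJORVERSION = "15"  # illustrative value, not taken from a real build

def create_version_file(dbpath):
    """Write a PG_VERSION file under dbpath, truncating any existing file."""
    path = os.path.join(dbpath, "PG_VERSION")
    try:
        # First attempt: create the file, failing if it already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
        # Second attempt: the file exists, so reopen it with O_TRUNC.
        # Without O_TRUNC, a short write at offset 0 would leave any
        # longer stale contents dangling after our data.
        fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, (PG_MAJORVERSION + "\n").encode())
        os.fsync(fd)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    # Simulate a leftover file from a previous attempt.
    with open(os.path.join(d, "PG_VERSION"), "w") as f:
        f.write("garbage leftover")
    create_version_file(d)
    assert open(os.path.join(d, "PG_VERSION")).read() == "15\n"
```

The assertion demonstrates the point of the review comment: with `O_TRUNC` on the retry, none of the pre-existing contents survive.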
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, Nov 30, 2021 at 4:54 PM Michael Paquier wrote: > On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: > > The main objections as I recall are that it is much harder for simple > backup > > scripts and commercial backup integrations to hold a connection to > postgres > > open and write the backup label separately into the backup. > > I don't quite understand why this argument would not hold even today, > even if I'd like to think that more people are using pg_basebackup. > > > I did figure out how to keep the safe part of exclusive backup (not > having > > to maintain a connection) while removing the dangerous part (writing > > backup_label into PGDATA), but it was a substantial amount of work and I > > felt that it had little chance of being committed. > > Which was, I guess, done by storing the backup_label contents within a > file different than backup_label, still maintained in the main data > folder to ensure that it gets included in the backup? > Non-exclusive backups have significant advantages over exclusive backups, but I would like to add a few comments on the simplicity of exclusive backups - 1/ It is not uncommon nowadays to take a snapshot-based backup. Exclusive backup simplifies this story as the backup label file is part of the snapshot. Otherwise, one needs to store it somewhere outside as snapshot metadata and copy this file over during restore (after creating a disk from the snapshot) to the data directory. Typical steps are: 1/ Call pg_start_backup() 2/ Take a disk snapshot 3/ Call pg_stop_backup() 4/ Mark the snapshot as consistent and add some creation-time metadata. 2/ Control plane code responsible for taking backups is simpler with exclusive backups than non-exclusive, as it doesn't maintain a connection to the server, particularly when that orchestration is outside the machine the Postgres server is running on. IMHO, we should either remove the support for it or improve it, but not leave it hanging there.
Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan wrote: > I believe that there have been several historic reasons why we need a > cleanup lock during nbtree VACUUM, and that there is only one > remaining reason for it today. So the history is unusually complicated. Minor correction: we actually also have to worry about plain index scans that don't use an MVCC snapshot, which is possible within nbtree. It's quite likely when using logical replication, actually. See the patch for more. Like with the index-only scan case, a non-MVCC snapshot + plain nbtree index scan cannot rely on heap access within the index scan node -- it won't reliably notice that any newer heap tuples (that are really the result of concurrent TID recycling) are not actually visible to its MVCC snapshot -- because there isn't an MVCC snapshot. The only difference in the index-only scan scenario is that we use the visibility map (not the heap) -- which is racey in a way that makes our MVCC snapshot (IOSs always have an MVCC snapshot) an ineffective protection. In summary, to be safe against confusion from concurrent TID recycling during index/index-only scans, we can do either of the following things: 1. Hold a pin of our leaf page while accessing the heap -- that'll definitely conflict with the cleanup lock that nbtree VACUUM will inevitably try to acquire on our leaf page. OR: 2. Hold an MVCC snapshot, AND do an actual heap page access during the plain index scan -- do both together. With approach 2, our plain index scan must determine visibility using real XIDs (against something like a dirty snapshot), rather than using a visibility map bit. That is also necessary because the VM might become invalid or ambiguous, in a way that's clearly not possible when looking at full heap tuple headers with XIDs -- concurrent recycling becomes safe if we know that we'll reliably notice it and not give wrong answers. Does that make sense? -- Peter Geoghegan
Re: row filtering for logical replication
On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote: > On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote: > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? I agree. If you think that it might need a row to apply DML commands (UPDATE, DELETE) in the future, or that due to a row filter that row should be available on the subscriber (INSERT-only case), it makes sense to send all rows that satisfy any row filter. The current code already works this way. All row filters are combined into a WHERE clause using OR. If any of the publications doesn't have a row filter, there is no WHERE clause. -- Euler Taveira EDB https://www.enterprisedb.com/
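The combination rule described above — OR the per-publication row filters together, and drop the WHERE clause entirely if any publication has no filter — can be sketched as follows. These function names are hypothetical illustrations, not the actual pgoutput/tablesync code, and the real implementation builds the clause in C and SQL.

```python
def combine_row_filters(filters):
    """Combine per-publication row filters for initial table sync.

    A filter of None means "publication has no row filter"; in that case
    every row qualifies, so the combined result is no WHERE clause at all.
    Otherwise the filters are OR'ed, since a row satisfying any one
    publication's filter should reach the subscriber.
    """
    if any(f is None for f in filters):
        return None  # no WHERE clause: copy every row
    return " OR ".join(f"({f})" for f in filters)

def copy_query(table, filters):
    """Build the (illustrative) initial-sync copy query for a table."""
    where = combine_row_filters(filters)
    if where is None:
        return f"COPY (SELECT * FROM {table}) TO STDOUT"
    return f"COPY (SELECT * FROM {table} WHERE {where}) TO STDOUT"

assert combine_row_filters(["a > 10", "b = 1"]) == "(a > 10) OR (b = 1)"
assert combine_row_filters(["a > 10", None]) is None
```

For example, `copy_query("t", ["a > 10", "b = 1"])` yields a query filtered by `(a > 10) OR (b = 1)`, while `copy_query("t", ["a > 10", None])` copies the whole table.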
Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
On Fri, Nov 5, 2021 at 3:26 AM Andrey Borodin wrote: > > On 4 Nov 2021, at 20:58, Peter Geoghegan wrote: > > That's a pretty unlikely scenario. And even > > if it happened it would only happen once (until the next time we get > > unlucky). What are the chances of somebody noticing a more or less > > once-off, slightly wrong answer? > > I'd say next to impossible, yet not impossible. Or, perhaps, I do not see > protection from this. I think that that's probably all correct -- I would certainly make the same guess. It's very unlikely to happen, and when it does happen it happens only once. > Moreover there's a "microvacuum". It kills tuples with BUFFER_LOCK_SHARE. > AFAIU it should take cleanup lock on buffer too? No, because there is no heap vacuuming involved (because that doesn't happen outside vacuumlazy.c). The work that VACUUM does inside lazy_vacuum_heap_rel() is part of the problem here -- we need an interlock between that work and index-only scans. Making LP_DEAD items in heap pages LP_UNUSED is only ever possible during a VACUUM operation (I'm sure you know why). AFAICT there would be no bug at all without that detail. I believe that there have been several historic reasons why we need a cleanup lock during nbtree VACUUM, and that there is only one remaining reason for it today. So the history is unusually complicated. But AFAICT it's always some kind of "interlock with heapam VACUUM" issue, with TID recycling, with no protection from our MVCC snapshot. I would say that that's the "real problem" here, when I get to first principles. Attached draft patch attempts to explain things in this area within the nbtree README. There is a much shorter comment about it within vacuumlazy.c. I am concerned about GiST index-only scans themselves, of course, but I discovered this issue when thinking carefully about the concurrency rules for VACUUM -- I think it's valuable to formalize and justify the general rules that index access methods must follow.
We can talk about this some more in NYC. See you soon! -- Peter Geoghegan From ea6612300e010f1f2b02119b5a0de95e46d1157d Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 3 Nov 2021 14:38:15 -0700 Subject: [PATCH v1] nbtree README: Improve VACUUM interlock section. Also document a related issue for index-only scans in vacuumlazy.c. Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com --- src/backend/access/heap/vacuumlazy.c | 10 ++ src/backend/access/nbtree/README | 145 --- 2 files changed, 75 insertions(+), 80 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 282b44f87..8bfe196bf 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -2384,6 +2384,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel) * LP_DEAD items on the page that were determined to be LP_DEAD items back * when the same page was visited by lazy_scan_prune() (i.e. those whose TID * was recorded in the dead_items array at the time). + * + * We can opportunistically set the visibility map bit for the page here, + * which is valuable when lazy_scan_prune couldn't earlier on, owing only to + * the fact that there were LP_DEAD items that we'll now mark as unused. This + * is why index AMs that support index-only scans have to hold a pin on an + * index page as an interlock against VACUUM while accessing the visibility + * map (which is really just a dense summary of visibility information in the + * heap). If they didn't do this then there would be rare race conditions + * where a heap TID that is actually dead appears alive to an unlucky + * index-only scan. 
*/ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer, diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README index 2a7332d07..c6f04d856 100644 --- a/src/backend/access/nbtree/README +++ b/src/backend/access/nbtree/README @@ -89,25 +89,28 @@ Page read locks are held only for as long as a scan is examining a page. To minimize lock/unlock traffic, an index scan always searches a leaf page to identify all the matching items at once, copying their heap tuple IDs into backend-local storage. The heap tuple IDs are then processed while -not holding any page lock within the index. We do continue to hold a pin -on the leaf page in some circumstances, to protect against concurrent -deletions (see below). In this state the scan is effectively stopped -"between" pages, either before or after the page it has pinned. This is -safe in the presence of concurrent insertions and even page splits, because -items are never moved across pre-existing page boundaries --- so the scan -cannot miss any items it should have seen, nor accidentally return the same -item twice. The scan must remember the page's right-link at the time it -was scanned, since that is the page to
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: > The main objections as I recall are that it is much harder for simple backup > scripts and commercial backup integrations to hold a connection to postgres > open and write the backup label separately into the backup. I don't quite understand why this argument would not hold even today, even if I'd like to think that more people are using pg_basebackup. > I did figure out how to keep the safe part of exclusive backup (not having > to maintain a connection) while removing the dangerous part (writing > backup_label into PGDATA), but it was a substantial amount of work and I > felt that it had little chance of being committed. Which was, I guess, done by storing the backup_label contents within a file different than backup_label, still maintained in the main data folder to ensure that it gets included in the backup? -- Michael signature.asc Description: PGP signature
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
On Wed, Dec 1, 2021 at 3:33 AM Robert Haas wrote: > On Tue, Nov 30, 2021 at 4:47 AM Andy Fan wrote: > >> my exception should be that the relcache should not be invalidated _after the first relation open_ > >> in the executor (not the beginning of executorRun). > > > > s/exception/expectation. > > > > To be more accurate, my expectation is for a single sql statement, after the first time I write data into > > the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease", > > the relcache reset should never happen. > > Well I'm not sure why you asked the question and then argued with > the answer you got. I think you misunderstand me: I argued with the answer because, after I got it and rethought my problem, I found my question description was not accurate enough, so I improved the question and was willing to discuss it again. My expectation was that things would continue with something like this: 1. In your newly described situation, your solution still does not work because ... 2. In your newly described situation, the solution would work for sure. 3. Your situation is still not clear enough. -- Best Regards Andy Fan
Deprecating the term "super-exclusive lock"
In commit 276db875, I made vacuumlazy.c consistently use the term "cleanup lock", rather than the term "super-exclusive lock". But on further reflection I should have gone further, and removed the term "super-exclusive lock" from the tree completely. The actual relevant C symbols only use the term cleanup. Attached patch does this. There's not much churn. The term "super-exclusive lock" is far more likely to be used in index AM code, particularly nbtree. That's why I, the nbtree person, originally added a bunch of uses of that term in heapam -- prior to that point heapam probably didn't use the term once. Anyway, I don't think that there is a particularly good reason for an index AM/table AM divide in terminology. In fact, I'd go further: nbtree's use of super-exclusive locks is actually something that exists for the benefit of heapam alone -- and so using a different name in index AM code makes zero sense, because it's really a heapam thing anyway. Despite appearances. The underlying reason why we need a cleanup lock when calling _bt_delitems_vacuum() (but not when calling the near-identical _bt_delitems_delete() function) is this: we need it as an interlock, to avoid breaking index-only scans with concurrent heap vacuuming (not pruning) that takes place in vacuumlazy.c [1]. This issue isn't currently documented anywhere, though I plan on addressing that in the near future, with a separate patch. Historic note: the reason why this issue is so confused now has a lot to do with how the code has evolved over time. When the cleanup lock thing was first added to nbtree way back in 2001 (see commit c8076f09), there was no such thing as HOT, and nbtree didn't do page-at-a-time processing yet -- I believe that the cleanup lock was needed to avoid breaking these things (when lazy VACUUM became the default). Of course the cleanup lock can't have been needed for index-only scans back then, because there weren't any.
I'm pretty sure that that's the only remaining reason for requiring a cleanup lock.

Note about a future optimization opportunity: this also means that we could safely elide the cleanup lock during VACUUM (just get an exclusive lock) iff vacuumlazy.c told ambulkdelete that it has *already* decided that it won't bother performing a round of heap VACUUM in lazy_vacuum_heap_rel(). This observation isn't useful on its own, but in a world with something like Robert Haas's conveyor belt design (a world with *selective* index vacuuming), it could be quite valuable.

[1] https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5q...@mail.gmail.com

-- 
Peter Geoghegan

v1-0001-Deprecate-the-term-super-exclusive-lock.patch
Description: Binary data
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 2:58 PM, "David Steele" wrote: > I did figure out how to keep the safe part of exclusive backup (not > having to maintain a connection) while removing the dangerous part > (writing backup_label into PGDATA), but it was a substantial amount of > work and I felt that it had little chance of being committed. Do you think it's still worth trying to make it safe, or do you think we should just remove exclusive mode completely? > Attaching the thread [1] that I started with a patch to remove exclusive > backup for reference. Ah, good, some light reading. :) Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21 17:26, Tom Lane wrote:
> "Bossart, Nathan" writes:
>> It looks like the exclusive way has been marked deprecated in all
>> supported versions along with a note that it will eventually be
>> removed. If it's not going to be removed out of fear of breaking
>> backward compatibility, I think the documentation should be updated to
>> say that. However, unless there is something that is preventing users
>> from switching to the non-exclusive approach, I think it is reasonable
>> to begin thinking about removing it.
>
> If we're willing to outright remove it, I don't have any great objection.
> My original two cents was that we shouldn't put effort into improving it;
> but removing it isn't that.

The main objections as I recall are that it is much harder for simple backup scripts and commercial backup integrations to hold a connection to postgres open and write the backup label separately into the backup.

As Stephen noted, working in this area is much harder (even in the docs) due to the need to keep both methods working. When I removed exclusive backup it didn't break any tests, other than one that needed to generate a corrupt backup, so we have virtually no coverage for that method.

I did figure out how to keep the safe part of exclusive backup (not having to maintain a connection) while removing the dangerous part (writing backup_label into PGDATA), but it was a substantial amount of work and I felt that it had little chance of being committed.

Attaching the thread [1] that I started with a patch to remove exclusive backup for reference.

-- 
[1] https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 2:27 PM, "Tom Lane" wrote: > If we're willing to outright remove it, I don't have any great objection. > My original two cents was that we shouldn't put effort into improving it; > but removing it isn't that. I might try to put a patch together for the January commitfest, given there is enough support. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
"Bossart, Nathan" writes: > It looks like the exclusive way has been marked deprecated in all > supported versions along with a note that it will eventually be > removed. If it's not going to be removed out of fear of breaking > backward compatibility, I think the documentation should be updated to > say that. However, unless there is something that is preventing users > from switching to the non-exclusive approach, I think it is reasonable > to begin thinking about removing it. If we're willing to outright remove it, I don't have any great objection. My original two cents was that we shouldn't put effort into improving it; but removing it isn't that. regards, tom lane
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 9:34 PM vignesh C wrote:
>
> 3) Can a user remove the row filter without removing the table from
> the publication after creating the publication or should the user drop
> the table and add the table in this case?
>

AFAIK to remove an existing filter use ALTER PUBLICATION ... SET TABLE but do not specify any filter. For example,

test_pub=# create table t1(a int primary key);
CREATE TABLE
test_pub=# create publication p1 for table t1 where (a > 1);
CREATE PUBLICATION
test_pub=# create publication p2 for table t1 where (a > 2);
CREATE PUBLICATION
test_pub=# \d+ t1
                                                Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1" WHERE ((a > 1))
    "p2" WHERE ((a > 2))
Access method: heap

test_pub=# alter publication p1 set table t1;
ALTER PUBLICATION
test_pub=# \d+ t1
                                                Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1"
    "p2" WHERE ((a > 2))
Access method: heap

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 9:51 AM, "Stephen Frost" wrote: > I disagree that that’s a satisfactory approach. It certainly wasn’t > intended or documented as part of the original feature and therefore > to call it satisfactory strikes me quite strongly as revisionist > history. It looks like the exclusive way has been marked deprecated in all supported versions along with a note that it will eventually be removed. If it's not going to be removed out of fear of breaking backward compatibility, I think the documentation should be updated to say that. However, unless there is something that is preventing users from switching to the non-exclusive approach, I think it is reasonable to begin thinking about removing it. Nathan
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
On Tue, Nov 30, 2021 at 04:35:13PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
>> On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
>>> Interesting. I can probably adjust my MUA to send "text/x-patch",
>>> but I'll have to look around to see where that's determined.
>
>> I would be interesting to know if "text/x-patch" is better than
>> "text/x-diff" --- I currently use the later.
>
> I found out that where that is coming from is "file -i", so I'm a
> bit loath to modify it. Is there any hard documentation as to why
> "text/x-patch" should be preferred?

I thought this was happening from /etc/mime.types:

	text/x-diff				diff patch

The file extensions 'diff' and 'patch' trigger mime to use text/x-diff for its attachments, at least on Debian. Based on that, I assumed "text/x-diff" was more standardized than "text/x-patch". However, it seems file -i also looks at the contents, since a file with a single word in it is not recognized as a diff:

	$ git diff > /rtmp/x.diff
	$ file -i /rtmp/x.diff
	/rtmp/x.diff: text/x-diff; charset=us-ascii

	---

	$ echo test > /rtmp/x.diff
	$ file -i /rtmp/x.diff
	/rtmp/x.diff: text/plain; charset=us-ascii

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
Bruce Momjian writes:
> On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
>> Interesting. I can probably adjust my MUA to send "text/x-patch",
>> but I'll have to look around to see where that's determined.

> I would be interesting to know if "text/x-patch" is better than
> "text/x-diff" --- I currently use the later.

I found out that where that is coming from is "file -i", so I'm a bit loath to modify it. Is there any hard documentation as to why "text/x-patch" should be preferred?

			regards, tom lane
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
> Greg Nancarrow writes:
>> After a bit of investigation, it seems that patch attachments (like
>> yours) with a Content-Type of "text/x-diff" download through Gmail in
>> CRLF format for me (I'm running a browser on Windows, but my Postgres
>> development environment is in a Linux VM). So those must get converted
>> from Unix to CRLF format if downloaded using a browser running on
>> Windows.
>> The majority of patch attachments seem to have a Content-Type of
>> "application/octet-stream" or "text/x-patch", and these seem to
>> download raw (in their original Unix format).
>
> Interesting. I can probably adjust my MUA to send "text/x-patch",
> but I'll have to look around to see where that's determined.
> (I dislike using "application/octet-stream" for this, because
> the archives won't show that as text, they only let you download
> the attachment. Maybe that's more Safari's fault than the
> archives per se, not sure.)

It would be interesting to know if "text/x-patch" is better than "text/x-diff" --- I currently use the latter.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Non-superuser subscription owners
On Tue, 2021-11-30 at 17:25 +0530, Amit Kapila wrote:
> I think it would be better to do it before we allow subscription
> owners to be non-superusers.

There are a couple other things to consider before allowing non-superusers to create subscriptions anyway. For instance, a non-superuser shouldn't be able to use a connection string that reads the certificate file from the server unless they also have pg_read_server_files privs.

> Yeah, it is possible that is why I suggested in one of the emails
> above to allow changing the owners only for disabled subscriptions.

The current patch detects the following cases at the transaction boundary:

* ALTER SUBSCRIPTION ... OWNER TO ...
* ALTER ROLE ... NOSUPERUSER
* privileges revoked one way or another (aside from the RLS/WCO problems, which will be fixed)

If we want to detect at row boundaries we need to capture all of those cases too, or else we're being inconsistent. The latter two cannot be tied to whether the subscription is disabled or not, so I don't think that's a complete solution.

How about (as a separate patch) we just do maybe_reread_subscription() every K operations within a transaction? That would speed up permissions errors if a revoke happens.

Regards,
Jeff Davis
Re: allowing "map" for password auth methods with clientcert=verify-full
On Mon, 2021-11-08 at 15:32 -0500, Stephen Frost wrote: > Where does that leave us with what Jonathan is suggesting though? For > my 2c, we shouldn't allow 'map=' to be used for scram or md5 because > it'll just confuse users, until and unless we actually do the PGAUTHUSER > thing and then we can allow 'map=' to check if that mapping is allowed > and the actual SCRAM PW check is done against PGAUTHUSER and then the > logged in user is the user as specified in the startup packet, assuming > that mapping is allowed. For Jonathan's actual case though, we should > add a 'certmap' option instead and have that be explicitly for the case > where it's scram w/ clientcert=verify-full and then we check the mapping > of the DN/CN to the startup-packet username. There's no reason this > couldn't also work with a 'map' specified and PGAUTHUSER set and SCRAM > used to verify against that at the same time. Agreed. > Perhaps not a surprise, but I continue to be against the idea of adding > anything more to the insecure hack that is our LDAP auth method. We > should be moving away from that, not adding to it. Paraphrasing you from earlier, the "authenticate as one user and then log in as another" use case is the one I'm trying to expand. LDAP is just the auth method I happen to have present-day customer cases for. > That this would also require a new connection option / envvar to tell us > who the user wants to authenticate to LDAP as doesn't exactly make me > any more thrilled with it. Forgetting the LDAP part for a moment, do you have another suggestion for how we can separate the role name from the user name? The connection string seemed to be the most straightforward. --Jacob
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Tue, Nov 23, 2021 at 5:01 PM Peter Geoghegan wrote:
> > Behaviour that lead to a "sudden" falling over, rather than getting gradually
> > worse are bad - they somehow tend to happen on Friday evenings :).
>
> These are among our most important challenges IMV.

I haven't had time to work through any of your feedback just yet -- though it's certainly a priority for me. I won't get to it until I return home from PGConf NYC next week. Even still, here is a rebased v2, just to fix the bitrot. This is just a courtesy to anybody interested in the patch.

-- 
Peter Geoghegan

v2-0001-Simplify-lazy_scan_heap-s-handling-of-scanned-pag.patch
Description: Binary data
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada wrote:
> Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

> Regarding the commit, I think that there still is one place in
> vacuumlazy.c where we can change "dead tuples" to "dead items":
>
>     /*
>      * Allocate the space for dead tuples.  Note that this handles parallel
>      * VACUUM initialization as part of allocating shared memory space used
>      * for dead_items.
>      */
>     dead_items_alloc(vacrel, params->nworkers);
>     dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

> Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES
> and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?

That was deliberate. It would be a bit strange to alter these constants without also updating the corresponding column names for the pg_stat_progress_vacuum system view. But if I kept the definition from system_views.sql in sync, then I would break user scripts -- for reasons that users don't care about. That didn't seem like the right approach.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD items are the same, and are just as much of a problem in the table as they are in each index". As you know, this is not really true, which is an important problem for us. Fixing it (perhaps as part of adding something like Robert's conveyor belt design) will likely require revising this model quite fundamentally (e.g., the vacthresh calculation in autovacuum.c:relation_needs_vacanalyze() would be replaced). When this happens, we'll probably need to update system views that have columns with names like "dead_tuples" -- because maybe we no longer specifically count dead items/tuples at all.
I strongly suspect that the approach to statistics that we take for pg_statistic optimizer stats just doesn't work for dead items/tuples -- statistical sampling only produces useful statistics for the optimizer because certain delicate assumptions are met (even these assumptions only really work with a properly normalized database schema). Maybe revising the model used for autovacuum scheduling wouldn't include changing pg_stat_progress_vacuum, since that isn't technically "part of the model" --- I'm not sure. But it's not something that I am in a hurry to fix. -- Peter Geoghegan
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
On Tue, Nov 30, 2021 at 4:47 AM Andy Fan wrote: >> my exception should be that the relcache should not be invalidated _after >> the first relation open_ >> in the executor (not the beginning of executorRun)。 > > s/exception/expectation. > > To be more accurate, my expectation is for a single sql statement, after > the first time I write data into > the relation, until the statement is completed or "aborted and RelationClose > is called in ResourceOwnerRelease", > the relcache reset should never happen. Well I'm not sure why you asked the question and then argued with the answer you got. I mean, you can certainly decide how you think it works, but that's not how I think it works. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Don't block HOT update by BRIN index
OK, I've polished the last version of the patch a bit (added a regression test with update of attribute in index predicate and docs about the new flag into indexam.sgml) and pushed.

I wonder if we could/should improve handling of index predicates. In particular, it seems to me we could simply ignore indexes when the new row does not match the index predicate. For example, if there's an index

    CREATE INDEX ON t (a) WHERE b = 1;

and the update does:

    UPDATE t SET b = 2 WHERE ...;

then we'll not add the tuple pointer to this index anyway, and we could simply ignore this index when considering HOT. But I might be missing something important about HOT ...

The main problem I see with this is that it requires evaluating the index predicate for each tuple, which makes it incompatible with the caching in RelationGetIndexAttrBitmap. Just ditching the caching seems like a bad idea, so we'd probably have to do this in two phases:

1) Do what we do now, i.e. RelationGetIndexAttrBitmap considering all indexes / attributes. If this says HOT is possible, great -- we're done.

2) If (1) says HOT is not possible, we need to look at whether it's because of a regular or a partial index. For regular indexes it's clear; for partial indexes we could ignore the index if the predicate evaluates to false for the new row.

But even if such an optimization is possible, it's way out of scope of this patch and it's not clear to me it's actually a sensible trade-off.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Support for NSS as a libpq TLS backend
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
>> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
>> backend can handle those. That's a separate conversation, but I might
>> take a look at a patch for next commitfest.
>
> Please do.

Didn't get around to it for November, but I'm putting the finishing touches on that now.

While I was looking at the new SAN code (in fe-secure-nss.c, pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code coverage never seemed to touch a good chunk of it:

> +	for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
> +	{
> +		char	   *alt_name;
> +		int			rv;
> +		char		tmp[512];

That loop can never execute. But I wonder if all of that extra SAN code should be removed anyway? There's this comment above it:

> +	/*
> +	 * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
> +	 * verification.
> +	 */

and it seems like SAN verification is working in my testing, despite the dead loop.

--Jacob
Re: pg_replslotdata - a tool for displaying replication slot information
On 11/30/21, 6:14 AM, "Peter Eisentraut" wrote:
> On 23.11.21 06:09, Bharath Rupireddy wrote:
>> The replication slots data is stored in binary format on the disk under
>> the pg_replslot/<> directory which isn't human readable. If
>> the server is crashed/down (for whatever reasons) and unable to come up,
>> currently there's no way for the user/admin/developer to know what were
>> all the replication slots available at the time of server crash/down to
>> figure out what's the restart lsn, xid, two phase info or types of slots
>> etc.
>
> What do you need that for? You can't do anything with a replication
> slot while the server is down.

One use case might be to discover the value you need to set for max_replication_slots, although it's pretty trivial to discover the number of replication slots by looking at the folder directly. However, you also need to know how many replication origins there are, and AFAIK there isn't an easy way to read the replorigin_checkpoint file at the moment. IMO a utility like this should also show details for the replication origins. I don't have any other compelling use cases at the moment, but I will say that it is typically nice from an administrative standpoint to be able to inspect things like this without logging into a running server.

Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Greetings, On Tue, Nov 30, 2021 at 11:47 Laurenz Albe wrote: > On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote: > > Greetings, > > > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > > Michael Paquier writes: > > > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM > wrote: > > > > > If we are keeping it then why not make it better? > > > > > > > Well, non-exclusive backups are better by design in many aspects, so > I > > > > don't quite see the point in spending time on something that has more > > > > limitations than what's already in place. > > > > > > IMO the main reason for keeping it is backwards compatibility for users > > > who have a satisfactory backup arrangement using it. That same > argument > > > implies that we shouldn't change how it works (at least, not very > much). > > > > There isn't a satisfactory backup approach using it specifically because > > of this issue, hence why we should remove it to make it so users don't > > run into this. > > There is a satisfactory approach, as long as you are satisfied with > manually restarting the server if it crashed during a backup. I disagree that that’s a satisfactory approach. It certainly wasn’t intended or documented as part of the original feature and therefore to call it satisfactory strikes me quite strongly as revisionist history. > I don't find the reasons brought up to continue to support exclusive > > backup to be at all compelling and the lack of huge issues with the new > > way restore works to make it abundently clear that we can, in fact, > > remove exclusive backup in a major version change without the world > > coming down. > > I guess the lack of hue and cry was at least to a certain extent because > the exclusive backup API was deprecated, but not removed. These comments were in reference to the restore API, which was quite changed (new special files that have to be touched, removing of recovery.conf, options moved to postgresql.conf/.auto, etc). So, no. 
Thanks,

Stephen
Re: Atomic rename feature for Windows.
Hi

The flags for calling the CreateFile function have been changed.

Victor Spirin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

05.07.2021 16:53, Victor Spirin wrote:

Hi

I used the SetFileInformationByHandle function with the FILE_RENAME_FLAG_POSIX_SEMANTICS flag for the file rename function.

1) The _WIN32_WINNT variable needs to be increased to 0x0A00 (Windows 10). Fixed conflict with #undef CHECKSUM_TYPE_NONE.

2) The SetFileInformationByHandle function works correctly only on Windows 10 and higher. The app must have a manifest to check the Windows version using the IsWindows10OrGreater() function. I added a manifest to all postgres projects and disabled the GenerateManifest option on windows projects.

This patch is related to this post: https://www.postgresql.org/message-id/CAEepm%3D0FV-k%2B%3Dd9z08cW%3DZXoR1%3Dkw9wdpkP6WAuOrKJdz-8ujg%40mail.gmail.com

diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index cac7570ea1..2d533c93a6 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -26,6 +26,13 @@
  * MD5 here because any new that does need a cryptographically strong checksum
  * should use something better.
  */
+
+/*
+ * CHECKSUM_TYPE_NONE defined in the winioctl.h when _WIN32_WINNT >= _WIN32_WINNT_WIN10
+ */
+#ifdef CHECKSUM_TYPE_NONE
+#undef CHECKSUM_TYPE_NONE
+#endif
 typedef enum pg_checksum_type
 {
 	CHECKSUM_TYPE_NONE,
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index c6213c77c3..c48a6ce3a1 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -12,12 +12,13 @@
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
  * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
+ * Studio 2015 the minimum requirement is Windows 10 (0x0A00) to get support for SetFileInformationByHandle.
+ * The minimum requirement is Windows Vista (0x0600) get support for GetLocaleInfoEx() with locales.
+ * For everything else
  * the minimum version is Windows XP (0x0501).
  */
 #if defined(_MSC_VER) && _MSC_VER >= 1900
-#define MIN_WINNT 0x0600
+#define MIN_WINNT 0x0A00
 #else
 #define MIN_WINNT 0x0501
 #endif
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 763bc5f915..031655f0c8 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -39,6 +39,192 @@
 #endif
 #endif
 
+
+#if defined(WIN32) && !defined(__CYGWIN__) && defined(_WIN32_WINNT_WIN10) && _WIN32_WINNT >= _WIN32_WINNT_WIN10
+
+#include
+
+/*
+ * Checks Windows version using RtlGetVersion
+ * Version 1607 (Build 14393) is required for SetFileInformationByHandle
+ * function with FILE_RENAME_FLAG_POSIX_SEMANTICS flag
+ */
+typedef NTSYSAPI(NTAPI * PFN_RTLGETVERSION)
+(OUT PRTL_OSVERSIONINFOEXW lpVersionInformation);
+
+static int isWin1607 = -1;
+static int isWindows1607OrGreater()
+{
+	HMODULE ntdll;
+	PFN_RTLGETVERSION _RtlGetVersion = NULL;
+	OSVERSIONINFOEXW info;
+	if (isWin1607 >= 0) return isWin1607;
+	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
+	if (ntdll == NULL)
+	{
+		DWORD err = GetLastError();
+
+		_dosmaperr(err);
+		return -1;
+	}
+
+	_RtlGetVersion = (PFN_RTLGETVERSION)(pg_funcptr_t)
+		GetProcAddress(ntdll, "RtlGetVersion");
+	if (_RtlGetVersion == NULL)
+	{
+		DWORD err = GetLastError();
+
+		FreeLibrary(ntdll);
+		_dosmaperr(err);
+		return -1;
+	}
+	info.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEXW);
+	if (!NT_SUCCESS(_RtlGetVersion(&info)))
+	{
+		DWORD err = GetLastError();
+
+		FreeLibrary(ntdll);
+		_dosmaperr(err);
+		return -1;
+	}
+
+	if (info.dwMajorVersion >= 10 && info.dwBuildNumber >= 14393) isWin1607 = 1;
+	else isWin1607 = 0;
+	FreeLibrary(ntdll);
+	return isWin1607;
+
+}
+
+typedef struct _FILE_RENAME_INFO_EXT {
+	FILE_RENAME_INFO fri;
+	WCHAR extra_space[MAX_PATH];
+} FILE_RENAME_INFO_EXT;
+
+/*
+ * pgrename_windows_posix_semantics - uses SetFileInformationByHandle function
+ * with FILE_RENAME_FLAG_POSIX_SEMANTICS flag for atomic rename file
+ * working only on Windows 10 (1607) or later and _WIN32_WINNT must be >= _WIN32_WINNT_WIN10
+ */
+static int
+pgrename_windows_posix_semantics(const char *from, const char *to)
+{
+	int err;
+	FILE_RENAME_INFO_EXT rename_info;
+	PFILE_RENAME_INFO prename_info;
+	HANDLE f_handle;
+	wchar_t from_w[MAX_PATH];
+
+	prename_info = (PFILE_RENAME_INFO) &rename_info;
+
+	if (MultiByteToWideChar(CP_ACP, 0, (LPCCH)from, -1, (LPWSTR)from_w, MAX_PATH) == 0) {
+		err = GetLastError();
+
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote: > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Michael Paquier writes: > > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > > > > If we are keeping it then why not make it better? > > > > > Well, non-exclusive backups are better by design in many aspects, so I > > > don't quite see the point in spending time on something that has more > > > limitations than what's already in place. > > > > IMO the main reason for keeping it is backwards compatibility for users > > who have a satisfactory backup arrangement using it. That same argument > > implies that we shouldn't change how it works (at least, not very much). > > There isn't a satisfactory backup approach using it specifically because > of this issue, hence why we should remove it to make it so users don't > run into this. There is a satisfactory approach, as long as you are satisfied with manually restarting the server if it crashed during a backup. > I don't find the reasons brought up to continue to support exclusive > backup to be at all compelling and the lack of huge issues with the new > way restore works to make it abundently clear that we can, in fact, > remove exclusive backup in a major version change without the world > coming down. I guess the lack of hue and cry was at least to a certain extent because the exclusive backup API was deprecated, but not removed. Yours, Laurenz Albe
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 7:09 PM Masahiko Sawada wrote: > > On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada wrote: > > > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > > > I am responding here as the fix requires some more discussion. > > > > > > > Looking at the result the test actually got, we had two error entries > > > > for test_tab1 instead of one: > > > > > > > > # Failed test 'check the error reported by the apply worker' > > > > # at t/026_worker_stats.pl line 33. > > > > # got: 'tap_sub|INSERT|test_tab1|t > > > > # tap_sub||test_tab1|t' > > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > > > The possible scenarios are: > > > > > > > > The table sync worker for test_tab1 failed due to an error unrelated > > > > to apply changes: > > > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > > > OID 2 is already active for PID 23706 > > > > > > > > At this time, the view had one error entry for the table sync worker. > > > > After retrying table sync, it succeeded: > > > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > > > synchronization worker for subscription "tap_sub", table "test_tab1" > > > > has finished > > > > > > > > Then after inserting a row on the publisher, the apply worker inserted > > > > the row but failed due to violating a unique key violation, which is > > > > expected: > > > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > > violates unique constraint "test_tab1_pkey" > > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already > > > > exists. 
> > > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > > > during "INSERT" for replication target relation "public.test_tab1" in > > > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > > > > > As a result, we had two error entries for test_tab1: the table sync > > > > worker error and the apply worker error. I didn't expect that the > > > > table sync worker for test_tab1 failed due to "replication origin with > > > > OID 2 is already active for PID 23706” error. > > > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > > > two checks; in the first check, we wait for the view to show the error > > > > entry for the given relation name and xid. This check was passed since > > > > we had the second error (i.g., apply worker error). In the second > > > > check, we get error entries from pg_stat_subscription_workers by > > > > specifying only the relation name. Therefore, we ended up getting two > > > > entries and failed the tests. > > > > > > > > To fix this issue, I think that in the second check, we can get the > > > > error from pg_stat_subscription_workers by specifying the relation > > > > name *and* xid like the first check does. I've attached the patch. > > > > What do you think? > > > > > > > > > > I think this will fix the reported failure but there is another race > > > condition in the test. Isn't it possible that for table test_tab2, we > > > get an error "replication origin with OID ..." or some other error > > > before copy, in that case also, we will proceed from the second call > > > of test_subscription_error() which is not what we expect in the test? > > > > Right. > > > > > Shouldn't we someway check that the error message also starts with > > > "duplicate key value violates ..."? > > > > Yeah, I think it's a good idea to make the checks more specific. 
That > > is, probably we can specify the prefix of the error message and > > subrelid in addition to the current conditions: relid and xid. That > > way, we can check what error was reported by which workers (tablesync > > or apply) for which relations. And both check queries in > > test_subscription_error() can have the same WHERE clause. > > I've attached a patch that fixes this issue. Please review it. Thanks for the updated patch; it applies neatly and make check-world passes. I also ran the previously failing test in a loop and it passed every time. Regards, Vignesh
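For what it's worth, the shape of the fix being tested here (matching an error entry on relation, xid, and the start of the error message, all in one WHERE clause) can be sketched as a plain filter. This is an illustrative sketch only: the field names are hypothetical, and the real test queries pg_stat_subscription_workers from a Perl TAP test.

```python
# Illustrative sketch only: mimic matching an error entry on relation,
# xid, AND the start of the error message, as the fixed test query does.
# The dict keys here are hypothetical, not the actual view's columns.

def matching_errors(entries, relname, xid, msg_prefix):
    """Return only the entries that satisfy all three conditions."""
    return [e for e in entries
            if e["relname"] == relname
            and e["xid"] == xid
            and e["message"].startswith(msg_prefix)]

entries = [
    # table sync worker failure, unrelated to applying changes
    {"relname": "test_tab1", "xid": 720,
     "message": "replication origin with OID 2 is already active"},
    # the apply worker error the test actually expects
    {"relname": "test_tab1", "xid": 721,
     "message": "duplicate key value violates unique constraint"},
]

hits = matching_errors(entries, "test_tab1", 721,
                       "duplicate key value violates")
assert len(hits) == 1   # the unrelated table sync entry no longer matches
```

With all three conditions required at once, the unrelated "replication origin ... is already active" entry left over from the table sync retry can no longer satisfy the check, which is what the race fix relies on.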
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian wrote: > > On Thu, Nov 25, 2021 at 2:22 PM Peter Smith wrote: > > > > Thanks for all the review comments so far! We are endeavouring to keep > > pace with them. > > > > All feedback is being tracked and we will fix and/or reply to everything > > ASAP. > > > > Meanwhile, PSA the latest set of v42* patches. > > > > This version was mostly a patch restructuring exercise but it also > > addresses some minor review comments in passing. > > > > Addressed more review comments, in the attached patch-set v43. 5 > patches carried forward from v42. > This patch-set contains the following fixes: > > On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar wrote: > > > > in pgoutput_row_filter, we are dropping the slots if there are some > > old slots in the RelationSyncEntry. But then I noticed that in > > rel_sync_cache_relation_cb(), also we are doing that but only for the > > scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place > > setting entry->rowfilter_valid to false; so why not drop all the slot > > that time only and in pgoutput_row_filter(), you can just put an > > assert? > > > > Moved all the dropping of slots to rel_sync_cache_relation_cb() > > > +static bool > > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, > > RelationSyncEntry *entry) > > +{ > > +EState *estate; > > +ExprContext *ecxt; > > > > > > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same > > except, ExecStoreHeapTuple(), so why not just put one check based on > > whether a slot is passed or not, instead of making complete duplicate > > copy of the function. > > Removed pgoutput_row_filter_virtual > > > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > tupdesc = CreateTupleDescCopy(tupdesc); > > entry->scantuple = MakeSingleTupleTableSlot(tupdesc, > > ); > > > > Why do we need to copy the tupledesc? 
do we think that we need to have > > this slot even if we close the relation, if so can you add the > > comments explaining why we are making a copy here. > > This code has been modified, and comments added. > > On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila wrote: > > One more thing related to this code: > > pgoutput_row_filter() > > { > > .. > > + if (!entry->rowfilter_valid) > > { > > .. > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + tupdesc = CreateTupleDescCopy(tupdesc); > > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, ); > > + MemoryContextSwitchTo(oldctx); > > .. > > } > > > > Why do we need to initialize scantuple here unless we are sure that > > the row filter is going to get associated with this relentry? I think > > when there is no row filter then this allocation is not required. > > > > Modified as suggested. > > On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila wrote: > > > > In 0003 patch, why is below change required? > > --- a/src/backend/replication/pgoutput/pgoutput.c > > +++ b/src/backend/replication/pgoutput/pgoutput.c > > @@ -1,4 +1,4 @@ > > -/*- > > +/* > > * > > * pgoutput.c > > > > Removed. > > > > > After above, rearrange the code in pgoutput_row_filter(), so that two > > different checks related to 'rfisnull' (introduced by different > > patches) can be combined as if .. else check. > > > Fixed. > > On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila wrote: > > > > + * If the new relation or the old relation has a where clause, > > + * we need to remove it so that it can be added afresh later. > > + */ > > + if (RelationGetRelid(newpubrel->relation) == oldrelid && > > + newpubrel->whereClause == NULL && rfisnull) > > > > Can't we use _equalPublicationTable() here? It compares the whereClause as > > well. > > > > Tried this, can't do this because one is an alter statement while the > other is a publication, the whereclause is not > the same Nodetype. 
In the statement, the whereclause is T_A_Expr, > while in the publication > catalog, it is T_OpExpr. Here we will not be able to do a direct comparison as we store the transformed where clause in the pg_publication_rel table. We will have to transform the where clause and then check. I have attached a patch where we can check the transformed where clause and see if the where clause is the same or not. If you are ok with this approach you could make similar changes. Regards, Vignesh

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index b80c21e6ae..ae4a46e44a 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -359,6 +359,32 @@ GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
 	return result;
 }
 
+Node *
+GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri,
+						  bool bfixupcollation)
+{
+	ParseNamespaceItem *nsitem;
+	Node
Re: [PATCH] DROP tab completion
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

The patch applies cleanly and the functionality seems to work well. (master e7122548a3)

The new status of this patch is: Ready for Committer
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Michael Paquier writes: > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > >> If we are keeping it then why not make it better? > > > Well, non-exclusive backups are better by design in many aspects, so I > > don't quite see the point in spending time on something that has more > > limitations than what's already in place. > > IMO the main reason for keeping it is backwards compatibility for users > who have a satisfactory backup arrangement using it. That same argument > implies that we shouldn't change how it works (at least, not very much). There isn't a satisfactory backup approach using it, specifically because of this issue, hence we should remove it so that users don't run into this. That would also simplify the documentation around the low-level backup API, which would be a very good thing. Right now, making improvements in that area is very challenging even if all you want to do is improve the documentation around the non-exclusive API. We dealt with this as best as one could in pgbackrest for PG versions prior to when non-exclusive backup was added, which is to remove the backup_label file as soon as possible and then put it back right before you call pg_stop_backup() (since it'll complain otherwise). Not a perfect answer though, and a risk still exists there of a failed restart happening. Of course, for versions which support non-exclusive backup, we use that to avoid this issue. We also extensively changed how restore works a couple of releases ago and while there was some noise about it, it certainly wasn't all that bad. I don't find the reasons brought up to continue to support exclusive backup at all compelling, and the lack of huge issues with the new way restore works makes it abundantly clear that we can, in fact, remove exclusive backup in a major version change without the world coming down. Thanks, Stephen
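The pgbackrest workaround described above boils down to a small file shuffle: move backup_label out of the way as soon as the backup starts, so that a crash-restart cannot read it, and put it back just before pg_stop_backup(). A rough, Postgres-free sketch of that shuffle (all paths and the label contents here are made up for illustration):

```python
import os
import shutil
import tempfile

# Illustrative sketch of the workaround described above: park backup_label
# outside the data directory during the backup, restore it before "stop".
# Directory layout and file contents are hypothetical.
datadir = tempfile.mkdtemp()
stash = tempfile.mkdtemp()

label = os.path.join(datadir, "backup_label")
with open(label, "w") as f:
    f.write("START WAL LOCATION: 0/2000028\n")   # made-up content

# 1. Right after starting the backup, move the label aside so a
#    crash-restart of the server cannot trip over it.
shutil.move(label, os.path.join(stash, "backup_label"))
assert not os.path.exists(label)

# 2. Just before pg_stop_backup(), put it back (it complains otherwise).
shutil.move(os.path.join(stash, "backup_label"), label)
assert os.path.exists(label)
```

The residual risk Stephen mentions sits in the gap between the two steps: if the backup is abandoned while the label is parked, something still has to clean it up by hand.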
Re: pg_replslotdata - a tool for displaying replication slot information
On 23.11.21 06:09, Bharath Rupireddy wrote: The replication slots data is stored in binary format on the disk under the pg_replslot/<> directory which isn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no way for the user/admin/developer to know what were all the replication slots available at the time of server crash/down to figure out what's the restart lsn, xid, two phase info or types of slots etc. What do you need that for? You can't do anything with a replication slot while the server is down.
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada wrote: > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > > I am responding here as the fix requires some more discussion. > > > > > Looking at the result the test actually got, we had two error entries > > > for test_tab1 instead of one: > > > > > > # Failed test 'check the error reported by the apply worker' > > > # at t/026_worker_stats.pl line 33. > > > # got: 'tap_sub|INSERT|test_tab1|t > > > # tap_sub||test_tab1|t' > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > The possible scenarios are: > > > > > > The table sync worker for test_tab1 failed due to an error unrelated > > > to apply changes: > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > > OID 2 is already active for PID 23706 > > > > > > At this time, the view had one error entry for the table sync worker. > > > After retrying table sync, it succeeded: > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > > synchronization worker for subscription "tap_sub", table "test_tab1" > > > has finished > > > > > > Then after inserting a row on the publisher, the apply worker inserted > > > the row but failed due to violating a unique key violation, which is > > > expected: > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > violates unique constraint "test_tab1_pkey" > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. 
> > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > > during "INSERT" for replication target relation "public.test_tab1" in > > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > > > As a result, we had two error entries for test_tab1: the table sync > > > worker error and the apply worker error. I didn't expect that the > > > table sync worker for test_tab1 failed due to "replication origin with > > > OID 2 is already active for PID 23706” error. > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > > two checks; in the first check, we wait for the view to show the error > > > entry for the given relation name and xid. This check was passed since > > > we had the second error (i.g., apply worker error). In the second > > > check, we get error entries from pg_stat_subscription_workers by > > > specifying only the relation name. Therefore, we ended up getting two > > > entries and failed the tests. > > > > > > To fix this issue, I think that in the second check, we can get the > > > error from pg_stat_subscription_workers by specifying the relation > > > name *and* xid like the first check does. I've attached the patch. > > > What do you think? > > > > > > > I think this will fix the reported failure but there is another race > > condition in the test. Isn't it possible that for table test_tab2, we > > get an error "replication origin with OID ..." or some other error > > before copy, in that case also, we will proceed from the second call > > of test_subscription_error() which is not what we expect in the test? > > Right. > > > Shouldn't we someway check that the error message also starts with > > "duplicate key value violates ..."? > > Yeah, I think it's a good idea to make the checks more specific. That > is, probably we can specify the prefix of the error message and > subrelid in addition to the current conditions: relid and xid. 
That > way, we can check what error was reported by which workers (tablesync > or apply) for which relations. And both check queries in > test_subscription_error() can have the same WHERE clause. I've attached a patch that fixes this issue. Please review it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ 0001-Fix-regression-test-failure-caused-by-commit-8d74fc9.patch Description: Binary data
Re: Windows build warnings
> On 27 Nov 2021, at 14:55, Andrew Dunstan wrote: > ISTM the worst case is that there will be undetected unused variables in > Windows-only code. I guess that would mostly be detected by Msys systems > running gcc. Yes, that should be caught there. I've applied this now, together with removing PG_USED_FOR_ASSERTS_ONLY from those variables where it was set even though they are in general use. -- Daniel Gustafsson https://vmware.com/
Re: suboverflowed subtransactions concurrency performance optimize
On Mon, 30 Aug 2021 at 11:25, Andrey Borodin wrote: > > Hi Pengcheng! > > You are solving an important problem, thank you! > > > On 30 Aug 2021, at 13:43, Pengchengliu wrote: > > > > To resolve this performance problem, we think about a solution which cache > > SubtransSLRU to local cache. > > First we can query parent transaction id from SubtransSLRU, and copy the > > SLRU page to local cache page. > > After that if we need query parent transaction id again, we can query it > > from local cache directly. > > A copy of SLRU in each backend's cache can consume a lot of memory. Yes, copying the whole SLRU into local cache seems overkill. > Why create a copy if we can optimise shared representation of SLRU? transam.c uses a single-item cache to prevent thrashing from repeated lookups, which reduces problems with shared access to SLRUs. multixact.c also has something similar. I notice that subtrans.c doesn't have this, but could easily do so. Patch attached, which seems separate from other attempts at tuning. On review, I think it is also possible that we update subtrans ONLY if someone uses >PGPROC_MAX_CACHED_SUBXIDS. This would make subtrans much smaller and avoid one-entry-per-page, which is a major source of caching. This would mean some light changes in GetSnapshotData(). Let me know if that seems interesting also? -- Simon Riggs http://www.EnterpriseDB.com/ subtrans_single_item_cache.v1.patch Description: Binary data
RE: Optionally automatically disable logical replication subscriptions on error
On Monday, November 29, 2021 2:38 PM vignesh C > Thanks for the updated patch, Few comments: Thank you for your review ! > 1) Since this function is used only from 027_disable_on_error and not used by > others, this can be moved to 027_disable_on_error: > +sub wait_for_subscriptions > +{ > + my ($self, $dbname, @subscriptions) = @_; > + > + # Unique-ify the subscriptions passed by the caller > + my %unique = map { $_ => 1 } @subscriptions; > + my @unique = sort keys %unique; > + my $unique_count = scalar(@unique); > + > + # Construct a SQL list from the unique subscription names > + my @escaped = map { s/'/''/g; s/\\//g; $_ } @unique; > + my $sublist = join(', ', map { "'$_'" } @escaped); > + > + my $polling_sql = qq( > + SELECT COUNT(1) = $unique_count FROM > + (SELECT s.oid > + FROM pg_catalog.pg_subscription s > + LEFT JOIN pg_catalog.pg_subscription_rel > sr > + ON sr.srsubid = s.oid > + WHERE (sr IS NULL OR sr.srsubstate IN > ('s', 'r')) > + AND s.subname IN ($sublist) > + AND s.subenabled IS TRUE > +UNION > +SELECT s.oid > + FROM pg_catalog.pg_subscription s > + WHERE s.subname IN ($sublist) > + AND s.subenabled IS FALSE > + ) AS synced_or_disabled > + ); > + return $self->poll_query_until($dbname, $polling_sql); } Fixed. > 2) The empty line after comment is not required, it can be removed > +# Create non-unique data in both schemas on the publisher. > +# > +for $schema (@schemas) > +{ Fixed. > 3) Similarly it can be changed across the file > +# Wait for the initial subscription synchronizations to finish or fail. > +# > +$node_subscriber->wait_for_subscriptions('postgres', @schemas) > + or die "Timed out while waiting for subscriber to synchronize > +data"; > > +# Enter unique data for both schemas on the publisher. This should > +succeed on # the publisher node, and not cause any additional problems > +on the subscriber # side either, though disabled subscription "s1" should not > replicate anything. > +# > +for $schema (@schemas) Fixed. 
> 4) Since subid is used only at one place, no need of subid variable, you could > replace subid with subform->oid in LockSharedObject > + Datum values[Natts_pg_subscription]; > + HeapTuple tup; > + Oid subid; > + Form_pg_subscription subform; > > + subid = subform->oid; > + LockSharedObject(SubscriptionRelationId, subid, 0, > + AccessExclusiveLock); Fixed. > 5) "permissions errors" should be "permission errors" > + Specifies whether the subscription should be automatically > disabled > + if replicating data from the publisher triggers non-transient > errors > + such as referential integrity or permissions errors. The default is > + false. > + Fixed. The new patch v8 is shared in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB83735AA021E0F614A3AB3221ED679%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: increase size of pg_commit_ts buffers
On Fri, 12 Nov 2021 at 17:39, Tomas Vondra wrote: > So +1 to just get this committed, as it is. This is a real issue affecting Postgres users. Please commit this soon. -- Simon Riggs http://www.EnterpriseDB.com/
RE: Optionally automatically disable logical replication subscriptions on error
On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow wrote: > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com > wrote: > > > > This v7 uses v26 of skip xid patch [1] > This patch no longer applies on the latest source. > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for the > new "subdisableonerr" column of pg_subscription. Thanks for your review! Fixed the documentation accordingly. Further, this comment prompted some more refactoring, since I had written some internal code related to 'disable_on_error' in an inconsistent order. I fixed this by keeping the patch's code after that of the 'two_phase' subscription option as much as possible. I also ran both pgindent and pgperltidy. Now I'll share v8, which applies on top of PG after commit 8d74fc9 (pg_stat_subscription_workers). Best Regards, Takamichi Osumi v8-0001-Optionally-disable-subscriptions-on-error.patch Description: v8-0001-Optionally-disable-subscriptions-on-error.patch
Re: Non-superuser subscription owners
On Tue, Nov 30, 2021 at 12:56 AM Jeff Davis wrote: > > On Mon, 2021-11-29 at 09:43 +0530, Amit Kapila wrote: > > The first reason is that way it would be consistent with what we can > > see while doing the operations from the backend. > > Logical replication is not interactive, so it doesn't seem quite the > same. > > If you have a long running INSERT INTO SELECT or COPY FROM, the > permission checks just happen at the beginning. As a user, it wouldn't > surprise me if logical replication was similar. > > > operation. Another reason is to make behavior predictable as users > > can > > always expect when exactly the privilege change will be reflected and > > it won't depend on the number of changes in the transaction. > > This patch does detect ownership changes more quickly (at the > transaction boundary) than the current code (only when it reloads for > some other reason). Transaction boundary seems like a reasonable time > to detect the change to me. > > Detecting faster might be nice, but I don't have a strong opinion about > it and I don't see why it necessarily needs to happen before this patch > goes in. > I think it would be better to do it before we allow subscription owners to be non-superusers. > Also, do you think the cost of doing maybe_reread_subscription() per- > tuple instead of per-transaction would be detectable? > Yeah, it is possible; that is why I suggested in one of the emails above allowing the owner to be changed only for disabled subscriptions. -- With Regards, Amit Kapila.
Re: Non-superuser subscription owners
On Mon, Nov 29, 2021 at 11:52 PM Jeff Davis wrote: > > On Mon, 2021-11-29 at 08:26 -0800, Mark Dilger wrote: > > > > I agree that if we want to do all of this then that would require a > > > lot of changes. However, giving an error for RLS-enabled tables > > > might > > > also be too restrictive. The few alternatives could be that (a) we > > > allow subscription owners to be either have "bypassrls" attribute > > > or > > > they could be superusers. (b) don't allow initial table_sync for > > > rls > > > enabled tables. (c) evaluate/analyze what is required to allow Copy > > > From to start respecting RLS policies. (d) reject replicating any > > > changes to tables that have RLS enabled. > > Maybe a combination? > > Allow subscriptions with copy_data=true iff the subscription owner is > bypassrls or superuser. And then enforce RLS+WCO during > insert/update/delete. > Yeah, that sounds reasonable to me. > I don't think it's a big change (correct me if I'm wrong), > Yeah, I also don't think it should be a big change. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > I have pushed this patch and there is a buildfarm failure for it. See: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > I am responding here as the fix requires some more discussion. > > > Looking at the result the test actually got, we had two error entries > > for test_tab1 instead of one: > > > > # Failed test 'check the error reported by the apply worker' > > # at t/026_worker_stats.pl line 33. > > # got: 'tap_sub|INSERT|test_tab1|t > > # tap_sub||test_tab1|t' > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > The possible scenarios are: > > > > The table sync worker for test_tab1 failed due to an error unrelated > > to apply changes: > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > OID 2 is already active for PID 23706 > > > > At this time, the view had one error entry for the table sync worker. > > After retrying table sync, it succeeded: > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > synchronization worker for subscription "tap_sub", table "test_tab1" > > has finished > > > > Then after inserting a row on the publisher, the apply worker inserted > > the row but failed due to violating a unique key violation, which is > > expected: > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > violates unique constraint "test_tab1_pkey" > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > during "INSERT" for replication target relation "public.test_tab1" in > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > As a result, we had two error entries for test_tab1: the table sync > > worker error and the apply worker error. 
I didn't expect that the > > table sync worker for test_tab1 failed due to "replication origin with > > OID 2 is already active for PID 23706” error. > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > two checks; in the first check, we wait for the view to show the error > > entry for the given relation name and xid. This check was passed since > > we had the second error (i.g., apply worker error). In the second > > check, we get error entries from pg_stat_subscription_workers by > > specifying only the relation name. Therefore, we ended up getting two > > entries and failed the tests. > > > > To fix this issue, I think that in the second check, we can get the > > error from pg_stat_subscription_workers by specifying the relation > > name *and* xid like the first check does. I've attached the patch. > > What do you think? > > > > I think this will fix the reported failure but there is another race > condition in the test. Isn't it possible that for table test_tab2, we > get an error "replication origin with OID ..." or some other error > before copy, in that case also, we will proceed from the second call > of test_subscription_error() which is not what we expect in the test? Right. > Shouldn't we someway check that the error message also starts with > "duplicate key value violates ..."? Yeah, I think it's a good idea to make the checks more specific. That is, probably we can specify the prefix of the error message and subrelid in addition to the current conditions: relid and xid. That way, we can check what error was reported by which workers (tablesync or apply) for which relations. And both check queries in test_subscription_error() can have the same WHERE clause. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: SSL Tests for sslinfo extension
> On 30 Nov 2021, at 00:16, Daniel Gustafsson wrote: >> On 29 Nov 2021, at 23:50, Tom Lane wrote: >> Otherwise I think it's good to go, so I marked it RFC. > > Great! I'll take another look over it tomorrow and will go ahead with it > then. I applied your nitpick diff and took it for another spin in CI, and pushed it. Thanks for review! -- Daniel Gustafsson https://vmware.com/
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian wrote: > > On Thu, Nov 25, 2021 at 2:22 PM Peter Smith wrote: > > > > Thanks for all the review comments so far! We are endeavouring to keep > > pace with them. > > > > All feedback is being tracked and we will fix and/or reply to everything > > ASAP. > > > > Meanwhile, PSA the latest set of v42* patches. > > > > This version was mostly a patch restructuring exercise but it also > > addresses some minor review comments in passing. > > > > Addressed more review comments, in the attached patch-set v43. 5 > patches carried forward from v42. > This patch-set contains the following fixes: > > On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar wrote: > > > > in pgoutput_row_filter, we are dropping the slots if there are some > > old slots in the RelationSyncEntry. But then I noticed that in > > rel_sync_cache_relation_cb(), also we are doing that but only for the > > scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place > > setting entry->rowfilter_valid to false; so why not drop all the slot > > that time only and in pgoutput_row_filter(), you can just put an > > assert? > > > > Moved all the dropping of slots to rel_sync_cache_relation_cb() > > > +static bool > > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, > > RelationSyncEntry *entry) > > +{ > > +EState *estate; > > +ExprContext *ecxt; > > > > > > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same > > except, ExecStoreHeapTuple(), so why not just put one check based on > > whether a slot is passed or not, instead of making complete duplicate > > copy of the function. > > Removed pgoutput_row_filter_virtual > > > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > tupdesc = CreateTupleDescCopy(tupdesc); > > entry->scantuple = MakeSingleTupleTableSlot(tupdesc, > > ); > > > > Why do we need to copy the tupledesc? 
do we think that we need to have > > this slot even if we close the relation, if so can you add the > > comments explaining why we are making a copy here. > > This code has been modified, and comments added. > > On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila wrote: > > One more thing related to this code: > > pgoutput_row_filter() > > { > > .. > > + if (!entry->rowfilter_valid) > > { > > .. > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + tupdesc = CreateTupleDescCopy(tupdesc); > > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, ); > > + MemoryContextSwitchTo(oldctx); > > .. > > } > > > > Why do we need to initialize scantuple here unless we are sure that > > the row filter is going to get associated with this relentry? I think > > when there is no row filter then this allocation is not required. > > > > Modified as suggested. > > On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila wrote: > > > > In 0003 patch, why is below change required? > > --- a/src/backend/replication/pgoutput/pgoutput.c > > +++ b/src/backend/replication/pgoutput/pgoutput.c > > @@ -1,4 +1,4 @@ > > -/*- > > +/* > > * > > * pgoutput.c > > > > Removed. > > > > > After above, rearrange the code in pgoutput_row_filter(), so that two > > different checks related to 'rfisnull' (introduced by different > > patches) can be combined as if .. else check. > > > Fixed. > > On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila wrote: > > > > + * If the new relation or the old relation has a where clause, > > + * we need to remove it so that it can be added afresh later. > > + */ > > + if (RelationGetRelid(newpubrel->relation) == oldrelid && > > + newpubrel->whereClause == NULL && rfisnull) > > > > Can't we use _equalPublicationTable() here? It compares the whereClause as > > well. > > > > Tried this, can't do this because one is an alter statement while the > other is a publication, the whereclause is not > the same Nodetype. 
In the statement, the whereclause is T_A_Expr, > while in the publication > catalog, it is T_OpExpr. > > > /* Must be owner of the table or superuser. */ > > - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) > > + if (!pg_class_ownercheck(relid, GetUserId())) > > > > Here, you can directly use RelationGetRelid as was used in the > > previous code without using an additional variable. > > > > Fixed. > > > 2. > > +typedef struct { > > + Relation rel; > > + bool check_replident; > > + Bitmapset *bms_replident; > > +} > > +rf_context; > > > > Add rf_context in the same line where } ends. > > Code has been modified, this comment no longer applies. > > > 4. > > + * Rules: Node-type validation > > + * --- > > + * Allow only simple or compound expressions like: > > + * - "(Var Op Const)" or > > > > It seems Var Op Var is allowed. I tried below and it works: > > create publication pub for table t1 where (c1 < c2) WITH (publish = > >
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 3:55 PM Amit Kapila wrote: > > > > > > We can try that way but I think we should still be able to combine in > > > many cases like where all the operations are specified for > > > publications having the table or maybe pubactions are same. So, we > > > should not give up on those cases. We can do this new logic only when > > > we find that pubactions are different and probably store them as > > > independent expressions and corresponding pubactions for it at the > > > current location in the v42* patch (in pgoutput_row_filter). It is > > > okay to combine them at a later stage during execution when we can't > > > do it at the time of forming cache entry. > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? Yeah, I have the same opinion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote:
> On Tue, Nov 30, 2021 at 10:26 AM Amit Kapila wrote:
> > On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira wrote:
> > > On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> > > >
> > > > I don't think it is a good idea to combine the row filter from the
> > > > publication that publishes just 'insert' with the row filter that
> > > > publishes 'update'. We shouldn't apply the 'insert' filter for
> > > > 'update', and similarly for the other publication operations. We
> > > > can combine the filters when the published operations are the same.
> > > > So, this means that we might need to cache multiple row filters,
> > > > but I think that is better than having another restriction that
> > > > the publish operation 'insert' should also honor the RI columns
> > > > restriction.
> > >
> > > That's exactly what I meant to say but apparently I didn't explain it
> > > in detail. If a subscriber has multiple publications and a table is
> > > part of these publications with different row filters, it should
> > > check the publication action *before* including it in the row filter
> > > list. It means that an UPDATE operation cannot apply a row filter
> > > that is part of a publication that has only INSERT as an action.
> > > Having said that, we cannot always combine multiple row filter
> > > expressions into one. Instead, it should cache individual row filter
> > > expressions and apply the OR during the row filter execution (as I
> > > did in the initial patches before this caching stuff). The other idea
> > > is to have multiple caches for each action. The main disadvantage of
> > > this approach is to create 4x entries.
> > >
> > > I'm experimenting with the first approach, which stores multiple row
> > > filters and their publication actions, right now.
> >
> > We can try that way but I think we should still be able to combine in
> > many cases, like where all the operations are specified for the
> > publications having the table, or maybe where the pubactions are the
> > same. So, we should not give up on those cases. We can do this new
> > logic only when we find that the pubactions are different, and probably
> > store them as independent expressions with the corresponding pubactions
> > at the current location in the v42* patch (in pgoutput_row_filter). It
> > is okay to combine them at a later stage during execution when we can't
> > do it at the time of forming the cache entry.
>
> What about the initial table sync? During that, are we going to combine
> all the filters, or are we going to apply only the insert filters?

AFAIK, currently, initial table sync doesn't respect publication actions so it should combine all the filters.

--
With Regards,
Amit Kapila.
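For readers following along, the caching scheme under discussion can be sketched outside the server. This is an illustrative Python model only, not the actual pgoutput code: every name here (`build_cache_entry`, `row_passes`, the dict shapes) is invented for the sketch. It shows the two cases Amit describes: combining filters eagerly when all publications have the same pubactions, and otherwise keeping them separate and OR-ing the relevant ones at execution time, with initial table sync ignoring pubactions entirely.

```python
# Illustrative model of the row-filter caching discussed above (not the
# actual pgoutput code; all names here are invented for the sketch).

def build_cache_entry(publications):
    """Combine filters eagerly only when every publication has the same
    pubactions; otherwise keep (pubactions, filter) pairs individually."""
    pubactions = {frozenset(p["actions"]) for p in publications}
    if len(pubactions) == 1:
        # Safe to OR the filters into a single expression up front.
        filters = [p["filter"] for p in publications]
        combined = lambda row: any(f(row) for f in filters)
        return [(pubactions.pop(), combined)]
    # Mixed pubactions: store each filter with its own pubactions.
    return [(frozenset(p["actions"]), p["filter"]) for p in publications]

def row_passes(cache_entry, operation, row):
    """At decoding time, OR together only the filters whose publication
    publishes this operation ('insert', 'update', or 'delete')."""
    relevant = [f for actions, f in cache_entry if operation in actions]
    if not relevant:
        return False  # no publication publishes this operation
    return any(f(row) for f in relevant)

def row_passes_initial_sync(cache_entry, row):
    """Initial table sync doesn't respect pubactions, so OR all filters."""
    return any(f(row) for _, f in cache_entry)

# Two publications with different pubactions: filters stay separate.
pubs = [
    {"actions": ["insert"], "filter": lambda r: r["a"] > 10},
    {"actions": ["update"], "filter": lambda r: r["a"] < 5},
]
entry = build_cache_entry(pubs)
```

Under this model, an UPDATE never sees the insert-only publication's filter, which is the restriction Euler describes; only the initial-sync path combines everything.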
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
Thanks for everyone's insight so far!

> my expectation should be that the relcache should not be invalidated
> _after the first relation open_ in the executor (not the beginning of
> ExecutorRun).
>
> s/exception/expectation.

To be more accurate, my expectation is that for a single SQL statement, after the first time I write data into the relation, the relcache reset should never happen until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease". Since the write table access methods can be called from many places, such as COPY, CAST, REFRESH MATERIALIZED VIEW, VACUUM and ModifyTable, and it is possible to hit an error in each of them, I think RelationClose would be a good place to handle the exceptional case (at the same time, we should remember to destroy things properly in the non-exceptional case). This might not be very clean, since I bind some executor-related data into a relcache-related struct, but it should be workable in my modified use case. The cleanest method I can think of is adding another resource type in ResourceOwner and letting ResourceOwnerRelease handle the exceptional cases.

--
Best Regards
Andy Fan
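The ResourceOwner idea at the end can be modeled abstractly. This is a self-contained Python sketch of the cleanup pattern only; it is not PostgreSQL's resowner.c API, and every name in it is invented. It shows the split the message describes: explicit destruction on the normal path, and an owner-driven sweep for the exceptional (aborted) path.

```python
# Generic model of the ResourceOwner cleanup pattern suggested above.
# Not the resowner.c API; all names here are invented for the sketch.

class ResourceOwner:
    def __init__(self):
        self._resources = []

    def remember(self, resource, destroy):
        self._resources.append((resource, destroy))

    def forget(self, resource):
        self._resources = [(r, d) for r, d in self._resources
                           if r is not resource]

    def release(self):
        # Called on abort (the "exceptional case"): destroy anything
        # that normal execution did not get a chance to clean up.
        for resource, destroy in reversed(self._resources):
            destroy(resource)
        self._resources = []

destroyed = []

def run_statement(owner, fail):
    buf = {"name": "per-statement state"}
    owner.remember(buf, lambda r: destroyed.append(r["name"]))
    if fail:
        raise RuntimeError("statement aborted")
    # Normal completion: destroy explicitly and forget the registration.
    destroyed.append(buf["name"])
    owner.forget(buf)

owner = ResourceOwner()
try:
    run_statement(owner, fail=True)
except RuntimeError:
    owner.release()  # exceptional-case cleanup happens here
```

Either way the per-statement state is destroyed exactly once, which is the property the ResourceOwner approach buys over ad hoc cleanup scattered across COPY, VACUUM, and the other call sites.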
Re: Update stale code comment in CheckpointerMain()
> On 30 Nov 2021, at 08:00, Amul Sul wrote:
> The attached patch updates the code comment which is no longer true
> after commit 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd

Agreed, but looking at this, shouldn't we also tweak the comment on RecoveryInProgress() as per the attached v2 diff?

--
Daniel Gustafsson https://vmware.com/

update_code_comment_v2.patch
Re: Skipping logical replication transactions on subscriber side
On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote:
>

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and I am responding here as the fix requires some more discussion.

> Looking at the result the test actually got, we had two error entries
> for test_tab1 instead of one:
>
> # Failed test 'check the error reported by the apply worker'
> # at t/026_worker_stats.pl line 33.
> # got: 'tap_sub|INSERT|test_tab1|t
> # tap_sub||test_tab1|t'
> # expected: 'tap_sub|INSERT|test_tab1|t'
>
> The possible scenarios are:
>
> The table sync worker for test_tab1 failed due to an error unrelated
> to apply changes:
>
> 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
> OID 2 is already active for PID 23706
>
> At this time, the view had one error entry for the table sync worker.
> After retrying table sync, it succeeded:
>
> 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
> synchronization worker for subscription "tap_sub", table "test_tab1"
> has finished
>
> Then after inserting a row on the publisher, the apply worker inserted
> the row but failed due to a unique key violation, which is expected:
>
> 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
> violates unique constraint "test_tab1_pkey"
> 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
> 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
> during "INSERT" for replication target relation "public.test_tab1" in
> transaction 721 at 2021-11-30 06:24:04.305096+01
>
> As a result, we had two error entries for test_tab1: the table sync
> worker error and the apply worker error. I didn't expect that the
> table sync worker for test_tab1 would fail due to the "replication
> origin with OID 2 is already active for PID 23706" error.
> Looking at test_subscription_error() in 026_worker_stats.pl, we have
> two checks; in the first check, we wait for the view to show the error
> entry for the given relation name and xid. This check was passed since
> we had the second error (i.e., the apply worker error). In the second
> check, we get error entries from pg_stat_subscription_workers by
> specifying only the relation name. Therefore, we ended up getting two
> entries and failed the tests.
>
> To fix this issue, I think that in the second check, we can get the
> error from pg_stat_subscription_workers by specifying the relation
> name *and* xid like the first check does. I've attached the patch.
> What do you think?

I think this will fix the reported failure, but there is another race condition in the test. Isn't it possible that for table test_tab2 we get an error "replication origin with OID ..." or some other error before the copy? In that case also, we would proceed from the second call of test_subscription_error(), which is not what we expect in the test. Shouldn't we somehow check that the error message also starts with "duplicate key value violates ..."?

[1] - https://www.postgresql.org/message-id/CAD21AoChP5wOT2AYziF%2B-j7vvThF2NyAs7wr%2Byy%2B8hsnu%3D8Rgg%40mail.gmail.com

--
With Regards,
Amit Kapila.
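The race Sawada-san describes comes down to the second check matching more view rows than intended. This toy Python model illustrates it with invented data (the real check is a SQL query against pg_stat_subscription_workers in 026_worker_stats.pl; the field names below only loosely mirror the view):

```python
# Toy model of the pg_stat_subscription_workers race discussed above.
# The rows and field names are invented for illustration.
errors = [
    # table sync worker failed with an origin error (no command, no xid)
    {"subname": "tap_sub", "command": None, "relname": "test_tab1",
     "xid": None},
    # apply worker failed with the expected unique-key violation
    {"subname": "tap_sub", "command": "INSERT", "relname": "test_tab1",
     "xid": 721},
]

def lookup(relname, xid=None):
    """Return the error entries matching the given relation name, and
    optionally the given xid (mimicking the test's WHERE clause)."""
    return [e for e in errors
            if e["relname"] == relname and (xid is None or e["xid"] == xid)]

# Filtering by relation name alone matches both entries -> test failure.
by_name = lookup("test_tab1")
# Filtering by relation name *and* xid pins down the apply worker error,
# which is the proposed fix for the second check.
by_name_and_xid = lookup("test_tab1", xid=721)
```

Amit's follow-up point is that even this fix does not discriminate on the error *message*, so an unrelated pre-copy error could still satisfy the check for test_tab2.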
Re: [PATCH] pg_statio_all_tables: several rows per table due to invalid TOAST index
Hi Michael,

Thank you for your attention!

On Tue, 2021-11-30 at 17:29 +0900, Michael Paquier wrote:
> Hmm. Why should we care about invalid indexes at all, including
> pg_statio_all_indexes?

I think we should care about them at least because they exist and can consume resources. For example, an invalid index is still updated by DML operations. Of course, we could exclude such indexes from the view using the indisvalid, indisready and indislive fields, but in that case we should mention this in the docs, and, more importantly, new index states could appear in the future, forcing a change in the view definition. Counting all indexes regardless of state seems more reasonable to me.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [PATCH] pg_statio_all_tables: several rows per table due to invalid TOAST index
On Mon, Nov 29, 2021 at 05:04:29PM +0300, Andrei Zubkov wrote:
> However, it is possible that the TOAST table will have more than one
> index. For example, this happens when a REINDEX CONCURRENTLY operation
> leaves an index in an invalid state (indisvalid = false) due to some
> kind of failure. It's often sufficient to interrupt the REINDEX
> CONCURRENTLY operation right after it starts.
>
> Such an index will cause a second row to appear in the
> pg_statio_all_tables view, which is obviously unexpected behaviour.

Indeed. I can see that.

> Now we can have several regular indexes and several TOAST indexes for
> the same table. Statistics for the regular and TOAST indexes are to be
> calculated the same way, so I've decided to use a CTE here.

Hmm. Why should we care about invalid indexes at all, including pg_statio_all_indexes?

--
Michael
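The duplicate-row symptom and the aggregation fix can be modeled quickly. This is an illustrative Python sketch with invented data (the actual fix is a CTE in the pg_statio_all_tables view definition; the column names below merely echo the view's tidx_* columns):

```python
# Toy model of the pg_statio_all_tables symptom discussed above.
# A TOAST table normally has one index; an interrupted REINDEX
# CONCURRENTLY leaves a second, invalid one behind. Data is invented.
toast_indexes = [
    {"indexrelid": 16402, "indisvalid": True,  "blks_read": 10,
     "blks_hit": 90},
    {"indexrelid": 16410, "indisvalid": False, "blks_read": 1,
     "blks_hit": 2},
]

# Joining the table to its TOAST indexes without aggregating yields one
# output row per index -- two rows for one table once the invalid index
# appears.
rows_per_table = len(toast_indexes)

# Summing per table (what a CTE with GROUP BY achieves) restores one row
# per table, and still counts the invalid index, which keeps consuming
# I/O as DML updates it.
tidx_blks_read = sum(i["blks_read"] for i in toast_indexes)
tidx_blks_hit = sum(i["blks_hit"] for i in toast_indexes)
```

This is also why the thread leans toward counting all indexes regardless of indisvalid: filtering on validity would change the view's totals, not just its row count.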
Re: Lots of memory allocated when reassigning Large Objects
On Tue, Nov 30, 2021 at 00:25, Tom Lane wrote:
> Guillaume Lelarge writes:
> > On Mon, Nov 29, 2021 at 22:27, Tom Lane wrote:
> > > I'm checking it in HEAD though; perhaps there's something else wrong
> > > in the back branches?
>
> > That's also what I was thinking. I was only trying with v14. I just
> > checked with v15devel, and your patch works alright. So there must be
> > something else with back branches.
>
> AFAICT the patch fixes what it intends to fix in v14 too. The reason the
> residual leak is worse in v14 is that the sinval message queue is
> bulkier. We improved that in HEAD in commit 3aafc030a.

I wanted to make sure that commit 3aafc030a fixed this issue, so I ran a few tests:

- without 3aafc030a, without your latest patch: 1182148 kB max resident size
- with 3aafc030a, without your latest patch: 1306812 kB max resident size
- without 3aafc030a, with your latest patch: 1182128 kB max resident size
- with 3aafc030a, with your latest patch: 180996 kB max resident size

Definitely, 3aafc030a and your latest patch together allow PostgreSQL to use much less memory. Going from 1 GB to 180 MB is awesome.

I tried to cherry-pick 3aafc030a on v14, but it didn't apply cleanly, and the work was a bit overwhelming for me, at least in the morning. I'll try again today, but I don't have much hope.

> I'm not sure if I want to take the risk of back-patching that, even now
> that it's aged a couple months in the tree. It is a pretty localized
> fix, but it makes some assumptions about usage patterns that might not
> hold up.

I understand. Of course, it would be better if it could be fixed for each supported version, but I'm already happy that it could be fixed in the next release.

--
Guillaume.