Re: Wrong usage of RelationNeedsWAL

2021-01-28 Thread Kyotaro Horiguchi
At Wed, 27 Jan 2021 23:10:53 -0800, Noah Misch  wrote in 
> On Thu, Jan 28, 2021 at 12:06:27PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch  wrote in 
> > > On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
> > > > On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
> > > > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001
> > > > > test with wal_level=minimal?
> > > > 
> > > > Correct.  The case we must avoid is letting an old snapshot read an
> > > > early-pruned page without error.  v5-0001 expects "ERROR:  snapshot too 
> > > > old".
> > > > The patch suspends early pruning, so that error is not applicable.
> 
> > I studied the sto feature further and concluded that the checker side
> > is fine that it always follow the chages of page-LSN.
> > 
> > So what we can do for the issue is setting seemingly correct page LSN
> > at pruning or refrain from early-pruning while we are skipping
> > WAL. The reason I took the former is I thought that the latter might
> > be a problem since early-pruning would be postponed by a long-running
> > wal-skipping transaction.
> 
> Yes, that's an accurate summary.
> 
> > So the patch looks fine to me. The commit message mekes sense.
> > 
> > However, is it ok that the existing tests (modules/snapshot_too_old)
> > fails when wal_level=minimal?
>
> That would not be okay, but I'm not seeing that.  How did your setup differ
> from the following?

I did that with the following temp-conf. However, the tests succeeds
when I ran them again with the configuration. Sorry for the noise.

autovacuum = off
old_snapshot_threshold = 0
wal_level=minimal
max_wal_senders=0
wal_skip_threshold=0

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wrong usage of RelationNeedsWAL

2021-01-27 Thread Noah Misch
On Thu, Jan 28, 2021 at 12:06:27PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch  wrote in 
> > On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
> > > On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
> > > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001
> > > > test with wal_level=minimal?
> > > 
> > > Correct.  The case we must avoid is letting an old snapshot read an
> > > early-pruned page without error.  v5-0001 expects "ERROR:  snapshot too 
> > > old".
> > > The patch suspends early pruning, so that error is not applicable.

> I studied the sto feature further and concluded that the checker side
> is fine that it always follow the chages of page-LSN.
> 
> So what we can do for the issue is setting seemingly correct page LSN
> at pruning or refrain from early-pruning while we are skipping
> WAL. The reason I took the former is I thought that the latter might
> be a problem since early-pruning would be postponed by a long-running
> wal-skipping transaction.

Yes, that's an accurate summary.

> So the patch looks fine to me. The commit message mekes sense.
> 
> However, is it ok that the existing tests (modules/snapshot_too_old)
> fails when wal_level=minimal?

That would not be okay, but I'm not seeing that.  How did your setup differ
from the following?

[nm@rfd 6:1 2021-01-27T23:06:33 postgresql 0]$ cat /nmscratch/minimal.conf
log_statement = all
wal_level = minimal
max_wal_senders = 0
log_line_prefix = '%m [%p %l %x] %q%a '
[nm@rfd 6:1 2021-01-27T23:06:38 postgresql 0]$ make -C 
src/test/modules/snapshot_too_old check TEMP_CONFIG=/nmscratch/minimal.conf
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 58080 with PID 2603099
== creating database "isolation_regression" ==
CREATE DATABASE
ALTER DATABASE
== running regression test queries==
test sto_using_cursor ... ok30168 ms
test sto_using_select ... ok24197 ms
test sto_using_hash_index ... ok 6089 ms
== shutting down postmaster   ==
== removing temporary instance==

=
 All 3 tests passed.
=




Re: Wrong usage of RelationNeedsWAL

2021-01-27 Thread Kyotaro Horiguchi
At Wed, 27 Jan 2021 02:48:48 -0800, Noah Misch  wrote in 
> On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
> > On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
> > > Perhaps I'm missing something, but the patch doesn't pass the v5-0001
> > > test with wal_level=minimal?
> > 
> > Correct.  The case we must avoid is letting an old snapshot read an
> > early-pruned page without error.  v5-0001 expects "ERROR:  snapshot too 
> > old".
> > The patch suspends early pruning, so that error is not applicable.
> 
> I think the attached version is ready.  The changes since v6nm are cosmetic:
> 
> - Wrote log messages
> - Split into two patches, since the user-visible bugs are materially different
> - Fixed typos
> - Ran perltidy
> 
> Is it okay if I push these on Saturday, or would you like more time to
> investigate?

Thank you for the new version.

I studied the sto feature further and concluded that the checker side
is fine that it always follow the chages of page-LSN.

So what we can do for the issue is setting seemingly correct page LSN
at pruning or refrain from early-pruning while we are skipping
WAL. The reason I took the former is I thought that the latter might
be a problem since early-pruning would be postponed by a long-running
wal-skipping transaction.

So the patch looks fine to me. The commit message mekes sense.

However, is it ok that the existing tests (modules/snapshot_too_old)
fails when wal_level=minimal?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wrong usage of RelationNeedsWAL

2021-01-27 Thread Noah Misch
On Thu, Jan 21, 2021 at 01:23:36AM -0800, Noah Misch wrote:
> On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch  wrote in 
> > > On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
> > > > However, with the previous patch, two existing tests sto_using_cursor
> > > > and sto_using_select behaves differently from the master.  That change
> > > > is coming from the omission of actual LSN check in TestForOldSnapshot
> > > > while wal_level=minimal. So we have no choice other than actually
> > > > updating page LSN.
> > > > 
> > > > In the scenario under discussion there are two places we need to do
> > > > that. one is heap_page_prune, and the other is RelationCopyStorge. As
> > > > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
> > > > attached third file.
> > > 
> > > Fake LSNs make the system harder to understand, so I prefer not to spread 
> > > fake
> > > LSNs to more access methods.  What I had in mind is to simply suppress 
> > > early
> > > pruning when the relation is skipping WAL.  Attached.  Is this 
> > > reasonable?  It
> > > passes the older tests.  While it changes the sto_wal_optimized.spec 
> > > output, I
> > > think it preserves the old_snapshot_threshold behavior contract.
> > 
> > Perhaps I'm missing something, but the patch doesn't pass the v5-0001
> > test with wal_level=minimal?
> 
> Correct.  The case we must avoid is letting an old snapshot read an
> early-pruned page without error.  v5-0001 expects "ERROR:  snapshot too old".
> The patch suspends early pruning, so that error is not applicable.

I think the attached version is ready.  The changes since v6nm are cosmetic:

- Wrote log messages
- Split into two patches, since the user-visible bugs are materially different
- Fixed typos
- Ran perltidy

Is it okay if I push these on Saturday, or would you like more time to
investigate?
Author: Noah Misch 
Commit: Noah Misch 

Fix error with CREATE PUBLICATION, wal_level=minimal, and new tables.

CREATE PUBLICATION has failed spuriously when applied to a permanent
relation created or rewritten in the current transaction.  Make the same
change to another site having the same semantic intent; the second
instance has no user-visible consequences.  Back-patch to v13, where
commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 broke this.

Kyotaro Horiguchi

Discussion: 
https://postgr.es/m/20210113.160705.2225256954956139776.horikyota@gmail.com

diff --git a/src/backend/catalog/pg_publication.c 
b/src/backend/catalog/pg_publication.c
index 5f8e1c6..84d2efc 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
 errdetail("System tables cannot be added to 
publications.")));
 
/* UNLOGGED and TEMP relations cannot be part of publication. */
-   if (!RelationNeedsWAL(targetrel))
+   if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index da322b4..177e6e3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
relation = table_open(relationObjectId, NoLock);
 
/* Temporary and unlogged relations are inaccessible during recovery. */
-   if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+   if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT &&
+   RecoveryInProgress())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("cannot access temporary or unlogged 
relations during recovery")));
diff --git a/src/test/subscription/t/001_rep_changes.pl 
b/src/test/subscription/t/001_rep_changes.pl
index 0680f44..3d90f81 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 23;
+use Test::More tests => 24;
 
 # Initialize publisher node
 my $node_publisher = get_new_node('publisher');
@@ -358,3 +358,21 @@ is($result, qq(0), 'check replication origin was dropped 
on subscriber');
 
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
+
+# CREATE PUBLICATION while wal_level=minimal should succeed, with a WARNING
+$node_publisher->append_conf(
+   'postgresql.conf', qq(
+wal_level=minimal
+max_wal_senders=0
+));
+$node_publisher->start;
+($result, my $retout, my $reterr) = 

Re: Wrong usage of RelationNeedsWAL

2021-01-21 Thread Noah Misch
On Thu, Jan 21, 2021 at 06:02:11PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch  wrote in 
> > On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
> > > However, with the previous patch, two existing tests sto_using_cursor
> > > and sto_using_select behaves differently from the master.  That change
> > > is coming from the omission of actual LSN check in TestForOldSnapshot
> > > while wal_level=minimal. So we have no choice other than actually
> > > updating page LSN.
> > > 
> > > In the scenario under discussion there are two places we need to do
> > > that. one is heap_page_prune, and the other is RelationCopyStorge. As
> > > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
> > > attached third file.
> > 
> > Fake LSNs make the system harder to understand, so I prefer not to spread 
> > fake
> > LSNs to more access methods.  What I had in mind is to simply suppress early
> > pruning when the relation is skipping WAL.  Attached.  Is this reasonable?  
> > It
> > passes the older tests.  While it changes the sto_wal_optimized.spec 
> > output, I
> > think it preserves the old_snapshot_threshold behavior contract.
> 
> Perhaps I'm missing something, but the patch doesn't pass the v5-0001
> test with wal_level=minimal?

Correct.  The case we must avoid is letting an old snapshot read an
early-pruned page without error.  v5-0001 expects "ERROR:  snapshot too old".
The patch suspends early pruning, so that error is not applicable.




Re: Wrong usage of RelationNeedsWAL

2021-01-21 Thread Kyotaro Horiguchi
At Thu, 21 Jan 2021 00:19:58 -0800, Noah Misch  wrote in 
> On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi 
> >  wrote in 
> > > Anyway, it seems actually dangerous that cause pruning on wal-skipped
> > > relation.
> > > 
> > > > with your patch versions.  Could you try implementing both test 
> > > > procedures in
> > > > src/test/modules/snapshot_too_old?  There's no need to make the test use
> > > > wal_level=minimal explicitly; it's sufficient to catch these bugs when
> > > > wal_level=minimal is in the TEMP_CONFIG file.
> > > 
> > > In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
> > > instead of moving the condition on LSN to TestForOldSnapshot_impl for
> > > performance.
> > > 
> > > I'll add the test part in the next version.
> 
> That test helped me.  I now see "there's not a single tuple removed due to
> old_snapshot_threshold in src/test/modules/snapshot_too_old"[1], which limits
> our ability to test using this infrastructure.

Yes.

> > However, with the previous patch, two existing tests sto_using_cursor
> > and sto_using_select behaves differently from the master.  That change
> > is coming from the omission of actual LSN check in TestForOldSnapshot
> > while wal_level=minimal. So we have no choice other than actually
> > updating page LSN.
> > 
> > In the scenario under discussion there are two places we need to do
> > that. one is heap_page_prune, and the other is RelationCopyStorge. As
> > a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
> > attached third file.
> 
> Fake LSNs make the system harder to understand, so I prefer not to spread fake
> LSNs to more access methods.  What I had in mind is to simply suppress early
> pruning when the relation is skipping WAL.  Attached.  Is this reasonable?  It
> passes the older tests.  While it changes the sto_wal_optimized.spec output, I
> think it preserves the old_snapshot_threshold behavior contract.

Perhaps I'm missing something, but the patch doesn't pass the v5-0001
test with wal_level=minimal?

> [1] 
> https://www.postgresql.org/message-id/20200403001235.e6jfdll3gh2ygbuc%40alap3.anarazel.de

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wrong usage of RelationNeedsWAL

2021-01-21 Thread Noah Misch
On Thu, Jan 21, 2021 at 12:28:44AM +0900, Kyotaro Horiguchi wrote:
> At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > Anyway, it seems actually dangerous that cause pruning on wal-skipped
> > relation.
> > 
> > > with your patch versions.  Could you try implementing both test 
> > > procedures in
> > > src/test/modules/snapshot_too_old?  There's no need to make the test use
> > > wal_level=minimal explicitly; it's sufficient to catch these bugs when
> > > wal_level=minimal is in the TEMP_CONFIG file.
> > 
> > In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
> > instead of moving the condition on LSN to TestForOldSnapshot_impl for
> > performance.
> > 
> > I'll add the test part in the next version.

That test helped me.  I now see "there's not a single tuple removed due to
old_snapshot_threshold in src/test/modules/snapshot_too_old"[1], which limits
our ability to test using this infrastructure.

> However, with the previous patch, two existing tests sto_using_cursor
> and sto_using_select behaves differently from the master.  That change
> is coming from the omission of actual LSN check in TestForOldSnapshot
> while wal_level=minimal. So we have no choice other than actually
> updating page LSN.
> 
> In the scenario under discussion there are two places we need to do
> that. one is heap_page_prune, and the other is RelationCopyStorge. As
> a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
> attached third file.

Fake LSNs make the system harder to understand, so I prefer not to spread fake
LSNs to more access methods.  What I had in mind is to simply suppress early
pruning when the relation is skipping WAL.  Attached.  Is this reasonable?  It
passes the older tests.  While it changes the sto_wal_optimized.spec output, I
think it preserves the old_snapshot_threshold behavior contract.

[1] 
https://www.postgresql.org/message-id/20200403001235.e6jfdll3gh2ygbuc%40alap3.anarazel.de
diff --git a/src/backend/catalog/pg_publication.c 
b/src/backend/catalog/pg_publication.c
index 5f8e1c6..84d2efc 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
 errdetail("System tables cannot be added to 
publications.")));
 
/* UNLOGGED and TEMP relations cannot be part of publication. */
-   if (!RelationNeedsWAL(targetrel))
+   if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c 
b/src/backend/optimizer/util/plancat.c
index da322b4..177e6e3 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, 
bool inhparent,
relation = table_open(relationObjectId, NoLock);
 
/* Temporary and unlogged relations are inaccessible during recovery. */
-   if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+   if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT &&
+   RecoveryInProgress())
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("cannot access temporary or unlogged 
relations during recovery")));
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index ae16c3e..9570426 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1764,7 +1764,11 @@ TransactionIdLimitedForOldSnapshots(TransactionId 
recentXmin,
Assert(OldSnapshotThresholdActive());
Assert(limit_ts != NULL && limit_xid != NULL);
 
-   if (!RelationAllowsEarlyPruning(relation))
+   /*
+* TestForOldSnapshot() assumes early pruning advances the page LSN, so 
we
+* can't prune early when skipping WAL.
+*/
+   if (!RelationAllowsEarlyPruning(relation) || 
!RelationNeedsWAL(relation))
return false;
 
ts = GetSnapshotCurrentTimestamp();
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 579be35..c21ee3c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -37,7 +37,7 @@
  */
 #define RelationAllowsEarlyPruning(rel) \
 ( \
-RelationNeedsWAL(rel) \
+(rel)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT  \
   && !IsCatalogRelation(rel) \
   && !RelationIsAccessibleInLogicalDecoding(rel) \
 )
diff --git a/src/test/subscription/t/001_rep_changes.pl 
b/src/test/subscription/t/001_rep_changes.pl
index 0680f44..ed9b48e 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
 use 

Re: Wrong usage of RelationNeedsWAL

2021-01-20 Thread Kyotaro Horiguchi
At Wed, 20 Jan 2021 17:34:44 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> Anyway, it seems actually dangerous that cause pruning on wal-skipped
> relation.
> 
> > with your patch versions.  Could you try implementing both test procedures 
> > in
> > src/test/modules/snapshot_too_old?  There's no need to make the test use
> > wal_level=minimal explicitly; it's sufficient to catch these bugs when
> > wal_level=minimal is in the TEMP_CONFIG file.
> 
> In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
> instead of moving the condition on LSN to TestForOldSnapshot_impl for
> performance.
> 
> I'll add the test part in the next version.

This is it.

However, with the previous patch, two existing tests sto_using_cursor
and sto_using_select behaves differently from the master.  That change
is coming from the omission of actual LSN check in TestForOldSnapshot
while wal_level=minimal. So we have no choice other than actually
updating page LSN.

In the scenario under discussion there are two places we need to do
that. one is heap_page_prune, and the other is RelationCopyStorge. As
a PoC, I used gistXLogAssignLSN() as is for thie purpose. See the
attached third file.

If it is the right direction, I will move XLOG_GIST_ASSIGN_LSN to
XLOG_ASSIGN_LSN and move gistXLogAssignLSN() to XLogAdvanceLSN() or
XLogNop() or such.

With the third patch, the test succedds both wal_level = replica and
minimal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 332ac10844befb8a9c00df9f68993eb3696f0a85 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 20 Jan 2021 20:57:03 +0900
Subject: [PATCH v5 1/3] Test for snapshot too old and wal_level=minimal

---
 src/test/modules/snapshot_too_old/Makefile| 12 -
 .../expected/sto_wal_optimized.out| 24 ++
 .../input_sto/sto_wal_optimized.spec  | 44 +++
 src/test/modules/snapshot_too_old/sto.conf|  3 ++
 4 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out
 create mode 100644 src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec

diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63..1e0b0b6efc 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -4,7 +4,7 @@
 # we have to clean those result files explicitly
 EXTRA_CLEAN = $(pg_regress_clean_files)
 
-ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
+ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index sto_wal_optimized
 ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/snapshot_too_old/sto.conf
 
 # Disabled because these tests require "old_snapshot_threshold" >= 0, which
@@ -22,6 +22,16 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
+.PHONY: tablespace-setup specfile-setup
+tablespace-setup:
+	rm -rf ./testtablespace
+	mkdir ./testtablespace
+
+specfile-setup: input_sto/*.spec
+	sed 's!@srcdir@!$(realpath $(top_srcdir))!g' $? > specs/$(notdir $?)
+
+check: tablespace-setup specfile-setup
+
 # But it can nonetheless be very helpful to run tests on preexisting
 # installation, allow to do so, but only if requested explicitly.
 installcheck-force:
diff --git a/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out b/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out
new file mode 100644
index 00..4038d0a713
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/expected/sto_wal_optimized.out
@@ -0,0 +1,24 @@
+Parsed test spec with 3 sessions
+
+starting permutation: s1a1 s1a2 s2a1 s2a2 s1b1 a3-0 a3-1 s3-2 s3-3 s3-4 s2b1
+step s1a1: BEGIN;
+step s1a2: DELETE FROM t;
+step s2a1: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2a2: SELECT 1;
+?column?   
+
+1  
+step s1b1: COMMIT;
+step a3-0: SELECT setting, pg_sleep(6) FROM pg_settings WHERE name = 'old_snapshot_threshold';
+settingpg_sleep   
+
+0 
+step a3-1: BEGIN;
+step s3-2: ALTER TABLE t SET TABLESPACE tsp1;
+step s3-3: SELECT count(*) FROM t;
+count  
+
+0  
+step s3-4: COMMIT;
+step s2b1: SELECT count(*) FROM t;
+ERROR:  snapshot too old
diff --git a/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec b/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec
new file mode 100644
index 00..02adb50581
--- /dev/null
+++ b/src/test/modules/snapshot_too_old/input_sto/sto_wal_optimized.spec
@@ -0,0 +1,44 @@
+# This test provokes a "snapshot too old" error 
+#
+# The sleep is needed because with a threshold of zero a statement could error
+# on changes it made.  With more normal settings no external delay is needed,
+# but we don't want these tests to run long enough to see that, since
+# granularity is in minutes.
+#
+# Since results depend on the 

Re: Wrong usage of RelationNeedsWAL

2021-01-20 Thread Kyotaro Horiguchi
At Tue, 19 Jan 2021 01:31:52 -0800, Noah Misch  wrote in 
> On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote:
> > I understand that you are suggesting that at least
> > TransactionIdLimitedForOldSnapshots should follow not only relation
> > persistence but RelationNeedsWAL, based on the discussion on the
> > check on LSN of TestForOldSnapshot().
> > 
> > I still don't think that LSN in the WAL-skipping-created relfilenode
> > harm.  ALTER TABLE SET TABLESPACE always makes a dead copy of every
> > block (except checksum) including page LSN regardless of wal_level. In
> > the scenario above, the last select at H correctly reads page LSN set
> > by E then copied by G, which is larger than the snapshot LSN at B. So
> > doesn't go the fast-return path before actual check performed by
> > RelationAllowsEarlyPruning.
> 
> I agree the above procedure doesn't have a problem under your patch versions.
> See https://postgr.es/m/20210116043816.ga1644...@rfd.leadboat.com for a
> different test case.  In more detail:
> 
A> setup: CREATE TABLE t AS SELECT ...;
B> xact1: BEGIN; DELETE FROM t;
C> xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
D> xact1: COMMIT;
> (plenty of time passes, rows of t are now eligible for early pruning)
E> xact3: BEGIN;
F> xact3: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
G> xact3: SELECT count(*) FROM t;  -- early pruning w/o WAL, w/o LSN changes

H> xact3: COMMIT;
J> xact2: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"
> 
> I expect that shows no bug for git master, but I expect it does show a bug

I didn't see "snapshot too old" at J with the patch, but at the same
time the SELECT at G didn't prune tuples almost all of the trial. (the
last SELECT returns the correct answer.) I saw it get pruned only once
among many trials but I'm not sure how I could get to the situation
and haven't succeed to reproduce that.

The reason that the SELECT at G doesn't prune is that limited_xmin in
heap_page_prune_opt at G is limited up to the oldest xid among active
sessions. In this case the xmin of the session 2 is the same with the
xid of the session 1. So xmin of the session 3 at G is the same with
the xmin of the session 2, which is the same with the xid of the
session 1.  So pruning doesn't happen. However that is very fragile as
the base for asserting that pruning won't happen.

Anyway, it seems actually dangerous that cause pruning on wal-skipped
relation.

> with your patch versions.  Could you try implementing both test procedures in
> src/test/modules/snapshot_too_old?  There's no need to make the test use
> wal_level=minimal explicitly; it's sufficient to catch these bugs when
> wal_level=minimal is in the TEMP_CONFIG file.

In the attached, TestForOldSnapshot() considers XLogIsNeeded(),
instead of moving the condition on LSN to TestForOldSnapshot_impl for
performance.

I'll add the test part in the next version.

regards.
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 5976674e48fdce3c6e911f0ae63485d92941bfd8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 18 Jan 2021 14:47:21 +0900
Subject: [PATCH v4] Do not use RelationNeedsWAL to identify relation
 persistence

RelationNeedsWAL() may return false for a permanent relation when
wal_level=minimal and the relation is created or truncated in the
current transaction. Directly examine relpersistence instead of using
the function to know relation persistence.
---
 src/backend/catalog/pg_publication.c | 2 +-
 src/backend/optimizer/util/plancat.c | 3 ++-
 src/include/storage/bufmgr.h | 8 +++-
 src/include/utils/rel.h  | 2 +-
 src/include/utils/snapmgr.h  | 2 +-
 5 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 5f8e1c64e1..84d2efcfd2 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
  errdetail("System tables cannot be added to publications.")));
 
 	/* UNLOGGED and TEMP relations cannot be part of publication. */
-	if (!RelationNeedsWAL(targetrel))
+	if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..177e6e336a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	relation = table_open(relationObjectId, NoLock);
 
 	/* Temporary and unlogged relations are inaccessible during recovery. */
-	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+	if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT &&
+		RecoveryInProgress())
 		

Re: Wrong usage of RelationNeedsWAL

2021-01-19 Thread Noah Misch
On Tue, Jan 19, 2021 at 01:48:31PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 18 Jan 2021 17:30:22 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch  wrote in 
> > > On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
> > > > I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" 
> > > > check in
> > > > TestForOldSnapshot().  If the LSN isn't important, what else explains
> > > > RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
> > > 
> > > Thinking about it more, some RelationAllowsEarlyPruning() callers need
> > > different treatment.  Above, I was writing about the case of deciding 
> > > whether
> > > to do early pruning.  The other RelationAllowsEarlyPruning() call sites 
> > > are
> > > deciding whether the relation might be lacking old data.  For that case, 
> > > we
> > > should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL().  
> > > Otherwise, we
> > > could fail to report an old-snapshot error in a case like this:
> > > 
> A> > setup: CREATE TABLE t AS SELECT ...;
> B> > xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
> C> > xact2: DELETE FROM t;
> D> > (plenty of time passes)
> E> > xact3: SELECT count(*) FROM t;  -- early pruning
> F> > xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q;  -- "snapshot 
> too old"
> G> > xact1: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
> H> > xact1: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"
> > > 
> > > Is that plausible?
> > 
> > Thank you for the consideration and yes. But I get "snapshot too old"
> > from the last query with the patched version. maybe I'm missing
> > something. I'm going to investigate the case.
> 
> Ah. I took that in reverse way. (caught by the discussion on page
> LSN.) Ok, the "current" code works that way. So we need to perform the
> check the new way in RelationAllowsEarlyPruning. So in a short answer
> for the last question in regard to the reference side is yes.

Right.  I expect the above procedure shows a bug in git master, but your patch
versions suffice to fix that bug.

> I understand that you are suggesting that at least
> TransactionIdLimitedForOldSnapshots should follow not only relation
> persistence but RelationNeedsWAL, based on the discussion on the
> check on LSN of TestForOldSnapshot().
> 
> I still don't think that LSN in the WAL-skipping-created relfilenode
> harm.  ALTER TABLE SET TABLESPACE always makes a dead copy of every
> block (except checksum) including page LSN regardless of wal_level. In
> the scenario above, the last select at H correctly reads page LSN set
> by E then copied by G, which is larger than the snapshot LSN at B. So
> doesn't go the fast-return path before actual check performed by
> RelationAllowsEarlyPruning.

I agree the above procedure doesn't have a problem under your patch versions.
See https://postgr.es/m/20210116043816.ga1644...@rfd.leadboat.com for a
different test case.  In more detail:

setup: CREATE TABLE t AS SELECT ...;
xact1: BEGIN; DELETE FROM t;
xact2: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
xact1: COMMIT;
(plenty of time passes, rows of t are now eligible for early pruning)
xact3: BEGIN;
xact3: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
xact3: SELECT count(*) FROM t;  -- early pruning w/o WAL, w/o LSN changes
xact3: COMMIT;
xact2: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"

I expect that shows no bug for git master, but I expect it does show a bug
with your patch versions.  Could you try implementing both test procedures in
src/test/modules/snapshot_too_old?  There's no need to make the test use
wal_level=minimal explicitly; it's sufficient to catch these bugs when
wal_level=minimal is in the TEMP_CONFIG file.




Re: Wrong usage of RelationNeedsWAL

2021-01-18 Thread Kyotaro Horiguchi
At Mon, 18 Jan 2021 17:30:22 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch  wrote in 
> > On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
> > > I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check 
> > > in
> > > TestForOldSnapshot().  If the LSN isn't important, what else explains
> > > RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
> > 
> > Thinking about it more, some RelationAllowsEarlyPruning() callers need
> > different treatment.  Above, I was writing about the case of deciding 
> > whether
> > to do early pruning.  The other RelationAllowsEarlyPruning() call sites are
> > deciding whether the relation might be lacking old data.  For that case, we
> > should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL().  Otherwise, 
> > we
> > could fail to report an old-snapshot error in a case like this:
> > 
A> > setup: CREATE TABLE t AS SELECT ...;
B> > xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
C> > xact2: DELETE FROM t;
D> > (plenty of time passes)
E> > xact3: SELECT count(*) FROM t;  -- early pruning
F> > xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q;  -- "snapshot 
too old"
G> > xact1: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
H> > xact1: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"
> > 
> > Is that plausible?
> 
> Thank you for the consideration and yes. But I get "snapshot too old"
> from the last query with the patched version. maybe I'm missing
> something. I'm going to investigate the case.

Ah. I took that in reverse way. (caught by the discussion on page
LSN.) Ok, the "current" code works that way. So we need to perform the
check the new way in RelationAllowsEarlyPruning. So in a short answer
for the last question in regard to the reference side is yes.

I understand that you are suggesting that at least
TransactionIdLimitedForOldSnapshots should follow not only relation
persistence but RelationNeedsWAL, based on the discussion on the
check on LSN of TestForOldSnapshot().

I still don't think that LSN in the WAL-skipping-created relfilenode
harm.  ALTER TABLE SET TABLESPACE always makes a dead copy of every
block (except checksum) including page LSN regardless of wal_level. In
the scenario above, the last select at H correctly reads page LSN set
by E then copied by G, which is larger than the snapshot LSN at B. So
doesn't go the fast-return path before actual check performed by
RelationAllowsEarlyPruning.

As the result still RelationAllowsEarlyPruning is changed to check
only for the persistence of the table in this version. (In other
words, all the callers of the function works based on the same
criteria.)

- Removed RelationIsWalLoggeed which was forgotten to remove in the
  last version.

- Enclosed parameter of RelationAllowsEarlyPruning.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From d688abbc04459b11bc2801f3c7f1955a86ef7a51 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 18 Jan 2021 14:47:21 +0900
Subject: [PATCH v3] Do not use RelationNeedsWAL to identify relation
 persistence

RelationNeedsWAL() may return false for a permanent relation when
wal_level=minimal and the relation is created or truncated in the
current transaction. Directly examine relpersistence instead of using
the function to know relation persistence.
---
 src/backend/catalog/pg_publication.c   |  2 +-
 src/backend/optimizer/util/plancat.c   |  3 ++-
 src/include/utils/rel.h|  2 +-
 src/include/utils/snapmgr.h|  2 +-
 src/test/subscription/t/001_rep_changes.pl | 20 +++-
 5 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 5f8e1c64e1..84d2efcfd2 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
  errdetail("System tables cannot be added to publications.")));
 
 	/* UNLOGGED and TEMP relations cannot be part of publication. */
-	if (!RelationNeedsWAL(targetrel))
+	if (targetrel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..177e6e336a 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,8 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	relation = table_open(relationObjectId, NoLock);
 
 	/* Temporary and unlogged relations are inaccessible during recovery. */
-	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+	if (relation->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT &&
+		RecoveryInProgress())
 		ereport(ERROR,
 

Re: Wrong usage of RelationNeedsWAL

2021-01-18 Thread Kyotaro Horiguchi
At Sun, 17 Jan 2021 23:02:18 -0800, Noah Misch  wrote in 
> On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
> > I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
> > TestForOldSnapshot().  If the LSN isn't important, what else explains
> > RelationAllowsEarlyPruning() checking RelationNeedsWAL()?
> 
> Thinking about it more, some RelationAllowsEarlyPruning() callers need
> different treatment.  Above, I was writing about the case of deciding whether
> to do early pruning.  The other RelationAllowsEarlyPruning() call sites are
> deciding whether the relation might be lacking old data.  For that case, we
> should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL().  Otherwise, we
> could fail to report an old-snapshot error in a case like this:
> 
> setup: CREATE TABLE t AS SELECT ...;
> xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
> xact2: DELETE FROM t;
> (plenty of time passes)
> xact3: SELECT count(*) FROM t;  -- early pruning
> xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q;  -- "snapshot too 
> old"
> xact1: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
> xact1: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"
> 
> Is that plausible?

Thank you for the consideration and yes. But I get "snapshot too old"
from the last query with the patched version. maybe I'm missing
something. I'm going to investigate the case.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Wrong usage of RelationNeedsWAL

2021-01-17 Thread Noah Misch
On Sun, Jan 17, 2021 at 10:36:31PM -0800, Noah Misch wrote:
> On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote:
> > At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch  wrote in 
> > > On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
> > > > --- a/src/include/utils/snapmgr.h
> > > > +++ b/src/include/utils/snapmgr.h
> > > > @@ -37,7 +37,7 @@
> > > >   */
> > > >  #define RelationAllowsEarlyPruning(rel) \
> > > >  ( \
> > > > -RelationNeedsWAL(rel) \
> > > > +RelationIsWalLogged(rel) \
> > > 
> > > I suspect this is user-visible for a scenario like:
> > > 
> > > CREATE TABLE t AS SELECT ...; DELETE FROM t;
> > > -- ... time passes, rows of t are now eligible for early pruning ...
> > > BEGIN;
> > > ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
> > > SELECT count(*) FROM t;
> > > 
> > > After this patch, the SELECT would do early pruning, as it does in the 
> > > absence
> > > of the ALTER TABLE.  When pruning doesn't update the page LSN,
> > > TestForOldSnapshot() will be unable to detect that early pruning changed 
> > > the
> > > query results.  Hence, RelationAllowsEarlyPruning() must not change this 
> > > way.
> > > Does that sound right?
> > 
> > Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so
> > it seems to work well even if pruning happened at the SELECT.
> > Conversely that should work after old_snapshot_threshold elapsed.
> > 
> > Am I missing something?
> 
> I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
> TestForOldSnapshot().  If the LSN isn't important, what else explains
> RelationAllowsEarlyPruning() checking RelationNeedsWAL()?

Thinking about it more, some RelationAllowsEarlyPruning() callers need
different treatment.  Above, I was writing about the case of deciding whether
to do early pruning.  The other RelationAllowsEarlyPruning() call sites are
deciding whether the relation might be lacking old data.  For that case, we
should check RELPERSISTENCE_PERMANENT, not RelationNeedsWAL().  Otherwise, we
could fail to report an old-snapshot error in a case like this:

setup: CREATE TABLE t AS SELECT ...;
xact1: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT 1;  -- take snapshot
xact2: DELETE FROM t;
(plenty of time passes)
xact3: SELECT count(*) FROM t;  -- early pruning
xact1: SAVEPOINT q; SELECT count(*) FROM t; ROLLBACK TO q;  -- "snapshot too 
old"
xact1: ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
xact1: SELECT count(*) FROM t;  -- no error, wanted "snapshot too old"

Is that plausible?




Re: Wrong usage of RelationNeedsWAL

2021-01-17 Thread Noah Misch
On Mon, Jan 18, 2021 at 03:08:38PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch  wrote in 
> > On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
> > > --- a/src/include/utils/snapmgr.h
> > > +++ b/src/include/utils/snapmgr.h
> > > @@ -37,7 +37,7 @@
> > >   */
> > >  #define RelationAllowsEarlyPruning(rel) \
> > >  ( \
> > > -  RelationNeedsWAL(rel) \
> > > +  RelationIsWalLogged(rel) \
> > 
> > I suspect this is user-visible for a scenario like:
> > 
> > CREATE TABLE t AS SELECT ...; DELETE FROM t;
> > -- ... time passes, rows of t are now eligible for early pruning ...
> > BEGIN;
> > ALTER TABLE t SET TABLESPACE something;  -- start skipping WAL
> > SELECT count(*) FROM t;
> > 
> > After this patch, the SELECT would do early pruning, as it does in the 
> > absence
> > of the ALTER TABLE.  When pruning doesn't update the page LSN,
> > TestForOldSnapshot() will be unable to detect that early pruning changed the
> > query results.  Hence, RelationAllowsEarlyPruning() must not change this 
> > way.
> > Does that sound right?
> 
> Mmm, maybe no. The pruning works on XID (or timestamp), not on LSN, so
> it seems to work well even if pruning happened at the SELECT.
> Conversely that should work after old_snapshot_threshold elapsed.
> 
> Am I missing something?

I wrote the above based on the "PageGetLSN(page) > (snapshot)->lsn" check in
TestForOldSnapshot().  If the LSN isn't important, what else explains
RelationAllowsEarlyPruning() checking RelationNeedsWAL()?




Re: Wrong usage of RelationNeedsWAL

2021-01-17 Thread Kyotaro Horiguchi
Thank you for the comments, Noah and Andres.

At Fri, 15 Jan 2021 20:38:16 -0800, Noah Misch  wrote in 
> On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
> > The definition of the macro RelationNeedsWAL has been changed by
> > c6b92041d3 to include conditions related to the WAL-skip optimization
> > but some uses of the macro are not relevant to the optimization. That
> > misuses are harmless for now as they are only executed while wal_level
> > >= replica or WAL-skipping is inactive. However, this should be
> > corrected to prevent future hazard.
> 
> I see user-visible consequences:
> 
> > --- a/src/backend/catalog/pg_publication.c
> > +++ b/src/backend/catalog/pg_publication.c
> > @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
> >  errdetail("System tables cannot be added to 
> > publications.")));
> >  
> > /* UNLOGGED and TEMP relations cannot be part of publication. */
> > -   if (!RelationNeedsWAL(targetrel))
> > +   if (!RelationIsWalLogged(targetrel))
> 
> Today, the following fails needlessly under wal_level=minimal:
> 
> BEGIN;
> SET client_min_messages = 'ERROR';
> CREATE TABLE skip_wal ();
> CREATE PUBLICATION p FOR TABLE skip_wal;
> ROLLBACK;
> 
> Could you add that test to one of the regress scripts?

Mmm. I thought that it cannot be used while wal_level=minimal. The
WARNING for insiffucient wal_level shows that it is intended to
work. A test is added in the attached.

> > ereport(ERROR,
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >  errmsg("table \"%s\" cannot be replicated",
> > diff --git a/src/backend/optimizer/util/plancat.c 
> > b/src/backend/optimizer/util/plancat.c
> > index da322b453e..0500efcdb9 100644
> > --- a/src/backend/optimizer/util/plancat.c
> > +++ b/src/backend/optimizer/util/plancat.c
> > @@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid 
> > relationObjectId, bool inhparent,
> > relation = table_open(relationObjectId, NoLock);
> >  
> > /* Temporary and unlogged relations are inaccessible during recovery. */
> > -   if (!RelationNeedsWAL(relation) && RecoveryInProgress())
> > +   if (!RelationIsWalLogged(relation) && RecoveryInProgress())
> 
> This has no user-visible consequences, but I agree you've clarified it.
> 
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("cannot access temporary or unlogged 
> > relations during recovery")));
> > diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
> > index 10b63982c0..810806a542 100644
> > --- a/src/include/utils/rel.h
> > +++ b/src/include/utils/rel.h
> > @@ -552,16 +552,23 @@ typedef struct ViewOptions
> > (relation)->rd_smgr->smgr_targblock = (targblock); \
> > } while (0)
> >  
> > +/*
> > + * RelationIsPermanent
> > + * True if relation is WAL-logged.
> > + */
> > +#define RelationIsWalLogged(relation)  
> > \
> > +   ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
> 
> To me, the names RelationIsWalLogged() and RelationNeedsWAL() don't convey
> their behavior difference.  How about one of the following spellings?

Yeah! I was concerned about that.

> - Name the new macro RelationEverNeedsWAL().
> - Just test "relpersistence == RELPERSISTENCE_PERMANENT", without a macro.

Yes. I wasn't confident on using the macro since as you pointed the
macro name is very confusing.  The reason for using a macro was
RelationUsesLocalBuffers().

I'm not sure the exact meaning the "ever" (doesn't seem to be
"always"). On the other hand there are many places where the second
line above takes place. So I chose to do that without a macro.

> > +
> >  /*
> >   * RelationNeedsWAL
> > - * True if relation needs WAL.
> > + * True if relation needs WAL at the time.
> >   *
> >   * Returns false if wal_level = minimal and this relation is created or
> >   * truncated in the current transaction.  See "Skipping WAL for New
> >   * RelFileNode" in src/backend/access/transam/README.
> >   */
> >  #define RelationNeedsWAL(relation) 
> > \
> > -   ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&  
> > \
> > +   (RelationIsWalLogged(relation) &&   
> > \
> >  (XLogIsNeeded() || 
> > \
> >   (relation->rd_createSubid == InvalidSubTransactionId &&   
> > \
> >relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> > @@ -619,7 +626,7 @@ typedef struct ViewOptions
> >   */
> >  #define RelationIsAccessibleInLogicalDecoding(relation) \
> > 

Re: Wrong usage of RelationNeedsWAL

2021-01-15 Thread Noah Misch
On Wed, Jan 13, 2021 at 04:07:05PM +0900, Kyotaro Horiguchi wrote:
> The definition of the macro RelationNeedsWAL has been changed by
> c6b92041d3 to include conditions related to the WAL-skip optimization
> but some uses of the macro are not relevant to the optimization. That
> misuses are harmless for now as they are only executed while wal_level
> >= replica or WAL-skipping is inactive. However, this should be
> corrected to prevent future hazard.

I see user-visible consequences:

> --- a/src/backend/catalog/pg_publication.c
> +++ b/src/backend/catalog/pg_publication.c
> @@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
>errdetail("System tables cannot be added to 
> publications.")));
>  
>   /* UNLOGGED and TEMP relations cannot be part of publication. */
> - if (!RelationNeedsWAL(targetrel))
> + if (!RelationIsWalLogged(targetrel))

Today, the following fails needlessly under wal_level=minimal:

BEGIN;
SET client_min_messages = 'ERROR';
CREATE TABLE skip_wal ();
CREATE PUBLICATION p FOR TABLE skip_wal;
ROLLBACK;

Could you add that test to one of the regress scripts?

>   ereport(ERROR,
>   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>errmsg("table \"%s\" cannot be replicated",
> diff --git a/src/backend/optimizer/util/plancat.c 
> b/src/backend/optimizer/util/plancat.c
> index da322b453e..0500efcdb9 100644
> --- a/src/backend/optimizer/util/plancat.c
> +++ b/src/backend/optimizer/util/plancat.c
> @@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid 
> relationObjectId, bool inhparent,
>   relation = table_open(relationObjectId, NoLock);
>  
>   /* Temporary and unlogged relations are inaccessible during recovery. */
> - if (!RelationNeedsWAL(relation) && RecoveryInProgress())
> + if (!RelationIsWalLogged(relation) && RecoveryInProgress())

This has no user-visible consequences, but I agree you've clarified it.

>   ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>errmsg("cannot access temporary or unlogged 
> relations during recovery")));
> diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
> index 10b63982c0..810806a542 100644
> --- a/src/include/utils/rel.h
> +++ b/src/include/utils/rel.h
> @@ -552,16 +552,23 @@ typedef struct ViewOptions
>   (relation)->rd_smgr->smgr_targblock = (targblock); \
>   } while (0)
>  
> +/*
> + * RelationIsPermanent
> + *   True if relation is WAL-logged.
> + */
> +#define RelationIsWalLogged(relation)
> \
> + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)

To me, the names RelationIsWalLogged() and RelationNeedsWAL() don't convey
their behavior difference.  How about one of the following spellings?

- Name the new macro RelationEverNeedsWAL().
- Just test "relpersistence == RELPERSISTENCE_PERMANENT", without a macro.

> +
>  /*
>   * RelationNeedsWAL
> - *   True if relation needs WAL.
> + *   True if relation needs WAL at the time.
>   *
>   * Returns false if wal_level = minimal and this relation is created or
>   * truncated in the current transaction.  See "Skipping WAL for New
>   * RelFileNode" in src/backend/access/transam/README.
>   */
>  #define RelationNeedsWAL(relation)   
> \
> - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&  
> \
> + (RelationIsWalLogged(relation) &&   
> \
>(XLogIsNeeded() || 
> \
> (relation->rd_createSubid == InvalidSubTransactionId &&   
> \
>  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> @@ -619,7 +626,7 @@ typedef struct ViewOptions
>   */
>  #define RelationIsAccessibleInLogicalDecoding(relation) \
>   (XLogLogicalInfoActive() && \
> -  RelationNeedsWAL(relation) && \
> +  RelationIsWalLogged(relation) && \

Today's condition expands to:

  wal_level >= WAL_LEVEL_LOGICAL &&
  relation->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&
  (wal_level >= WAL_LEVEL_REPLICA || ...)

The proposed change doesn't affect semantics, and it likely doesn't change
optimized code.  I slightly prefer to leave this line unchanged.

>(IsCatalogRelation(relation) || 
> RelationIsUsedAsCatalogTable(relation)))
>  
>  /*
> @@ -635,7 +642,7 @@ typedef struct ViewOptions
>   */
>  #define RelationIsLogicallyLogged(relation) \
>   (XLogLogicalInfoActive() && \
> -  RelationNeedsWAL(relation) && \
> +  RelationIsWalLogged(relation) && \

Likewise.

>!IsCatalogRelation(relation))
>  
>  /* 

Re: Wrong usage of RelationNeedsWAL

2021-01-15 Thread Andres Freund
Hi,

On 2021-01-13 16:07:05 +0900, Kyotaro Horiguchi wrote:
> Commit c6b92041d3 changed the definition of RelationNeedsWAL().
> 
> -#define RelationNeedsWAL(relation) \
> - ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
> +#define RelationNeedsWAL(relation)   
> \
> + ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&  
> \
> +  (XLogIsNeeded() || 
> \
> +   (relation->rd_createSubid == InvalidSubTransactionId &&   
> \
> +relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> 
> On the other hand I found this usage.
> 
> plancat.c:128 get_relation_info()
> > /* Temporary and unlogged relations are inaccessible during recovery. */
> > if (!RelationNeedsWAL(relation) && RecoveryInProgress())
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("cannot access temporary or unlogged 
> > relations during recovery")));
> 
> It works as expected accidentally, but the meaning is off.
> WAL-skipping optmization is irrelevant to the condition for the error.
> 
> I found five misues in the tree. Please find the attached.

Noah?

Greetings,

Andres Freund




Wrong usage of RelationNeedsWAL

2021-01-12 Thread Kyotaro Horiguchi
Hello.

Commit c6b92041d3 changed the definition of RelationNeedsWAL().

-#define RelationNeedsWAL(relation) \
-   ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+#define RelationNeedsWAL(relation) 
\
+   ((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&  
\
+(XLogIsNeeded() || 
\
+ (relation->rd_createSubid == InvalidSubTransactionId &&   
\
+  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))

On the other hand I found this usage.

plancat.c:128 get_relation_info()
>   /* Temporary and unlogged relations are inaccessible during recovery. */
>   if (!RelationNeedsWAL(relation) && RecoveryInProgress())
>   ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>errmsg("cannot access temporary or unlogged 
> relations during recovery")));

It works as expected accidentally, but the meaning is off.
WAL-skipping optmization is irrelevant to the condition for the error.

I found five misues in the tree. Please find the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 2912e1db38ff034e27e4010eff5b2e5afcce3f85 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 13 Jan 2021 15:52:03 +0900
Subject: [PATCH] Fix misuses of RelationNeedsWAL

The definition of the macro RelationNeedsWAL has been changed by
c6b92041d3 to include conditions related to the WAL-skip optimization
but some uses of the macro are not relevant to the optimization. That
misuses are harmless for now as they are only executed while wal_level
>= replica or WAL-skipping is inactive. However, this should be
corrected to prevent future hazard.
---
 src/backend/catalog/pg_publication.c |  2 +-
 src/backend/optimizer/util/plancat.c |  2 +-
 src/include/utils/rel.h  | 15 +++
 src/include/utils/snapmgr.h  |  2 +-
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index 5f8e1c64e1..f3060a4cf1 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -67,7 +67,7 @@ check_publication_add_relation(Relation targetrel)
  errdetail("System tables cannot be added to publications.")));
 
 	/* UNLOGGED and TEMP relations cannot be part of publication. */
-	if (!RelationNeedsWAL(targetrel))
+	if (!RelationIsWalLogged(targetrel))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("table \"%s\" cannot be replicated",
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index da322b453e..0500efcdb9 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -126,7 +126,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	relation = table_open(relationObjectId, NoLock);
 
 	/* Temporary and unlogged relations are inaccessible during recovery. */
-	if (!RelationNeedsWAL(relation) && RecoveryInProgress())
+	if (!RelationIsWalLogged(relation) && RecoveryInProgress())
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot access temporary or unlogged relations during recovery")));
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..810806a542 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -552,16 +552,23 @@ typedef struct ViewOptions
 		(relation)->rd_smgr->smgr_targblock = (targblock); \
 	} while (0)
 
+/*
+ * RelationIsPermanent
+ *		True if relation is WAL-logged.
+ */
+#define RelationIsWalLogged(relation)	\
+	((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT)
+
 /*
  * RelationNeedsWAL
- *		True if relation needs WAL.
+ *		True if relation needs WAL at the time.
  *
  * Returns false if wal_level = minimal and this relation is created or
  * truncated in the current transaction.  See "Skipping WAL for New
  * RelFileNode" in src/backend/access/transam/README.
  */
 #define RelationNeedsWAL(relation)		\
-	((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&	\
+	(RelationIsWalLogged(relation) &&	\
 	 (XLogIsNeeded() ||	\
 	  (relation->rd_createSubid == InvalidSubTransactionId &&			\
 	   relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
@@ -619,7 +626,7 @@ typedef struct ViewOptions
  */
 #define RelationIsAccessibleInLogicalDecoding(relation) \
 	(XLogLogicalInfoActive() && \
-	 RelationNeedsWAL(relation) && \
+	 RelationIsWalLogged(relation) && \
 	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
 
 /*
@@ -635,7 +642,7 @@ typedef struct ViewOptions
  */
 #define