Add --{no-,}bypassrls flags to createuser

2022-04-12 Thread Shinya Kato

Hi,

Add --{no-,}bypassrls flags to createuser.
The following is an example of execution.
--
$ createuser a --bypassrls
$ psql -c "\du a"
           List of roles
 Role name | Attributes | Member of
-----------+------------+-----------
 a         | Bypass RLS | {}

--
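
For reference, the flag just appends the corresponding keyword to the
generated statement (see the patch below), so the invocation above should be
roughly equivalent to:

    CREATE ROLE a NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN BYPASSRLS;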

What do you think?

Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..6c2ee1e0c6 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -290,6 +290,28 @@ PostgreSQL documentation
   </listitem>
  </varlistentry>
 
+ <varlistentry>
+  <term><option>--bypassrls</option></term>
+  <listitem>
+   <para>
+    The new user will have the <literal>BYPASSRLS</literal>
+    privilege, which is described more fully in the documentation
+    for <xref linkend="sql-createrole"/>.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><option>--no-bypassrls</option></term>
+  <listitem>
+   <para>
+    The new user will not have the <literal>BYPASSRLS</literal>
+    privilege, which is described more fully in the documentation
+    for <xref linkend="sql-createrole"/>.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry>
  <term><option>-?</option></term>
  <term><option>--help</option></term>
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..5b363b6f54 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -48,6 +48,8 @@ main(int argc, char *argv[])
 		{"replication", no_argument, NULL, 1},
 		{"no-replication", no_argument, NULL, 2},
 		{"interactive", no_argument, NULL, 3},
+		{"bypassrls", no_argument, NULL, 4},
+		{"no-bypassrls", no_argument, NULL, 5},
 		{"connection-limit", required_argument, NULL, 'c'},
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
@@ -76,7 +78,8 @@ main(int argc, char *argv[])
 				createrole = TRI_DEFAULT,
 				inherit = TRI_DEFAULT,
 				login = TRI_DEFAULT,
-				replication = TRI_DEFAULT;
+				replication = TRI_DEFAULT,
+				bypassrls = TRI_DEFAULT;
 
 	PQExpBufferData sql;
 
@@ -165,6 +168,12 @@ main(int argc, char *argv[])
 			case 3:
 				interactive = true;
 				break;
+			case 4:
+				bypassrls = TRI_YES;
+				break;
+			case 5:
+				bypassrls = TRI_NO;
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -304,6 +313,10 @@ main(int argc, char *argv[])
 		appendPQExpBufferStr(&sql, " REPLICATION");
 	if (replication == TRI_NO)
 		appendPQExpBufferStr(&sql, " NOREPLICATION");
+	if (bypassrls == TRI_YES)
+		appendPQExpBufferStr(&sql, " BYPASSRLS");
+	if (bypassrls == TRI_NO)
+		appendPQExpBufferStr(&sql, " NOBYPASSRLS");
 	if (conn_limit >= -1)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %d", conn_limit);
 	if (roles.head != NULL)
@@ -366,6 +379,8 @@ help(const char *progname)
 			 "than using defaults\n"));
 	printf(_("  --replication role can initiate replication\n"));
 	printf(_("  --no-replication  role cannot initiate replication\n"));
+	printf(_("  --bypassrls   role can bypass row-level security (RLS) policy\n"));
+	printf(_("  --no-bypassrlsrole cannot bypass row-level security (RLS) policy\n"));
 	printf(_("  -?, --helpshow this help, then exit\n"));
 	printf(_("\nConnection options:\n"));
 	printf(_("  -h, --host=HOSTNAME   database server host or socket directory\n"));


Re: Skip partition tuple routing with constant partition key

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 6:16 AM Masahiko Sawada  wrote:
>
> Hi,
>
> On Thu, Apr 7, 2022 at 4:37 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > > Amit Langote  writes:
> > > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark  wrote:
> > > >> Is this a problem with the patch or its tests?
> > > >> [18:14:20.798] Test Summary Report
> > > >> [18:14:20.798] ---
> > > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> > >
> > > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > > http://cfbot.cputube.org/amit-langote.html
> > >
> > > 013_partition.pl has been failing regularly in the buildfarm,
> > > most recently here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
> >
> > Just failed locally on my machine as well.
> >
> >
> > > I don't think there's room to blame any uncommitted patches
> > > for that.  Somebody broke it a short time before here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
> >
> > The obvious thing to point a finger at is
> >
> > commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> > Author: Tomas Vondra 
> > Date:   2022-03-16 16:42:47 +0100
> >
> > Fix publish_as_relid with multiple publications
> >
>
> I've not managed to reproduce this issue on my machine but while
> reviewing the code and the server logs[1] I may have found possible
> bugs:
>
> 2022-04-08 12:59:30.701 EDT [91997:1] LOG:  logical replication apply
> worker for subscription "sub2" has started
> 2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
> statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
> pub_all
> 2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
> disconnection: session time: 0:00:00.036 user=buildfarm
> database=postgres host=[local]
> 2022-04-08 12:59:30.740 EDT [92001:1] LOG:  logical replication table
> synchronization worker for subscription "sub2", table "tab4_1" has
> started
> 2022-04-08 12:59:30.744 EDT [91997:2] LOG:  logical replication apply
> worker for subscription "sub2" will restart because of a parameter
> change
> 2022-04-08 12:59:30.750 EDT [92003:1] LOG:  logical replication table
> synchronization worker for subscription "sub2", table "tab3" has
> started
>
> The logs say that the apply worker for "sub2" finished whereas the
> tablesync workers for "tab4_1" and "tab3" started. After these logs,
> there are no logs that these tablesync workers finished and the apply
> worker for "sub2" restarted, until the timeout. While reviewing the
> code, I realized that the tablesync workers can advance its relstate
> even without the apply worker intervention.
>
> After a tablesync worker copies the table it sets
> SUBREL_STATE_SYNCWAIT to its relstate, then it waits for the apply
> worker to update the relstate to SUBREL_STATE_CATCHUP. If the apply
> worker has already died, it breaks from the wait loop and returns
> false:
>
> wait_for_worker_state_change():
>
> for (;;)
> {
> LogicalRepWorker *worker;
>
> :
>
> /*
>  * Bail out if the apply worker has died, else signal it we're
>  * waiting.
>  */
> LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
> InvalidOid, false);
> if (worker && worker->proc)
> logicalrep_worker_wakeup_ptr(worker);
> LWLockRelease(LogicalRepWorkerLock);
> if (!worker)
> break;
>
> :
> }
>
> return false;
>
> However, the caller doesn't check the return value at all:
>
> /*
>  * We are done with the initial data synchronization, update the state.
>  */
> SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
> MyLogicalRepWorker->relstate_lsn = *origin_startpos;
> SpinLockRelease(&MyLogicalRepWorker->relmutex);
>
> /*
>  * Finally, wait until the main apply worker tells us to catch up and then
>  * return to let LogicalRepApplyLoop do it.
>  */
> wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
> return slotname;
>
> Therefore, the tablesync worker started logical replication while
> keeping its relstate as SUBREL_STATE_SYNCWAIT.
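
A minimal sketch of the missing check (illustrative only, not a committed
fix, and assuming it is acceptable for the tablesync worker to error out
once the apply worker has gone away) could look like:

    if (!wait_for_worker_state_change(SUBREL_STATE_CATCHUP))
        ereport(ERROR,
                (errmsg("terminating logical replication synchronization worker: "
                        "apply worker for subscription \"%s\" has exited",
                        MySubscription->name)));
    return slotname;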
>
> Given the server logs, it's likely that both tablesync workers for
> "tab4_1" and "tab3" were in this situation. That is, there were two
> tablesync workers who were applying changes for the target relation
> but the relstate was SUBREL_STATE_SYNCWAIT.
>
> When it comes to starting the apply worker, probably it didn't happen
> since there are already running tablesync workers as much as
> max_sync_workers_per_subscription (2 by default):
>
> logicalrep_worker_launch():
>
> /*
>  * If we reached the sync worker limit per 

Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-12 Thread Michael Paquier
On Tue, Apr 12, 2022 at 06:22:48PM +0900, Michael Paquier wrote:
> On Mon, Apr 11, 2022 at 12:46:02PM +0000, gkokola...@pm.me wrote:
>> It looks good. If you choose to discard the comment regarding the use of
>> 'method' over 'algorithm' from above, can you please use the full word in the
>> variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can
>> not really explain it, the latter reads a bit rude. Then again that may be
>> just me.
> 
> Thanks.  I have been able to do an extra pass on 0001 and 0002, fixing
> those naming inconsistencies with "algo" vs "algorithm" that you and
> Robert have reported, and applied them.  For 0003, I'll look at it
> later.  Attached is a rebase with improvements about the variable
> names.

This has been done with the proper renames.  With that in place, I see
no reason not to allow setting the compression level, as it is
possible to pass it down with the options available.  This requires
only a couple of lines, as of the attached.  LZ4 has a dummy
initializer called LZ4F_INIT_PREFERENCES for LZ4F_preferences_t, which
holds the compression level before it is passed down to
LZ4F_compressBegin(), but that's available only as of v1.8.3.  Using it
risks raising the minimum version of LZ4 we are able to use, but
replacing it with a set of memset()s is also a valid way to set things
up as per the LZ4 documentation.
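
For illustration, one way to use the macro where available while keeping
compatibility with older releases (assuming LZ4F_INIT_PREFERENCES is a
preprocessor define visible to #ifdef, as in lz4frame.h since v1.8.3):

#ifdef LZ4F_INIT_PREFERENCES
	/* LZ4 >= 1.8.3 provides a designated initializer */
	LZ4F_preferences_t prefs = LZ4F_INIT_PREFERENCES;
#else
	/* older releases: zero-fill, which the LZ4 docs define as the defaults */
	LZ4F_preferences_t prefs;
	memset(&prefs, 0, sizeof(prefs));
#endif
	prefs.compressionLevel = dir_data->compression_level;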

Thoughts?
--
Michael
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index d5bcc208a9..1b3cc5c4f7 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -165,6 +165,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 	{
 		size_t		ctx_out;
 		size_t		header_size;
+		LZ4F_preferences_t prefs;
 
 		ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
 		if (LZ4F_isError(ctx_out))
@@ -177,8 +178,13 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 		lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
 		lz4buf = pg_malloc0(lz4bufsize);
 
+		/* assign the compression level, default is 0 */
+		memset(&prefs, 0, sizeof(prefs));
+		memset(&prefs.frameInfo, 0, sizeof(prefs.frameInfo));
+		prefs.compressionLevel = dir_data->compression_level;
+
 		/* add the header */
-		header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
+		header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, &prefs);
 		if (LZ4F_isError(header_size))
 		{
 			dir_data->lasterrstring = LZ4F_getErrorName(header_size);


signature.asc
Description: PGP signature


Re: Skipping schema changes in publication

2022-04-12 Thread Amit Kapila
On Wed, Apr 13, 2022 at 8:45 AM vignesh C  wrote:
>
> On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila  wrote:
> >
> > I understand that part but what I pointed out was that it might be
> > better to avoid using ADD keyword in this syntax like: ALTER
> > PUBLICATION pub1 SKIP TABLE t1,t2;
>
> Currently we support ALTER PUBLICATION using the following syntax:
> ALTER PUBLICATION pub1 ADD TABLE t1;
> ALTER PUBLICATION pub1 SET TABLE t1;
> ALTER PUBLICATION pub1 DROP TABLE t1;
> ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;
> ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1;
> ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1;
>
> I have extended the new syntax along similar lines:
> ALTER PUBLICATION pub1 ADD SKIP TABLE t1;
> ALTER PUBLICATION pub1 SET SKIP TABLE t1;
> ALTER PUBLICATION pub1 DROP SKIP TABLE t1;
>
> I did it like this to maintain consistency.
>

What is the difference between ADD and SET variants? I understand we
need some way to remove the SKIP table setting but not sure if DROP is
the best alternative.

The other ideas could be:
To set skip tables: ALTER PUBLICATION pub1 SKIP TABLE t1, t2...;
To reset skip tables: ALTER PUBLICATION pub1 SKIP TABLE; /* basically
an empty list */
Yet another way to reset skip tables: ALTER PUBLICATION pub1 RESET
SKIP TABLE; /* Here we need to introduce RESET. */
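
Put together, a session under these alternatives might look like this
(purely illustrative; none of this syntax is committed):

CREATE PUBLICATION pub1 FOR ALL TABLES;
ALTER PUBLICATION pub1 SKIP TABLE t1, t2;  -- exclude t1 and t2 from pub1
ALTER PUBLICATION pub1 RESET SKIP TABLE;   -- clear the skip list again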

-- 
With Regards,
Amit Kapila.




Re: deparsing utility commands

2022-04-12 Thread Ajin Cherian
On Wed, Apr 13, 2022 at 2:29 PM Ajin Cherian  wrote:
>
>
> This patch-set has not been rebased for some time. I see that there is
> interest in this patch from the logical
> replication of DDL thread [1].
>

Forgot to add the link to the thread.

[1] - 
https://www.postgresql.org/message-id/202203162206.7spggyktx63e@alvherre.pgsql




Re: deparsing utility commands

2022-04-12 Thread Ajin Cherian
On Wed, Apr 13, 2022 at 2:12 PM Shulgin, Oleksandr wrote:

>> You seem to have squashed the patches?  Please keep the split out.
>
>
> Well, if that makes the review process easier :-)
>
> --
> Alex
>

This patch-set has not been rebased for some time. I see that there is
interest in this patch from the logical
replication of DDL thread [1].

I will take a stab at rebasing this patch-set. I have already rebased
the first patch and will work on the other patches in the coming days.
Do review and give me feedback.

regards,
Ajin Cherian
Fujitsu Australia


0001-ddl_deparse-core-support.patch
Description: Binary data


Re: Support logical replication of DDLs

2022-04-12 Thread Amit Kapila
On Wed, Apr 13, 2022 at 5:49 AM Zheng Li  wrote:
>
> Hi,
>
> Here is the rebased new branch
> https://github.com/zli236/postgres/commits/ddl_replication
>

Thanks, but it would be good if you can share it in the patch form as
well unless you need to send patches too frequently. It is easier to
give comments.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 6:16 PM Alvaro Herrera  wrote:
>
> On 2022-Apr-12, Amit Kapila wrote:
>
> > It still has the same problem. The table can be dropped just before
> > this message and the get_rel_name will return NULL and we don't expect
> > that.
>
> Ugh, I forgot to change the errmsg() parts to use the new variable,
> apologies.  Fixed.
>

Thanks, this will work and fix the issue. I think this looks better
than the current code; however, I am not sure if the handling for
concurrently dropped tables is better (both get_rel_relkind() and
get_rel_name() can fail for that reason). I understand this won't
fail because of the protection you have in the patch, so feel free to
go ahead with this if you like this style better.
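
For context, the hazard with these lookup helpers is that they return a
sentinel rather than raising an error once the relation has vanished, so a
defensive caller looks roughly like this (hypothetical sketch, not the patch
under discussion):

    char   *relname = get_rel_name(relid);

    /* get_rel_name() returns NULL if the relation was concurrently dropped */
    if (relname == NULL)
        return;

    ereport(ERROR,
            (errmsg("cannot use relation \"%s\" here", relname)));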

-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-04-12 Thread Peter Smith
On Wed, Apr 13, 2022 at 12:08 AM Euler Taveira  wrote:
>
> On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:
>
> Not changed. Because in fact, I copied most of this sentence
> (including the uppercase, "operations", "and/or") from existing
> documentation [1]
> e.g. see "The tables added to a publication that publishes UPDATE
> and/or DELETE operations must ..."
> [1] https://www.postgresql.org/docs/current/sql-createpublication.html
>
> Hmm. You are checking the Notes. I'm referring to the publish parameter. IMO
> this sentence should use operations in lowercase letters too because even if
> you create it with uppercase letters, Postgres will provide a lowercase word
> when you dump it.

IIRC the row filter replication identity checking is at run-time based
on the operation (not at DDL time based on the publish_parameter). For
this reason, and also because this is the same format/wording used in
many places already on the create_publication.sgml, I did not change
this formatting. But I did modify [v10] as earlier suggested to
replace the “and/or” with “or”, and also added another word
“operation”.

>
> Yeah, I know the information is the same in the summary and in the
> text. Personally, I find big slabs of technical text difficult to
> digest, so I'd have to spend 5 minutes of careful reading and drawing
> the exact same summary on a piece of paper anyway just to visualize
> what the text says. The summary makes it easy to understand at a
> glance. But I have modified the summary in [v9] to remove the "case"
> and to add other column headings as suggested.
>
> Isn't it better to use a table instead of synopsis?

Modified [v10] as suggested.

>
> Not changed. The readers of this docs page are all users who will be
> familiar with the filter expressions, so I felt the "OR'ed together"
> part will be perfectly clear to the intended audience.
>
> If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
> keep the consistency.

Modified [v10] as suggested.

>
> But I did not understand your point about “If, for some reason,
> someone decided to change it, this section will contain obsolete
> information”, because IIUC that will be equally true for both the
> explanation and the output, so I did not understand why you say "psql
> output is fine, the explanation is not". It is the business of this
> documentation to help the user to know how and where they can find the
> row filter information they may need to know.
>
> You are describing a psql command here. My point is to keep the psql explanation in
> the psql section. This section is to describe the row filter feature. If we
> start describing features in other sections, we will have outdated information
> when the referred feature is changed and someone fails to find all references.
> I tend to concentrate detailed explanation in the feature section. If I have
> to add links in other sections, I use "See Section 1.23 for details ...".

Modified [v10] so the psql descriptions are now very generic.

>
> Not changed. The publisher and subscriber programlistings are always
> separated. If you are looking at the rendered HTML I think it is quite
> clear that one is at the publisher and one is at the subscriber. OTOH,
> if we omitted creating the tables on the subscriber then I think that
> really would cause some confusion.
>
> The difference is extra space. By default, the CSS does not include a border
> for programlisting. That's why I complained about it. I noticed that the
> website CSS includes it. However, the PDF will not include the border. I would
> add a separate description for the subscriber just to be clear.
>

Modified [v10] as suggested to add a separate description for the subscriber.

> One last suggestion, you are using identifiers in uppercase letters but
> "primary key" is in lowercase.
>

Modified [v10] as suggested to make this uppercase

--
[v10] 
https://www.postgresql.org/message-id/CAHut%2BPvMEYkCRWDoZSpFnP%2B5SExus7YzWAd%3D6ah9vwkfRhOnSg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: PG DOCS - logical replication filtering

2022-04-12 Thread Peter Smith
PSA patch v10, which addresses the remaining review comments from Euler [1].

--
[1] 
https://www.postgresql.org/message-id/3cd8d622-6a26-4eaf-a5aa-ac78030e5f50%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v10-0001-Add-additional-documentation-for-row-filters.patch
Description: Binary data


Re: Skipping schema changes in publication

2022-04-12 Thread vignesh C
On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila  wrote:
>
> On Tue, Apr 12, 2022 at 4:17 PM vignesh C  wrote:
> >
> > On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila  
> > wrote:
> > >
> > >
> > > For the second syntax (Alter Publication ...), isn't it better to
> > > avoid using ADD? It looks odd to me because we are not adding anything
> > > in publication with this sytax.
> >
> > I was thinking of the scenario where user initially creates the
> > publication for all tables:
> > CREATE PUBLICATION pub1 FOR ALL TABLES;
> >
> > After that user decides to skip few tables ex: t1, t2
> >  ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
> >
> > I thought of supporting this syntax if incase user decides to add the
> > skipping of a few tables later.
> >
>
> I understand that part but what I pointed out was that it might be
> better to avoid using ADD keyword in this syntax like: ALTER
> PUBLICATION pub1 SKIP TABLE t1,t2;

Currently we support ALTER PUBLICATION using the following syntax:
ALTER PUBLICATION pub1 ADD TABLE t1;
ALTER PUBLICATION pub1 SET TABLE t1;
ALTER PUBLICATION pub1 DROP TABLE t1;
ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1;

I have extended the new syntax along similar lines:
ALTER PUBLICATION pub1 ADD SKIP TABLE t1;
ALTER PUBLICATION pub1 SET SKIP TABLE t1;
ALTER PUBLICATION pub1 DROP SKIP TABLE t1;

I did it like this to maintain consistency.
But I'm fine doing it either way to keep it simple for the user.

Regards,
Vignesh




Re: Improving the "Routine Vacuuming" docs

2022-04-12 Thread David G. Johnston
On Tue, Apr 12, 2022 at 5:22 PM Peter Geoghegan  wrote:

> I just don't think that you need to make it any more complicated than
> this: physical XID values are only meaningful when compared to other
> XIDs from the same cluster. The system needs to make sure that no two
> XIDs can ever be more than about 2 billion XIDs apart, and here's how
> you as a DBA can help the system to make sure of that.
>
>
I decided to run with that perspective and came up with the following rough
draft.  A decent amount of existing material I would either just remove or
place elsewhere behind a "see ... for details" pointer.

The following represents the complete section.

David J.

   
This vacuum responsibility is necessary because a transaction ID (xid)
has a lifetime of 2 billion transactions.  The rows created by a given
transaction (recorded in xmin) must be frozen prior to the expiration
of the xid.
(The expired xid values can then be resurrected, see ... for details).
This is done by flagging the rows as frozen and thus visible for the
remainder
of the row's life.
   

   
While vacuum will not touch a row's xmin while updating its frozen
status, two reserved xid values may be seen. BootstrapTransactionId (1)
may be seen on system catalog tables to indicate records inserted
during initdb. FrozenTransactionId (2) may be seen on any table and
also indicates that the row is frozen.
This was the mechanism
used in versions prior to 9.4, when it was decided to keep the xmin
unchanged for forensic use.
   

   
VACUUM uses the visibility map
to determine which pages of a table must be scanned.  Normally, it
will skip pages that don't have any dead row versions even if those
pages
might still have row versions with old XID values.  Therefore, normal
VACUUMs won't always freeze every old row version in
the table.
When that happens, VACUUM will eventually need to
perform an
aggressive vacuum, which will freeze all
eligible unfrozen
XID and MXID values, including those from all-visible but not
all-frozen pages.
In practice most tables require periodic aggressive vacuuming.
   

   
Thus, an aging transaction will potentially pass a number of milestone
ages, controlled by various configuration settings or hard-coded into
the server, as it awaits its fate: either to be memorialized
cryogenically or to die.
While the following speaks of an individual transaction's age, in
practice each table has a relfrozenxid attribute, which is used by the
system as a reference age since it is the oldest potentially living
transaction on the table (see xref for details).
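
As a practical aside, the per-table reference age just described can be
inspected with a standard catalog query such as the following (illustrative
only; age() and pg_class.relfrozenxid are standard):

SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, matviews, TOAST tables
ORDER BY xid_age DESC
LIMIT 10;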
   

   
The first milestone is controlled by vacuum_freeze_min_age (50 million)
and marks the age
at which the row becomes eligible to become frozen.
   
   
Next up is vacuum_freeze_table_age (120 million).  Before this age the
row can be frozen,
but a non-aggressive vacuum may not encounter the row due to the
visibility
map optimizations described above.  Vacuums performed while relfrozenxid
is older than this age will be done aggressively.
   
   
For tables where routine complete vacuuming doesn't happen the
auto-vacuum
daemon acts as a safety net.  When the age of the row exceeds
autovacuum_freeze_max_age (200 million) the autovacuum daemon, even if
disabled for the table,
will perform an anti-wraparound vacuum on the table (see below).
   
   
Finally, as a measure of last resort, the system will begin emitting
warnings (1.940 billion) and then (1.997 billion) shut down.
It may be restarted in single-user mode for manual aggressive vacuuming.
   

   
An anti-wraparound vacuum is much more expensive than an aggressive
vacuum and
so the gap between the vacuum_freeze_table_age and
autovacuum_freeze_max_age
should be somewhat large (vacuum age must be at most 95% of the
autovacuum age
to be meaningful).
   

   
Transaction history and commit status storage requirements are directly
related to
autovacuum_freeze_max_age due to retention policies
based upon
that age. See xref ... for additional details.
   

   
The reason for vacuum_freeze_min_age is to manage the trade-off between
minimizing rows marked dead that are already frozen and minimizing the
number of rows being frozen aggressively.


Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Andres Freund
Hi,

On 2022-04-12 15:05:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> >> +1.  This is probably more feasible given the latch infrastructure
> >> than it was when that code was first written.
> 
> > What do you think about just reordering the disable_all_timeouts() to be
> > before the got_standby_deadlock_timeout check in the back branches? I think
> > that should close at least the most obvious hole.  And fix it properly in
> > HEAD?
> 
> I don't have much faith in that, and I don't see why we can't fix it
> properly.

It's not too hard, agreed.

It's somewhat hard to test that it works though. The new recovery conflict
tests test all recovery conflicts except for deadlock ones, because they're
not easy to trigger... But I think I nearly got it working reliably.

It's probably worth backpatching the tests, after stripping them of the stats
checks?

Three questions:

- For HEAD we have to replace the disable_all_timeouts() calls, it breaks the
  replay progress reporting. Is there a reason to keep them in the
  backbranches? Hard to see how an extension or something could rely on it,
  but ...?

- I named the variable set by the signal handler got_standby_delay_timeout,
  because just got_standby_timeout looked weird besides the _deadlock_. Which
  makes me think that we should rename STANDBY_TIMEOUT to
  STANDBY_DELAY_TIMEOUT too?

- There's the following comment in ResolveRecoveryConflictWithBufferPin():

  "We assume that only UnpinBuffer() and the timeout requests established
   above can wake us up here."

  That's bogus afaict? There's plenty of other things that can cause MyProc->latch
  to be set. Is it worth doing something about this at the same time? Right
  now we seem to call ResolveRecoveryConflictWithBufferPin() in rapid
  succession initially.


Greetings,

Andres Freund




Re: Improving the "Routine Vacuuming" docs

2022-04-12 Thread Peter Geoghegan
On Tue, Apr 12, 2022 at 4:24 PM David G. Johnston wrote:
> I've attached some off-the-cuff thoughts on reworking the first three 
> paragraphs and the note.
>
> It's hopefully useful for providing perspective if nothing else.

More perspective is definitely helpful.

> I'm assuming and caring only about visible rows when I'm reading this 
> section. Maybe we need to make that explicit - only xmin matters (and the 
> invisible frozen flag)?

The statement "only xmin matters" is true in spirit. If xmax needed to
be frozen then we'd actually remove the whole tuple instead (unless it
was a MultiXact). Alternatively, if it looked like xmax needed to be
frozen, but the XID turned out to have been from an aborted xact, then
we'd clear the XID from xmax instead.

Freezing tends to lag the removal of dead tuples, but that's just an
optimization. If you set vacuum_freeze_min_age to 0 then freezing and
dead tuple removal happen in tandem (actually, we can remove tuples
inserted by an XID after VACUUM's OldestXmin/removal cutoff when the
inserting xact aborts, but VACUUM makes no real promises about XIDs >=
OldestXmin anyway).

>> It might also be useful to describe freezing all of a live tuple's
>> XIDs as roughly the opposite process as completely physically removing
>> a dead tuple. It follows that we don't necessarily need to freeze
>> anything to advance relfrozenxid (especially not on Postgres 15).
>
>
> I failed to pick up on how this and the "mod-2^32" math interplay, and I'm not
> sure I care when reading this.  It made more sense to consider "shortest
> path" along the "circle".

It would probably be possible to teach the system to deal with
coexisting XIDs that are close to a full ~4 billion XIDs apart. We'd
have to give up on the mod-2^32 comparison stuff in functions like
TransactionIdPrecedes(), and carefully keep track of things per-table,
and carry that context around a lot more. I certainly don't think that
that's a good idea (if 2 billion XIDs wasn't enough, why should 4
billion XIDs be?), but it does seem feasible.
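
For reference, the mod-2^32 comparison being discussed reduces to a signed
32-bit difference (simplified; the real TransactionIdPrecedes() also
special-cases permanent XIDs such as FrozenTransactionId):

    /* true iff id1 logically precedes id2, with the distance interpreted
     * modulo 2^32, so each XID can "see" about 2 billion XIDs in either
     * direction */
    int32	diff = (int32) (id1 - id2);

    return (diff < 0);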

My point is this: there is nothing particularly intuitive or natural
about the current ~2 billion XID limit, even if you already know that
XIDs are generally represented on disk as 32-bit unsigned integers.
And so the fact is that we are already asking users to take it on
faith that there are truly good reasons why the system cannot tolerate
any scenario in which two unfrozen XIDs are more than about ~2 billion
XIDs apart. Why not just admit that, and then deem the XID comparison
rules out of scope for this particular chapter of the docs?

*Maybe* it's still useful to discuss why things work that way in code
like TransactionIdPrecedes(), but that's a totally different
discussion -- it doesn't seem particularly relevant to the design of
VACUUM, no matter the audience. Most DBAs will just accept that the
"XID distance" limit/invariant is about ~2 billion XIDs for esoteric
implementation reasons, with some vague idea of why it must be so
(e.g., "32-bit integers don't have enough space"). They will be no
worse off for it.

(Bear in mind that mod-2^32 comparison stuff was only added when
freezing/wraparound was first implemented back in 2001, by commit
bc7d37a525.)

>> Currently, "25.1.5. Preventing Transaction ID Wraparound Failures"
>> says this, right up-front:
>>
>> "But since transaction IDs have limited size (32 bits) a cluster that
>> runs for a long time (more than 4 billion transactions) would suffer
>> transaction ID wraparound"
>
>
> I both agree and disagree - where I settled (as of now) is reflected in the 
> patch.

I just don't think that you need to make it any more complicated than
this: physical XID values are only meaningful when compared to other
XIDs from the same cluster. The system needs to make sure that no two
XIDs can ever be more than about 2 billion XIDs apart, and here's how
you as a DBA can help the system to make sure of that.

Discussion of the past becoming the future just isn't helpful, because
that simply cannot ever happen on any version of Postgres from the
last decade. Freezing is more or less an overhead of storing data in
Postgres long term (even medium term) -- that is the simple reality.
We should say so.

> I am wondering, for the more technical details, is there an existing place to 
> send xrefs, do you plan to create one, or is it likely unnecessary?

I might end up doing that, but just want to get a general sense of how
other hackers feel about it for now.

-- 
Peter Geoghegan




Re: Support logical replication of DDLs

2022-04-12 Thread Zheng Li
Hi,

Here is the rebased new branch
https://github.com/zli236/postgres/commits/ddl_replication

Regards,
Zheng




Re: Improving the "Routine Vacuuming" docs

2022-04-12 Thread David G. Johnston
On Tue, Apr 12, 2022 at 2:53 PM Peter Geoghegan  wrote:

> Recent work on VACUUM and relfrozenxid advancement required that I
> update the maintenance.sgml VACUUM documentation ("Routine
> Vacuuming"). It was tricky to keep things current, due in part to
> certain structural problems. Many of these problems are artifacts of
> how the document evolved over time.
>
> "Routine Vacuuming" ought to work as a high level description of how
> VACUUM keeps the system going over time. The intended audience is
> primarily DBAs, so low level implementation details should either be
> given much less prominence, or not even mentioned. We should keep it
> practical -- without going too far in the direction of assuming that
> we know the limits of what information might be useful.
>

+1

I've attached some off-the-cuff thoughts on reworking the first three
paragraphs and the note.

It's hopefully useful for providing perspective if nothing else.


> My high level concerns are:
>
> * Instead of discussing FrozenTransactionId (and then explaining how
> that particular magic value is not really used anymore anyway), why
> not describe freezing in terms of the high level rules?
>

Agreed and considered

>
> Something along the lines of the following seems more useful: "A tuple
> whose xmin is frozen (and xmax is unset) is considered visible to
> every possible MVCC snapshot. In other words, the transaction that
> inserted the tuple is treated as if it ran and committed at some point
> that is now *infinitely* far in the past."
>

I'm assuming and caring only about visible rows when I'm reading this
section. Maybe we need to make that explicit - only xmin matters (and the
invisible frozen flag)?


> It might also be useful to describe freezing all of a live tuple's
> XIDs as roughly the opposite process as completely physically removing
> a dead tuple. It follows that we don't necessarily need to freeze
> anything to advance relfrozenxid (especially not on Postgres 15).
>

I failed to pick up on how this and the "mod-2^32" math interplay, and I'm not
sure I care when reading this.  It made more sense to consider "shortest
path" along the "circle".


> Currently, "25.1.5. Preventing Transaction ID Wraparound Failures"
> says this, right up-front:
>
> "But since transaction IDs have limited size (32 bits) a cluster that
> runs for a long time (more than 4 billion transactions) would suffer
> transaction ID wraparound"
>

I both agree and disagree - where I settled (as of now) is reflected in the
patch.


> * The description of wraparound sounds terrifying, implying that data
> corruption can result.
>

Agreed, though I just skimmed a bit after the material the patch covers.

>
> * XID space isn't really a precious resource -- it isn't even a
> resource at all IMV.
>

Agreed

>
> * We don't cleanly separate discussion of anti-wraparound autovacuums,
> and aggressive vacuums, and the general danger of wraparound (by which
> I actually mean the danger of having the xidStopLimit stop limit kick
> in).
>

Didn't really get this far.

I am wondering, for the more technical details, is there an existing place
to send xrefs, do you plan to create one, or is it likely unnecessary?

David J.


v0001-doc-reworking-of-vacuum-transaction-wraparound-section.patch
Description: Binary data


Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 05:46:36PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> If we allow changing GUCs in _PG_init() and provide another hook where
>> MaxBackends will be initialized, do we need to introduce another GUC flag,
>> or can we get away with just blocking all GUC changes when the new hook is
>> called?  I'm slightly hesitant to add a GUC flag that will need to manually
>> maintained.  Wouldn't it be easily forgotten?
> 
> On the whole I think Robert's got the right idea: we do not really
> need an enforcement mechanism at all.  People who are writing
> extensions that do this sort of thing will learn how to do it right.
> (It's not like there's not a thousand other things they have to get
> right.)

Okay.  So maybe we only need the attached patches.  0001 is just 5ecd018
un-reverted, and 0002 is Julien's patch from a few weeks ago with some
minor edits.
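
For context, with 0002 applied an extension would register for the new hook
from _PG_init() and do its MaxBackends-dependent sizing there.  A sketch
(shmem_request_hook and RequestAddinShmemSpace() come from the patch/core,
while my_shmem_request and MyBackendSlot are made-up names):

static shmem_request_hook_type prev_shmem_request_hook = NULL;

static void
my_shmem_request(void)
{
	if (prev_shmem_request_hook)
		prev_shmem_request_hook();

	/* MaxBackends is initialized by the time this hook runs */
	RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(MyBackendSlot)));
}

void
_PG_init(void)
{
	prev_shmem_request_hook = shmem_request_hook;
	shmem_request_hook = my_shmem_request;
}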

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 4c5ebca537ffdfbf61079a82b18ce7bc97222c69 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 12 Apr 2022 14:57:00 -0700
Subject: [PATCH v3 1/2] Fix comments about bgworker registration before
 MaxBackends initialization

Since 6bc8ef0b, InitializeMaxBackends() has used max_worker_processes
instead of adapting MaxBackends to the number of background workers
registered by modules loaded in shared_preload_libraries (at this time,
bgworkers were only static, but gained dynamic capabilities as a matter
of supporting parallel queries meaning that a control cap was
necessary).

Some comments referred to the past registration logic, making them
confusing and incorrect, so fix these.

Some of the out-of-core modules that could be loaded in this path
sometimes like to dynamically manipulate some of the resource-related
GUCs for their own needs, so this commit adds a note about that.

Author: Nathan Bossart
Discussion: https://postgr.es/m/20220127181815.GA551692@nathanxps13
---
 src/backend/postmaster/postmaster.c | 10 --
 src/backend/utils/init/postinit.c   |  5 ++---
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3dcaf8a4a5..d57fefa9a8 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1005,10 +1005,8 @@ PostmasterMain(int argc, char *argv[])
 	LocalProcessControlFile(false);
 
 	/*
-	 * Register the apply launcher.  Since it registers a background worker,
-	 * it needs to be called before InitializeMaxBackends(), and it's probably
-	 * a good idea to call it before any modules had chance to take the
-	 * background worker slots.
+	 * Register the apply launcher.  It's probably a good idea to call this
+	 * before any modules had a chance to take the background worker slots.
 	 */
 	ApplyLauncherRegister();
 
@@ -1029,8 +1027,8 @@ PostmasterMain(int argc, char *argv[])
 #endif
 
 	/*
-	 * Now that loadable modules have had their chance to register background
-	 * workers, calculate MaxBackends.
+	 * Now that loadable modules have had their chance to alter any GUCs,
+	 * calculate MaxBackends.
 	 */
 	InitializeMaxBackends();
 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 9139fe895c..a28612b375 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -538,9 +538,8 @@ pg_split_opts(char **argv, int *argcp, const char *optstr)
 /*
  * Initialize MaxBackends value from config options.
  *
- * This must be called after modules have had the chance to register background
- * workers in shared_preload_libraries, and before shared memory size is
- * determined.
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
  *
  * Note that in EXEC_BACKEND environment, the value is passed down from
  * postmaster to subprocesses via BackendParameters in SubPostmasterMain; only
-- 
2.25.1

From 862a5b9a8a27995157a8b71b8f3ce09bf6ed17c5 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 25 Mar 2022 11:05:24 +0800
Subject: [PATCH v3 2/2] Add a new shmem_request_hook hook.

Currently, preloaded libraries are expected to request additional
shared memory in _PG_init().  However, it is not unusal for such
requests to depend on MaxBackends, which won't be initialized at
that time.  This introduces a new hook where modules can use
MaxBackends to request additional shared memory.

Author: Julien Rouhaud
Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13
---
 src/backend/storage/ipc/ipci.c   | 15 +++
 src/include/storage/ipc.h|  2 ++
 src/tools/pgindent/typedefs.list |  1 +
 3 files changed, 18 insertions(+)

diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 75e456360b..fdd72175d5 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -52,6 +52,7 @@
 /* GUCs */
 int			

Improving the "Routine Vacuuming" docs

2022-04-12 Thread Peter Geoghegan
Recent work on VACUUM and relfrozenxid advancement required that I
update the maintenance.sgml VACUUM documentation ("Routine
Vacuuming"). It was tricky to keep things current, due in part to
certain structural problems. Many of these problems are artifacts of
how the document evolved over time.

"Routine Vacuuming" ought to work as a high level description of how
VACUUM keeps the system going over time. The intended audience is
primarily DBAs, so low level implementation details should either be
given much less prominence, or not even mentioned. We should keep it
practical -- without going too far in the direction of assuming that
we know the limits of what information might be useful.

My high level concerns are:

* Instead of discussing FrozenTransactionId (and then explaining how
that particular magic value is not really used anymore anyway), why
not describe freezing in terms of the high level rules?

Something along the lines of the following seems more useful: "A tuple
whose xmin is frozen (and xmax is unset) is considered visible to
every possible MVCC snapshot. In other words, the transaction that
inserted the tuple is treated as if it ran and committed at some point
that is now *infinitely* far in the past."

It might also be useful to describe freezing all of a live tuple's
XIDs as roughly the opposite process as completely physically removing
a dead tuple. It follows that we don't necessarily need to freeze
anything to advance relfrozenxid (especially not on Postgres 15).

* The general description of how the XID space works similarly places
way too much emphasis on low level details that are of very little
relevance.

These details would even seem totally out of place if I was the
intended audience. The problem isn't really that the information is
too technical. The problem is that we emphasize mechanistic stuff
while never quite explaining the point of it all.

Currently, "25.1.5. Preventing Transaction ID Wraparound Failures"
says this, right up-front:

"But since transaction IDs have limited size (32 bits) a cluster that
runs for a long time (more than 4 billion transactions) would suffer
transaction ID wraparound"

This is way too mechanistic. We totally muddle things by even
mentioning 4 billion XIDs in the first place. It seems like a
confusing artefact of a time before freezing was invented, back when
you really could have XIDs that were more than 2 billion XIDs apart.

This statement has another problem: it's flat-out untrue. The
xidStopLimit stuff will reliably kick in at about 2 billion XIDs.

* The description of wraparound sounds terrifying, implying that data
corruption can result.

The alarming language isn't proportionate to the true danger
(something I complained about in a dedicated thread last year [1]).

* XID space isn't really a precious resource -- it isn't even a
resource at all IMV.

ISTM that we should be discussing wraparound as an issue about the
maximum *difference* between any two unfrozen XIDs in a
cluster/installation.

Talking about an abstract-sounding XID space seems to me to be quite
counterproductive. The logical XID space is practically infinite,
after all. We should move away from the idea that physical XID space
is a precious resource. Sure, users are often concerned that the
xidStopLimit mechanism might kick-in, effectively resulting in an
outage. That makes perfect sense. But it doesn't follow that XIDs are
precious, and implying that they are intrinsically valuable just
confuses matters.

First of all, physical XID space is usually abundantly available. A
"distance" of ~2 billion XIDs is a vast distance in just about any
application (barring those with pathological problems, such as a
leaked replication slot). Second of all, Since the amount of physical
freezing required to be able to advance relfrozenxid by any given
amount (amount of XIDs) varies enormously, and is not even predictable
for a given table (because individual tables don't get their own
physical XID space), the age of datfrozenxid predicts very little
about how close we are to having the dreaded xidStopLimit mechanism
kick in. We do need some XID-wise slack, but that's just a way of
absorbing shocks -- it's ballast, usually only really needed for one
or two very large tables.

Third of all, and most importantly, the whole idea that we can just
put off freezing indefinitely and actually reduce the pain (rather
than having a substantial increase in problems) seems to have just
about no basis in reality, at least once you get into the tens of
millions range (though usually well before that).

Why should you be better off if all of your freezing occurs in one big
balloon payment? Sometimes getting into debt for a while is useful,
but why should it make sense to keep delaying freezing? And if it
doesn't make sense, then why does it still make sense to treat XID
space as a precious resource?

* We don't cleanly separate discussion of anti-wraparound autovacuums,
and aggressive vacuums, and 

Re: make MaxBackends available in _PG_init

2022-04-12 Thread Tom Lane
Nathan Bossart  writes:
> If we allow changing GUCs in _PG_init() and provide another hook where
> MaxBackends will be initialized, do we need to introduce another GUC flag,
> or can we get away with just blocking all GUC changes when the new hook is
> called?  I'm slightly hesitant to add a GUC flag that will need to manually
> maintained.  Wouldn't it be easily forgotten?

On the whole I think Robert's got the right idea: we do not really
need an enforcement mechanism at all.  People who are writing
extensions that do this sort of thing will learn how to do it right.
(It's not like there's not a thousand other things they have to get
right.)

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 12, 2022 at 4:58 PM Tom Lane  wrote:
>> It seems after a bit of reflection that what we ought to do is identify
>> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
>> mark those with a new GUC flag, and not allow them to be changed after
>> shared memory is set up.

> I dunno, I feel like this is over-engineered.

It probably is.  I'm just offering this as a solution if people want to
insist on a mechanism to prevent unsafe GUC changes.  If we drop the
idea of trying to forcibly prevent extensions from Doing It Wrong,
then we don't need this, only the extra hook.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Robert Haas
On Tue, Apr 12, 2022 at 4:58 PM Tom Lane  wrote:
> It's reasonable to allow changing *most* GUCs in _PG_init(); let us
> please not break cases that work fine today.
>
> It seems after a bit of reflection that what we ought to do is identify
> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
> mark those with a new GUC flag, and not allow them to be changed after
> shared memory is set up.  (An alternative to a new flag bit is to
> subdivide the PGC_POSTMASTER context into two values, but that seems
> like it'd be far more invasive.  A flag bit needn't be user-visible.)
> Then, what we'd need to do is arrange for shared-preload modules to
> receive their _PG_init call before shared memory creation (when they
> can still twiddle such GUCs) and then another callback later.

I dunno, I feel like this is over-engineered. An extra hook wouldn't
really affect anyone who doesn't need to use it, and I doubt that
adapting to it would create much pain for anyone who is knowledgeable
enough to be writing these kinds of extensions in the first place.
This design is something everyone who adds a GUC from now until the
end of the project needs to know about for the benefit of a tiny
number of extension authors who weren't really going to have a big
problem anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 04:58:42PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> I think it'd be reasonable to allow changing custom GUCs in _PG_init().
>> Those aren't going to impact things like MaxBackends.
> 
> It's reasonable to allow changing *most* GUCs in _PG_init(); let us
> please not break cases that work fine today.

Right, this is a fair point.

> It seems after a bit of reflection that what we ought to do is identify
> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
> mark those with a new GUC flag, and not allow them to be changed after
> shared memory is set up.  (An alternative to a new flag bit is to
> subdivide the PGC_POSTMASTER context into two values, but that seems
> like it'd be far more invasive.  A flag bit needn't be user-visible.)
> Then, what we'd need to do is arrange for shared-preload modules to
> receive their _PG_init call before shared memory creation (when they
> can still twiddle such GUCs) and then another callback later.

If we allow changing GUCs in _PG_init() and provide another hook where
MaxBackends will be initialized, do we need to introduce another GUC flag,
or can we get away with just blocking all GUC changes when the new hook is
called?  I'm slightly hesitant to add a GUC flag that will need to manually
maintained.  Wouldn't it be easily forgotten?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 04:33:39PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
>>> But if there's even one use case where adjusting GUCs at this phase is
>>> reasonable, then 0003 isn't really good enough. We need an 0004 that
>>> provides a new hook in a place where such changes can safely be made.
> 
>> I think that is doable.  IMO it should be something like _PG_change_GUCs()
>> that is called before _PG_init().  The other option is to add a hook called
>> after _PG_init() where MaxBackends is available (in which case we likely
>> want GetMaxBackends() again).  Thoughts?
> 
> I like the second option.  Calling into a module before we've called its
> _PG_init function is just weird, and will cause no end of confusion.

Alright.  I think that is basically what Julien recommended a few weeks ago
[0].  Besides possibly reintroducing GetMaxBackends(), we might also want
to block SetConfigOption() when called in this new hook.

[0] https://postgr.es/m/20220325031146.cd23gaak5qlzdy6l%40jrouhaud

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Tom Lane
Nathan Bossart  writes:
> I think it'd be reasonable to allow changing custom GUCs in _PG_init().
> Those aren't going to impact things like MaxBackends.

It's reasonable to allow changing *most* GUCs in _PG_init(); let us
please not break cases that work fine today.

It seems after a bit of reflection that what we ought to do is identify
the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing,
mark those with a new GUC flag, and not allow them to be changed after
shared memory is set up.  (An alternative to a new flag bit is to
subdivide the PGC_POSTMASTER context into two values, but that seems
like it'd be far more invasive.  A flag bit needn't be user-visible.)
Then, what we'd need to do is arrange for shared-preload modules to
receive their _PG_init call before shared memory creation (when they
can still twiddle such GUCs) and then another callback later.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Tom Lane
Nathan Bossart  writes:
> On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
>> But if there's even one use case where adjusting GUCs at this phase is
>> reasonable, then 0003 isn't really good enough. We need an 0004 that
>> provides a new hook in a place where such changes can safely be made.

> I think that is doable.  IMO it should be something like _PG_change_GUCs()
> that is called before _PG_init().  The other option is to add a hook called
> after _PG_init() where MaxBackends is available (in which case we likely
> want GetMaxBackends() again).  Thoughts?

I like the second option.  Calling into a module before we've called its
_PG_init function is just weird, and will cause no end of confusion.

regards, tom lane




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Thomas Munro
On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker wrote:
> Simon Riggs  writes:
> > This is a nice feature if it is safe to turn off full_page_writes.

As others have said/shown, it does also help if a block with FPW is
evicted and then read back in during one checkpoint cycle, in other
words if the working set is larger than shared buffers.

This also provides infrastructure for proposals in the next cycle, as
part of commitfest #3316:
* in direct I/O mode, I/O stalls become more likely due to lack of
kernel prefetching/double-buffering, so prefetching becomes more
essential
* even in buffered I/O mode when benefiting from free
double-buffering, the copy from kernel buffer to user space buffer can
be finished in the background instead of calling pread() when you need
the page, but you need to start it sooner
* adjacent blocks accessed by nearby records can be merged into a
single scatter-read, for example with preadv() in the background
* repeated buffer lookups, pins, locks (and maybe eventually replay)
to the same page can be consolidated

Pie-in-the-sky ideas:
* someone might eventually want to be able to replay in parallel
(hard, but certainly requires lookahead)
* I sure hope we'll eventually use different techniques for torn-page
protection to avoid the high online costs of FPW

> > When is it safe to do that? On which platform?
> >
> > I am not aware of any released software that allows full_page_writes
> > to be safely disabled. Perhaps something has been released recently
> > that allows this? I think we have substantial documentation about
> > safety of other settings, so we should carefully document things here
> > also.
>
> Our WAL reliability docs claim that ZFS is safe against torn pages:
>
> https://www.postgresql.org/docs/current/wal-reliability.html:
>
> If you have file-system software that prevents partial page writes
> (e.g., ZFS), you can turn off this page imaging by turning off the
> full_page_writes parameter.

Unfortunately, posix_fadvise(WILLNEED) doesn't do anything on ZFS
right now :-(.  I have some patches to fix that on Linux[1] and
FreeBSD and it seems like there's a good chance of getting them
committed based on feedback, but it needs some more work on tests and
mmap integration.  If anyone's interested in helping get that landed
faster, please ping me off-list.

[1] https://github.com/openzfs/zfs/pull/9807
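
For reference, the hint in question is the one PostgreSQL issues when
prefetching (cf. FilePrefetch() in fd.c); in its simplest form it is just
this (illustrative sketch):

#include <fcntl.h>

/* Ask the kernel to start reading this range in the background.  On file
 * systems that ignore the hint (e.g. ZFS today, per above), this is a
 * no-op and the later pread() pays the full I/O stall. */
static int
hint_willneed(int fd, off_t offset, off_t length)
{
	return posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
}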




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 03:30:23PM -0400, Robert Haas wrote:
> On Tue, Apr 12, 2022 at 3:22 PM Tom Lane  wrote:
>> Yeah.  It's a very long way from "changing shared memory sizes here
>> doesn't work" to "no extension is allowed to change any GUC at load
>> time".  A counterexample is that it's not clear why an extension
>> shouldn't be allowed to define a GUC using DefineCustomXXXVariable
>> and then immediately set it to some other value.  I think trying to
>> forbid that across-the-board is going to mostly result in breaking
>> a lot of cases that are perfectly safe.
> 
> Hmm, that's an interesting case which I hadn't considered. It seems
> like sort of a lame thing to do, because why wouldn't you just create
> the GUC with the right initial value instead of modifying it after the
> fact? But maybe there's some use case in which it makes sense to do it
> that way. A read-only GUC that advertises some calculated value,
> perhaps?

I think it'd be reasonable to allow changing custom GUCs in _PG_init().
Those aren't going to impact things like MaxBackends.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote:
> But if there's even one use case where adjusting GUCs at this phase is
> reasonable, then 0003 isn't really good enough. We need an 0004 that
> provides a new hook in a place where such changes can safely be made.

I think that is doable.  IMO it should be something like _PG_change_GUCs()
that is called before _PG_init().  The other option is to add a hook called
after _PG_init() where MaxBackends is available (in which case we likely
want GetMaxBackends() again).  Thoughts?

> Meanwhile, committed 0001.

Thanks.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: How about a psql backslash command to show GUCs?

2022-04-12 Thread Jonathan S. Katz

On 4/12/22 1:00 PM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

On 4/12/22 11:19 AM, Tom Lane wrote:

It'd just look like this, I think.  I see from looking at guc.c that
boot_val can be NULL, so we'd better use IS DISTINCT FROM.



I tested it and I like this a lot better, at least it's much more
consolidated. They all seem to be generated (directories, timezones,
collations/encodings).


Yeah, most of what shows up in a minimally-configured installation is
postmaster-computed settings like config_file, rather than things
that were actually set by the DBA.  Personally I'd rather hide the
ones that have source = 'override', but that didn't seem to be the
consensus.


The list seems more reasonable, though now that I'm fully in the
"less is more" camp based on the "non-defaults" description, I think
anything we can do to further prune is good.



The one exception to this seems to be "max_stack_depth", which is
rendering on my "\dconfig" though I didn't change it, and it's showing
its default value of 2MB. "boot_val" says 100, "reset_val" says 2048,
and it's commented out in my postgresql.conf. Do we want to align that?


I don't think there's any principled thing we could do about that in
psql.  The boot_val is a conservatively small 100kB, but we crank
that up automatically based on getrlimit(RLIMIT_STACK), so on any
reasonable platform it's going to show as not being default.


Got it.

We may be at a point where it's "good enough", and we can let more
people chime in during beta.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: make MaxBackends available in _PG_init

2022-04-12 Thread Robert Haas
On Tue, Apr 12, 2022 at 3:22 PM Tom Lane  wrote:
> Yeah.  It's a very long way from "changing shared memory sizes here
> doesn't work" to "no extension is allowed to change any GUC at load
> time".  A counterexample is that it's not clear why an extension
> shouldn't be allowed to define a GUC using DefineCustomXXXVariable
> and then immediately set it to some other value.  I think trying to
> forbid that across-the-board is going to mostly result in breaking
> a lot of cases that are perfectly safe.

Hmm, that's an interesting case which I hadn't considered. It seems
like sort of a lame thing to do, because why wouldn't you just create
the GUC with the right initial value instead of modifying it after the
fact? But maybe there's some use case in which it makes sense to do it
that way. A read-only GUC that advertises some calculated value,
perhaps?
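
A sketch of that pattern, with hypothetical names throughout: define a
read-only custom GUC and then immediately overwrite its boot value with
something computed (untested illustration, not anybody's actual patch):

#include "postgres.h"

#include <limits.h>

#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static int	myext_effective_limit;

void
_PG_init(void)
{
    char	buf[32];

    DefineCustomIntVariable("myext.effective_limit",
                            "Computed limit advertised by myext.",
                            NULL,
                            &myext_effective_limit,
                            0,			/* boot value */
                            0, INT_MAX,
                            PGC_INTERNAL,	/* read-only for users */
                            0,
                            NULL, NULL, NULL);

    /* Pretend this came from inspecting the environment. */
    snprintf(buf, sizeof(buf), "%d", 4096);
    SetConfigOption("myext.effective_limit", buf,
                    PGC_INTERNAL, PGC_S_OVERRIDE);
}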

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Tom Lane
Robert Haas  writes:
> Gah, I hate putting this off for another year, but I guess I'm also
> not convinced that 0003 has the right idea, so maybe it's for the
> best. Here's what's bugging me: 0003 decrees that _PG_init() is the
> Wrong Place to Adjust GUC Settings. Now, that raises the question: what
> is the right place to adjust GUC settings? I argued upthread that
> extensions shouldn't be overriding values from postgresql.conf at all,
> but on further reflection I think there's at least one legitimate use
> case for this sort of thing: some kind of auto-tuning module that, for
> example, looks at how much memory you have and sets shared_buffers or
> work_mem or something based on the result.

Yeah.  It's a very long way from "changing shared memory sizes here
doesn't work" to "no extension is allowed to change any GUC at load
time".  A counterexample is that it's not clear why an extension
shouldn't be allowed to define a GUC using DefineCustomXXXVariable
and then immediately set it to some other value.  I think trying to
forbid that across-the-board is going to mostly result in breaking
a lot of cases that are perfectly safe.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Robert Haas
On Tue, Apr 12, 2022 at 2:03 PM Nathan Bossart  wrote:
> On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote:
> > On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
> >> Here's a new patch set.  I've only changed 0003 as described above.  My
> >> apologies for the unnecessary round trip.
> >
> > ISTM we shouldn't apply 0002, 0003 to master before we've branched 15 off.
>
> That seems reasonable to me.

Gah, I hate putting this off for another year, but I guess I'm also
not convinced that 0003 has the right idea, so maybe it's for the
best. Here's what's bugging me: 0003 decrees that _PG_init() is the
Wrong Place to Adjust GUC Settings. Now, that raises the question: what
is the right place to adjust GUC settings? I argued upthread that
extensions shouldn't be overriding values from postgresql.conf at all,
but on further reflection I think there's at least one legitimate use
case for this sort of thing: some kind of auto-tuning module that, for
example, looks at how much memory you have and sets shared_buffers or
work_mem or something based on the result. It's arguable how well
something like this can actually work, but we probably shouldn't try
to prevent people from doing it. I'm a little less clear why Citus
thinks this is an appropriate thing for it to do, vs. telling the user
to configure a useful value, and I'd be curious to hear an explanation
around that.

But if there's even one use case where adjusting GUCs at this phase is
reasonable, then 0003 isn't really good enough. We need an 0004 that
provides a new hook in a place where such changes can safely be made.

Meanwhile, committed 0001.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Tom Lane
Andres Freund  writes:
> On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
>> +1.  This is probably more feasible given the latch infrastructure
>> than it was when that code was first written.

> What do you think about just reordering the disable_all_timeouts() to be
> before the got_standby_deadlock_timeout check in the back branches? I think
> that should close at least the most obvious hole.  And fix it properly in
> HEAD?

I don't have much faith in that, and I don't see why we can't fix it
properly.  Don't we just need to have the signal handler set MyLatch,
and then do the unsafe stuff back in the "if (got_standby_deadlock_timeout)"
stanza in mainline?
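
In outline, that fix looks something like this (a sketch of the shape
only, not the committed change; the handler and callee names are taken
from the existing startup-process code in standby.c):

/* Timeout handler: do nothing unsafe here. */
static volatile sig_atomic_t got_standby_deadlock_timeout = false;

static void
StandbyDeadLockHandler(void)
{
    got_standby_deadlock_timeout = true;
    SetLatch(MyLatch);			/* wake the wait loop */
}

/* ... later, back in the mainline wait loop ... */
if (got_standby_deadlock_timeout)
{
    got_standby_deadlock_timeout = false;
    /* Now it is safe to do the work the handler used to do directly. */
    SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
}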

regards, tom lane




Re: Frontend error logging style

2022-04-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 11.04.22 17:22, Tom Lane wrote:
>> The patch I presented keeps the unlikely() checks in the debug-level
>> macros.  Do you think we should drop those too?  I figured that avoiding
>> evaluating the arguments would be worth something.

> Oh, that's right, the whole thing is to not evaluate the arguments if 
> the log level isn't adequate.  We should probably keep that.

I don't think that's nearly as interesting in the frontend as in
the backend.  Error messages in the backend frequently include
catalog lookups and the like in the arguments, but I think nearly
all frontend messages are writing nothing more complicated than
variables and maybe some indirections or array fetches.  So I'm
not all that worried about runtime, and would rather save some
code space.

regards, tom lane




Re: failures in t/031_recovery_conflict.pl on CI

2022-04-12 Thread Andres Freund
Hi,

On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > It's been broken in different ways all the way back to 9.0, from what I can
> > see, but I didn't check every single version.
> 
> > Afaics the fix is to nuke the idea of doing anything substantial in the 
> > signal
> > handler from orbit, and instead just set a flag in the handler.
> 
> +1.  This is probably more feasible given the latch infrastructure
> than it was when that code was first written.

What do you think about just reordering the disable_all_timeouts() to be
before the got_standby_deadlock_timeout check in the back branches? I think
that should close at least the most obvious hole.  And fix it properly in
HEAD?

Greetings,

Andres Freund




Re: Frontend error logging style

2022-04-12 Thread Peter Eisentraut



On 11.04.22 17:22, Tom Lane wrote:

Peter Eisentraut  writes:

On 08.04.22 22:26, Tom Lane wrote:

I think we should put a centralized level check
into logging.c, and get rid of at least the "if (likely())"
checks, because those are going to succeed approximately 100.0%
of the time.  Maybe there's an argument for keeping the unlikely()
ones.



Yeah, that seems ok to change.  The previous coding style is more useful
if you have a lot of debug messages in a hot code path, but that usually
doesn't apply to where this is used.


The patch I presented keeps the unlikely() checks in the debug-level
macros.  Do you think we should drop those too?  I figured that avoiding
evaluating the arguments would be worth something.


Oh, that's right, the whole thing is to not evaluate the arguments if 
the log level isn't adequate.  We should probably keep that.


Is the code size a big problem?  ereport() has a bunch of extra code 
around each call as well.  Does it have similar problems?
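
For reference, the debug-level macros under discussion follow roughly
this shape (paraphrased from the src/common/logging.h pattern; the
exact definition may differ from the committed patch):

#define pg_log_debug(...) do { \
        if (unlikely(__pg_log_level <= PG_LOG_DEBUG)) \
            pg_log_generic(PG_LOG_DEBUG, __VA_ARGS__); \
    } while (0)

Because the whole call sits inside the if, something like
pg_log_debug("%s", expensive_to_format()) never evaluates its arguments
unless debug output is enabled; that is the evaluation-skipping
behavior being weighed against code size here.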





Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread SATYANARAYANA NARLAPURAM
> Other way around. FPWs make prefetch unnecessary.
> Therefore you would only want prefetch with FPW=off, AFAIK.
>
A few scenarios where I can imagine page prefetch helping: 1/ A DR replica
that is a smaller instance size than the primary. Page prefetch can bring
pages back into memory in advance once they have been evicted, which
speeds up replay and is cost effective. 2/ It allows a larger
checkpoint_timeout for the same recovery SLA, and perhaps improved
performance. 3/ WAL prefetch (not the pages themselves) can improve replay
on its own (not sure if it was measured in isolation; Tomas V can comment
on it). 4/ The read replica running an analytical workload, the scenario
Tomas V mentioned earlier.


>
> Or put this another way: when is it safe and sensible to use
> recovery_prefetch != off?
>
When checkpoint_timeout is set large under heavy write activity, and on a
read replica whose working set is larger than memory while it receives
constant updates from the primary. This covers 1 and 4 above.


> --
> Simon Riggs    http://www.EnterpriseDB.com/
>
>
>


Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote:
> On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
>> Here's a new patch set.  I've only changed 0003 as described above.  My
>> apologies for the unnecessary round trip.
> 
> ISTM we shouldn't apply 0002, 0003 to master before we've branched 15 off.

That seems reasonable to me.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: CLUSTER sort on abbreviated expressions is broken

2022-04-12 Thread Peter Geoghegan
On Sun, Apr 3, 2022 at 4:34 PM Thomas Munro  wrote:
> I probably should have made it clearer in the commit message,
> cc58eecc5 doesn't fix this problem in the master branch.  It only
> fixes the code that incorrectly assumed that datum1 was always
> available.

Attached patch fixes the issue, and includes the test case that you posted.

There is only a one line change to tuplesort.c. This is arguably the
same bug -- abbreviation is just another "haveDatum1 optimization"
that needs to be accounted for.

-- 
Peter Geoghegan


v1-0001-Fix-CLUSTER-sort-on-abbreviated-expressions.patch
Description: Binary data


Re: make MaxBackends available in _PG_init

2022-04-12 Thread Andres Freund
Hi,

On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
> > On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
> >> If we throw an error while defining_custom_guc is true, how will it
> >> ever again become false?
> > 
> > Ah, I knew I was forgetting something this morning.
> > 
> > It won't, but the only place it is presently needed is when loading
> > shared_preload_libraries, so I believe startup will fail anyway.  However,
> > I can see defining_custom_guc being used elsewhere, so that is probably not
> > good enough.  Another approach could be to add a static
> > set_config_option_internal() function with a boolean argument to indicate
> > whether it is being used while defining a custom GUC.  I'll adjust 0003
> > with that approach unless a better idea prevails.
> 
> Here's a new patch set.  I've only changed 0003 as described above.  My
> apologies for the unnecessary round trip.

ISTM we shouldn't apply 0002, 0003 to master before we've branched 15 off.

Greetings,

Andres Freund




Re: How about a psql backslash command to show GUCs?

2022-04-12 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 4/12/22 11:19 AM, Tom Lane wrote:
>> It'd just look like this, I think.  I see from looking at guc.c that
>> boot_val can be NULL, so we'd better use IS DISTINCT FROM.

> I tested it and I like this a lot better, at least it's much more 
> consolidated. They all seem to be generated (directories, timezones, 
> collations/encodings).

Yeah, most of what shows up in a minimally-configured installation is
postmaster-computed settings like config_file, rather than things
that were actually set by the DBA.  Personally I'd rather hide the
ones that have source = 'override', but that didn't seem to be the
consensus.

> The one exception to this seems to be "max_stack_depth", which is 
> rendering on my "\dconfig" though I didn't change it, and it's showing
> its default value of 2MB. "boot_val" says 100, "reset_val" says 2048,
> and it's commented out in my postgresql.conf. Do we want to align that?

I don't think there's any principled thing we could do about that in
psql.  The boot_val is a conservatively small 100kB, but we crank
that up automatically based on getrlimit(RLIMIT_STACK), so on any
reasonable platform it's going to show as not being default.

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-04-12 Thread Jonathan S. Katz

On 4/12/22 11:19 AM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

On 4/11/22 4:11 PM, Tom Lane wrote:

This idea does somewhat address my unhappiness upthread about printing
values with source = 'internal', but I see that it gets confused by
some GUCs with custom show hooks, like unix_socket_permissions.
Maybe it needs to be "source != 'default' AND setting != boot_val"?



Running through a few GUCs, that seems reasonable. Happy to test the
patch out prior to commit to see if it renders better.


It'd just look like this, I think.  I see from looking at guc.c that
boot_val can be NULL, so we'd better use IS DISTINCT FROM.


(IS DISTINCT FROM is pretty handy :)

I tested it and I like this a lot better, at least it's much more 
consolidated. They all seem to be generated (directories, timezones, 
collations/encodings).


The one exception to this seems to be "max_stack_depth", which is 
rendering on my "\dconfig" though I didn't change it, and it's showing
its default value of 2MB. "boot_val" says 100, "reset_val" says 2048,
and it's commented out in my postgresql.conf. Do we want to align that?


That said, the patch itself LGTM.

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: avoid multiple hard links to same WAL file after a crash

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 03:46:31PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart  
> wrote in 
>> I traced this back a while ago.  I believe the link() was first added in
>> November 2000 as part of f0e37a8.  This even predates WAL recycling, which
>> was added in July 2001 as part of 7d4d5c0.
> 
> f0e37a8 lacks discussion... It introduced the CHECKPOINT command from
> somewhere outside the ML. This patch changed XLogFileInit to
> support using existing files so that XLogWrite can use the new segment
> provided by checkpoint and still allow XLogWrite to create a new
> segment by itself.

Yeah, I've been unable to find any discussion besides a brief reference to
adding checkpointing [0].

[0] 
https://postgr.es/m/8F4C99C66D04D4118F580090272A7A23018D85%40sectorbase1.sectorbase.com
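
For anyone following along, the pattern at issue is roughly this (a
sketch, not the actual InstallXLogFileSegment code; tmppath and path
stand for the temporary and final segment names):

/*
 * Historical approach: give the recycled file its new name by creating
 * a second hard link, then dropping the old name.  A crash between the
 * two calls leaves two directory entries for the same file.
 */
if (link(tmppath, path) < 0)
    elog(ERROR, "could not link \"%s\" to \"%s\": %m", tmppath, path);
unlink(tmppath);

/*
 * The alternative under discussion: a single atomic rename, made
 * durable by the existing fd.c helper.
 */
durable_rename(tmppath, path, ERROR);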

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: make MaxBackends available in _PG_init

2022-04-12 Thread Nathan Bossart
On Tue, Apr 12, 2022 at 01:08:35PM +0800, Julien Rouhaud wrote:
> It looks sensible to me, although I think I would apply 0003 before 0002.

Great.

> +   if (process_shared_preload_libraries_in_progress &&
> +   !allow_when_loading_preload_libs)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
> +errmsg("cannot change parameters while loading "
> +   "\"shared_preload_libraries\"")));
> +
> 
> I think the error message should mention at least which GUC is being modified.

My intent was to make it clear that no parameters can be updated while
loading s_p_l, but I agree that we should mention the specific GUC
somewhere.  Maybe this could go in a hint?
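
Something along these lines, perhaps (a sketch only; the wording and
the name variable are illustrative, not proposed text):

ereport(ERROR,
        (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
         errmsg("cannot change parameters while loading "
                "\"shared_preload_libraries\""),
         errhint("Parameter \"%s\" cannot be changed until loading completes.",
                 name)));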

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Tomas Vondra
On 4/12/22 17:46, Simon Riggs wrote:
> On Tue, 12 Apr 2022 at 16:41, Tomas Vondra
>  wrote:
>>
>> On 4/12/22 15:58, Simon Riggs wrote:
>>> On Thu, 7 Apr 2022 at 08:46, Thomas Munro  wrote:
>>>
 With that... I've finally pushed the 0002 patch and will be watching
 the build farm.
>>>
>>> This is a nice feature if it is safe to turn off full_page_writes.
>>>
>>> When is it safe to do that? On which platform?
>>>
>>> I am not aware of any released software that allows full_page_writes
>>> to be safely disabled. Perhaps something has been released recently
>>> that allows this? I think we have substantial documentation about
>>> safety of other settings, so we should carefully document things here
>>> also.
>>>
>>
>> I don't see why/how an async prefetch would make FPW unnecessary. Did
>> anyone claim that to be the case?
> 
> Other way around. FPWs make prefetch unnecessary.
> Therefore you would only want prefetch with FPW=off, AFAIK.
> 
> Or put this another way: when is it safe and sensible to use
> recovery_prefetch != off?
> 

That assumes the FPI stays in memory until the next modification, and
that can be untrue for a number of reasons. Long checkpoint interval
with enough random accesses in between is a nice example. See the
benchmarks I did a year ago (regular pgbench).

Or imagine a r/o replica used to run analytics queries, that access so
much data it evicts the buffers initialized by the FPI records.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Dagfinn Ilmari Mannsåker
Simon Riggs  writes:

> On Thu, 7 Apr 2022 at 08:46, Thomas Munro  wrote:
>
>> With that... I've finally pushed the 0002 patch and will be watching
>> the build farm.
>
> This is a nice feature if it is safe to turn off full_page_writes.
>
> When is it safe to do that? On which platform?
>
> I am not aware of any released software that allows full_page_writes
> to be safely disabled. Perhaps something has been released recently
> that allows this? I think we have substantial documentation about
> safety of other settings, so we should carefully document things here
> also.

Our WAL reliability docs claim that ZFS is safe against torn pages:

https://www.postgresql.org/docs/current/wal-reliability.html:

If you have file-system software that prevents partial page writes
(e.g., ZFS), you can turn off this page imaging by turning off the
full_page_writes parameter.

- ilmari




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Simon Riggs
On Tue, 12 Apr 2022 at 16:41, Tomas Vondra
 wrote:
>
> On 4/12/22 15:58, Simon Riggs wrote:
> > On Thu, 7 Apr 2022 at 08:46, Thomas Munro  wrote:
> >
> >> With that... I've finally pushed the 0002 patch and will be watching
> >> the build farm.
> >
> > This is a nice feature if it is safe to turn off full_page_writes.
> >
> > When is it safe to do that? On which platform?
> >
> > I am not aware of any released software that allows full_page_writes
> > to be safely disabled. Perhaps something has been released recently
> > that allows this? I think we have substantial documentation about
> > safety of other settings, so we should carefully document things here
> > also.
> >
>
> I don't see why/how an async prefetch would make FPW unnecessary. Did
> anyone claim that to be the case?

Other way around. FPWs make prefetch unnecessary.
Therefore you would only want prefetch with FPW=off, AFAIK.

Or put this another way: when is it safe and sensible to use
recovery_prefetch != off?

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Tomas Vondra
On 4/12/22 15:58, Simon Riggs wrote:
> On Thu, 7 Apr 2022 at 08:46, Thomas Munro  wrote:
> 
>> With that... I've finally pushed the 0002 patch and will be watching
>> the build farm.
> 
> This is a nice feature if it is safe to turn off full_page_writes.
> 
> When is it safe to do that? On which platform?
> 
> I am not aware of any released software that allows full_page_writes
> to be safely disabled. Perhaps something has been released recently
> that allows this? I think we have substantial documentation about
> safety of other settings, so we should carefully document things here
> also.
> 

I don't see why/how an async prefetch would make FPW unnecessary. Did
anyone claim that to be the case?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: random() function documentation

2022-04-12 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Dean Rasheed  writes:
>> I think it'd be sufficient to just say that it's a deterministic
>> pseudorandom number generator. I don't see much value in documenting
>> the internal algorithm used.

> WFM on both points.

Sold then, I'll make it so.

regards, tom lane




Re: How about a psql backslash command to show GUCs?

2022-04-12 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 4/11/22 4:11 PM, Tom Lane wrote:
>> This idea does somewhat address my unhappiness upthread about printing
>> values with source = 'internal', but I see that it gets confused by
>> some GUCs with custom show hooks, like unix_socket_permissions.
>> Maybe it needs to be "source != 'default' AND setting != boot_val"?

> Running through a few GUCs, that seems reasonable. Happy to test the 
> patch out prior to commit to see if it renders better.

It'd just look like this, I think.  I see from looking at guc.c that
boot_val can be NULL, so we'd better use IS DISTINCT FROM.

regards, tom lane

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index e7377d4583..17790bd8a4 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4410,7 +4410,8 @@ describeConfigurationParameters(const char *pattern, bool verbose,
 			  NULL, "pg_catalog.lower(s.name)", NULL,
 			  NULL);
 	else
-		appendPQExpBufferStr(, "WHERE s.source <> 'default'\n");
+		appendPQExpBufferStr(, "WHERE s.source <> 'default' AND\n"
+			 "  s.setting IS DISTINCT FROM s.boot_val\n");
 
 	appendPQExpBufferStr(, "ORDER BY 1;");
 


Re: random() function documentation

2022-04-12 Thread Dagfinn Ilmari Mannsåker
Dean Rasheed  writes:

> On Mon, 11 Apr 2022 at 20:20, Tom Lane  wrote:
>>
>> >> How about we just say "uses a linear-feedback shift register algorithm"?
>
> I think it'd be sufficient to just say that it's a deterministic
> pseudorandom number generator. I don't see much value in documenting
> the internal algorithm used.
>
>> > Should we
>> > perhaps also add a warning that the same seed is not guaranteed to
>> > produce the same sequence across different (major?) versions?
>>
>> I wouldn't bother, on the grounds that then we'd need such disclaimers
>> in a whole lot of places.  Others might see it differently though.
>
> Agreed, though I think when the release notes are written, they ought
> to warn that the sequence will change with this release.

WFM on both points.

> Regards,
> Dean

- ilmari




Re: random() function documentation

2022-04-12 Thread Tom Lane
Fabien COELHO  writes:
>>> How about we just say "uses a linear-feedback shift register algorithm"?

>> I think it'd be sufficient to just say that it's a deterministic
>> pseudorandom number generator. I don't see much value in documenting
>> the internal algorithm used.

> Hmmm… I'm not so sure. ISTM that people interested in using the random
> user-facing variants (only random?) might like a pointer to the algorithm,
> to check the expected quality of the produced pseudo-random stream?

> See attached.

I don't want to get that specific.  We were not specific before and
there has been no call for such detail in the docs.  (Unlike
closed-source software, anybody who really wants algorithmic details
can find all they want to know in the source code.)  It would just
amount to another thing to forget to update next time someone changes
the algorithm ... which is a consideration that leads me to favor
Dean's phrasing.

regards, tom lane




Re: Temporary file access API

2022-04-12 Thread Matthias van de Meent
On Mon, 11 Apr 2022 at 10:05, Antonin Houska  wrote:
>
> Robert Haas  wrote:
>
> > On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska  wrote:
> > > Do you think that the use of a system call is a problem itself (e.g. 
> > > because
> > > the code looks less simple if read/write is used somewhere and 
> > > fread/fwrite
> > > elsewhere; of course read/write is mandatory in special cases like WAL,
> > > heap pages, etc.)  or is the problem that the system calls are used too
> > > frequently? I suppose only the latter.
> >
> > I'm not really super-interested in reducing the number of system
> > calls. It's not a dumb thing in which to be interested and I know that
> > for example Thomas Munro is very interested in it and has done a bunch
> > of work in that direction just to improve performance. But for me the
> > attraction of this is mostly whether it gets us closer to TDE.
> >
> > And that's why I'm asking these questions about adopting it in
> > different places. I kind of thought that your previous patches needed
> > to encrypt, I don't know, 10 or 20 different kinds of files. So I was
> > surprised to see this patch touching the handling of only 2 kinds of
> > files. If we consolidate the handling of let's say 15 of 20 cases into
> > a single mechanism, we've really moved the needle in the right
> > direction -- but consolidating the handling of 2 of 20 cases, or
> > whatever the real numbers are, isn't very exciting.
>
> There aren't really that many kinds of files to encrypt:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>
> (And pg_stat/* files should be removed from the list.)

I was looking at that list of files that contain user data, and
noticed that all relation forks except the main fork were marked as
'does not contain user data'. To me this seems not necessarily true:
AMs do have access to forks for user data storage as well (without any
real issues or breaking the abstraction), and the init-fork is
expected to store user data (specifically in the form of unlogged
sequences). Shouldn't those forks thus also be encrypted-by-default,
or should we provide some other method to ensure that non-main forks
with user data are encrypted?


- Matthias




Re: support for MERGE

2022-04-12 Thread Ranier Vilela
On Tue, Apr 12, 2022 at 10:47 AM, Alvaro Herrera
<alvhe...@alvh.no-ip.org> wrote:

> On 2022-Apr-02, Ranier Vilela wrote:
>
> > On Sat, Apr 2, 2022 at 12:01 PM, Alvaro Herrera
> > <alvhe...@alvh.no-ip.org> wrote:
>
> > IMHO, there is actually a bug here.
> > ExecGetChildToRootMap is clear, is possible returning NULL.
> > To discover if the map is NULL, ExecGetChildToRootMap needs to process
> > "ResultRelInfo *leaf_part_rri".
> > So, the argument "if the map is NULL, this function should not be
> called",
> > is contradictory.
>
> I was not explicit enough.  I meant "if no map is needed to adjust
> columns, then this function should not be called".  The caller already
> knows if it's needed or not; it doesn't depend on literally testing
> 'map'.  If somebody mis-calls this function, it would have crashed, yes;
> but that's a caller bug, not this function's.
>
Thanks for the explanation.


>
> A few days ago, the community Coverity also complained about this, so I
> added an Assert that the map is not null, which should silence it.
>
Thanks for hardening this.


>
> > If the right fix is to return the original list, here is the patch
> attached.
>
> ... for a buggy caller (one that calls it when unnecessary), then yes
> this would be the correct code -- except that now the caller doesn't
> know if the returned list needs to be freed or not.  So it seems better
> to avoid accumulating pointless calls to this function by just not
> coping with them.
>
Sure, it is always better to avoid doing work unless strictly necessary.

regards,
Ranier Vilela


Re: PG DOCS - logical replication filtering

2022-04-12 Thread Euler Taveira
On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:
> Not changed. Because in fact, I copied most of this sentence
> (including the uppercase, "operations", "and/or") from existing
> documentation [1]
> e.g. see "The tables added to a publication that publishes UPDATE
> and/or DELETE operations must ..."
> [1] https://www.postgresql.org/docs/current/sql-createpublication.html
Hmm. You are checking the Notes. I'm referring to the publish parameter. IMO
this sentence should use operations in lowercase letters too because even if
you create it with uppercase letters, Postgres will provide a lowercase word
when you dump it.

> Yeah, I know the information is the same in the summary and in the
> text. Personally, I find big slabs of technical text difficult to
> digest, so I'd have to spend 5 minutes of careful reading and drawing
> the exact same summary on a piece of paper anyway just to visualize
> what the text says. The summary makes it easy to understand at a
> glance. But I have modified the summary in [v9] to remove the "case"
> and to add other column headings as suggested.
Isn't it better to use a table instead of a synopsis?

> Not changed. The readers of this docs page are all users who will be
> familiar with the filter expressions, so I felt the "OR'ed together"
> part will be perfectly clear to the intended audience.
If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
keep it consistent.

> But I did not understand your point about “If, for some reason,
> someone decided to change it, this section will contain obsolete
> information”, because IIUC that will be equally true for both the
> explanation and the output, so I did not understand why you say "psql
> output is fine, the explanation is not". It is the business of this
> documentation to help the user to know how and where they can find the
> row filter information they may need to know.
You are describing a psql command here. My point is to keep the psql
explanation in the psql section. This section describes the row filter
feature. If we start describing features in other sections, we will have
outdated information whenever the referred feature changes and someone fails
to find all the references. I tend to concentrate detailed explanation in the
feature section; if I have to add links in other sections, I use "See Section
1.23 for details ...".

> Not changed. The publisher and subscriber programlistings are always
> separated. If you are looking at the rendered HTML I think it is quite
> clear that one is at the publisher and one is at the subscriber. OTOH,
> if we omitted creating the tables on the subscriber then I think that
> really would cause some confusion.
The difference is extra space. By default, the CSS does not include a border
for programlisting. That's why I complained about it. I noticed that the
website CSS includes it. However, the PDF will not include the border. I would
add a separate description for the subscriber just to be clear.

One last suggestion: you are using identifiers in uppercase letters, but
"primary key" is in lowercase.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Simon Riggs
On Thu, 7 Apr 2022 at 08:46, Thomas Munro  wrote:

> With that... I've finally pushed the 0002 patch and will be watching
> the build farm.

This is a nice feature if it is safe to turn off full_page_writes.

When is it safe to do that? On which platform?

I am not aware of any released software that allows full_page_writes
to be safely disabled. Perhaps something has been released recently
that allows this? I think we have substantial documentation about
safety of other settings, so we should carefully document things here
also.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: row filtering for logical replication

2022-04-12 Thread Alvaro Herrera
On 2022-Apr-12, Amit Kapila wrote:

> I mean that it fetches the tuple from the RELOID cache and then
> performs relkind and other checks similar to what we are doing. I
> think it could also have used get_rel_relkind() but probably not done
> because it doesn't have a lock on the relation.

Ah, but that one uses a lot more fields from the pg_class tuple in the
non-error path.  We only need relkind, up until we know the error is to
be thrown.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: support for MERGE

2022-04-12 Thread Alvaro Herrera
On 2022-Apr-02, Ranier Vilela wrote:

> On Sat, Apr 2, 2022 at 12:01 PM, Alvaro Herrera wrote:

> IMHO, there is actually a bug here.
> ExecGetChildToRootMap is clear, is possible returning NULL.
> To discover if the map is NULL, ExecGetChildToRootMap needs to process
> "ResultRelInfo *leaf_part_rri".
> So, the argument "if the map is NULL, this function should not be called",
> is contradictory.

I was not explicit enough.  I meant "if no map is needed to adjust
columns, then this function should not be called".  The caller already
knows if it's needed or not; it doesn't depend on literally testing
'map'.  If somebody mis-calls this function, it would have crashed, yes;
but that's a caller bug, not this function's.

A few days ago, the community Coverity also complained about this, so I
added an Assert that the map is not null, which should silence it.

> If the right fix is to return the original list, here is the patch attached.

... for a buggy caller (one that calls it when unnecessary), then yes
this would be the correct code -- except that now the caller doesn't
know if the returned list needs to be freed or not.  So it seems better
to avoid accumulating pointless calls to this function by just not
coping with them.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)




Re: Temporary file access API

2022-04-12 Thread Robert Haas
On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska  wrote:
> Robert Haas  wrote:
> > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> > > There aren't really that many kinds of files to encrypt:
> > >
> > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> > >
> > > (And pg_stat/* files should be removed from the list.)
> >
> > This kind of gets into some theoretical questions. Like, do we think
> > that it's an information leak if people can look at how many
> > transactions are committing and aborting in pg_xact_status? In theory
> > it could be, but I know it's been argued that that's too much of a
> > side channel. I'm not sure I believe that, but it's arguable.
>
> I was referring to the fact that the statistics are no longer stored in files:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd

Oh, yeah, I agree with that. I was saying that I'm not altogether a
believer in the idea that we need not encrypt SLRU files.

TBH, I think that no matter what we do there are going to be side
channel leaks, where some security researcher demonstrates that they
can infer something based on file length or file creation rate or
something. It seems inevitable. But the more stuff we don't encrypt,
the easier those attacks are going to be. On the other hand,
encrypting more stuff also makes the project harder. It's hard for me
to judge what the right balance is here. Maybe it's OK to exclude that
stuff for an initial version and just disclaim the heck out of it, but
I don't think that should be our long term strategy.

> > I really don't know how you can argue that pg_dynshmem/mmap.NNN
> > doesn't contain user data - surely it can.
>
> Good point. Since postgres does not control writing into this file, it's a
> special case though. (Maybe TDE will have to refuse to start if
> dynamic_shared_memory_type is set to mmap and the instance is encrypted.)

Interesting point.
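
A sketch of the kind of startup check Antonin suggests (hypothetical:
file_encryption_enabled is an assumed flag, and no such check exists
today):

if (file_encryption_enabled &&
    dynamic_shared_memory_type == DSM_IMPL_MMAP)
    ereport(FATAL,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("dynamic_shared_memory_type = \"mmap\" cannot be used "
                    "with an encrypted cluster")));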

-- 
Robert Haas
EDB: http://www.enterprisedb.com




GSOC: New and improved website for pgjdbc (JDBC) (2022)

2022-04-12 Thread S.R Keshav
Hi, I'm Keshav, and I have updated my proposal. Kindly accept my changes.


postgreSql_ New and improved website for pgjdbc (JDBC) (2022).pdf
Description: Adobe PDF document


Re: row filtering for logical replication

2022-04-12 Thread Alvaro Herrera
On 2022-Apr-12, Amit Kapila wrote:

> It still has the same problem. The table can be dropped just before
> this message and the get_rel_name will return NULL and we don't expect
> that.

Ugh, I forgot to change the errmsg() parts to use the new variable,
apologies.  Fixed.

> Also, is there a reason that you haven't kept the test case added by Hou-San?

None.  I put it back here.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>From f23be23c27eb9bed7350745233f4660f4c5b326a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 12 Apr 2022 12:59:50 +0200
Subject: [PATCH v4] fixup checking for rowfilter/collist on altering
 publication

---
 src/backend/commands/publicationcmds.c| 89 +++
 src/test/regress/expected/publication.out | 16 +++-
 src/test/regress/sql/publication.sql  |  8 ++
 3 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 7aacb6b2fe..d2b9f95f6d 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -945,60 +945,57 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt,
 
 		foreach(lc, root_relids)
 		{
-			HeapTuple	rftuple;
 			Oid			relid = lfirst_oid(lc);
-			bool		has_column_list;
-			bool		has_row_filter;
+			char		relkind;
+			char	   *relname;
+			HeapTuple	rftuple;
+			bool		has_rowfilter;
+			bool		has_collist;
+
+			/* Beware: we don't have lock on the relations */
 
 			rftuple = SearchSysCache2(PUBLICATIONRELMAP,
 	  ObjectIdGetDatum(relid),
 	  ObjectIdGetDatum(pubform->oid));
-
-			has_row_filter
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
-
-			has_column_list
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
-
-			if (HeapTupleIsValid(rftuple) &&
-(has_row_filter || has_column_list))
+			if (!HeapTupleIsValid(rftuple))
+continue;
+			has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
+			has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
+			if (!has_rowfilter && !has_collist)
 			{
-HeapTuple	tuple;
-
-tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-if (HeapTupleIsValid(tuple))
-{
-	Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple);
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_row_filter)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_column_list)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a column list for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	ReleaseSysCache(tuple);
-}
-
 ReleaseSysCache(rftuple);
+continue;
 			}
+
+			relkind = get_rel_relkind(relid);
+			if (relkind != RELKIND_PARTITIONED_TABLE)
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+			relname = get_rel_name(relid);
+			if (relname == NULL)	/* table concurrently dropped */
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+
+			if (has_rowfilter)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+"publish_via_partition_root",
+stmt->pubname),
+		 errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.",
+   relname, "publish_via_partition_root")));
+			Assert(has_collist);
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+			"publish_via_partition_root",
+			stmt->pubname),
+	 errdetail("The publication contains a column list for partitioned table \"%s\", which is not allowed when \"%s\" is false.",
+			   relname, "publish_via_partition_root")));
 		}
 	}
 
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 8208f9fa0e..cc9c8990ae 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -588,8 +588,12 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1;
 -- fail 

Re: row filtering for logical replication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 5:01 PM Alvaro Herrera  wrote:
>
> On 2022-Apr-12, Amit Kapila wrote:
>
> > On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila  wrote:
>
> > > We don't have a lock on the relation, so if it gets dropped
> > > concurrently, it won't behave sanely. For example, get_rel_name() will
> > > return NULL which seems incorrect to me.
>
> Oh, oops ... a trap for the unwary?  Anyway, yes, we can disregard the
> entry when get_rel_name returns null.  Amended patch attached.
>
> > > I am not sure about this as well because you will instead do a RELOID
> > > cache lookup even when there is no row filter or column list.
>
> I guess my assumption is that the pg_class cache is typically more
> populated than other relcaches, but that's unsubstantiated.  I'm not
> sure if we have any way to tell which one is the more common case.
> Anyway, let's do it the way you already had it.
>
> > It seems to me that we have a similar coding pattern in 
> > ExecGrant_Relation().
>
> Not sure what you mean?
>

I mean that it fetches the tuple from the RELOID cache and then
performs relkind and other checks similar to what we are doing. I
think it could also have used get_rel_relkind() but probably not done
because it doesn't have a lock on the relation.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 5:12 PM Alvaro Herrera  wrote:
>
> Sorry, I think I neglected to "git add" some late changes.
>

+ if (has_rowfilter)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+ "publish_via_partition_root",
+ stmt->pubname),
+ errdetail("The publication contains a WHERE clause for partitioned
table \"%s\" which is not allowed when \"%s\" is false.",
+get_rel_name(relid),
+"publish_via_partition_root")));

It still has the same problem. The table can be dropped just before
this message and the get_rel_name will return NULL and we don't expect
that.

Also, is there a reason that you haven't kept the test case added by Hou-San?

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-04-12 Thread Alvaro Herrera
Sorry, I think I neglected to "git add" some late changes.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>From e7569ed4c4a01f782f9326ebc9a3c9049973ef4b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 12 Apr 2022 12:59:50 +0200
Subject: [PATCH v3] fixup checking for rowfilter/collist on altering
 publication

---
 src/backend/commands/publicationcmds.c| 91 +++
 src/test/regress/expected/publication.out |  8 +-
 2 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 7aacb6b2fe..5aa5201055 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -945,60 +945,59 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt,
 
 		foreach(lc, root_relids)
 		{
-			HeapTuple	rftuple;
 			Oid			relid = lfirst_oid(lc);
-			bool		has_column_list;
-			bool		has_row_filter;
+			char		relkind;
+			char	   *relname;
+			HeapTuple	rftuple;
+			bool		has_rowfilter;
+			bool		has_collist;
+
+			/* Beware: we don't have lock on the relations */
 
 			rftuple = SearchSysCache2(PUBLICATIONRELMAP,
 	  ObjectIdGetDatum(relid),
 	  ObjectIdGetDatum(pubform->oid));
-
-			has_row_filter
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
-
-			has_column_list
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
-
-			if (HeapTupleIsValid(rftuple) &&
-(has_row_filter || has_column_list))
+			if (!HeapTupleIsValid(rftuple))
+continue;
+			has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
+			has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
+			if (!has_rowfilter && !has_collist)
 			{
-HeapTuple	tuple;
-
-tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-if (HeapTupleIsValid(tuple))
-{
-	Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple);
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_row_filter)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_column_list)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a column list for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	ReleaseSysCache(tuple);
-}
-
 ReleaseSysCache(rftuple);
+continue;
 			}
+
+			relkind = get_rel_relkind(relid);
+			if (relkind != RELKIND_PARTITIONED_TABLE)
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+			relname = get_rel_name(relid);
+			if (relname == NULL)	/* table concurrently dropped */
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+
+			if (has_rowfilter)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+"publish_via_partition_root",
+stmt->pubname),
+		 errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.",
+   get_rel_name(relid),
+   "publish_via_partition_root")));
+			Assert(has_collist);
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+			"publish_via_partition_root",
+			stmt->pubname),
+	 errdetail("The publication contains a column list for partitioned table \"%s\", which is not allowed when \"%s\" is false.",
+			   get_rel_name(relid),
+			   "publish_via_partition_root")));
 		}
 	}
 
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 8208f9fa0e..37cf11e785 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -588,8 +588,8 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1;
 -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any row filter is
 -- used for partitioned table
 ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
-ERROR:  cannot set publish_via_partition_root = false for publication "testpub6"
-DETAIL:  The publication contains a WHERE clause for a partitioned table 

Re: row filtering for logical replication

2022-04-12 Thread Alvaro Herrera
On 2022-Apr-12, Amit Kapila wrote:

> On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila  wrote:

> > We don't have a lock on the relation, so if it gets dropped
> > concurrently, it won't behave sanely. For example, get_rel_name() will
> > return NULL which seems incorrect to me.

Oh, oops ... a trap for the unwary?  Anyway, yes, we can disregard the
entry when get_rel_name returns null.  Amended patch attached.

> > I am not sure about this as well because you will instead do a RELOID
> > cache lookup even when there is no row filter or column list.

I guess my assumption is that the pg_class cache is typically more
populated than other relcaches, but that's unsubstantiated.  I'm not
sure if we have any way to tell which one is the more common case.
Anyway, let's do it the way you already had it.

> It seems to me that we have a similar coding pattern in ExecGrant_Relation().

Not sure what you mean?  In that function, when the syscache lookup
returns NULL, an error is raised.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)
>From 631a6d04cbe420164833dd4e88a86d0e076fd47d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 12 Apr 2022 12:59:50 +0200
Subject: [PATCH v2] fixup checking for rowfilter/collist on altering
 publication

---
 src/backend/commands/publicationcmds.c| 91 +++
 src/test/regress/expected/publication.out |  8 +-
 2 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 7aacb6b2fe..59fc39e9f2 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -945,60 +945,59 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt,
 
 		foreach(lc, root_relids)
 		{
-			HeapTuple	rftuple;
 			Oid			relid = lfirst_oid(lc);
-			bool		has_column_list;
-			bool		has_row_filter;
+			char		relkind;
+			char	   *relname;
+			HeapTuple	rftuple;
+			bool		has_rowfilter;
+			bool		has_collist;
+
+			/* Beware: we don't have lock on the relations */
 
 			rftuple = SearchSysCache2(PUBLICATIONRELMAP,
 	  ObjectIdGetDatum(relid),
 	  ObjectIdGetDatum(pubform->oid));
-
-			has_row_filter
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
-
-			has_column_list
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
-
-			if (HeapTupleIsValid(rftuple) &&
-(has_row_filter || has_column_list))
+			if (!HeapTupleIsValid(rftuple))
+continue;
+			has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
+			has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
+			if (!has_rowfilter && !has_collist)
 			{
-HeapTuple	tuple;
-
-tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-if (HeapTupleIsValid(tuple))
-{
-	Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple);
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_row_filter)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_column_list)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a column list for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	ReleaseSysCache(tuple);
-}
-
 ReleaseSysCache(rftuple);
+continue;
 			}
+
+			relkind = get_rel_relkind(relid);
+			if (relkind != RELKIND_PARTITIONED_TABLE)
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+			relname = get_rel_name(relid);
+			if (relname == NULL)	/* table concurrently dropped */
+			{
+ReleaseSysCache(rftuple);
+continue;
+			}
+
+			if (has_rowfilter)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("cannot set parameter \"%s\" to false for publication \"%s\"",
+"publish_via_partition_root",
+stmt->pubname),
+		 errdetail("The publication includes partitioned table \"%s\" with a WHERE clause, which is not allowed when \"%s\" is false.",
+   get_rel_name(relid),
+   "publish_via_partition_root")));
+			Assert(has_collist);
+			ereport(ERROR,
+		

Re: typos

2022-04-12 Thread Amit Kapila
On Mon, Apr 11, 2022 at 4:15 PM Amit Kapila  wrote:
>
> On Mon, Apr 11, 2022 at 3:55 PM Masahiko Sawada  wrote:
> >
> > On Mon, Apr 11, 2022 at 7:10 PM Justin Pryzby  wrote:
> > >
> > > Amit or Masahiko may want to comment on 0012 (doc review: Add ALTER
> > > SUBSCRIPTION ... SKIP).
> >
>
> +1. I'll take care of pushing this one tomorrow unless we have more
> comments on this part.
>

I have pushed this one.

-- 
With Regards,
Amit Kapila.




Re: Skipping schema changes in publication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 4:17 PM vignesh C  wrote:
>
> On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila  wrote:
> >
> >
> > For the second syntax (Alter Publication ...), isn't it better to
> > avoid using ADD? It looks odd to me because we are not adding anything
> > in publication with this sytax.
>
> I was thinking of the scenario where user initially creates the
> publication for all tables:
> CREATE PUBLICATION pub1 FOR ALL TABLES;
>
> After that user decides to skip few tables ex: t1, t2
>  ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
>
> I thought of supporting this syntax in case the user decides to add the
> skipping of a few tables later.
>

I understand that part, but what I pointed out was that it might be
better to avoid using the ADD keyword in this syntax, like: ALTER
PUBLICATION pub1 SKIP TABLE t1, t2;

-- 
With Regards,
Amit Kapila.




Re: Support logical replication of DDLs

2022-04-12 Thread Amit Kapila
On Mon, Apr 11, 2022 at 11:01 PM Zheng Li  wrote:
>
> >Even if this works, how will we make Alter Table statement work where
> >it needs to rewrite the table? There also I think we can face a
> >similar problem if we directly send the statement, once the table will
> >be updated due to the DDL statement and then again due to table
> >rewrite as that will have a separate WAL.
>
> Yes, I think any DDL that can generate DML changes should be listed
> out and handled properly or documented. Here is one extreme example
> involving volatile functions:
> ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT now().
> Again, I think we need to somehow skip the data rewrite on the
> subscriber when replicating such DDL and let DML replication handle
> the rewrite.
>

I am not sure what is the right way to deal with this but another idea
we can investigate is to probably rewrite the DDL such that it doesn't
rewrite the table.

> >Another somewhat unrelated problem I see with this work is how to save
> >recursion of the same command between nodes (when the involved nodes
> >replicate DDLs). For DMLs, we can avoid that via replication origins
> >as is being done in the patch proposed [1] but not sure how will we
> >deal with that here?
>
> I'll need to investigate "recursion of the same command between
> nodes", could you provide an example?
>

See email [1]. I think the same solution should work for your proposal
as well because I see that LogLogicalMessage includes origin but it is
better if you can confirm it.

[1] - 
https://www.postgresql.org/message-id/flat/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubh...@mail.gmail.com
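
(For context, the origin-based filtering that [1] relies on is exposed to
output plugins as a callback; a minimal sketch in the pgoutput style,
where my_origin_filter is an illustrative name, not an existing function:)

/* Returning true filters the change out, so changes that carry a
 * remote origin are skipped and cannot ping-pong between nodes. */
static bool
my_origin_filter(LogicalDecodingContext *ctx, RepOriginId origin_id)
{
	return (origin_id != InvalidRepOriginId);
}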

-- 
With Regards,
Amit Kapila.




Re: Support logical replication of DDLs

2022-04-12 Thread Amit Kapila
On Mon, Apr 11, 2022 at 6:16 PM Euler Taveira  wrote:
>
> On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote:
>
> On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 23, 2022 at 10:39 AM Japin Li  wrote:
> >
> > 2. For DDL replication, do we need to wait for a consistent point of
> > snapshot? For DMLs, that point is a convenient point to initialize
> > replication from, which is why we export a snapshot at that point,
> > which is used to read normal data. Do we have any similar needs for
> > DDL replication?
> >
>
> I have thought a bit more about this and I think we need to build the
> snapshot for DML replication as we need to read catalog tables to
> decode the corresponding WAL but it is not clear to me if we have a
> similar requirement for DDL replication. If the catalog access is
> required then it makes sense to follow the current snapshot model,
> otherwise, we may need to think differently for DDL replication.
>
> One more related point is that for DML replication, we do ensure that
> we copy the entire data of the table (via initial sync) which exists
> even before the publication for that table exists, so do we want to do
> something similar for DDLs? How do we sync the schema of the table
> before the user has defined the publication? Say the table has been
> created before the publication is defined and after that, there are
> only Alter statements, so do we expect, users to create the table on
> the subscriber and then we can replicate the Alter statements? And
> even if we do that it won't be clear which Alter statements will be
> replicated after publication is defined especially if those Alters
> happened concurrently with defining publications?
>
> The *initial* DDL replication is a different problem than DDL replication. The
> former requires a snapshot to read the current catalog data and build a CREATE
> command as part of the subscription process. The subsequent DDLs in that 
> object
> will be handled by a different approach that is being discussed here.
>

I think they are not completely independent because of the current way
to do initial sync followed by replication. The initial sync and
replication need some mechanism to ensure that one of those doesn't
overwrite the work done by the other. Now, the initial idea and patch
can be developed separately, but I think the two patches have some
dependency on each other.

-- 
With Regards,
Amit Kapila.




Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]

2022-04-12 Thread Kyotaro Horiguchi
(My mailer has been fixed.)

At Mon, 11 Apr 2022 21:45:59 +0200, Markus Wanner 
 wrote in 
> On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote:
> > ... before v13, the commit in question actually
> > changed the size of PGXACT, which is really quite bad -- it needs to
> > be 12 bytes for performance reasons. And there's no spare bytes
> > available, so I think we should follow one of the suggestions that he
> > had over in that email thread, and put delayChkptEnd in PGPROC even
> > though delayChkpt is in PGXACT.
> 
> This makes sense to me.  Kudos to Kyotaro for considering this.
> 
> At first read, this sounded like a trade-off between compatibility and
> performance for PG 12 and older.  But I realize leaving delayChkpt in
> PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves
> PGXACT at a size of 12 bytes.  So this sounds like a good approach to
> me.

Thanks!

So, I created the patches for back-patching from 10 to 14.  (They also
fix a silly bug in the v1-pg14 patch, where HaveVirtualXIDsDelayingChkpt
and HaveVirtualXIDsDelayingChkptEnd were inverted.)

They revert the delayChkpt-related changes made by the patch and add
the delayChkptEnd stuff.  I compared every pair of patches for
neighbouring versions, to make sure the same change is not made in
different ways and that they carry the same set of hunks.

This version takes the approach of sharing a common static function
(*ChkptGuts) between the *Chkpt() and *ChkptEnd() functions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 471ca4aa38a92f837ed4e4aeda40648c3b0b0a9b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 6 Apr 2022 15:47:50 +0900
Subject: [PATCH v2] Fix ABI/API break

Commit bbace5697d caused an ABI break by changing the type and width of
PGPROC.delayChkpt, and an API break by changing the signature of some
public functions.  Restore the member and the function signatures to
their previous state, and add delayChkptEnd to an existing gap in the
PGPROC struct.

Backpatch to 10, all supported branches.
---
 src/backend/access/transam/multixact.c  |  6 +--
 src/backend/access/transam/twophase.c   | 13 +++---
 src/backend/access/transam/xact.c   |  5 +--
 src/backend/access/transam/xlog.c   | 10 ++---
 src/backend/access/transam/xloginsert.c |  2 +-
 src/backend/catalog/storage.c   |  6 +--
 src/backend/storage/buffer/bufmgr.c |  6 +--
 src/backend/storage/ipc/procarray.c | 60 +++--
 src/backend/storage/lmgr/proc.c |  6 ++-
 src/include/storage/proc.h  | 42 +++--
 src/include/storage/procarray.h |  8 ++--
 11 files changed, 82 insertions(+), 82 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 50d8bab9e2..b643564f16 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -3075,8 +3075,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * crash/basebackup, even though the state of the data directory would
 	 * require it.
 	 */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	Assert(!MyProc->delayChkpt);
+	MyProc->delayChkpt = true;
 
 	/* WAL log truncation */
 	WriteMTruncateXlogRec(newOldestMultiDB,
@@ -3102,7 +3102,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	/* Then offsets */
 	PerformOffsetsTruncation(oldestMulti, newOldestMulti);
 
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	END_CRIT_SECTION();
 	LWLockRelease(MultiXactTruncationLock);
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index dea3f485f7..911d93fbf4 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -474,7 +474,8 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
 	}
 	proc->xid = xid;
 	Assert(proc->xmin == InvalidTransactionId);
-	proc->delayChkpt = 0;
+	proc->delayChkpt = false;
+	proc->delayChkptEnd = false;
 	proc->statusFlags = 0;
 	proc->pid = 0;
 	proc->databaseId = databaseid;
@@ -1165,8 +1166,7 @@ EndPrepare(GlobalTransaction gxact)
 
 	START_CRIT_SECTION();
 
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	MyProc->delayChkpt |= DELAY_CHKPT_START;
+	MyProc->delayChkpt = true;
 
 	XLogBeginInsert();
 	for (record = records.head; record != NULL; record = record->next)
@@ -1209,7 +1209,7 @@ EndPrepare(GlobalTransaction gxact)
 	 * checkpoint starting after this will certainly see the gxact as a
 	 * candidate for fsyncing.
 	 */
-	MyProc->delayChkpt &= ~DELAY_CHKPT_START;
+	MyProc->delayChkpt = false;
 
 	/*
 	 * Remember that we have this GlobalTransaction entry locked for us.  If
@@ -2276,8 +2276,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
 	START_CRIT_SECTION();
 
 	/* See notes in RecordTransactionCommit */
-	Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0);
-	

Re: Skipping schema changes in publication

2022-04-12 Thread vignesh C
On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila  wrote:
>
> On Tue, Apr 12, 2022 at 11:53 AM vignesh C  wrote:
> >
> > On Sat, Mar 26, 2022 at 7:37 PM vignesh C  wrote:
> > >
> > > On Tue, Mar 22, 2022 at 12:38 PM vignesh C  wrote:
> > > >
> > > > Hi,
> > > >
> > > > This feature adds an option to skip changes of all tables in specified
> > > > schema while creating publication.
> > > > This feature is helpful for use cases where the user wants to
> > > > subscribe to all the changes except for the changes present in a few
> > > > schemas.
> > > > Ex:
> > > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
> > > > OR
> > > > ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
> > > >
> > > > A new column pnskip is added to table "pg_publication_namespace", to
> > > > maintain the schemas that the user wants to skip publishing through
> > > > the publication. Modified the output plugin (pgoutput) to skip
> > > > publishing the changes if the relation is part of skip schema
> > > > publication.
> > > > As a continuation to this, I will work on implementing skipping tables
> > > > from all tables in schema and skipping tables from all tables
> > > > publication.
> > > >
> > > > Attached patch has the implementation for this.
> > >
> > > The patch was not applying on top of HEAD because of the recent
> > > commits, attached patch is rebased on top of HEAD.
> >
> > The patch does not apply on top of HEAD because of the recent commit,
> > attached patch is rebased on top of HEAD.
> >
> > I have also included the implementation for skipping a few tables from
> > all tables publication, the 0002 patch has the implementation for the
> > same.
> > This feature is helpful for use cases where the user wants to
> > subscribe to all the changes except for the changes present in a few
> > tables.
> > Ex:
> > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> > OR
> > ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;
> >
>
> For the second syntax (Alter Publication ...), isn't it better to
> avoid using ADD? It looks odd to me because we are not adding anything
> in the publication with this syntax.

I was thinking of the scenario where the user initially creates the
publication for all tables:
CREATE PUBLICATION pub1 FOR ALL TABLES;

After that the user decides to skip a few tables, e.g. t1, t2:
 ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;

I thought of supporting this syntax in case the user decides to add the
skipping of a few tables later.
Thoughts?

Regards,
Vignesh




Re: row filtering for logical replication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila  wrote:
>
> On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera  
> wrote:
> >
> > I understand that this is a minimal fix, and for that it seems OK, but I
> > think the surrounding style is rather baroque.  This code can be made
> > simpler.  Here's my take on it.
> >
>
> We don't have a lock on the relation, so if it gets dropped
> concurrently, it won't behave sanely. For example, get_rel_name() will
> return NULL which seems incorrect to me.
>

It seems to me that we have a similar coding pattern in ExecGrant_Relation().

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera  wrote:
>
> I understand that this is a minimal fix, and for that it seems OK, but I
> think the surrounding style is rather baroque.  This code can be made
> simpler.  Here's my take on it.
>

We don't have a lock on the relation, so if it gets dropped
concurrently, it won't behave sanely. For example, get_rel_name() will
return NULL which seems incorrect to me.

>  I think it's also faster: we avoid
> looking up pg_publication_rel entries for rels that aren't partitioned
> tables.
>

I am not sure about this either, because you will instead do a RELOID
cache lookup even when there is no row filter or column list.

-- 
With Regards,
Amit Kapila.




RE: WIP: WAL prefetch (another approach)

2022-04-12 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi,
Thank you for your reply. 
I missed the message, sorry.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Thomas Munro  
Sent: Tuesday, April 12, 2022 6:28 PM
To: Shinoda, Noriyoshi (PN Japan FSIP) 
Cc: Justin Pryzby ; Tomas Vondra 
; Stephen Frost ; Andres 
Freund ; Jakub Wartak ; Alvaro 
Herrera ; Tomas Vondra 
; Dmitry Dolgov <9erthali...@gmail.com>; David 
Steele ; pgsql-hackers 
Subject: Re: WIP: WAL prefetch (another approach)

On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP) 
 wrote:
> Thank you for developing the great feature. I tested this feature and checked 
> the documentation. Currently, the documentation for the 
> pg_stat_prefetch_recovery view is included in the description for the 
> pg_stat_subscription view.
>
> https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION

Hi!  Thanks.  I had just committed a fix before I saw your message, because 
there was already another report here:

https://www.postgresql.org/message-id/flat/CAKrAKeVk-LRHMdyT6x_p33eF6dCorM2jed5h_eHdRdv0reSYTA%40mail.gmail.com
 


Re: Temporary file access API

2022-04-12 Thread Antonin Houska
Robert Haas  wrote:

> On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska  wrote:
> > There are't really that many kinds of files to encrypt:
> >
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
> >
> > (And pg_stat/* files should be removed from the list.)
> 
> This kind of gets into some theoretical questions. Like, do we think
> that it's an information leak if people can look at how many
> transactions are committing and aborting in pg_xact_status? In theory
> it could be, but I know it's been argued that that's too much of a
> side channel. I'm not sure I believe that, but it's arguable.

I was referring to the fact that the statistics are no longer stored in files:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd

> Similarly, the argument that global/pg_internal.init doesn't contain
> user data relies on the theory that the only table data that will make
> its way into the file is for system catalogs. I guess that's not user
> data *exactly* but ... are we sure that's how we want to roll here?

Yes, this is worth attention.

> I really don't know how you can argue that pg_dynshmem/mmap.NNN
> doesn't contain user data - surely it can.

Good point. Since postgres does not control writes into this file, it's a
special case though. (Maybe TDE will have to refuse to start if
dynamic_shared_memory_type is set to mmap and the instance is encrypted.)
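
(A minimal sketch of such a startup check, assuming a hypothetical
tde_is_enabled() helper; DSM_IMPL_MMAP and dynamic_shared_memory_type
are the existing backend symbols:)

/* mmap-backed DSM writes user data to disk outside the encryption
 * layer, so refuse the combination outright. */
if (tde_is_enabled() && dynamic_shared_memory_type == DSM_IMPL_MMAP)
	ereport(FATAL,
			(errmsg("dynamic_shared_memory_type = \"mmap\" cannot be used with an encrypted cluster")));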

Thanks.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: WIP: WAL prefetch (another approach)

2022-04-12 Thread Thomas Munro
On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP)
 wrote:
> Thank you for developing the great feature. I tested this feature and checked 
> the documentation. Currently, the documentation for the 
> pg_stat_prefetch_recovery view is included in the description for the 
> pg_stat_subscription view.
>
> https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION

Hi!  Thanks.  I had just committed a fix before I saw your message,
because there was already another report here:

https://www.postgresql.org/message-id/flat/CAKrAKeVk-LRHMdyT6x_p33eF6dCorM2jed5h_eHdRdv0reSYTA%40mail.gmail.com




Re: Documentation issue with pg_stat_recovery_prefetch

2022-04-12 Thread Thomas Munro
On Tue, Apr 12, 2022 at 8:11 AM sirisha chamarthi
 wrote:
> I was going through pg_stat_recovery_prefetch documentation and saw an issue 
> with formatting. Attached a small patch to fix the issue. This is the first 
> time I am sending an email to hackers. Please educate me if I miss something.

Thanks Sirisha!

Ouch, that's embarrassing.  My best guess is that I might have screwed
that up a long time ago while rebasing an early development version
over commit 92f94686, which changed the link style and moved
paragraphs around, and then never noticed that it was wrong.
Researching that made me notice another problem: the table was using
the 3 column layout from a couple of years ago, because I had also
missed the style change in commit a0427506.  Oops.  Fixed.




Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}

2022-04-12 Thread Michael Paquier
On Mon, Apr 11, 2022 at 12:46:02PM +, gkokola...@pm.me wrote:
> On Monday, April 11th, 2022 at 8:52 AM, Michael Paquier 
> wrote:
>> - 0001 is a simple rename of backup_compression.{c,h} to
>> compression.{c,h}, removing anything related to base backups from
>> that. One extra reason behind this renaming is that I would like to
>> use this infra for pg_dump, but that's material for 16~.
> 
> I agree with the design. If you permit me a couple of nitpicks regarding 
> naming.
> 
> +typedef enum pg_compress_algorithm
> +{
> +   PG_COMPRESSION_NONE,
> +   PG_COMPRESSION_GZIP,
> +   PG_COMPRESSION_LZ4,
> +   PG_COMPRESSION_ZSTD
> +} pg_compress_algorithm;
> 
> Elsewhere in the codebase (e.g. create_table.sgml, alter_table.sgml,
> brin_tuple.c, detoast.c, toast_compression.c, tupdesc.c, gist.c to mention a
> few) variations of the nomenclature "compression method" are used, like
> 'VARDATA_COMPRESSED_GET_COMPRESS_METHOD' or 'InvalidCompressionMethod' etc. I
> feel that it would be nicer if we followed one naming rule for this and I
> recommend substituting algorithm for method throughout.

Technically and as far as I know, both are correct and hold more or
less the same meaning.  pg_basebackup's code exposes algorithm in a
more extended way, so I have just stuck to it for the internal
variables and such.  Perhaps we could rename the whole, but I see no
strong reason to do that.

> Last, even though it is not needed now, it will be helpful to have a
> PG_COMPRESSION_INVALID in some scenarios. Though we can add it when we come to
> it.

Perhaps.  There is no need for it yet, though.  pg_dump would not need
that either.

>> - 0002 removes WalCompressionMethod, replacing it by
>> pg_compress_algorithm as these are the same enums. Robert complained
>> about the confusion that WalCompressionMethod could lead to as this
>> could be used for the compression of data, and not only WAL. I have
>> renamed some variables to be more consistent, while on it.
> 
> It looks good. If you choose to discard the comment regarding the use of
> 'method' over 'algorithm' from above, can you please use the full word in the
> variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I
> cannot really explain it, but the latter reads a bit rude. Then again that
> may be just me.

Thanks.  I have been able to do an extra pass on 0001 and 0002, fixing
those naming inconsistencies with "algo" vs "algorithm" that you and
Robert have reported, and applied them.  For 0003, I'll look at it
later.  Attached is a rebase with improvements to the variable
names.
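
To illustrate the reworked grammar (hedged examples matching the patch
below; the target directory is made up):

$ pg_receivewal -D /archive/wal --compress=lz4
$ pg_receivewal -D /archive/wal --compress=gzip:level=7
$ pg_receivewal -D /archive/wal --compress=5    # bare integer implies gzip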
--
Michael
From 94850ac604402371135e85de27a5d9074a947d4b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 12 Apr 2022 18:14:28 +0900
Subject: [PATCH v2] Rework compression options of pg_receivewal

Since ba5 and the introduction of LZ4 in pg_receivewal, the
compression of archived WAL is controlled by two options:
- --compression-method with "gzip", "none" or "lz4" as possible value.
- --compress=N to specify a compression level, with a
backward-incompatible change where a value of zero leads to a failure
instead of no compression.

This commit takes advantage of the previous refactoring done to rework
the compression options of pg_receivewal:
- --compression-method is removed.
- --compress is extended to use the same grammar as pg_basebackup, in
the form of a METHOD:DETAIL specification, where METHOD is "gzip",
"none" or "lz4" and DETAIL is a comma-separated list of options; the
only keyword supported for now is "level", to control the compression
level.  If only an integer is specified as the value of this option,
"none" is implied for 0 and "gzip" is implied for any other value.  So
this brings --compress back to being backward-compatible with ~14,
while still supporting LZ4.

This has the advantage of centralizing the set of checks used by
pg_basebackup.
---
 src/bin/pg_basebackup/pg_receivewal.c| 131 +--
 src/bin/pg_basebackup/t/020_pg_receivewal.pl |  16 +--
 doc/src/sgml/ref/pg_receivewal.sgml  |  47 ---
 3 files changed, 121 insertions(+), 73 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index ede1d4648d..f942883332 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -57,6 +57,8 @@ static XLogRecPtr endpos = InvalidXLogRecPtr;
 
 
 static void usage(void);
+static void parse_compress_options(char *option, char **algorithm,
+   char **detail);
 static DIR *get_destination_dir(char *dest_folder);
 static void close_destination_dir(DIR *dest_dir, char *dest_folder);
 static XLogRecPtr FindStreamingStart(uint32 *tli);
@@ -90,9 +92,8 @@ usage(void)
 	printf(_("  --synchronous  flush write-ahead log immediately after writing\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
-	printf(_("  --compression-method=METHOD\n"
-			 "

Re: row filtering for logical replication

2022-04-12 Thread Alvaro Herrera
I understand that this is a minimal fix, and for that it seems OK, but I
think the surrounding style is rather baroque.  This code can be made
simpler.  Here's my take on it.  I think it's also faster: we avoid
looking up pg_publication_rel entries for rels that aren't partitioned
tables.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index 7aacb6b2fe..e8ef003fe5 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -945,60 +945,42 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt,
 
 		foreach(lc, root_relids)
 		{
-			HeapTuple	rftuple;
 			Oid			relid = lfirst_oid(lc);
-			bool		has_column_list;
-			bool		has_row_filter;
+			char		relkind;
+			HeapTuple	rftuple;
+
+			relkind = get_rel_relkind(relid);
+			if (relkind != RELKIND_PARTITIONED_TABLE)
+continue;
 
 			rftuple = SearchSysCache2(PUBLICATIONRELMAP,
 	  ObjectIdGetDatum(relid),
 	  ObjectIdGetDatum(pubform->oid));
+			if (!HeapTupleIsValid(rftuple))
+continue;
 
-			has_row_filter
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL);
+			if (!heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("cannot set %s to false for publication \"%s\"",
+"publish_via_partition_root",
+stmt->pubname),
+		 errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" which is not allowed when %s is false.",
+   get_rel_name(relid),
+   "publish_via_partition_root")));
 
-			has_column_list
-= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL);
+			if (!heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("cannot set %s to false for publication \"%s\"",
+"publish_via_partition_root",
+stmt->pubname),
+		 errdetail("The publication contains a column list for a partitioned table \"%s\" which is not allowed when %s is false.",
+   get_rel_name(relid),
+   "publish_via_partition_root")));
 
-			if (HeapTupleIsValid(rftuple) &&
-(has_row_filter || has_column_list))
-			{
-HeapTuple	tuple;
+			ReleaseSysCache(rftuple);
 
-tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
-if (HeapTupleIsValid(tuple))
-{
-	Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple);
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_row_filter)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	if ((relform->relkind == RELKIND_PARTITIONED_TABLE) &&
-		has_column_list)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("cannot set %s for publication \"%s\"",
-		"publish_via_partition_root = false",
-		stmt->pubname),
- errdetail("The publication contains a column list for a partitioned table \"%s\" "
-		   "which is not allowed when %s is false.",
-		   NameStr(relform->relname),
-		   "publish_via_partition_root")));
-
-	ReleaseSysCache(tuple);
-}
-
-ReleaseSysCache(rftuple);
-			}
 		}
 	}
 
diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out
index 8208f9fa0e..580cc5be7f 100644
--- a/src/test/regress/expected/publication.out
+++ b/src/test/regress/expected/publication.out
@@ -588,7 +588,7 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1;
 -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any row filter is
 -- used for partitioned table
 ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
-ERROR:  cannot set publish_via_partition_root = false for publication "testpub6"
+ERROR:  cannot set publish_via_partition_root to false for publication "testpub6"
 DETAIL:  The publication contains a WHERE clause for a partitioned table "rf_tbl_abcd_part_pk" which is not allowed when publish_via_partition_root is false.
 -- Now change the root filter to use a column "b"
 -- (which is not in the replica identity)
@@ -951,7 +951,7 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1;
 -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any column list is
 -- used for partitioned table
 ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0);
-ERROR:  cannot set publish_via_partition_root = 

RE: WIP: WAL prefetch (another approach)

2022-04-12 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Hi, 
Thank you for developing the great feature. I tested this feature and checked 
the documentation. Currently, the documentation for the 
pg_stat_prefetch_recovery view is included in the description for the 
pg_stat_subscription view.

https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION

It is also not displayed in the list of "28.2. The Statistics Collector".
https://www.postgresql.org/docs/devel/monitoring.html

The attached patch modifies the pg_stat_prefetch_recovery view to appear as a 
separate view.

Regards,
Noriyoshi Shinoda
-Original Message-
From: Thomas Munro  
Sent: Friday, April 8, 2022 10:47 AM
To: Justin Pryzby 
Cc: Tomas Vondra ; Stephen Frost 
; Andres Freund ; Jakub Wartak 
; Alvaro Herrera ; Tomas 
Vondra ; Dmitry Dolgov <9erthali...@gmail.com>; 
David Steele ; pgsql-hackers 
Subject: Re: WIP: WAL prefetch (another approach)

On Fri, Apr 8, 2022 at 12:55 AM Justin Pryzby  wrote:
> The docs seem to be wrong about the default.
>
> +are not yet in the buffer pool, during recovery.  Valid values are
> +off (the default), on and
> +try.  The setting try 
> + enables

Fixed.

> +   concurrency and distance, respectively.  By default, it is set to
> +   try, which enabled the feature on systems where
> +   posix_fadvise is available.
>
> Should say "which enables".

Fixed.

> Curiously, I reported a similar issue last year.

Sorry.  I guess both times we only agreed on what the default should be in the 
final review round before commit, and I let the docs get out of sync (well, the 
default is mentioned in two places and I apparently ended my search too soon, 
changing only one).  I also found another recently obsoleted sentence: the one 
about showing nulls sometimes was no longer true.  Removed.




pg_stat_recovery_prefetch_doc_v1.diff
Description: pg_stat_recovery_prefetch_doc_v1.diff


Re: PG DOCS - logical replication filtering

2022-04-12 Thread Peter Smith
On Tue, Apr 12, 2022 at 3:33 AM Euler Taveira  wrote:
>
> On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote:
>
> On Mon, Apr 11, 2022 at 12:39 PM Peter Smith  wrote:
> >
> > On Fri, Apr 8, 2022 at 4:12 PM Peter Smith  wrote:
> >
> > OK. Added in v7 [1]
> >
>
> Thanks, this looks mostly good to me. I didn't like the new section
> added for partitioned tables examples, so I removed it and added some
> explanation of the tests. I have slightly changed a few other lines. I
> am planning to commit the attached tomorrow unless there are more
> comments.
>
> I have a few comments.

Thanks for your review comments! I addressed most of them as suggested
- see the details below.

>
> > If a publication publishes UPDATE and/or DELETE operations ...
> >
>
> If we are talking about operations, use lowercase like I suggested in the
> previous version. See the publish parameter [1]. If we are talking about
> commands, the word "operations" should be removed or replaced by "commands".
> The "and/or" isn't required, "or" implies the same. If you prefer 
> "operations",
> my suggestion is to adjust the last sentence that says "only INSERT" to "only
> insert operation".

Not changed, because in fact I copied most of this sentence
(including the uppercase, "operations", and "and/or") from the existing
documentation [1],
e.g. see "The tables added to a publication that publishes UPDATE
and/or DELETE operations must ..."
[1] https://www.postgresql.org/docs/current/sql-createpublication.html


>
> > If the old row satisfies the row filter expression (it was sent to the
> > subscriber) but the new row doesn't, then from a data consistency 
> > perspective
> > the old row should be removed from the subscriber.
> >
>
> I suggested small additions to this sentence. We should at least add a comma
> after "then" and "perspective". The same applies to the next paragraph too.

Modified the commas in [v9] as suggested.
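
For anyone following along, the behavior the quoted sentence describes
can be reproduced with a small example (hedged, using made-up names):

CREATE PUBLICATION pub FOR TABLE t WHERE (a > 10);
-- on the publisher:
UPDATE t SET a = 5 WHERE a = 20;
-- The old row (a = 20) matched the filter but the new row (a = 5) does
-- not, so the change reaches the subscriber as a DELETE of that row.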

>
> Regarding the "Summary", it is redundant as I said before. We already 
> described
> all 4 cases. I vote to remove it. However, if we would go with a table, I
> suggest to improve the formatting: add borders, "old row" and "new row" should
> be titles, "no match" and "match" should be represented by symbols ("" and 
> "X",
> for example), and "Case X" column should be removed because this extra column
> adds nothing.

Yeah, I know the information is the same in the summary and in the
text. Personally, I find big slabs of technical text difficult to
digest, so I'd have to spend 5 minutes of careful reading and drawing
the exact same summary on a piece of paper anyway just to visualize
what the text says. The summary makes it easy to understand at a
glance. But I have modified the summary in [v9] to remove the "case"
and to add other column headings as suggested.

>
> Regarding the "Partitioned Tables", I suggested to remove the bullets. We
> generally have paragraphs with a few sentences. I tend to avoid short
> paragraphs. I also didn't like the 2 bullets using almost the same words. In
> the previous version, I suggested something like
>
> If the publication contains a partitioned table, the parameter
> publish_via_partition_root determines which row filter expression is used. If
> the parameter publish_via_partition_root is true, the row filter expression
> associated with the partitioned table is used. Otherwise, the row filter
> expression associated with the individual partition is used.
>

Modified in [v9] to remove the bullets.

> > will be copied. (see Section 31.3.6 for details).
>
> There is an extra period after "copied" that should be removed. The other
> option is to remove the parentheses and have another sentence for "See ...".
>

Modified in [v9] as suggested.

> > those expressions get OR'ed together
>
> I prefer plain English here. This part of the sentence is also redundant with
> the rest of the sentence so I suggested to remove it in the previous version.
>
> rows that satisfy any of the row filter expressions is replicated.
>
> instead of
>
> those expressions get OR'ed together, so that rows satisfying any of the
> expressions will be replicated.

Not changed. The readers of this docs page are all users who will be
familiar with the filter expressions, so I felt the "OR'ed together"
part will be perfectly clear to the intended audience.

>
> I also didn't use a different paragraph (like I suggested in the previous
> version) because we are talking about the same thing.
>

Modified in [v9] to use a single paragraph.

> The bullets in the example sounds strange, that's why I suggested removing it.
> We can even combine the 3 sentences into one paragraph.

Modified [v9] so the whole example now has no bullets. Also combined
all these 3 sentences as suggested.

>
> > The PSQL command \dRp+ shows the row filter expressions (if defined) for 
> > each
> > table of the publications.
>
> Well, we don't use PSQL (uppercase) in the documentation. I suggested a
> different sentence:
>
> The psql shows the row 

Re: random() function documentation

2022-04-12 Thread Fabien COELHO



> How about we just say "uses a linear-feedback shift register algorithm"?

> I think it'd be sufficient to just say that it's a deterministic
> pseudorandom number generator. I don't see much value in documenting
> the internal algorithm used.

Hmmm… I'm not so sure. ISTM that people interested in using the random
user-facing variants (only random?) could like a pointer on the algorithm
to check for the expected quality of the produced pseudo-random stream?

> See attached.

> Should we perhaps also add a warning that the same seed is not
> guaranteed to produce the same sequence across different (major?)
> versions?

> I wouldn't bother, on the grounds that then we'd need such disclaimers
> in a whole lot of places.  Others might see it differently though.

> Agreed,

Agreed.

> though I think when the release notes are written, they ought
> to warn that the sequence will change with this release.

Yes.

--
Fabien.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0a5c402640..7492454592 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1832,10 +1832,11 @@ repeat('Pg', 4) PgPgPgPg

 
   
-   The random() function uses a simple linear
-   congruential algorithm.  It is fast but not suitable for cryptographic
-   applications; see the  module for a more
-   secure alternative.
+   The random() function uses
+   https://en.wikipedia.org/wiki/Xoroshiro128%2B;>xoroshiro128**, a
+   linear feedback shift register algorithm.
+   It is fast but not suitable for cryptographic applications;
+   see the  module for a more secure alternative.
If setseed() is called, the series of results of
subsequent random() calls in the current session
can be repeated by re-issuing setseed() with the same


Re: Unable to connect to Postgres13 server from psql client built on master

2022-04-12 Thread Julien Rouhaud
Hi,

On Tue, Apr 12, 2022 at 12:47:54AM -0700, sirisha chamarthi wrote:
>
> I am unable to connect to my Postgres server (running version 13) in Azure
> Postgres from the PSQL client built on the latest master. However, I am
> able to connect to the Postgres 15 server running locally on the machine. I
> installed an earlier version of the PSQL client (v 12) and was able to
> connect to both the Azure PG instance as well as the local instance. Can
> this be a bug in the master? I tried looking at the server logs in Azure
> but couldn't get anything meaningful from those. Any tips on how I can
> debug psql client further?
>
> root@userspgdev:/usr/local/pgsql# psql -U postgres -h
> inst.postgres.database.azure.com -d postgres
> Password for user postgres:
> psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 13.6)
> WARNING: psql major version 12, server major version 13.
>  Some psql features might not work.
> SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits:
> 256, compression: off)
> Type "help" for help.
>
> postgres=> \q
>
>  bin/psql -U postgres -h inst.postgres.database.azure.com -d postgres
> psql: error: connection to server at "inst.postgres.database.azure.com"
> (20.116.167.xx), port 5432 failed: FATAL:  no pg_hba.conf entry for host
> "20.125.61.xx", user "postgres", database "postgres", SSL off

It's hard to be sure without seeing the pg_hba.conf file, but the most likely
explanation is that your remote server only accepts SSL connections and you
haven't built your local binaries with SSL support.
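
If that's the case, something like this should confirm and fix it
(assuming a source build; the flag is spelled --with-ssl=openssl on
master and --with-openssl on older branches):

$ pg_config --configure | tr ' ' '\n' | grep ssl
$ ./configure --with-ssl=openssl && make && make install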




Re: PG DOCS - logical replication filtering

2022-04-12 Thread Peter Smith
PSA patch v9 which addresses most of Euler's review comments [1]

--
[1] 
https://www.postgresql.org/message-id/1c78ebd4-b38d-4b5d-a6ea-d583efe87d97%40www.fastmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v9-0001-Add-additional-documentation-for-row-filters.patch
Description: Binary data


Unable to connect to Postgres13 server from psql client built on master

2022-04-12 Thread sirisha chamarthi
Hi hackers.

I am unable to connect to my Postgres server (running version 13) in Azure
Postgres from the PSQL client built on the latest master. However, I am
able to connect to the Postgres 15 server running locally on the machine. I
installed an earlier version of the PSQL client (v 12) and was able to
connect to both the Azure PG instance as well as the local instance. Can
this be a bug in the master? I tried looking at the server logs in Azure
but couldn't get anything meaningful from those. Any tips on how I can
debug psql client further?

My local server is running with trust authentication and the remote server
is running with md5 in the pg_hba.conf. I am not sure if this changes the
psql behavior somehow.

root@userspgdev:/usr/local/pgsql# ./psql -U postgres -h
inst.postgres.database.azure.com -d postgres
bash: ./psql: No such file or directory
root@userspgdev:/usr/local/pgsql# psql -U postgres -h
inst.postgres.database.azure.com -d postgres
Password for user postgres:
psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 13.6)
WARNING: psql major version 12, server major version 13.
 Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits:
256, compression: off)
Type "help" for help.

postgres=> \q
 bin/psql -U postgres -h inst.postgres.database.azure.com -d postgres
psql: error: connection to server at "inst.postgres.database.azure.com"
(20.116.167.xx), port 5432 failed: FATAL:  no pg_hba.conf entry for host
"20.125.61.xx", user "postgres", database "postgres", SSL off

Also, wondering why no error is emitted by the psql client when the
connection attempt fails?

Thanks,
Sirisha


Re: MERGE bug report

2022-04-12 Thread Alvaro Herrera
On 2022-Apr-11, Richard Guo wrote:

> At first I was wondering whether we need to also include vars used in
> each action's targetlist, just as what we did for each action's qual.
> Then later I realized parse_merge.c already did that. But now it looks
> much better to process them two in preprocess_targetlist.

Yeah.  I pushed that.

However, now EXPLAIN VERBOSE doesn't show the columns from the source
relation in the Output line --- I think only those that are used as join
quals are shown, thanks to distribute_quals_to_rels.  I think it would
be better to fix this.  Maybe expanding the source target list earlier
is called for, after all.  I looked at transformUpdateStmt and siblings
for inspiration, but came out blank.

> A minor comment is that we can use list_concat_copy(list1, list2)
> instead of list_concat(list_copy(list1), list2) for better efficiency.

Thanks for that tip.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La vida es para el que se aventura"




fix cost subqueryscan wrong parallel cost

2022-04-12 Thread bu...@sohu.com
The cost_subqueryscan function does not account for whether the path is parallel.
From the regression tests:
-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;
QUERY PLAN
--
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
 ->  Sort
   Sort Key: t.a, t.b, t.c
   ->  Append
 ->  Gather
   Workers Planned: 2
   ->  Parallel Seq Scan on t
 ->  Gather
   Workers Planned: 2
   ->  Parallel Seq Scan on t t_1
and with the patch it becomes:
 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
 ->  Sort
   Sort Key: t.a, t.b, t.c
   ->  Gather
 Workers Planned: 2
 ->  Parallel Append
   ->  Parallel Seq Scan on t
   ->  Parallel Seq Scan on t t_1
Obviously the latter is less expensive.
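
A hedged guess at the shape of the fix, not necessarily what the attached
patch does: cost_subqueryscan() could scale its row estimate for parallel
paths the way cost_seqscan() does, e.g.

/* in costsize.c, cost_subqueryscan(), after path->path.rows is set */
if (path->path.parallel_workers > 0)
{
	double		parallel_divisor = get_parallel_divisor(&path->path);

	path->path.rows = clamp_row_est(path->path.rows / parallel_divisor);
}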



bu...@sohu.com


fix-cost_subqueryscan-get-wrong-parallel-cost.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-04-12 Thread vignesh C
On Tue, Apr 12, 2022 at 10:26 AM Peter Smith  wrote:
>
> On Thu, Apr 7, 2022 at 2:09 PM Peter Smith  wrote:
> >
> > FYI, here is a test script that is using the current patch (v6) to
> > demonstrate a way to share table data between different numbers of
> > nodes (up to 5 of them here).
> >
> > The script starts off with just 2-way sharing (nodes N1, N2),
> > then expands to 3-way sharing (nodes N1, N2, N3),
> > then 4-way sharing (nodes N1, N2, N3, N4),
> > then 5-way sharing (nodes N1, N2, N3, N4, N5).
> >
> > As an extra complication, for this test, all 5 nodes have different
> > initial table data, which gets replicated to the others whenever each
> > new node joins the existing share group.
> >
> > PSA.
> >
>
>
> Hi Vignesh. I had some problems getting the above test script working.
> It was OK up until I tried to join the 5th node (N5) to the existing 4
> nodes. The ERROR was manifesting itself strangely because it appeared
> that there was an index violation in the pg_subscription_rel catalog
> even though AFAIK the N5 did not have any entries in it.
>
>
> e.g.
> 2022-04-07 09:13:28.361 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.361 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16393) already exists.
> 2022-04-07 09:13:28.361 AEST [24237] STATEMENT: create subscription
> sub51 connection 'port=7651' publication pub1 with
> (subscribe_local_only=true,copy_data=force);
> 2022-04-07 09:13:28.380 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.380 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16394) already exists.
> 2022-04-07 09:13:28.380 AEST [24237] STATEMENT: create subscription
> sub52 connection 'port=7652' publication pub2 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.405 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.405 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16395) already exists.
> 2022-04-07 09:13:28.405 AEST [24237] STATEMENT: create subscription
> sub53 connection 'port=7653' publication pub3 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.425 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.425 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16396) already exists.
> 2022-04-07 09:13:28.425 AEST [24237] STATEMENT: create subscription
> sub54 connection 'port=7654' publication pub4 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:17:52.472 AEST [25852] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:17:52.472 AEST [25852] DETAIL: Key (srrelid,
> srsubid)=(16384, 16397) already exists.
> 2022-04-07 09:17:52.472 AEST [25852] STATEMENT: create subscription
> sub51 connection 'port=7651' publication pub1;
>
> ~~~
>
> When I debugged this it seemed like each of the CREATE SUBSCRIPTION commands
> was trying to make a double entry, because fetch_tables (whose SQL your
> patch v6-0002 modified) was returning the same table twice.
>
> (gdb) bt
> #0 errfinish (filename=0xbc1057 "nbtinsert.c", lineno=671,
> funcname=0xbc25e0 <__func__.15798> "_bt_check_unique") at elog.c:510
> #1 0x00526d83 in _bt_check_unique (rel=0x7f654219c2a0,
> insertstate=0x7ffd9629ddd0, heapRel=0x7f65421b0e28,
> checkUnique=UNIQUE_CHECK_YES, is_unique=0x7ffd9629de01,
> speculativeToken=0x7ffd9629ddcc) at nbtinsert.c:664
> #2 0x00526157 in _bt_doinsert (rel=0x7f654219c2a0,
> itup=0x19ea8e8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false,
> heapRel=0x7f65421b0e28) at nbtinsert.c:208
> #3 0x0053450e in btinsert (rel=0x7f654219c2a0,
> values=0x7ffd9629df10, isnull=0x7ffd9629def0, ht_ctid=0x19ea894,
> heapRel=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES,
> indexUnchanged=false, indexInfo=0x19dea80) at nbtree.c:201
> #4 0x005213b6 in index_insert (indexRelation=0x7f654219c2a0,
> values=0x7ffd9629df10, isnull=0x7ffd9629def0, heap_t_ctid=0x19ea894,
> heapRelation=0x7f65421b0e28,  checkUnique=UNIQUE_CHECK_YES,
> indexUnchanged=false, indexInfo=0x19dea80) at indexam.c:193
> #5 0x005c81d5 in CatalogIndexInsert (indstate=0x19de540,
> heapTuple=0x19ea890) at indexing.c:158
> #6 0x005c8325 in CatalogTupleInsert (heapRel=0x7f65421b0e28,
> tup=0x19ea890) at indexing.c:231
> #7 0x005f0170 in AddSubscriptionRelState (subid=16400,
> relid=16384, state=105 'i', sublsn=0) at pg_subscription.c:315
> #8 0x006d6fa5 in CreateSubscription (pstate=0x1942dc0,
> stmt=0x191f6a0, isTopLevel=true) at subscriptioncmds.c:767
>
> ~~
>
> Aside: All this was happening when I did not have enough logical
> replication workers configured. (There were WARNINGS in the logfile
> that I had 

Re: Skipping schema changes in publication

2022-04-12 Thread Amit Kapila
On Tue, Apr 12, 2022 at 11:53 AM vignesh C  wrote:
>
> On Sat, Mar 26, 2022 at 7:37 PM vignesh C  wrote:
> >
> > On Tue, Mar 22, 2022 at 12:38 PM vignesh C  wrote:
> > >
> > > Hi,
> > >
> > > This feature adds an option to skip changes of all tables in specified
> > > schema while creating publication.
> > > This feature is helpful for use cases where the user wants to
> > > subscribe to all the changes except for the changes present in a few
> > > schemas.
> > > Ex:
> > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
> > > OR
> > > ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
> > >
> > > A new column pnskip is added to table "pg_publication_namespace", to
> > > maintain the schemas that the user wants to skip publishing through
> > > the publication. Modified the output plugin (pgoutput) to skip
> > > publishing the changes if the relation is part of skip schema
> > > publication.
> > > As a continuation to this, I will work on implementing skipping tables
> > > from all tables in schema and skipping tables from all tables
> > > publication.
> > >
> > > Attached patch has the implementation for this.
> >
> > The patch was not applying on top of HEAD because of the recent
> > commits, attached patch is rebased on top of HEAD.
>
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
>
> I have also included the implementation for skipping a few tables from
> all tables publication, the 0002 patch has the implementation for the
> same.
> This feature is helpful for use cases where the user wants to
> subscribe to all the changes except for the changes present in a few
> tables.
> Ex:
> CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> OR
> ALTER PUBLICATION pub1 ADD SKIP  TABLE t1,t2;
>

For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
in the publication with this syntax.


-- 
With Regards,
Amit Kapila.




Re: avoid multiple hard links to same WAL file after a crash

2022-04-12 Thread Kyotaro Horiguchi
At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart  
wrote in 
> On Mon, Apr 11, 2022 at 12:28:47PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> >> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi
> >>  wrote:
> >>> If this diagnosis is correct, the comment is proved to be paranoid.
> > 
> >> It's sometimes difficult to understand what problems really old code
> >> comments are worrying about. For example, could they have been
> >> worrying about bugs in the code? Could they have been worrying about
> >> manual interference with the pg_wal directory? It's hard to know.
> > 
> > "git blame" can be helpful here, if you trace back to when the comment
> > was written and then try to find the associated mailing-list discussion.
> > (That leap can be difficult for commits pre-dating our current
> > convention of including links in the commit message, but it's usually
> > not *that* hard to locate contemporaneous discussion.)
> 
> I traced this back a while ago.  I believe the link() was first added in
> November 2000 as part of f0e37a8.  This even predates WAL recycling, which
> was added in July 2001 as part of 7d4d5c0.

f0e37a8 lacks discussion.  It introduced the CHECKPOINT command from
somewhere outside the ML.  This patch changed XLogFileInit to support
using existing files so that XLogWrite can use the new segment
provided by checkpoint, while still allowing XLogWrite to create a new
segment by itself.

Just before the commit, calls to XLogFileInit were protected (or
serialized) by logwr_lck.  As of the commit, calls to the same function
were still serialized by ControlFileLockId.

I *guess* that Vadim faced/noticed a race condition when he added
checkpoints, and thus introduced the link+remove protocol, but it
finally became useless once the call to XLogFileInit was moved within
the ControlFileLockId section.  But, of course, all of this story is
mere guesswork.
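
(For readers without the history at hand, the link+remove protocol in
question is essentially this pattern, sketched from the old
InstallXLogFileSegment code with error handling simplified:)

if (link(tmppath, path) < 0)
	ereport(LOG,
			(errcode_for_file_access(),
			 errmsg("could not link file \"%s\" to \"%s\"",
					tmppath, path)));
else
	unlink(tmppath);	/* a crash between link() and unlink() leaves
						 * two hard links to the same segment */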

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center