Re: row filtering for logical replication

2021-11-27 Thread Peter Smith
On Fri, Nov 26, 2021 at 1:16 PM houzj.f...@fujitsu.com
 wrote:
>
...
> Based on this direction, I tried to write a top-up POC patch (0005) which I'd
> like to share.
>
> The top up patch mainly did the following things.
>
> * Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that
> the invalidation is executed only when an actual UPDATE or DELETE is executed
> on the published relation. It's consistent with the existing check about replica
> identity.
>
> * Cache the results of the validation for row filter columns in the relcache to
> reduce the cost of the validation. It's safe because every operation that
> changes the row filter or replica identity will invalidate the relcache.
>
> Also attach the v42 patch set to keep cfbot happy.

Hi Hou-san.

Thanks for providing your "top-up" 0005 patch!

I suppose the goal will be to later merge this top-up with the current
0002 validation patch, but in the meantime here are my review comments
for 0005.

==

1) src/include/catalog/pg_publication.h - PublicationInfo
+typedef struct PublicationInfo
+{
+ PublicationActions pubactions;
+
+ /*
+ * True if pubactions don't include UPDATE and DELETE or
+ * all the columns in the row filter expression are part
+ * of replica identity.
+ */
+ bool rfcol_valid_for_replid;
+} PublicationInfo;
+

IMO "PublicationInfo" sounded too much like it is about the
Publication only, but IIUC it is really *per* Relation publication
info, right? So I thought perhaps it should be called more like struct
"RelationPubInfo".

==

2) src/include/catalog/pg_publication.h - PublicationInfo

The member "rfcol_valid_for_replid" also seems a little bit mis-named
because in some scenario (not UPDATE/DELETE) it can be true even if
there is not replica identity columns. So I thought perhaps it should
be called more like just "rfcols_valid"

Another thing - IIUC this is a kind of "unified" boolean that covers
*all* filters for this Relation (across multiple publications). If
that is right, then the comment for this member should say something
about this.
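
To illustrate both of the above points, here is a rough sketch of what I
had in mind (the struct/member names and comment wording are only
suggestions):

typedef struct RelationPubInfo
{
    PublicationActions pubactions;

    /*
     * True if, across *all* publications this relation is part of, either
     * pubactions don't include UPDATE/DELETE or every column used in the
     * row filter expressions is part of the replica identity.
     */
    bool        rfcols_valid;
} RelationPubInfo;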

==

3) src/include/catalog/pg_publication.h - PublicationInfo

This new typedef should be added to src/tools/pgindent/typedefs.list

==

4) src/backend/catalog/pg_publication.c - check_rowfilter_replident
+/*
+ * Check if all the columns used in the row-filter WHERE clause are part of
+ * REPLICA IDENTITY
+ */
+bool
+check_rowfilter_replident(Node *node, Bitmapset *bms_replident)
+{

IIUC here false means "valid" and true means "invalid", which is
counter-intuitive to me. So at least the true/false meaning ought to be
clarified in the function comment, and/or perhaps also rename the
function so that the return meaning is more obvious.
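
For example, something like this (the name is just a suggestion):

/*
 * Return true if the row-filter WHERE clause uses any column that is not
 * part of REPLICA IDENTITY; return false if all such columns are covered.
 */
bool
contain_invalid_rfcolumn(Node *node, Bitmapset *bms_replident)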

==

5) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+

IIUC this pubinfo is palloc'd *every* time by
RelationGetPublicationInfo, isn't it? If that is the case, shouldn't
CheckCmdReplicaIdentity be doing a pfree(pubinfo)?
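
e.g. something like (assuming no other cleanup path is needed):

pubinfo = RelationGetPublicationInfo(rel);
/* ... use pubinfo ... */
pfree(pubinfo);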

==

6) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity
+ pubinfo = RelationGetPublicationInfo(rel);
+
+ /*
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.
+ */
+ if (!pubinfo->rfcol_valid_for_replid)
+ {
+ if (cmd == CMD_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot update table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));
+ else if (cmd == CMD_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot delete from table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

The comment seemed worded in a confusingly negative way.

Before:
+ * if not all columns in the publication row filter are part of the REPLICA
+ * IDENTITY, then it's unsafe to execute it for UPDATE and DELETE.

My Suggestion:
It is only safe to execute UPDATE/DELETE when all columns of the
publication row filters are part of the REPLICA IDENTITY.

~~

Also, is "publication row filter" really the correct terminology?
AFAIK it is more like *all* filters for this Relation across multiple
publications, but I have not got a good idea how to word that in a
comment. Anyway, I have a feeling this whole idea might be impacted by
other discussions in this RF thread.

==

7) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

Error messages have double negative wording? I think Greg already
commented on this same point.

+ errdetail("Not all row filter columns are not part of the REPLICA
IDENTITY")));

==

8) src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

But which are the bad filter columns?

Previously the Row Filter column validation gave errors for the
invalid filter column, but in this top-up patch there is no indication
of which column(s) caused the validation to fail.
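
Perhaps the errdetail could name the offending column, e.g. something
like (where colname is a hypothetical variable holding the column name):

errdetail("Column \"%s\" used in the row filter is not part of the REPLICA IDENTITY.",
          colname)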

Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes

2021-11-27 Thread Bharath Rupireddy
On Sun, Nov 28, 2021 at 12:22 PM vignesh C  wrote:
> > Attaching v4 patch, please review it further.
>
> One small comment:
> 1) There should be a space in between "LOGmessage level"
> +it can) for memory contexts. These memory contexts will be logged at
> +LOGmessage level. They will appear in the server log
> +based on the log configuration set (See <xref linkend="runtime-config-logging"/>
> +for more information), but will not be sent to the client regardless of

Done.

> The rest of the patch looks good to me.

Thanks for the review. Here's the v5 patch.

Regards,
Bharath Rupireddy.


v5-0001-enhance-pg_log_backend_memory_contexts-to-log-mem.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2021-11-27 Thread Bharath Rupireddy
On Sun, Oct 31, 2021 at 3:38 PM Peter Eisentraut
 wrote:
>
> I want to reactivate $subject.  I took Petr Jelinek's patch from [0],
> rebased it, added a bit of testing.  It basically works, but as
> mentioned in [0], there are various issues to work out.
>
> The idea is that the standby runs a background worker to periodically
> fetch replication slot information from the primary.  On failover, a
> logical subscriber would then ideally find up-to-date replication slots
> on the new publisher and can just continue normally.
>
> The previous thread didn't have a lot of discussion, but I have gathered
> from off-line conversations that there is a wider agreement on this
> approach.  So the next steps would be to make it more robust and
> configurable and documented.  As I said, I added a small test case to
> show that it works at all, but I think a lot more tests should be added.
>   I have also found that this breaks some seemingly unrelated tests in
> the recovery test suite.  I have disabled these here.  I'm not sure if
> the patch actually breaks anything or if these are just differences in
> timing or implementation dependencies.  This patch adds a LIST_SLOTS
> replication command, but I think this could be replaced with just a
> SELECT FROM pg_replication_slots query now.  (This patch is originally
> older than when you could run SELECT queries over the replication protocol.)
>
> So, again, this isn't anywhere near ready, but there is already a lot
> here to gather feedback about how it works, how it should work, how to
> configure it, and how it fits into an overall replication and HA
> architecture.
>
>
> [0]:
> https://www.postgresql.org/message-id/flat/3095349b-44d4-bf11-1b33-7eefb585d578%402ndquadrant.com

Thanks for working on this patch. This feature will be useful as it
avoids manual intervention during the failover.

Here are some thoughts:
1) Instead of a new LIST_SLOTS command, can't we use
READ_REPLICATION_SLOT (slight modifications need to be made to make
it support logical replication slots and to get more information from
the subscriber)?

2) How frequently is the new bg worker going to sync the slot info?
How can it ensure that the latest information exists, say when the
subscriber is down/crashed before it picks up the latest slot
information?

3) Instead of the subscriber pulling the slot info, why can't the
publisher (via the walsender, or maybe a new bg worker?) push the
latest slot info? I'm not sure we want to add more functionality to
the walsender; but if we did, wouldn't it be much simpler?

4) IIUC, the proposal works only for logical replication slots, but do
you also see the need for supporting some kind of synchronization of
physical replication slots as well? IMO, we need a better and more
consistent way for both types of replication slots. If the walsender
can somehow push the slot info from the primary (for physical
replication slots)/publisher (for logical replication slots) to the
standby/subscribers, that would be a more consistent and simpler
design. However, I'm not sure if this design is doable at all.

Regards,
Bharath Rupireddy.




Re: enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes

2021-11-27 Thread vignesh C
On Mon, Nov 15, 2021 at 10:27 PM Bharath Rupireddy
 wrote:
>
> On Mon, Nov 15, 2021 at 10:04 PM vignesh C  wrote:
> >
> > On Mon, Nov 15, 2021 at 7:47 AM Bharath Rupireddy
> >  wrote:
> > >
> > > On Fri, Nov 5, 2021 at 11:12 AM Bharath Rupireddy
> > >  wrote:
> > > > PSA v2 patch and review it.
> > >
> > > I've modified the docs part a bit, please consider v3 for review.
> >
> > Thanks for the update patch, Few comments:
> > 1) Should we change "CHECK_FOR_INTERRUPTS()" to
> > "CHECK_FOR_INTERRUPTS() or process specific interrupt handlers"
>
> Done.
>
> > 2) Should we mention Postmaster process also along with logger and
> > statistics collector process
> > +backend or the WAL sender or the
> > +auxiliary process
> > +with the specified process ID. All of the
> > +auxiliary processes
> > +are supported except the <glossterm linkend="glossary-logger">logger</glossterm>
> > +and the <glossterm linkend="glossary-stats-collector">statistics collector</glossterm>
> > +as they are not connected to shared memory the function can
> > not make requests.
> > +The backtrace will be logged at <literal>LOG</literal> message level.
> > +They will appear in the server log based on the log configuration set
> > +(See <xref linkend="runtime-config-logging"/> for more information),
> > +but will not be sent to the client regardless of
>
> Done.
>
> Attaching v4 patch, please review it further.

One small comment:
1) There should be a space in between "LOGmessage level"
+it can) for memory contexts. These memory contexts will be logged at
+LOGmessage level. They will appear in the server log
+based on the log configuration set (See <xref linkend="runtime-config-logging"/>
+for more information), but will not be sent to the client regardless of

The rest of the patch looks good to me.

Regards,
Vignesh




Re: pg_waldump stucks with options --follow or -f and --stats or -z

2021-11-27 Thread Bharath Rupireddy
On Sun, Nov 28, 2021 at 9:50 AM Michael Paquier  wrote:
> v4 is just moving this paragraph in its correct subsection.  My
> wording may have been confusing here, sorry about that.  What I meant
> is that there is no need to mention --follow at all, as an
> interruption done before reaching the end LSN or the end of WAL would
> still print out the stats accumulated up to the interrupted point.

Thanks. Here's the v5.

Regards,
Bharath Rupireddy.


v5-0001-pg_waldump-emit-stats-when-interrupted-with-SIGIN.patch
Description: Binary data


Re: SSL Tests for sslinfo extension

2021-11-27 Thread Michael Paquier
On Sat, Nov 27, 2021 at 02:27:19PM -0500, Tom Lane wrote:
> I think testing sslinfo in src/test/ssl is fine, while putting its test
> inside contrib/ would be dangerous, because then the test would be run
> by default.  src/test/ssl is not run by default because the server it
> starts is potentially accessible by other local users, and AFAICS the
> same has to be true for an sslinfo test.

Ah, indeed, good point.  I completely forgot that we'd better control
this stuff with PG_TEST_EXTRA.
--
Michael




Re: Deduplicate code updating ControlFile's DBState.

2021-11-27 Thread Michael Paquier
On Sun, Nov 28, 2021 at 07:53:13AM +0530, Bharath Rupireddy wrote:
> Isn't it better if we update the ControlFile->time at the end of the
> update_controlfile, after file write/sync?

I don't quite understand your point here.  We want to update the
control file's timestamp when it is written, before calculating its
CRC.
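
With the patch, the relevant part of update_controlfile() would go
roughly like this (a sketch, following the existing CRC code in
src/common/controldata_utils.c):

/* stamp the update before computing the checksum */
ControlFile->time = (pg_time_t) time(NULL);

/* recalculate the CRC over everything up to the crc field itself */
INIT_CRC32C(ControlFile->crc);
COMP_CRC32C(ControlFile->crc,
            (char *) ControlFile,
            offsetof(ControlFileData, crc));
FIN_CRC32C(ControlFile->crc);

/* ... then write() the buffer and optionally fsync() ... */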

> Why do we even need UpdateControlFile which just calls another
> function? It may be there for usability and readability, but can't the
> pg backend code just call update_controlfile(DataDir, ControlFile,
> true); directly so that a function call cost can be avoided?
> Otherwise, why can't we make UpdateControlFile an inline function? I'm
> not sure if any of the compilers will ever optimize by inlining it
> without the "inline" keyword.

I would leave it as-is as UpdateControlFile() is a public API old
enough to vote (a70e74b0).  Anyway, that's a useful wrapper for the
backend.
--
Michael




Re: pg_waldump stucks with options --follow or -f and --stats or -z

2021-11-27 Thread Michael Paquier
On Fri, Nov 26, 2021 at 03:47:30PM +0530, Bharath Rupireddy wrote:
> On Fri, Nov 26, 2021 at 11:50 AM Michael Paquier  wrote:
>> +When --follow is used with --stats and
>> +the pg_waldump is terminated or interrupted
>> +with signal SIGINT or SIGTERM
>> +or SIGQUIT, the summary statistics computed
>> +as of the termination will be displayed.
>> This description is not completely correct, as the set of stats would
>> show up only by using --stats, in non-quiet mode.  Rather than
>> describing this behavior at the end of the docs, I think that it would
>> be better to add a new paragraph in the description of --stats.
> 
> Done.

v4 is just moving this paragraph in its correct subsection.  My
wording may have been confusing here, sorry about that.  What I meant
is that there is no need to mention --follow at all, as an
interruption done before reaching the end LSN or the end of WAL would
still print out the stats accumulated up to the interrupted point.
--
Michael




Re: Unnecessary delay in streaming replication due to replay lag

2021-11-27 Thread Bharath Rupireddy
On Tue, Nov 23, 2021 at 1:39 AM Soumyadeep Chakraborty
 wrote:
>
> Hi Bharath,
>
> Yes, that thread has been discussed here. Asim had x-posted the patch to [1]. 
> This thread
> was more recent when Ashwin and I picked up the patch in Aug 2021, so we 
> continued here.
> The patch has been significantly updated by us, addressing Michael's long 
> outstanding feedback.

Thanks for the patch. I reviewed it a bit, here are some comments:

1) A memory leak: add FreeDir(dir); before returning.
+ ereport(LOG,
+ (errmsg("Could not start streaming WAL eagerly"),
+ errdetail("There are timeline changes in the locally available WAL files."),
+ errhint("WAL streaming will begin once all local WAL and archives
are exhausted.")));
+ return;
+ }
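
i.e. something like:

ereport(LOG, ...);
FreeDir(dir);    /* release the allocated directory handle */
return;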

2) Is there a guarantee that, while we traverse the pg_wal directory to
find startsegno and endsegno, no new wal files arrive from the
primary or archive location and no old wal files get removed/recycled
by the standby? Especially when wal_receiver_start_condition=consistency?
+ startsegno = (startsegno == -1) ? logSegNo : Min(startsegno, logSegNo);
+ endsegno = (endsegno == -1) ? logSegNo : Max(endsegno, logSegNo);
+ }

3) I think the errmsg text format isn't correct. Note that the errmsg
text starts with lowercase and doesn't end with "." whereas errdetail
or errhint starts with uppercase and ends with ".". Please check other
messages for reference.
The following should be changed.
+ errmsg("Requesting stream from beginning of: %s",
+ errmsg("Invalid WAL segment found while calculating stream start:
%s. Skipping.",
+ (errmsg("Could not start streaming WAL eagerly"),

4) I think you also need to have the WAL file names in double quotes,
something like below:
errmsg("could not close file \"%s\": %m", xlogfname)));

5) It is ".stream start: \"%s\", skipping..",
+ errmsg("Invalid WAL segment found while calculating stream start:
%s. Skipping.",

6) I think the patch can make the startup process significantly slower,
especially when there are lots of WAL files in the standby's pg_wal
directory. This is because of the overhead
StartWALReceiverEagerlyIfPossible adds, i.e. doing two while loops to
figure out the start position of the streaming in advance. The startup
process might end up looping over the directory rather than doing the
important work of crash recovery or standby recovery.

7) What happens if this new GUC is enabled in case of a synchronous standby?
What happens if this new GUC is enabled in case of crash recovery?
What happens if this new GUC is enabled in case a restore command is
set, i.e. the standby performing archive recovery?

8) How about the bgwriter/checkpointer, which get started even before
the startup process (or a new bg worker? of course that would be
overkill), finding out the new start position for the startup process,
so that we could get rid of the startup-process behaviour of the patch?
This avoids an extra burden on the startup process. Otherwise, many
users will be complaining about why recovery takes more time now, after
setting wal_receiver_start_condition=startup.

9) I think we can just have the 'consistency' and 'exhaust' behaviours
and let the bgwriter or checkpointer find out the start position for
the startup process, so that whenever the startup process reaches a
consistent point, it checks whether the other process has calculated a
start position for it; if yes, it starts the WAL receiver, otherwise it
goes on with its usual recovery. I'm not sure if this will be a good
idea.

10) Can we have a better GUC name than wal_receiver_start_condition?
Something like wal_receiver_start_at or wal_receiver_start or some
other?

Regards,
Bharath Rupireddy.




Re: Deduplicate code updating ControlFile's DBState.

2021-11-27 Thread Bharath Rupireddy
On Fri, Nov 26, 2021 at 2:48 PM Amul Sul  wrote:
> > ControlFile.state = DB_SHUTDOWNED;
> > -   ControlFile.time = (pg_time_t) time(NULL);
> > This one had better not be removed, either, as we require pg_resetwal
> > to guess a set of control file values.  Removing the one in
> > RewriteControlFile() is fine, on the contrary.
>
> My bad, sorry for the sloppy change, corrected it in the attached version.

Thanks for the patch. By moving the time update to update_controlfile,
the patch ensures that we have the correct last-updated time. Earlier
we were missing (in some places) updating the time before calling
UpdateControlFile.

Isn't it better if we update the ControlFile->time at the end of the
update_controlfile, after file write/sync?

Why do we even need UpdateControlFile which just calls another
function? It may be there for usability and readability, but can't the
pg backend code just call update_controlfile(DataDir, ControlFile,
true); directly so that a function call cost can be avoided?
Otherwise, why can't we make UpdateControlFile an inline function? I'm
not sure if any of the compilers will ever optimize by inlining it
without the "inline" keyword.

Regards,
Bharath Rupireddy.




Re: Correct handling of blank/commented lines in PSQL interactive-mode history

2021-11-27 Thread Tom Lane
I wrote:
> We could perhaps finesse that point by deciding that comment lines
> that are handled this way will never be sent to the server --- but
> I'm sure people will complain about that, too.  I've definitely heard
> people complain because "--" comments are stripped from what's sent
> (so I'd look favorably on a patch to undo that).

In hopes of moving this thread forward, I experimented with doing that
bit, basically by simplifying the {whitespace} rule in psqlscan.l
to be just "ECHO;".  That caused massive regression test failures,
of which this'll do as a sample:

 --
 -- This should fail
 --
 copy (select * from test1) (t,id) to stdout;
 ERROR:  syntax error at or near "("
-LINE 1: copy (select * from test1) (t,id) to stdout;
+LINE 4: copy (select * from test1) (t,id) to stdout;
^
 --
 -- Test JOIN

Of course, the problem is that since we're now including the three "--"
lines in what's sent to the server, it thinks the "copy" is on line 4.
I do not think we want such a behavior change: people don't tend to
think that such comments are part of the query.

I then experimented with combining the psqlscan.l change with mainloop.c
changes akin to what Greg had proposed, so as to discard leading comments
at the level of mainloop.c rather than inside the lexer.  I didn't have
much luck getting to a behavior that I thought could be acceptable,
although maybe with more sweat it'd be possible.

One thing I noticed along the line is that because the history mechanism
records raw input lines while psqlscan.l discards dash-dash comments,
it's already the case that history entries don't match up too well with
what's sent to the server.  So I'm not sure that my upthread complaint
about that holds water, and I'm less against Greg's original patch than
I was.

Trying to gather together the various issues mentioned on this thread,
I see:

* Initial input lines that are blank (whitespace, maybe including a
comment) are merged into the next command's history entry; but since
said lines don't give rise to any text sent to the server, there's
not really any reason why they couldn't be treated as text to be
emitted to the history file immediately.  This is what Greg originally
set out to change.  After my experiments mentioned above, I'm quite
doubtful that his patch is correct in detail (I'm afraid that it
probably emits stuff too soon in some cases), but it could likely be
fixed if we could just get agreement that a change of that sort is OK.

* It's not great that dash-dash comments aren't included in what we
send to the server.  However, changing that is a lot trickier than
it looks.  I think we want to continue suppressing comments that
precede the query proper.  Including comments that are within the
query text (ahead of the trailing semi) is not so hard, but comments
following the semicolon look too much like comments-ahead-of-the-
next-query.  Perhaps that issue should be left for another day ...
although it does feel like it interacts with the first point.

* Misbehavior of M-# was also mentioned.  Does anyone object to
the draft patch I posted for that?

regards, tom lane




Re: rand48 replacement

2021-11-27 Thread Tom Lane
Fabien COELHO  writes:
>> How did Knuth get into this?  This algorithm is certainly not his,
>> so why are those constants at all relevant?

> They are not more nor less relevant than any other "random" constant. The
> state needs a default initialization. The point of using DK's is that it
> somehow cannot be some specially crafted value which would have some
> special property only known to the purveyor of the constant and could be
> used by them to break the algorithm.

Well, none of that is in the comment, which is probably just as well
because it reads like baseless paranoia.  *Any* initialization vector
should be as good as any other; if it's not, that's an algorithm fault.
(OK, I'll give it a pass for zeroes being bad, but otherwise not.)

>> * Function names like these convey practically nothing to readers:
>> 
>> +extern int64 pg_prng_i64(pg_prng_state *state);
>> +extern uint32 pg_prng_u32(pg_prng_state *state);
>> +extern int32 pg_prng_i32(pg_prng_state *state);
>> +extern double pg_prng_f64(pg_prng_state *state);
>> +extern bool pg_prng_bool(pg_prng_state *state);

> The intention is obviously "postgres pseudo-random number generator for
> <type>". ISTM that it conveys (1) that it is postgres-specific stuff,
> (2) that it is a PRNG (which I find *MUCH* more informative than the
> misleading statement that something is random when it is not, and it is
> shorter) and (3) the type it returns, because C does require
> functions to have distinct names.

> What would you suggest?

We have names for these types, and those abbreviations are (mostly)
not them.  Name-wise I'd be all right with pg_prng_int64 and so on,
but I still expect that these functions' header comments should be
explicit about uniformity and about the precise output range.
As an example, it's far from obvious whether the minimum value
of pg_prng_int32 should be zero or INT_MIN.  (Actually, I suspect
you ought to provide both of those cases.)  And the output range
of pg_prng_float8 is not merely unobvious, but not very easy
to deduce from examining the code either; not that users should
have to.
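
For instance, I'd expect header comments more like these (with whatever
the actual ranges turn out to be):

/*
 * pg_prng_uint32
 *      Select a value uniformly at random from the range [0, UINT32_MAX].
 */

/*
 * pg_prng_float8
 *      Select a value uniformly at random from the range [0.0, 1.0).
 */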

>> BTW, why are we bothering with FIRST_BIT_MASK in the first place,
>> rather than returning "v & 1" for pg_prng_bool?

> Because some PRNGs are very bad in the low bits; not the xoroshiro stuff,
> though.

Good, but then you shouldn't write associated code as if that's still
a problem, because you'll cause other people to think it's still a
problem and write equally contorted code elsewhere.  "v & 1" is a
transparent way of producing a bool, while this code isn't.

regards, tom lane




Re: SSL Tests for sslinfo extension

2021-11-27 Thread Tom Lane
Daniel Gustafsson  writes:
> On 17 Jun 2021, at 09:29, Michael Paquier  wrote:
>> Hmm.  Adding internal dependencies between the tests should be avoided
>> if we can.  What would it take to move those TAP tests to
>> contrib/sslinfo instead?  This is while keeping in mind that there was
>> a patch aimed at refactoring the SSL test suite for NSS.

> It would be quite invasive as we currently don't provide the SSLServer test
> harness outside of src/test/ssl, and given how tailored it is today I'm not
> sure doing so without a rewrite would be a good idea.

I think testing sslinfo in src/test/ssl is fine, while putting its test
inside contrib/ would be dangerous, because then the test would be run
by default.  src/test/ssl is not run by default because the server it
starts is potentially accessible by other local users, and AFAICS the
same has to be true for an sslinfo test.

So I don't have any problem with this structurally, but I do have a
few nitpicks:

* I think the error message added in 0001 should complain about
missing password "encryption" not "encoding", no?

* 0002 hasn't been updated for the great PostgresNode renaming.

* 0002 needs to extend src/test/ssl/README to mention that
"make installcheck" requires having installed contrib/sslinfo,
analogous to similar comments in (eg) src/test/recovery/README.

* 0002 writes a temporary file in the source tree.  This is bad;
for one thing I bet it fails under VPATH, but in any case there
is no reason to risk it.  Put it in the tmp_check directory instead
(cf temp kdc files in src/test/kerberos/t/001_auth.pl).  That's
safer and you needn't worry about cleaning it up.

* Hmm ... now I notice that you borrowed the key-file-copying logic
from the 001 and 002 tests, but it's just as bad practice there.
We should fix them too.

* I ran a code-coverage check and it shows that this doesn't test
ssl_issuer_field() or any of the following functions in sslinfo.c.
I think at least ssl_extension_info() is complicated enough to
deserve a test.

regards, tom lane




Re: rand48 replacement

2021-11-27 Thread Fabien COELHO



Hello Tom,

Thanks for the feedback.


+/* use Donald Knuth's LCG constants for default state */

How did Knuth get into this?  This algorithm is certainly not his,
so why are those constants at all relevant?


They are not more nor less relevant than any other "random" constant. The
state needs a default initialization. The point of using DK's is that it
somehow cannot be some specially crafted value which would have some
special property only known to the purveyor of the constant and could be
used by them to break the algorithm.


  https://en.wikipedia.org/wiki/Dual_EC_DRBG


* I could do without the stream-of-consciousness notes in pg_prng.c.
I think what's appropriate is to say "We use thus-and-such a generator
which is documented here", maybe with a line or two about its properties.


The stuff was really written essentially as a "why this" for the first 
patch, and to prevent questions about "why not this other generator" 
later, because it could never stop.



* Function names like these convey practically nothing to readers:

+extern int64 pg_prng_i64(pg_prng_state *state);
+extern uint32 pg_prng_u32(pg_prng_state *state);
+extern int32 pg_prng_i32(pg_prng_state *state);
+extern double pg_prng_f64(pg_prng_state *state);
+extern bool pg_prng_bool(pg_prng_state *state);


The intention is obviously "postgres pseudo-random number generator for
<type>". ISTM that it conveys (1) that it is postgres-specific stuff,
(2) that it is a PRNG (which I find *MUCH* more informative than the
misleading statement that something is random when it is not, and it is
shorter) and (3) the type it returns, because C does require
functions to have distinct names.


What would you suggest?


and these functions' header comments add a grand total of zero bits
of information.


Yes, probably. I do not like not to comment at all on a function.

What someone generally wants to know first about a PRNG is (a) is it 
uniform and (b) what is the range of outputs, neither of which are 
specified anywhere.


ISTM (b) is suggested by the type, and (a) I'm not sure about a PRNG
which would not at least claim to be uniform. Non-uniform PRNGs are
usually built based on a uniform one.


What do you suggest as alternate names?


+#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
+#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
+#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)

I'm not sure that these named constants are any more readable than
writing the underlying constant, maybe less so --- in particular
I think something based on (1<<52)-1 would be more appropriate for
the float mantissa operations.  We don't need RIGHT_HALF_MASK at
all, the casts to uint32 or int32 will accomplish that just fine.


Yep. I did it for uniformity.


BTW, why are we bothering with FIRST_BIT_MASK in the first place,
rather than returning "v & 1" for pg_prng_bool?


Because some PRNGs are very bad in the low bits; not the xoroshiro stuff,
though.


Is xoroshiro128ss less random in the low-order bits than the higher? 
If so, that would be a pretty important thing to document.  If it's not, 
we shouldn't make the code look like it is.


Dunno. Why should we prefer low bits?


+ * select in a range with bitmask rejection.

What is "bitmask rejection"?  Is it actually important to callers?


No; it is important for understanding how it does it. That is the name of
the technique which is implemented, which helps if you want to understand
what is going on by googling it. This point could be moved inside the function.
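
For reference, a hand-written sketch of the technique (the generator call
name follows the patch; assumes rmax > rmin):

/* select uniformly from [rmin, rmax) by bitmask rejection */
static uint64
prng_range(pg_prng_state *state, uint64 rmin, uint64 rmax)
{
    uint64      range = rmax - rmin;
    uint64      mask = range - 1;
    uint64      v;

    /* smear the highest set bit down to build the smallest covering mask */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;

    /* the mask cheaply rejects most out-of-range draws; the loop handles the rest */
    do
    {
        v = xoroshiro128ss(state) & mask;
    } while (v >= range);

    return rmin + v;
}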



I think this should be documented more like "Produce a random
integer uniformly selected from the range [rmin, rmax)."


Sure.

--
Fabien.




Re: Non-superuser subscription owners

2021-11-27 Thread Mark Dilger



> On Nov 24, 2021, at 4:30 PM, Jeff Davis  wrote:
> 
> We need to do permission checking for WITH CHECK OPTION and RLS. The
> patch right now allows the subscription to write data that an RLS
> policy forbids.

Thanks for reviewing and for this observation!  I can verify that RLS is not 
being honored on the subscriber side.  I agree this is a problem for 
subscriptions owned by non-superusers.

The implementation of the table sync worker uses COPY FROM, which makes this 
problem hard to fix, because COPY FROM does not support row level security.  We 
could do some work to honor the RLS policies during the apply workers' INSERT 
statements, but then some data would circumvent RLS during table sync and other 
data would honor RLS during worker apply, which would make the implementation 
not only wrong but inconsistently so.

I think a more sensible approach for v15 is to raise an ERROR if a 
non-superuser owned subscription is trying to replicate into a table which has 
RLS enabled.  We might try to be more clever and check whether the RLS policies 
could possibly reject the operation (by comparing the TO and FOR clauses of the 
policies against the role and operation type) but that seems like a partial 
re-implementation of RLS.  It would be simpler and more likely correct if we 
just unconditionally reject replicating into tables which have RLS enabled.
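
Concretely, I am thinking of something like this in the apply/tablesync
paths (just a sketch; the exact location, error wording, and the subowner
variable holding the subscription owner's OID are assumptions):

/* sketch: forbid replicating into RLS-enabled tables for non-superuser owners */
if (!superuser_arg(subowner) &&
    check_enable_rls(RelationGetRelid(rel), InvalidOid, false) == RLS_ENABLED)
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("cannot replicate into table \"%s\" with row-level security enabled",
                    RelationGetRelationName(rel))));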

What do you think?

> A couple other points:
> 
> * We shouldn't refer to the behavior of previous versions in the docs
> unless there's a compelling reason

Fair enough.

> * Do we need to be smarter about partitioned tables, where an insert
> can turn into an update?

Do you mean an INSERT statement with an ON CONFLICT DO UPDATE clause that is 
running against a partitioned table?  If so, I don't think we need to handle 
that on the subscriber side under the current logical replication design.  I 
would expect the plain INSERT or UPDATE that ultimately executes on the 
publisher to be what gets replicated to the subscriber, and not the original 
INSERT .. ON CONFLICT DO UPDATE statement.

> * Should we refactor to borrow logic from ExecInsert so that it's less
> likely that we miss something in the future?

Hooking into the executor at a higher level, possibly ExecInsert or 
ExecModifyTable would do a lot more than what logical replication currently 
does.  If we also always used INSERT/UPDATE/DELETE statements and never COPY 
FROM statements, we might solve several problems at once, including honoring 
RLS policies and honoring rules defined for the target table on the subscriber 
side.

Doing this would clearly be a major design change, and possibly one we do not 
want.  Can we consider this out of scope?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Enforce work_mem per worker

2021-11-27 Thread Justin Pryzby
On Sat, Nov 27, 2021 at 04:33:07PM +, Arne Roland wrote:
> Hello!
> 
> Since I used a lot of my time chasing short lived processes eating away big 
> chunks of memory in recent weeks, I am wondering about a decent way to go 
> about this.
> The problem I am facing essentially relates to the fact that work_mem 
> settings, while they are enforced per hash and sort node, aren't enforced 
> globally.
> One common case, that causes this problem more frequently than a few years 
> ago, is the partitionwise_join. If there are a lot of partitions hash joined, 
> we get a lot of hash nodes, each one potentially consuming work_mem.

> While avoiding oom seems a big deal to me, my search didn't turn up previous 
> hackers discussions about this. There is a good chance I am missing something 
> here, so I'd appreciate any pointers.

Here's some pointers ;)

https://www.postgresql.org/message-id/flat/20190708164401.GA22387%40telsasoft.com
https://www.postgresql.org/message-id/flat/20191216215314.qvrtgxaz4m755geq%40development#75e9930ac2cd353a8036dc71e8f5e6f7
https://www.postgresql.org/message-id/flat/CAH2-WzmNwV%3DLfDRXPsmCqgmm91mp%3D2b4FvXNF%3DcCvMrb8YFLfQ%40mail.gmail.com
 - I don't recall reading all of this last one before, and it's got interesting
   historic value, so I'm reading it myself now...

-- 
Justin




Enforce work_mem per worker

2021-11-27 Thread Arne Roland
Hello!

Since I used a lot of my time chasing short lived processes eating away big 
chunks of memory in recent weeks, I am wondering about a decent way to go about 
this.
The problem I am facing essentially relates to the fact that work_mem settings, 
while they are enforced per hash and sort node, aren't enforced globally.
One common case, that causes this problem more frequently than a few years ago, 
is the partitionwise_join. If there are a lot of partitions hash joined, we get 
a lot of hash nodes, each one potentially consuming work_mem.

While avoiding oom seems a big deal to me, my search didn't turn up previous 
hackers discussions about this. There is a good chance I am missing something 
here, so I'd appreciate any pointers.

The most reasonable solution seems to me to have a data structure per worker,
that 1. tracks the amount of memory used by certain nodes and 2. offers a
callback to let the node spill its contents (almost) completely to disk. I am
thinking about hash and sort nodes for now, since they affect memory usage a
lot.
This would allow a node to spill other nodes' contents to disk to avoid
exceeding work_mem.
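
To sketch the idea (all of these names are hypothetical; nothing like this
exists today):

typedef struct SpillableNode
{
    Size    used_bytes;                 /* memory currently held by the node */
    void  (*spill) (void *node_state);  /* dump (most of) it to disk */
    void   *node_state;                 /* the hash/sort node's private state */
} SpillableNode;

typedef struct WorkerMemTracker
{
    Size    total_used;     /* sum of used_bytes over registered nodes */
    Size    limit;          /* global cap, derived from work_mem */
    List   *nodes;          /* list of SpillableNode entries */
} WorkerMemTracker;

When total_used would exceed limit, the executor could walk the list and
invoke spill on the largest entries until enough memory is released.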

I'd love to hear your thoughts and suggestions!

Regards
Arne



Re: Windows build warnings

2021-11-27 Thread Andrew Dunstan


On 11/26/21 15:14, Daniel Gustafsson wrote:
>> On 26 Nov 2021, at 20:33, Tom Lane  wrote:
>>
>> I think our policy is to suppress unused-variable warnings if they
>> appear on current mainstream compilers; and it feels a little churlish
>> to deem MSVC non-mainstream.  So I stick with my previous suggestion,
>> which basically was to disable C4101 until such time as somebody can
>> make PG_USED_FOR_ASSERTS_ONLY work correctly on MSVC.  In the worst
>> case, that might lead a Windows-based developer to submit a patch that
>> draws warnings elsewhere ... but the cfbot, other developers, or the
>> buildfarm will find such problems soon enough.
> I agree with that, and can go make that happen.
>

[trust I have attributions right]


ISTM the worst case is that there will be undetected unused variables in
Windows-only code. I guess that would mostly be detected by Msys systems
running gcc.


Anyway I don't think it's worth arguing a lot about.


cheers


andrew


-- 

Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade and publication/subscription problem

2021-11-27 Thread Marcos Pegoraro
>
> I think we don't want to make assumptions about the remote end being
> the same after the upgrade and we let users reactivate the
> subscriptions in a suitable way. See [1] (Start reading from "..When
> dumping logical replication subscriptions..") In your case, if you
> wouldn't have allowed new tables in the publication then a simple
> Alter Subscription <name> Refresh Publication with (copy_data =
> false) would have served the purpose.


I understand that this is not related to version 14; pg_upgrade would do
the same steps on previous versions too.
Additionally, it would be interesting to document that pg_upgrade does not
upgrade completely if the server is a subscriber of logical replication, so
it'll have pre and post steps to do if the server has this kind of
replication.


Re: pg_upgrade and publication/subscription problem

2021-11-27 Thread Amit Kapila
On Fri, Nov 26, 2021 at 5:47 PM Marcos Pegoraro  wrote:
>>
>> AFAIU the main problem in your case is that you didn't block the write
>> traffic on the publisher side. Let me try to understand the situation.
>> After the upgrade is finished, there are some new tables with data on
>> the publisher, and did old tables have any additional data?
>
> Correct.
>>
>>
>> Are the contents in pg_replication_origin intact after the upgrade?
>
> Yes
>>
>>
>> So, in short, I think what we need to solve is to get the data from
>> new tables and newly performed writes on old tables. I could think of
>> the following two approaches:
>>
>> Approach-1:
>> 1. Drop subscription and Truncate all tables corresponding to subscription.
>>
>> 2. Create a new subscription for the publication.
>
> If I drop the subscription it will drop the WAL on the publication side and I'd lose all
> changed data between the start of the pg_upgrade process and now.
>

I think you can disable the subscription as well, or, before dropping it,
disassociate the slot from the subscription.

> My problem is not related to new tables; they will be copied fine because
> no records exist on the subscriber.
> But old tables have had records modified since the pg_upgrade process; that is my
> problem, only that.
>

Yeah, I understand that point. Here, the problem is that both old and
new tables belong to the same publication, and you can't refresh some
tables from the publication.

> My question remains the same: why was pg_subscription_rel not copied from the
> previous version?
>
> If pg_upgrade would copy pg_replication_origin (it did) and these
> pg_subscription_rel (it didn't) records from version 13 to 14, then when I enable the
> subscription it would start copying data from that point on, correct?
>

I think we don't want to make assumptions about the remote end being
the same after the upgrade and we let users reactivate the
subscriptions in a suitable way. See [1] (Start reading from "..When
dumping logical replication subscriptions..") In your case, if you
wouldn't have allowed new tables in the publication then a simple
Alter Subscription <name> Refresh Publication with (copy_data =
false) would have served the purpose.

BTW, just for the record, this problem has nothing to do with any changes
in PG-14, the same behavior exists in the previous versions as well.

[1] - https://www.postgresql.org/docs/devel/app-pgdump.html
-- 
With Regards,
Amit Kapila.




Re: Windows: Wrong error message at connection termination

2021-11-27 Thread Lars Kanis

Am 22.11.21 um 00:04 schrieb Tom Lane:

Do we know that that actually happens in an arm's-length connection
(ie two separate machines)?  I wonder if the data loss is strictly
an artifact of a localhost connection.  There'd be a lot more pressure
on them to make cross-machine TCP work per spec, one would think.
But in any case, if we can avoid sending RST in this situation,
it seems mostly moot for our usage.


Sorry it took some days to get a setup to check this!

The result is as expected:

1. Windows client to Linux server works without dropping the error message
2. Linux client to Windows server works without dropping the error message
3. Windows client to remote Windows server drops the error message,
   depending on the timing of the event loop

In 1. the Linux server doesn't end the connection with an RST packet, so
that the Windows client enqueues the error message properly and doesn't
drop it.


In 2. the Linux client doesn't care about the RST packet of the Windows 
server and properly enqueues and raises the error message.


In 3. the combination of the bad RST behavior of client and server leads
to data loss. It depends on the network timing. A delay of 0.5 ms in the
event loop was enough in a localhost setup as well as in some LAN
setups. On the contrary, over some slower WLAN connections a delay of less
than 15 ms did not lose data, but higher delays still did.


The idea of running a second process, passing the socket handle to it,
observing the parent process and closing the socket when it exits, could
work, but I guess it's overly complicated and creates more issues than
it solves. Probably the same if the master process handles the socket
closing.


So I still think it's best to close the socket as proposed in the patch.

--

Regards,
Lars Kanis






Re: Skipping logical replication transactions on subscriber side

2021-11-27 Thread Amit Kapila
On Fri, Nov 26, 2021 at 7:45 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Friday, November 26, 2021 9:30 AM Masahiko Sawada  
> wrote:
> >
> > Indeed. Attached an updated patch. Thanks!
>
> Thanks for your patch. A small comment:
>
> +   OID of the relation that the worker is synchronizing; null for the
> +   main apply worker
>
> Should we modify it to "OID of the relation that the worker was synchronizing 
> ..."?
>

I don't think this change is required, see the description of the
similar column in pg_stat_subscription.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-11-27 Thread Amit Kapila
On Fri, Nov 26, 2021 at 6:00 AM Masahiko Sawada  wrote:
>
> Indeed. Attached an updated patch. Thanks!
>

I have made a number of changes in the attached patch, including:
(a) the patch was trying to register multiple array entries for the
same subscription, which doesn't seem to be required; see the changes in
pgstat_vacuum_stat. (b) multiple changes in the test, like reducing the
wal_retrieve_retry_interval to 2s, which has cut the test time in
half; removing the check related to resetting of stats, as there is no
guarantee that the message will be received by the collector and we
were not sending it again; and changing the test case file name to
026_stats, as we can add more subscription-related stats in this test
file itself. (c) added/edited multiple comments. (d) updated
PGSTAT_FILE_FORMAT_ID.

Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v27-0001-Add-a-view-to-show-the-stats-of-subscription-wor.patch
Description: Binary data