Re: Improve error handling of HMAC computations and SCRAM

2022-01-10 Thread Michael Paquier
On Tue, Jan 11, 2022 at 10:50:50AM +0300, Sergey Shinderuk wrote:
> A few comments after a quick glance...

Thanks!

> + * Returns a static string providing errors about an error that happened
> 
> "errors about an error" looks odd.

Sure, that could be reworded.  What about "providing details about an
error"?

> We already have SSLerrmessage elsewhere and it's documented to never return
> NULL.  I find that confusing.

This name is chosen on purpose.  There could be some refactoring done
with those things.

> If I have two distinct pg_hmac_ctx's, are their errreasons independent from
> one another or do they really point to the same static buffer?

Each errreason could be different, as each computation could fail for
a different reason.  If they fail for the same reason, they would
point to the same error context strings.
--
Michael




Re: Windows crash / abort handling

2022-01-10 Thread Andres Freund
Hi,

On 2022-01-10 10:57:00 -0500, Andrew Dunstan wrote:
> On 10/5/21 15:30, Andres Freund wrote:
> > The above ends up dumping all crashes into a single file, but that can
> > probably be improved. But cdb is so gnarly that I wanted to stop looking
> > once I got this far...

FWIW, I figured out how to put the dumps into separate files by now...


> > Andrew, I wonder if something like this could make sense for windows BF 
> > animals?

> Very possibly. I wonder how well it will work on machines where I have
> more than one animal, e.g. lorikeet (cygwin), jacana (msys) and bowerbird
> (MSVC) are all on the same machine. Likewise drongo (MSVC) and fairywren
> (msys2).

Hm. I can see a few ways to deal with it. Are they running concurrently?
If not then it's easy enough to deal with.

It'd be a bit of a fight with cdb's awfully documented and quirky
scripting [1], but the best solution would probably be to just use an
environment variable from the target process to determine the dump
location. Then each buildfarm config could set a BF_BACKTRACE_LOCATION
variable or such...

[1] So there's !envvar. But, when assigned to an alias, that yields a string
like "BF_BACKTRACE_LOCATION = <value of the environment variable>", and I
haven't found an easy way to get rid of the "variablename = " prefix. There is
.foreach /pS [2], which could be used to skip over the "varname =", but that
then splits on all whitespace. Gah.

[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/-foreach
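
To make the idea concrete, a minimal sketch of the per-animal side follows;
the build_env key is how the buildfarm client config typically passes extra
environment variables, but treat the whole snippet as an assumption rather
than an agreed interface:

    # build-farm.conf for one animal (sketch; the variable name follows the
    # suggestion above, the path is made up)
    build_env => {
        BF_BACKTRACE_LOCATION => 'c:/bf/drongo/crashdumps',
    },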

Greetings,

Andres Freund




Re: Improve error handling of HMAC computations and SCRAM

2022-01-10 Thread Sergey Shinderuk

Hi,

On 11.01.2022 07:56, Michael Paquier wrote:
> Thoughts?

A few comments after a quick glance...

+ * Returns a static string providing errors about an error that happened

"errors about an error" looks odd.


+static const char *
+SSLerrmessage(unsigned long ecode)
+{
+	if (ecode == 0)
+		return NULL;
+
+	/*
+	 * This may return NULL, but we would fall back to a default error path if
+	 * that were the case.
+	 */
+	return ERR_reason_error_string(ecode);
+}

We already have SSLerrmessage elsewhere and it's documented to never 
return NULL.  I find that confusing.


If I have two distinct pg_hmac_ctx's, are their errreasons independent
from one another or do they really point to the same static buffer?



Regards,

--
Sergey Shinderuk    https://postgrespro.com/




Re: pg_dump/restore --no-tableam

2022-01-10 Thread Michael Paquier
On Mon, Jan 03, 2022 at 03:44:24PM -0600, Justin Pryzby wrote:
> +
> + <varlistentry>
> +  <term><option>--no-table-am</option></term>
> +  <listitem>
> +   <para>
> +    Do not output commands to select table access methods.
> +    With this option, all objects will be created with whichever
> +    table access method is the default during restore.
> +   </para>

Hmm.  --no-table-am may not be the best choice.  Should this be called
--no-table-access-method instead?

> - no_toast_compression => {
> - dump_cmd => [
> - 'pg_dump', '--no-sync',
> - "--file=$tempdir/no_toast_compression.sql",
> - '--no-toast-compression', 'postgres',
> - ],
> - },

Why is this command moved down?
--
Michael




Re: [Ext:] Re: Stream Replication not working

2022-01-10 Thread Sushant Postgres
Hi All,

For us also, logs are being applied on the slave server, but very slowly.
While checking, we have also seen the same set of locks on the master and
slave servers. Please suggest a solution for this.
Many thanks in advance!
Thanks
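
For anyone comparing notes on this symptom, a minimal way to see how far the
standby has received versus replayed WAL (these functions and columns exist as
of PostgreSQL 13, the version used in this thread) is:

    -- on the standby
    SELECT pg_last_wal_receive_lsn()        AS received,
           pg_last_wal_replay_lsn()         AS replayed,
           pg_last_xact_replay_timestamp()  AS last_replayed_commit;

    -- on the primary
    SELECT application_name, sent_lsn, replay_lsn, replay_lag
    FROM pg_stat_replication;

If "replayed" stays far behind "received" while replay_lag keeps growing,
replay is stuck (for example behind a lock, as discussed below).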

On Tue, Jan 11, 2022 at 2:12 AM Allie Crawford <
crawfor...@churchofjesuschrist.org> wrote:

> Thank you so much for your help on this Satya. I have detailed right below
> the output of the query you asked me to run.
>
>
>
> Master
>
> postgresql@ ~>psql
>
> psql (13.5)
>
> Type "help" for help.
>
>
>
> postgresql=# select * from pg_locks;
>
>   locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |      mode       | granted | fastpath
> ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+-----------------+---------+----------
>  relation   |    16384 |    12141 |      |       |            |               |         |       |          | 3/6715             | 2669949 | AccessShareLock | t       | t
>  virtualxid |          |          |      |       | 3/6715     |               |         |       |          | 3/6715             | 2669949 | ExclusiveLock   | t       | t
>
> (2 rows)
>
>
>
> postgresql=#
>
>
>
>
>
> Standby
>
> postgresql@ ~>psql
>
> psql (13.5)
>
> Type "help" for help.
>
>
>
> postgresql=# select * from pg_locks;
>
>   locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |      mode       | granted | fastpath
> ------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+-----------------+---------+----------
>  relation   |    16384 |    12141 |      |       |            |               |         |       |          | 2/50               | 642064 | AccessShareLock | t       | t
>  virtualxid |          |          |      |       | 2/50       |               |         |       |          | 2/50               | 642064 | ExclusiveLock   | t       | t
>  virtualxid |          |          |      |       | 1/1        |               |         |       |          | 1/0                |  17333 | ExclusiveLock   | t       | t
>
> (3 rows)
>
>
>
> postgresql=#
>
>
>
>
>
>
>
>
>
> From: SATYANARAYANA NARLAPURAM
> Date: Monday, January 10, 2022 at 1:06 PM
> To: Allie Crawford
> Cc: pgsql-hackers@lists.postgresql.org
> Subject: [Ext:] Re: Stream Replication not working
> [External Email]
>
> Seems there is a problem with the replay on your standby. Either it is too
> slow or stuck behind some locks ( replay_lag of 20:38:47.00904 indicates
> this and the flush_lsn is the same as lsn on primary ) . Run pg_locks to
> see if the replay is stuck behind a lock.
>
>
>
>
>
>
>
> On Mon, Jan 10, 2022 at 11:53 AM Allie Crawford <
> crawfor...@churchofjesuschrist.org> wrote:
>
> Hi All,
>
> I have implemented Stream replication in one of my environments, and for
> some reason even though all the health checks are showing that the
> replication is working, when I run manual tests to see if changes are being
> replicated, the changes are not replicated to the standby postgresql
> environment. I have been researching for two days and I cannot find any
> documentation that talks about the case I am running into. I would
> appreciate it if anybody could take a look at the details I have provided
> below and give me some guidance on where the problem might be that is
> preventing my changes from being replicated. Even though I was able to
> instantiate the standby while firewalld was enabled, I decided to disable
> it just in case that it was causing any issue to the manual changes, but
> disabling firewalld has not had any effect, I am still not able to get the
> manual changes test to be replicated to the standby site. As you will see
> in the details below, the streaming is working, both sites are in sync to
> the latest WAL but for some reasons the latest changes are not on the
> standby site. How is it possible that the standby site is completely in
> sync but yet does not contain the latest changes?
>
>
>
> Thanks in advance for any help you can give me with this problem.
>
>
>
> Regards,
>
> Allie
>
>
>
> Details:
>
>
>
> Master postgresql Environment
>
> postgresql=# select * from pg_stat_replication;
>
> -[ RECORD 1 ]+--
>
> pid  | 1979089
>
> usesysid | 16404
>
> usename  | replacct
>
> application_name | walreceiver
>
> client_addr  | 
>
> client_hostname  | 
>
> client_port  | 55096
>
> backend_start| 2022-01-06 17:29:51.542784-07
>
> backend_xmin |
>
> state| streaming
>
> sent_lsn | 0/35000788
>
> write_lsn| 0/35000788
>
> flush_lsn| 0/35000788
>
> replay_lsn   | 0/31000500
>
> write_lag| 

Re: pg_upgrade should truncate/remove its logs before running

2022-01-10 Thread Michael Paquier
On Sat, Jan 08, 2022 at 12:48:57PM -0600, Justin Pryzby wrote:
> I fixed it by calling get_restricted_token() before parseCommandLine().
> There's precedent for that in pg_regress (but the 3 other callers do it
> differently).
>
> It seems more ideal to always call get_restricted_token sooner than later, but
> for now I only changed pg_upgrade.  It's probably also better if
> parseCommandLine() only parses the commandline, but for now I added on to the
> logfile stuff that's already there.
> 
Well, the routine does a bit more than just parsing the options, as it
creates the directory infrastructure as well.  As you say, I think
that it would be better to have the option parsing and the
loading-into-structure portions in one routine, and the creation of
the paths in a second one.  So, the new contents of the patch could
just be moved into a new routine, after getting the restricted token.
Moving get_restricted_token() before or after the option parsing as
you do is not a big deal, but your patch introduces more into the
existing routine than what's currently done there as of HEAD.

> Maybe the commandline argument should be called something other than "logdir"
> since it also outputs dumps there.  But the dumps are more or less not
> user-facing.  But -d and -o are already used.  Maybe it shouldn't be
> configurable at all?

If the choice of a short option becomes confusing, I'd be fine with
just a long option, but -l is fine IMO.  Including the internal dumps
in the directory is fine by me, and using a subdir, as you do, makes
things more organized.

- "--binary-upgrade %s -f %s",
+ "--binary-upgrade %s -f %s/dump/%s",
Some quotes seem to be missing here.
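
Presumably what is being asked for is something along these lines (a sketch of
the format string only, not the actual fix), so that a log directory
containing spaces still survives the shell:

    "--binary-upgrade %s -f \"%s/dump/%s\"",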

static void
cleanup(void)
{
+   int dbnum;
+   char  **filename;
+   charfilename_path[MAXPGPATH];
[...] 
+   if (rmdir(filename_path))
+       pg_log(PG_WARNING, "failed to rmdir: %s: %m\n", filename_path);
+
+   if (rmdir(log_opts.basedir))
+   pg_log(PG_WARNING, "failed to rmdir: %s: %m\n", log_opts.basedir);

Is it intentional to not use rmtree() here?  If you put all the data
in the same directory, cleanup() gets simpler.
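
For illustration, with everything under one base directory the cleanup could
collapse to roughly this (a sketch assuming log_opts.basedir holds the whole
tree, not the patch's actual code):

    /* remove the whole log/dump tree in one go */
    if (!rmtree(log_opts.basedir, true))
        pg_log(PG_WARNING, "could not remove directory \"%s\"\n", log_opts.basedir);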
--
Michael




Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2022-01-10 Thread Michael Paquier
On Wed, Jan 05, 2022 at 05:12:41PM +0900, Michael Paquier wrote:
> Rebased patch to cool down the CF bot, as per the addition of
> --no-sync to pg_upgrade.

The CF bot is unhappy, so here is a rebase, with some typo fixes
reported by Justin offlist.
--
Michael
From 1d2dc68b8c27d9c328276ed21e57290f64bca586 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 11 Jan 2022 16:13:02 +0900
Subject: [PATCH v6] Switch tests of pg_upgrade to use TAP

---
 src/bin/pg_upgrade/.gitignore|   5 +
 src/bin/pg_upgrade/Makefile  |  23 +-
 src/bin/pg_upgrade/TESTING   |  33 ++-
 src/bin/pg_upgrade/t/001_basic.pl|   9 +
 src/bin/pg_upgrade/t/002_pg_upgrade.pl   | 275 ++
 src/bin/pg_upgrade/test.sh   | 279 ---
 src/test/perl/PostgreSQL/Test/Cluster.pm |  25 ++
 src/tools/msvc/vcregress.pl  |  94 +---
 8 files changed, 355 insertions(+), 388 deletions(-)
 create mode 100644 src/bin/pg_upgrade/t/001_basic.pl
 create mode 100644 src/bin/pg_upgrade/t/002_pg_upgrade.pl
 delete mode 100644 src/bin/pg_upgrade/test.sh

diff --git a/src/bin/pg_upgrade/.gitignore b/src/bin/pg_upgrade/.gitignore
index 2d3bfeaa50..3b64522ab6 100644
--- a/src/bin/pg_upgrade/.gitignore
+++ b/src/bin/pg_upgrade/.gitignore
@@ -7,3 +7,8 @@
 /loadable_libraries.txt
 /log/
 /tmp_check/
+
+# Generated by pg_regress
+/sql/
+/expected/
+/results/
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index 44d06be5a6..35b6c123a5 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -28,6 +28,12 @@ OBJS = \
 override CPPFLAGS := -DDLSUFFIX=\"$(DLSUFFIX)\" -I$(srcdir) -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
 
+# required for 002_pg_upgrade.pl
+REGRESS_SHLIB=$(abs_top_builddir)/src/test/regress/regress$(DLSUFFIX)
+export REGRESS_SHLIB
+REGRESS_OUTPUTDIR=$(abs_top_builddir)/src/bin/pg_upgrade
+export REGRESS_OUTPUTDIR
+
 all: pg_upgrade
 
 pg_upgrade: $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils
@@ -49,17 +55,8 @@ clean distclean maintainer-clean:
 	   pg_upgrade_dump_globals.sql \
 	   pg_upgrade_dump_*.custom pg_upgrade_*.log
 
-# When $(MAKE) is present, make automatically infers that this is a
-# recursive make. which is not actually what we want here, as that
-# e.g. prevents output synchronization from working (as make thinks
-# that the subsidiary make knows how to deal with that itself, but
-# we're invoking a shell script that doesn't know). Referencing
-# $(MAKE) indirectly avoids that behaviour.
-# See https://www.gnu.org/software/make/manual/html_node/MAKE-Variable.html#MAKE-Variable
-NOTSUBMAKEMAKE=$(MAKE)
+check:
+	$(prove_check)
 
-check: test.sh all temp-install
-	MAKE=$(NOTSUBMAKEMAKE) $(with_temp_install) bindir=$(abs_top_builddir)/tmp_install/$(bindir) EXTRA_REGRESS_OPTS="$(EXTRA_REGRESS_OPTS)" $(SHELL) $<
-
-# installcheck is not supported because there's no meaningful way to test
-# pg_upgrade against a single already-running server
+installcheck:
+	$(prove_installcheck)
diff --git a/src/bin/pg_upgrade/TESTING b/src/bin/pg_upgrade/TESTING
index 78b9747908..43a71566e2 100644
--- a/src/bin/pg_upgrade/TESTING
+++ b/src/bin/pg_upgrade/TESTING
@@ -2,21 +2,30 @@ THE SHORT VERSION
 -
 
 On non-Windows machines, you can execute the testing process
-described below by running
+described below by running the following command in this directory:
 	make check
-in this directory.  This will run the shell script test.sh, performing
-an upgrade from the version in this source tree to a new instance of
-the same version.
 
-To test an upgrade from a different version, you must have a built
-source tree for the old version as well as this version, and you
-must have done "make install" for both versions.  Then do:
+This will run the TAP tests to run pg_upgrade, performing an upgrade
+from the version in this source tree to a  new instance of the same
+version.
+
+To test an upgrade from a different version, there are two options
+available:
+
+1) You have a built source tree for the old version as well as this
+version's binaries.  Then set up the following variables before
+launching the test:
 
 export oldsrc=...somewhere/postgresql	(old version's source tree)
-export oldbindir=...otherversion/bin	(old version's installed bin dir)
-export bindir=...thisversion/bin	(this version's installed bin dir)
-export libdir=...thisversion/lib	(this version's installed lib dir)
-sh test.sh
+export oldinstall=...otherversion/	(old version's install base path)
+
+2) You have a dump that can be used to set up the old version, as well
+as this version's binaries.  Then set up the following variables:
+export olddump=...somewhere/dump.sql	(old version's dump)
+export oldinstall=...otherversion/	(old version's install base path)
+
+Finally, the tests can be done by running
+	make check
 
 In this case, you will 

Patch: Code comments: why some text-handling functions are leakproof

2022-01-10 Thread Gurjeet Singh
Please see attached a small patch to document why some text-processing
functions are marked as leakproof, while some others are not.

This is more or less a verbatim copy of Tom's comment in email thread at
[1].

I could not find an appropriate spot to place these comments, so I placed
them on the bttextcmp() function.  The only other place that I could see we
could place these comments is the file src/backend/optimizer/README, because
there is some consideration given to leakproof functions in the optimizer docs.
But these comments seem quite out of place in the optimizer docs.

[1]:
https://www.postgresql.org/message-id/flat/673096.1630006990%40sss.pgh.pa.us#cd378cba4b990fda070c6fa4b51a069c

Best regards,
--
Gurjeet Singh http://gurjeet.singh.im/


leakproof_comments.patch
Description: Binary data


Re: Isn't wait_for_catchup slightly broken?

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 02:31:38PM -0500, Tom Lane wrote:
> 
> So I think we need to fix it to capture the target WAL position
> at the start, as I've done in the attached patch.

+1, it looks sensible to me.

> In principle
> this might make things a bit slower because of the extra
> transaction required, but I don't notice any above-the-noise
> difference on my own workstation.

I'm wondering if the environments where this extra transaction could make
a noticeable difference are also environments where doing that extra
transaction can save some iteration(s), which would be at least as costly.
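
For readers following along, the idea of Tom's fix can be sketched like this
($primary and $standby are placeholder node objects, not code from the patch):
capture the target LSN once, instead of re-evaluating pg_current_wal_lsn() on
every poll iteration.

    # capture the target once, before polling starts
    my $target_lsn = $primary->lsn('write');

    # then wait until the standby has replayed up to that fixed position
    $primary->wait_for_catchup($standby, 'replay', $target_lsn);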




Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread Amit Kapila
On Tue, Jan 11, 2022 at 8:52 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila  wrote:
> >
> > I was thinking what if we don't advance origin explicitly in this
> > case? Actually, that will be no different than the transactions where
> > the apply worker doesn't apply any change because the initial sync is
> > in progress (see should_apply_changes_for_rel()) or we have received
> > an empty transaction. In those cases also, the origin lsn won't be
> > advanced even though we acknowledge the advanced last_received
> > location because of keep_alive messages. Now, it is possible after the
> > restart we send the old start_lsn location because the replication
> > origin was not updated before restart but we handle that case in the
> > server by starting from the last confirmed location. See below code:
> >
> > CreateDecodingContext()
> > {
> > ..
> > else if (start_lsn < slot->data.confirmed_flush)
> > ..
>
> Good point. Probably one minor thing that is different from the
> transaction where the apply worker applied an empty transaction is a
> case where the server restarts/crashes before sending an
> acknowledgment of the flush location. That is, in the case of the
> empty transaction, the publisher sends an empty transaction again. On
> the other hand in the case of skipping the transaction, a non-empty
> transaction will be sent again but skip_xid is already changed or
> cleared, therefore the user will have to specify skip_xid again. If we
> write replication origin WAL record to advance the origin lsn, it
> reduces the possibility of that. But I think it’s a very minor case so
> we won’t need to deal with that.
>

Yeah, in the worst case, it will lead to conflict again and the user
needs to set the xid again.

> Anyway, according to your analysis, I think we don't necessarily need
> to do replorigin_advance() in this case.
>

Right.

> >
> > 5.
> > +  by the logical replication worker. Setting
> > NONE means
> > +  to reset the transaction ID.
> >
> > Let's slightly reword the second part of the sentence as: "Setting
> > NONE resets the transaction ID."
>
> Will fix.
>
> >
> > 6.
> > Once we start skipping
> > + * changes, we don't stop it until the we skip all changes of the
> > transaction even
> > + * if the subscription invalidated and MySubscription->skipxid gets
> > changed or reset.
> >
> > /subscription invalidated/subscription is invalidated
>
> Will fix.
>
> >
> > What do you mean by subscription invalidated and how is it related to
> > this feature? I think we should mention something on these lines in
> > the docs as well.
>
> I meant that MySubscription, a cache of pg_subscription entry, is
> invalidated by the catalog change. IIUC while applying changes we
> don't re-read pg_subscription (i.e., not calling
> maybe_reread_subscription()). Similarly, while skipping changes, we
> also don't do that. Therefore, even if skip_xid has been changed while
> skipping changes, we don't stop skipping changes.
>

Okay, but I don't think we need to mention that the subscription is
invalidated, as that could be confusing; the other part of the comment is
quite clear.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread Masahiko Sawada
On Tue, Jan 11, 2022 at 11:22 AM Amit Kapila  wrote:
>
> On Mon, Jan 10, 2022 at 2:57 PM vignesh C  wrote:
> >
> > 2) Can we have an option to specify last_error_xid of
> > pg_stat_subscription_workers. Something like:
> > alter subscription sub1 skip ( XID = 'last_subscription_error');
> >
> > When the user specified last_subscription_error, it should pick
> > last_error_xid from pg_stat_subscription_workers.
> > As this operation is a critical operation, if there is an option which
> > could automatically pick and set from pg_stat_subscription_workers, it
> > would be useful.
> >
>
> I think having some automatic functionality around this would be good
> but I am not so sure about this idea because it is possible that the
> error has not reached the stats collector and the user might be
> referring to server logs to set the skip xid. In such cases, even
> though an error would have occurred but we won't be able to set the
> required xid. Now, one can imagine that if we don't get the required
> value from pg_stat_subscription_workers then we can return an error to
> the user indicating that she can cross-verify the server logs and set
> the appropriate xid value but IMO it could be confusing. I feel even
> if we want some automatic functionality like you are proposing or
> something else, it could be done as a separate patch but let's wait
> and see what Sawada-San or others think about this?

Agreed. Automatically setting the XID would be a good idea, but we can
do that in a separate patch so we can keep the first patch simple.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Andres Freund
Hi,

On 2022-01-10 23:44:06 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2022-01-10 21:11:07 -0600, Justin Pryzby wrote:
> >> Without the patch, it fails under windows like:
> 
> > Ah, yea. It's about matching the declarations in ltree.h with the
> > PG_FUNCTION_INFO_V1() one.
> 
> > What is bugging me is that I am fairly sure that my local compilers at some
> > point complained about such mismatches on linux as well. But I can't 
> > reproduce
> > that right now :/

Now I wonder if I just saw it when cross compiling locally...


> Ah, I wondered about that.  I'd sort of expected warnings from
> mismatched declarations, but I didn't see any on Linux or macOS.

There are warnings when the declarations "explicitly" mismatch, e.g. one with
__attribute__((visibility("hidden"))) and one with
__attribute__((visibility("default"))). But -fvisibility=hidden isn't
sufficient.

Which makes sense to some degree, because of -fvisibility not applying to
extern declarations... But it's still annoying.
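
A minimal illustration of that distinction (made-up names, not code from the
patch):

    /* an explicit attribute mismatch like this draws a diagnostic... */
    extern int some_var __attribute__((visibility("default")));
    extern int some_var __attribute__((visibility("hidden")));

    /* ...whereas a plain extern declaration plus -fvisibility=hidden on the
     * command line is accepted silently, since the flag does not apply to
     * extern declarations. */
    extern int other_var;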

Greetings,

Andres Freund




Re: Fix a possible typo in rewriteheap.c code comments

2022-01-10 Thread Amit Kapila
On Mon, Jan 10, 2022 at 10:22 AM Amit Kapila  wrote:
>
> On Mon, Jan 10, 2022 at 9:42 AM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > It looks like there's a typo, attaching a tiny patch to fix it.
> >
> >   *
> >   * When doing logical decoding - which relies on using cmin/cmax of catalog
> >   * tuples, via xl_heap_new_cid records - heap rewrites have to log enough
> > - * information to allow the decoding backend to updates its internal 
> > mapping
> > + * information to allow the decoding backend to update its internal mapping
> >
>
> LGTM.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Improve error handling of HMAC computations and SCRAM

2022-01-10 Thread Michael Paquier
Hi all,

This is a follow-up to the work done in b69aba7 for cryptohashes, but
this time for HMAC.  The main issue here is related to SCRAM, where we
have a lot of code paths that have no idea what kind of failure is
happening when an error occurs; some of them, frontend and backend
included, have been like that since v10, where SCRAM was introduced.
\password is one example.
The set of errors improved here would only trigger in scenarios that
are unlikely to happen, like an OOM or an internal OpenSSL error.  It
would be possible to create an HMAC from MD5, which would cause an
error when compiling with OpenSSL and FIPS enabled, but the only
callers of the pg_hmac_* routines involve SHA-256 in core through
SCRAM, so I don't see much of a point in backpatching any of the things
proposed here.

The attached patch creates a new routine called pg_hmac_error() that one
can use to grab details about the error that happened, in the same
fashion as what has been done for cryptohashes.  The logic is not that
complicated, but note that the fallback HMAC implementation relies
itself on cryptohashes, so there are cases where we need to look at
the error from pg_cryptohash_error() and store it in the HMAC private
context.
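
As a usage sketch (backend-style error reporting, with key/keylen as
placeholders; the patch's real call sites differ):

    pg_hmac_ctx *ctx = pg_hmac_create(PG_SHA256);

    if (ctx == NULL || pg_hmac_init(ctx, key, keylen) < 0)
        ereport(ERROR,
                (errmsg("could not initialize HMAC context: %s",
                        ctx ? pg_hmac_error(ctx) : "out of memory")));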

Thoughts?
--
Michael
From 687dd48a8150fae4597b126d68f6758b52ff67cb Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 11 Jan 2022 13:47:06 +0900
Subject: [PATCH] Improve HMAC error handling

---
 src/include/common/hmac.h|  1 +
 src/include/common/scram-common.h| 14 +++--
 src/backend/libpq/auth-scram.c   | 22 ---
 src/common/hmac.c| 64 
 src/common/hmac_openssl.c| 90 
 src/common/scram-common.c| 47 +++
 src/interfaces/libpq/fe-auth-scram.c | 63 +--
 src/interfaces/libpq/fe-auth.c   | 17 --
 src/interfaces/libpq/fe-auth.h   |  3 +-
 9 files changed, 271 insertions(+), 50 deletions(-)

diff --git a/src/include/common/hmac.h b/src/include/common/hmac.h
index cf7aa17be4..c18783fe11 100644
--- a/src/include/common/hmac.h
+++ b/src/include/common/hmac.h
@@ -25,5 +25,6 @@ extern int	pg_hmac_init(pg_hmac_ctx *ctx, const uint8 *key, size_t len);
 extern int	pg_hmac_update(pg_hmac_ctx *ctx, const uint8 *data, size_t len);
 extern int	pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest, size_t len);
 extern void pg_hmac_free(pg_hmac_ctx *ctx);
+extern const char *pg_hmac_error(pg_hmac_ctx *ctx);
 
 #endif			/* PG_HMAC_H */
diff --git a/src/include/common/scram-common.h b/src/include/common/scram-common.h
index d53b4fa7f5..d1f840c11c 100644
--- a/src/include/common/scram-common.h
+++ b/src/include/common/scram-common.h
@@ -47,12 +47,16 @@
 #define SCRAM_DEFAULT_ITERATIONS	4096
 
 extern int	scram_SaltedPassword(const char *password, const char *salt,
- int saltlen, int iterations, uint8 *result);
-extern int	scram_H(const uint8 *str, int len, uint8 *result);
-extern int	scram_ClientKey(const uint8 *salted_password, uint8 *result);
-extern int	scram_ServerKey(const uint8 *salted_password, uint8 *result);
+ int saltlen, int iterations, uint8 *result,
+ const char **errstr);
+extern int	scram_H(const uint8 *str, int len, uint8 *result,
+	const char **errstr);
+extern int	scram_ClientKey(const uint8 *salted_password, uint8 *result,
+			const char **errstr);
+extern int	scram_ServerKey(const uint8 *salted_password, uint8 *result,
+			const char **errstr);
 
 extern char *scram_build_secret(const char *salt, int saltlen, int iterations,
-const char *password);
+const char *password, const char **errstr);
 
 #endif			/* SCRAM_COMMON_H */
diff --git a/src/backend/libpq/auth-scram.c b/src/backend/libpq/auth-scram.c
index 7c9dee70ce..ee7f52218a 100644
--- a/src/backend/libpq/auth-scram.c
+++ b/src/backend/libpq/auth-scram.c
@@ -465,6 +465,7 @@ pg_be_scram_build_secret(const char *password)
 	pg_saslprep_rc rc;
 	char		saltbuf[SCRAM_DEFAULT_SALT_LEN];
 	char	   *result;
+	const char *errstr = NULL;
 
 	/*
 	 * Normalize the password with SASLprep.  If that doesn't work, because
@@ -482,7 +483,8 @@ pg_be_scram_build_secret(const char *password)
  errmsg("could not generate random salt")));
 
 	result = scram_build_secret(saltbuf, SCRAM_DEFAULT_SALT_LEN,
-SCRAM_DEFAULT_ITERATIONS, password);
+SCRAM_DEFAULT_ITERATIONS, password,
+&errstr);
 
 	if (prep_password)
 		pfree(prep_password);
@@ -509,6 +511,7 @@ scram_verify_plain_password(const char *username, const char *password,
 	uint8		computed_key[SCRAM_KEY_LEN];
 	char	   *prep_password;
 	pg_saslprep_rc rc;
+	const char *errstr = NULL;
 
	if (!parse_scram_secret(secret, &iterations, &encoded_salt,
 			stored_key, server_key))
@@ -539,10 +542,10 @@ scram_verify_plain_password(const char *username, const char *password,
 
 	/* Compute Server Key based on the user-supplied plaintext password */
 	if (scram_SaltedPassword(password, salt, 

Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Tom Lane
Andres Freund  writes:
> On 2022-01-10 21:11:07 -0600, Justin Pryzby wrote:
>> Without the patch, it fails under windows like:

> Ah, yea. It's about matching the declarations in ltree.h with the
> PG_FUNCTION_INFO_V1() one.

> What is bugging me is that I am fairly sure that my local compilers at some
> point complained about such mismatches on linux as well. But I can't reproduce
> that right now :/

Ah, I wondered about that.  I'd sort of expected warnings from
mismatched declarations, but I didn't see any on Linux or macOS.

regards, tom lane




Re: row filtering for logical replication

2022-01-10 Thread Amit Kapila
On Tue, Jan 11, 2022 at 6:52 AM Peter Smith  wrote:
>
> e.g. Perhaps there should be an entirely new page (section 31 ?)
> devoted just to "Logical Replication Filtering" - with subsections for
> "Row-Filtering" and "Col-Filtering".
>

+1. I think we need to be careful to avoid any duplicate updates in
docs, other than that I think this will be helpful.

-- 
With Regards,
Amit Kapila.




Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Andres Freund
Hi,

On 2022-01-10 21:11:07 -0600, Justin Pryzby wrote:
> On Mon, Jan 10, 2022 at 07:01:36PM -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > [ v2-0001-Use-hidden-visibility-for-shared-libraries-where-.patch ]
> > 
> > This seems like a good idea, but it's failing to apply right now,
> > mainly because of some cleanup I did in e04a8059a.  As far as I can
> > tell given the now-clarified meanings of PGDLLIMPORT/PGDLLEXPORT,
> > this patch shouldn't be touching PGDLLIMPORT.  The attached revision
> > works for me under gcc 8.5 and clang 13.
> > 
> > Also, it seemed like you'd maybe been more enthusiastic than necessary
> > about plastering PGDLLEXPORT on things.  I got through check-world
> > cleanly without the diffs in either ltree.h or worker_spi.c (although
> > I confess being confused why I didn't need the latter).  I didn't try
> > individually removing other diffs.  Those diffs are still in v3 below,
> > but we should clarify exactly which functions need marking.
> 
> Without the patch, it fails under windows like:
> 
> https://cirrus-ci.com/task/6171477065072640
> [01:44:45.399] c:\cirrus\contrib\ltree\ltree.h(193): message : see 
> declaration of '_ltree_isparent' [c:\cirrus\ltree.vcxproj]
> [01:44:45.399] c:\cirrus\contrib\ltree\_ltree_op.c(16,1): error C2375: 
> '_ltree_risparent': redefinition; different linkage [c:\cirrus\ltree.vcxproj]

Ah, yea. It's about matching the declarations in ltree.h with the
PG_FUNCTION_INFO_V1() one.

What is bugging me is that I am fairly sure that my local compilers at some
point complained about such mismatches on linux as well. But I can't reproduce
that right now :/
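
A rough sketch of the mismatch being described (not the actual ltree code):
PG_FUNCTION_INFO_V1() emits an extern PGDLLEXPORT prototype for the function,
so a plain extern declaration of the same function in a header ends up with
different linkage on Windows once PGDLLEXPORT expands to
__declspec(dllexport).

    /* header, ltree.h-style: plain declaration, no PGDLLEXPORT */
    extern Datum _ltree_isparent(PG_FUNCTION_ARGS);

    /* .c file: the macro also declares
     *   extern PGDLLEXPORT Datum _ltree_isparent(PG_FUNCTION_ARGS);
     * which MSVC rejects as a redefinition with different linkage (C2375). */
    PG_FUNCTION_INFO_V1(_ltree_isparent);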

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread Masahiko Sawada
On Mon, Jan 10, 2022 at 8:50 PM Amit Kapila  wrote:
>
> On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada  
> wrote:
> >
> > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  wrote:
> > >
> > > >
> > > > So if skip_xid is already changed, the apply worker would do
> > > > replorigin_advance() with WAL logging, instead of committing the
> > > > catalog change?
> > > >
> > >
> > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > commit transaction would do it but when we are skipping all changes,
> > > the commit might not do it as there won't be any transaction id
> > > assigned.
> >
> > I've not tested it yet but replorigin_advance() with wal_log = true
> > seems to work for this case.
> >
>
> IIUC, the changes corresponding to above in the latest patch are as follows:
>
> --- a/src/backend/replication/logical/origin.c
> +++ b/src/backend/replication/logical/origin.c
> @@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node,
>   LWLockAcquire(_state->lock, LW_EXCLUSIVE);
>
>   /* Make sure it's not used by somebody else */
> - if (replication_state->acquired_by != 0)
> + if (replication_state->acquired_by != 0 &&
> + replication_state->acquired_by != MyProcPid)
>   {
> ...
>
> clear_subscription_skip_xid()
> {
> ..
> + else if (!XLogRecPtrIsInvalid(origin_lsn))
> + {
> + /*
> + * User has already changed subskipxid before clearing the subskipxid, so
> + * don't change the catalog but just advance the replication origin.
> + */
> + replorigin_advance(replorigin_session_origin, origin_lsn,
> +GetXLogInsertRecPtr(),
> +false, /* go_backward */
> +true /* wal_log */);
> + }
> ..
> }
>
> I was thinking what if we don't advance origin explicitly in this
> case? Actually, that will be no different than the transactions where
> the apply worker doesn't apply any change because the initial sync is
> in progress (see should_apply_changes_for_rel()) or we have received
> an empty transaction. In those cases also, the origin lsn won't be
> advanced even though we acknowledge the advanced last_received
> location because of keep_alive messages. Now, it is possible after the
> restart we send the old start_lsn location because the replication
> origin was not updated before restart but we handle that case in the
> server by starting from the last confirmed location. See below code:
>
> CreateDecodingContext()
> {
> ..
> else if (start_lsn < slot->data.confirmed_flush)
> ..

Good point. Probably one minor thing that is different from the
transaction where the apply worker applied an empty transaction is a
case where the server restarts/crashes before sending an
acknowledgment of the flush location. That is, in the case of the
empty transaction, the publisher sends an empty transaction again. On
the other hand in the case of skipping the transaction, a non-empty
transaction will be sent again but skip_xid is already changed or
cleared, therefore the user will have to specify skip_xid again. If we
write replication origin WAL record to advance the origin lsn, it
reduces the possibility of that. But I think it’s a very minor case so
we won’t need to deal with that.

Anyway, according to your analysis, I think we don't necessarily need
to do replorigin_advance() in this case.

>
> Few other comments on the latest patch:
> =
> 1.
> A conflict will produce an error and will stop the replication; it must be
> resolved manually by the user.  Details about the conflict can be found in
> -   the subscriber's server log.
> +as well as the
> +   subscriber's server log.
>
> Can we slightly change the modified line to: "Details about the
> conflict can be found in <xref linkend="monitoring-pg-stat-subscription-workers"/> and the
> subscriber's server log."?

Will fix it.

>  I think we can commit this change
> separately as this is true even without this patch.

Right. It seems an oversight of 8d74fc96db. I've attached the patch.

>
> 2.
> The resolution can be done either by changing data on the subscriber so
> -   that it does not conflict with the incoming change or by skipping the
> -   transaction that conflicts with the existing data.  The transaction can be
> -   skipped by calling the 
> +   that it does not conflict with the incoming changes or by skipping the 
> whole
> +   transaction.  This option specifies the ID of the transaction whose
> +   application is to be skipped by the logical replication worker.  The 
> logical
> +   replication worker skips all data modification transaction conflicts with
> +   the existing data. When a conflict produce an error, it is shown in
> +   pg_stat_subscription_workers view as follows:
>
> I don't think most of the additional text added in the above paragraph
> is required. We can rephrase it as: "The resolution can be done either
> by changing data on the subscriber so that it does not conflict with
> the incoming change or by skipping the transaction that conflicts with
> the existing data. When a conflict 

Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Justin Pryzby
On Mon, Jan 10, 2022 at 07:01:36PM -0500, Tom Lane wrote:
> Andres Freund  writes:
> > [ v2-0001-Use-hidden-visibility-for-shared-libraries-where-.patch ]
> 
> This seems like a good idea, but it's failing to apply right now,
> mainly because of some cleanup I did in e04a8059a.  As far as I can
> tell given the now-clarified meanings of PGDLLIMPORT/PGDLLEXPORT,
> this patch shouldn't be touching PGDLLIMPORT.  The attached revision
> works for me under gcc 8.5 and clang 13.
> 
> Also, it seemed like you'd maybe been more enthusiastic than necessary
> about plastering PGDLLEXPORT on things.  I got through check-world
> cleanly without the diffs in either ltree.h or worker_spi.c (although
> I confess being confused why I didn't need the latter).  I didn't try
> individually removing other diffs.  Those diffs are still in v3 below,
> but we should clarify exactly which functions need marking.

Without the patch, it fails under windows like:

https://cirrus-ci.com/task/6171477065072640
[01:44:45.399] c:\cirrus\contrib\ltree\ltree.h(193): message : see declaration of '_ltree_isparent' [c:\cirrus\ltree.vcxproj]
[01:44:45.399] c:\cirrus\contrib\ltree\_ltree_op.c(16,1): error C2375: '_ltree_risparent': redefinition; different linkage [c:\cirrus\ltree.vcxproj]

-- 
Justin




Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Andres Freund
Hi,

On 2022-01-10 19:01:36 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > [ v2-0001-Use-hidden-visibility-for-shared-libraries-where-.patch ]
> 
> This seems like a good idea, but it's failing to apply right now,
> mainly because of some cleanup I did in e04a8059a.  As far as I can
> tell given the now-clarified meanings of PGDLLIMPORT/PGDLLEXPORT,
> this patch shouldn't be touching PGDLLIMPORT.

Hm. I'm not sure that's "semantically" entirely right - but for now I think
it'd have no consequences.

Without marking PGDLLIMPORT symbols as visible, the compiler / linker
compiling an extension shlib would theoretically be justified in emitting a
reference that doesn't expect to go through any indirection table, which would
lead to linker issues (about relocation sizes or undefined symbols), and could
potentially even lead to misoptimization.

However, that won't cause a problem right now, because 'extern' in a
declaration causes the reference to be to a 'default' visibility symbol (note
that the *definition* of the symbol wouldn't change).

We'd benefit from separating local cross-translation-unit (i.e. within a
shared library) from "foreign" cross-translation-unit (i.e. from extension
library into main postgres binary) symbols. But I don't really see a realistic
path to get there starting where we are today. And -Wl,-Bdynamic, -fno-plt
probably get us close enough.

It'd be a bit a different story if we could make gcc default to "local"
references to variable declarations, but not function declarations. But I
don't see an option for that.


> The attached revision works for me under gcc 8.5 and clang 13.

Thanks for doing that!


> Also, it seemed like you'd maybe been more enthusiastic than necessary
> about plastering PGDLLEXPORT on things.  I got through check-world
> cleanly without the diffs in either ltree.h or worker_spi.c (although
> I confess being confused why I didn't need the latter).

Ugh. In the case of worker_spi it's because -fvisibility=hidden isn't applied
to worker_spi.c at all. Which in turn is because Makefile.shlib isn't used at
all for MODULES= style modules just for MODULE_big= ones :(.

Once that's "fixed" it fails as expected...

I'm not sure what the best way to deal with this is. Just now I copied the
logic from Makefile.shlib to pgxs.mk (by the existing CFLAGS_SL use), but that
doesn't scale - there's other direct uses of CFLAGS_SL.

Perhaps the best way would be to add the -fvisibility=hidden to CFLAGS_SL, and
work around the explicit exports issue in Makefile.shlib by adding an explicit
-fvisibility=default? Or perhaps CFLAGS_SL
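
A sketch of that direction (purely illustrative; in particular the
SHLIB_EXPORTS guard is only a guess at where the exception would hang):

    # make all shared-library objects default to hidden visibility
    CFLAGS_SL += -fvisibility=hidden

    # in Makefile.shlib: libraries built with an explicit export list keep
    # the old behaviour
    ifdef SHLIB_EXPORTS
    override CFLAGS_SL += -fvisibility=default
    endif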


A cleaner way would probably be to bite the bullet and add explicit
PGDLLEXPORTs to all the symbols in export lists? But that's a couple hundred
functions :/.


I'll try to crawl into the dusty path of a few months ago, and figure out why
I thought the ltree additions are needed...

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread vignesh C
On Tue, Jan 11, 2022 at 7:52 AM Amit Kapila  wrote:
>
> On Mon, Jan 10, 2022 at 2:57 PM vignesh C  wrote:
> >
> > 2) Can we have an option to specify last_error_xid of
> > pg_stat_subscription_workers. Something like:
> > alter subscription sub1 skip ( XID = 'last_subscription_error');
> >
> > When the user specified last_subscription_error, it should pick
> > last_error_xid from pg_stat_subscription_workers.
> > As this operation is a critical operation, if there is an option which
> > could automatically pick and set from pg_stat_subscription_workers, it
> > would be useful.
> >
>
> I think having some automatic functionality around this would be good
> but I am not so sure about this idea because it is possible that the
> error has not reached the stats collector and the user might be
> referring to server logs to set the skip xid. In such cases, even
> though an error would have occurred but we won't be able to set the
> required xid. Now, one can imagine that if we don't get the required
> value from pg_stat_subscription_workers then we can return an error to
> the user indicating that she can cross-verify the server logs and set
> the appropriate xid value but IMO it could be confusing. I feel even
> if we want some automatic functionality like you are proposing or
> something else, it could be done as a separate patch but let's wait
> and see what Sawada-San or others think about this?

If we are ok with the suggested idea, then it can be done as a separate
patch; I agree that it need not be part of the existing patch.

Regards,
Vignesh




Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread Amit Kapila
On Mon, Jan 10, 2022 at 2:57 PM vignesh C  wrote:
>
> 2) Can we have an option to specify last_error_xid of
> pg_stat_subscription_workers. Something like:
> alter subscription sub1 skip ( XID = 'last_subscription_error');
>
> When the user specified last_subscription_error, it should pick
> last_error_xid from pg_stat_subscription_workers.
> As this operation is a critical operation, if there is an option which
> could automatically pick and set from pg_stat_subscription_workers, it
> would be useful.
>

I think having some automatic functionality around this would be good
but I am not so sure about this idea because it is possible that the
error has not reached the stats collector and the user might be
referring to server logs to set the skip xid. In such cases, even
though an error has occurred, we won't be able to set the
required xid. Now, one can imagine that if we don't get the required
value from pg_stat_subscription_workers then we can return an error to
the user indicating that she can cross-verify the server logs and set
the appropriate xid value but IMO it could be confusing. I feel even
if we want some automatic functionality like you are proposing or
something else, it could be done as a separate patch but let's wait
and see what Sawada-San or others think about this?

-- 
With Regards,
Amit Kapila.




Re: \dP and \dX use ::regclass without "pg_catalog."

2022-01-10 Thread Tatsuro Yamada

Hi Justin,

On 2022/01/08 6:56, Justin Pryzby wrote:

On Fri, Jan 07, 2022 at 08:08:57PM +0900, Michael Paquier wrote:

On Fri, Jan 07, 2022 at 06:30:30PM +0900, Tatsuro Yamada wrote:

We should prefix them with pg_catalog as well.
Are you planning to make a patch?
If not, I'll make a patch later since that's where \dX is.


If any of you can make a patch, that would be great.  Thanks!


I'd propose these.
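
For readers without the attachments: the gist of the proposal is to
schema-qualify the regclass casts in psql's describe queries, roughly along
these lines (an illustration, not an exact hunk from the patches):

    -- before: relies on search_path to resolve the cast
    SELECT c.oid::regclass FROM pg_catalog.pg_class c;

    -- after: fully qualified, like psql's other catalog queries
    SELECT c.oid::pg_catalog.regclass FROM pg_catalog.pg_class c;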



Thanks!

Regards,
Tatsuro Yamada





Re: row filtering for logical replication

2022-01-10 Thread Peter Smith
The current documentation updates (e.g. in the v61 patch) for the
Row-Filter are good, but they are mostly about syntax changes and
accompanying notes for the new WHERE clause etc. There are also notes
for subscription tablesync behaviour etc.

But these new docs feel a bit like scattered fragments - there is
nowhere that gives an overview of this feature.

IMO there should be some overview for the whole Row-Filtering feature.
The overview text would be similar to the text of the 0001/0002 commit
messages, and it would summarise how everything works, describe the
UPDATE transformations (which currently seems not documented anywhere
in PG docs?), and maybe include a few useful filtering examples.

e.g. Perhaps there should be an entirely new page (section 31 ?)
devoted just to "Logical Replication Filtering" - with subsections for
"Row-Filtering" and "Col-Filtering".

Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Add index scan progress to pg_stat_progress_vacuum

2022-01-10 Thread Imseih (AWS), Sami
I agree. Renaming "index_vacuum_count" can be taken up in a separate discussion.

I have attached the 3rd revision of the patch, which also includes the
documentation changes. Also attached is a rendered HTML copy of the docs for review.

"max_index_vacuum_cycle_time" has been removed.
"index_rows_vacuumed" has been renamed to "index_tuples_removed"; "tuples" is more
consistent with the terminology used.
"vacuum_cycle_ordinal_position" has been renamed to "index_ordinal_position".

On 1/10/22, 12:30 PM, "Bossart, Nathan"  wrote:

On 1/6/22, 6:14 PM, "Imseih (AWS), Sami"  wrote:
> I am hesitant to make column name changes for obvious reasons, as it 
breaks existing tooling. However, I think there is a really good case to change 
"index_vacuum_count" as the name is confusing. "index_vacuum_cycles_completed" 
is the name I suggest if we agree to rename.
>
> For the new column, "num_indexes_to_vacuum" is good with me. 

Yeah, I think we can skip renaming index_vacuum_count for now.  In any
case, it would probably be good to discuss that in a separate thread.

Nathan


Title: 28.4. Progress Reporting
Description: rendered HTML copy of the "28.4. Progress Reporting" documentation
chapter (progress-reporting views for ANALYZE, CREATE INDEX, VACUUM, CLUSTER,
Base Backup, and COPY), attached for review.

Re: row filtering for logical replication

2022-01-10 Thread Peter Smith
On Thu, Sep 23, 2021 at 10:33 PM Tomas Vondra
 wrote:
>
> Hi,
>
> I finally had time to take a closer look at the patch again, so here's
> some review comments. The thread is moving fast, so chances are some of
> the comments are obsolete or were already raised in the past.
>
>
...

> 10) WHERE expression vs. data type
>
> Seem ATExecAlterColumnType might need some changes, because changing a
> data type for column referenced by the expression triggers this:
>
>test=# alter table t alter COLUMN c type text;
>ERROR:  unexpected object depending on column: publication of
>table t in publication p
>
>

I reproduced this same error message using the following steps.

[postgres@CentOS7-x64 ~]$ psql -d test_pub
psql (15devel)
Type "help" for help.

test_pub=# create table t1(a text primary key);
CREATE TABLE
test_pub=# create publication p1 for table t1 where (a = '123');
CREATE PUBLICATION
test_pub=# \d+ t1
                                         Table "public.t1"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           | not null |         | extended |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1" WHERE (a = '123'::text)
Access method: heap

test_pub=# alter table t1 alter column a type varchar;
2022-01-10 08:39:52.106 AEDT [2066] ERROR:  unexpected object
depending on column: publication of table t1 in publication p1
2022-01-10 08:39:52.106 AEDT [2066] STATEMENT:  alter table t1 alter
column a type varchar;
ERROR:  unexpected object depending on column: publication of table t1
in publication p1
test_pub=#

~~

But the message looks OK. What exactly was your expectation for this
review comment?

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Column Filtering in Logical Replication

2022-01-10 Thread Alvaro Herrera
On 2022-Jan-07, Peter Eisentraut wrote:

> ATExecReplicaIdentity(): Regarding the question of how to handle
> REPLICA_IDENTITY_NOTHING: I see two ways to do this.  Right now, the
> approach is that the user can set the replica identity freely, and we
> decide later based on that what we can replicate (e.g., no updates).
> For this patch, that would mean we don't restrict what columns can be
> in the column list, but we check what actions we can replicate based
> on the column list.  The alternative is that we require the column
> list to include the replica identity, as the patch is currently doing,
> which would mean that REPLICA_IDENTITY_NOTHING can be allowed since
> it's essentially a set of zero columns.
> 
> I find the current behavior a bit weird on reflection.  If a user
> wants to replicate on some columns and only INSERTs, that should be
> allowed regardless of what the replica identity columns are.

Hmm.  So you're saying that we should only raise errors about the column
list if we are publishing UPDATE or DELETE, but otherwise let the
replica identity be anything.  OK, I'll see if I can come up with a
reasonable set of rules ...

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are."  -- Charles J. Sykes' advice to teenagers




Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2022-01-10 Thread Bossart, Nathan
I noticed this thread and thought I'd share my experiences building
something similar for Multi-AZ DB clusters [0].  It's not a strict RPO
mechanism, but it does throttle backends in an effort to keep the
replay lag below a configured maximum.  I can share the code if there
is interest.

I wrote it as a new extension, and except for one piece that I'll go
into later, I was able to avoid changes to core PostgreSQL code.  The
extension manages a background worker that periodically assesses the
state of the designated standbys and updates an atomic in shared
memory that indicates how long to delay.  A transaction callback
checks this value and sleeps as necessary.  Delay can be injected for
write-enabled transactions on the primary, read-only transactions on
the standbys, or both.  The extension is heavily configurable so that
it can meet the needs of a variety of workloads.
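
A highly simplified sketch of that shape (structure and names are mine, not
the extension's actual code): the background worker publishes a delay into
shared memory, and a transaction callback in each backend sleeps for it.

    #include "postgres.h"

    #include "access/xact.h"
    #include "port/atomics.h"

    /* lives in shared memory; the background worker updates delay_ms */
    typedef struct ThrottleShared
    {
        pg_atomic_uint64 delay_ms;
    } ThrottleShared;

    static ThrottleShared *throttle_shared;

    /* transaction callback: inject the delay computed by the worker */
    static void
    throttle_xact_callback(XactEvent event, void *arg)
    {
        if (event == XACT_EVENT_PRE_COMMIT)
        {
            uint64      delay = pg_atomic_read_u64(&throttle_shared->delay_ms);

            if (delay > 0)
                pg_usleep(delay * 1000L);   /* pg_usleep() takes microseconds */
        }
    }

    /* registered once per backend, e.g. from _PG_init() */
    static void
    throttle_register(void)
    {
        RegisterXactCallback(throttle_xact_callback, NULL);
    }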

One interesting challenge I encountered was accurately determining the
amount of replay lag.  The problem was twofold.  First, if there is no
activity on the primary, there will be nothing to replay on the
standbys, so the replay lag will appear to grow unbounded.  To work
around this, the extension's background worker periodically creates an
empty COMMIT record.  Second, if a standby reconnects after a long
time, the replay lag won't be accurate for some time.  Instead, the
replay lag will slowly increase until it reaches the correct value.
Since the delay calculation looks at the trend of the replay lag, this
apparent unbounded growth causes it to inject far more delay than is
necessary.  My guess is that this is related to 9ea3c64, and maybe it
is worth rethinking that logic.  For now, the extension just
periodically reports the value of GetLatestXTime() from the standbys
to the primary to get an accurate reading.  This is done via a new
replication callback mechanism (which requires core PostgreSQL
changes).  I can share this patch along with the extension, as I bet
there are other applications for it.

I should also note that the extension only considers "active" standbys
and primaries.  That is, ones with an active WAL sender or WAL
receiver.  This avoids the need to guess what should be done during a
network partition, but it also means that we must gracefully handle
standbys reconnecting with massive amounts of lag.  The extension is
designed to slowly ramp up the amount of injected delay until the
standby's apply lag is trending down at a sufficient rate.

I see that an approach was suggested upthread for throttling based on
WAL distance instead of per-transaction.  While the transaction
approach works decently well for certain workloads (e.g., many small
transactions like those from pgbench), it might require further tuning
for very large transactions or workloads with a variety of transaction
sizes.  For that reason, I would definitely support building a way to
throttle based on WAL generation.  It might be a good idea to avoid
throttling critical activity such as anti-wraparound vacuuming, too.

Nathan

[0] 
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/multi-az-db-clusters-concepts.html



Re: sepgsql logging

2022-01-10 Thread Jacob Champion
On Wed, Apr 14, 2021 at 8:42 AM Dave Page  wrote:
> Attached is a patch to clean this up. It will log denials as such
> regardless of whether or not either selinux or sepgsql is in
> permissive mode. When either is in permissive mode, it'll add "
> permissive=1" to the end of the log messages. e.g.

Dave,

Just to clarify -- it looks like this patch *only* adds the
"permissive=1" part, right? I don't see any changes around denied-vs-
allowed.

I read the previous posts to mean that you were seeing "allowed" when
you should have been seeing "denied". I don't see that behavior --
without this patch, I see the correct "denied" entries even when
running in permissive mode. (It's been a while since the patch was
posted, so I checked to make sure there hadn't been any relevant
changes in the meantime, and none jumped out at me.)

That said, the patch looks good as-is and seems to be working for me on
a Rocky 8 VM. (You weren't kidding about the setup difficulty.) Having
permissive mode show up in the logs seems very useful.

As an aside, I don't see the "allowed" verbiage that sepgsql uses in
any of the SELinux documentation. I do see third-party references to
"granted", though, as in e.g.

avc: granted { execute } for ...

That's not something that I think this patch should touch, but it
seemed tangentially relevant for future convergence work.

On Wed, 2021-04-14 at 09:49 -0400, Robert Haas wrote:
> Looks superficially reasonable on first glance, but I think we should
> try to get an opinion from someone who knows more about SELinux.

I am not that someone, but this looks straightforward, it's been
stalled for a while, and I think it should probably go in.

--Jacob


Re: Use -fvisibility=hidden for shared libraries

2022-01-10 Thread Tom Lane
Andres Freund  writes:
> [ v2-0001-Use-hidden-visibility-for-shared-libraries-where-.patch ]

This seems like a good idea, but it's failing to apply right now,
mainly because of some cleanup I did in e04a8059a.  As far as I can
tell given the now-clarified meanings of PGDLLIMPORT/PGDLLEXPORT,
this patch shouldn't be touching PGDLLIMPORT.  The attached revision
works for me under gcc 8.5 and clang 13.

Also, it seemed like you'd maybe been more enthusiastic than necessary
about plastering PGDLLEXPORT on things.  I got through check-world
cleanly without the diffs in either ltree.h or worker_spi.c (although
I confess being confused why I didn't need the latter).  I didn't try
individually removing other diffs.  Those diffs are still in v3 below,
but we should clarify exactly which functions need marking.

regards, tom lane

diff --git a/configure b/configure
index 3f2aea0d7d..98eabe8b08 100755
--- a/configure
+++ b/configure
@@ -735,6 +735,8 @@ CPP
 CFLAGS_SL
 BITCODE_CXXFLAGS
 BITCODE_CFLAGS
+CXXFLAGS_SL_MOD
+CFLAGS_SL_MOD
 CFLAGS_VECTORIZE
 CFLAGS_UNROLL_LOOPS
 PERMIT_DECLARATION_AFTER_STATEMENT
@@ -6288,6 +6290,154 @@ if test x"$pgac_cv_prog_CC_cflags__ftree_vectorize" = x"yes"; then
 fi
 
 
+  #
+  # If the compiler knows how to hide symbols, set CFLAGS_SL_MOD
+  # to the switch needed for that, and define HAVE_VISIBILITY_ATTRIBUTE.
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MOD" >&5
+$as_echo_n "checking whether ${CC} supports -fvisibility=hidden, for CFLAGS_SL_MOD... " >&6; }
+if ${pgac_cv_prog_CC_cflags__fvisibility_hidden+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS_SL_MOD} -fvisibility=hidden"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__fvisibility_hidden=yes
+else
+  pgac_cv_prog_CC_cflags__fvisibility_hidden=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__fvisibility_hidden" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__fvisibility_hidden" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__fvisibility_hidden" = x"yes"; then
+  CFLAGS_SL_MOD="${CFLAGS_SL_MOD} -fvisibility=hidden"
+fi
+
+
+  if test "$pgac_cv_prog_CC_cflags__fvisibility_hidden" = yes; then
+
+$as_echo "#define HAVE_VISIBILITY_ATTRIBUTE 1" >>confdefs.h
+
+  fi
+  # For C++ we additionally want -fvisibility-inlines-hidden
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MOD" >&5
+$as_echo_n "checking whether ${CXX} supports -fvisibility=hidden, for CXXFLAGS_SL_MOD... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__fvisibility_hidden+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS_SL_MOD} -fvisibility=hidden"
+ac_save_cxx_werror_flag=$ac_cxx_werror_flag
+ac_cxx_werror_flag=yes
+ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=yes
+else
+  pgac_cv_prog_CXX_cxxflags__fvisibility_hidden=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+ac_cxx_werror_flag=$ac_save_cxx_werror_flag
+CXXFLAGS="$pgac_save_CXXFLAGS"
+CXX="$pgac_save_CXX"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&5
+$as_echo "$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" >&6; }
+if test x"$pgac_cv_prog_CXX_cxxflags__fvisibility_hidden" = x"yes"; then
+  CXXFLAGS_SL_MOD="${CXXFLAGS_SL_MOD} -fvisibility=hidden"
+fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MOD" >&5
+$as_echo_n "checking whether ${CXX} supports -fvisibility-inlines-hidden, for CXXFLAGS_SL_MOD... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__fvisibility_inlines_hidden+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CXXFLAGS=$CXXFLAGS
+pgac_save_CXX=$CXX
+CXX=${CXX}
+CXXFLAGS="${CXXFLAGS_SL_MOD} -fvisibility-inlines-hidden"

Re: null iv parameter passed to combo_init()

2022-01-10 Thread Zhihong Yu
On Sun, Jan 9, 2022 at 6:45 PM Zhihong Yu  wrote:

>
>
> On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu  wrote:
>
>>
>>
>> On Sun, Jan 9, 2022 at 12:38 PM Zhihong Yu  wrote:
>>
>>>
>>>
>>> On Sun, Jan 9, 2022 at 8:48 AM Noah Misch  wrote:
>>>
 On Sun, Jan 09, 2022 at 04:37:32AM -0800, Zhihong Yu wrote:
 > On Sat, Jan 8, 2022 at 11:32 PM Tom Lane  wrote:
 > > Noah Misch  writes:
 > > > On further thought, I would write it this way:
 > >
 > > > - else
 > > > + else if (ivlen != 0)
 > > >   memcpy(ivbuf, iv, ivlen);
 > >
 > > FWIW, I liked the "ivlen > 0" formulation better.  They should be
 > > equivalent, because ivlen is unsigned, but it just seems like "> 0"
 > > is more natural.

 If I were considering the one code site in isolation, I'd pick "ivlen >
 0".
 But of the four sites identified so far, three have signed length
 variables.
 Since we're likely to get more examples of this pattern, some signed
 and some
 unsigned, I'd rather use a style that does the optimal thing whether or
 not
 the variable is signed.  What do you think?

 > Patch v4 is attached.

 Does this pass the test procedure shown upthread?

>>> Hi,
>>> I installed gcc 4.9.3
>>>
>>> When I ran:
>>> ./configure CFLAGS='-fsanitize=undefined
>>> -fsanitize-undefined-trap-on-error'
>>>
>>> I saw:
>>>
>>> configure:3977: $? = 0
>>> configure:3966: gcc -V >&5
>>> gcc: error: unrecognized command line option '-V'
>>> gcc: fatal error: no input files
>>> compilation terminated.
>>> configure:3977: $? = 1
>>> configure:3966: gcc -qversion >&5
>>> gcc: error: unrecognized command line option '-qversion'
>>> gcc: fatal error: no input files
>>> compilation terminated.
>>> configure:3977: $? = 1
>>> configure:3997: checking whether the C compiler works
>>> configure:4019: gcc -fsanitize=undefined
>>> -fsanitize-undefined-trap-on-error   conftest.c  >&5
>>> gcc: error: unrecognized command line option
>>> '-fsanitize-undefined-trap-on-error'
>>> configure:4023: $? = 1
>>> configure:4061: result: no
>>>
>>> I wonder if a higher version gcc is needed.
>>>
>>> FYI
>>>
>>
>> After installing gcc-11, ./configure passed (with 0003-memcpy-null.patch).
>> In the output of `make check-world`, I don't see `runtime error`.
>> Though there was a crash (maybe specific to my machine):
>>
>> Core was generated by
>> `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres
>> --singl'.
>> Program terminated with signal SIGILL, Illegal instruction.
>> #0  0x0050642d in write_item.cold ()
>> Missing separate debuginfos, use: debuginfo-install
>> glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64
>> sssd-client-1.16.5-10.el7_9.10.x86_64
>> (gdb) bt
>> #0  0x0050642d in write_item.cold ()
>> #1  0x00ba9d1b in write_relcache_init_file ()
>> #2  0x00bb58f7 in RelationCacheInitializePhase3 ()
>> #3  0x00bd5cb5 in InitPostgres ()
>> #4  0x00a0a9ea in PostgresMain ()
>>
>> FYI
>>
> Hi,
> Earlier I was using devtoolset-11 which had an `Illegal instruction` error.
>
> I compiled / installed gcc-11 from source (which took whole afternoon).
> `make check-world` passed with patch v3.
> In tmp_install/log/install.log, I saw:
>
> gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
> -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
> -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
> -I../../src/include  -D_GNU_SOURCE   -c -o path.o path.c
> rm -f libpgport.a
>

Hi, Noah:
Patch v3 passes `make check-world`

Can you take another look ?


Re: Avoiding smgrimmedsync() during nbtree index builds

2022-01-10 Thread Melanie Plageman
I have attached a v3 which includes two commits -- one of which
implements the directmgr API and uses it and the other which adds
functionality to use either directmgr or bufmgr API during index build.

Also registering for the March commitfest.

On Thu, Dec 9, 2021 at 2:33 PM Andres Freund  wrote:
>
> Hi,
>
> On 2021-11-19 15:11:57 -0500, Melanie Plageman wrote:
> > From 2130175c5d794f60c5f15d6eb1b626c81fb7c39b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman 
> > Date: Thu, 15 Apr 2021 07:01:01 -0400
> > Subject: [PATCH v2] Index build avoids immed fsync
> >
> > Avoid immediate fsync for just built indexes either by using shared
> > buffers or by leveraging checkpointer's SyncRequest queue. When a
> > checkpoint begins during the index build, if not using shared buffers,
> > the backend will have to do its own fsync.
> > ---
> >  src/backend/access/nbtree/nbtree.c  |  39 +++---
> >  src/backend/access/nbtree/nbtsort.c | 186 +++-
> >  src/backend/access/transam/xlog.c   |  14 +++
> >  src/include/access/xlog.h   |   1 +
> >  4 files changed, 188 insertions(+), 52 deletions(-)
> >
> > diff --git a/src/backend/access/nbtree/nbtree.c 
> > b/src/backend/access/nbtree/nbtree.c
> > index 40ad0956e0..a2e32f18e6 100644
> > --- a/src/backend/access/nbtree/nbtree.c
> > +++ b/src/backend/access/nbtree/nbtree.c
> > @@ -150,30 +150,29 @@ void
> >  btbuildempty(Relation index)
> >  {
> >   Pagemetapage;
> > + Buffer metabuf;
> >
> > - /* Construct metapage. */
> > - metapage = (Page) palloc(BLCKSZ);
> > - _bt_initmetapage(metapage, P_NONE, 0, _bt_allequalimage(index, 
> > false));
> > -
> > + // TODO: test this.
>
> Shouldn't this path have plenty coverage?

Yep. Sorry.

> >   /*
> > -  * Write the page and log it.  It might seem that an immediate sync 
> > would
> > -  * be sufficient to guarantee that the file exists on disk, but 
> > recovery
> > -  * itself might remove it while replaying, for example, an
> > -  * XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record.  Therefore, we need
> > -  * this even when wal_level=minimal.
> > +  * Construct metapage.
> > +  * Because we don't need to lock the relation for extension (since
> > +  * noone knows about it yet) and we don't need to initialize the
> > +  * new page, as it is done below by _bt_blnewpage(), _bt_getbuf()
> > +  * (with P_NEW and BT_WRITE) is overkill.
>
> Isn't the more relevant operation the log_newpage_buffer()?

Returning to this after some time away, many of my comments no longer
make sense to me either. I can't actually tell which diff your question
applies to because this comment was copy-pasted in two different places
in my code. Also, I've removed this comment and added new ones. So,
given all that, is there still something about log_newpage_buffer() I
should be commenting on?

> > However, it might be worth
> > +  * either modifying it or adding a new helper function instead of
> > +  * calling ReadBufferExtended() directly. We pass mode 
> > RBM_ZERO_AND_LOCK
> > +  * because we want to hold an exclusive lock on the buffer content
> >*/
>
> "modifying it" refers to what?
>
> I don't see a problem using ReadBufferExtended() here. Why would you like to
> replace it with something else?

I would just disregard these comments now.

> > + /*
> > +  * Based on the number of tuples, either create a buffered or 
> > unbuffered
> > +  * write state. if the number of tuples is small, make a buffered 
> > write
> > +  * if the number of tuples is larger, then we make an unbuffered 
> > write state
> > +  * and must ensure that we check the redo pointer to know whether or 
> > not we
> > +  * need to fsync ourselves
> > +  */
> >
> >   /*
> >* Finish the build by (1) completing the sort of the spool file, (2)
> >* inserting the sorted tuples into btree pages and (3) building the 
> > upper
> >* levels.  Finally, it may also be necessary to end use of 
> > parallelism.
> >*/
> > - _bt_leafbuild(buildstate.spool, buildstate.spool2);
> > + if (reltuples > 1000)
>
> I'm ok with some random magic constant here, but it seems worht putting it in
> some constant / #define to make it more obvious.

Done.

> > + _bt_leafbuild(buildstate.spool, buildstate.spool2, false);
> > + else
> > + _bt_leafbuild(buildstate.spool, buildstate.spool2, true);
>
> Why duplicate the function call?

Fixed.

> >  /*
> >   * allocate workspace for a new, clean btree page, not linked to any 
> > siblings.
> > + * If index is not built in shared buffers, buf should be InvalidBuffer
> >   */
> >  static Page
> > -_bt_blnewpage(uint32 level)
> > +_bt_blnewpage(uint32 level, Buffer buf)
> >  {
> >   Pagepage;
> >   BTPageOpaque opaque;
> >
> > - page = (Page) palloc(BLCKSZ);
> > + if (buf)
> > + page = BufferGetPage(buf);
> > 

Re: CREATEROLE and role ownership hierarchies

2022-01-10 Thread Andrew Dunstan


On 1/5/22 19:05, Mark Dilger wrote:
>
>> On Jan 4, 2022, at 12:47 PM, Joshua Brindle  
>> wrote:
>>
>>> I was able to reproduce that using REASSIGN OWNED BY to cause a user to own 
>>> itself.  Is that how you did it, or is there yet another way to get into 
>>> that state?
>> I did:
>> ALTER ROLE brindle OWNER TO brindle;
> Ok, thanks.  I have rebased, fixed both REASSIGN OWNED BY and ALTER ROLE .. 
> OWNER TO cases, and added regression coverage for them.
>
> The last patch set to contain significant changes was v2, with v3 just being 
> a rebase.  Relative to those sets:
>
> 0001 -- rebased.
> 0002 -- rebased; extend AlterRoleOwner_internal to disallow making a role its 
> own immediate owner.
> 0003 -- rebased; extend AlterRoleOwner_internal to disallow cycles in the 
> role ownership graph.
> 0004 -- rebased.
> 0005 -- new; removes the broken pg_auth_members.grantor field.


In general this looks good. Some nitpicks:


+/*
+ * Ownership check for a role (specified by OID)
+ */
+bool
+pg_role_ownercheck(Oid role_oid, Oid roleid)


This is a bit confusing. Let's rename these params so it's clear which
is the owner and which the owned role.


+ * Note: In versions prior to PostgreSQL version 15, roles did not have
owners
+ * per se; instead we used this test in places where an ownership-like
+ * permissions test was needed for a role.


No need to talk about what we used to do. People who want to know can
look back at older branches.


+bool
+has_rolinherit_privilege(Oid roleid)
+{


This and similar functions should have header comments.


+   /* Owners of roles have every privilge the owned role has */

s/privilge/privilege/


+CREATE ROLE regress_role_1 CREATEDB CREATEROLE REPLICATION BYPASSRLS;


I don't really like this business of just numbering large numbers of
roles in the tests. Let's give them more meaningful names.


+   Role owners can change any of these settings on roles they own except


I would say "on roles they directly or indirectly own", here and
similarly in one or two other places.


...


I will probably do one or two more passes over the patches, but as I say
in general they look fairly good.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Adding CI to our tree

2022-01-10 Thread Justin Pryzby
On Sun, Jan 09, 2022 at 11:57:44AM -0800, Andres Freund wrote:
> On 2022-01-09 13:16:50 -0600, Justin Pryzby wrote:
> > diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
> > index 9a31e0b8795..14fd847ba7f 100644
> > --- a/contrib/test_decoding/Makefile
> > +++ b/contrib/test_decoding/Makefile
> > @@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup 
> > concurrent_ddl_dml \
> > oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
> > twophase_snapshot
> >  
> > -REGRESS_OPTS = --temp-config 
> > $(top_srcdir)/contrib/test_decoding/logical.conf
> > +REGRESS_OPTS = 
> > --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
> >  ISOLATION_OPTS = --temp-config 
> > $(top_srcdir)/contrib/test_decoding/logical.conf
> 
> Not sure why these are part of the diff?

Because otherwise vcregress runs pg_regress --temp-config test1 test2 [...],
which means test1 gets eaten as the argument to --temp-config.

> > diff --git a/src/tools/ci/pg_ci_base.conf b/src/tools/ci/pg_ci_base.conf
> > index d8faa9c26c1..52cdb697a57 100644
> > --- a/src/tools/ci/pg_ci_base.conf
> > +++ b/src/tools/ci/pg_ci_base.conf
> > @@ -12,3 +12,24 @@ log_connections = true
> >  log_disconnections = true
> >  log_line_prefix = '%m [%p][%b] %q[%a][%v:%x] '
> >  log_lock_waits = true
> > +
> > +# test_decoding
> > +wal_level = logical
> > +max_replication_slots = 4
> > +logical_decoding_work_mem = 64kB
> > [ more ]
> 
> This doesn't really seem like a scalable path forward - duplicating
> configuration in more places doesn't seem sane. It seems it'd make more sense
> to teach vcregress.pl to run NO_INSTALLCHECK targets properly? ISTM that
> changing the options passed to pg_regress based on fetchTests() return value
> wouldn't be too hard?

It needs to run the tests with a separate instance.  Maybe you're suggesting
using --temp-instance.

It needs to avoid running on the buildfarm, right?

-- 
Justin
>From 0818f79b27de42182e26dd9dad991de8258c8238 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 9 Jan 2022 18:25:02 -0600
Subject: [PATCH 1/3] vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1

---
 .cirrus.yml|  4 +-
 contrib/pg_stat_statements/Makefile|  2 +-
 contrib/test_decoding/Makefile |  2 +-
 src/test/modules/snapshot_too_old/Makefile |  2 +-
 src/test/modules/worker_spi/Makefile   |  2 +-
 src/tools/msvc/vcregress.pl| 46 +++---
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..02ea7e67189 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -398,9 +398,9 @@ task:
   test_isolation_script:
 - perl src/tools/msvc/vcregress.pl isolationcheck
   test_modules_script:
-- perl src/tools/msvc/vcregress.pl modulescheck
+- perl src/tools/msvc/vcregress.pl modulescheck install
   test_contrib_script:
-- perl src/tools/msvc/vcregress.pl contribcheck
+- perl src/tools/msvc/vcregress.pl contribcheck install
   stop_script:
 - tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log
   test_ssl_script:
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38d..d732e1ade73 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -15,7 +15,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"
 
 LDFLAGS_SL += $(filter -lm, $(LIBS))
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = pg_stat_statements oldextversions
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 9a31e0b8795..14fd847ba7f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -10,7 +10,7 @@ ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
 	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
 	twophase_snapshot
 
-REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+REGRESS_OPTS = --temp-config=$(top_srcdir)/contrib/test_decoding/logical.conf
 ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
 
 # Disabled because these tests require "wal_level=logical", which
diff --git a/src/test/modules/snapshot_too_old/Makefile b/src/test/modules/snapshot_too_old/Makefile
index dfb4537f63c..752a0039fdc 100644
--- a/src/test/modules/snapshot_too_old/Makefile
+++ b/src/test/modules/snapshot_too_old/Makefile
@@ -5,7 +5,7 @@
 EXTRA_CLEAN = $(pg_regress_clean_files)
 
 ISOLATION = sto_using_cursor sto_using_select sto_using_hash_index
-ISOLATION_OPTS = --temp-config 

Re: preserve timestamps when installing headers

2022-01-10 Thread Tom Lane
Peter Eisentraut  writes:
> I don't think preserving timestamps should be the default behavior, but 
> I would support organizing things so that additional options can be 
> passed to "install" to make it do whatever the user prefers.  But that 
> won't work if some installations don't go through install.

Check, but ...

> Btw., a quick test of make -C src/include/ install:
> cp (current code): 0.5 s
> GNU install: 0.6 s
> install-sh: 12.5 s

So this says that there's only a performance issue with install-sh;
but that's used by just a tiny minority of systems anymore.  Scraping
the buildfarm's configure results, I find this many animals reporting
each of these choices:

  4 /bin/install -c
  8 config/install-sh -c
  2 /opt/packages/coreutils-8.6/inst/bin/install -c
  1 /usr/bin/ginstall -c
100 /usr/bin/install -c
  1 /usr/gnu/bin/install -c

The 8 holdouts are

gaur
haddock
hake
hornet
hoverfly
mandrill
sungazer
tern

ie, ancient HPUX, OpenIndiana (Solaris), and AIX, none of which
are likely development platforms anymore --- and if somebody
did care about this, there's nothing stopping them from
installing GNU install on their machine.

So I fear we're optimizing for a case that stopped being mainstream
a decade or more back.  I could get behind switching the code back
to using $(INSTALL) for this, and then offering some way to inject
user-selected switches into the $(INSTALL) invocations.  That
wouldn't need much more than another gmake macro.  (Does there
need to be a way to inject such switches only into header
installations, or is it OK to do it across the board?)

[ wanders away wondering how this'd affect the meson conversion
project ]

regards, tom lane




Re: Column Filtering in Logical Replication

2022-01-10 Thread Alvaro Herrera
In this version I have addressed these points, except the REPLICA
IDENTITY NOTHING stuff.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)
>From e907582e4cd249850dc3784fa4e21b8c1448e99f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 6 Sep 2021 10:34:29 -0300
Subject: [PATCH v16] Support column lists for logical replication of tables

Add the capability of specifying a column list for individual tables as
part of a publication.  Columns not in the list are not published.  This
enables replicating to a table with only a subset of the columns.

If no column list is specified, all the columns are replicated, as
previously

Author: Rahila Syed 
Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektnrh8nj1jf...@mail.gmail.com
---
 doc/src/sgml/catalogs.sgml  |  13 +
 doc/src/sgml/protocol.sgml  |   4 +-
 doc/src/sgml/ref/alter_publication.sgml |  20 +-
 doc/src/sgml/ref/create_publication.sgml|  11 +-
 src/backend/catalog/pg_publication.c| 321 +++-
 src/backend/commands/publicationcmds.c  |  67 +++-
 src/backend/commands/tablecmds.c|  79 -
 src/backend/nodes/copyfuncs.c   |   1 +
 src/backend/nodes/equalfuncs.c  |   1 +
 src/backend/parser/gram.y   |  60 +++-
 src/backend/replication/logical/proto.c |  66 ++--
 src/backend/replication/logical/tablesync.c | 119 +++-
 src/backend/replication/pgoutput/pgoutput.c | 118 ++-
 src/bin/pg_dump/pg_dump.c   |  41 ++-
 src/bin/pg_dump/pg_dump.h   |   1 +
 src/bin/psql/describe.c |  26 +-
 src/bin/psql/tab-complete.c |   2 +
 src/include/catalog/pg_publication.h|   5 +
 src/include/catalog/pg_publication_rel.h|   3 +
 src/include/nodes/parsenodes.h  |   4 +-
 src/include/replication/logicalproto.h  |   6 +-
 src/test/regress/expected/publication.out   |  36 ++-
 src/test/regress/sql/publication.sql|  22 +-
 src/test/subscription/t/028_column_list.pl  | 164 ++
 24 files changed, 1105 insertions(+), 85 deletions(-)
 create mode 100644 src/test/subscription/t/028_column_list.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 03e2537b07..b7b75f64a2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6311,6 +6311,19 @@ SCRAM-SHA-256$iteration count:
Reference to relation
   
  
+
+ 
+  
+   prattrs int2vector
+   (references pg_attribute.attnum)
+  
+  
+   This is an array of values that indicates which table columns are
+   part of the publication.  For example a value of 1 3
+   would mean that the first and the third table columns are published.
+   A null value indicates that all attributes are published.
+  
+ 
 

   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 34a7034282..5bc2e7a591 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -6877,7 +6877,9 @@ Relation
 
 
 
-Next, the following message part appears for each column (except generated columns):
+Next, the following message part appears for each column (except
+generated columns and other columns that don't appear in the column
+filter list, for tables that have one):
 
 
 
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index bb4ef5e5e2..d0e97243b8 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -25,12 +25,13 @@ ALTER PUBLICATION name ADD name SET publication_object [, ...]
 ALTER PUBLICATION name DROP publication_object [, ...]
 ALTER PUBLICATION name SET ( publication_parameter [= value] [, ... ] )
+ALTER PUBLICATION name ALTER TABLE publication_object SET COLUMNS { ( name [, ...] ) | ALL }
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
 
 where publication_object is one of:
 
-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name, [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
@@ -62,6 +63,11 @@ ALTER PUBLICATION name RENAME TO 
 
+  
+   The ALTER TABLE ... SET COLUMNS variant allows changing
+   the set of columns that are included in the publication.
+  
+
   
The remaining variants change the owner and the name of the publication.
   
@@ -110,6 +116,8 @@ ALTER PUBLICATION name RENAME TO * can be specified after the table
   name to explicitly indicate that descendant tables are included.
+  Optionally, a column list can be specified.  See  for details.
  
 

@@ -164,9 +172,15 @@ ALTER PUBLICATION 

Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Tom Lane
Thomas Munro  writes:
> Hmm, one thing I'm still unclear on: did this problem really start
> with 6051857fc/ed52c3707?  My initial email in this thread lists
> similar failures going back further, doesn't it?  (And what's tern
> doing mixed up in this mess?)

Well, those earlier ones may be committs failures, but a lot of
them contain different-looking symptoms, eg pg_ctl failures.

tern's failure at
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern=2021-07-17+10%3A57%3A49
does look similar, but we can see in its log that the standby
*did* notice the primary disconnection and then reconnect:

2021-07-17 16:29:08.248 UTC [17498380:2] LOG:  replication terminated by 
primary server
2021-07-17 16:29:08.248 UTC [17498380:3] DETAIL:  End of WAL reached on 
timeline 1 at 0/30378F8.
2021-07-17 16:29:08.248 UTC [17498380:4] FATAL:  could not send 
end-of-streaming message to primary: no COPY in progress
2021-07-17 16:29:08.248 UTC [25166230:5] LOG:  invalid record length at 
0/30378F8: wanted 24, got 0
2021-07-17 16:29:08.350 UTC [16318578:1] FATAL:  could not connect to the 
primary server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2021-07-17 16:29:36.369 UTC [7077918:1] FATAL:  could not connect to the 
primary server: FATAL:  the database system is starting up
2021-07-17 16:29:36.380 UTC [11338028:1] FATAL:  could not connect to the 
primary server: FATAL:  the database system is starting up
...
2021-07-17 16:29:36.881 UTC [17367092:1] LOG:  started streaming WAL from 
primary at 0/300 on timeline 1

So I'm not sure what happened there, but it's not an instance
of this problem.  One thing that looks a bit suspicious is
this in the primary's log:

2021-07-17 16:26:47.832 UTC [12386550:1] LOG:  using stale statistics instead 
of current ones because stats collector is not responding

which makes me wonder if the timeout is down to out-of-date
pg_stats data.  The loop in 002_standby.pl doesn't appear to
depend on the stats collector:

my $primary_lsn =
  $primary->safe_psql('postgres', 'select pg_current_wal_lsn()');
$standby->poll_query_until('postgres',
qq{SELECT '$primary_lsn'::pg_lsn <= pg_last_wal_replay_lsn()})
  or die "standby never caught up";

but maybe I'm missing the connection.

Apropos of that, it's worth noting that wait_for_catchup *is*
dependent on up-to-date stats, and here's a recent run where
it sure looks like the timeout cause is AWOL stats collector:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2022-01-10%2004%3A51%3A34

I wonder if we should refactor wait_for_catchup to probe the
standby directly instead of relying on the upstream's view.

regards, tom lane




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Andrew Dunstan


On 1/10/22 15:52, Thomas Munro wrote:
>
> Hmm, one thing I'm still unclear on: did this problem really start
> with 6051857fc/ed52c3707?  My initial email in this thread lists
> similar failures going back further, doesn't it?  (And what's tern
> doing mixed up in this mess?)



Your list contains at least some false positives. e.g.

which has a different script failing.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Thomas Munro
On Tue, Jan 11, 2022 at 6:00 AM Alexander Lakhin  wrote:
> 10.01.2022 12:40, Thomas Munro wrote:
> > This is super quick-and-dirty code (and doesn't handle some errors or
> > socket changes correctly), but does it detect the closed socket?

> Yes, it fixes the behaviour and makes the 002_standby test pass (100 of
> 100 iterations).

Thanks for testing.  That result does seem to confirm the hypothesis
that FD_CLOSE is reported only once for the socket on graceful
shutdown (that is, it's edge-triggered and incidentally you won't get
FD_READ), so you need to keep track of it carefully.  Incidentally,
another observation is that your WSAPoll() test appears to be
returning POLLHUP where at least Linux, FreeBSD and Solaris would not:
a socket that is only half shut down (the primary shut down its end
gracefully, but walreceiver did not), so I suspect Windows' POLLHUP
might have POLLRDHUP semantics.

> I'm yet to find out whether the other
> WaitLatchOrSocket users (e.g. postgres_fdw) can suffer from the
> disconnected socket state, but this approach definitely works for
> walreceiver.

I see where you're going: there might be safe call sequences and
unsafe call sequences, and maybe walreceiver is asking for trouble by
double-polling.  I'm not sure about that; I got the impression
recently that it's possible to get FD_CLOSE while you still have
buffered data to read, so then the next recv() will return > 0 and
then we don't have any state left anywhere to remember that we saw
FD_CLOSE, even if you're careful to poll and read in the ideal
sequence.  I could be wrong, and it would be nice if there is an easy
fix along those lines...  The documentation around FD_CLOSE is
unclear.
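To make that concrete, here is a sketch of the sort of bookkeeping being
discussed (illustrative only, not the proposed patch; the event handle is
assumed to have been created elsewhere with WSACreateEvent()).  The idea is to
latch FD_CLOSE in a per-socket flag the first time it is reported, so later
waits still see the peer's graceful shutdown.

#include <winsock2.h>
#include <stdbool.h>

typedef struct win_socket_state
{
    SOCKET      sock;
    WSAEVENT    event;          /* created once with WSACreateEvent() */
    bool        seen_fd_close;  /* FD_CLOSE is reported only once; latch it */
} win_socket_state;

static bool
wait_for_readable(win_socket_state *st, DWORD timeout_ms)
{
    WSANETWORKEVENTS ev;

    /* Once the peer has closed, report "readable" forever after. */
    if (st->seen_fd_close)
        return true;

    /* Note: WSAEventSelect() also puts the socket into non-blocking mode. */
    if (WSAEventSelect(st->sock, st->event, FD_READ | FD_CLOSE) != 0)
        return false;

    if (WaitForSingleObject(st->event, timeout_ms) != WAIT_OBJECT_0)
        return false;           /* timeout (or error) */

    /* Passing the event handle here also resets it. */
    if (WSAEnumNetworkEvents(st->sock, st->event, &ev) != 0)
        return false;

    if (ev.lNetworkEvents & FD_CLOSE)
        st->seen_fd_close = true;   /* it will not be reported again */

    return (ev.lNetworkEvents & (FD_READ | FD_CLOSE)) != 0;
}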

I do plan to make a higher quality patch like the one I showed
(material from earlier unfinished work[1] that needs a bit more
infrastructure), but to me that's new feature/efficiency work, not
something we'd want to back-patch.

Hmm, one thing I'm still unclear on: did this problem really start
with 6051857fc/ed52c3707?  My initial email in this thread lists
similar failures going back further, doesn't it?  (And what's tern
doing mixed up in this mess?)

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJPaygh-6WHEd0FnH89GrkTpVyN_ew9ckv3%2BnwjmLcSeg%40mail.gmail.com#aa33ec3e7ad85499f35dd1434a139c3f




Re: [Ext:] Re: Stream Replication not working

2022-01-10 Thread Allie Crawford
Thank you so much for your help on this Satya. I have detailed right below the 
output of the query you asked me to run.

Master

postgresql@ ~>psql

psql (13.5)

Type "help" for help.



postgresql=# select * from pg_locks;

  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |   pid   |      mode       | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+-----------------+---------+----------
 relation   |    16384 |    12141 |      |       |            |               |         |       |          | 3/6715             | 2669949 | AccessShareLock | t       | t
 virtualxid |          |          |      |       | 3/6715     |               |         |       |          | 3/6715             | 2669949 | ExclusiveLock   | t       | t
(2 rows)



postgresql=#


Standby
postgresql@ ~>psql
psql (13.5)
Type "help" for help.

postgresql=# select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid   |      mode       | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+-----------------+---------+----------
 relation   |    16384 |    12141 |      |       |            |               |         |       |          | 2/50               | 642064 | AccessShareLock | t       | t
 virtualxid |          |          |      |       | 2/50       |               |         |       |          | 2/50               | 642064 | ExclusiveLock   | t       | t
 virtualxid |          |          |      |       | 1/1        |               |         |       |          | 1/0                |  17333 | ExclusiveLock   | t       | t
(3 rows)

postgresql=#




From: SATYANARAYANA NARLAPURAM 
Date: Monday, January 10, 2022 at 1:06 PM
To: Allie Crawford 
Cc: pgsql-hackers@lists.postgresql.org 
Subject: [Ext:] Re: Stream Replication not working
[External Email]
Seems there is a problem with the replay on your standby. Either it is too slow 
or stuck behind some locks ( replay_lag of 20:38:47.00904 indicates this and 
the flush_lsn is the same as lsn on primary ) . Run pg_locks to see if the 
replay is stuck behind a lock.



On Mon, Jan 10, 2022 at 11:53 AM Allie Crawford 
mailto:crawfor...@churchofjesuschrist.org>> 
wrote:
Hi All,
I have implemented Stream replication in one of my environments, and for some 
reason even though all the health checks are showing that the replication is 
working, when I run manual tests to see if changes are being replicated, the 
changes are not replicated to the standby postgresql environment. I have been 
researching for two days and I cannot find any documentation that talks about 
the case I am running into. I will appreciate if anybody could take a look at 
the details I have detailed below and give me some guidance on where the 
problem might be that is preventing my changes for being replicated. Even 
though I was able to instantiate the standby while firewalld was enabled, I 
decided to disable it just in case that it was causing any issue to the manual 
changes, but disabling firewalld has not had any effect, I am still not able to 
get the manual changes test to be replicated to the standby site. As you will 
see in the details below, the streaming is working, both sites are in sync to 
the latest WAL but for some reasons the latest changes are not on the standby 
site. How is it possible that the standby site is completely in sync but yet 
does not contain the latest changes?

Thanks in advance for any help you can give me with this problem.

Regards,
Allie

Details:

Master postgresql Environment

postgresql=# select * from pg_stat_replication;

-[ RECORD 1 ]+--

pid  | 1979089

usesysid | 16404

usename  | replacct

application_name | walreceiver

client_addr  | 

client_hostname  | 

client_port  | 55096

backend_start| 2022-01-06 17:29:51.542784-07

backend_xmin |

state| streaming

sent_lsn | 0/35000788

write_lsn| 0/35000788

flush_lsn| 0/35000788

replay_lsn   | 0/31000500

write_lag| 00:00:00.001611

flush_lag| 00:00:00.001693

replay_lag   | 20:38:47.00904

sync_priority| 1

sync_state   | sync

reply_time   | 2022-01-07 14:11:58.996277-07



postgresql=#


postgresql=# select * from pg_roles;

  rolname  | rolsuper | rolinherit | rolcreaterole | 
rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | 
rolvaliduntil | rolbypassrls | rolconfig |  oid


Re: [PoC] Delegating pg_ident to a third party

2022-01-10 Thread Stephen Frost
Greetings,

* Jacob Champion (pchamp...@vmware.com) wrote:
> On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> > On Tue, Jan 4, 2022 at 18:56 Jacob Champion  wrote:
> > > 
> > > Could you talk more about the use cases for which having the "actual
> > > user" is better? From an auditing perspective I don't see why
> > > "authenticated as ja...@example.net, logged in as admin" is any worse
> > > than "logged in as jacob".
> > 
> > The above case isn’t what we are talking about, as far as I
> > understand anyway. You’re suggesting “authenticated as 
> > ja...@example.net, logged in as sales” where the user in the database
> > is “sales”.  Consider triggers which only have access to “sales”, or
> > a tool like pgaudit which only has access to “sales”.
> 
> Okay. So an additional getter function in miscadmin.h, and surfacing
> that function to trigger languages, are needed to make authn_id more
> generally useful. Any other cases you can think of?

That would help but now you've got two different things that have to be
tracked, potentially, because for some people you might not want to use
their system auth'd-as ID.  I don't see that as a great solution but
rather as a workaround.  Yes, we should also do this, but it's really an
argument for how to deal with such a setup, not a justification for
going down this route.
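(For concreteness, the kind of getter being discussed could be sketched as
below; the function name is hypothetical and this is not from either patch
set, it merely exposes the authenticated identity the backend already tracks.)

/* Hypothetical getter; the name GetAuthenticatedIdentity is invented here. */
#include "postgres.h"

#include "libpq/libpq-be.h"
#include "miscadmin.h"

const char *
GetAuthenticatedIdentity(void)
{
    /* NULL when there is no client connection or no authenticated identity */
    if (MyProcPort == NULL || MyProcPort->authn_id == NULL)
        return NULL;
    return MyProcPort->authn_id;
}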

> > > I was responding more to your statement that "Being able to have roles
> > > and memberships automatically created is much more the direction that
> > > I'd say we should be going in". It's not that one-role-per-user is
> > > inherently wasteful, but forcing role proliferation where it's not
> > > needed is. If all users have the same set of permissions, there doesn't
> > > need to be more than one role. But see below.
> > 
> > Just saying it’s wasteful isn’t actually saying what is wasteful about it.
> 
> Well, I felt like it was irrelevant; you've already said you have no
> intention to force one-user-per-role.

Forcing one-user-per-role would be breaking things we already support
so, no, I certainly don't have any intention of requiring such a change.
That said, I do feel it's useful to have these discussions.

> But to elaborate: *forcing* one-user-per-role is wasteful, because if I
> have a thousand employees, and I want to give all my employees access
> to a guest role in the database, then I have to administer a thousand
> roles: maintaining them through dump/restores and pg_upgrades, auditing
> them to figure out why Bob in Accounting somehow got a different
> privilege GRANT than the rest of the users, adding new accounts,
> purging old ones, maintaining the inevitable scripts that will result.

pg_upgrade just handles it, no?  pg_dumpall -g does too.  Having to deal
with roles in general is a pain but the number of them isn't necessarily
an issue.  A guest role which doesn't have any auditing requirements
might be a decent use-case for what you're talking about here but I
don't know that we'd implement this for just that case.  Part of this
discussion was specifically about addressing the other challenges- like
having automation around the account addition/removal and sync'ing role
membership too.  As for auditing privileges, that should be done
regardless and the case you outline isn't somehow different from others
(the same could be as easily said for how the 'guest' account got access
to whatever it did).

> If none of the users need to be "special" in any way, that's all wasted
> overhead. (If they do actually need to be special, then at least some
> of that overhead becomes necessary. Otherwise it's waste.) You may be
> able to mitigate the cost of the waste, or absorb the mitigations into
> Postgres so that the user can't see the waste, or decide that the waste
> is not costly enough to care about. It's still waste.

Except the amount of 'wasted' overhead being claimed here seems to be
hardly any.  The biggest complaint levied at this seems to really be
just the issues around the load on the ldap systems from having to deal
with the frequent sync queries, and that's largely a solvable issue in
the majority of environments out there today.

> > > > I'm also not suggesting that we make everyone do the same
> > > > thing, indeed, later on I was supportive of having an external system
> > > > provide the mapping.  Here, I'm just making the point that we should
> > > > also be looking at automatic role/membership creation.
> > > 
> > > Gotcha. Agreed; that would open up the ability to administer role
> > > privileges externally too, which would be cool. That could be used in
> > > tandem with something like this patchset.
> > 
> > Not sure exactly what you’re referring to here by “administer role
> > privileges externally too”..?  Curious to hear what you are imagining
> > specifically.
> 
> Just that it would be nice to centrally provision role GRANTs as well
> as role membership, that's all. No specifics in mind, and I'm not even
> sure if LDAP would be a helpful place 

Re: Stream Replication not working

2022-01-10 Thread SATYANARAYANA NARLAPURAM
Seems there is a problem with the replay on your standby. Either it is too
slow or stuck behind some locks ( replay_lag of 20:38:47.00904 indicates
this and the flush_lsn is the same as lsn on primary ) . Run pg_locks to
see if the replay is stuck behind a lock.



On Mon, Jan 10, 2022 at 11:53 AM Allie Crawford <
crawfor...@churchofjesuschrist.org> wrote:

> Hi All,
>
> I have implemented Stream replication in one of my environments, and for
> some reason even though all the health checks are showing that the
> replication is working, when I run manual tests to see if changes are being
> replicated, the changes are not replicated to the standby postgresql
> environment. I have been researching for two day and I cannot find any
> documentation that talks about the case I am running into. I will
> appreciate if anybody could take a look at the details I have detailed
> below and give me some guidance on where the problem might be that is
> preventing my changes for being replicated. Even though I was able to
> instantiate the standby while firewalld was enabled, I decided to disable
> it just in case that it was causing any issue to the manual changes, but
> disabling firewalld has not had any effect, I am still not able to get the
> manual changes test to be replicated to the standby site. As you will see
> in the details below, the streaming is working, both sites are in sync to
> the latest WAL but for some reasons the latest changes are not on the
> standby site. How is it possible that the standby site is completely in
> sync but yet does not contain the latest changes?
>
>
>
> Thanks in advance for any help you can give me with this problem.
>
>
>
> Regards,
>
> Allie
>
>
>
> *Details:*
>
>
>
> *Master **postgresql Environment*
>
> postgresql=# select * from pg_stat_replication;
>
> -[ RECORD 1 ]+--
>
> pid  | 1979089
>
> usesysid | 16404
>
> usename  | replacct
>
> application_name | walreceiver
>
> client_addr  | 
>
> client_hostname  | 
>
> client_port  | 55096
>
> backend_start| 2022-01-06 17:29:51.542784-07
>
> backend_xmin |
>
> state| streaming
>
> sent_lsn | 0/35000788
>
> write_lsn| 0/35000788
>
> flush_lsn| 0/35000788
>
> replay_lsn   | 0/31000500
>
> write_lag| 00:00:00.001611
>
> flush_lag| 00:00:00.001693
>
> replay_lag   | 20:38:47.00904
>
> sync_priority| 1
>
> sync_state   | sync
>
> reply_time   | 2022-01-07 14:11:58.996277-07
>
>
>
> postgresql=#
>
>
>
> postgresql=# select * from pg_roles;
>
>   rolname  | rolsuper | rolinherit | rolcreaterole |
> rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword |
> rolvaliduntil | rolbypassrls | rolconfig |  oid
>
>
> ---+--++---+-+-++--+-+---+--+---+---
>
>  postgresql| t| t  | t | t
> | t   | t  |   -1 | |
>   | t|   |10
>
>  pg_monitor| f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  3373
>
>  pg_read_all_settings  | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  3374
>
>  pg_read_all_stats | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  3375
>
>  pg_stat_scan_tables   | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  3377
>
>  pg_read_server_files  | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  4569
>
>  pg_write_server_files | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  4570
>
>  pg_execute_server_program | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  4571
>
>  pg_signal_backend | f| t  | f | f
> | f   | f  |   -1 | |
>   | f|   |  4200
>
>  replacct  | t| t  | t | t
> | t   | t  |   -1 | |
>   | t|   | 16404
>
> (10 rows)
>
>
>
> postgresql=#
>
>
>
> postgresql=# 

Stream Replication not working

2022-01-10 Thread Allie Crawford
Hi All,
I have implemented Stream replication in one of my environments, and for some 
reason even though all the health checks are showing that the replication is 
working, when I run manual tests to see if changes are being replicated, the 
changes are not replicated to the standby postgresql environment. I have been 
researching for two days and I cannot find any documentation that talks about 
the case I am running into. I will appreciate if anybody could take a look at 
the details I have detailed below and give me some guidance on where the 
problem might be that is preventing my changes for being replicated. Even 
though I was able to instantiate the standby while firewalld was enabled, I 
decided to disable it just in case that it was causing any issue to the manual 
changes, but disabling firewalld has not had any effect, I am still not able to 
get the manual changes test to be replicated to the standby site. As you will 
see in the details below, the streaming is working, both sites are in sync to 
the latest WAL but for some reasons the latest changes are not on the standby 
site. How is it possible that the standby site is completely in sync but yet 
does not contain the latest changes?

Thanks in advance for any help you can give me with this problem.

Regards,
Allie

Details:

Master postgresql Environment

postgresql=# select * from pg_stat_replication;

-[ RECORD 1 ]+--

pid  | 1979089

usesysid | 16404

usename  | replacct

application_name | walreceiver

client_addr  | 

client_hostname  | 

client_port  | 55096

backend_start| 2022-01-06 17:29:51.542784-07

backend_xmin |

state| streaming

sent_lsn | 0/35000788

write_lsn| 0/35000788

flush_lsn| 0/35000788

replay_lsn   | 0/31000500

write_lag| 00:00:00.001611

flush_lag| 00:00:00.001693

replay_lag   | 20:38:47.00904

sync_priority| 1

sync_state   | sync

reply_time   | 2022-01-07 14:11:58.996277-07



postgresql=#


postgresql=# select * from pg_roles;

  rolname  | rolsuper | rolinherit | rolcreaterole | 
rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | 
rolvaliduntil | rolbypassrls | rolconfig |  oid

---+--++---+-+-++--+-+---+--+---+---

 postgresql| t| t  | t | t  
 | t   | t  |   -1 | |   | 
t|   |10

 pg_monitor| f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  3373

 pg_read_all_settings  | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  3374

 pg_read_all_stats | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  3375

 pg_stat_scan_tables   | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  3377

 pg_read_server_files  | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  4569

 pg_write_server_files | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  4570

 pg_execute_server_program | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  4571

 pg_signal_backend | f| t  | f | f  
 | f   | f  |   -1 | |   | 
f|   |  4200

 replacct  | t| t  | t | t  
 | t   | t  |   -1 | |   | 
t|   | 16404

(10 rows)



postgresql=#


postgresql=# create database test_replication_3;

CREATE DATABASE

postgresql=#



postgresql=# select datname from pg_database;

  datname



 postgres

 postgresql

 template1

 template0

 stream

 test_replication

 test_replication_2

 test_replication_3

(8 rows)



postgresql=#



postgresql=# SELECT pg_current_wal_lsn();

 pg_current_wal_lsn



 0/35000788

(1 row)



postgresql=#


Standby postgresql Environment
postgresql=# select * from pg_stat_wal_receiver;
-[ 

Isn't wait_for_catchup slightly broken?

2022-01-10 Thread Tom Lane
While trying to make sense of some recent buildfarm failures,
I happened to notice that the default query issued by
the TAP sub wait_for_catchup looks like

SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM 
pg_catalog.pg_stat_replication WHERE application_name = '';

ISTM there are two things wrong with this:

1. Since pg_current_wal_lsn() is re-evaluated each time, we're
effectively setting a moving target for the standby to reach.
Admittedly we're not going to be issuing any new DML while
waiting in wait_for_catchup, but background activity such as
autovacuum could be creating new WAL.  Thus, the test is likely
to wait longer than it needs to.  In the worst case, we'd never
catch up until the primary server has been totally quiescent
for awhile.

2. Aside from being slower than necessary, this also makes the
test squishy and formally incorrect, because the standby might
get the opportunity to replay more WAL than the test intends.

So I think we need to fix it to capture the target WAL position
at the start, as I've done in the attached patch.  In principle
this might make things a bit slower because of the extra
transaction required, but I don't notice any above-the-noise
difference on my own workstation.

Another thing that is bothering me a bit is that a number of the
callers use $node->lsn('insert') as the target.  This also seems
rather dubious, because that could be ahead of what's been written
out.  These callers are just taking it on faith that something will
eventually cause that extra WAL to get written out (and become
available to the standby).  Again, that seems to make the test
slower than it need be, with a worst-case scenario being that it
eventually times out.  Admittedly this is unlikely to be a big
problem unless some background op issues an abortive transaction
at just the wrong time.  Nonetheless, I wonder if we shouldn't
standardize on "thou shalt use the write position", because I
don't think the other alternatives have anything to recommend them.
I've not addressed that below, though I did tweak the comment about
that parameter.

Thoughts?

regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index e18f27276c..fb6cc39db4 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -2509,8 +2509,10 @@ poll_query_until timeout.
 
 Requires that the 'postgres' db exists and is accessible.
 
-target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert').
-If omitted, pg_current_wal_lsn() is used.
+The default value of target_lsn is $node->lsn('write'), which ensures
+that the standby has caught up to what has been committed on the primary.
+Another plausible choice is $node->lsn('insert'), which ensures that
+preceding executed-but-not-committed WAL has been replayed as well.
 
 This is not a test. It die()s on failure.
 
@@ -2531,23 +2533,18 @@ sub wait_for_catchup
 	{
 		$standby_name = $standby_name->name;
 	}
-	my $lsn_expr;
-	if (defined($target_lsn))
+	if (!defined($target_lsn))
 	{
-		$lsn_expr = "'$target_lsn'";
-	}
-	else
-	{
-		$lsn_expr = 'pg_current_wal_lsn()';
+		$target_lsn = $self->lsn('write');
 	}
 	print "Waiting for replication conn "
 	  . $standby_name . "'s "
 	  . $mode
 	  . "_lsn to pass "
-	  . $lsn_expr . " on "
+	  . $target_lsn . " on "
 	  . $self->name . "\n";
 	my $query =
-	  qq[SELECT $lsn_expr <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
 	$self->poll_query_until('postgres', $query)
 	  or croak "timed out waiting for catchup";
 	print "done\n";


Re: make MaxBackends available in _PG_init

2022-01-10 Thread Bossart, Nathan
On 1/7/22, 12:27 PM, "Robert Haas"  wrote:
> On Fri, Jan 7, 2022 at 1:09 PM Bossart, Nathan  wrote:
>> While that approach would provide a way to safely retrieve the value,
>> I think it would do little to prevent the issue in practice.  If the
>> size of the patch is a concern, we could also convert MaxBackends into
>> a macro for calling GetMaxBackends().  This could also be a nice way
>> to avoid breaking extensions that are using it correctly while
>> triggering ERRORs for extensions that are using it incorrectly.  I've
>> attached a new version of the patch that does it this way.
>
> That's too magical for my taste.

Fair point.  v4 [0] is the less magical version.

Nathan

[0] 
https://postgr.es/m/attachment/125445/v4-0001-Disallow-external-access-to-MaxBackends.patch



Postgres Replication from windows to linux

2022-01-10 Thread rajesh singarapu
Hi Hackers,

I am trying to migrate my Postgres installation to Linux, as we are moving away
from Windows.
I am trying both dump/restore and logical decoding, but people are not
happy with the performance.
Is there a way/tooling I can use around WAL shipping/physical replication
here?


thanks
Rajesh


Re: Converting WAL to SQL

2022-01-10 Thread rajesh singarapu
Thanks much for your suggestions.
I am exploring logical decoding because I have two different platforms and
versions as well.
So my best bet is logical decoding, but I am also wondering if somebody has
done replication/migration from Windows to Linux or vice versa at the
physical level with some tooling.
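
For the logical decoding route, a generic way to peek at the decoded
row-level changes is the in-core test_decoding plugin (a sketch only; the
slot name is made up, and real migration tooling would use a dedicated
output plugin):

SELECT pg_create_logical_replication_slot('migration_slot', 'test_decoding');
SELECT * FROM pg_logical_slot_peek_changes('migration_slot', NULL, NULL);
SELECT pg_drop_replication_slot('migration_slot');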

thanks
Rajesh


On Thu, Jan 6, 2022 at 12:21 AM Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

>
> On Wed, Jan 5, 2022 at 2:19 PM Julien Rouhaud  wrote:
> >
> > On Thu, Jan 6, 2022 at 12:19 AM Bruce Momjian  wrote:
> > >
> > > On Tue, Jan  4, 2022 at 10:47:47AM -0300, Fabrízio de Royes Mello
> wrote:
> > > >
> > > >
> > > > What we did was decode the 9.6 wal files and apply transactions to
> the
> > > > old 9.2 to keep it in sync with the new promoted version. This was
> our
> > > > "rollback" strategy if something went wrong with the new 9.6 version.
> > >
> > > How did you deal with the issue that SQL isn't granular enough (vs.
> > > row-level changes) to reproduce the result reliably, as outlined here?
> >
> > This is a logical decoding plugin, so it's SQL containing decoded
> > row-level changes.  It will behave the same as a
> > publication/suscription (apart from being far less performant, due to
> > being plain SQL of course).
>
> Exactly!
>
> --
>Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
>PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
>


Re: Add index scan progress to pg_stat_progress_vacuum

2022-01-10 Thread Bossart, Nathan
On 1/6/22, 6:14 PM, "Imseih (AWS), Sami"  wrote:
> I am hesitant to make column name changes for obvious reasons, as it breaks 
> existing tooling. However, I think there is a really good case to change 
> "index_vacuum_count" as the name is confusing. 
> "index_vacuum_cycles_completed" is the name I suggest if we agree to rename.
>
> For the new column, "num_indexes_to_vacuum" is good with me. 

Yeah, I think we can skip renaming index_vacuum_count for now.  In any
case, it would probably be good to discuss that in a separate thread.

Nathan



Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 15:39, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:

On 10/1/2022 13:56, Julien Rouhaud wrote:
Yes. The same input query string doesn't prove that a frozen query plan can be
used, because rewrite rules could have been changed. So we use only a query tree.


Yes, I'm fully aware of that.  I wasn't asking about using the input query
string but the need for generating a dedicated normalized output query string
that would be potentially different from the one generated by
pg_stat_statements (or similar).

Thanks, now I got it. I don't remember a single situation where we would 
need to normalize a query string.

But then, if generating 2 queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?

Additional Jumble state isn't necessary, but it would be better for
performance to collect pointers to all constant nodes during a process of
hash generation.


I agree, but it's even better for performance if this is collected only once.
I still don't know if this extension (or any extension) would require something
different from a common jumble state that would serve for all purposes or if we
can work with a single jumble state and only handle multiple queryids.
I think JumbleState is highly monitoring-specific, maybe even 
pg_stat_statements-specific (maybe you can point out other examples). 
I have no idea how it could be used in an arbitrary extension.
But it is not so difficult to imagine an implementation (as part of 
Tom's approach, described earlier) of such kind of storage for each 
generation method.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Matching domains-over-enums to anyenum types

2022-01-10 Thread Tom Lane
Ronan Dunklau  writes:
> This has been already discussed a long time ago, and the idea was rejected at 
> the time since there was no demand for it:
> https://www.postgresql.org/message-id/flat/BANLkTi%3DaGxDbGPSF043V2K-C2vF2YzGz9w%40mail.gmail.com#da4826d2cbbaca20e3440aadb3093158

I see that one of the considerations in that thread was the lack
of arrays over domains.  We've since fixed that, so probably it'd
be reasonable to take a fresh look, but I'm not sure that the
conclusion would be the same as what I proposed then.
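
For reference, a quick illustration of what now works since arrays over
domains were added (the domain and table names are made up):

CREATE DOMAIN posint AS int CHECK (VALUE > 0);
CREATE TABLE t (vals posint[]);      -- an array over a domain
INSERT INTO t VALUES (ARRAY[1, 2]);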

regards, tom lane




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Alexander Lakhin
10.01.2022 12:40, Thomas Munro wrote:
> This is super quick-and-dirty code (and doesn't handle some errors or
> socket changes correctly), but does it detect the closed socket?
Yes, it fixes the behaviour and makes the 002_standby test pass (100 of
100 iterations). I have yet to find out whether the other
WaitLatchOrSocket users (e.g. postgres_fdw) can suffer from the
disconnected socket state, but this approach definitely works for
walreceiver.

Best regards,
Alexander




Matching domains-over-enums to anyenum types

2022-01-10 Thread Ronan Dunklau
Hello,

A colleague of mine was surprised to discover the following statements raised 
an error:

postgres=# CREATE TYPE abc_enum AS ENUM ('a', 'b', 'c');
CREATE TYPE
postgres=# CREATE DOMAIN abc_domain AS abc_enum;
CREATE DOMAIN
postgres=#  SELECT 'a'::abc_domain = 'a'::abc_domain; 
ERROR:  operator does not exist: abc_domain = abc_domain
LINE 1: SELECT 'a'::abc_domain = 'a'::abc_domain;
   ^
HINT:  No operator matches the given name and argument types. You might need 
to add explicit type casts.
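
As the hint says, an explicit cast back to the base enum type does work
today, which is the usual workaround (a sketch using the types above):

SELECT 'a'::abc_domain::abc_enum = 'a'::abc_domain::abc_enum;   -- true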

This has been already discussed a long time ago, and the idea was rejected at 
the time since there was no demand for it:

https://www.postgresql.org/message-id/flat/BANLkTi%3DaGxDbGPSF043V2K-C2vF2YzGz9w%40mail.gmail.com#da4826d2cbbaca20e3440aadb3093158

Given that we implemented that behaviour for domains over ranges and 
multiranges, I don't see the harm in doing the same for domains over enums.

What do you think ?

-- 
Ronan Dunklau






Re: Windows crash / abort handling

2022-01-10 Thread Andrew Dunstan


On 10/5/21 15:30, Andres Freund wrote
>
> To actually get the crash reports I ended up doing the following on the OS
> level [5]:
>
> Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows 
> NT\CurrentVersion\AeDebug' -Name 'Debugger' -Value '\"C:\Windows 
> Kits\10\Debuggers\x64\cdb.exe\" -p %ld -e %ld -g -kqm -c \".lines -e; 
> .symfix+ ;.logappend c:\cirrus\crashlog.txt ; !peb; ~*kP ; .logclose ; q \"' 
> ; `
> New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows 
> NT\CurrentVersion\AeDebug' -Name 'Auto' -Value 1 -PropertyType DWord ; `
> Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows 
> NT\CurrentVersion\AeDebug' -Name Debugger; `
>
> This requires 'cdb' to be present, which is included in the Windows 10 SDK (or
> other OS versions, it doesn't appear to have changed much). Whenever there's
> an unhandled crash, cdb.exe is invoked with the parameters above, which
> appends the crash report to crashlog.txt.
>
> Alternatively we can generate "minidumps" [6], but that doesn't appear to be 
> more
> helpful for CI purposes at least - all we'd do is to create a backtrace using
> the same tool. But it might be helpful for local development, to e.g. analyze
> crashes in more detail.
>
> The above ends up dumping all crashes into a single file, but that can
> probably be improved. But cdb is so gnarly that I wanted to stop looking once
> I got this far...
>
>
> Andrew, I wonder if something like this could make sense for windows BF 
> animals?
>


Very possibly. I wonder how well it will work on machines where I have
more than one animal .e.g. lorikeet (cygwin) jacana (msys) and bowerbird
(MSVC) are all on the same machine. Likewise drongo (MSVC) and fairywren
(msys2).


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade verbosity when redirecting output to log file

2022-01-10 Thread Bruce Momjian
On Sun, Jan  9, 2022 at 10:39:58PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2022-01-10 01:14:32 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > Fun!  That seems to me to be a strong argument for not letting
> > the behavior vary depending on isatty().
> 
> Yea.
> 
> 
> > I think I'd vote for just nuking that output altogether.
> > It seems of very dubious value.
> 
> It seems worthwhile in some form - on large cluster in copy mode, the "Copying
> user relation files" step can take *quite* a while, and even link/clone mode
> aren't fast. But perhaps what'd be really needed is something counting up
> actual progress in percentage of files and/or space...
> 
> I think just coupling it to verbose mode makes the most sense, for now?

All of this logging is from the stage where I was excited pg_upgrade
worked, and I wanted to give clear output if it failed in some way ---
printing the file names seems like an easy solution.  I agree at this
point that logging should be reduced, and if they want more logging, the
verbose option is the right way to get it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Undocumented behavior of timezone(zone, timestamp) for impossible timestamptz's

2022-01-10 Thread Tom Lane
I wrote:
> Aleksander Alekseev  writes:
>> 1. Should this behavior be documented in the 9.9.4. AT TIME ZONE
>> section or maybe it's documented elsewhere and I just missed it?

> https://www.postgresql.org/docs/current/datetime-invalid-input.html

... and reading that again, I realize that I screwed up the
fall-back example :-(.  2:30 is not ambiguous; I should have
demonstrated the behavior for, say, 1:30.  Will fix.

regards, tom lane




Re: Fix vcregress plpython3 warning

2022-01-10 Thread Juan José Santamaría Flecha
On Mon, Jan 10, 2022 at 4:14 PM Andrew Dunstan  wrote:

>
> Pushed, and backpatched.
>
Great, thanks.

Regards,

Juan José Santamaría Flecha


Re: Undocumented behavior of timezone(zone, timestamp) for impossible timestamptz's

2022-01-10 Thread Tom Lane
Aleksander Alekseev  writes:
> Due to DST and also changes in local laws, there could be gaps in
> local time [1].

Yup.

> 1. Should this behavior be documented in the 9.9.4. AT TIME ZONE
> section or maybe it's documented elsewhere and I just missed it?

https://www.postgresql.org/docs/current/datetime-invalid-input.html

> 2. Is it possible to detect impossible timestamptz's for users who
> want stricter semantics? If there is a way I think it's worth
> documenting as well.

Maybe convert back and see if you get an identical result?
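
A minimal sketch of that round-trip check, reusing the Europe/Moscow
example from upthread (the nonexistent local time fails the comparison,
while the valid one passes):

SELECT ts,
       (ts AT TIME ZONE 'Europe/Moscow') AT TIME ZONE 'Europe/Moscow' AS round_trip,
       ts = (ts AT TIME ZONE 'Europe/Moscow') AT TIME ZONE 'Europe/Moscow' AS possible
FROM (VALUES (timestamp '2011-03-27 01:00:00'),
             (timestamp '2011-03-27 02:00:00')) AS t(ts);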

regards, tom lane




Re: ICU for global collation

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 03:45:47PM +0100, Daniel Verite wrote:
> 
> By that I understand that CREATE DATABASE is limited to copying a template
> database and then not write anything into it beyond that, as it's
> not even connected to it.

Yes.

> I guess there's still the possibility of requiring that the ICU db-wide
> collation of the new database does exist in the template database,
> and then the CREATE DATABASE would refer to that collation instead of
> an independent locale string.

That could work.  However if having the collation in the template database a
strict requirement the something should also be done for initdb, and it will
probably be a bigger problem.




Re: Fix vcregress plpython3 warning

2022-01-10 Thread Andrew Dunstan


On 1/10/22 06:53, Juan José Santamaría Flecha wrote:
>
> On Mon, Jan 10, 2022 at 12:51 PM Juan José Santamaría Flecha
>  wrote:
>
> Please find attached a patch for doing so.
>
> The patch.
>
>  
>
>

Pushed, and backpatched.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: small bug in ecpg unicode identifier error handling

2022-01-10 Thread Tom Lane
Peter Eisentraut  writes:
> I think this patch is necessary:
> - if (literallen == 2) /* "U&" */
> + if (literallen == 0)

Seems sensible, and matches the corresponding code in scan.l.
+1.

regards, tom lane




Re: reporting TID/table with corruption error

2022-01-10 Thread Alvaro Herrera
On 2022-Jan-10, Andrey Borodin wrote:

> 3. The tuple seems to be updated in a high-contention concurrency
> trigger function, autovacuum kicks in ~20-30 seconds after the message
> in logs

Hmm, I bet this is related.

> [ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 ]:ERROR: 
>  t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in 
> table "s_statistics"
> [ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 
> ]:CONTEXT:  SQL statement "UPDATE _s.s_statistics os
>  SET __found_ts = COALESCE(os.__found_ts, NOW()),
>  last__found_ts = NOW(),
>  num_s = os.num_s + 1
>  WHERE _id = NEW._id"
> PL/pgSQL function statistics__update_from_new_() line 3 at SQL 
> statement
> [ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 
> ]:STATEMENT:  
> INSERT INTO _s.s_s AS s 
> 
> 4. t_xmin is relatively new, not ancient
> 5. pageinspect shows dead tuple after some time
> 6. no suspicious activity in logs nearby
> 7. vacuum (disable_page_skipping) and repack of indexes did not change 
> anything
> 
> 
> I suspect this can be relatively new concurrency stuff. At least I never saw 
> this before on clusters with clean amcheck and heapcheck results.

Ah.  I've been thinking that it'd be some very old tuple that is in
trouble, but that seems to be proven false.  I think we should examine
the affected tuples more closely while they're in the state that causes
the problem.  Can you set up things so that pageinspect's
heap_page_items() is run on the broken page, before the problem
disappears?  Maybe extract the page number from the log line, have
get_raw_page() store the page in a separate table, so that we can run
heap_page_items() at leisure later.  I would also grep through
pg_waldump output to see what changes have been done to that tuple.
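
A rough sketch of capturing the page while it is still in the bad state,
using the relation and block number from the (anonymized) log line above;
all names are illustrative, adjust as needed:

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- grab the raw page as soon as the error shows up
CREATE TABLE broken_page_dump AS
SELECT now() AS captured_at,
       get_raw_page('_s.s_statistics', 1419011) AS page;

-- examine it later, at leisure
SELECT * FROM heap_page_items((SELECT page FROM broken_page_dump LIMIT 1));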

Maybe the reason the problem fixes itself is that something else deletes
the tuple.

> Alvaro, did you observe this on binaries from August 13 minor release or 
> older?

Well, the only reports I have of this problem are with the original
error message that didn't give any clues at to what the problem was or
where to look for it, so I don't know if the xmin was recent or not.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/




Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Julien Rouhaud wrote:

> > The lack of these fields overall suggest the idea that when CREATE
> > DATABASE is called with a global ICU collation, what if it somehow
> > inserted the collation into pg_collation in the new database?
> > Then pg_database would just store the collation oid and no other
> > collation-related field would need to be added into it, now
> > or in the future.
> 
> I don't think it would be doable given the single-database-per-backend
> restriction.

By that I understand that CREATE DATABASE is limited to copying a template
database and then not write anything into it beyond that, as it's
not even connected to it.
I guess there's still the possibility of requiring that the ICU db-wide
collation of the new database does exist in the template database,
and then the CREATE DATABASE would refer to that collation instead of
an independent locale string.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: CREATE ROLE IF NOT EXISTS

2022-01-10 Thread Asif Rehman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:not tested

Wouldn't using opt_or_replace rule be a better option?

The new status of this patch is: Waiting on Author


Re: pg_upgrade verbosity when redirecting output to log file

2022-01-10 Thread Tom Lane
Andres Freund  writes:
> On 2022-01-10 01:14:32 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>> Fun!  That seems to me to be a strong argument for not letting
>> the behavior vary depending on isatty().

> Yea.

> I think just coupling it to verbose mode makes the most sense, for now?

WFM.

regards, tom lane




small bug in ecpg unicode identifier error handling

2022-01-10 Thread Peter Eisentraut

I think this patch is necessary:

diff --git a/src/interfaces/ecpg/preproc/pgc.l 
b/src/interfaces/ecpg/preproc/pgc.l
index 07fee80a9c..3529b2ea86 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -753,7 +753,7 @@ cppline 
{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
}
 {dquote}{
BEGIN(state_before_str_start);
-   if (literallen == 2) /* "U&" */
+   if (literallen == 0)
mmerror(PARSE_ERROR, ET_ERROR, 
"zero-length delimited identifier");
/* The backend will truncate the 
identifier here. We do not as it does not change the result. */
base_yylval.str = psprintf("U&\"%s\"", 
literalbuf);

The old code doesn't make sense.  The literallen is the length of the
data in literalbuf, which clearly doesn't include the "U&" as the
comment suggests.

A test case is to preprocess a file like this (ecpg test.pgc):

exec sql select u&"

which currently does *not* give the above error, but it should.




Re: ICU for global collation

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 12:49:07PM +0100, Daniel Verite wrote:
> 
> The "daticucol" column also suggests that we don't expect to add
> other collation providers in the future. Maybe a pair of columns like
> (datcollprovider, datcolllocale) would be more future-proof,
> or a (datcollprovider, datcolllocale, datcollisdeterministic)
> triplet if non-deterministic collations are allowed.

I'm not sure about the non-deterministic default collation given the current
restrictions with it, but the extra column seems like a good idea.  It would
require a bit more thinking, as we would need a second collation column in
pg_database for any default provider that's not libc.

> Also, pg_collation has "collversion" to detect a mismatch between
> the ICU runtime and existing indexes. I don't see that field
> for the db-wide ICU collation, so maybe we currently miss the capability
> to detect the mismatch for the db-wide collation?

I don't think that storing a version there will really help.  There's no
guarantee that any object has been created with the version of the collation
that was installed when the database was created.  And we would still need
to store a version with each underlying object anyway, as rebuilding all broken
dependencies can last for a long time, including a server restart.
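
For what it's worth, per-collation mismatches between the stored and
runtime versions can already be spotted with something like this; the open
question is what the db-wide equivalent would look like:

SELECT collname, collversion,
       pg_collation_actual_version(oid) AS runtime_version
FROM pg_collation
WHERE collprovider = 'i'
  AND collversion IS DISTINCT FROM pg_collation_actual_version(oid);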

> The lack of these fields overall suggest the idea that when CREATE
> DATABASE is called with a global ICU collation, what if it somehow
> inserted the collation into pg_collation in the new database?
> Then pg_database would just store the collation oid and no other
> collation-related field would need to be added into it, now
> or in the future.

I don't think it would be doable given the single-database-per-backend
restriction.




Re: Add jsonlog log_destination for JSON server logs

2022-01-10 Thread Michael Paquier
On Fri, Jan 07, 2022 at 03:49:47PM +0900, Michael Paquier wrote:
> Yes, I was waiting for the latest results, but that did not help at
> all.  Something is wrong with the patch, I am not sure what yet, but
> that seems like a mistake in the backend part of it rather than the
> tests.  I have switched the CF entry as waiting on author until this
> is addressed.

The issue comes from an incorrect change in syslogger_parseArgs()
where I missed that the incrementation of argv by 3 has no need to be
changed.  A build with -DEXEC_BACKEND is enough to show the failure,
which caused a crash when starting up the syslogger because of a NULL
pointer dereference.  The attached v9 should be enough to switch the
CF bot to green.
--
Michael
From 6ab52532b232bc3314cef5c9c4ac38110957ab34 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 19 Oct 2021 16:25:45 +0900
Subject: [PATCH v9 1/3] Some refactoring of elog-specific routines

This refactors out the following things in elog.c, for ease of use
across multiple log destinations:
- start_timestamp (including reset)
- log_timestamp
- decide if query can be logged
- backend type
- write using the elog piped protocol
- Error severity to string.

These will be reused by csvlog and jsonlog.
---
 src/include/utils/elog.h   |  12 +++
 src/backend/utils/error/elog.c | 159 +
 2 files changed, 114 insertions(+), 57 deletions(-)

diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 8df2799dc2..f37956f100 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -442,6 +442,18 @@ extern void DebugFileOpen(void);
 extern char *unpack_sql_state(int sql_state);
 extern bool in_error_recursion_trouble(void);
 
+/* Common functions shared across destinations */
+extern void reset_formatted_start_time(void);
+extern char *get_formatted_start_time(void);
+extern char *get_formatted_log_time(void);
+extern const char *get_backend_type_for_log(void);
+extern bool	check_log_of_query(ErrorData *edata);
+extern const char *error_severity(int elevel);
+extern void write_pipe_chunks(char *data, int len, int dest);
+
+/* Destination-specific functions */
+extern void write_csvlog(ErrorData *edata);
+
 #ifdef HAVE_SYSLOG
 extern void set_syslog_parameters(const char *ident, int facility);
 #endif
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 2f2c3ba41b..4ccad9bac1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -175,15 +175,10 @@ static const char *err_gettext(const char *str) pg_attribute_format_arg(1);
 static pg_noinline void set_backtrace(ErrorData *edata, int num_skip);
 static void set_errdata_field(MemoryContextData *cxt, char **ptr, const char *str);
 static void write_console(const char *line, int len);
-static void setup_formatted_log_time(void);
-static void setup_formatted_start_time(void);
 static const char *process_log_prefix_padding(const char *p, int *padding);
 static void log_line_prefix(StringInfo buf, ErrorData *edata);
-static void write_csvlog(ErrorData *edata);
 static void send_message_to_server_log(ErrorData *edata);
-static void write_pipe_chunks(char *data, int len, int dest);
 static void send_message_to_frontend(ErrorData *edata);
-static const char *error_severity(int elevel);
 static void append_with_tabs(StringInfo buf, const char *str);
 
 
@@ -2289,14 +2284,23 @@ write_console(const char *line, int len)
 }
 
 /*
- * setup formatted_log_time, for consistent times between CSV and regular logs
+ * get_formatted_log_time -- compute and get the log timestamp.
+ *
+ * The timestamp is computed if not set yet, so as it is kept consistent
+ * among all the log destinations that require it to be consistent.  Note
+ * that the computed timestamp is returned in a static buffer, not
+ * palloc()'d.
  */
-static void
-setup_formatted_log_time(void)
+char *
+get_formatted_log_time(void)
 {
 	pg_time_t	stamp_time;
 	char		msbuf[13];
 
+	/* leave if already computed */
+	if (formatted_log_time[0] != '\0')
+		return formatted_log_time;
+
 	if (!saved_timeval_set)
 	{
 		gettimeofday(&saved_timeval, NULL);
@@ -2318,16 +2322,34 @@ setup_formatted_log_time(void)
 	/* 'paste' milliseconds into place... */
 	sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));
 	memcpy(formatted_log_time + 19, msbuf, 4);
+
+	return formatted_log_time;
 }
 
 /*
- * setup formatted_start_time
+ * reset_formatted_start_time -- reset the start timestamp
  */
-static void
-setup_formatted_start_time(void)
+void
+reset_formatted_start_time(void)
+{
+	formatted_start_time[0] = '\0';
+}
+
+/*
+ * get_formatted_start_time -- compute and get the start timestamp.
+ *
+ * The timestamp is computed if not set yet.  Note that the computed
+ * timestamp is returned in a static buffer, not palloc()'d.
+ */
+char *
+get_formatted_start_time(void)
 {
 	pg_time_t	stamp_time = (pg_time_t) MyStartTime;
 
+	/* leave if already computed */
+	if (formatted_start_time[0] != 

Undocumented behavior of timezone(zone, timestamp) for impossible timestamptz's

2022-01-10 Thread Aleksander Alekseev
Hi hackers,

Due to DST and also changes in local laws, there could be gaps in
local time [1]. For instance, 1 second after "2011-03-27 01:59:59 MSK"
goes "2011-03-27 03:00:00 MSK":

```
select (timestamptz '2011-03-27 01:59:59 MSK') at time zone 'MSK';
  timezone
-
 2011-03-27 01:59:59
(1 row)

select ((timestamptz '2011-03-27 01:59:59 MSK') + interval '1 second')
at time zone 'MSK';
  timezone
-
 2011-03-27 03:00:00
(1 row)
```

This makes '2011-03-27 02:00:00 MSK' an impossible timestamptz. I was
curious how `timezone(zone, timestamp)` aka `timestamp at time zone`
handles such dates and discovered that it seems to round impossible
dates to the nearest possible one:

```
set time zone 'Europe/Moscow';

select (timestamp '2011-03-27 01:00:00') at time zone 'MSK';
timezone

 2011-03-27 01:00:00+03
(1 row)

select (timestamp '2011-03-27 02:00:00') at time zone 'MSK';
timezone

 2011-03-27 01:00:00+03
(1 row)
```

I don't know what the SQL standard says about it, but personally, I
find this behavior very convenient. Although it doesn't seem to be
documented [2].

So I have two questions:

1. Should this behavior be documented in the 9.9.4. AT TIME ZONE
section or maybe it's documented elsewhere and I just missed it?
2. Is it possible to detect impossible timestamptz's for users who
want stricter semantics? If there is a way I think it's worth
documenting as well.

[1]: https://en.wikipedia.org/wiki/Moscow_Time#Past_usage
[2]: 
https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT
--
Best regards,
Aleksander Alekseev




Re: Fix vcregress plpython3 warning

2022-01-10 Thread Juan José Santamaría Flecha
On Mon, Jan 10, 2022 at 12:51 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

> Please find attached a patch for doing so.
>
The patch.

Regards,

Juan José Santamaría Flecha


v2-0001-Fix-vcregress-plpython3-warnings.patch
Description: Binary data


Re: Fix vcregress plpython3 warning

2022-01-10 Thread Juan José Santamaría Flecha
On Fri, Jan 7, 2022 at 3:24 PM Andrew Dunstan  wrote:

>
> In that case, just this should work:
>
> s/EXTENSION (\S*?)plpython2?u/EXTENSION $1plpython3u/g ;
>
Please find attached a patch for doing so. I have also opened an item in the
commitfest:

https://commitfest.postgresql.org/37/3507/

Regards,

Juan José Santamaría Flecha


Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread Amit Kapila
On Thu, Dec 16, 2021 at 11:12 AM Masahiko Sawada  wrote:
>
> On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  wrote:
> >
> > >
> > > So if skip_xid is already changed, the apply worker would do
> > > replorigin_advance() with WAL logging, instead of committing the
> > > catalog change?
> > >
> >
> > Right. BTW, how are you planning to advance the origin? Normally, a
> > commit transaction would do it but when we are skipping all changes,
> > the commit might not do it as there won't be any transaction id
> > assigned.
>
> I've not tested it yet but replorigin_advance() with wal_log = true
> seems to work for this case.
>

IIUC, the changes corresponding to above in the latest patch are as follows:

--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -921,7 +921,8 @@ replorigin_advance(RepOriginId node,
 LWLockAcquire(&replication_state->lock, LW_EXCLUSIVE);

  /* Make sure it's not used by somebody else */
- if (replication_state->acquired_by != 0)
+ if (replication_state->acquired_by != 0 &&
+ replication_state->acquired_by != MyProcPid)
  {
...

clear_subscription_skip_xid()
{
..
+ else if (!XLogRecPtrIsInvalid(origin_lsn))
+ {
+ /*
+ * User has already changed subskipxid before clearing the subskipxid, so
+ * don't change the catalog but just advance the replication origin.
+ */
+ replorigin_advance(replorigin_session_origin, origin_lsn,
+GetXLogInsertRecPtr(),
+false, /* go_backward */
+true /* wal_log */);
+ }
..
}

I was thinking what if we don't advance origin explicitly in this
case? Actually, that will be no different than the transactions where
the apply worker doesn't apply any change because the initial sync is
in progress (see should_apply_changes_for_rel()) or we have received
an empty transaction. In those cases also, the origin lsn won't be
advanced even though we acknowledge the advanced last_received
location because of keep_alive messages. Now, it is possible after the
restart we send the old start_lsn location because the replication
origin was not updated before restart but we handle that case in the
server by starting from the last confirmed location. See below code:

CreateDecodingContext()
{
..
else if (start_lsn < slot->data.confirmed_flush)
..

Few other comments on the latest patch:
=
1.
A conflict will produce an error and will stop the replication; it must be
resolved manually by the user.  Details about the conflict can be found in
-   the subscriber's server log.
+as well as the
+   subscriber's server log.

Can we slightly change the modified line to: "Details about the
conflict can be found in  and the
subscriber's server log."? I think we can commit this change
separately as this is true even without this patch.

2.
The resolution can be done either by changing data on the subscriber so
-   that it does not conflict with the incoming change or by skipping the
-   transaction that conflicts with the existing data.  The transaction can be
-   skipped by calling the 
+   that it does not conflict with the incoming changes or by skipping the whole
+   transaction.  This option specifies the ID of the transaction whose
+   application is to be skipped by the logical replication worker.  The logical
+   replication worker skips all data modification transaction conflicts with
+   the existing data. When a conflict produce an error, it is shown in
+   pg_stat_subscription_workers view as follows:

I don't think most of the additional text added in the above paragraph
is required. We can rephrase it as: "The resolution can be done either
by changing data on the subscriber so that it does not conflict with
the incoming change or by skipping the transaction that conflicts with
the existing data. When a conflict produces an error, it is shown in
pg_stat_subscription_workers view as
follows:". After that keep the text, you have.

3.
They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent, especially if
+   a user specifies the wrong transaction ID or the position of origin.

Can we slightly reword the above text as: "Skipping the whole
transaction includes skipping the changes that may not violate any
constraint.  This can easily make the subscriber inconsistent,
especially if a user specifies the wrong transaction ID or the
position of origin."?

4.
The logical replication worker skips all data
+  modification changes within the specified transaction.  Therefore, since
+  it skips the whole transaction including the changes that may not violate
+  the constraint, it should only be used as a last resort. This option has
+  no effect for the transaction that is already prepared with enabling
+  two_phase on susbscriber.

Let's slightly reword the above text as: "The logical replication
worker skips all data modification changes within the specified
transaction including the changes that may not 

Re: ICU for global collation

2022-01-10 Thread Daniel Verite
Peter Eisentraut wrote:

> Unlike in the previous patch, where the ICU 
> collation name was written in datcollate, there is now a third column 
> (daticucoll), so we can store all three values.

I think some users would want their db-wide ICU collation to be
case/accent-insensitive. Postgres users are trained to expect
case-sensitive comparisons, but some apps initially made for
e.g. MySQL or MS-SQL that use such collations natively would be easier
to port to Postgres.
IIRC, that was the context for some questions where people were
enquiring about db-wide ICU collations.

With the current patch, it's not possible, AFAICS, because the user
can't tell that the collation is non-deterministic. Presumably this
would require another option to CREATE DATABASE and another
column to store that bit of information.
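
For reference, this is how such a collation is declared at the column
level today (the example from the documentation); the question here is
whether CREATE DATABASE could accept the same options for the db-wide
default:

CREATE COLLATION case_insensitive
    (provider = icu, locale = 'und-u-ks-level2', deterministic = false);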

The "daticucol" column also suggests that we don't expect to add
other collation providers in the future. Maybe a pair of columns like
(datcollprovider, datcolllocale) would be more future-proof,
or a (datcollprovider, datcolllocale, datcollisdeterministic)
triplet if non-deterministic collations are allowed.

Also, pg_collation has "collversion" to detect a mismatch between
the ICU runtime and existing indexes. I don't see that field
for the db-wide ICU collation, so maybe we currently miss the capability
to detect the mismatch for the db-wide collation?

The lack of these fields overall suggest the idea that when CREATE
DATABASE is called with a global ICU collation, what if it somehow
inserted the collation into pg_collation in the new database?
Then pg_database would just store the collation oid and no other
collation-related field would need to be added into it, now
or in the future.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?

2022-01-10 Thread Thomas Munro
On Sat, Jan 8, 2022 at 9:20 AM Bossart, Nathan  wrote:
> FWIW I just found this patch very useful for testing some EXEC_BACKEND
> stuff on Linux.

Thanks for testing.  Tidied and pushed, to master only for now.




Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 03:24:46PM +0500, Andrey Lepikhov wrote:
> On 10/1/2022 13:56, Julien Rouhaud wrote:
> > 
> > I don't know the details of that extension, but I think that it should work 
> > as
> > long as you have the constants information in the jumble state, whatever the
> > resulting normalized query string is right?
> Yes. The same input query string doesn't prove that a frozen query plan can be
> used, because rewrite rules could have been changed. So we use only a query tree.

Yes, I'm fully aware of that.  I wasn't asking about using the input query
string but the need for generating a dedicated normalized output query string
that would be potentially different from the one generated by
pg_stat_statements (or similar).

> > But then, if generating 2 queryids is better for performance, does the
> > extension really need an additional jumble state and/or normalized query
> > string?
> Additional Jumble state isn't necessary, but it would be better for
> performance to collect pointers to all constant nodes during a process of
> hash generation.

I agree, but it's even better for performance if this is collected only once.
I still don't know if this extension (or any extension) would require something
different from a common jumble state that would serve for all purposes or if we
can work with a single jumble state and only handle multiple queryids.




Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Andrey Lepikhov

On 10/1/2022 13:56, Julien Rouhaud wrote:

On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:

I think pg_stat_statements can live with a queryId generator of the
sr_plan extension. But it replaces all constants with a $XXX parameter in the
query string. In our extension the user defines which plan is optimal and which
constants can be used as parameters in the plan.


I don't know the details of that extension, but I think that it should work as
long as you have the constants information in the jumble state, whatever the
resulting normalized query string is right?
Yes. The same input query string doesn't prove that a frozen query plan 
can be used, because rewrite rules could have been changed. So we use only a 
query tree. Here we must have a custom jumbling implementation.

queryId in this extension defines two aspects:
1. How many input queries will be compared with a query tree template of 
the frozen statement.

2. As a result, the performance overhead of unsuccessful comparisons.



One drawback I see here - creating or dropping my extension changes the
behavior of pg_stat_statements, which distorts the DB load
profile. Also, we have no guarantee that another extension will work
correctly (or in an optimal way) with such a queryId.


But then, if generating 2 queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?
Additional Jumble state isn't necessary, but it would be better for 
performance to collect pointers to all constant nodes during a process 
of hash generation.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Thomas Munro
On Mon, Jan 10, 2022 at 10:20 PM Thomas Munro  wrote:
> On Mon, Jan 10, 2022 at 8:00 PM Alexander Lakhin  wrote:
> > The libpqrcv_PQgetResult function, in turn, invokes WaitLatchOrSocket()
> > where WaitEvents are defined locally, and the closed flag set on the
> > first invocation but expected to be checked on second.
>
> D'oh, right.  There's also a WaitLatchOrSocket call in walreceiver.c.
> We'd need a long-lived WaitEventSet common across all of these sites,
> which is hard here (because the socket might change under you, as
> discussed in other threads that introduced long lived WaitEventSets to
> other places but not here).

This is super quick-and-dirty code (and doesn't handle some errors or
socket changes correctly), but does it detect the closed socket?
From 3590a8c9b3e8992a65dea7a9d6aecdca2f25582d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 10 Jan 2022 14:45:05 +1300
Subject: [PATCH v2 1/2] Make Windows FD_CLOSE reporting sticky.

XXX Just testing an idea...
---
 src/backend/storage/ipc/latch.c | 16 
 src/include/storage/latch.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 1d893cf863..8f3176a00f 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -899,6 +899,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
 	event->user_data = user_data;
 #ifdef WIN32
 	event->reset = false;
+	event->closed = false;
 #endif
 
 	if (events == WL_LATCH_SET)
@@ -1882,6 +1883,20 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 return 1;
 			}
 		}
+
+		/*
+		 * XXX: HYPOTHESIS TO TEST
+		 * Windows only reports FD_CLOSE once.  If we've seen that already,
+		 * continue to report it.
+		 */
+		if ((cur_event->events & WL_SOCKET_MASK) && cur_event->closed)
+		{
+			occurred_events->pos = cur_event->pos;
+			occurred_events->user_data = cur_event->user_data;
+			occurred_events->events = (cur_event->events & WL_SOCKET_MASK);
+			occurred_events->fd = cur_event->fd;
+			return 1;
+		}
 	}
 
 	/*
@@ -2002,6 +2017,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 		{
 			/* EOF/error, so signal all caller-requested socket flags */
 			occurred_events->events |= (cur_event->events & WL_SOCKET_MASK);
+			cur_event->closed = true;
 		}
 
 		if (occurred_events->events != 0)
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 44f9368c64..c24f46dc37 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -147,6 +147,7 @@ typedef struct WaitEvent
 	void	   *user_data;		/* pointer provided in AddWaitEventToSet */
 #ifdef WIN32
 	bool		reset;			/* Is reset of the event required? */
+	bool		closed;			/* Has FD_CLOSE event been reported? */
 #endif
 } WaitEvent;
 
-- 
2.33.1

From 755ee3da7934d33a706d59eb0abc0e4f1829acb0 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 10 Jan 2022 22:02:21 +1300
Subject: [PATCH v2 2/2] XXX Quick hack to use persistent WaitEventSet in
 walreceiver

XXX There are several things wrong with this, it's just experimental code.
---
 .../libpqwalreceiver/libpqwalreceiver.c   | 97 ---
 src/backend/replication/walreceiver.c | 12 +--
 src/include/replication/walreceiver.h |  8 ++
 3 files changed, 75 insertions(+), 42 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c08e599eef..f8dd76e2d0 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -40,6 +40,7 @@ void		_PG_init(void);
 
 struct WalReceiverConn
 {
+	WaitEventSet *wes;
 	/* Current connection to the primary, if any */
 	PGconn	   *streamConn;
 	/* Used to remember if the connection is logical or physical */
@@ -82,6 +83,7 @@ static WalRcvExecResult *libpqrcv_exec(WalReceiverConn *conn,
 	   const int nRetTypes,
 	   const Oid *retTypes);
 static void libpqrcv_disconnect(WalReceiverConn *conn);
+static int libpqrcv_wait(WalReceiverConn *conn, int timeout, int wait_event);
 
 static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_connect,
@@ -98,12 +100,13 @@ static WalReceiverFunctionsType PQWalReceiverFunctions = {
 	libpqrcv_create_slot,
 	libpqrcv_get_backend_pid,
 	libpqrcv_exec,
-	libpqrcv_disconnect
+	libpqrcv_disconnect,
+	libpqrcv_wait
 };
 
 /* Prototypes for private functions */
-static PGresult *libpqrcv_PQexec(PGconn *streamConn, const char *query);
-static PGresult *libpqrcv_PQgetResult(PGconn *streamConn);
+static PGresult *libpqrcv_PQexec(WalReceiverConn *conn, const char *query);
+static PGresult *libpqrcv_PQgetResult(WalReceiverConn *conn);
 static char *stringlist_to_identifierstr(PGconn *conn, List *strings);
 
 /*
@@ -182,6 +185,15 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 		

Re: Skipping logical replication transactions on subscriber side

2022-01-10 Thread vignesh C
On Fri, Jan 7, 2022 at 11:23 AM Masahiko Sawada  wrote:
>
> On Fri, Jan 7, 2022 at 10:04 AM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 5, 2022 at 12:31 PM Amit Kapila  wrote:
> > >
> > > On Mon, Dec 27, 2021 at 9:54 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Thu, Dec 16, 2021 at 2:42 PM Masahiko Sawada  
> > > > wrote:
> > > > >
> > > > > On Thu, Dec 16, 2021 at 2:21 PM Amit Kapila  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Dec 16, 2021 at 10:37 AM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Dec 16, 2021 at 11:43 AM Amit Kapila 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > I thought we just want to lock before clearing the skip_xid 
> > > > > > > > something
> > > > > > > > like take the lock, check if the skip_xid in the catalog is the 
> > > > > > > > same
> > > > > > > > as we have skipped, if it is the same then clear it, otherwise, 
> > > > > > > > leave
> > > > > > > > it as it is. How will that disallow users to change skip_xid 
> > > > > > > > when we
> > > > > > > > are skipping changes?
> > > > > > >
> > > > > > > Oh I thought we wanted to keep holding the lock while skipping 
> > > > > > > changes
> > > > > > > (changing skip_xid requires acquiring the lock).
> > > > > > >
> > > > > > > So if skip_xid is already changed, the apply worker would do
> > > > > > > replorigin_advance() with WAL logging, instead of committing the
> > > > > > > catalog change?
> > > > > > >
> > > > > >
> > > > > > Right. BTW, how are you planning to advance the origin? Normally, a
> > > > > > commit transaction would do it but when we are skipping all changes,
> > > > > > the commit might not do it as there won't be any transaction id
> > > > > > assigned.
> > > > >
> > > > > I've not tested it yet but replorigin_advance() with wal_log = true
> > > > > seems to work for this case.
> > > >
> > > > I've tested it and realized that we cannot use replorigin_advance()
> > > > for this purpose without changes. That is, the current
> > > > replorigin_advance() doesn't allow to advance the origin by the owner:
> > > >
> > > > /* Make sure it's not used by somebody else */
> > > > if (replication_state->acquired_by != 0)
> > > > {
> > > > ereport(ERROR,
> > > > (errcode(ERRCODE_OBJECT_IN_USE),
> > > >  errmsg("replication origin with OID %d is already
> > > > active for PID %d",
> > > > replication_state->roident,
> > > > replication_state->acquired_by)));
> > > > }
> > > >
> > > > So we need to change it so that the origin owner can advance its
> > > > origin, which makes sense to me.
> > > >
> > > > Also, when we have to update the origin instead of committing the
> > > > catalog change while updating the origin, we cannot record the origin
> > > > timestamp.
> > > >
> > >
> > > Is it because we currently update the origin timestamp with commit record?
> >
> > Yes.
> >
> > >
> > > > This behavior makes sense to me because we skipped the
> > > > transaction. But ISTM it’s not good if we emit the origin timestamp
> > > > only when directly updating the origin. So probably we need to always
> > > > omit origin timestamp.
> > > >
> > >
> > > Do you mean to say that you want to omit it even when we are
> > > committing the changes?
> >
> > Yes, it would be better to record only origin lsn in terms of consistency.
> >
> > >
> > > > Apart from that, I'm vaguely concerned that the logic seems to be
> > > > getting complex. Probably it comes from the fact that we store
> > > > skip_xid in the catalog and update the catalog to clear/set the
> > > > skip_xid. It might be worth revisiting the idea of storing skip_xid on
> > > > shmem (e.g., ReplicationState)?
> > > >
> > >
> > > IIRC, the problem with that idea was that we won't remember skip_xid
> > > information after server restart and the user won't even know that it
> > > has to set it again.
> >
> > Right, I agree that it’s not convenient when the server restarts or
> > crashes, but these problems could not be critical in the situation
> > where users have to use this feature; the subscriber already entered
> > an error loop so they can know xid again and it’s an uncommon case
> > that they need to restart during skipping changes.
> >
> > Anyway, I'll submit an updated patch soon so we can discuss complexity
> > vs. convenience.
>
> Attached an updated patch. Please review it.

Thanks for the updated patch, few comments:
1) Should this be case insensitive to support NONE too:
+   /* Setting xid = NONE is treated as resetting xid */
+   if (strcmp(xid_str, "none") == 0)
+   xid = InvalidTransactionId;

2) Can we have an option to specify last_error_xid of
pg_stat_subscription_workers. Something like:
alter subscription sub1 skip ( XID = 'last_subscription_error');

When the user specified last_subscription_error, it 

Re: Why is src/test/modules/committs/t/002_standby.pl flaky?

2022-01-10 Thread Thomas Munro
On Mon, Jan 10, 2022 at 8:00 PM Alexander Lakhin  wrote:
> The libpqrcv_PQgetResult function, in turn, invokes WaitLatchOrSocket()
> where WaitEvents are defined locally, and the closed flag set on the
> first invocation but expected to be checked on second.

D'oh, right.  There's also a WaitLatchOrSocket call in walreceiver.c.
We'd need a long-lived WaitEventSet common across all of these sites,
which is hard here (because the socket might change under you, as
discussed in other threads that introduced long lived WaitEventSets to
other places but not here).

/me wonders if it's possible that graceful FD_CLOSE is reported only
once, but abortive/error FD_CLOSE is reported multiple times...




Re: [Proposal] Global temporary tables

2022-01-10 Thread Andrew Bille
Hi!

I could not detect crashes with your last patch, so I think the patch is
ready for review.
Please also consider fixing the error messages, as the existing ones don't follow
the message writing guidelines:
https://www.postgresql.org/docs/14/error-style-guide.html

Regards, Andrew

On Thu, Dec 23, 2021 at 7:36 PM wenjing  wrote:

>
>
> Andrew Bille  于2021年12月21日周二 14:00写道:
>
>> Hi!
>> Thanks for new patches.
>> Yet another crash reproduced on master with v63 patches:
>>
>> CREATE TABLESPACE ts LOCATION '/tmp/ts';
>> CREATE GLOBAL TEMP TABLE tbl (num1 bigint);
>> INSERT INTO tbl (num1) values (1);
>> CREATE INDEX tbl_idx ON tbl (num1);
>> REINDEX (TABLESPACE ts) TABLE tbl;
>>
> This is a feature added in PG14 that supports changing tablespaces during
> REINDEX.
> Thank you for pointing that out, and I fixed it in v64.
> Waiting for your feedback.
>


Re: reporting TID/table with corruption error

2022-01-10 Thread Andrey Borodin



> On 19 Aug 2021, at 21:37, Alvaro Herrera  wrote:
> 
> A customer recently hit this error message:
> 
> ERROR:  t_xmin is uncommitted in tuple to be updated

Hi!

Currently I'm observing this on one of our production clusters. The problem 
occurs at random points in time, seems to be covered by retries on the client's 
side, and so far has not inflicted any harm (except for waking engineers).

Few facts:
0. PostgreSQL 12.9 (with some unrelated patches)
1. amcheck/heapcheck/pg_visibility never suspected the cluster and remained silent
2. I observe the problem ~once a day
3. The tuple seems to be updated in a high-contention concurrency trigger 
function, autovacuum kicks in ~20-30 seconds after the message in logs

[ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 ]:ERROR:  
t_xmin 696079792 is uncommitted in tuple (1419011,109) to be updated in table 
"s_statistics"
[ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 ]:CONTEXT: 
 SQL statement "UPDATE _s.s_statistics os
 SET __found_ts = COALESCE(os.__found_ts, NOW()),
 last__found_ts = NOW(),
 num_s = os.num_s + 1
 WHERE _id = NEW._id"
PL/pgSQL function statistics__update_from_new_() line 3 at SQL 
statement
[ 2022-01-10 09:07:17.671 MSK [unknown],,_s,310759,XX001 
]:STATEMENT:  
INSERT INTO _s.s_s AS s 

4. t_xmin is relatively new, not ancient
5. pageinspect shows dead tuple after some time
6. no suspicious activity in logs nearby
7. vacuum (disable_page_skipping) and repack of indexes did not change anything


I suspect this can be relatively new concurrency stuff. At least I never saw 
this before on clusters with clean amcheck and heapcheck results.

Alvaro, did you observe this on binaries from August 13 minor release or older?

Thanks!

Best regards, Andrey Borodin.



Re: Multiple Query IDs for a rewritten parse tree

2022-01-10 Thread Julien Rouhaud
On Mon, Jan 10, 2022 at 12:37:34PM +0500, Andrey V. Lepikhov wrote:
> I think pg_stat_statements can live with a queryId generator of the
> sr_plan extension. But it replaces all constants with a $XXX parameter in the
> query string. In our extension the user defines which plan is optimal and which
> constants can be used as parameters in the plan.

I don't know the details of that extension, but I think that it should work as
long as you have the constants information in the jumble state, whatever the
resulting normalized query string is right?

> One drawback I see here - creating or dropping my extension changes the
> behavior of pg_stat_statements, which distorts the DB load
> profile. Also, we have no guarantee that another extension will work
> correctly (or in an optimal way) with such a queryId.

But then, if generating 2 queryids is better for performance, does the
extension really need an additional jumble state and/or normalized query
string?




Re: SQL:2011 application time

2022-01-10 Thread Peter Eisentraut



On 06.01.22 06:44, Paul A Jungwirth wrote:

I didn't follow why indexes would have periods, for example, the new
period field in IndexStmt.  Is that explained anywhere?


When you create a primary key or a unique constraint (which are backed
by a unique index), you can give a period name to make it a temporal
constraint. We create the index first and then create the constraint
as a side-effect of that (e.g. index_create calls
index_constraint_create). The analysis phase generates an IndexStmt.
So I think this was mostly a way to pass the period info down to the
constraint. It probably doesn't actually need to be stored on pg_index
though. Maybe it does for index_concurrently_create_copy. I'll add
some comments, but if you think it's the wrong approach let me know.


This seems backwards.  Currently, when you create a constraint, the 
index is created as a side effect and is owned, so to speak, by the 
constraint.  What you are describing here sounds like the index owns the 
constraint.  This needs to be reconsidered, I think.



Of course, the main problem in this patch is that for most uses it
requires btree_gist.  I think we should consider moving that into
core, or at least the support for types that are most relevant to this
functionality, specifically the date/time types.  Aside from user
convenience, this would also allow writing more realistic test cases.


I think this would be great too. How realistic do you think it is? I
figured since exclusion constraints are also pretty useless without
btree_gist, it wasn't asking too much to have people install the
extension, but still it'd be better if it were all built in.


IMO, if this temporal feature is to happen, btree_gist needs to be moved 
into core first.  Having to install an extension in order to use an 
in-core feature like this isn't going to be an acceptable experience.



src/backend/access/brin/brin_minmax_multi.c

These renaming changes seem unrelated (but still seem like a good
idea).  Should they be progressed separately?


I can pull this out into a separate patch. I needed to do it because
when I added an `#include ` somewhere, these conflicted
with the range_{de,}serialize functions declared there.


OK, I have committed this separately.


I don't understand why a temporal primary key is required for doing
UPDATE FOR PORTION OF.  I don't see this in the standard.


You're right, it's not in the standard. I'm doing that because
creating the PK is when we add the triggers to implement UPDATE FOR
PORTION OF. I thought it was acceptable since we also require a
PK/unique constraint as the referent of a foreign key.


That part *is* in the standard.


But we could
avoid it if I went back to the executor-based FOR PORTION OF
implementation, since that doesn't depend on triggers. What do you
think?


I think it's worth trying to do this without triggers.

But if you are just looking for a way to create the triggers, why are 
they not just created when the table is created?



I think it would be smart to have a rough plan for how this work will
be compatible with system-time support. Corey & I have talked about
that a lot, and In general they are orthogonal, but it would be nice
to have details written down somewhere.


I personally don't see why we need to worry about system time now. 
System time seems quite a complicated feature, since you have to figure 
out a system to store and clean the old data, whereas this application 
time feature is ultimately mostly syntax sugar around ranges and 
exclusion constraints.  As long as we keep the standard syntax for 
system time available for future use (which is what your patch does), I 
don't see a need to go deeper right now.
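
For a concrete sense of what this builds on, the range-plus-exclusion
pattern that the application-time syntax would sugar over looks like this
today (standard documented usage with btree_gist, not the patch's syntax;
table and column names are made up):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_booking (
    room    int,
    during  daterange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);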