Re: row filtering for logical replication

2021-12-03 Thread Sascha Kuhl
This is great work, thanks for the realisation update.

Euler Taveira wrote on Sat., 4 Dec 2021, 00:13:

> On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
>
> PSA a new v44* patch set.
>
> We have been actively developing this feature for some months and have
> improved it a lot. This has been good teamwork. It seems a good time to
> provide a retrospective for this feature based on the consensus we have
> reached so far.
>
> The current design has one row filter per publication-table mapping. It
> allows
> flexible choices while using the same table for multiple replication
> purposes.
> The WHERE clause was chosen as the syntax to declare the row filter
> expression
> (enclosed by parentheses).
>
> There was a lot of discussion about which columns are allowed in the row
> filter expression. The consensus was that publications that publish UPDATE
> and/or DELETE operations should check that the columns in the row filter
> expression are part of the replica identity. Otherwise, these DML
> operations couldn't be replicated.
>
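
Concretely, the syntax under discussion looks roughly like this (table,
column and publication names are made up):

    CREATE TABLE orders (id int PRIMARY KEY, region text, amount numeric);
    CREATE PUBLICATION pub_east FOR TABLE orders WHERE (region = 'east');

Since pub_east publishes UPDATE and DELETE by default, the column used in
the filter has to be covered by the table's replica identity, for example:

    ALTER TABLE orders REPLICA IDENTITY FULL;
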
> We also discussed which expressions would be allowed. We couldn't allow
> all kinds of expressions because, given the way the logical decoding
> infrastructure was designed, some expressions could break replication.
> Hence, we decided to allow only "simple expressions". By "simple
> expression", we mean one that uses (a) no user-defined objects (functions,
> operators, types) and (b) only immutable built-in functions.
>
> A subscription can subscribe to multiple publications. These publications
> can publish the same table. In this case, we have to combine the row filter
> expressions to decide if the row will be replicated or not. The consensus
> was to replicate a row if any of the row filters returns true. It means
> that if one publication-table mapping does not have a row filter, the row
> will be replicated. There is an optimization for this case that provides an
> empty expression for this table; hence, it bails out and replicates the row
> without running the row filter code.
>
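
A sketch of the combining rule described above (same invented names): if the
same table is in two publications with different filters and one subscription
uses both, a row is replicated when either filter returns true.

    CREATE PUBLICATION pub_big FOR TABLE orders WHERE (amount > 1000);
    -- on the subscriber, rows matching either condition are replicated
    CREATE SUBSCRIPTION sub_orders
        CONNECTION 'host=publisher.example dbname=postgres'
        PUBLICATION pub_big, pub_east;
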
> The same logic applies to the initial table synchronization if there are
> multiple row filters: copy all rows that satisfy at least one row filter
> expression. If the subscriber is a pre-15 version, data synchronization
> won't use row filters even if they are defined on the publisher.
>
> If we are dealing with partitioned tables, the publication parameter
> publish_via_partition_root determines if it uses the partition row filter
> (false) or the root partitioned table row filter (true).
>
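
For the partitioned-table case, the parameter is set on the publication
itself, e.g. (illustrative names again):

    -- true: changes on any partition are filtered and published using the
    -- root partitioned table's row filter
    CREATE PUBLICATION pub_parted FOR TABLE measurement WHERE (region = 'east')
        WITH (publish_via_partition_root = true);
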
> I used the last patch series (v44) posted by Peter Smith [1]. I made a lot
> of improvements in this new version (v45). I merged 0001 (it is basically
> the main patch I wrote) and 0004 (autocomplete). As I explained in [2], I
> implemented a patch (incorporated into v45-0001) to fix this issue. I saw
> that Peter had already proposed a slightly different patch (0006). I read
> that patch and concluded that it would be better to keep the version I
> have. It fixes a few things and also includes more comments. I attached
> another patch (v45-0002) that includes the expression validation. It is
> based on 0002, which I completely overhauled. There are additional
> expressions that were not supported by the previous version (such as
> conditional expressions [CASE, COALESCE, NULLIF, GREATEST, LEAST], array
> operators, XML operators). I probably haven't finished the supported node
> list (there are a few primitive nodes that need to be checked). However,
> the current "simple expression" routine seems promising. I plan to
> integrate v45-0002 in the next patch version. I attached it here for
> comparison purposes only.
>
> My next step is to review 0003. As I said before, I would like to treat it
> as a separate feature. I know that it is useful for data consistency, but
> this patch is already too complex. Having said that, I didn't include it in
> this patch series because it doesn't apply cleanly. If Ajin would like to
> provide a new version, I would appreciate it.
>
> PS> I will update the commit message in the next version. I barely changed
> the documentation to reflect the current behavior. I probably missed some
> changes, but I will fix them in the next version.
>
>
> [1]
> https://postgr.es/m/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com
> [2]
> https://postgr.es/m/ca8d270d-f930-4d15-9f24-60f95b364173%40www.fastmail.com
>
>
> --
> Euler Taveira
> EDB   https://www.enterprisedb.com/
>
>


Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-12-03 Thread Thomas Munro
On Thu, Dec 2, 2021 at 3:11 AM Daniel Gustafsson  wrote:
> > On 3 Nov 2021, at 12:02, Daniel Gustafsson  wrote:
> >
> > This patch doesn't compile on Windows according to Appveyor, seemingly
> > because of a syntax error in the new win32ntdll.h file, but the MSVC logs
> > are hard on the eye so it might be unrelated.
>
> As the thread has stalled with a patch that doesn't apply, I'm marking this
> patch Returned with Feedback.  Please feel free to resubmit when a new patch
> is ready.

I think this was broken by WIN32_LEAN_AND_MEAN (and since gained a
merge conflict, but that's easy to fix).  I'll try to figure out the
right system header hacks to unbreak it...




Re: Skip vacuum log report code in lazy_scan_heap() if possible

2021-12-03 Thread Michael Paquier
On Fri, Dec 03, 2021 at 05:46:51PM +, Bossart, Nathan wrote:
> Perhaps this patch should be marked Rejected in favor of that one,
> then.

Let's do so, then.  I can also get behind the coverage argument for
this code path.
--
Michael




Re: A test for replay of regression tests

2021-12-03 Thread Thomas Munro
On Tue, Nov 30, 2021 at 3:14 AM Andrew Dunstan  wrote:
> choco install -y Carbon
> Import-Module Carbon
> Grant-CPrivilege -Identity myuser -Privilege SeCreateSymbolicLinkPrivilege

Interesting.  Well, I found the problem with my last patch (to wit:
junction points must be absolute, unlike real symlinks, which I'd
considered already but I missed that tmp_check's DataDir had a stray
internal \.\), and now I'm wondering whether these newer real symlinks
could help.  The constraints are pretty hard to work with... I thought
about a couple of options:

1.  We could try to use real symlinks, and fall back to junction
points if that fails.  That means that these new tests I'm proposing
would fail unless you grant that privilege or run in developer mode as
you were saying.  It bothers me a bit that developers and the BF would
be testing a different code path than production databases run...
unless you're thinking we should switch to symlinks with no fallback,
and require that privilege to be granted by end users to production
servers at least if they want to use tablespaces, and also drop
pre-Win10 support in the same release?  That's bigger than I was
thinking.

2.  We could convert relative paths to absolute paths at junction
point creation time, which I tried, and "check" now passes.  Problems:
(1) now you can't move pgdata around, (2) the is-the-path-too-long
check performed on a primary is not sufficient to check if the
transformed absolute path will be too long on a replica.

The most conservative simple idea I have so far is: go with option 2,
but make this whole thing an undocumented developer-only mode, and
turn it on in the regression tests.  Here's a patch set like that.
Thoughts?
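
For concreteness, the developer-only mode added by the attached 0001 patch
would be exercised roughly like this; the GUC and directory names are taken
from the patch's commit message, and how the GUC may be set is an assumption,
so treat this purely as a sketch:

    -- assumes allow_relative_tablespaces can be set for the session
    SET allow_relative_tablespaces = on;
    -- relative location, placed under pg_user_files per the patch
    CREATE TABLESPACE regress_ts LOCATION 'pg_user_files/regress_ts';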

Another option would be to stop using operating system symlinks, and
build the target paths ourselves; I didn't investigate that as it
seemed like a bigger change than this warrants.

Next problem:  The below is clearly not the right way to find the
pg_regress binary and bindir, and doesn't work on Windows or VPATH.
Any suggestions for how to do this?  I feel like something like
$node->installed_command() or similar logic is needed...

# Run the regression tests against the primary.
# XXX How should we find the pg_regress binary and bindir?
system_or_bail("../regress/pg_regress",
   "--bindir=../../bin/psql",
   "--port=" . $node_primary->port,
   "--schedule=../regress/parallel_schedule",
   "--dlpath=../regress",
   "--inputdir=../regress");

BTW 0002 is one of those renaming patches from git that GNU patch
doesn't seem to apply correctly, sorry cfbot...
From 1a442bfde9b44d28db7344df56e5c0069dbd2364 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 25 May 2021 20:42:17 +1200
Subject: [PATCH v7 1/5] Allow restricted relative tablespace paths.

Traditionally, tablespace paths provided to LOCATION had to be absolute,
because relative paths risked colliding with PostgreSQL-managed files
and generally blurring the boundary between system-managed and
user-managed data.

This commit adds a very small exception: it allows relative paths to be
used, but only under a new directory "pg_user_files", and only if the
developer-only option allow_relative_tablespaces is enabled.  This is
intended to be used in automated testing.

Discussion: https://postgr.es/m/CA%2BhUKGKpRWQ9SxdxxDmTBCJoR0YnFpMBe7kyzY8SUQk%2BHeskxg%40mail.gmail.com
---
 src/backend/commands/tablespace.c | 55 +++
 src/backend/utils/misc/guc.c  | 12 +++
 src/bin/initdb/initdb.c   |  1 +
 src/common/string.c   |  9 +
 src/include/commands/tablespace.h |  2 ++
 src/include/common/string.h   |  1 +
 src/port/dirmod.c |  4 +++
 7 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 4b96eec9df..7e98ca5b76 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -70,6 +70,7 @@
 #include "commands/tablecmds.h"
 #include "commands/tablespace.h"
 #include "common/file_perm.h"
+#include "common/string.h"
 #include "miscadmin.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
@@ -87,6 +88,7 @@
 /* GUC variables */
 char	   *default_tablespace = NULL;
 char	   *temp_tablespaces = NULL;
+bool		allow_relative_tablespaces = false;
 
 
 static void create_tablespace_directories(const char *location,
@@ -267,11 +269,15 @@ CreateTableSpace(CreateTableSpaceStmt *stmt)
  errmsg("tablespace location cannot contain single quotes")));
 
 	/*
-	 * Allowing relative paths seems risky
+	 * Allowing relative paths seems risky in general, but we'll allow it under
+	 * pg_user_files if a developer-mode option is enabled, for the purpose
+	 * of testing.
 	 *
 	 * this also helps us ensure that location is not empty or whitespace
 	 */
-	if (!is_absolute_path(location))
+	if (!is_absolute_path(location) &&
+		

Re: SPI TupTable memory context

2021-12-03 Thread Tom Lane
Chapman Flack  writes:
> If a person wanted to refer to those tuples after SPI_finish,
> would it be a dangerous idea to just reparent that context into one
> that will live longer, shortly before SPI_finish is called?

Seems kinda dangerous to me ...

> AtEOSubXact_SPI can free tuptables retail, but only in the rollback case.

... precisely because of that.  If you wanted to take control of
the TupTable, you'd really need to unhook it from the SPI context's
tuptables list, and that *really* seems like undue familiarity
with the implementation.

On the whole this seems like the sort of thing where if it breaks,
nobody is going to have a lot of sympathy.  What I'd suggest,
if you don't want to let the SPI mechanisms manage that memory,
is to not put the tuple set into a SPITupleTable in the first
place.  Run the query with a different DestReceiver that saves the
query result someplace you want it to be (see SPI_execute_extended
and the options->dest argument).

regards, tom lane




SPI TupTable memory context

2021-12-03 Thread Chapman Flack
Hi,

When SPI produces a tuple table result, its DestReceiver makes a new
"SPI TupTable" memory context, as a child of the SPI Proc context,
and the received tuples get copied into it. It goes away at SPI_finish,
as a consequence of its parent SPI Proc context going away.

If a person wanted to refer to those tuples after SPI_finish,
would it be a dangerous idea to just reparent that context into one
that will live longer, shortly before SPI_finish is called?

AtEOSubXact_SPI can free tuptables retail, but only in the rollback case.

The idea makes sense to me, only I notice that SPITupleTable.tuptabcxt
is under /* Private members, not intended for external callers */.

There's no such warning painted on GetMemoryChunkContext, but it would be
hard to get the Woody Guthrie song out of my head after doing it that way.

Am I overlooking a reason that reparenting an SPI TupTable context
would be a bad idea?

Regards,
-Chap




Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Chapman Flack
On 12/03/21 14:12, Tom Lane wrote:
> This reminds me of something I've been intending to bring up, which
> is that the "char" type is not very encoding-safe.  charout() for
> example just regurgitates the single byte as-is.

I wonder if maybe what to do about that lies downstream of some other
thought about encoding-related type properties.

ISTM we don't, at present, have a clear story for types that have an
encoding (or repertoire) property that isn't one of (inapplicable,
server_encoding).

And yet such things exist, and more such things could or should exist
(NCHAR, healthier versions of xml or json, ...). "char" is an existing
example, because its current behavior is exactly as if it declared
"I am one byte of SQL_ASCII regardless of server setting".

Which is no trouble at all when the server setting is also SQL_ASCII.
But what does it mean when the server setting and the inherent
repertoire property of a type can be different? The present answer
isn't pretty.

When can charout() be called? typoutput functions don't have any
'internal' parameters, so nothing stops user code from calling them;
I don't know how often that's done, and that's a complication.
The canonical place for it to be called is inside printtup(), when
the client driver has requested format 0 for that attribute.

Up to that point, we could have known it was a type with SQL_ASCII
wired in, but after charout() we have a cstring, and printtup treats
that type as having the server encoding, and it goes through encoding
conversion from that to the client encoding in pq_sendcountedtext.

Indeed, cstring behaves completely as if it is a type with the server
encoding. If you send a cstring with format 1 rather than format 0,
while it is no longer subject to the encoding conversion done in
pq_sendcountedtext, it will dutifully perform the same conversion
in its own cstring_send. unknownsend is the same way.

But of course a "char" column in format 1 would never go through cstring;
char_send would be called, and just plop the byte in the buffer unchanged
(which is the same operation as an encoding conversion from SQL_ASCII
to anything).

Ever since I figured out I have to look at the send/recv functions
for a type to find out if it is encoding-dependent, I have to walk myself
through those steps again every time I forget why that is. Having
the type's character-encoding details show up in its send/recv functions
and not in its in/out functions never stops being counterintuitive to me.
But for server-encoding-dependent types, that's how it is: you don't
see it in the typoutput function, because on the format-0 path,
the transcoding happens in pq_sendcountedtext. But on the format-1 path,
the same transcoding happens, this time under the type's own control
in its typsend function.

That was the second thing that surprised me: we have what we call
a text and a binary path, but for an encoding-dependent type, neither
one is a path where transcoding doesn't happen!

The difference is, the format-0 transcoding is applied blindly,
in pq_sendcountedtext, with no surviving information about the data
type (which has become cstring by that point). In contrast, on the
format-1 path, the type's typsend is in control. In theory, that would
allow type-aware conversion; a smarter xml_send could use 

Re: keepalives etc. as environment variables

2021-12-03 Thread Robert Haas
On Thu, Dec 2, 2021 at 8:59 PM Tatsuo Ishii  wrote:
> > On 2021-12-03 10:28:34 +0900, Tatsuo Ishii wrote:
> >> It seems there are no environment variables corresponding to keepalives
> >> etc. connection parameters in libpq. Is there any reason for this?
> >
> > PGOPTIONS='-c tcp_keepalive_*=foo' should work.
>
> Sorry I was not clear. I wanted to know why there are no specific
> environment variable for keepalives etc. like PGCONNECT_TIMEOUT.

In theory we could have an environment variable for every connection
parameter, but there's some cost to that. For example, if some program
wants to sanitize the environment of all PG-related environment
variables, it has more to do. It seems reasonable to me to have
environment variables only for the most important connection
parameters, rather than all of them. I would argue we've overdone it
already.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-12-03 Thread Robert Haas
On Fri, Dec 3, 2021 at 1:30 PM Tom Lane  wrote:
> What's most likely to happen IMO is that committers will just start
> back-patching essential portability fixes into out-of-support-but-
> still-in-the-buildability-window branches, contemporaneously with
> the original fix.  Yeah, that does mean more committer effort,
> but only for a very small number of patches.

I agree. I think that's exactly what we want to have happen, and if a
given policy won't have exactly this result then the policy needs
adjusting.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Kenneth Marshall
On Fri, Dec 03, 2021 at 03:13:24PM -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 12/3/21 14:42, Tom Lane wrote:
> >> Right, I envisioned that ASCII behaves the same but we'd use
> >> a numeric representation for high-bit-set values.  These
> >> cases could be told apart fairly easily by charin(), since
> >> the numeric representation would always be three digits.
> 
> > OK, this seems the most attractive. Can we also allow 2 hex digits?
> 
> I think we should pick one base and stick to it.  I don't mind
> hex if you have a preference for that.
> 
>   regards, tom lane

+1 for hex

Regards,
Ken




Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/3/21 14:42, Tom Lane wrote:
>> Right, I envisioned that ASCII behaves the same but we'd use
>> a numeric representation for high-bit-set values.  These
>> cases could be told apart fairly easily by charin(), since
>> the numeric representation would always be three digits.

> OK, this seems the most attractive. Can we also allow 2 hex digits?

I think we should pick one base and stick to it.  I don't mind
hex if you have a preference for that.

regards, tom lane




Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Andrew Dunstan


On 12/3/21 14:42, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 12/3/21 14:12, Tom Lane wrote:
>>> I can think of at least three ways we might address this:
>>>
>>> * Forbid all non-ASCII values for type "char".  This results in
>>> simple and portable semantics, but it might break usages that
>>> work okay today.
>>>
>>> * Allow such values only in single-byte server encodings.  This
>>> is a bit messy, but it wouldn't break any cases that are not
>>> problematic already.
>>>
>>> * Continue to allow non-ASCII values, but change charin/charout,
>>> char_text, etc so that the external representation is encoding-safe
>>> (perhaps make it an octal or decimal number).
>> Is #3 going to change the external representation only
>> for non-ASCII values? If so, that seems OK.
> Right, I envisioned that ASCII behaves the same but we'd use
> a numeric representation for high-bit-set values.  These
> cases could be told apart fairly easily by charin(), since
> the numeric representation would always be three digits.


OK, this seems the most attractive. Can we also allow 2 hex digits?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





daitch_mokotoff module

2021-12-03 Thread Dag Lem
Hello,

Please find attached a patch for the daitch_mokotoff module.

This implements the Daitch-Mokotoff Soundex System, as described in
https://www.avotaynu.com/soundex.htm

The module is used in production at Finance Norway.

In order to verify correctness, I have compared generated soundex codes
with corresponding results from the implementation by Stephen P. Morse
at https://stevemorse.org/census/soundex.html

Where soundex codes differ, the daitch_mokotoff module has been found
to be correct. The Morse implementation uses a few unofficial rules,
and also has an error in the handling of adjacent identical code
digits. Please see daitch_mokotoff.c for further references and
comments.

For reference, detailed instructions for soundex code comparison are
attached.
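
A minimal usage sketch, assuming the extension from the attached patch is
built and installed; the expected output is taken from the notes in
daitch_mokotoff.c below:

    CREATE EXTENSION daitch_mokotoff;
    SELECT daitch_mokotoff('BEST');  -- 743000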


Best regards

Dag Lem

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..5ea729 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -14,6 +14,7 @@ SUBDIRS = \
 		btree_gist	\
 		citext		\
 		cube		\
+		daitch_mokotoff	\
 		dblink		\
 		dict_int	\
 		dict_xsyn	\
diff --git a/contrib/daitch_mokotoff/Makefile b/contrib/daitch_mokotoff/Makefile
new file mode 100644
index 00..baec5e31d4
--- /dev/null
+++ b/contrib/daitch_mokotoff/Makefile
@@ -0,0 +1,25 @@
+# contrib/daitch_mokotoff/Makefile
+
+MODULE_big = daitch_mokotoff
+OBJS = \
+	$(WIN32RES) \
+	daitch_mokotoff.o
+
+EXTENSION = daitch_mokotoff
+DATA = daitch_mokotoff--1.0.sql
+PGFILEDESC = "daitch_mokotoff - Daitch-Mokotoff Soundex System"
+
+HEADERS = daitch_mokotoff.h
+
+REGRESS = daitch_mokotoff
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/daitch_mokotoff
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql b/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql
new file mode 100644
index 00..0b5a643175
--- /dev/null
+++ b/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql
@@ -0,0 +1,8 @@
+/* contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION daitch_mokotoff" to load this file. \quit
+
+CREATE FUNCTION daitch_mokotoff(text) RETURNS text
+AS 'MODULE_PATHNAME', 'daitch_mokotoff'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
diff --git a/contrib/daitch_mokotoff/daitch_mokotoff.c b/contrib/daitch_mokotoff/daitch_mokotoff.c
new file mode 100644
index 00..9e66aee434
--- /dev/null
+++ b/contrib/daitch_mokotoff/daitch_mokotoff.c
@@ -0,0 +1,551 @@
+/*
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem 
+ *
+ * This implementation of the Daitch-Mokotoff Soundex System aims at high
+ * performance.
+ *
+ * - The processing of each phoneme is initiated by an O(1) table lookup.
+ * - For phonemes containing more than one character, a coding tree is traversed
+ *   to process the complete phoneme.
+ * - The (alternate) soundex codes are produced digit by digit in-place in
+ *   another tree structure.
+ *
+ *  References:
+ *
+ * https://www.avotaynu.com/soundex.htm
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex
+ * https://stevemorse.org/phoneticinfo.htm (dmlat.php, dmsoundex.php)
+ * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java)
+ * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm)
+ *
+ * A few notes on other implementations:
+ *
+ * - "J" is considered a vowel in dmlat.php
+ * - The official rules for "RS" are commented out in dmlat.php
+ * - Adjacent identical code digits are not collapsed correctly in dmsoundex.php
+ *   when double digit codes are involved. E.g. "BESST" yields 744300 instead of
+ *   743000 as for "BEST".
+ * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java
+ * - Both dmlat.php and dmrules.txt have the same unofficial rules for "UE".
+ * - Coding of MN/NM + M/N differs between dmsoundex.php and DaitchMokotoffSoundex.java
+ * - Neither dmsoundex.php nor DaitchMokotoffSoundex.java yields all valid codes for e.g.
+ *   "CJC" (55 54 545000 45 40 44).
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation for any purpose, without fee, and without a written agreement
+ * is hereby granted, provided that the above copyright notice and this
+ * paragraph and the following two paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
+ * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
+ * 

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2021-12-03 Thread Melanie Plageman
Thanks again! I really appreciate the thorough review.

I have combined responses to all three of your emails below.
Let me know if it is more confusing to do it this way.

On Wed, Dec 1, 2021 at 6:59 PM Justin Pryzby  wrote:
>
> On Wed, Dec 01, 2021 at 05:00:14PM -0500, Melanie Plageman wrote:
> > > Also:
> > > src/include/miscadmin.h:#define BACKEND_NUM_TYPES (B_LOGGER + 1)
> > >
> > > I think it's wrong to say NUM_TYPES = B_LOGGER + 1 (which would suggest
> > > using lessthan-or-equal instead of lessthan as you are).
> > >
> > > Since the valid backend types start at 1, the "count" of backend types is
> > > currently B_LOGGER (13) - not 14.  I think you should remove the "+1" here.
> > > Then NROWS (if it continued to exist at all) wouldn't need to subtract one.
> >
> > I think what I currently have is technically correct because I start at
> > 1 when I am using it as a loop condition. I do waste a spot in the
> > arrays I allocate with BACKEND_NUM_TYPES size.
> >
> > I was hesitant to make the value of BACKEND_NUM_TYPES == B_LOGGER
> > because it seems kind of weird to have it have the same value as the
> > B_LOGGER enum.
>
> I don't mean to say that the code is misbehaving - I mean "num_x" means "the
> number of x's" - how many there are.  Since the first valid backend type is 1,
> and they're numbered consecutively and without duplicates, then "the number of
> backend types" is the same as the value of the last one (B_LOGGER).  It's
> confusing if there's a macro called BACKEND_NUM_TYPES which is greater than
> the number of backend types.
>
> Most loops say for (int i=0; i < NUM; i++).  If it's 1-based, they say
> for (int i=1; i <= NUM; ++i).
> You have two different loops like:
>
> +   for (int i = 0; i < BACKEND_NUM_TYPES - 1 ; i++)
> +   for (int backend_type = 1; backend_type < BACKEND_NUM_TYPES; backend_type++)
>
> Both of these iterate over the correct number of backend types, but they both
> *look* wrong, which isn't desirable.

I've changed this and added comments wherever I could to make it clear.
Whenever the parameter was of type BackendType, I tried to use the
correct (not adjusted by subtracting 1) number and wherever the type was
int and being used as an index into the array, I used the adjusted value
and added the idx suffix to make it clear that the number does not
reflect the actual BackendType:

On Wed, Dec 1, 2021 at 10:31 PM Justin Pryzby  wrote:
>
> On Wed, Dec 01, 2021 at 04:59:44PM -0500, Melanie Plageman wrote:
> > Thanks for the review!
> >
> > On Wed, Nov 24, 2021 at 8:16 PM Justin Pryzby  wrote:
> > > There's extraneous blank lines in these functions:
> > > +pgstat_recv_resetsharedcounter
> > I didn't see one here
>
> => The extra blank line is after the RESET_BUFFERS memset.

Fixed.

> > +* Reset the global, bgwriter and checkpointer statistics for the
> > +* cluster.
>
> The first comma in this comment was introduced in 1bc8e7b09, and seems to be
> extraneous, since bgwriter and checkpointer are both global.  With the comma,
> it looks like it should be memsetting 3 things.

Fixed.

> > +   /* Don't count dead backends. They should already be counted */
>
> Maybe this comment should say ".. they'll be added below"

Fixed.

> > +   row[COLUMN_BACKEND_TYPE] = backend_type_desc;
> > +   row[COLUMN_IO_PATH] = CStringGetTextDatum(GetIOPathDesc(io_path));
> > +   row[COLUMN_ALLOCS] += io_ops->allocs - resets->allocs;
> > +   row[COLUMN_EXTENDS] += io_ops->extends - resets->extends;
> > +   row[COLUMN_FSYNCS] += io_ops->fsyncs - resets->fsyncs;
> > +   row[COLUMN_WRITES] += io_ops->writes - resets->writes;
> > +   row[COLUMN_RESET_TIME] = reset_time;
>
> It'd be clearer if RESET_TIME were set adjacent to BACKEND_TYPE and IO_PATH.

If you mean just in the order here (not in the column order in the
view), then I have changed it as you recommended.

> This message needs to be updated:
> errhint("Target must be \"archiver\", \"bgwriter\", or \"wal\".")))

Done.

> When I query the view, I see reset times as: 1999-12-31 18:00:00-06.
> I guess it should be initialized like this one:
> globalStats.bgwriter.stat_reset_timestamp = ts

Done.

> The cfbot shows failures now (I thought it was passing with the previous
> patch, but I suppose I'm wrong.)
>
> It looks like running recovery during single user mode hits this assertion.
> TRAP: FailedAssertion("beentry", File:
> "../../../../src/include/utils/backend_status.h", Line: 359, PID: 3499)
>

Yes, thank you for catching this.
I have moved up pgstat_beinit and pgstat_bestart so that a single user
mode process will also have a PgBackendStatus. I also have to guard
against sending these stats to the collector since there is no room for
a B_INVALID backend type in the array of IO Op values.

With this change 

Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/3/21 14:12, Tom Lane wrote:
>> I can think of at least three ways we might address this:
>> 
>> * Forbid all non-ASCII values for type "char".  This results in
>> simple and portable semantics, but it might break usages that
>> work okay today.
>> 
>> * Allow such values only in single-byte server encodings.  This
>> is a bit messy, but it wouldn't break any cases that are not
>> problematic already.
>> 
>> * Continue to allow non-ASCII values, but change charin/charout,
>> char_text, etc so that the external representation is encoding-safe
>> (perhaps make it an octal or decimal number).

> I don't like #2.

Yeah, it's definitely messy --- for example, maybe é works in
a latin1 database but is rejected when you try to restore into
a DB with utf8 encoding.

> Is #3 going to change the external representation only
> for non-ASCII values? If so, that seems OK.

Right, I envisioned that ASCII behaves the same but we'd use
a numeric representation for high-bit-set values.  These
cases could be told apart fairly easily by charin(), since
the numeric representation would always be three digits.

> #1 is the simplest to implement and to understand,
> and I suspect it would break very little in practice, but others might
> disagree with that assessment.

We'd still have to decide what to do with pg_upgrade'd
non-ASCII values, so there's messiness there too.
Having charout() throw an error seems not very nice.

regards, tom lane




Re: The "char" type versus non-ASCII characters

2021-12-03 Thread Andrew Dunstan


On 12/3/21 14:12, Tom Lane wrote:
> [ breaking off a different new thread ]
>
> Chapman Flack  writes:
>> Then there's "char". It's category S, but does not apply the server
>> encoding. You could call it an 8-bit int type, but it's typically used
>> as a character, making it well-defined for ASCII values and not so
>> for others, just like SQL_ASCII encoding. You could as well say that
>> the "char" type has a defined encoding of SQL_ASCII at all times,
>> regardless of the database encoding.
> This reminds me of something I've been intending to bring up, which
> is that the "char" type is not very encoding-safe.  charout() for
> example just regurgitates the single byte as-is.  I think we deemed
> that okay the last time anyone thought about it, but that was when
> single-byte encodings were the mainstream usage for non-ASCII data.
> If you're using UTF8 or another multi-byte server encoding, it's
> quite easy to get an invalidly-encoded string this way, which at
> minimum is going to break dump/restore scenarios.
>
> I can think of at least three ways we might address this:
>
> * Forbid all non-ASCII values for type "char".  This results in
> simple and portable semantics, but it might break usages that
> work okay today.
>
> * Allow such values only in single-byte server encodings.  This
> is a bit messy, but it wouldn't break any cases that are not
> problematic already.
>
> * Continue to allow non-ASCII values, but change charin/charout,
> char_text, etc so that the external representation is encoding-safe
> (perhaps make it an octal or decimal number).
>
> Either of the first two ways would have to contemplate what to do
> with disallowed values that snuck into the DB via pg_upgrade.
> That leads me to think that the third way might be the most
> preferable, even though it's not terribly backward-compatible.
>


I don't like #2. Is #3 going to change the external representation only
for non-ASCII values? If so, that seems OK.  Changing it for ASCII
values seems ugly. #1 is the simplest to implement and to understand,
and I suspect it would break very little in practice, but others might
disagree with that assessment.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





The "char" type versus non-ASCII characters

2021-12-03 Thread Tom Lane
[ breaking off a different new thread ]

Chapman Flack  writes:
> Then there's "char". It's category S, but does not apply the server
> encoding. You could call it an 8-bit int type, but it's typically used
> as a character, making it well-defined for ASCII values and not so
> for others, just like SQL_ASCII encoding. You could as well say that
> the "char" type has a defined encoding of SQL_ASCII at all times,
> regardless of the database encoding.

This reminds me of something I've been intending to bring up, which
is that the "char" type is not very encoding-safe.  charout() for
example just regurgitates the single byte as-is.  I think we deemed
that okay the last time anyone thought about it, but that was when
single-byte encodings were the mainstream usage for non-ASCII data.
If you're using UTF8 or another multi-byte server encoding, it's
quite easy to get an invalidly-encoded string this way, which at
minimum is going to break dump/restore scenarios.
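
A minimal illustration, assuming a UTF8 server and client encoding (the cast
to "char" keeps only the first byte of the two-byte character, and that lone
byte is not valid UTF8 on its own):

    SELECT 'é'::"char";          -- charout() regurgitates the single byte
    SELECT ('é'::"char")::text;  -- the invalid byte ends up inside a text value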

I can think of at least three ways we might address this:

* Forbid all non-ASCII values for type "char".  This results in
simple and portable semantics, but it might break usages that
work okay today.

* Allow such values only in single-byte server encodings.  This
is a bit messy, but it wouldn't break any cases that are not
problematic already.

* Continue to allow non-ASCII values, but change charin/charout,
char_text, etc so that the external representation is encoding-safe
(perhaps make it an octal or decimal number).

Either of the first two ways would have to contemplate what to do
with disallowed values that snuck into the DB via pg_upgrade.
That leads me to think that the third way might be the most
preferable, even though it's not terribly backward-compatible.

There's a nearby issue that we might do something about at the
same time, which is that chartoi4() and i4tochar() think that
the byte value of a "char" is signed, while all the other
operations treat it as unsigned.  I wouldn't be too surprised if
this behavior is the direct cause of the bug fixed in a6bd28beb.
The issue vanishes if we forbid non-ASCII values, but otherwise
I'd be inclined to change these functions to treat the byte
values as unsigned.

Thoughts?

regards, tom lane




Re: types reliant on encodings [was Re: Dubious usage of TYPCATEGORY_STRING]

2021-12-03 Thread Tom Lane
[ breaking this off to an actual new thread ]

Chapman Flack  writes:
> Is there any way to find out, from the catalogs or in any automatable way,
> which types are implemented with a dependence on the database encoding
> (or on some encoding)?

Nope.  Base types are quite opaque; we don't have a lot of concepts
of type properties.  We do know which types respond to collations,
which is at least adjacent to your question, but it's not the same.

> In the alternate world, you would know that certain datatypes were
> inherently encoding-oblivious (numbers, polygons, times, ...), certain
> others are bound to the server encoding (text, varchar, name, ...), and
> still others are bound to a known encoding other than the server encoding:
> the ISO SQL NCHAR type (bound to an alternate configurable database
> encoding), "char" (always SQL_ASCII), xml/json/jsonb (always with the full
> Unicode repertoire, however they choose to represent it internally).

Another related problem here is that a "name" in a shared catalog could
be in some other encoding besides the current database's encoding.
We have no way to track that nor perform the implied conversions.
I don't have a solution for that one either, but we should include
it in the discussion if we're trying to improve the situation.

regards, tom lane




types reliant on encodings [was Re: Dubious usage of TYPCATEGORY_STRING]

2021-12-03 Thread Chapman Flack
On 12/02/21 16:22, Tom Lane wrote:
> ... types belonging to the STRING category, which are predicated
> on the assumption that such types are reasonably general-purpose
> string types.

This prods me to submit a question I've been incubating for a while.

Is there any way to find out, from the catalogs or in any automatable way,
which types are implemented with a dependence on the database encoding
(or on some encoding)?

You might think S category types, for a start: name, text, character,
varchar, all dependent on the server encoding, as you'd expect. The ones
Tom moves here to category Z were most of the ones I wondered about.

Then there's "char". It's category S, but does not apply the server
encoding. You could call it an 8-bit int type, but it's typically used
as a character, making it well-defined for ASCII values and not so
for others, just like SQL_ASCII encoding. You could as well say that
the "char" type has a defined encoding of SQL_ASCII at all times,
regardless of the database encoding.

U types are a mixed bag. Category U includes bytea (no character encoding)
and xml/json/jsonb (server encoding). Also tied to the server encoding
are cstring and unknown.

As an aside, I think it's unfortunate that the xml type has this implicit
dependency on the server encoding, when XML is by definition Unicode.
It means there are valid XML documents that PostgreSQL may not be able
to store, and which documents those are depends on what the database
encoding is. I think json and jsonb suffer in the same way.

Changing that would be disruptive at this point and I'm not suggesting it,
but there might be value in the thought experiment to see what the
alternate universe would look like.

In the alternate world, you would know that certain datatypes were
inherently encoding-oblivious (numbers, polygons, times, ...), certain
others are bound to the server encoding (text, varchar, name, ...), and
still others are bound to a known encoding other than the server encoding:
the ISO SQL NCHAR type (bound to an alternate configurable database
encoding), "char" (always SQL_ASCII), xml/json/jsonb (always with the full
Unicode repertoire, however they choose to represent it internally).

That last parenthetical reminded me that I'm really talking
about 'repertoire' here, which ISO SQL treats as a separate topic from
'encoding'. Exactly how an xml or jsonb type is represented internally
might be none of my business (unless I am developing a binary-capable
driver), but it's fair to ask what its repertoire is, and whether that's
full Unicode or not, and if not, whether the repertoire changes when some
server setting does.

I also think in that ideal world, or even this one, you could want
some way to query the catalogs to answer that basic question
about some given type.

Am I right that we simply don't have that? I currently answer such questions
by querying the catalog for the type's _send or _recv function name, then
going off to read the code, but that's hard to automate.

Regards,
-Chap




Re: pg_dump versus ancient server versions

2021-12-03 Thread Tom Lane
Peter Eisentraut  writes:
> On 02.12.21 23:16, Andres Freund wrote:
>> I realize it's more complicated for users, but a policy based on supporting a
>> certain number of out-of-support branches calculated from the newest major
> > version is more realistic. I'd personally go for something like
> > newest-major - 7 (i.e. 2 extra releases), but I realize that others think
> > it's worthwhile to
>> support a few more.  I think there's a considerable advantage of having one
>> cutoff date across all branches.

> I'm not sure it will be clear what this would actually mean.  Assume 
> PG11 supports back to 9.4 (14-7) now, but when PG15 comes out, we drop 
> 9.4 support. But the PG11 code hasn't changed, and PG9.4 hasn't changed, 
> so it will most likely still work.  Then we have messaging that is out 
> of sync with reality.  I can see the advantage of this approach, but the 
> communication around it might have to be refined.

I don't find this suggestion to be an improvement over Peter's original
formulation, for two reasons:

* I'm not convinced that it saves us any actual work; as you say, the
code doesn't stop working just because we declare it out-of-support.

* There's a real-world use-case underneath here.  If somewhere you've
discovered a decades-old server that you need to upgrade, and current
pg_dump won't dump from it, you would like it to be well-defined
which intermediate pg_dump versions you can use.  So if 10.19 can
dump from that hoary server, it would not be nice if 10.20 can't;
nor if the documentation lies to you about that based on which minor
version you happen to consult.

>> I think we should explicitly limit the number of platforms we care about for
>> this purpose. I don't think we should even try to keep 8.2 compile on AIX or
>> whatnot.

> It's meant to be developer-facing, so only for platforms that developers 
> use.  I think that can police itself, if we define it that way.

I agree that if you care about doing this sort of test on platform X,
it's up to you to patch for that.  I think Andres' concern is about
the amount of committer bandwidth that might be needed to handle
such patches submitted by non-committers.  However, based on the
experiment I just ran, I think it's not really likely to be a big deal:
there are not that many problems, and most of them just amount to
back-patching something that originally wasn't back-patched.

What's most likely to happen IMO is that committers will just start
back-patching essential portability fixes into out-of-support-but-
still-in-the-buildability-window branches, contemporaneously with
the original fix.  Yeah, that does mean more committer effort,
but only for a very small number of patches.

regards, tom lane




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bossart, Nathan
On 12/3/21, 5:57 AM, "Bharath Rupireddy" 
 wrote:
> On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan  wrote:
>>
>> On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
>>  wrote:
>> > +1 for the overall idea of making the checkpoint faster. In fact, we
>> > here at our team have been thinking about this problem for a while. If
>> > there are a lot of files that checkpoint has to loop over and remove,
>> > IMO, that task can be delegated to someone else (maybe a background
>> > worker called background cleaner or bg cleaner, of course, we can have
>> > a GUC to enable or disable it). The checkpoint can just write some
>>
>> Right.  IMO it isn't optimal to have critical things like startup and
>> checkpointing depend on somewhat-unrelated tasks.  I understand the
>> desire to avoid adding additional processes, and maybe it is a bigger
>> hammer than what is necessary to reduce the impact, but it seemed like
>> a natural solution for this problem.  That being said, I'm all for
>> exploring other ways to handle this.
>
> Having a generic background cleaner process (controllable via a few
> GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
> temp files etc.) or some other task on behalf of the checkpointer,
> seems to be the easiest solution.
>
> I'm too open for other ideas.

I might hack something together for the separate worker approach, if
for no other reason than to make sure I really understand how these
functions work.  If/when a better idea emerges, we can alter course.

Nathan



Re: pg_dump versus ancient server versions

2021-12-03 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/3/21 12:10, Tom Lane wrote:
>> I experimented to see what this would entail exactly.  Using
>> current macOS (Apple clang version 13.0.0) on M1 hardware,
>> I built with minimal configure options (--enable-debug --enable-cassert)
>> and ran the core regression tests.  

> I've mentioned my efforts on fedora previously. But like you I used a
> minimal configuration. So what would be reasonable to test? I know you
> mentioned building with perl and python upthread so we could possibly
> test transforms. Anything else? I don't think we need to worry about all
> the authentication-supporting options. XML/XSLT maybe.

Not sure.  I think we should evaluate based on
1. how integral is the option, ie how much PG code can't we test if
   we don't enable it.
2. how stable is the referenced code.

Point 2 makes me want to exclude both Python and OpenSSL, as they've
both proven to be moving API targets.  If we want to have tests for
transform modules, plperl would be sufficient for that, and perl seems
to be a lot more stable than python.  libxml is pretty morib^H^H^Hstable,
but on the other hand it seems quite noncritical for the sorts of tests
we want to run against old servers, so I'd be inclined to exclude it.

Looking through the other configure options, the only one that I find
to be a hard call is --enable-nls.  In theory this shouldn't be
critical for testing pg_dump or psql ... but you never know, and it
hasn't been a stability problem.  Every other one I think we could
ignore for these purposes.  At some point --with-icu might become
interesting, but it isn't yet relevant to any out-of-support
branches, so we can leave that call for another day.

regards, tom lane




Re: should we document an example to set multiple libraries in shared_preload_libraries?

2021-12-03 Thread Bossart, Nathan
On 12/3/21, 6:21 AM, "Bharath Rupireddy" 
 wrote:
> +1 to add here in the "Parameter Names and Values section", but do we
> want to backlink every string parameter to this section? I think it
> needs more effort. IMO, we can just backlink for
> shared_preload_libraries alone. Thoughts?

IMO this is most important for GUC_LIST_QUOTE parameters, of which
there are only a handful.  I don't think adding a link to every string
parameter is necessary.
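
For reference, the kind of example being asked for would look something like
this; the module names are purely illustrative, and the setting only takes
effect after a server restart:

    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements, auto_explain';
    -- equivalent postgresql.conf line:
    --   shared_preload_libraries = 'pg_stat_statements, auto_explain'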

Nathan



Re: Skip vacuum log report code in lazy_scan_heap() if possible

2021-12-03 Thread Bossart, Nathan
On 12/3/21, 7:40 AM, "Peter Geoghegan"  wrote:
> If my patch to unite vacuum verbose and the autovacuum logging gets
> in, then this issue also goes away. 

Perhaps this patch should be marked Rejected in favor of that one,
then.

Nathan



Re: Remove extra Logging_collector check before calling SysLogger_Start()

2021-12-03 Thread Tom Lane
Bharath Rupireddy  writes:
> On Fri, Dec 3, 2021 at 1:43 PM Daniel Gustafsson  wrote:
>> On 3 Dec 2021, at 08:58, Bharath Rupireddy 
>>  wrote:
>>> It seems like there's an extra Logging_collector check before calling
>>> SysLogger_Start().

>> I think the code reads clearer with the Logging_collector check left intact,
>> and avoiding a function call in this codepath doesn't hurt.

> In that case, can we remove if (!Logging_collector) within
> SysLogger_Start() and have the check outside?

I think it's fine as-is; good belt-and-suspenders-too programming.
It's not like the extra check inside SysLogger_Start() is costly.

regards, tom lane




Re: Do sys logger and stats collector need wait events WAIT_EVENT_SYSLOGGER_MAIN/_PGSTAT_MAIN?

2021-12-03 Thread Tom Lane
Bharath Rupireddy  writes:
> Although the pg_stat_activity has no entry for the sys logger and
> stats collector (because of no shared memory access), the wait events
> WAIT_EVENT_SYSLOGGER_MAIN and WAIT_EVENT_PGSTAT_MAIN are defined. They
> seem to be unnecessary. Passing 0 or some other undefined wait event
> value to the existing calls of WaitLatch and WaitLatchOrSocket instead
> of WAIT_EVENT_SYSLOGGER_MAIN/WAIT_EVENT_PGSTAT_MAIN, would work. We
> can delete these wait events and their info from pgstat.c.

Well ... mumble.  The fact that these events are defined would lead
people to wonder why they're not hit, so there's a documentation reason
to get rid of them.  However, I quite dislike the suggestion of "just
pass zero"; that will probably draw compiler warnings, or if it doesn't
it should.  We'd have to invent some other "unused" wait event, and
then it's not clear that we've made any gain in intelligibility.

On the whole I'd be inclined to leave it alone.  Even if the reporting
mechanisms aren't able to report these events today, maybe they'll
be able to do so in future.  The context of the stats collector
process, in particular, seems likely to change.

regards, tom lane




Re: pg_dump versus ancient server versions

2021-12-03 Thread Andrew Dunstan


On 12/3/21 12:10, Tom Lane wrote:
> Peter Eisentraut  writes:
>> [ policy requiring that 9.2 and up be kept buildable, as of today ]
> I experimented to see what this would entail exactly.  Using
> current macOS (Apple clang version 13.0.0) on M1 hardware,
> I built with minimal configure options (--enable-debug --enable-cassert)
> and ran the core regression tests.  


I've mentioned my efforts on fedora previously. But like you I used a
minimal configuration. So what would be reasonable to test? I know you
mentioned building with perl and python upthread so we could possibly
test transforms. Anything else? I don't think we need to worry about all
the authentication-supporting options. XML/XSLT maybe.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Alter all tables in schema owner fix

2021-12-03 Thread Bossart, Nathan
On 12/2/21, 11:57 PM, "tanghy.f...@fujitsu.com"  wrote:
> Thanks for your patch.
> I tested it and it fixed this problem as expected. It also passed "make 
> check-world".

+1, the patch looks good to me, too.  My only other suggestion would
be to move IsSchemaPublication() to pg_publication.c

Nathan



Re: extended stats on partitioned tables

2021-12-03 Thread Zhihong Yu
On Thu, Dec 2, 2021 at 9:24 PM Justin Pryzby  wrote:

> On Thu, Nov 04, 2021 at 12:44:45AM +0100, Tomas Vondra wrote:
> > >> And I'm not sure we do the right thing after removing children, for
> > >> example (that should drop the inheritance stats, I guess).
>
> > > Do you mean for inheritance only ?  Or partitions too ?
> > > I think for partitions, the stats should stay.
> > > And for inheritance, they can stay, for consistency with partitions, and
> > > since it does no harm.
> >
> > I think the behavior should be the same as for data in pg_statistic,
> > i.e. if we keep/remove those, we should do the same thing for extended
> > statistics.
>
> That works for column stats the way I proposed for extended stats: child
> stats are never removed, neither when the only child is dropped, nor when
> re-running analyze (that part is actually a bit odd).
>
> Rebased, fixing an intermediate compile error, and typos in the commit
> message.
>
> --
> Justin
>
Hi,

+   if (!HeapTupleIsValid(tup)) /* should not happen */
+   // elog(ERROR, "cache lookup failed for statistics data %u", statsOid);

You may want to remove commented out code.

+   for (i = 0; i < staForm->stxkeys.dim1; i++)
+   keys = bms_add_member(keys, staForm->stxkeys.values[i]);

Since the above code is in a loop now, should keys be cleared across the
outer loop iterations ?

Cheers


Re: pg_dump versus ancient server versions

2021-12-03 Thread Tom Lane
Peter Eisentraut  writes:
> [ policy requiring that 9.2 and up be kept buildable, as of today ]

I experimented to see what this would entail exactly.  Using
current macOS (Apple clang version 13.0.0) on M1 hardware,
I built with minimal configure options (--enable-debug --enable-cassert)
and ran the core regression tests.  I found that commit
1c0cf52b3 (Use return instead of exit() in configure) is needed
in 9.4 and before, else we don't get through configure.
That's the only fix needed to get a clean build in 9.4 and 9.3.
9.2 shows several compiler warnings, the scarier ones of which could
be cleaned up by back-patching c74d586d2 (Fix function return type
confusion).  The remainder are variable-may-be-used-uninitialized
warnings, which I think people are accustomed to ignoring in
dubious cases.  In any case, I failed to get rid of them without
back-patching 71450d7fd (Teach compiler that ereport(>=ERROR) does
not return), which seems like a bridge too far.

I also tried 9.1, but it has multiple compile-time problems:
* fails to select a spinlock implementation
* "conflicting types for 'base_yylex'"
* strange type-conflict warnings in zlib calls

So at least on this platform, there are solid technical reasons
to select 9.2 not 9.1 as the cutoff.

Obviously, we might find some other things to fix if we checked
with other compilers, or tested more than the core tests.
But this much seems quite doable, and it's probably prerequisite
for any further testing.

regards, tom lane




Re: Non-superuser subscription owners

2021-12-03 Thread Mark Dilger



> On Dec 2, 2021, at 1:29 AM, Amit Kapila  wrote:
> 
> If we want to maintain the property that subscriptions can only be
> owned by superuser for your first version then isn't a simple check
> like ((!superuser()) for each of the operations is sufficient?

As things stand today, nothing prevents a superuser subscription owner from 
having superuser revoked.  The patch does nothing to change this.

> In (2), I am not clear what do you mean by "the old owner has
> privileges increased"? If the owners can only be superusers then what
> does it mean to increase the privileges.

The old owner may have had privileges reduced (no superuser, only permission to 
write into a specific schema, etc.) and the subscription enabled only after 
those privilege reductions were put in place.  This is a usage pattern this 
patch is intended to support, by honoring those privilege restrictions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-12-03 Thread Andrei Zubkov
On Fri, 2021-12-03 at 17:03 +0300, Andrei Zubkov wrote:
> I've added the following fields to the pg_stat_statements view:
> 
>     min_plan_time_local float8,
>     max_plan_time_local float8,
>     min_exec_time_local float8,
>     max_exec_time_local float8
> 
> and a function that is able to reset those fields:
> 
> CREATE FUNCTION pg_stat_statements_reset_local(IN userid Oid DEFAULT
> 0,
>     IN dbid Oid DEFAULT 0,
>     IN queryid bigint DEFAULT 0
> )
> 
> It resets the local fields mentioned above and updates the new field
> 
>     local_stats_since timestamp with time zone
> 
> with the current timestamp. All other statement statistics remain
> unchanged.
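
For illustration, given the interface quoted above (the OIDs and queryid are
made up), resetting only the local min/max of a single statement would be:

    SELECT pg_stat_statements_reset_local(10, 16384, 1234567890);
    -- with all defaults, the local fields of every entry are reset:
    SELECT pg_stat_statements_reset_local();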

After adding the new fields, the pg_stat_statements view looks a little
overloaded. Furthermore, the fields in this view now have different
behavior from one another.

What if we create a new view for such resettable fields? It would make the
description of the views and reset functions in the docs much clearer.
-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-12-03 Thread Zhihong Yu
On Fri, Dec 3, 2021 at 3:15 AM Dinesh Chemuduru 
wrote:

> Hi Michael,
>
> Attaching the latest patch here (it's the most recent one), and looking for
> more suggestions/inputs from the team.
>
> On Fri, 3 Dec 2021 at 13:09, Michael Paquier  wrote:
>
>> On Wed, Nov 10, 2021 at 01:58:58PM +0530, Dinesh Chemuduru wrote:
>> > The proposed statements are much clear, but will wait for other’s
>> > suggestion, and will fix it accordingly.
>>
>> This update was three weeks ago, and no new version has been
>> provided, so I am marking this as returned with feedback in the CF
>> app.  If you can work more on this proposal and send an updated patch,
>> please feel free to resubmit.
>> --
>> Michael
>>
> Hi,

+int
+set_errquery(const char *query)

Since the return value is ignored, the return type can be void.

Cheers


Re: pg_dump versus ancient server versions

2021-12-03 Thread Peter Eisentraut

On 02.12.21 23:16, Andres Freund wrote:

I think we should at least include pg_upgrade in this as well, it's pretty
closely tied to at least pg_dump.


right


* pg_dump and psql will maintain compatibility with servers at least
   ten major releases back.


Personally I think that's too long... It boils down to keeping branches
buildable for ~15 years after they've been released. That strikes me as pretty
far into diminishing-returns, and steeply increasing costs, territory.


Well, it is a lot, but it's on the order of what we have historically 
provided.



I realize it's more complicated for users, but a policy based on supporting a
certain number of out-of-support branches calculated from the newest major
version is more realistic. I'd personally go for something like newest-major -
7 (i.e. 2 extra releases), but I realize that others think it's worthwhile to
support a few more.  I think there's a considerable advantage of having one
cutoff date across all branches.


I'm not sure it will be clear what this would actually mean.  Assume 
PG11 supports back to 9.4 (14-7) now, but when PG15 comes out, we drop 
9.4 support. But the PG11 code hasn't changed, and PG9.4 hasn't changed, 
so it will most likely still work.  Then we have messaging that is out 
of sync with reality.  I can see the advantage of this approach, but the 
communication around it might have to be refined.



I think we should explicitly limit the number of platforms we care about for
this purpose. I don't think we should even try to keep 8.2 compile on AIX or
whatnot.


It's meant to be developer-facing, so only for platforms that developers 
use.  I think that can police itself, if we define it that way.





Re: Commitfest 2021-11 closed

2021-12-03 Thread Dmitry Dolgov
> On Fri, Dec 03, 2021 at 09:51:21AM +0100, Daniel Gustafsson wrote:
> I've now closed the 2021-11 commitfest, ~36% of the patches were closed in 
> some
> way (committed, returned with feedback, withdrawn or rejected) with 184 
> patches
> moved to the next CF.

Impressive numbers, thank you!




Re: pg_dump versus ancient server versions

2021-12-03 Thread Peter Eisentraut

On 02.12.21 18:30, Tom Lane wrote:

This assumes a yearly major release cadence.


If the point is to not have to count dates carefully, why does the cadence
matter?


If we were to change the release cadence, then it would be appropriate 
to review this policy.



I can get behind something roughly like this, but I wonder if it wouldn't
be better to formulate the policy in a reactive way, i.e. when X happens
we'll do Y.  If we don't plan to proactively remove some code every year,
then it seems like the policy really is more like "when something breaks,
then we'll make an attempt to keep it working if the release is less than
ten majors back; otherwise we'll declare that release no longer
buildable."


This sounds like it would give license to accidentally break support for 
old releases in the code and only fix them if someone complains.  That's 
not really what I would be aiming for.



I agree with the idea of being conservative about what outside
dependencies we will worry about for "buildable" old versions.
(Your nearby message about Python breakage is a good example of
why we must limit that.)  But I wonder about, say, libxml or libicu,
or even if we can afford to drop all the non-plpgsql PLs.  An
example of why that seems worrisome is that it's not clear we'd
have any meaningful coverage of transforms in pg_dump with no PLs.
I don't have any immediate proposal here, but it seems like an area
that needs some thought and specific policy.


Yeah, I think questions like this will quickly lead to dead ends for now.
We are talking 5 years here, 10 years there.  Everybody else (apart from 
RHEL) is talking about 3-5 years at best.  We will have to figure this 
out as we go.





Re: Replace uses of deprecated Python module distutils.sysconfig

2021-12-03 Thread Peter Eisentraut

On 02.12.21 19:22, Tom Lane wrote:

That's surely no problem in HEAD, but as you say, it is an issue for
the older branches.  How difficult would it be to teach configure to
try both ways, or adapt based on its python version check?


I think it wouldn't be unreasonable to do that.  I'll look into it.




Re: pgcrypto: Remove explicit hex encoding/decoding from tests

2021-12-03 Thread Tom Lane
Peter Eisentraut  writes:
> pg_regress.c sets bytea_output to hex already.

Ah, right.  Nevermind that then.

regards, tom lane




Re: pgcrypto: Remove explicit hex encoding/decoding from tests

2021-12-03 Thread Peter Eisentraut

On 02.12.21 19:30, Tom Lane wrote:

Generally +1, but I see you removed some instances of

--- ensure consistent test output regardless of the default bytea format
-SET bytea_output TO escape;

I think that the principle still applies that this should work regardless
of the installation's default bytea format, so I'd recommend putting

-- ensure consistent test output regardless of the default bytea format
SET bytea_output TO hex;

at the top of each file instead.


pg_regress.c sets bytea_output to hex already.




Re: Skip vacuum log report code in lazy_scan_heap() if possible

2021-12-03 Thread Peter Geoghegan
If my patch to unite vacuum verbose and the autovacuum logging gets in,
then this issue also goes away.

Peter Geoghegan
(Sent from my phone)


Re: Commitfest 2021-11 closed

2021-12-03 Thread Tom Lane
Daniel Gustafsson  writes:
> I've now closed the 2021-11 commitfest, ~36% of the patches were closed in 
> some
> way (committed, returned with feedback, withdrawn or rejected) with 184 
> patches
> moved to the next CF.

Thanks for all your hard work on managing the CF!

(And particularly, thanks for being so aggressive about closing
stalled patches instead of just punting them forward.)

regards, tom lane




Re: Allow escape in application_name

2021-12-03 Thread Fujii Masao




On 2021/11/08 22:40, kuroda.hay...@fujitsu.com wrote:

Attached patches are the latest version.


Thanks for updating the patch!

+   buf = makeStringInfo();

This is necessary only when application_name is set. So it's better to avoid 
this if appname is not set.

Currently the StringInfo variable is defined in connect_pg_server() and passed 
to parse_pgfdw_appname(). But there is no strong reason why the variable needs 
to be defined in connect_pg_server(), right? If so, how about refactoring 
parse_pgfdw_appname() so that it defines the StringInfoData variable itself and 
returns the processed string? You can see such coding in, for example, 
escape_param_str() in dblink.c.

If this refactoring is done, probably we can get rid of "#include 
"lib/stringinfo.h"" line from connection.c because connect_pg_server() no longer 
needs to use StringInfo.
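
A minimal sketch of what that refactoring could look like (the function
name and body here are only for illustration):

    static char *
    process_pgfdw_appname(const char *appname)
    {
        StringInfoData buf;

        initStringInfo(&buf);

        /* ... expand any recognized %-escapes of appname into buf ... */

        return buf.data;
    }

connect_pg_server() would then only deal with the returned string, similar
to how escape_param_str() is used in dblink.c.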

+   int i;

+   for (i = n - 1; i >= 0; i--)

I'm tempted to simplify these into "for (int i = n - 1; i >= 0; i--)".

+   /*
+* Search application_name and replace it if found.
+*
+* We search paramters from the end because the later
+* one have higher priority.  See also the above comment.
+*/

How about updating these comments into the following?

---
Search the parameter arrays to find application_name setting,
and replace escape sequences in it with status information if found.
The arrays are searched backwards because the last value
is used if application_name is repeatedly set.
---

+   if (strcmp(buf->data, "") != 0)

Is this condition the same as "data[0] == '\0'"?

+ * parse postgres_fdw.application_name and set escaped string.
+ * This function is almost same as log_line_prefix(), but
+ * accepted escape sequence is different and this cannot understand
+ * padding integer.
+ *
+ * Note that argument buf must be initialized.

How about updating these comments into the following?
I thought that using the term "parse" was a bit confusing because what the 
function actually does is replace escape sequences in application_name. It's 
probably also better to rename the function, e.g., to process_pgfdw_appname.

-
Replace escape sequences beginning with % character in the given
application_name with status information, and return it.
-

+   if (appname == NULL)
+   appname = "[unknown]";
+   if (username == NULL || *username == 
'\0')
+   username = "[unknown]";
+   if (dbname == NULL || *dbname == '\0')
+   dbname =  "[unknown]";

I'm still not sure why these are necessary. Could you clarify that?

+  Same as , this is a
+  printf-style string. Accepted escapes are
+  bit different from ,
+  but padding can be used like as it.

This description needs to be updated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-03 Thread Dilip Kumar
On Fri, Dec 3, 2021 at 7:38 PM Ashutosh Sharma  wrote:
>
> I see that this patch is reducing the database creation time by almost 3-4 
> times provided that the template database has some user data in it. However, 
> there are a couple of points to be noted:

Thanks a lot for looking into the patches.
>
> 1) It makes the crash recovery a bit slower than before if the crash has 
> occurred after the execution of a create database statement. Moreover, if the 
> template database size is big, it might even generate a lot of WAL files 
> which the user needs to be aware of.

Yes, it will, but that is actually the only correct way to do it. Currently
we just log the WAL as "copy the source directory to the destination
directory" without recording exactly what we wanted to copy, so we are forced
to do a checkpoint right after CREATE DATABASE because crash recovery cannot
actually replay that WAL.  Since the WAL only says "copy the source to the
destination", the source directory may well have different contents at REDO
time than it had at DO time, which would create inconsistencies during crash
recovery; to avoid that bug the current code forces the checkpoint.  Note that
with the patch, if you force a checkpoint yourself, crash recovery will again
be equally fast.  So I would not say that we have made crash recovery slow;
rather, we have removed some bugs, and with that we no longer need to force
the checkpoint.  Also note that in the current code, even with the forced
checkpoint, the bug is not completely avoided in all cases -- see the code
comments below [1].

> 2) This will put a lot of load on the first checkpoint that will occur after 
> creating the database statement. I will experiment around this to see if this 
> has any side effects.

But now a checkpoint can happen on its own schedule, and there is no need
to force one like there was before the patch.

So the major goals of this patch are: 1) correctly WAL-log CREATE DATABASE,
which is a hack in the current system; 2) avoid forced checkpoints; 3) copy
page by page, which will support TDE -- if the source and destination
databases use different encryption, we can re-encrypt each page before
copying it to the destination database, which is not possible in the current
system since we copy whole directories; 4) the new database pages now get the
latest LSN, which is the correct thing, whereas earlier the new database
pages were copied over with the old LSN only.
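
To illustrate point 3), the rough idea is to copy each fork block by block,
WAL-logging every page, much like the existing RelationCopyStorage() loop
(this is only a sketch, not the patch's actual code):

    PGAlignedBlock buf;
    BlockNumber blkno;
    BlockNumber nblocks = smgrnblocks(src, forknum);

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        smgrread(src, forknum, blkno, buf.data);

        /* WAL-log the page so redo can rebuild the copy deterministically */
        log_newpage(&dst->smgr_rnode.node, forknum, blkno,
                    (Page) buf.data, false);

        smgrextend(dst, forknum, blkno, buf.data, false);
    }

Since each page passes through memory here, a TDE implementation could
re-encrypt it at this point before it is written to the destination.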


> Further, the code changes in the patch looks good. I just have few comments:

I will look into the other comments and get back to you, thanks.

[1]
* In PITR replay, the first of these isn't an issue, and the second
* is only a risk if the CREATE DATABASE and subsequent template
* database change both occur while a base backup is being taken.
* There doesn't seem to be much we can do about that except document
* it as a limitation.
*
* Perhaps if we ever implement CREATE DATABASE in a less cheesy way,
* we can avoid this.
*/
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?

2021-12-03 Thread Bharath Rupireddy
On Wed, Dec 1, 2021 at 12:31 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> I recently did a small experiment to see how one can create extensions
> properly in HA(primary-standby) setup.
>
> Here are my findings:
> 1) ALTER SYSTEM SET or GUC(configuration parameters) settings are not
> replicated to standby.
> 2) CREATE EXTENSION statements are replicated to standby.
> 3) If the extension doesn't need to be set up in
> shared_preload_libraries GUC, no need to create extension on the
> standby, it just works.
> 4) If the extension needs to be set up in shared_preload_libraries
> GUC: the correct way to install the extension on both primary and
> standby is:
> a) set shared_preload_libraries GUC on primary, reload conf,
> restart the primary to make the GUC effective.
> b) set shared_preload_libraries GUC on standby, restart the
> standby to make the GUC effective.
> c) create extension on primary (we don't need to create extension
> on standby as the create extension statements are replicated).
> d) verify that the extension functions work on both primary and standby.
> 5) The extensions which perform writes to the database may not work on
> standby as the write transactions are not allowed on the standby.
> However, the create extension on the standby works just fine but the
> functions it provides may not work.
>
> I think I was successful in my experiment, please let me know if
> anything is wrong in what I did.
>
> Do we have the documentation on how to create extensions correctly in
> HA setup? If what I did is correct and we don't have it documented,
> can we have it somewhere in the existing HA related documentation?

I'm thinking of adding the above steps to the "Additional Supplied
Modules" section of the documentation. Any thoughts?

[1] - https://www.postgresql.org/docs/devel/contrib.html

Regards,
Bharath Rupireddy.




Re: should we document an example to set multiple libraries in shared_preload_libraries?

2021-12-03 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 6:33 AM Michael Paquier  wrote:
>
> On Fri, Dec 03, 2021 at 12:45:56AM +, Bossart, Nathan wrote:
> > I think the problems you noted upthread are shared for all GUCs with
> > type GUC_LIST_QUOTE (e.g., search_path, temp_tablespaces).  Perhaps
> > the documentation for each of these GUCs should contain a short blurb
> > about how to properly SET a list of values.
>
> Yeah, the approach taken by the proposed patch is not going to scale
> and age well.
>
> It seems to me that we should have something dedicated to lists around
> the section for "Parameter Names and Values", and add a link in the
> description of each parameters concerned back to the generic
> description.

+1 to adding it in the "Parameter Names and Values" section, but do we
want to backlink every string parameter to this section? I think that
needs more effort. IMO, we can just add a backlink for
shared_preload_libraries alone. Thoughts?

 
  
   String:
   In general, enclose the value in single quotes, doubling any single
   quotes within the value.  Quotes can usually be omitted if the value
   is a simple number or identifier, however.
  
 

> > Also upthread, I see that you gave the following example for an
> > incorrect way to set shared_preload_libraries:
> >
> > ALTER SYSTEM SET shared_preload_libraries =
> > auth_delay,pg_stat_statements,sepgsql;  --> wrong
> >
> > Why is this wrong?  It seems to work okay for me.
>
> Yep.

My bad. Yes, it works.

Regards,
Bharath Rupireddy.




Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

2021-12-03 Thread Bharath Rupireddy
On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera  wrote:
>
> On 2021-Dec-01, Bharath Rupireddy wrote:
>
> > The active_pid of ReplicationSlot structure, which tells whether a
> > replication slot is active or inactive, isn't persisted to the disk
> > i.e has no entry in ReplicationSlotPersistentData structure. Isn't it
> > better if we add that info to ReplicationSlotPersistentData structure
> > and persist to the disk? This will help to know what were the inactive
> > replication slots in case the server goes down or crashes for some
> > reason. Currently, we don't have a way to interpret the replication
> > slot info in the disk but there's a patch for pg_replslotdata tool at
> > [1]. This way, one can figure out the reasons for the server
> > down/crash and figure out which replication slots to remove to bring
> > the server up and running without touching the other replication
> > slots.
>
> I think the PIDs are log-worthy for sure, but it's not clear to me that
> it is desirable to write them to the persistent state file.  In case of
> crashes, the log should serve just fine to aid root cause investigation
> -- in fact even better than the persistent file, where the data would be
> lost as soon as the next client acquires that slot.

Thanks. +1 to log a message at LOG level whenever a replication slot
becomes active (a valid pid is assigned to active_pid) or becomes
inactive (active_pid is set to 0). Having said that, wouldn't it
also be helpful to write a bool (a 1-byte flag) to disk whenever the
slot becomes active or inactive?

Regards,
Bharath Rupireddy.




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-03 Thread Ashutosh Sharma
I see that this patch is reducing the database creation time by almost 3-4
times provided that the template database has some user data in it.
However, there are a couple of points to be noted:

1) It makes the crash recovery a bit slower than before if the crash has
occurred after the execution of a create database statement. Moreover, if
the template database size is big, it might even generate a lot of WAL
files which the user needs to be aware of.

2) This will put a lot of load on the first checkpoint that will occur
after creating the database statement. I will experiment around this to see
if this has any side effects.

--

Further, the code changes in the patch looks good. I just have few comments:

+void
+LockRelationId(LockRelId *relid, LOCKMODE lockmode)
+{
+   LOCKTAG tag;
+   LOCALLOCK  *locallock;
+   LockAcquireResult res;
+
+   SET_LOCKTAG_RELATION(tag, relid->dbId, relid->relId);

Should there be an assertion here to ensure that relid->dbId
and relid->relId are valid?
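
For example, something like this at the top of the function (just a sketch):

    Assert(OidIsValid(relid->dbId) && OidIsValid(relid->relId));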

--

if (info == XLOG_DBASE_CREATE)
{
xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *)
XLogRecGetData(record);
-   char   *src_path;
-   char   *dst_path;
-   struct stat st;
-
-   src_path = GetDatabasePath(xlrec->src_db_id,
xlrec->src_tablespace_id);
-   dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
+   char   *dbpath;

-   /*
-* Our theory for replaying a CREATE is to forcibly drop the target
-* subdirectory if present, then re-copy the source data. This may
be
-* more work than needed, but it is simple to implement.
-*/
-   if (stat(dst_path, &st) == 0 && S_ISDIR(st.st_mode))
-   {
-   if (!rmtree(dst_path, true))
-   /* If this failed, copydir() below is going to error. */
-   ereport(WARNING,
-   (errmsg("some useless files may be left behind in
old database directory \"%s\"",
-   dst_path)));
-   }

I think this is a significant change and probably needs some kind of
explanation/comment as to why we are just creating a directory and copying
the version file when replaying the create database operation. Earlier, this
meant replaying the complete create database operation; that doesn't seem to
be the case now.

--

Have you intentionally skipped copying the pg_internal.init file to
the target database?

--
With Regards,
Ashutosh Sharma.


On Thu, Dec 2, 2021 at 7:20 PM Dilip Kumar  wrote:

> On Wed, Dec 1, 2021 at 6:04 PM Dilip Kumar  wrote:
>
> > Thanks a lot for testing this. From the error, it seems like some of
> > the old buffer w.r.t. the previous tablespace is not dropped after the
> > movedb.  Actually, we are calling DropDatabaseBuffers() after copying
> > to a new tablespace and dropping all the buffers of this database
> > w.r.t the old tablespace.  But seems something is missing, I will
> > reproduce this and try to fix it by tomorrow.  I will also fix the
> > other review comments raised by you in the previous mail.
>
> Okay, I got the issue, basically we are dropping the database buffers
> but not unregistering the existing sync request for database buffers
> w.r.t old tablespace. Attached patch fixes that.  I also had to extend
> ForgetDatabaseSyncRequests so that we can delete the sync request of
> the database for the particular tablespace so added another patch for
> the same (0006).
>
> I will test the performance scenario next week, which is suggested by John.
>
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Shouldn't postgres_fdw report warning when it gives up getting result from foreign server?

2021-12-03 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 2:26 PM Fujii Masao  wrote:
> > For a cancel request maybe we can just say without the errcontext:
> >  ereport(WARNING,
> >  (errmsg("could not get result of cancel
> > request due to timeout")));
> >
> > See the below existing message using "cancel request":
> >   errmsg("could not send cancel request: %s",
> >
> > For SQL command we can say:
> >  ereport(WARNING,
> >  (errmsg("could not get query result due to
> > timeout"),
> >   query ? errcontext("remote SQL command:
> > %s", query) : 0));
>
> I wonder how pgfdw_get_cleanup_result() can determine which
> log message to report. Probably we can add a new boolean argument
> to pgfdw_get_cleanup_result() that is set to true for the cancel
> request case and false for the query case. Then
> pgfdw_get_cleanup_result() can decide which message to log
> based on that argument. But that doesn't seem like a good design
> to me. Thoughts?

Let's not use a boolean just for the cancel request; that isn't
scalable IMO. Maybe a macro/enum?

Otherwise, we could just do, although it doesn't look elegant:

if (pgfdw_get_cleanup_result(conn, endtime, &result, "(cancel request)"))

if (strcmp(query, "(cancel request)") == 0)
WARNING without  "remote SQL command:
else
WARNING with "remote SQL command:
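
A rough sketch of the enum idea, in case it helps (all names hypothetical):

    typedef enum PgFdwCleanupKind
    {
        PGFDW_CLEANUP_QUERY,
        PGFDW_CLEANUP_CANCEL
    } PgFdwCleanupKind;

    static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
                                         PGresult **result,
                                         PgFdwCleanupKind kind);

The caller passes the kind, and the function picks the WARNING text (and
whether to attach the "remote SQL command" errcontext) based on it, without
any string comparison.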

Regards,
Bharath Rupireddy.




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2021-12-03 Thread Andrei Zubkov
Hi, Anton!

Thank you for your review!

On Mon, 2021-10-18 at 22:11 +0300, Anton A. Melnikov wrote:
> So i suppose that additional vars loc_min and loc_max is a right way
> to do it.

I've added the following fields to the pg_stat_statements view:

min_plan_time_local float8,
max_plan_time_local float8,
min_exec_time_local float8,
max_exec_time_local float8

and a function that is able to reset those fields:

CREATE FUNCTION pg_stat_statements_reset_local(IN userid Oid DEFAULT 0,
IN dbid Oid DEFAULT 0,
IN queryid bigint DEFAULT 0
)

It resets the local fields mentioned above and updates the new field

local_stats_since timestamp with time zone

with the current timestamp. All other statement statistics remain
unchanged. After the reset, the _local fields will have NULL values until
the next statement execution.

> And one more thing, if there was a reset of stats between two
> samples,
> then i think it is the best to ignore the new values, 
> since they were obtained for an incomplete period.
> This patch, thanks to the saved time stamp, makes possible 
> to determine the presence of reset between samples and
> there should not be a problem to realize such algorithm.
Yes, it seems this is up to the sampling solution. Maybe in some cases
incomplete information will be better than nothing... Anyway we have
all necessary data now.


> The only thing I could suggest to your notice
> is a small cosmetic edit to replace
> the numeric value in #define on line 1429 of pg_stat_statements.c
> by one of the constants defined above. 
Hmm. I've left it just as it was before me. But it seems you are
right.

I've attached a new version of the patch. The first_seen column was
renamed to stats_since - it seems more self-explanatory to me. But
I'm not sure about the current naming at all.

The tests are not ready yet, but any thoughts about the patch are
welcome right now.

-- 
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From bea0ae9fd062ee1810c9ff2d5f1be98a19114d84 Mon Sep 17 00:00:00 2001
From: Andrei Zubkov 
Date: Fri, 3 Dec 2021 16:25:17 +0300
Subject: [PATCH] pg_stat_statements: Track statement entry timestamp

This patch adds a stats_since column to the pg_stat_statements view. This column
is populated with the current timestamp when a new statement is added to the
pg_stat_statements hashtable. It provides clean information about the
statistics collection time interval for each statement. Besides, it can be used
by sampling solutions to detect situations where a statement was evicted and
returned between two samples.
Such a sampling solution could derive any pg_stat_statements statistic value
for an interval between two samples, with the exception of the min/max
statistics. To address this issue this patch adds the following columns:

min_plan_time_local float8,
max_plan_time_local float8,
min_exec_time_local float8,
max_exec_time_local float8

And a function to reset those statistics during a sample:

CREATE FUNCTION pg_stat_statements_reset_local(IN userid Oid DEFAULT 0,
IN dbid Oid DEFAULT 0,
IN queryid bigint DEFAULT 0
)

The timestamp of last local statistics reset is stored for every statement in
a column:

local_stats_since timestamp with time zone

Discussion: https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.camel%40moonset.ru
---
 contrib/pg_stat_statements/Makefile   |   3 +-
 .../expected/pg_stat_statements.out   |  29 +++
 .../pg_stat_statements--1.9--1.10.sql |  72 +++
 .../pg_stat_statements/pg_stat_statements.c   | 200 +-
 .../pg_stat_statements.control|   2 +-
 .../sql/pg_stat_statements.sql|   9 +
 doc/src/sgml/pgstatstatements.sgml| 111 +-
 7 files changed, 419 insertions(+), 7 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38..edc40c8bbf 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+	pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..9388490073 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1077,4 +1077,33 @@ 

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-03 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 3:01 AM Bossart, Nathan  wrote:
>
> On 12/1/21, 6:48 PM, "Bharath Rupireddy" 
>  wrote:
> > +1 for the overall idea of making the checkpoint faster. In fact, we
> > here at our team have been thinking about this problem for a while. If
> > there are a lot of files that checkpoint has to loop over and remove,
> > IMO, that task can be delegated to someone else (maybe a background
> > worker called background cleaner or bg cleaner, of course, we can have
> > a GUC to enable or disable it). The checkpoint can just write some
>
> Right.  IMO it isn't optimal to have critical things like startup and
> checkpointing depend on somewhat-unrelated tasks.  I understand the
> desire to avoid adding additional processes, and maybe it is a bigger
> hammer than what is necessary to reduce the impact, but it seemed like
> a natural solution for this problem.  That being said, I'm all for
> exploring other ways to handle this.

Having a generic background cleaner process (controllable via a few
GUCs), which can delete a bunch of files (snapshot, mapping, old WAL,
temp files etc.) or perform other tasks on behalf of the checkpointer,
seems to be the easiest solution.

I'm open to other ideas too.

> > Another idea could be to parallelize the checkpoint i.e. IIUC, the
> > tasks that checkpoint do in CheckPointGuts are independent and if we
> > have some counters like (how many snapshot/mapping files that the
> > server generated)
>
> Could you elaborate on this?  Is your idea that the checkpointer would
> create worker processes like autovacuum does?

Yes, I was thinking that the checkpointer would create one or more dynamic
background workers (we can assume one background worker for now) to
delete the files. If a threshold is crossed (say, the snapshot file
count exceeds it), the new worker gets spawned, enumerates the files and
deletes the unneeded ones, while the checkpointer proceeds with its other
tasks and finishes the checkpoint. Having said this, I prefer the
background cleaner approach over the dynamic background worker, the
advantage of the background cleaner being that it can also take on other
tasks (like other kinds of file deletion).
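
Just to make the idea concrete, here is a very rough sketch of how such a
cleaner could be wired up with the existing background worker machinery
(all names here are made up, this is not from any patch):

    #include "postmaster/bgworker.h"

    void
    RegisterBgCleaner(void)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = 60;   /* seconds */
        snprintf(worker.bgw_name, BGW_MAXLEN, "background cleaner");
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "BgCleanerMain");

        RegisterBackgroundWorker(&worker);
    }

The main loop (BgCleanerMain here) would then wait on a latch and remove
whatever files the checkpointer hands off to it.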

Another idea could be to use the existing background writer to do
the file deletion while the checkpoint is happening. But again, this
might cause problems because the bgwriter's flushing of dirty buffers
would get delayed.

Regards,
Bharath Rupireddy.




Re: Multi-Column List Partitioning

2021-12-03 Thread Amul Sul
Hi,

A few comments for the v7 patch. Note that I haven't been through the
previous discussion; if any of these review comments have already been
discussed and overridden, please ignore them here too:


partbounds.c: In function ‘get_qual_for_list.isra.18’:
partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
in this function [-Wmaybe-uninitialized]
 datumCopy(bound_info->datums[i][j],
   ~~^~~~
partbounds.c:4335:21: note: ‘boundinfo’ was declared here
  PartitionBoundInfo boundinfo;
 ^
partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
   bool*inner_isnull;
^~~~
partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
   bool*outer_isnull;
   ^~~~

Got these warnings with gcc -O2 compilation.


 /*
+ * isListBoundDuplicated
+ *
+ * Returns TRUE if the list bound element 'new_bound' is already present
+ * in the target list 'list_bounds', FALSE otherwise.
+ */
+static bool
+isListBoundDuplicated(List *list_bounds, List *new_bound)
+{
+ ListCell   *cell = NULL;
+
+ foreach(cell, list_bounds)
+ {
+ int i;
+ List   *elem = lfirst(cell);
+ bool isDuplicate = true;
+
+ Assert(list_length(elem) == list_length(new_bound));
+
+ for (i = 0; i < list_length(elem); i++)
+ {
+ Const   *value1 = castNode(Const, list_nth(elem, i));
+ Const   *value2 = castNode(Const, list_nth(new_bound, i));
+
+ if (!equal(value1, value2))
+ {
+ isDuplicate = false;
+ break;
+ }
+ }
+
+ if (isDuplicate)
+ return true;
+ }
+
+ return false;
+}

This function is unnecessarily complicated; I think you can avoid the
inner for loop -- simply replace the loop body with "if
(equal(lfirst(cell), new_bound)) return true".
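
In other words, roughly (just a sketch of the suggested simplification):

    static bool
    isListBoundDuplicated(List *list_bounds, List *new_bound)
    {
        ListCell   *cell;

        foreach(cell, list_bounds)
        {
            /* equal() already compares the Const lists element by element */
            if (equal(lfirst(cell), new_bound))
                return true;
        }

        return false;
    }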


+ char   **colname = (char **) palloc0(partnatts * sizeof(char *));
+ Oid*coltype = palloc0(partnatts * sizeof(Oid));
+ int32*coltypmod = palloc0(partnatts * sizeof(int));
+ Oid*partcollation = palloc0(partnatts * sizeof(Oid));
+
This allocation seems to be worthless, read ahead.


+ for (i = 0; i < partnatts; i++)
+ {
+ if (key->partattrs[i] != 0)
+ colname[i] = get_attname(RelationGetRelid(parent),
+ key->partattrs[i], false);
+ else
+ {
+ colname[i] =
+ deparse_expression((Node *) list_nth(partexprs, j),
+deparse_context_for(RelationGetRelationName(parent),
+RelationGetRelid(parent)),
+false, false);
+ ++j;
+ }
+
+ coltype[i] = get_partition_col_typid(key, i);
+ coltypmod[i] = get_partition_col_typmod(key, i);
+ partcollation[i] = get_partition_col_collation(key, i);
+ }

I think there is no need for this separate loop inside
transformPartitionListBounds; you can do the same in the next loop as
well. And instead of calling the get_partition_col_* functions and storing
the results, simply use them directly as arguments to
transformPartitionBoundValue().


+
+ if (IsA(expr, RowExpr) &&
+ partnatts != list_length(((RowExpr *) expr)->args))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("Must specify exactly one value per partitioning column"),
+ parser_errposition(pstate, exprLocation((Node *) spec;
+

I think this should be inside the "else" block after the "!IsA(rowexpr,
RowExpr)" error, and then you can avoid the IsA() check too.


-   if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+   if (b1->isnulls)
+   b1_isnull = b1->isnulls[i][j];
+   if (b2->isnulls)
+   b2_isnull = b2->isnulls[i][j];
+
+   /*
+* If any of the partition bound has NULL value, then check
+* equality for the NULL value instead of comparing the datums
+* as it does not contain valid value in case of NULL.
+*/
+   if (b1_isnull || b2_isnull)
+   {
+   if (b1_isnull != b2_isnull)
+   return false;
+   }
+   else if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],

This looks difficult to understand at first glance; how about the following:

if (b1->isnulls != b2->isnulls)
return false;

if (b1->isnulls)
{
if (b1->isnulls[i][j] != b2->isnulls[i][j])
return false;
if (b1->isnulls[i][j])
continue;
}

See how infinite values are handled for range partitioning. Also, place
this before the comment block that was added for the "!datumIsEqual()"
case.


+   if (src->isnulls)
+   dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);
...
+   if (src->isnulls)
+   dest->isnulls[i][j] = src->isnulls[i][j];
+
Nothing wrong with this, but checking "dest->isnulls" instead of
"src->isnulls" would have been better.


-   if (dest->kind == NULL ||
-   dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
+   if ((dest->kind == 

Re: Is ssl_crl_file "SSL server cert revocation list"?

2021-12-03 Thread Daniel Gustafsson
> On 2 Dec 2021, at 16:04, Peter Eisentraut  
> wrote:

> This change looks correct to me.

Thanks for review, I've pushed this backpatched (in part) down to 10.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_get_publication_tables() output duplicate relid

2021-12-03 Thread Amit Langote
On Fri, Dec 3, 2021 at 21:34 Amit Langote  wrote:

> On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila 
> wrote:
> > On Thu, Dec 2, 2021 at 7:18 PM Amit Langote 
> wrote:
> > > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila 
> wrote:
> > > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote 
> wrote:
> > > > > Reading Alvaro's email at the top again gave me a pause to
> reconsider
> > > > > what I had said in reply.  It might indeed have been nice if the
> > > > > publication DDL itself had prevented the cases where a partition
> and
> > > > > its ancestor are added to a publication, though as Hou-san
> mentioned,
> > > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > > though of course not impossible.  Maybe it's okay to just
> de-duplicate
> > > > > pg_publication_tables output as the patch does, even though it may
> > > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > > about consistency.
> > > >
> > > > True, I think even if we consider that idea it will be a much bigger
> > > > change, and also as it will be a behavioral change so we might want
> to
> > > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > > Having said that, if you or someone want to pursue that idea and come
> > > > up with a better solution than what we have currently it is certainly
> > > > welcome.
> > >
> > > Okay, I did write a PoC patch this morning after sending out my
> > > earlier email.  I polished it a bit, which is attached.
> >
> > I see multiple problems with this patch and idea.
>
> Thanks for looking at it.  Yeah, I have not looked very closely at ALL
> TABLES [IN SCHEMA], though only because I suspected that those cases
> deal with partitioning in such a way that the partition duplication
> issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
> ADD TABLE syntax allow for the duplication issue to occur.


Another thing I forgot to mention is that the patch passes check-world.
Perhaps we don’t have enough tests that would’ve exposed any problems with
the patch’s approach.
-- 
Amit Langote
EDB: http://www.enterprisedb.com


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-03 Thread osumi.takami...@fujitsu.com
Thursday, December 2, 2021 4:41 PM I wrote:
> On Thursday, December 2, 2021 1:49 PM Amit Kapila
>  wrote:
> > On Thu, Dec 2, 2021 at 6:35 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Wednesday, December 1, 2021 10:16 PM Amit Kapila
> >  wrote:
> > > Updated the patch to include the notification.
> > >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> might not rectify itself.
> > Why not just allow to disable the subscription on any error? And then
> > let the user check the error either in view or logs and decide whether
> > it would like to enable the subscription or do something before it
> > (like making space in disk, or fixing the network).
> Agreed. I'll treat any errors as the trigger of the feature in the next 
> version.
> 
> > The other problem I see with this transient error stuff is maintaining
> > the list of error codes that we think are transient. I think we need a
> > discussion for each of the error_codes we are listing now and whatever
> > new error_code we add in the future which doesn't seem like a good idea.
> This is also true. The maintenance cost of my current implementation didn't
> sound cheap.
> 
> > I think the code to deal with apply worker errors and then disable the
> > subscription has some flaws. Say, while disabling the subscription if
> > it leads to another error then I think the original error won't be
> > reported.  Can't we simply emit the error via EmitErrorReport and then
> > do AbortOutOfAnyTransaction, FlushErrorState, and any other memory
> > context clean up if required and then disable the subscription after coming
> out of catch?
> You are right. I'll fix related parts accordingly.
Hi, I've made a new patch v11 that incorporates the suggestions described above.

There are several notes to share regarding v11 modifications.

1. Modified the commit message a bit.

2. DisableSubscriptionOnError() doesn't return ErrData anymore,
since emitting the error message is now done in the error recovery path
and the function's purpose has become purely to run a transaction that
disables the subscription.

3. In DisableSubscriptionOnError(), v10 rethrew the error if the 
disable_on_error
flag became false in the interim, but v11 just closes the transaction and
finishes the function.

4. If the table sync worker detects an error during synchronization
and needs to disable the subscription, the worker disables it and just exits
via proc_exit(). The processing after disabling the subscription didn't look
necessary to me for a disabled subscription.

5. The slot name needs to be allocated in a long-lived context after the table
synchronization in ApplyWorkerMain() only when the synchronization succeeds.
Otherwise, we'd see a junk value in syncslotname, because it is the return
value of LogicalRepSyncTableStart().

6. There are 3 places for error recovery in ApplyWorkerMain().
All of them might look similar, but I didn't make a unified function for them.
They are slightly different from each other, and I felt readability would be
reduced by grouping them into one type of function call.

7. In v11, I covered the case where the apply worker fails with
apply_error_callback_arg.command == 0 as one more path that disables the
subscription, in order to cover all errors.

8. I changed one flag name from 'disable_subscription' to 'did_error'
in ApplyWorkerMain().

9. All changes in this version are C code and have been checked with pgindent.

Best Regards,
Takamichi Osumi



v11-0001-Optionally-disable-subscriptions-on-error.patch
Description: v11-0001-Optionally-disable-subscriptions-on-error.patch


Re: Some RELKIND macro refactoring

2021-12-03 Thread Peter Eisentraut

On 23.11.21 16:09, Alvaro Herrera wrote:

In init_fork(), there's a typo:

+* For tables, the AM callback creates both the main and the init fork.
should read:
+* For tables, the AM callback creates both the main and the init forks.


I believe the original wording is correct.


Overall, LGTM.


Committed with the (other) suggested adjustments.





Re: parallel vacuum comments

2021-12-03 Thread Masahiko Sawada
On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila  wrote:
>
> On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada  wrote:
> >
> > I've attached updated patches.
> >
>
> I have a few comments on v4-0001.

Thank you for the comments!

> 1.
> In parallel_vacuum_process_all_indexes(), we can combine the two
> checks for vacuum/cleanup at the beginning of the function

Agreed.

> and I think
> it is better to keep the variable name as bulkdel or cleanup instead
> of vacuum as that is more specific and clear.

I was thinking of using the terms "bulkdel" and "cleanup" instead of
"vacuum" and "cleanup" for the same reason. That way, we could probably
use "bulkdel" and "cleanup" when doing index bulk-deletion (i.e.,
calling ambulkdelete) and index cleanup (calling amvacuumcleanup),
respectively, and use "vacuum" when processing an index, i.e., doing
either bulk-delete or cleanup, instead of using just "processing".
But we already use "vacuum" and "cleanup" in many places, e.g.,
lazy_vacuum_index() and lazy_cleanup_index(). If we want to use
"bulkdel" instead of "vacuum", I think it would be better to change
the terminology in vacuumlazy.c thoroughly, probably in a separate
patch.

> 2. The patch seems to be calling parallel_vacuum_should_skip_index
> thrice even before starting parallel vacuum. It has a call to find the
> number of blocks which has to be performed for each index. I
> understand it might not be too costly to call this but it seems better
> to remember this info like we are doing in the current code.

Yes, we can bring will_vacuum_parallel array back to the code. That
way, we can remove the call to parallel_vacuum_should_skip_index() in
parallel_vacuum_begin().

> We can
> probably set parallel_workers_can_process in parallel_vacuum_begin and
> then again update in parallel_vacuum_process_all_indexes. Won't doing
> something like that be better?

parallel_workers_can_process can vary depending on whether we are doing
bulk-deletion, the first cleanup, or a second (or later) cleanup. What
would we set parallel_workers_can_process based on in parallel_vacuum_begin()?

>
> 3.  /*
>   * Copy the index bulk-deletion result returned from ambulkdelete and
> @@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
>   * Since all vacuum workers write the bulk-deletion result at different
>   * slots we can write them without locking.
>   */
> - if (shared_istat && !shared_istat->updated && istat_res != NULL)
> + if (!pindstats->istat_updated && istat_res != NULL)
>   {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
> - shared_istat->updated = true;
> + memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
> + pindstats->istat_updated = true;
>
>   /* Free the locally-allocated bulk-deletion result */
>   pfree(istat_res);
> -
> - /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
>   }
>
> I think here now we copy the results both for local and parallel
> cleanup. Isn't it better to write something about that in comments as
> it is not clear from current comments?

What do you mean by "local cleanup"?

>
> 4.
> + /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
> + est_shared_len = MAXALIGN(sizeof(LVShared));
  shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);
>
> Do we need MAXALIGN here? I think previously we required it here
> because immediately after this we were writing index stats but now
> those are allocated separately. Normally, shm_toc_estimate_chunk()
> aligns the size but sometimes we need to do it so as to avoid
> unaligned accessing of shared mem. I am really not sure whether we
> require it for dead_items, do you remember why it is there in the
> first place.

Indeed. I don't remember that. Probably it's an oversight.

>
> 5. The below-updated comment based on one of my previous suggestions
> seems to be missing in this version.
> + /*
> + * Not safe, if the index supports parallel cleanup conditionally,
> + * but we have already processed the index (for bulkdelete).  We do
> + * this to avoid the need to invoke workers when parallel index
> + * cleanup doesn't need to scan the index.  See the comments for
> + * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
> + * support parallel cleanup conditionally.
> + */

Added.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: pg_get_publication_tables() output duplicate relid

2021-12-03 Thread Amit Langote
On Fri, Dec 3, 2021 at 12:37 PM Amit Kapila  wrote:
> On Thu, Dec 2, 2021 at 7:18 PM Amit Langote  wrote:
> > On Thu, Dec 2, 2021 at 2:27 PM Amit Kapila  wrote:
> > > On Thu, Dec 2, 2021 at 8:42 AM Amit Langote  
> > > wrote:
> > > > Reading Alvaro's email at the top again gave me a pause to reconsider
> > > > what I had said in reply.  It might indeed have been nice if the
> > > > publication DDL itself had prevented the cases where a partition and
> > > > its ancestor are added to a publication, though as Hou-san mentioned,
> > > > partition ATTACHes make that a bit tricky to enforce at all times,
> > > > though of course not impossible.  Maybe it's okay to just de-duplicate
> > > > pg_publication_tables output as the patch does, even though it may
> > > > appear to be a band-aid solution if we one considers Alvaro's point
> > > > about consistency.
> > >
> > > True, I think even if we consider that idea it will be a much bigger
> > > change, and also as it will be a behavioral change so we might want to
> > > keep it just for HEAD and some of these fixes need to be backpatched.
> > > Having said that, if you or someone want to pursue that idea and come
> > > up with a better solution than what we have currently it is certainly
> > > welcome.
> >
> > Okay, I did write a PoC patch this morning after sending out my
> > earlier email.  I polished it a bit, which is attached.
>
> I see multiple problems with this patch and idea.

Thanks for looking at it.  Yeah, I have not looked very closely at ALL
TABLES [IN SCHEMA], though only because I suspected that those cases
deal with partitioning in such a way that the partition duplication
issue doesn't arise.  That is, only the FOR TABLE list_of_tables and
ADD TABLE syntax allow for the duplication issue to occur.

> (a) I think you
> forgot to deal with "All Tables In Schema" Publication which will be
> quite tricky to deal with during attach operation. How will you remove
> a particular relation from such a publication if there is a need to do
> so?

Hmm, my understanding of how FOR ALL TABLES... features work is that
one cannot remove a particular relation from such publications?

create schema sch;
create table sch.p (a int primary key) partition by list (a);
create table sch.p1 partition of sch.p for values in (1);
create table sch.p2 partition of sch.p for values in (2);
create table p (a int primary key) partition by list (a);
create table p1 partition of p for values in (1);
create table p2 partition of p for values in (2);
create publication puball for all tables;
create publication pubsch for all tables in schema sch;

alter publication puball drop table p;
ERROR:  publication "puball" is defined as FOR ALL TABLES
DETAIL:  Tables cannot be added to or dropped from FOR ALL TABLES publications.

alter publication pubsch drop table sch.p;
ERROR:  relation "p" is not part of the publication

What am I missing?

> (b) Also, how will we prohibit adding partition and its root in
> the case of "All Tables In Schema" or "All Tables" publication? If
> there is no way to do that, then that would mean we anyway need to
> deal with parent and child tables as part of the same publication and
> this new behavior will add an exception to it.

I checked that FOR ALL TABLES publications don't allow adding a table
explicitly, but apparently IN SCHEMA one does:

alter publication pubsch add table p2;
\dRp+ pubsch
   Publication pubsch
 Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
 amit  | f          | t       | t       | t       | t         | f
Tables:
"public.p2"
Tables from schemas:
"sch"

ISTM that the ..IN SCHEMA syntax does not behave like FOR ALL TABLES
without the IN SCHEMA in this regard.  Is that correct?

> (c) I think this can
> make switching "publish_via_partition_root" inconvenient for users.
> Say all or some of the partitions are part of the publication and then
> the user decides to add root table and change publish option
> "publish_via_partition_root" to "true" at that moment she needs to
> remove all partitions first.

Hmm, actually I don't think I have tested the case where some or all
partitions are present by themselves before adding the root table.
I'll check that and try to think of sensible behavior for that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: parallel vacuum comments

2021-12-03 Thread Masahiko Sawada

On Fri, Dec 3, 2021 at 6:56 PM Amit Kapila  wrote:
>
> On Fri, Dec 3, 2021 at 2:33 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada  
> > wrote:
> > >
> > > I've attached updated patches.
> > >
> >
> > I have a few comments on v4-0001.
> >
>
> The new test proposed by v4-0003 is increasing the vacuum_parallel.sql
> timing by more than 10 times. It appears to be taking the highest time
> among all the tests in make check. I think it is because of a large
> amount of data being processed by the test.

Right.

> I think it is good to use
> it for testing of patches during development but won't be a good idea
> to commit. Can we reduce its timings?

On reflection, we already have test cases for:

* a parallel vacuum that does bulk-deletion and cleanup
* a parallel vacuum that does only cleanup

and the case the new tests add is a vacuum that does bulk-deletion
twice plus cleanup. Given the increase in regression test time, the
third patch might not be worth adding.


Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: suboverflowed subtransactions concurrency performance optimize

2021-12-03 Thread Dilip Kumar
On Fri, Dec 3, 2021 at 5:00 PM Simon Riggs  wrote:
>
> On Fri, 3 Dec 2021 at 01:27, Dilip Kumar  wrote:
>
> > > On review, I think it is also possible that we update subtrans ONLY if
> > > someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> > > This would make subtrans much smaller and avoid one-entry-per-page
> > > which is a major source of cacheing.
> > > This would means some light changes in GetSnapshotData().
> > > Let me know if that seems interesting also?
> >
> > Do you mean to say avoid setting the sub-transactions parent if the
> > number of sun-transactions is not crossing PGPROC_MAX_CACHED_SUBXIDS?
> > But the TransactionIdDidCommit(), might need to fetch the parent if
> > the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
> > would we handle that?
>
> TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during
> final commit.
> In that case, the top-level xid is still in procarray when nsubxids <
> PGPROC_MAX_CACHED_SUBXIDS
> so we need not consult pg_subtrans in that case, see step 4 of.
> TransactionIdIsInProgress()

Okay, I see: there is a rule that before calling
TransactionIdDidCommit() we must consult TransactionIdIsInProgress()
(for non-MVCC snapshots) or XidInMVCCSnapshot().  So now I don't
have this concern, thanks for clarifying.  I will think more about
this approach from other angles.
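
For reference, the documented order of checks looks roughly like this
(just a sketch of the rule, not code from any patch):

    if (TransactionIdIsInProgress(xid))
        ;                       /* still in progress: not visible yet */
    else if (TransactionIdDidCommit(xid))
        ;                       /* definitely committed */
    else
        ;                       /* must have aborted or crashed */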

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: suboverflowed subtransactions concurrency performance optimize

2021-12-03 Thread Simon Riggs
On Fri, 3 Dec 2021 at 01:27, Dilip Kumar  wrote:

> > On review, I think it is also possible that we update subtrans ONLY if
> > someone uses >PGPROC_MAX_CACHED_SUBXIDS.
> > This would make subtrans much smaller and avoid one-entry-per-page
> > which is a major source of cacheing.
> > This would means some light changes in GetSnapshotData().
> > Let me know if that seems interesting also?
>
> Do you mean to say avoid setting the sub-transactions parent if the
> number of sun-transactions is not crossing PGPROC_MAX_CACHED_SUBXIDS?
> But the TransactionIdDidCommit(), might need to fetch the parent if
> the transaction status is TRANSACTION_STATUS_SUB_COMMITTED, so how
> would we handle that?

TRANSACTION_STATUS_SUB_COMMITTED is set as a transient state during
final commit.
In that case, the top-level xid is still in procarray when nsubxids <
PGPROC_MAX_CACHED_SUBXIDS
so we need not consult pg_subtrans in that case, see step 4 of
TransactionIdIsInProgress()

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: [PROPOSAL] new diagnostic items for the dynamic sql

2021-12-03 Thread Dinesh Chemuduru
Hi Michael,

Attaching the latest patch here (it's the most recent one), and looking for
more suggestions/inputs from the team.

On Fri, 3 Dec 2021 at 13:09, Michael Paquier  wrote:

> On Wed, Nov 10, 2021 at 01:58:58PM +0530, Dinesh Chemuduru wrote:
> > The proposed statements are much clear, but will wait for other’s
> > suggestion, and will fix it accordingly.
>
> This update was three weeks ago, and no new version has been
> provided, so I am marking this as returned with feedback in the CF
> app.  If you can work more on this proposal and send an updated patch,
> please feel free to resubmit.
> --
> Michael
>


05_diag_parse_sql_statement.patch
Description: Binary data


Re: parallel vacuum comments

2021-12-03 Thread Amit Kapila
On Fri, Dec 3, 2021 at 2:33 PM Amit Kapila  wrote:
>
> On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada  wrote:
> >
> > I've attached updated patches.
> >
>
> I have a few comments on v4-0001.
>

The new test proposed by v4-0003 is increasing the vacuum_parallel.sql
timing by more than 10 times. It appears to be taking the highest time
among all the tests in make check. I think it is because of a large
amount of data being processed by the test. I think it is good to use
it for testing of patches during development but won't be a good idea
to commit. Can we reduce its timings?

-- 
With Regards,
Amit Kapila.




Re: Logical replication - schema change not invalidating the relation cache

2021-12-03 Thread vignesh C
On Fri, Dec 3, 2021 at 1:13 PM Michael Paquier  wrote:
>
> On Thu, Aug 26, 2021 at 09:00:39PM +0530, vignesh C wrote:
> > The previous patch was failing because of the recent test changes made
> > by commit 201a76183e2, which unified new and get_new_node; the attached
> > patch has been adjusted to handle those changes accordingly.
>
> Please note that the CF app is complaining about this patch, so a
> rebase is required.  I have moved it to next CF, waiting on author,
> for now.

Thanks for letting me know. I have rebased it on top of HEAD; the
attached v2 version has the rebased changes.

Regards,
Vignesh
From e91a4ff5d3f3d615de3e959c7dae691bda798195 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Fri, 3 Dec 2021 15:18:53 +0530
Subject: [PATCH v2] Fix for invalidating logical replication relations when
 there is a change in schema.

When the schema gets changed, the rel sync cache invalidation was not
happening; fix that by adding a callback for schema changes.
---
 src/backend/replication/pgoutput/pgoutput.c | 51 ++
 src/test/subscription/t/001_rep_changes.pl  | 76 -
 2 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 6f6a203dea..03a105dc86 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -141,6 +141,8 @@ static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data, Oid relid);
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
 		  uint32 hashvalue);
+static void rel_sync_cache_namespace_cb(Datum arg, int cacheid,
+		uint32 hashvalue);
 static void set_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
 			TransactionId xid);
 static bool get_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
@@ -1071,6 +1073,9 @@ init_rel_sync_cache(MemoryContext cachectx)
 	CacheRegisterSyscacheCallback(PUBLICATIONNAMESPACEMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_namespace_cb,
+  (Datum) 0);
 }
 
 /*
@@ -1357,6 +1362,52 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
 	}
 }
 
+/*
+ * Namespace syscache invalidation callback
+ */
+static void
+rel_sync_cache_namespace_cb(Datum arg, int cacheid, uint32 hashvalue)
+{
+	HASH_SEQ_STATUS status;
+	RelationSyncEntry *entry;
+
+	/*
+	 * We can get here if the plugin was used in SQL interface as the
+	 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
+	 * is no way to unregister the relcache invalidation callback.
+	 */
+	if (RelationSyncCache == NULL)
+		return;
+
+	hash_seq_init(&status, RelationSyncCache);
+	while ((entry = (RelationSyncEntry *) hash_seq_search()) != NULL)
+	{
+		/*
+		 * Reset schema sent status as the relation definition may have changed.
+		 * Also free any objects that depended on the earlier definition.
+		 */
+		entry->schema_sent = false;
+		list_free(entry->streamed_txns);
+		entry->streamed_txns = NIL;
+		if (entry->map)
+		{
+			/*
+			 * Must free the TupleDescs contained in the map explicitly,
+			 * because free_conversion_map() doesn't.
+			 */
+			FreeTupleDesc(entry->map->indesc);
+			FreeTupleDesc(entry->map->outdesc);
+			free_conversion_map(entry->map);
+		}
+		entry->map = NULL;
+
+		if (hash_search(RelationSyncCache,
+(void *) &entry->relid,
+HASH_REMOVE, NULL) == NULL)
+			elog(ERROR, "hash table corrupted");
+	}
+}
+
 /*
  * Publication relation/schema map syscache invalidation callback
  */
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index 9531d81f19..622c508b3b 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -6,7 +6,7 @@ use strict;
 use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 32;
+use Test::More tests => 33;
 
 # Initialize publisher node
 my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
@@ -520,6 +520,80 @@ is($result, qq(0), 'check replication origin was dropped on subscriber');
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
+# Test schema invalidation by renaming the schema
+# Create tables on publisher
+# Initialize publisher node
+my $node_publisher1 = PostgreSQL::Test::Cluster->new('publisher1');
+$node_publisher1->init(allows_streaming => 'logical');
+$node_publisher1->start;
+
+# Create subscriber node
+my $node_subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
+$node_subscriber1->init(allows_streaming => 'logical');
+$node_subscriber1->start;
+
+my $publisher1_connstr = $node_publisher1->connstr . ' dbname=postgres';
+
+$node_publisher1->safe_psql('postgres', "CREATE SCHEMA sch1");
+$node_publisher1->safe_psql('postgres', "CREATE TABLE sch1.t1 (c1 int)");
+
+# Create 

Re: parallel vacuum comments

2021-12-03 Thread Amit Kapila
On Fri, Dec 3, 2021 at 3:01 PM houzj.f...@fujitsu.com  wrote:
>
> On Thur, Dec 2, 2021 8:31 PM Masahiko Sawada  wrote:
> > I've attached updated patches.
> >
> > The first patch is the main patch for refactoring the parallel vacuum code;
> > it removes bitmap-related code and renames functions for consistency. The
> > second patch moves this parallel-related code to vacuumparallel.c, and moves
> > common functions that are used by both vacuumlazy.c and vacuumparallel.c to
> > vacuum.c. The third patch adds regression tests for parallel vacuum on
> > different kinds of indexes with multiple index scans. Please review them.
>
> Thanks for updating the patch.
> I reviewed the 0001 patch and didn't find any big issues in it.
>
> I only have a personal suggestion for the following function names:
>
> parallel_vacuum_process_unsafe_indexes
> parallel_vacuum_index_is_parallel_safe
>
> It seems that not only "unsafe" indexes are processed in the above
> functions, but also indexes which are unsuitable (based on
> parallel_vacuum_should_skip_index).
>

I have already given a comment to remove the call to
parallel_vacuum_should_skip_index() from
parallel_vacuum_index_is_parallel_safe(). If Sawada-San follows that,
then maybe your point will be addressed.

-- 
With Regards,
Amit Kapila.




RE: parallel vacuum comments

2021-12-03 Thread houzj.f...@fujitsu.com
On Thur, Dec 2, 2021 8:31 PM Masahiko Sawada  wrote:
> I've attached updated patches.
> 
> The first patch is the main patch for refactoring the parallel vacuum code;
> it removes bitmap-related code and renames functions for consistency. The
> second patch moves this parallel-related code to vacuumparallel.c, and moves
> common functions that are used by both vacuumlazy.c and vacuumparallel.c to
> vacuum.c. The third patch adds regression tests for parallel vacuum on
> different kinds of indexes with multiple index scans. Please review them.

Thanks for updating the patch.
I reviewed the 0001 patch and didn't find any big issues in it.

I only have a personal suggestion for the following function names:

parallel_vacuum_process_unsafe_indexes
parallel_vacuum_index_is_parallel_safe

It seems that not only "unsafe" indexes are processed in the above functions,
but also indexes which are unsuitable (based on parallel_vacuum_should_skip_index).
So, it might be clearer to avoid "unsafe" in the name. Maybe we can use
"xxin_leader" or "can_participate".

Best regards,
Hou zj
 






Re: Column Filtering in Logical Replication

2021-12-03 Thread vignesh C
On Fri, Dec 3, 2021 at 12:45 AM Alvaro Herrera  wrote:
>
> On 2021-Sep-16, Peter Smith wrote:
>
> > I noticed that the latest v5 no longer includes the TAP test which was
> > in the v4 patch.
> >
> > (src/test/subscription/t/021_column_filter.pl)
> >
> > Was that omission deliberate?
>
> Somehow I not only failed to notice the omission, but also your email
> where you told us about it.  I have since posted a version of the patch
> that again includes it.

Thanks for the patch. A few comments:
I had a look at the patch, and I felt the following should be handled:
1) Dump changes to include the column filters while adding a table to a
publication in dumpPublicationTable
2) Documentation changes for column filtering in create_publication.sgml
3) Describe-publication changes to support the \dRp command in
describePublications
4) I felt we need not allow specifying columns in case of "alter
publication drop table", as dropping a column filter is currently not
allowed.
5) We should check that the specified column is present in the table;
currently we are able to specify a non-existent column for column
filtering (a possible check is sketched after the snippet below):
+   foreach(lc, targetrel->columns)
+   {
+   char   *colname;
+
+   colname = strVal(lfirst(lc));
+   target_cols = lappend(target_cols, colname);
+   }
+   check_publication_add_relation(targetrel->relation, target_cols);
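
To illustrate point 5, here is a minimal sketch of such an existence check,
assuming targetrel->relation is an already-opened Relation and that the check
runs just before check_publication_add_relation(); the error wording and
placement are mine, not part of the posted patch:

	/* Validate each listed column before building target_cols. */
	foreach(lc, targetrel->columns)
	{
		char	   *colname = strVal(lfirst(lc));

		/* get_attnum() returns InvalidAttrNumber for unknown columns. */
		if (get_attnum(RelationGetRelid(targetrel->relation),
					   colname) == InvalidAttrNumber)
			ereport(ERROR,
					(errcode(ERRCODE_UNDEFINED_COLUMN),
					 errmsg("column \"%s\" of relation \"%s\" does not exist",
							colname,
							RelationGetRelationName(targetrel->relation))));

		target_cols = lappend(target_cols, colname);
	}
	check_publication_add_relation(targetrel->relation, target_cols);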

Regards,
Vignesh




Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2021-12-03 Thread Fujii Masao




On 2021/11/16 18:55, Etsuro Fujita wrote:

I changed my mind; I’ll update the patch to ignore the error as
before, because 1) as far as I know, there are no reports from the
field concerning that we ignore all kinds of errors in cleaning up the
prepared statements, so maybe we don’t need to change that, and 2) we
already committed at least one of the remote transactions, so it’s not
good to abort the local transaction unless we really have to.


Are you planning to update the patch? In addition to this change,
at least documentation about the new parallel_commit parameter needs
to be included in the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: parallel vacuum comments

2021-12-03 Thread Amit Kapila
On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada  wrote:
>
> I've attached updated patches.
>

I have a few comments on v4-0001.
1.
In parallel_vacuum_process_all_indexes(), we can combine the two
checks for vacuum/cleanup at the beginning of the function, and I think
it is better to keep the variable name as bulkdel or cleanup instead
of vacuum, as that is more specific and clear.

2. The patch seems to be calling parallel_vacuum_should_skip_index
thrice even before starting parallel vacuum. It includes a call to find
the number of index blocks, which has to be performed for each index. I
understand it might not be too costly to call this, but it seems better
to remember this info like we do in the current code. We can
probably set parallel_workers_can_process in parallel_vacuum_begin and
then update it again in parallel_vacuum_process_all_indexes. Won't doing
something like that be better?

3.  /*
  * Copy the index bulk-deletion result returned from ambulkdelete and
@@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
  * Since all vacuum workers write the bulk-deletion result at different
  * slots we can write them without locking.
  */
- if (shared_istat && !shared_istat->updated && istat_res != NULL)
+ if (!pindstats->istat_updated && istat_res != NULL)
  {
- memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
- shared_istat->updated = true;
+ memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
+ pindstats->istat_updated = true;

  /* Free the locally-allocated bulk-deletion result */
  pfree(istat_res);
-
- /* return the pointer to the result from shared memory */
- return &shared_istat->istat;
  }

I think we now copy the results here for both local and parallel
cleanup. Isn't it better to write something about that in the comments,
as it is not clear from the current ones?

4.
+ /* Estimate size for shared information -- PARALLEL_VACUUM_KEY_SHARED */
+ est_shared_len = MAXALIGN(sizeof(LVShared));
  shm_toc_estimate_chunk(&pcxt->estimator, est_shared_len);

Do we need MAXALIGN here? I think previously we required it because
immediately after this we were writing index stats, but now those are
allocated separately. Normally, shm_toc_estimate_chunk() aligns the
size, but sometimes we need to do it ourselves so as to avoid unaligned
access of shared memory. I am really not sure whether we require it for
dead_items; do you remember why it is there in the first place?

5. The below-updated comment based on one of my previous suggestions
seems to be missing in this version.
+ /*
+ * Not safe, if the index supports parallel cleanup conditionally,
+ * but we have already processed the index (for bulkdelete).  We do
+ * this to avoid the need to invoke workers when parallel index
+ * cleanup doesn't need to scan the index.  See the comments for
+ * option VACUUM_OPTION_PARALLEL_COND_CLEANUP to know when indexes
+ * support parallel cleanup conditionally.
+ */

-- 
With Regards,
Amit Kapila.




Re: Shouldn't postgres_fdw report warning when it gives up getting result from foreign server?

2021-12-03 Thread Fujii Masao




On 2021/11/22 14:16, Bharath Rupireddy wrote:

BTW, we can hide the message "remote SQL command: .." in the cancel request
case, but that would make debugging and troubleshooting harder.


Yeah, let's not hide the message.


Yes!



For a cancel request maybe we can just say without the errcontext:
    ereport(WARNING,
            (errmsg("could not get result of cancel request due to timeout")));

See the below existing message using "cancel request":
    errmsg("could not send cancel request: %s",

For SQL command we can say:
    ereport(WARNING,
            (errmsg("could not get query result due to timeout"),
             query ? errcontext("remote SQL command: %s", query) : 0));


I wonder how pgfdw_get_cleanup_result() can determine which
log message to report. Probably we can add a new boolean argument
to pgfdw_get_cleanup_result() that is set to true
for the cancel request case and false for the query case. Then
pgfdw_get_cleanup_result() can decide which message to log
based on that argument. But that does not seem like a good design to me.
Thoughts?
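
For what it's worth, here is a rough sketch of what the flag-based approach
could look like; the helper and the parameter name "for_cancel" are purely
illustrative, not existing postgres_fdw code:

/*
 * Hypothetical helper called by pgfdw_get_cleanup_result() on timeout,
 * choosing the warning based on whether the caller was waiting for a
 * cancel request or for a query result.
 */
static void
pgfdw_report_cleanup_timeout(bool for_cancel, const char *query)
{
	if (for_cancel)
		ereport(WARNING,
				(errmsg("could not get result of cancel request due to timeout")));
	else
		ereport(WARNING,
				(errmsg("could not get query result due to timeout"),
				 query ? errcontext("remote SQL command: %s", query) : 0));
}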

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Commitfest 2021-11 closed

2021-12-03 Thread Daniel Gustafsson
I've now closed the 2021-11 commitfest. ~36% of the patches were closed in some
way (committed, returned with feedback, withdrawn, or rejected), with 184 patches
moved to the next CF.

--
Daniel Gustafsson   https://vmware.com/





Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-12-03 Thread Daniel Gustafsson
This review has gone unanswered for two months, so I'm marking this patch
Returned with Feedback.  Please feel free to resubmit when a new version of the
patch is available.

--
Daniel Gustafsson   https://vmware.com/





Re: issue in pgfdw_report_error()?

2021-12-03 Thread Fujii Masao




On 2021/11/22 13:59, Bharath Rupireddy wrote:

On Mon, Nov 22, 2021 at 8:17 AM Fujii Masao  wrote:

Well, in that case, why can't we get rid of "(message_primary != NULL"
and just have "message_primary[0] != '\0' ? errmsg_internal("%s",
message_primary) : errmsg("could not obtain message string for remote
error")" ?


That's possible if we can confirm that PQerrorMessage() never returns
NULL in any case. I'm not sure how much it's worth doing that, though.
It seems more robust to also check for NULL there.


Okay.


BTW, we might have to fix it in dblink_res_error too?


Yeah, that's a good idea. I included that change in the patch. Attached.


Thanks. The pgfdw_report_error_v2 patch looks good to me.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Remove extra Logging_collector check before calling SysLogger_Start()

2021-12-03 Thread Bharath Rupireddy
On Fri, Dec 3, 2021 at 1:43 PM Daniel Gustafsson  wrote:
>
> > On 3 Dec 2021, at 08:58, Bharath Rupireddy 
> >  wrote:
>
> > It seems like there's an extra Logging_collector check before calling
> > SysLogger_Start(). Note that the SysLogger_Start() has a check to
> > return 0 if Logging_collector is off. This change is consistent with
> > the other usage of SysLogger_Start().
> >
> >/* If we have lost the log collector, try to start a new one 
> > */
> > -   if (SysLoggerPID == 0 && Logging_collector)
> > +   if (SysLoggerPID == 0)
> >SysLoggerPID = SysLogger_Start();
>
> I think the code reads clearer with the Logging_collector check left intact,
> and avoiding a function call in this codepath doesn't hurt.

In that case, can we remove the if (!Logging_collector) check within
SysLogger_Start() and have the check outside? This will save the function
call cost in the other places too. The pgarch and autovacuum startup code
already have the PgArchStartupAllowed() and AutoVacuumingActive() checks
outside the start functions.
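
A minimal sketch of the call-site pattern that suggestion would lead to; the
wrapper function is purely illustrative (only Logging_collector, SysLoggerPID,
and SysLogger_Start() are real):

/*
 * With the GUC check removed from SysLogger_Start(), each call site does
 * the cheap test itself, mirroring how PgArchStartupAllowed() and
 * AutoVacuumingActive() are consulted before starting those processes.
 */
static void
MaybeStartSysLogger(void)
{
	if (Logging_collector && SysLoggerPID == 0)
		SysLoggerPID = SysLogger_Start();
}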

Regards,
Bharath Rupireddy.




Re: Remove extra Logging_collector check before calling SysLogger_Start()

2021-12-03 Thread Daniel Gustafsson
> On 3 Dec 2021, at 08:58, Bharath Rupireddy 
>  wrote:

> It seems like there's an extra Logging_collector check before calling
> SysLogger_Start(). Note that the SysLogger_Start() has a check to
> return 0 if Logging_collector is off. This change is consistent with
> the other usage of SysLogger_Start().
> 
>/* If we have lost the log collector, try to start a new one */
> -   if (SysLoggerPID == 0 && Logging_collector)
> +   if (SysLoggerPID == 0)
>SysLoggerPID = SysLogger_Start();

I think the code reads clearer with the Logging_collector check left intact,
and avoiding a function call in this codepath doesn't hurt.

--
Daniel Gustafsson   https://vmware.com/