Re: New strategies for freezing, advancing relfrozenxid early

2022-09-15 Thread John Naylor
On Wed, Sep 14, 2022 at 11:33 PM Peter Geoghegan  wrote:
>
> On Wed, Sep 14, 2022 at 3:18 AM John Naylor

> > Furthermore, it doesn't have to anticipate the maximum size, so there
> > is no up front calculation assuming max-tuples-per-page, so it
> > automatically uses less memory for less demanding tables.
>
> The final number of TIDs doesn't seem like the most interesting
> information that VM snapshots could provide us when it comes to
> building the dead_items TID data structure -- the *distribution* of
> TIDs across heap pages seems much more interesting. The "shape" can be
> known ahead of time, at least to some degree. It can help with
> compression, which will reduce cache misses.

My point here was simply that spilling to disk is an admission of
failure to utilize memory efficiently and thus shouldn't be a selling
point of VM snapshots. Other selling points could still be valid.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Improve description of XLOG_RUNNING_XACTS

2022-09-15 Thread Masahiko Sawada
On Wed, Sep 14, 2022 at 6:33 PM Amit Kapila  wrote:
>
> On Fri, Sep 9, 2022 at 6:18 AM Masahiko Sawada  wrote:
> >
> > Updated the patch accordingly.
> >
>
> I have created two xacts each with savepoints and after your patch,
> the record will show xacts/subxacts information as below:
>
> rmgr: Standby len (rec/tot): 74/74, tx:  0, lsn:
> 0/014AC238, prev 0/014AC1F8, desc: RUNNING_XACTS nextXid 733
> latestCompletedXid 726 oldestRunningXid 727; 2 xacts: 729 727; 4
> subxacts: 730 731 728 732
>
> There is no way to associate which subxacts belong to which xact, so
> will it be useful, and if so, how? I guess we probably don't need it
> here, because the describe routines just display the record information.

I think it's useful for debugging purposes. For instance, when I was
working on the fix 68dcce247f1a13318613a0e27782b2ca21a4ceb7
(REL_14_STABLE), I checked if all initial running transactions
including subtransactions are properly stored and purged by checking
the debug logs and pg_waldump output. Actually, until I realized that
the description of RUNNING_XACTS doesn't show subtransaction
information, I was confused by the fact that the saved initial running
transactions didn't match the description shown by pg_waldump.

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add parameter jit_warn_above_fraction

2022-09-15 Thread Ibrar Ahmed
On Sat, Apr 9, 2022 at 8:43 PM Julien Rouhaud  wrote:

> On Sat, Apr 09, 2022 at 12:31:23PM -0400, Tom Lane wrote:
> > Julien Rouhaud  writes:
> > > On Sat, Apr 09, 2022 at 10:42:12AM -0400, Tom Lane wrote:
> > >> Also, good luck with "looking in the logs", because by default
> > >> WARNING-level messages don't go to the postmaster log.
> >
> > > I must be missing something because by default log_min_messages is
> > > WARNING, and AFAICS I do get WARNING-level messages in some vanilla
> > > cluster logs, using vanilla .psqlrc.
> >
> > Ah, sorry, I was looking at the wrong thing namely
> > log_min_error_statement.  The point still holds though: if we emit this
> > at level WARNING, what will get logged is just the bare message and not
> > the statement that triggered it.  How useful do you think that will be?
>
> Ah right, I did miss that "small detail".  And of course I agree: as-is
> that would be entirely useless, and it should be LOG, like the rest of
> similar features.
>
>
>
The patch does not apply successfully; please rebase the patch.

patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 642.
Hunk #2 FAILED at 3943.
2 out of 2 hunks FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rej
patching file src/backend/utils/misc/postgresql.conf.sample
patching file src/include/jit/jit.h


-- 
Ibrar Ahmed


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-09-15 Thread Ibrar Ahmed
On Mon, Sep 12, 2022 at 8:30 PM Reid Thompson 
wrote:

> On Fri, 2022-09-09 at 12:14 -0500, Justin Pryzby wrote:
> > On Sat, Sep 03, 2022 at 11:40:03PM -0400, Reid Thompson wrote:
> > > > > +   0, 0, INT_MAX,
> > > > > +   NULL, NULL, NULL
> > > > I think this needs a maximum like INT_MAX/1024/1024
> > >
> > > Is this noting that we'd set a ceiling of 2048MB?
> >
> > The reason is that you're later multiplying it by 1024*1024, so you
> > need
> > to limit it to avoid overflowing.  Compare with
> > min_dynamic_shared_memory, Log_RotationSize, maintenance_work_mem,
> > autovacuum_work_mem.
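
To see the overflow Justin is pointing at -- a minimal standalone
sketch, not code from the patch:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int			guc_mb = INT_MAX;	/* hypothetical GUC value, in MB */

	/*
	 * guc_mb * 1024 * 1024 would be evaluated in plain int and
	 * overflow.  Either widen before multiplying, or clamp the GUC
	 * maximum so the product always fits:
	 */
	int64_t		bytes = (int64_t) guc_mb * 1024 * 1024;
	int			max_mb = INT_MAX / 1024 / 1024; /* 2047 */

	printf("bytes=%lld safe max_mb=%d\n", (long long) bytes, max_mb);
	return 0;
}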
>
> What I originally attempted to implement is:
> GUC "max_total_backend_memory" max value as INT_MAX = 2147483647 MB
> (2251799812636672 bytes). And the other variables and comparisons as
> bytes represented as uint64 to avoid overflow.
>
> Is this invalid?
>
> > typo: Explicitely
>
> corrected
>
> > +errmsg("request will exceed postgresql.conf
> > defined max_total_backend_memory limit (%lu > %lu)",
> >
> > I wouldn't mention postgresql.conf - it could be in
> > postgresql.auto.conf, or an include file, or a -c parameter.
> > Suggest: allocation would exceed max_total_backend_memory limit...
> >
>
> updated
>
> >
> > +   ereport(LOG, errmsg("decrease reduces reported
> > backend memory allocated below zero; setting reported to 0"));
> >
> > Suggest: deallocation would decrease backend memory below zero;
>
> updated
>
> > +   {"max_total_backend_memory", PGC_SIGHUP,
> > RESOURCES_MEM,
> >
> >
> >
> > Should this be PGC_SU_BACKEND to allow a superuser to set a higher
> > limit (or no limit)?
>
> Sounds good to me. I'll update to that.
> Would PGC_SUSET be too open?
>
> > There's compilation warning under mingw cross compile due to
> > sizeof(long).  See d914eb347 and other recent commits which I guess
> > is
> > the current way to handle this.
> > http://cfbot.cputube.org/reid-thompson.html
>
> updated %lu to %llu and changed cast from uint64 to
> unsigned long long in the ereport call
>
> > For performance test, you'd want to check what happens with a large
> > number of max_connections (and maybe a large number of clients).  TPS
> > isn't the only thing that matters.  For example, a utility command
> > might
> > sometimes do a lot of allocations (or deallocations), or a
> > "parameterized nested loop" may loop over over many outer tuples and
> > reset for each.  There's also a lot of places that reset to a
> > "per-tuple" context.  I started looking at its performance, but
> > nothing
> > to show yet.
>
> Thanks
>
> > Would you keep people copied on your replies ("reply all") ?
> > Otherwise
> > I (at least) may miss them.  I think that's what's typical on these
> > lists (and the list tool is smart enough not to send duplicates to
> > people who are direct recipients).
>
> Ok - will do, thanks.
>
> --
> Reid Thompson
> Senior Software Engineer
> Crunchy Data, Inc.
>
> reid.thomp...@crunchydata.com
> www.crunchydata.com
>
>
The patch does not apply; please rebase the patch.

patching file src/backend/utils/misc/guc.c
Hunk #1 FAILED at 3664.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/utils/misc/guc.c.rej

patching file src/backend/utils/misc/postgresql.conf.sample


-- 
Ibrar Ahmed


Re: Cleaning up historical portability baggage

2022-09-15 Thread Ibrar Ahmed
On Mon, Aug 29, 2022 at 3:13 AM Thomas Munro  wrote:

> On Mon, Aug 29, 2022 at 9:40 AM Tom Lane  wrote:
> > Here's another bit of baggage handling: fixing up the places that
> > were afraid to use fflush(NULL).  We could doubtless have done
> > this years ago (indeed, I found several places already using it)
> > but as long as we're making a push to get rid of obsolete code,
> > doing it now seems appropriate.
>
> +1, must be OK (pg_dump and pg_upgrade).
>
> Archeology:
>
> commit 79fcde48b229534fd4a5e07834e5e0e84dd38bee
> Author: Tom Lane 
> Date:   Sun Nov 29 01:51:56 1998 +
>
> Portability fix for old SunOS releases: fflush(NULL)
>
>
> https://www.postgresql.org/message-id/199811241847.NAA04690%40tuna.uimage.com
>
> SunOS 4.x was still in some kind of support phase for a few more
> years, but I guess they weren't working too hard on conformance and
> features, given that SunOS 5 (the big BSD -> System V rewrite) had
> come out in '92...
>
> > One thing that's not clear to me is what the appropriate rules
> > should be for popen().  POSIX makes clear that you shouldn't
> > expect popen() to include an fflush() itself, but we seem quite
> > haphazard about whether to do one or not before popen().  In
> > the case of popen(..., "r") we can expect that the child can't
> > write on our stdout, but stderr could be a problem anyway.
> >
> > Likewise, there are some places that fflush before system(),
> > but they are a minority.  Again it seems like the main risk
> > is duplicated or mis-ordered stderr output.
> >
> > I'm inclined to add fflush(NULL) before any popen() or system()
> > that hasn't got one already, but did not do that in the attached.
>
> Couldn't hurt.  (Looking around at our setvbuf() setup to check the
> expected stream state in various places... and huh, I hadn't
> previously noticed the thing about Windows interpreting line buffering
> to mean full buffering.  Pfnghghl...)
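
For illustration, a minimal standalone sketch of the stdio-ordering
hazard under discussion (hypothetical code, not from the patch):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	printf("parent output");	/* buffered; no newline, no flush */

	/*
	 * Without a flush, the buffered bytes above can be inherited by
	 * the child started below and emitted twice, or appear after the
	 * child's output.  Per POSIX, fflush(NULL) flushes every open
	 * output stream in one call.
	 */
	fflush(NULL);

	system("echo child output");
	return 0;
}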
>
>
The patch does not apply successfully; please rebase the patch.

patching file src/backend/postmaster/fork_process.c
Hunk #1 FAILED at 37.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/postmaster/fork_process.c.rej
patching file src/backend/storage/file/fd.c
Hunk #1 FAILED at 2503.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/storage/file/fd.c.rej
patching file src/backend/utils/error/elog.c
Hunk #1 FAILED at 643.
Hunk #2 FAILED at 670.



-- 
Ibrar Ahmed


Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

2022-09-15 Thread Nikita Malakhov
Hi!

As can be seen from the code (toasting.c and onward), TOAST tables are
created immediately when a new relation with a TOASTable column is
created. In practice, though, the case where the TOAST table does not
exist can still occur, and we should of course check for that.

TOAST_TUPLE_THRESHOLD is not the only thing that decides whether a value
is stored externally; the logic is slightly more complex and less obvious
(see heapam.c, heap_prepare_insert()):
else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)

As you can see, there is another condition - HeapTupleHasExternal -
which is set in heap_fill_tuple() and, lower down, in fill_val(), where
the infomask bit HEAP_HASEXTERNAL is set.

So when I experimented with TOAST, I had to add a new flag that forced
the value to be TOASTed regardless of its size.

Also, TOAST_TUPLE_THRESHOLD sets the overall tuple size above which a
tuple is considered for toasting, and it has a minimum value that cannot
be decreased further.

In [1] (the Pluggable TOAST) we suggest making this an option for the Toaster.

[1]
https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73a...@sigaev.ru


On Thu, Sep 15, 2022 at 3:05 AM David Rowley  wrote:

> On Thu, 15 Sept 2022 at 04:04, Aleksander Alekseev
>  wrote:
> > 1. Forbid setting toast_tuple_target < TOAST_TUPLE_THRESHOLD
> > 2. Consider using something like RelationGetToastTupleTarget(rel,
> > TOAST_TUPLE_THRESHOLD) in heapam.c:2250, heapam.c:3625 and
> > rewriteheap.c:636 and modify the documentation accordingly.
> > 3. Add a separate user-defined table setting toast_tuple_threshold
> > similar to toast_tuple_target.
> >
> > Thoughts?
>
> There was some discussion on this problem in [1].
>
> The problem with #2 is that if you look at
> heapam_relation_needs_toast_table(), it only decides if the toast
> table should be created based on (tuple_length >
> TOAST_TUPLE_THRESHOLD). So if you were to change the logic as you
> describe for #2 then there might not be a toast table during an
> INSERT/UPDATE.
>
> The only way to fix that would be to ensure that we reconsider if we
> should create a toast table or not when someone changes the
> toast_tuple_target reloption.  That can't be done under
> ShareUpdateExclusiveLock, so we'd need to obtain an
> AccessExclusiveLock instead when changing the toast_tuple_target
> reloption. That might upset some people.
>
> The general direction of [1] was to just increase the minimum setting
> to TOAST_TUPLE_THRESHOLD, but there were some concerns about breaking
> pg_dump as we'd have to error if someone does ALTER TABLE to set the
> toast_tuple_target reloption lower than the newly defined minimum
> value.
>
> I don't quite follow you on #3. If there's no toast table we can't toast.
>
> David
>
> [1]
> https://www.postgresql.org/message-id/20190403063759.gf3...@paquier.xyz
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Switching XLog source from archive to streaming when primary available

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
 wrote in 
> I'm attaching the v6 patch that's rebased on to the latest HEAD.
> Please consider this for review.

Thanks for the new version!

+#define StreamingReplRetryEnabled() \
+   (streaming_replication_retry_interval > 0 && \
+StandbyMode && \
+currentSource == XLOG_FROM_ARCHIVE)

It seems a bit too complex to me.

+   /* Save the timestamp at which we're switching to archive. */
+   if (StreamingReplRetryEnabled())
+   switched_to_archive_at = GetCurrentTimestamp();

Anyway, we are going to open a file just after this, so
GetCurrentTimestamp() cannot cause a perceptible degradation.
Couldn't we do that unconditionally, to get rid of the macro?
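
A sketch of what the unconditional version could look like (variable
names follow the quoted patch; this is not actual patch code):

	/*
	 * Always record when we switch to reading from the archive; one
	 * GetCurrentTimestamp() call is negligible next to opening a file.
	 */
	switched_to_archive_at = GetCurrentTimestamp();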

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: START_REPLICATION SLOT causing a crash in an assert build

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 01:26:15 -0500, Jaime Casanova 
 wrote in 
> On Tue, Sep 13, 2022 at 10:07:50PM -0500, Jaime Casanova wrote:
> > On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote:
> > > 
> > > Another measure would be to add the region to wipe-out on reset to
> > > PgStat_KindInfo, but it seems too much.. (attached)
> > > 
> > 
> > This patch solves the problem; I didn't like the other solution because,
> > as you say, it partly nullifies the protection of the assertion.
> > 
> 
> I spoke too soon; while it solves the immediate problem, the patch as is
> causes other crashes.

Where did the crash happen?  Is it a bug introduced by the patch, or
does it have some other root cause?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Daniel Gustafsson
> On 15 Sep 2022, at 01:19, Ranier Vilela  wrote:

> LocalAlloc is deprecated.
> So use HeapAlloc instead, since LocalAlloc is an overhead wrapper around HeapAlloc.
> 
> Attached a patch.

Don't forget that patches which aim to reduce overhead are best when
accompanied with benchmarks which show the effect of the reduction.

-   pacl = (PACL) LocalAlloc(LPTR, dwNewAclSize);
+   pacl = (PACL) HeapAlloc(hDefaultProcessHeap, 0, dwNewAclSize);

These calls are not equal: the LocalAlloc call zeroes out the allocated memory,
but HeapAlloc does not unless the HEAP_ZERO_MEMORY flag is passed.  I
haven't read the code enough to know whether that matters, but it seems
relevant to at least discuss.
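
For reference, a drop-in equivalent would have to request the
zero-filling explicitly -- a sketch reusing the variables from the
quoted hunk:

	/* LocalAlloc(LPTR, n) means LMEM_FIXED | LMEM_ZEROINIT, so: */
	pacl = (PACL) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY,
							dwNewAclSize);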

--
Daniel Gustafsson   https://vmware.com/





Re: walmethods.c/h are doing some strange things

2022-09-15 Thread velagandula sravan kumar


From: Robert Haas 
Date: Friday, 2 September 2022 at 9:23 PM
To: pgsql-hack...@postgresql.org 
Subject: walmethods.c/h are doing some strange things
Hi,

We have a number of places in the system where we are using
object-oriented design patterns. For example, a foreign data wrapper
returns a table of function pointers which are basically methods for
operating on a planner or executor node that corresponds to a foreign
table that uses that foreign data wrapper. More simply, a
TupleTableSlot or TableAmRoutine or bbstreamer or bbsink object
contains a pointer to a table of callbacks which are methods that can
be applied to that object. walmethods.c/h also try to do something
sort of like this, but I find the way that they do it really weird,
because while Create{Directory|Tar}WalMethod() does return a table of
callbacks, those callbacks aren't tied to any specific object.
Instead, each set of callbacks refers to the one and only object of
that type that can ever exist, and the pointer to that object is
stored in a global variable managed by walmethods.c. So whereas in
other cases we give you the object and then a way to get the
corresponding set of callbacks, here we only give you the callbacks,
and we therefore have to impose the artificial restriction that there
can only ever be one object.

I think it would be better to structure things so that Walfile and
WalWriteMethod function as abstract base classes; that is, each is a
struct containing those members that are common to all
implementations, and then each implementation extends that struct with
whatever additional members it needs. One advantage of this is that it
would allow us to simplify the communication between receivelog.c and
walmethods.c. Right now, for example, there's a get_current_pos()
method in WalWriteMethods. The way that works is that
WalDirectoryMethod has a struct where it stores a 'currpos' value that
is returned by this method, and WalTarMethod has a different struct
that also stores a 'currpos' value that is returned by this method.
There is no real benefit in having the same variable in two different
structs and having to access it via a callback when we could just put
it into a common struct and access it directly. There's also a
compression_algorithm() method which has exactly the same issue,
though that is an overall property of the WalWriteMethod rather than a
per-Walfile property. There's also a getlasterr callback which is
basically just duplicate code across the two implementations; we could
unify that code. There's also a global variable current_walfile_name[]
in receivelog.c which only needs to exist because the file name is
inconveniently hidden inside the WalWriteMethod abstraction layer; we
can just make it visible.
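
A minimal sketch of the base-struct layout being proposed here
(illustrative member names, not the actual patch):

#include <sys/types.h>

typedef struct Walfile
{
	off_t		currpos;		/* common to every implementation */
	char	   *pathname;
} Walfile;

typedef struct DirectoryMethodFile
{
	Walfile		base;			/* must be the first member */
	int			fd;				/* implementation-specific state */
} DirectoryMethodFile;

/*
 * A DirectoryMethodFile * can be used as a Walfile *, so a common
 * field like currpos is read directly instead of going through a
 * get_current_pos() callback.
 */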


Attached are a couple of hastily-written patches implementing this.
There might be good arguments for more thoroughly renaming some of the
things these patches touch, but I thought that doing any more renaming
would make it less clear what the core of the change is, so I'm
posting it like this for now. One thing I noticed while writing these
patches is that the existing code isn't very clear about whether
"Walfile" is supposed to be an abstraction for a pointer to the
implementation-specific struct, or the struct itself. From looking at
walmethods.h, you'd think it's a pointer to the struct, because we
declare typedef void *Walfile. walmethods.c agrees, but receivelog.c
takes a different view, declaring all of its variables as type
"Walfile *". This doesn't cause a compiler error because void * is
just as interchangeable with void ** as it is with DirectoryMethodFile
* or TarMethodFile *, but I think it is clearly a mistake, and the
approach I'm proposing here makes such mistakes more difficult to
make.
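
The mistake is easy to reproduce in isolation -- a standalone sketch:

#include <stdio.h>

typedef void *Walfile;			/* the existing typedef */

static int	impl;				/* stand-in for DirectoryMethodFile */

static Walfile
open_walfile(void)
{
	return &impl;				/* hypothetical open function */
}

int
main(void)
{
	Walfile		f = open_walfile();		/* correct usage */
	Walfile    *fp = open_walfile();	/* one indirection too many, yet
										 * it compiles, because void *
										 * converts to any object
										 * pointer type silently */

	printf("%p %p\n", f, (void *) fp);
	return 0;
}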

+1 on making such mistakes harder to make. I had a quick look at the
patch, and the refactoring makes sense.

Thanks,
Sravan Kumar
www.enterprisedb.com


Aside from the stuff that I am complaining about here which is mostly
stylistic, I think that the division of labor between receivelog.c and
walmethods.c is questionable in a number of ways. There are things
which are specific to one walmethod or the other that are handled in
the common code (receivelog.c) rather than the type-specific code
(walmethod.c), and in general it feels like receivelog.c knows way too
much about what is really happening beneath the abstraction layer that
walmethods.c supposedly creates. This comment is one of the clearer
examples of this:

 /*
  * When streaming to files, if an existing file exists we verify that it's
  * either empty (just created), or a complete WalSegSz segment (in which
  * case it has been created and padded). Anything else indicates a corrupt
  * file. Compressed files have no need for padding, so just ignore this
  * case.
  *
  * When streaming to tar, no file with this name will exist before, so we
  * never have to verify a size.
  */

There's nothing gen

Re: Counterintuitive behavior when toast_tuple_target < TOAST_TUPLE_THRESHOLD

2022-09-15 Thread Aleksander Alekseev
Hi David,

> There was some discussion on this problem in [1].
> [1] https://www.postgresql.org/message-id/20190403063759.gf3...@paquier.xyz

Thanks for sharing this discussion. I missed it.

> The problem with #2 is that if you look at
> heapam_relation_needs_toast_table(), it only decides if the toast
> table should be created based on (tuple_length >
> TOAST_TUPLE_THRESHOLD). So if you were to change the logic as you
> describe for #2 then there might not be a toast table during an
> INSERT/UPDATE.
>
> The only way to fix that would be to ensure that we reconsider if we
> should create a toast table or not when someone changes the
> toast_tuple_target reloption.  That can't be done under
> ShareUpdateExclusiveLock, so we'd need to obtain an
> AccessExclusiveLock instead when changing the toast_tuple_target
> reloption. That might upset some people.

Personally, if I had to choose between (A) a feature that is potentially
expensive but useful to many and (B) a feature that in practice is
pretty much useless to most users, I would choose (A). Maybe we will be
able to make it a little less expensive if we optimistically take a
shared lock first and then, if necessary, take an exclusive lock.

> The general direction of [1] was to just increase the minimum setting
> to TOAST_TUPLE_THRESHOLD, but there were some concerns about breaking
> pg_dump as we'd have to error if someone does ALTER TABLE to set the
> toast_tuple_target reloption lower than the newly defined minimum
> value.

Yep, this doesn't seem to be an option.

> I don't quite follow you on #3. If there's no toast table we can't toast.

The case I had in mind was the one when we already have a TOAST table
and then change toast_tuple_target.

In this scenario TOAST_TUPLE_THRESHOLD is used to decide whether TOAST
should be triggered for a given tuple at all. How long TOAST keeps
compressing the tuple is controlled by toast_tuple_target, not by
TOAST_TUPLE_THRESHOLD. So the user has control of the "target" but no
control of the "threshold". If the user could set both "threshold" and
"target" low, this would solve the problem the user originally had
(the one described in the first email).
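
Condensed into code, the two knobs look roughly like this (a paraphrase
of the logic in heapam.c and the toaster, not a verbatim copy):

	/* 1. Whether to enter the toaster at all: compile-time threshold. */
	if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
		tup = heap_toast_insert_or_update(relation, tup, NULL, options);

	/*
	 * 2. Inside the toaster, how far to keep shrinking the tuple:
	 * per-table reloption, defaulting to the same constant.
	 */
	maxDataLen = RelationGetToastTupleTarget(rel, TOAST_TUPLE_TARGET);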

Adding a separate "threshold" option doesn't solve the problem when we
change it and there is no TOAST table yet.

I wonder though if we really need two entities - a "threshold" and a
"target". It seems to me that it should actually be one value, no?

-- 
Best regards,
Aleksander Alekseev




Re: Cleaning up historical portability baggage

2022-09-15 Thread John Naylor
On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed  wrote:
> The patch does not apply successfully; please rebase the patch.

There's a good reason for that -- the latest one was committed two
weeks ago. The status should still be waiting on author, though,
namely for:

On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro  wrote:
> Remaining things from this thread:
>  * removing --disable-thread-safety
>  * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
>  * making Unix sockets secure for Windows in tests

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: failing to build preproc.c on solaris with sun studio

2022-09-15 Thread Justin Pryzby
On Thu, Sep 15, 2022 at 04:53:09PM +1200, Thomas Munro wrote:
> Not sure it's OK to put random junk in ccache's directory, and in any
> case we'd certainly want to teach it to trim itself before doing that
> on CI...

> I suppose a version good enough to live in src/tools would need to
> trim the cache, and I don't enjoy writing code that deletes files in
> shell script, so maybe this'd need to be written in Python...

Maybe it'd be better in Python (for portability?).

But also, maybe the cache should be a single file containing the hash,
plus the two cached files.  Then, rather than storing and pruning N
files with dynamic names, you'd be overwriting the same two files and
would avoid the need to prune.

As I understand it, the utility of this cache is in rebuilds where the
grammar hasn't changed; it doesn't seem important to be able to use
cached output when switching branches (for example).

> bison="bison"

Maybe this should do:
: ${BISON:=bison} # assign default value

(and then change to refer to $BISON everywhere)

>   "--version")
>   "-d")
>   "-o")
>   "-"*)

These don't need to be quoted

>   echo "could not find .y file in command line arguments: $@"

Could add >&2 to write to stderr

> if [ -z "$SIMPLE_BISON_CACHE_PATH" ] ; then
>   SIMPLE_BISON_CACHE_PATH="/tmp/simple-bison-cache"
> fi

Maybe
: ${SIMPLE_BISON_CACHE_PATH:=/tmp/simple-bison-cache} # assign default value

-- 
Justin




Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-15 Thread Simon Riggs
On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera  wrote:
>
> On 2022-Aug-30, Simon Riggs wrote:
>
> > 001_new_isolation_tests_for_subxids.v3.patch
> > Adds new test cases to master without adding any new code, specifically
> > addressing the two areas of code that are not tested by existing tests.
> > This gives us a baseline from which we can do test driven development.
> > I'm hoping this can be reviewed and committed fairly smoothly.
>
> I gave this a quick run to confirm the claimed increase of coverage.  It
> checks out, so pushed.

Thank you.

So now we just have the main part of the patch, reattached here for
the auto patch tester's benefit.

-- 
Simon Riggs    http://www.EnterpriseDB.com/


002_minimize_calls_to_SubTransSetParent.v9.patch
Description: Binary data


Re: Fix gin index cost estimation

2022-09-15 Thread Ronan Dunklau
On Monday, September 12, 2022 at 16:41:16 CEST, Ronan Dunklau wrote:
> But I realised that another approach might be better suited: since we
> want to charge a cpu cost for every page visited, actually basing that
> on the already estimated entryPagesFetched and dataPagesFetched would
> be better, instead of copying what is done for other index types and
> estimating the tree height. It would be simpler, as we don't need to
> estimate the tree height anymore.
> 
> I will submit a patch doing that.

The attached patch does that and is much simpler. I only took
entryPagesFetched into account; I am not sure if we should also charge
something for data pages.

Instead of trying to estimate the height of the tree, we rely on the 
(imperfect) estimation of the number of entry pages fetched, and charge 50 
times cpu_operator_cost to that, in addition to the cpu_operator_cost charged 
per entry visited.

I also adapted it to take into account multiple scans induced by scalar
array operations.

As it is, I don't understand the following calculation:

/*
* Estimate number of entry pages read.  We need to do
* counts.searchEntries searches.  Use a power function as it should be,
 * but tuples on leaf pages usually is much greater. Here we include all
 * searches in entry tree, including search of first entry in partial
 * match algorithm
 */
entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages, 0.15)));

Is the pow(..., 0.15) used as an approximation of a log? If so, why?
Also, shouldn't we round that up? (For example, with numEntryPages =
200, pow() gives about 2.21; rint() yields 2 where ceil() would give 3.)
It seems to me it's unlikely to affect the total too much in normal
cases (adding at worst random_page_cost), but if we start to charge cpu
operator costs as proposed here it makes a bigger difference, and it is
probably safer to overestimate a bit than the opposite.

With those changes, the gin cost (purely cpu-wise) stays above the
btree one, as I think it should.

-- 
Ronan Dunklau

From 8635a22ee5f90297756f072199c69a318bd17ea2 Mon Sep 17 00:00:00 2001
From: Ronan Dunklau 
Date: Mon, 12 Sep 2022 15:40:18 +0200
Subject: [PATCH v2] Fix gin costing.

GIN index scans were not taking any descent CPU-based cost into account. That made
them look cheaper than other types of indexes when they shouldn't be.

We use the same heuristic as for btree indexes, but multiplying it by
the number of searched entries.

Per report of Hung Nguyen.
---
 src/backend/utils/adt/selfuncs.c | 34 +---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index c746759eef..881e470a07 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7419,6 +7419,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 qual_arg_cost,
 spc_random_page_cost,
 outer_scans;
+	Cost		descentCost;
 	Relation	indexRel;
 	GinStatsData ginStats;
 	ListCell   *lc;
@@ -7622,7 +7623,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 * searches in entry tree, including search of first entry in partial
 	 * match algorithm
 	 */
-	entryPagesFetched += ceil(counts.searchEntries * rint(pow(numEntryPages, 0.15)));
+	entryPagesFetched += ceil(counts.searchEntries * ceil(pow(numEntryPages, 0.15)));
 
 	/*
 	 * Add an estimate of entry pages read by partial match algorithm. It's a
@@ -7643,6 +7644,33 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 */
 	dataPagesFetched = ceil(numDataPages * partialScale);
 
+	*indexStartupCost = 0;
+	*indexTotalCost = 0;
+
+	/*
+	 * Add a CPU-cost component to represent the costs of initial entry btree
+	 * descent.  We don't charge any I/O cost for touching upper btree levels,
+	 * since they tend to stay in cache, but we still have to do about log2(N)
+	 * comparisons to descend a btree of N leaf tuples.  We charge one
+	 * cpu_operator_cost per comparison.
+	 *
+	 * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
+	 * ones after the first one are not startup cost so far as the overall
+	 * plan is concerned, so add them only to "total" cost.
+	 */
+	if (numEntries > 1)			/* avoid computing log(0) */
+	{
+		descentCost = ceil(log(numEntries) / log(2.0)) * cpu_operator_cost;
+		*indexStartupCost += descentCost * counts.searchEntries;
+		*indexTotalCost += counts.arrayScans * descentCost * counts.searchEntries;
+	}
+
+	/*
+	 * Add a cpu cost per page fetched. This is not amortized over a loop.
+	 */
+	*indexStartupCost += entryPagesFetched * 50.0 * cpu_operator_cost;
+	*indexTotalCost += entryPagesFetched * counts.arrayScans * 50.0 * cpu_operator_cost;
+
 	/*
 	 * Calculate cache effects if more than one scan due to nestloops or array
 	 * quals.  The result is pro-rated per nestloop scan, but the array qual
@@ -7666,7 +7694,7 @@ gincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 	 * Here we use random page cost because logically-close pages could be far
 	 * apart on disk

Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-09-15 Thread Bharath Rupireddy
On Tue, Sep 13, 2022 at 3:56 PM Kyotaro Horiguchi
 wrote:
>
> > is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
> > - it calls ConditionVariablePrepareToSleep(),
>
> Anyway the code path is executed in almost all cases because the same
> assertion fires otherwise. So I don't see a problem if we do the bunch
> of synchronization things also in that rare case.  I'm not sure we
> want to do [3].

IMO, we don't need to let ShutdownWalRcv() do the extra work of
ConditionVariablePrepareToSleep() - WalRcvRunning() -
ConditionVariableCancelSleep() in the WALRCV_STOPPED case, when we
already know the walreceiver status -- even though it doesn't cause any
problems per se, and at that place in the code the WALRCV_STOPPED case
is more frequent than any other walreceiver state.

Having said that, let's also hear from other hackers.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Ranier Vilela
On Thu, Sep 15, 2022 at 05:35, Daniel Gustafsson wrote:

> > On 15 Sep 2022, at 01:19, Ranier Vilela  wrote:
>
> > LocalAlloc is deprecated.
> > So use HeapAlloc instead, since LocalAlloc is an overhead wrapper around
> > HeapAlloc.
> >
> > Attached a patch.
>
> Don't forget that patches which aim to reduce overhead are best when
> accompanied with benchmarks which show the effect of the reduction.
>
I'm trusting the API producer.


>
> -   pacl = (PACL) LocalAlloc(LPTR, dwNewAclSize);
> +   pacl = (PACL) HeapAlloc(hDefaultProcessHeap, 0, dwNewAclSize);
>
> These calls are not equal: the LocalAlloc call zeroes out the allocated
> memory, but HeapAlloc does not unless the HEAP_ZERO_MEMORY flag is
> passed.  I haven't read the code enough to know whether that matters,
> but it seems relevant to at least discuss.
>
Yeah, I missed that.
But it works fine and passes all tests.
If it is really OK, that is yet another improvement, avoiding useless
zero-filling.

CF entry created.
https://commitfest.postgresql.org/40/3893/

regards,
Ranier Vilela


Re: SUBTRANS: Minimizing calls to SubTransSetParent()

2022-09-15 Thread Japin Li


On Thu, 15 Sep 2022 at 18:04, Simon Riggs  wrote:
> On Wed, 14 Sept 2022 at 15:21, Alvaro Herrera  wrote:
>>
>> On 2022-Aug-30, Simon Riggs wrote:
>>
>> > 001_new_isolation_tests_for_subxids.v3.patch
>> > Adds new test cases to master without adding any new code, specifically
>> > addressing the two areas of code that are not tested by existing tests.
>> > This gives us a baseline from which we can do test driven development.
>> > I'm hoping this can be reviewed and committed fairly smoothly.
>>
>> I gave this a quick run to confirm the claimed increase of coverage.  It
>> checks out, so pushed.
>
> Thank you.
>
> So now we just have the main part of the patch, reattached here for
> the auto patch tester's benefit.

Hi Simon,

Thanks for the updated patch, here are some comments.

There is a typo, 
s/SubTransGetTopMostTransaction/SubTransGetTopmostTransaction/g.

+*call SubTransGetTopMostTransaction() if that xact 
overflowed;


Is there a punctuation mark missing on the following first line?

+* 2. When IsolationIsSerializable() we sometimes need to 
access topxid
+*This occurs only when SERIALIZABLE is requested by app 
user.


When we use a function name in comments, some places use parentheses
but others do not. Why? I think we should keep them consistent, at
least within the same commit.

+* 3. When TransactionIdSetTreeStatus will use a status of 
SUB_COMMITTED,
+*which then requires us to consult subtrans to find 
parent, which
+*is needed to avoid race condition. In this case we ask 
Clog/Xact
+*module if TransactionIdsAreOnSameXactPage(). Since we 
start a new
+*clog page every 32000 xids, this is usually <<1% of 
subxids.

Maybe we could declare a topxid variable to avoid calling
GetTopTransactionId() twice when we need to set the subtrans parent?

+   TransactionId subxid = 
XidFromFullTransactionId(s->fullTransactionId);
+   TransactionId topxid = GetTopTransactionId();
  ...
+   if (MyProc->subxidStatus.overflowed ||
+   IsolationIsSerializable() ||
+   !TransactionIdsAreOnSameXactPage(topxid, subxid))
+   {
  ...
+   SubTransSetParent(subxid, topxid);
+   }

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-09-15 Thread Amit Kapila
On Thu, Sep 15, 2022 at 10:45 AM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patch set.
>

Review of v29-0001*
==
1.
+parallel_apply_find_worker(TransactionId xid)
{
...
+ entry = hash_search(ParallelApplyWorkersHash, &xid, HASH_FIND, &found);
+ if (found)
+ {
+ /* If any workers (or the postmaster) have died, we have failed. */
+ if (entry->winfo->error_mq_handle == NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("lost connection to parallel apply worker")));
...
}

I think the above comment is incorrect because if the postmaster would
have died then you wouldn't have found the entry in the hash table.
How about something like: "We can't proceed if the parallel streaming
worker has already exited."

2.
+/*
+ * Find the previously assigned worker for the given transaction, if any.
+ */
+ParallelApplyWorkerInfo *
+parallel_apply_find_worker(TransactionId xid)

No need to use the word 'previously' in the above sentence.

3.
+ * We need one key to register the location of the header, and we need
+ * another key to track the location of the message queue.
+ */
+ shm_toc_initialize_estimator(&e);
+ shm_toc_estimate_chunk(&e, sizeof(ParallelApplyWorkerShared));
+ shm_toc_estimate_chunk(&e, queue_size);
+ shm_toc_estimate_chunk(&e, error_queue_size);
+
+ shm_toc_estimate_keys(&e, 3);

Overall, three keys are used, but the comment indicates two. You forgot
to mention the error queue.

4.
+ if (launched)
+ ParallelApplyWorkersList = lappend(ParallelApplyWorkersList, winfo);
+ else
+ {
+ shm_mq_detach(winfo->mq_handle);
+ shm_mq_detach(winfo->error_mq_handle);
+ dsm_detach(winfo->dsm_seg);
+ pfree(winfo);
+
+ winfo = NULL;
+ }

A. The code used in the else part to free the worker info is the same
as what is used in parallel_apply_free_worker. Can we move this to a
separate function, say parallel_apply_free_worker_info()?
B. I think it will be better if you use {} for the if branch, to make
it look consistent with the else branch.
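
A hypothetical sketch of such a helper, pieced together from the
snippet quoted above (the real patch may differ):

static void
parallel_apply_free_worker_info(ParallelApplyWorkerInfo *winfo)
{
	/* Release the shared memory and bookkeeping for one worker. */
	shm_mq_detach(winfo->mq_handle);
	shm_mq_detach(winfo->error_mq_handle);
	dsm_detach(winfo->dsm_seg);
	pfree(winfo);
}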

5.
+ * case define a named savepoint, so that we are able to commit/rollback it
+ * separately later.
+ */
+void
+parallel_apply_subxact_info_add(TransactionId current_xid)

I don't see the need for "commit" in the above message. So we can
slightly modify it to: "... so that we are able to roll back to it
separately later."

6.
+ for (i = list_length(subxactlist) - 1; i >= 0; i--)
+ {
+ xid = list_nth_xid(subxactlist, i);
...
...

+/*
+ * Return the TransactionId value contained in the n'th element of the
+ * specified list.
+ */
+static inline TransactionId
+list_nth_xid(const List *list, int n)
+{
+ Assert(IsA(list, XidList));
+ return lfirst_xid(list_nth_cell(list, n));
+}

I am not really sure that we need a new list function for this one
place. Can't we directly use lfirst_xid(list_nth_cell()) instead?
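
At the call site that would look like this (a sketch of the suggestion,
reusing the names from the quoted patch):

	for (i = list_length(subxactlist) - 1; i >= 0; i--)
	{
		xid = lfirst_xid(list_nth_cell(subxactlist, i));
		...
	}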

7.
+void
+parallel_apply_replorigin_setup(void)
+{
+ RepOriginId originid;
+ char originname[NAMEDATALEN];
+ bool started_tx = false;
+
+ /* This function might be called inside or outside of transaction. */
+ if (!IsTransactionState())
+ {
+ StartTransactionCommand();
+ started_tx = true;
+ }

Is there a place in the patch where this function will be called
without an active transaction state? If so, then this coding is fine;
but if not, then I suggest keeping an assert on the transaction state
here. The same applies to parallel_apply_replorigin_reset() as well.

8.
+ *
+ * If write_abort_lsn is true, send the abort_lsn and abort_time fields,
+ * otherwise don't.
  */
 void
 logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
-   TransactionId subxid)
+   TransactionId subxid, XLogRecPtr abort_lsn,
+   TimestampTz abort_time, bool abort_info)

In the comment, the name of the variable needs to be updated.

9.
+TransactionId stream_xid = InvalidTransactionId;

-static TransactionId stream_xid = InvalidTransactionId;
...
...
+void
+parallel_apply_subxact_info_add(TransactionId current_xid)
+{
+ if (current_xid != stream_xid &&
+ !list_member_xid(subxactlist, current_xid))

It seems you have changed the scope of stream_xid to use it in
parallel_apply_subxact_info_add(). Won't it be better to pass it as a
parameter (say top_xid)?

10.
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -20,6 +20,7 @@
 #include 

 #include "access/xlog.h"
+#include "catalog/pg_subscription.h"
 #include "catalog/pg_type.h"
 #include "common/connect.h"
 #include "funcapi.h"
@@ -443,9 +444,14 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
  appendStringInfo(&cmd, "proto_version '%u'",
  options->proto.logical.proto_version);

- if (options->proto.logical.streaming &&
- PQserverVersion(conn->streamConn) >= 14)
- appendStringInfoString(&cmd, ", streaming 'on'");
+ if (options->proto.logical.streaming != SUBSTREAM_OFF)
+ {
+ if (PQserverVersion(conn->streamConn) >= 16 &&
+ options->proto.logical.streaming == SUBSTREAM_PARALLEL)
+ appendStringInfoString(&cmd, 

Re: Improve description of XLOG_RUNNING_XACTS

2022-09-15 Thread Amit Kapila
On Thu, Sep 15, 2022 at 1:26 PM Masahiko Sawada  wrote:
>
> On Wed, Sep 14, 2022 at 6:33 PM Amit Kapila  wrote:
> >
> > On Fri, Sep 9, 2022 at 6:18 AM Masahiko Sawada  
> > wrote:
> > >
> > > Updated the patch accordingly.
> > >
> >
> > I have created two xacts each with savepoints and after your patch,
> > the record will show xacts/subxacts information as below:
> >
> > rmgr: Standby len (rec/tot): 74/74, tx:  0, lsn:
> > 0/014AC238, prev 0/014AC1F8, desc: RUNNING_XACTS nextXid 733
> > latestCompletedXid 726 oldestRunningXid 727; 2 xacts: 729 727; 4
> > subxacts: 730 731 728 732
> >
> > There is no way to associate which subxacts belong to which xact, so
> > will it be useful, and if so, how? I guess we probably don't need it
> > here, because the describe routines just display the record information.
>
> I think it's useful for debugging purposes. For instance, when I was
> working on the fix 68dcce247f1a13318613a0e27782b2ca21a4ceb7
> (REL_14_STABLE), I checked if all initial running transactions
> including subtransactions are properly stored and purged by checking
> the debug logs and pg_waldump output. Actually, until I realized that
> the description of RUNNING_XACTS doesn't show subtransaction
> information, I was confused by the fact that the saved initial running
> transactions didn't match the description shown by pg_waldump.
>

I see your point but I am still worried due to the concern raised by
Horiguchi-San earlier in this thread that the total number could be as
large as TOTAL_MAX_CACHED_SUBXIDS. I think if we want to include
information only on the number of subxacts then that is clearly an
improvement without any disadvantage.

Does anyone else have an opinion on this matter?

-- 
With Regards,
Amit Kapila.




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Alvaro Herrera
On 2022-Sep-14, Ranier Vilela wrote:

> According to:
> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-localalloc

> LocalAlloc is deprecated.
> So use HeapAlloc instead, since LocalAlloc is an overhead wrapper around
> HeapAlloc.

These functions you are patching are not in performance-sensitive code,
so I doubt this makes any difference performance wise.  I doubt
Microsoft will ever remove these deprecated functions, given its history
of backwards compatibility, so from that perspective this change does
not achieve anything either.

If you were proposing to change how palloc() allocates memory, that
would be quite different and perhaps useful, as long as a benchmark
accompanies the patch.
-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-15 Thread kuroda.hay...@fujitsu.com
Dear Önder,

Thank you for proposing this good feature. I'm also interested in the
patch, so I started to review it. The following are my initial comments.

===
For execRelation.c

01. RelationFindReplTupleByIndex()

```
/* Start an index scan. */
InitDirtySnapshot(snap);
-   scan = index_beginscan(rel, idxrel, &snap,
-  
IndexRelationGetNumberOfKeyAttributes(idxrel),
-  0);
 
/* Build scan key. */
-   build_replindex_scan_key(skey, rel, idxrel, searchslot);
+   scankey_attoff = build_replindex_scan_key(skey, rel, idxrel, 
searchslot);
 
+   scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
```

I think "/* Start an index scan. */" should be just above index_beginscan().

===
For worker.c

02. usable_indexoid_internal()

```
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
```

"InvalidOid usableIndexOid" should be "invalid usableIndexOid,"

03. check_relation_updatable()

```
 * We are in error mode so it's fine this is somewhat slow. It's better to
 * give user correct error.
 */
-   if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+   if (OidIsValid(rel->usableIndexOid))
{
```

Shouldn't we change the above comment too? The check is no longer slow.

===
For relation.c

04. GetCheapestReplicaIdentityFullPath()

```
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)
+{
+   PlannerInfo *root;
+   Query  *query;
+   PlannerGlobal *glob;
+   RangeTblEntry *rte;
+   RelOptInfo *rel;
+   int attno;
+   RangeTblRef *rt;
+   List *joinList;
+   Path *seqScanPath;
```

I think the part that constructs the dummy planner state can be moved
to another function, because that part is not meaningful here.
Especially lines 824-846 can.


===
For 032_subscribe_use_index.pl

05. general

```
+# insert some initial data within the range 0-1000
+$node_publisher->safe_psql('postgres',
+   "INSERT INTO test_replica_id_full SELECT i%20 FROM 
generate_series(0,1000)i;"
+);
```

It seems that the range of the initial data is actually [0, 19], since
i%20 maps 0..1000 onto 0..19.
The same candidate mistake appears in many places.

06. general

```
+# updates 1000 rows
+$node_publisher->safe_psql('postgres',
+   "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
```

Only 50 tuples are modified here, since x = 15 matches one in twenty rows.
The same candidate mistake appears in many places.

07. general

```
+# we check if the index is used or not
+$node_subscriber->poll_query_until(
+   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where 
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3 
updates 200 rows via index"; 
```
The query will be executed repeatedly until the index scan count is
reached, but the comment does not say so.
How about changing it to "we wait until the index is used on the
subscriber side" or something?
The same comment is found in many places.

08. test related with ANALYZE

```
+# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - 
PARTITIONED TABLE
+# 
```

"Testcase start:" should be "Testcase end:" here.

09. general

In some tests the results are confirmed, but in other tests they are
not. I think you should make sure the results are as expected in every
case, unless there is a particular reason not to.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: why can't a table be part of the same publication as its schema

2022-09-15 Thread houzj.f...@fujitsu.com
On Thursday, September 15, 2022 10:48 AM houzj.f...@fujitsu.com wrote:
> 
> On Thursday, September 15, 2022 3:37 AM Peter Eisentraut
>  wrote:
> 
> Hi,
> 
> >
> > On 14.09.22 07:10, houzj.f...@fujitsu.com wrote:
> > > After applying the patch, we support adding a table with column list
> > > along with the table's schema[1], and it will directly apply the
> > > column list in the logical replication after applying the patch.
> > >
> > > [1]--
> > > CREATE PUBLICATION pub FOR TABLE public.test(a), FOR ALL TABLES IN
> > > SCHEMA public;
> > > -
> > >
> > > If from the point of view of consistency, for column list, we could
> > > report an ERROR because we currently don't allow using different
> > > column lists for a table. Maybe an ERROR like:
> > >
> > > "ERROR: cannot use column for table x when the table's schema is
> > > also in the
> > publication"
> > >
> > > But if we want to report an ERROR for column list in above case. We
> > > might need to restrict the ALTER TABLE SET SCHEMA as well because
> > > user could move a table which is published with column list to a
> > > schema that is also published in the publication, so we might need
> > > to add some similar check(which is removed in Vignesh's patch) to
> > > tablecmd.c to
> > disallow this case.
> > >
> > > Another option could be to just ignore the column list if the table's
> > > schema is also part of the publication. But it seems slightly
> > > inconsistent with the rule that we disallow using different column
> > > lists for a table.
> >
> > Ignoring things doesn't seem like a good idea.
> >
> > A solution might be to disallow adding any schemas to a publication if
> > column lists on a table are specified.
> 
> Thanks for the suggestion. If I understand correctly, you mean we can
> disallow publishing a table with a column list and any schema (a schema
> that the table might not even be part of) in the same publication[1].
> 
> something like--
> [1]CREATE PUBLICATION pub FOR TABLE public.test(a), ALL TABLES IN SCHEMA
> s2;
> ERROR: "cannot add schema to publication when column list is used in the
> published table"
> --
> 
> Personally, it looks acceptable to me, as the user can achieve the same
> purpose anyway by creating several publications and combining them, and
> we can avoid the restriction at ALTER TABLE SET SCHEMA, although it
> does restrict some cases. I will post a top-up patch about this soon.
> 
> 
> About the row filter handling, maybe we don't need to restrict row
> filters like the above? The rule is to simply merge the row filters
> with 'OR' among publications, so it seems we could ignore the row
> filter in a publication when the table's schema is also published in
> the same publication (which means no filter).

Attached is the new version patch, which adds the suggested restriction
for column lists and merges Vignesh's patch.

Some other documentation might need to be updated. I will update it soon.

Best regards,
Hou zj



v3-0001-Allow-creation-of-publication-with-schema-and-tab.patch
Description:  v3-0001-Allow-creation-of-publication-with-schema-and-tab.patch


Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Aleksander Alekseev
Hi Ranier,

> use HeapAlloc instead, since LocalAlloc is an overhead wrapper around HeapAlloc.

Thanks for the patch.

Although MSDN doesn't explicitly say that LocalAlloc is _deprecated_,
+1 for replacing it. I agree with Alvaro that it is unlikely ever to be
removed, but this is a trivial change, so let's keep the code a bit
cleaner. Also I agree that no benchmarking is required for this patch
since the code is not performance-sensitive.

> These calls are not equal: the LocalAlloc call zeroes out the allocated
> memory

I took a look. The memory is initialized below by InitializeAcl() /
GetTokenInformation() [1][2] so it seems we are fine here.

The patch didn't have a proper commit message and had some issues with
the formatting. I fixed this. The new code checks the return value of
GetProcessHeap() which is unlikely to fail, so I figured unlikely() is
appropriate:

+   hDefaultProcessHeap = GetProcessHeap();
+   if (unlikely(hDefaultProcessHeap == NULL))

The corrected patch is attached.

[1]: 
https://docs.microsoft.com/en-us/windows/win32/api/securitybaseapi/nf-securitybaseapi-initializeacl
[2]: 
https://docs.microsoft.com/en-us/windows/win32/api/securitybaseapi/nf-securitybaseapi-gettokeninformation

-- 
Best regards,
Aleksander Alekseev


v2-0001-Replace-LocalAlloc-LocalFree-with-HeapAlloc-HeapF.patch
Description: Binary data


Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Ranier Vilela
On Thu, Sep 15, 2022 at 09:58, Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Ranier,
>
> > use HeapAlloc instead, since LocalAlloc is an overhead wrapper around
> > HeapAlloc.
>
> Thanks for the patch.
>
> Although MSDN doesn't explicitly say that LocalAlloc is _deprecated_,
> +1 for replacing it.

MSDN says: "New applications should use the heap functions".
IMO Postgres 16 fits here.

> I agree with Alvaro that it is unlikely ever to be removed, but this is
> a trivial change, so let's keep the code a bit cleaner. Also I agree
> that no benchmarking is required for this patch since the code is not
> performance-sensitive.
>
At no point does the patch claim a performance gain.
That is why the open CF entry is filed under refactoring.


> > These calls are not equal, the LocalAlloc calls zeroes out the allocated
> memory
>
> I took a look. The memory is initialized below by InitializeAcl() /
> GetTokenInformation() [1][2] so it seems we are fine here.
>
> The patch didn't have a proper commit message and had some issues with
> the formating. I fixed this. The new code checks the return value of
> GetProcessHeap() which is unlikely to fail, so I figured unlikely() is
> appropriate:
>
> +   hDefaultProcessHeap = GetProcessHeap();
> +   if (unlikely(hDefaultProcessHeap == NULL))
>
> The corrected patch is attached.
>
Thanks for the review and fixes.

regards,
Ranier Vilela


Re: pgsql: Doc: Explain about Column List feature.

2022-09-15 Thread Alvaro Herrera
On 2022-Sep-14, Peter Smith wrote:

> On Tue, Sep 13, 2022 at 10:11 PM Alvaro Herrera  
> wrote:

> > On 2022-Sep-07, Amit Kapila wrote:

> > One more thing. There's a sect2 about combining column list.  Part of it
> > seems pretty judgmental and I see no reason to have it in there; I
> > propose to remove it (it's not in this patch).  I think we should just
> > say it doesn't work at present, here's how to work around it, and
> > perhaps even say that we may lift the restriction in the future.  The
> > paragraph that starts with "Background:" is IMO out of place, and it
> > repeats the mistake that column lists are for security.
> 
> It wasn't clear which part you felt was judgemental. I have removed
> the "Background" paragraph but I have otherwise left that section and
> Warning as-is because it still seemed useful for the user to know. You
> can change/remove it if you disagree.

I meant the Background part that you removed, yeah.

Looking at the rendered docs again, I notice that section "31.4.5.
Combining Multiple Column Lists" is *only* the red-tinted Warning block.
That seems quite odd.  I am tempted to remove the sect2 heading for that
one too.  I am now wondering how to split things between the normative bits
  "It is not supported to have a subscription comprising several
  publications where the same table has been published with different
  column lists."

and the informative bits
  "This means changing the column lists of the tables being subscribed
  could cause inconsistency of column lists among publications, in which
  case the ALTER PUBLICATION will be successful but later the walsender
  on the publisher, or the subscriber may throw an error. In this
  scenario, the user needs to recreate the subscription after adjusting
  the column list or drop the problematic publication using ALTER
  SUBSCRIPTION ... DROP PUBLICATION and then add it back after adjusting
  the column list."

> > Lastly: In the create-publication reference page I think it would be
> > better to reword the Parameters section a bit.  It mentions
> > FOR TABLE as a parameter, but the parameter is actually
> > table_name; and the row-filter and
> > column-list explanations are also put in there when they should be in
> > their own expression and column_name
> > varlistentries.  I think splitting things that way would result in a
> > clearer explanation.
> 
> IMO this should be proposed as a separate patch. Some of those things
> (e.g. FOR TABLE as a parameter) seem to have been written that way
> since PG10.

Agreed.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Ranier Vilela
On Thu, Sep 15, 2022 at 09:50, Alvaro Herrera <
alvhe...@alvh.no-ip.org> wrote:

> On 2022-Sep-14, Ranier Vilela wrote:
>
> > According to:
> >
> https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-localalloc
>
> > LocalAlloc is deprecated.
> > So use HeapAlloc instead, since LocalAlloc is an overhead wrapper around
> > HeapAlloc.
>
> These functions you are patching are not in performance-sensitive code,
> so I doubt this makes any difference performance wise.  I doubt
> Microsoft will ever remove these deprecated functions, given its history
> of backwards compatibility, so from that perspective this change does
> not achieve anything either.
>
If users don't adapt to the new API, the old one will never really expire.


> If you were proposing to change how palloc() allocates memory, that
> would be quite different and perhaps useful, as long as a benchmark
> accompanies the patch.
>
This is irrelevant to the discussion.
Neither the patch nor the thread deals with palloc.

regards,
Ranier Vilela


Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Alvaro Herrera
On 2022-Sep-15, Aleksander Alekseev wrote:

> I agree with Alvaro that it is unlikely to be ever removed, but this
> is a trivial change, so let's keep the code a bit cleaner.

In what way is this code cleaner?  I argue it is the opposite.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca confiaré en un traidor.  Ni siquiera si el traidor lo he creado yo"
(Barón Vladimir Harkonnen)




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Alvaro Herrera
On 2022-Sep-15, Ranier Vilela wrote:

> Em qui., 15 de set. de 2022 às 09:50, Alvaro Herrera <
> alvhe...@alvh.no-ip.org> escreveu:

> > These functions you are patching are not in performance-sensitive code,
> > so I doubt this makes any difference performance wise.  I doubt
> > Microsoft will ever remove these deprecated functions, given its history
> > of backwards compatibility, so from that perspective this change does
> > not achieve anything either.
>
> If users don't adapt to the new API, the old one will never really expire.

If you are claiming that Microsoft will remove the old API because
Postgres stopped using it, ... sorry, no.

> Neither the patch nor the thread deals with palloc.

I know.  Which is why I think the patch is useless.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Aleksander Alekseev
Hi Alvaro,

> In what way is this code cleaner?  I argue it is the opposite.

Well, I guess it depends on the perspective. There are a few more
lines of code, for sure. So if "less code is better" is the criterion,
then no, the new code is not cleaner. If the criterion is to use an API
according to the spec provided by the vendor, then to me personally the
new code looks cleaner.

This being said I don't have a strong opinion on whether this patch
should be accepted or not. I completely agree that MS will actually
keep LocalAlloc() indefinitely for backward compatibility with
existing applications, as the company always did.

-- 
Best regards,
Aleksander Alekseev




Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Ranier Vilela
Em qui., 15 de set. de 2022 às 10:13, Alvaro Herrera <
alvhe...@alvh.no-ip.org> escreveu:

> On 2022-Sep-15, Ranier Vilela wrote:
>
> > Em qui., 15 de set. de 2022 às 09:50, Alvaro Herrera <
> > alvhe...@alvh.no-ip.org> escreveu:
>
> > > These functions you are patching are not in performance-sensitive code,
> > > so I doubt this makes any difference performance wise.  I doubt
> > > Microsoft will ever remove these deprecated functions, given its
> history
> > > of backwards compatibility, so from that perspective this change does
> > > not achieve anything either.
> >
> > If users don't adapt to the new API, the old one will never really
> expire.
>
> If you are claiming that Microsoft will remove the old API because
> Postgres stopped using it, ... sorry, no.
>
Certainly not.
But Postgres should set an example.
By removing its uses of the old API, Postgres encourages others to do the same.
So who knows, one day, the OS might finally get rid of this useless burden.


> > Neither the patch nor the thread deals with palloc.
>
> I know.  Which is why I think the patch is useless.
>
Other hackers think differently, thankfully.

regards,
Ranier Vilela


Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Daniel Gustafsson
> On 15 Sep 2022, at 15:13, Alvaro Herrera  wrote:
> 
> On 2022-Sep-15, Ranier Vilela wrote:
> 
>> Em qui., 15 de set. de 2022 às 09:50, Alvaro Herrera <
>> alvhe...@alvh.no-ip.org> escreveu:
> 
>>> These functions you are patching are not in performance-sensitive code,
>>> so I doubt this makes any difference performance wise.  I doubt
>>> Microsoft will ever remove these deprecated functions, given its history
>>> of backwards compatibility, so from that perspective this change does
>>> not achieve anything either.
>> 
>> If users don't adapt to the new API, the old one will never really expire.
> 
> If you are claiming that Microsoft will remove the old API because
> Postgres stopped using it, ... sorry, no.

Also, worth noting is that these functions aren't actually deprecated.  The
note in the docs state:

The local functions have greater overhead and provide fewer features
than other memory management functions.  New applications should use
the heap functions unless documentation states that a local function
should be used.

And following the bouncing ball into the documentation they refer to [0] I read
this:

As a result, the global and local families of functions are equivalent
and choosing between them is a matter of personal preference.

--
Daniel Gustafsson   https://vmware.com/

[0]: 
https://docs.microsoft.com/en-us/windows/win32/memory/global-and-local-functions



Re: Avoid use deprecated Windows Memory API

2022-09-15 Thread Ranier Vilela
Em qui., 15 de set. de 2022 às 10:30, Daniel Gustafsson 
escreveu:

> > On 15 Sep 2022, at 15:13, Alvaro Herrera 
> wrote:
> >
> > On 2022-Sep-15, Ranier Vilela wrote:
> >
> >> Em qui., 15 de set. de 2022 às 09:50, Alvaro Herrera <
> >> alvhe...@alvh.no-ip.org> escreveu:
> >
> >>> These functions you are patching are not in performance-sensitive code,
> >>> so I doubt this makes any difference performance wise.  I doubt
> >>> Microsoft will ever remove these deprecated functions, given its
> history
> >>> of backwards compatibility, so from that perspective this change does
> >>> not achieve anything either.
> >>
> >> If users don't adapt to the new API, the old one will never really
> expire.
> >
> > If you are claiming that Microsoft will remove the old API because
> > Postgres stopped using it, ... sorry, no.
>
> Also, worth noting is that these functions aren't actually deprecated.  The
> note in the docs state:
>
Daniel, I agree that MSDN could be better written.
See:
"Remarks

Windows memory management does not provide a separate local heap and global
heap. Therefore, the *LocalAlloc* and GlobalAlloc

functions are essentially the same.

The movable-memory flags *LHND*, *LMEM_MOVABLE*, and *NONZEROLHND* add
unnecessary overhead and require locking to be used safely. They should be
avoided unless documentation specifically states that they should be used.

New applications should use the heap functions
"

Again, MSDN advises using the newer API.

> And following the bouncing ball into the documentation they refer to [0]
> I read this:
>
> As a result, the global and local families of functions are equivalent
> and choosing between them is a matter of personal preference.
>
IMO, this part has nothing to do with it.

regards,
Ranier Vilela


Re: Cleaning up historical portability baggage

2022-09-15 Thread Tom Lane
John Naylor  writes:
> On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed  wrote:
>> The patch does not apply successfully; please rebase the patch.

> There's a good reason for that -- the latest one was committed two
> weeks ago. The status should still be waiting on author, though,
> namely for:

> On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro  wrote:
>> Remaining things from this thread:
>> * removing --disable-thread-safety
>> * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
>> * making Unix sockets secure for Windows in tests

I imagine we should just close the current CF entry as committed.
There's no patch in existence for any of those TODO items, and
I didn't think one was imminent.

regards, tom lane




Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-09-15 Thread Noah Misch
On Tue, Sep 13, 2022 at 11:56:16AM +0530, Bharath Rupireddy wrote:
> On Tue, Sep 13, 2022 at 8:52 AM Noah Misch  wrote:
> > > > > [1] - 
> > > > > https://www.postgresql.org/message-id/flat/20220909.172949.2223165886970819060.horikyota.ntt%40gmail.com
> >
> > I plan to use that message's patch, because it guarantees WALRCV_STOPPED at
> > the code location being changed.  Today, in the unlikely event of
> > !WalRcvStreaming() due to WALRCV_WAITING or WALRCV_STOPPING, that code
> > proceeds without waiting for WALRCV_STOPPED.

Pushed that way.

> Hm. That was the original fix [2] proposed and it works. The concern
> is that XLogShutdownWalRcv() does a bunch of work via ShutdownWalRcv()
> - it calls ConditionVariablePrepareToSleep() and
> ConditionVariableCancelSleep() (2 lock acquisitions and releases) plus
> a WalRcvRunning() call, even for the WALRCV_STOPPED case. All this is
> unnecessary IMO when we determine the walreceiver's state is already
> WALRCV_STOPPED.

That's fine.  If we're reaching this code at high frequency, that implies
we're also forking walreceiver processes at high frequency.  This code would
be a trivial part of the overall cost.

> > If WALRCV_WAITING or WALRCV_STOPPING can happen at that patch's code site, I
> > perhaps should back-patch the change to released versions.  Does anyone know
> > whether one or both can happen?

If anyone discovers such cases later, we can extend the back-patch then.

> IMO, we must back-patch to the version where
> cc2c7d65fc27e877c9f407587b0b92d46cd6dd16  got introduced irrespective
> of any of the above happening.

Correct.  The sentences were about *released* versions, not v15.




Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-15 Thread Peter Eisentraut

On 13.09.22 17:16, Tom Lane wrote:

Peter Eisentraut  writes:

2) You configure and install with prefix=/usr/local/pgsql-14, and then
symlink /usr/local/pgsql -> /usr/local/pgsql-14; hoping that you can
then use /usr/local/pgsql as if that's where it actually is.  We don't
currently support that.  (Note that it would work if you made a copy of
the tree instead of using the symlink.)


What about it does not work?


The problem is if another package or extension uses pg_config to find, 
say, libdir, includedir, or bindir and integrates it into its own build 
system or its own build products.  If those directories point to 
/usr/local/pgsql/{bin,include,lib}, then there is no problem.  But if 
they point to /usr/local/pgsql-14.5/{bin,include,lib}, then the next 
minor update will break those other packages.
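
As an illustration of the failure mode (hypothetical paths and output):

    $ ln -s /usr/local/pgsql-14.5 /usr/local/pgsql
    $ /usr/local/pgsql/bin/pg_config --libdir
    /usr/local/pgsql-14.5/lib

If another package bakes that resolved, versioned path into its build
products, installing 14.6 under a new versioned prefix breaks it even
though the /usr/local/pgsql symlink was updated.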






Re: Can we avoid chdir'ing in resolve_symlinks() ?

2022-09-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 13.09.22 17:16, Tom Lane wrote:
>> What about it does not work?

> The problem is if another package or extension uses pg_config to find, 
> say, libdir, includedir, or bindir and integrates it into its own build 
> system or its own build products.  If those directories point to 
> /usr/local/pgsql/{bin,include,lib}, then there is no problem.  But if 
> they point to /usr/local/pgsql-14.5/{bin,include,lib}, then the next 
> minor update will break those other packages.

That seems ... a tad far-fetched, and even more to the point,
it'd be the other package's fault not ours.  We have never promised
that those directories point to anyplace that's not PG-specific.
I certainly do not buy that that's a good argument for breaking
Postgres installation setups that work today.

Also, there is nothing in that scenario that is in any way dependent
on the use of symlinks, or even absolute paths, so I don't quite
see the relevance to the current discussion.

regards, tom lane




Typo in xact.c

2022-09-15 Thread Japin Li

Hi hacker,

Recently, I found what might be a typo in the xact.c comments.  The comments
say "PG_PROC", but they actually mean the "PGPROC" structure.  Since we
have the pg_proc catalog and use PG_PROC to reference that catalog [1],
we should use PGPROC to reference the structure.  Any thoughts?

[1] src/include/nodes/primnodes.h

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 734c39a4d0..72af656060 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -198,7 +198,7 @@ parent.  This maintains the invariant that child transactions have XIDs later
 than their parents, which is assumed in a number of places.
 
 The subsidiary actions of obtaining a lock on the XID and entering it into
-pg_subtrans and PG_PROC are done at the time it is assigned.
+pg_subtrans and PGPROC are done at the time it is assigned.
 
 A transaction that has no XID still needs to be identified for various
 purposes, notably holding locks.  For this purpose we assign a "virtual
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..7abc6a0705 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -680,12 +680,12 @@ AssignTransactionId(TransactionState s)
 		log_unknown_top = true;
 
 	/*
-	 * Generate a new FullTransactionId and record its xid in PG_PROC and
+	 * Generate a new FullTransactionId and record its xid in PGPROC and
 	 * pg_subtrans.
 	 *
 	 * NB: we must make the subtrans entry BEFORE the Xid appears anywhere in
-	 * shared storage other than PG_PROC; because if there's no room for it in
-	 * PG_PROC, the subtrans entry is needed to ensure that other backends see
+	 * shared storage other than PGPROC; because if there's no room for it in
+	 * PGPROC, the subtrans entry is needed to ensure that other backends see
 	 * the Xid as "running".  See GetNewTransactionId.
 	 */
 	s->fullTransactionId = GetNewTransactionId(isSubXact);


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-09-15 Thread Reid Thompson
On Thu, 2022-09-15 at 12:07 +0400, Ibrar Ahmed wrote:
> 
> The patch does not apply; please rebase the patch.
> 
> patching file src/backend/utils/misc/guc.c
> Hunk #1 FAILED at 3664.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/backend/utils/misc/guc.c.rej 
> 
> patching file src/backend/utils/misc/postgresql.conf.sample
> 

rebased patches attached.

Thanks,
Reid

From 0e7010c53508d5a396edd16fd9166abe431f5dbe Mon Sep 17 00:00:00 2001
From: Reid Thompson 
Date: Thu, 11 Aug 2022 12:01:25 -0400
Subject: [PATCH 1/2] Add tracking of backend memory allocated to
 pg_stat_activity

This new field displays the current bytes of memory allocated to the
backend process. It is updated as memory for the process is
malloc'd/free'd. Memory allocated to items on the freelist is included in
the displayed value.  Dynamic shared memory allocations are included
only in the value displayed for the backend that created them, they are
not included in the value for backends that are attached to them to
avoid double counting. On occasion, orphaned memory segments may be
cleaned up on postmaster startup. This may result in decreasing the sum
without a prior increment. We limit the floor of backend_mem_allocated
to zero. Updated pg_stat_activity documentation for the new column.
---
 doc/src/sgml/monitoring.sgml|  12 +++
 src/backend/catalog/system_views.sql|   1 +
 src/backend/storage/ipc/dsm_impl.c  |  80 +++
 src/backend/utils/activity/backend_status.c | 105 
 src/backend/utils/adt/pgstatfuncs.c |   4 +-
 src/backend/utils/mmgr/aset.c   |  18 
 src/backend/utils/mmgr/generation.c |  15 +++
 src/backend/utils/mmgr/slab.c   |  21 
 src/include/catalog/pg_proc.dat |   6 +-
 src/include/utils/backend_status.h  |   7 +-
 src/test/regress/expected/rules.out |   9 +-
 11 files changed, 269 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..40ae638f25 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -947,6 +947,18 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
 
 
+
+ 
+  backend_mem_allocated bigint
+ 
+ 
+  The byte count of memory allocated to this backend. Dynamic shared memory
+  allocations are included only in the value displayed for the backend that
+  created them, they are not included in the value for backends that are
+  attached to them to avoid double counting.
+ 
+
+
  
   
query text
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..a78750ab12 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -862,6 +862,7 @@ CREATE VIEW pg_stat_activity AS
 S.backend_xid,
 s.backend_xmin,
 S.query_id,
+S.backend_mem_allocated,
 S.query,
 S.backend_type
 FROM pg_stat_get_activity(NULL) AS S
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index e1b90c5de4..3356bb65b5 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -66,6 +66,7 @@
 #include "postmaster/postmaster.h"
 #include "storage/dsm_impl.h"
 #include "storage/fd.h"
+#include "utils/backend_status.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 
@@ -232,6 +233,13 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 			name)));
 			return false;
 		}
+		/*
+		 * Detach and destroy pass through here, only decrease the memory
+		 * shown allocated in pg_stat_activity when the creator destroys the
+		 * allocation.
+		 */
+		if (op == DSM_OP_DESTROY)
+			pgstat_report_backend_mem_allocated_decrease(*mapped_size);
 		*mapped_address = NULL;
 		*mapped_size = 0;
 		if (op == DSM_OP_DESTROY && shm_unlink(name) != 0)
@@ -332,6 +340,36 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		name)));
 		return false;
 	}
+
+	/*
+	 * Attach and create pass through here, only update backend memory
+	 * allocated in pg_stat_activity for the creator process.
+	 */
+	if (op == DSM_OP_CREATE)
+	{
+		/*
+		 * Posix creation calls dsm_impl_posix_resize implying that resizing
+		 * occurs or may be added in the future. As implemented
+		 * dsm_impl_posix_resize utilizes fallocate or truncate, passing the
+		 * whole new size as input, growing the allocation as needed (only
+		 * truncate supports shrinking). We update by replacing the old
+		 * allocation with the new.
+		 */
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+		/*
+		 * posix_fallocate does not shrink allocations, adjust only on
+		 * allocation increase.
+		 */
+		if (request_size > *mapped_size)
+		{
+			pgstat_report_backend_mem_allocated_decrease(*mapped_size);
+			pgstat

Re: ICU for global collation

2022-09-15 Thread Marina Polyakova

On 2022-09-15 09:52, Kyotaro Horiguchi wrote:

If I executed initdb as follows, I would be told to specify
--icu-locale option.


$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified


However, when I reran the command, it complains about incompatible
encoding this time.  I think it's more user-friendly to check for the
encoding compatibility before the check for missing --icu-locale
option.

regards.


I agree with you. Here's another version of the patch. The 
locale/encoding checks and reports in initdb have been reordered, 
because now the encoding is set first and only then the ICU locale is 
checked.


P.S. While working on the patch, I discovered that the UTF8 encoding is 
always used for the ICU provider in initdb unless the user explicitly 
specifies another one:


if (!encoding && locale_provider == COLLPROVIDER_ICU)
encodingid = PG_UTF8;

IMO this creates additional errors for locales with other encodings:

$ initdb --locale de_DE.iso885915@euro --locale-provider icu 
--icu-locale de-DE

...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that 
the selected locale uses (LATIN9) do not match. This would lead to 
misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding 
explicitly, or choose a matching combination.


And ICU supports many encodings, see the contents of pg_enc2icu_tbl in 
encnames.c...


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6ff48bb18f3639ae45d9528b32df51a4aebc60c0..f248ad42b77c8c0cf2089963d4357b120914ce20 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1034,6 +1034,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
+		if (!(is_encoding_supported_by_icu(encoding)))
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+	 errmsg("encoding \"%s\" is not supported with ICU provider",
+			pg_encoding_to_char(encoding))));
+
 		/*
 		 * This would happen if template0 uses the libc provider but the new
 		 * database uses icu.
@@ -1042,10 +1048,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 	 errmsg("ICU locale must be specified")));
-	}
 
-	if (dblocprovider == COLLPROVIDER_ICU)
 		check_icu_locale(dbiculocale);
+	}
 
 	/*
 	 * Check that the new encoding and locale settings match the source
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 6aeec8d426c52414b827686781c245291f27ed1f..999e7c2bf8dc707063eddf19a5c27eed03d3ca09 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2092,20 +2092,30 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, &canonname);
 	lc_messages = canonname;
 #endif
+}
 
-	if (locale_provider == COLLPROVIDER_ICU)
+static void
+check_icu_locale_encoding(void)
+{
+	if (!(is_encoding_supported_by_icu(encodingid)))
 	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
+		pg_log_error("encoding \"%s\" is not supported with ICU provider",
+	 pg_encoding_to_char(encodingid));
+		pg_log_error_hint("Rerun %s and choose a matching combination.",
+		  progname);
+		exit(1);
+	}
 
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
 #ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
+	pg_fatal("ICU is not supported in this build");
 #endif
-	}
 }
 
 /*
@@ -2281,34 +2291,6 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
-		strcmp(lc_ctype, lc_collate) == 0 &&
-		strcmp(lc_ctype, lc_time) == 0 &&
-		strcmp(lc_ctype, lc_numeric) == 0 &&
-		strcmp(lc_ctype, lc_monetary) == 0 &&
-		strcmp(lc_ctype, lc_messages) == 0 &&
-		(!icu_locale || strcmp(lc_ctype, icu_locale) == 0))
-		printf(_("The database cluster will be initialized with locale \"%s\".\n"), lc_ctype);
-	else
-	{
-		printf(_("The database cluster will be initialized with this locale configuration:\n"));
-		printf(_("  provider:%s\n"), collprovider_name(locale_provider));
-		if (icu_locale)
-			printf(_("  ICU locale:  %s\n"), icu_locale);
-		printf(_("  LC_COLLATE:  %s\n"
- "  LC_CTYPE:%s\n"
- "  LC_MESSAGES: %s\n"
- "  LC_MONETARY: %s\n"
- "  LC_NUMERIC:  %s\n"
- "  LC_TIME: %s\n"),
-			   lc_collate,
-			   lc_ctype,
-			   lc_messages,
-			   lc_monetary,
-			   lc_numeric,
-			   lc_time);
-	}
-
 	if (!encoding && locale_provider == COLLPROVIDER_ICU)
 		encod

Re: pgsql: Doc: Explain about Column List feature.

2022-09-15 Thread Alvaro Herrera
On 2022-Sep-15, Alvaro Herrera wrote:

> Looking at the rendered docs again, I notice that section "31.4.5.
> Combining Multiple Column Lists" is *only* the red-tinted Warning block.
> That seems quite odd.  I am tempted to remove the sect2 heading for that
> one too.

Pushed.  I didn't modify this part; I spent too much time looking at it
trying to figure out how to do it.  I think this bit really belongs in
the CREATE/ALTER docs rather than this chapter.  But in order to support
having a separate  for the restriction on combination, we need a
separate  for the column_name parameter. So I'm going to
edit that one and I'll throw this change in.

Thanks, Peter, for the discussion.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)




Re: START_REPLICATION SLOT causing a crash in an assert build

2022-09-15 Thread Jaime Casanova
On Thu, Sep 15, 2022 at 05:30:11PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 15 Sep 2022 01:26:15 -0500, Jaime Casanova 
>  wrote in 
> > On Tue, Sep 13, 2022 at 10:07:50PM -0500, Jaime Casanova wrote:
> > > On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote:
> > > > 
> > > > Another measure would be to add the region to wipe-out on reset to
> > > > PgStat_KindInfo, but it seems too much.. (attached)
> > > > 
> > > 
> > > This patch solves the problem, i didn't like the other solution because
> > > as you say it partly nullify the protection of the assertion.
> > > 
> > 
> > I talked too fast, while it solves the immediate problem the patch as is
> > causes other crashes.
> 
> Where did the crash happen?  Is it a bug introduced by it? Or does it
> root to other cause?
> 

Just compile and run the installcheck tests.

It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside
pgstat_release_entry_ref() because it expects a "deadbeef"; it seems to
be a magic value, but I cannot find what its use is.

Attached a backtrace.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
set = {__val = {4194304, 140727848465760, 2, 6, 6808139, 
94494822015184, 
4611686018427388799, 139771048516262, 0, 281470681751456, 0, 0, 0, 
0, 0, 0}}
pid = 
tid = 
ret = 
#1  0x7f1efb762535 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, 
sa_mask = {__val = {
  0, 0, 0, 0, 0, 139771046014965, 2, 3559591060477507152, 
7018350264137834804, 
  94494822015184, 7003716880224747600, 0, 9726297134600432896, 
140727848466000, 0, 
  140727848466864}}, sa_flags = 1246535888, sa_restorer = 0x0}
sigs = {__val = {32, 0 }}
#2  0x55f14ab8e280 in ExceptionalCondition (
conditionName=0x55f14ad70420 "entry_ref->shared_stats->magic == 
0xdeadbeef", 
errorType=0x55f14ad70073 "FailedAssertion", fileName=0x55f14ad702da 
"pgstat_shmem.c", 
lineNumber=530) at assert.c:69
No locals.
#3  0x55f14aa15a1e in pgstat_release_entry_ref (key=..., 
entry_ref=0x55f14bf71098, 
discard_pending=false) at pgstat_shmem.c:530
__func__ = "pgstat_release_entry_ref"
#4  0x55f14aa15f68 in pgstat_release_matching_entry_refs 
(discard_pending=false, match=0x0, 
match_data=0) at pgstat_shmem.c:699
i = {cur = 255, end = 1, done = false}
ent = 0x55f14bf450f0
#5  0x55f14aa15fc0 in pgstat_release_all_entry_refs (discard_pending=false)
at pgstat_shmem.c:715
No locals.
#6  0x55f14aa15203 in pgstat_detach_shmem () at pgstat_shmem.c:238
No locals.
#7  0x55f14aa0de82 in pgstat_shutdown_hook (code=0, arg=0) at pgstat.c:520
No locals.
#8  0x55f14a9b2b27 in shmem_exit (code=0) at ipc.c:239
__func__ = "shmem_exit"
#9  0x55f14a9b29df in proc_exit_prepare (code=0) at ipc.c:194
__func__ = "proc_exit_prepare"
#10 0x55f14a9b2930 in proc_exit (code=0) at ipc.c:107
__func__ = "proc_exit"
#11 0x55f14a9ed1a0 in PostgresMain (dbname=0x55f14bed0e90 "regression", 
username=0x55f14bed0e68 "jcasanov") at postgres.c:4795
firstchar = 88
input_message = {data = 0x55f14bea4900 "", len = 0, maxlen = 1024, 
cursor = 0}
local_sigjmp_buf = {{__jmpbuf = {0, -2769250407385151644, 
94494822015184, 
  140727848468560, 0, 0, -2769250407297071260, 
-8248146431642187932}, 
__mask_was_saved = 1, __saved_mask = {__val = {4194304, 
8534995794563506275, 15679, 
15680, 979, 18446744073709551536, 0, 0, 139771044971635, 3904, 
0, 
140727848467536, 94494822015184, 140727848468560, 
94494829323263, 15616
send_ready_for_query = false
idle_in_transaction_timeout_enabled = false
idle_session_timeout_enabled = false
__func__ = "PostgresMain"
#12 0x55f14a924a1d in BackendRun (port=0x55f14becad20) at postmaster.c:4504
No locals.
#13 0x55f14a924369 in BackendStartup (port=0x55f14becad20) at 
postmaster.c:4232
bn = 0x55f14bec7e80
pid = 0
__func__ = "BackendStartup"
#14 0x55f14a9207ae in ServerLoop () at postmaster.c:1806
port = 0x55f14becad20
i = 2
rmask = {fds_bits = {128, 0 }}
selres = 1
now = 1663253503
readmask = {fds_bits = {224, 0 }}
nSockets = 8
last_lockfile_recheck_time = 1663253479
last_touch_time = 1663253239
__func__ = "ServerLoop"
#15 0x55f14a91fffd in PostmasterMain (argc=5, argv=0x55f14be9dec0) at 
postmaster.c:1478
opt = -1
status = 0
userDoption = 0x55f14bec1900 "data1"
listen_addr_saved = true
i = 64
output_config_variable = 0x0
__func__ = "PostmasterMain"
#16 0x55f14a81f3fc in main (argc=5, argv=0x55f14be9de

Re: New strategies for freezing, advancing relfrozenxid early

2022-09-15 Thread Peter Geoghegan
On Thu, Sep 15, 2022 at 12:09 AM John Naylor
 wrote:
> On Wed, Sep 14, 2022 at 11:33 PM Peter Geoghegan  wrote:
> > The final number of TIDs doesn't seem like the most interesting
> > information that VM snapshots could provide us when it comes to
> > building the dead_items TID data structure -- the *distribution* of
> > TIDs across heap pages seems much more interesting. The "shape" can be
> > known ahead of time, at least to some degree. It can help with
> > compression, which will reduce cache misses.
>
> My point here was simply that spilling to disk is an admission of
> failure to utilize memory efficiently and thus shouldn't be a selling
> point of VM snapshots. Other selling points could still be valid.

I was trying to explain the goals of this work in a way that was as
accessible as possible. It's not easy to get the high level ideas
across, let alone all of the details.

It's true that I have largely ignored the question of how VM snapshots
will need to spill up until now. There are several reasons for this,
most of which you could probably guess. FWIW it wouldn't be at all
difficult to add *some* reasonable spilling behavior very soon; the
underlying access patterns are highly sequential and predictable, in
the obvious way.

-- 
Peter Geoghegan




Re: failing to build preproc.c on solaris with sun studio

2022-09-15 Thread Andres Freund
Hi,

On 2022-09-14 01:02:39 -0400, Tom Lane wrote:
> John Naylor  writes:
> > If we're going to go to this length, it seems more straightforward to
> > just check the .c/.h files into version control, like every other
> > project that I have such knowledge of.
> 
> Strong -1 on that, because then we'd have to mandate that every
> committer use exactly the same version of bison.

Yea, I don't think we want to go there either.

Greetings,

Andres Freund




Re: Cleaning up historical portability baggage

2022-09-15 Thread Thomas Munro
On Fri, Sep 16, 2022 at 1:55 AM Tom Lane  wrote:
> John Naylor  writes:
> > On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed  wrote:
> >> The patch does not apply successfully; please rebase the patch.
>
> > There's a good reason for that -- the latest one was committed two
> > weeks ago. The status should still be waiting on author, though,
> > namely for:
>
> > On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro  wrote:
> >> Remaining things from this thread:
> >> * removing --disable-thread-safety
> >> * removing those vestigial HAVE_XXX macros (one by one analysis and 
> >> patches)
> >> * making Unix sockets secure for Windows in tests
>
> I imagine we should just close the current CF entry as committed.
> There's no patch in existence for any of those TODO items, and
> I didn't think one was imminent.

I have patches for these, but not quite ready to post.  I'll mark this
entry closed, and make a new one or two when ready, instead of this
one-gigantic-CF-entry-that-goes-on-forever format.




Re: Summary function for pg_buffercache

2022-09-15 Thread Andres Freund
Hi,

On 2022-09-09 17:36:45 +0300, Aleksander Alekseev wrote:
> I suggest we focus on saving the memory first and then think about the
> performance, if necessary.

Personally I think the locks part is at least as important - it's what makes
the production impact higher.

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - v13

2022-09-15 Thread Thomas Munro
On Thu, Sep 15, 2022 at 2:26 PM Andres Freund  wrote:
> - noticed that libpgport.a had and needed a dependency on errcodes.h - that
>   seemed wrong. The dependency is due to src/port/*p{read,write}v?.c including
>   postgres.h - which seems wrong. So I added a patch changing them to include
>   c.h.

Oops.  +1

GCC 12 produces a bunch of warnings by default with meson, and that
turned out to be because the default optimisation level is -O3.
That's a change from the make build, which uses -O2.  Should we set a
default of 2, or is there some meson-way-of-doing-things reason why
not?




Re: [RFC] building postgres with meson - v13

2022-09-15 Thread Andres Freund
Hi,

On 2022-09-16 09:14:20 +1200, Thomas Munro wrote:
> GCC 12 produces a bunch of warnings by default with meson, and that
> turned out to be because the default optimisation level is -O3.
> That's a change from the make build, which uses -O2.  Should we set a
> default of 2, or is there some meson-way-of-doing-things reason why
> not?

We can change the defaults - the only downside is that there's a convenience
setting 'buildtype' (debug, debugoptimized, release, minsize, custom, plain)
that changes multiple settings (optimization level, amount of debugging
information) and that doesn't work as nicely if you change the default
compiler optimization setting.
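
For reference, both knobs are already available on the command line (a
hedged sketch of the meson CLI; "build" is an assumed build directory
name):

    # convenience setting: debugoptimized selects -O2 plus debug info
    meson setup build --buildtype=debugoptimized

    # or override just the optimization level of an existing build dir
    meson configure build -Doptimization=2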

They made a similar discovery to ours: deriving the defaults of settings from
other settings can quickly become confusing. I think they're looking at how
to make that UI a bit nicer.

I'd prefer to defer fine-tuning the default settings till a bunch of this has
gone in, but I won't insist on that course.

Their default warning flags passed to compilers trigger a bunch of warnings in
our build (irrespective of -O*), so I lowered the warning level. But I think
their set of settings is likely sensible, and we should just disable a bunch of
warnings we don't care about. But I haven't done that for now, to keep the set
of warning flags the same between meson and autoconf.

Greetings,

Andres Freund




Oddities in our pg_attribute_printf usage

2022-09-15 Thread Tom Lane
Following my discovery of missed pg_attribute_printf coverage
in libpq_pipeline (cf2c7a736), I went looking to see if we'd
forgotten that anywhere else.  The coverage seems to be solid
... except at the very root, where we have no such markers for
pg_vsnprintf, pg_vsprintf, pg_vfprintf, pg_vprintf.
I wonder if the option to write "0" for the second argument of
pg_attribute_printf didn't exist then, or we didn't know about it?

I'm not sure that adding those markers does all that much for
us, since (a) it's hard to see how the compiler could check
anything if it doesn't have the actual args list at hand,
and (b) these functions are likely never invoked with constant
format strings anyway.  Nonetheless, it seems pretty silly that
we've faithfully labeled all the derivative printf-alikes but
not these fundamental ones.
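
For anyone who hasn't seen the "0" form: as far as I recall,
pg_attribute_printf(F, A) maps to GCC's format attribute on supporting
compilers, and an args position of 0 tells the compiler "argument F is a
printf format string, but there is no variadic argument list to
type-check" -- exactly what a v*printf-style function wants.  A small
sketch using the raw attribute (my_log/my_vlog are made-up names):

    /* variadic: check the format string against the trailing args */
    extern void my_log(const char *fmt, ...)
        __attribute__((format(printf, 1, 2)));

    /* va_list-taking: pass 0, so only the format string itself is
     * checked (when it happens to be a literal) */
    extern void my_vlog(const char *fmt, va_list args)
        __attribute__((format(printf, 1, 0)));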

I also noted that win32security.c thinks it can stick
pg_attribute_printf into a function definition, whereas all
the rest of our code thinks you have to append that marker
to a separate function declaration.  Does that actually work
with Microsoft-platform compilers?  Or more likely, is the
macro expanding to empty on every compiler we've used with
that code?  Anyway, seems like we ought to make this fall
in line with the convention elsewhere.

So I end with the attached.  Any objections?

regards, tom lane

diff --git a/src/include/port.h b/src/include/port.h
index ce5da0534e..fe48618df7 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -204,13 +204,13 @@ extern unsigned char pg_ascii_tolower(unsigned char ch);
 #undef printf
 #endif
 
-extern int	pg_vsnprintf(char *str, size_t count, const char *fmt, va_list args);
+extern int	pg_vsnprintf(char *str, size_t count, const char *fmt, va_list args) pg_attribute_printf(3, 0);
 extern int	pg_snprintf(char *str, size_t count, const char *fmt,...) pg_attribute_printf(3, 4);
-extern int	pg_vsprintf(char *str, const char *fmt, va_list args);
+extern int	pg_vsprintf(char *str, const char *fmt, va_list args) pg_attribute_printf(2, 0);
 extern int	pg_sprintf(char *str, const char *fmt,...) pg_attribute_printf(2, 3);
-extern int	pg_vfprintf(FILE *stream, const char *fmt, va_list args);
+extern int	pg_vfprintf(FILE *stream, const char *fmt, va_list args) pg_attribute_printf(2, 0);
 extern int	pg_fprintf(FILE *stream, const char *fmt,...) pg_attribute_printf(2, 3);
-extern int	pg_vprintf(const char *fmt, va_list args);
+extern int	pg_vprintf(const char *fmt, va_list args) pg_attribute_printf(1, 0);
 extern int	pg_printf(const char *fmt,...) pg_attribute_printf(1, 2);
 
 /*
diff --git a/src/port/win32security.c b/src/port/win32security.c
index 1235199f2f..281244a34f 100644
--- a/src/port/win32security.c
+++ b/src/port/win32security.c
@@ -17,14 +17,14 @@
 #include "postgres_fe.h"
 #endif
 
+static void log_error(const char *fmt,...) pg_attribute_printf(1, 2);
+
 
 /*
  * Utility wrapper for frontend and backend when reporting an error
  * message.
  */
-static
-pg_attribute_printf(1, 2)
-void
+static void
 log_error(const char *fmt,...)
 {
 	va_list		ap;


Backends stalled in 'startup' state

2022-09-15 Thread Ashwin Agrawal
We recently saw many backends (close to the max_connections limit) get
stalled in 'startup' in one of the production environments for Greenplum
(a fork of PostgreSQL). Tracing the cause, we found that all the tuples
created by bootstrap (xmin=1) in pg_attribute were at very high block
numbers (for example, beyond 30,000). The reason for the backend startup
stall exactly matched Tom's reasoning in [1]. Stalls became much longer in
the presence of sub-transaction overflow or long-running transactions, as
tuple visibility checks took longer. That thread ruled out the possibility
of system catalog rows being present in higher block numbers rather than
at the front of pg_attribute.

This thread provides a simple reproduction on the latest version of
PostgreSQL and a root-cause analysis of how bootstrap catalog entries can
move to higher blocks and, as a result, cause stalls for backend starts. A
simple fix to avoid the issue is provided at the end.

The cause is syncscan triggering during VACUUM FULL. VACUUM FULL rewrites
the table by performing a seqscan as well, and
heapam_relation_copy_for_cluster() signals that it is fine to use
syncscan. Hence the scan may start not from block 0 but from some other
block already in cache, which opens the possibility of moving the
bootstrap tuples anywhere else in the table.
--
Repro
--
-- create database to play
drop database if exists test;
create database test;
\c test

-- function just to create many tables to increase pg_attribute size
-- (ideally many column table might do the job more easily)
CREATE OR REPLACE FUNCTION public.f(id integer)
 RETURNS void
 LANGUAGE plpgsql
 STRICT
AS $function$
declare
  sql text;
  i int;
begin
  for i in id..id+ loop
sql='create table if not exists tbl'||i||' (id int)';
execute sql;
  end loop;
end;
$function$;

select f(1);
select f(2);
select f(3);
select f(4);

-- validate pg_attribute size is greater than 1/4 of shared_buffers
-- for syncscan to trigger
show shared_buffers;
select pg_size_pretty(pg_relation_size('pg_attribute'));
select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 limit
5;

-- perform seq scan of pg_attribute to page past bootstrapped tuples
copy (select * from pg_attribute limit 2000) to '/tmp/p';

-- this will internally use syncscan starting with block after bootstrap
-- tuples and hence move bootstrap tuples last to higher block numbers
vacuum full pg_attribute;

--
Sample run
--
show shared_buffers;
 shared_buffers

 128MB
(1 row)

select pg_size_pretty(pg_relation_size('pg_attribute'));
 pg_size_pretty

 40 MB
(1 row)

select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 limit
5;
 ctid  | xmin | attrelid |   attname
---+--+--+--
 (0,1) |1 | 1255 | oid
 (0,2) |1 | 1255 | proname
 (0,3) |1 | 1255 | pronamespace
 (0,4) |1 | 1255 | proowner
 (0,5) |1 | 1255 | prolang
(5 rows)

copy (select * from pg_attribute limit 2000) to '/tmp/p';
COPY 2000
vacuum full pg_attribute;
VACUUM
select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 limit
5;
   ctid| xmin | attrelid |   attname
---+--+--+--
 (5115,14) |1 | 1255 | oid
 (5115,15) |1 | 1255 | proname
 (5115,16) |1 | 1255 | pronamespace
 (5115,17) |1 | 1255 | proowner
 (5115,18) |1 | 1255 | prolang
(5 rows)


Note:
-- used logic causing the problem to fix it as well on the system :-)
-- scan till block where bootstrap tuples are located
select ctid, xmin, attrelid, attname from pg_attribute where xmin = 1 limit
5;
-- now due to syncscan triggering it will pick the blocks with bootstrap
-- tuples first and help to bring them back to front
vacuum full pg_attribute;

--
Patch to avoid the problem:
--
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a3414a76e8..4c031914a3 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -757,7 +757,17 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 								 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-	tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+	/*
+	 * For system catalog tables avoid syncscan, so that scan always
+	 * starts from block 0 during rewrite and helps retain bootstrap
+	 * tuples in initial pages only. If using syncscan, then bootstrap
+  

Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-09-15 Thread Michael Paquier
On Thu, Sep 15, 2022 at 06:58:43AM -0700, Noah Misch wrote:
> Pushed that way.

Thanks for taking care of it, Noah.
--
Michael


signature.asc
Description: PGP signature


Re: Kerberos delegation support in libpq and postgres_fdw

2022-09-15 Thread Jacob Champion
On Thu, Jul 7, 2022 at 4:24 PM Jacob Champion  wrote:
> So my question is this: does substituting my credentials for the admin's
> credentials let me weaken or break the transport encryption on the
> backend connection, and grab the password that I'm not supposed to have
> access to as a front-end client?

With some further research: yes, it does.

If a DBA is using a GSS encrypted tunnel to communicate to a foreign
server, accepting delegation by default means that clients will be
able to break that backend encryption at will, because the keys in use
will be under their control.

> Maybe there's some ephemeral exchange going on that makes it too hard to
> attack in practice, or some other mitigations.

There is no forward secrecy, ephemeral exchange, etc. to mitigate this [1]:

   The Kerberos protocol in its basic form does not provide perfect
   forward secrecy for communications.  If traffic has been recorded by
   an eavesdropper, then messages encrypted using the KRB_PRIV message,
   or messages encrypted using application-specific encryption under
   keys exchanged using Kerberos can be decrypted if the user's,
   application server's, or KDC's key is subsequently discovered.

So the client can decrypt backend communications that make use of its
delegated key material. (This also means that gssencmode is a lot
weaker than I expected.)

> I'm trying to avoid writing Wireshark dissector
> code, but maybe that'd be useful either way...

I did end up filling out the existing PGSQL dissector so that it could
decrypt GSSAPI exchanges (with the use of a keytab, that is). If you'd
like to give it a try, the patch, based on Wireshark 3.7.1, is
attached. Note the GPLv2 license. It isn't correct code yet, because I
didn't understand how packet reassembly worked in Wireshark when I
started writing the code, so really large GSSAPI messages that are
split across multiple TCP packets will confuse the dissector. But it's
enough to prove the concept.

To see this in action, set up an FDW connection that uses gssencmode
(so the server in the middle will need its own Kerberos credentials).
Capture traffic starting from the kinit through the query on the
foreign table. Export the client's key material into a keytab, and set
up Wireshark to use that keytab for decryption. When credential
forwarding is *not* in use, Wireshark will be able to decrypt the
initial client connection, but it won't be able to see anything inside
the foreign server connection. When credential forwarding is in use,
Wireshark will be able to decrypt both connections.
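
For anyone trying to reproduce this, the foreign-server setup might look
roughly like the following (a hypothetical sketch: the server name, host,
and table are placeholders, and it assumes postgres_fdw plus valid
Kerberos credentials on the server in the middle):

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'foreign.example.com', dbname 'postgres',
                 gssencmode 'require');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
    CREATE FOREIGN TABLE ft (id int)
        SERVER loopback OPTIONS (table_name 'tbl');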

Thanks,
--Jacob

[1] https://www.rfc-editor.org/rfc/rfc4120
From 0cf3153a6044edae42037e45aee0fc88352f Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 14 Sep 2022 13:08:22 -0700
Subject: [PATCH] pgsql: decrypt GSS-encrypted channels and tokens if possible

License: GPLv2

Add dissection for a conversation that starts with a GSSAPI encryption
request, and pass those packets along to the gssapi dissector. This
allows parts of the conversation to be decrypted if Wireshark has been
set up with a KRB5 keytab.

Additionally, break apart the PGSQL_AUTH_GSSAPI_SSPI_DATA case into
separate GSSAPI and SSPI cases, and when we're using GSSAPI, dissect the
authentication tokens that are passed between the client and server.

This incidentally fixes an off-by-four bug in the
PGSQL_AUTH_TYPE_GSSAPI_SSPI_CONTINUE case; the offset was not being
updated correctly before.

The new code copies the dissection strategy of multipart, and adds a new
last_nongss_frame marker similar to the STARTTLS handling code's
last_nontls_frame.
---
 epan/dissectors/packet-pgsql.c | 121 +++--
 1 file changed, 116 insertions(+), 5 deletions(-)

diff --git a/epan/dissectors/packet-pgsql.c b/epan/dissectors/packet-pgsql.c
index c935eec1f8..7efe1c003f 100644
--- a/epan/dissectors/packet-pgsql.c
+++ b/epan/dissectors/packet-pgsql.c
@@ -14,6 +14,8 @@
 
 #include 
 
+#include "packet-dcerpc.h"
+#include "packet-gssapi.h"
 #include "packet-tls-utils.h"
 #include "packet-tcp.h"
 
@@ -22,6 +24,7 @@ void proto_reg_handoff_pgsql(void);
 
 static dissector_handle_t pgsql_handle;
 static dissector_handle_t tls_handle;
+static dissector_handle_t gssapi_handle;
 
 static int proto_pgsql = -1;
 static int hf_frontend = -1;
@@ -80,10 +83,13 @@ static int hf_constraint_name = -1;
 static int hf_file = -1;
 static int hf_line = -1;
 static int hf_routine = -1;
+static int hf_gssapi_length = -1;
 
 static gint ett_pgsql = -1;
 static gint ett_values = -1;
 
+static expert_field ei_gssapi_decryption_not_possible = EI_INIT;
+
 #define PGSQL_PORT 5432
 static gboolean pgsql_desegment = TRUE;
 static gboolean first_message = TRUE;
@@ -92,11 +98,14 @@ typedef enum {
   PGSQL_AUTH_STATE_NONE,   /*  No authentication seen or used */
   PGSQL_AUTH_SASL_REQUESTED,   /* Server sends SASL auth request with 
supported SASL mechanisms*/
   PGSQL_AUTH_SASL_CONTINUE,/* Server and/or 

Re: A question about wording in messages

2022-09-15 Thread Thomas Munro
On Wed, Sep 14, 2022 at 2:38 PM Tom Lane  wrote:
> Kyotaro Horiguchi  writes:
> > I saw the following message recently modified.
> >> This controls the maximum distance we can read ahead in the WAL to 
> >> prefetch referenced data blocks.
> > Maybe the "we" means "PostgreSQL program and you" but I see it
> > somewhat out of place.
>
> +1, I saw that today and thought it was outside our usual style.
> The whole thing is awfully verbose for a GUC description, too.
> Maybe
>
> "Maximum distance to read ahead in WAL to prefetch data blocks."

+1

For "we", I must have been distracted by code comment style.  For the
extra useless verbiage, it's common for GUC description to begin "This
control/affects/blah" like that, but I agree it's useless noise.

For the other cases, Amit's suggestion of 'server' seems sensible to me.




Re: remove_useless_groupby_columns is too enthusiastic

2022-09-15 Thread David Rowley
On Wed, 13 Jul 2022 at 05:31, Tom Lane  wrote:
> I tried the attached quick-hack patch that just prevents
> remove_useless_groupby_columns from removing anything that
> appears in ORDER BY.  That successfully fixes the complained-of
> case, and it doesn't change any existing regression test results.
> I'm not sure whether there are any cases that it makes significantly
> worse.

What I am concerned about with this patch is that for Hash Aggregate,
we'll always want to remove the useless group by clauses to minimise
the amount of hashing and equal comparisons that we must do. So the
patch makes that case worse in favour of possibly making Group
Aggregate cases better. I don't think that's going to be a great
trade-off as Hash Aggregate is probably more commonly used than Group
Aggregate, especially so when the number of rows in each group is
large and there is no LIMIT clause to favour a cheap startup plan.
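
To make the trade-off concrete, consider a hedged example (hypothetical
table):

    CREATE TABLE t (id int PRIMARY KEY, payload text);

    -- "payload" is functionally dependent on the primary key, so it can
    -- be dropped from GROUP BY; for Hash Aggregate that is a pure win,
    -- since it means less hashing and fewer equality comparisons.
    SELECT id, payload, count(*) FROM t GROUP BY id, payload;

    -- But with a matching ORDER BY, keeping "payload" in the grouping
    -- could let a Group Aggregate's sort satisfy the ORDER BY for free,
    -- instead of requiring an extra sort on top.
    SELECT id, payload, count(*) FROM t GROUP BY id, payload
    ORDER BY id, payload;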

Maybe the fix for this should be:

1. Add a new bool "isredundant_groupby" field to SortGroupClause,
2. Rename remove_useless_groupby_columns() to
mark_redundant_groupby_columns() and have it set the
isredundant_groupby instead of removing items from the list,
3. Adjust get_useful_group_keys_orderings() so that it returns
additional PathKeyInfo with the isredundant_groupby items removed,
4. Adjust the code in add_paths_to_grouping_rel() so that it always
uses the minimal set of SortGroupClauses for Hash Aggregate paths,

Perhaps a valid concern with the above is all the additional Paths
we'd consider if we did #3. But maybe that's not so bad as that's not
a multiplicate problem like it would be adding additional Paths to
base and joinrels.

We'd probably still want to keep preprocess_groupclause() as
get_useful_group_keys_orderings() is not exhaustive in its search.

David




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-15 Thread Peter Smith
Here are some review comments for the latest v10 patch.

(Mostly these are just nitpick wording/comments etc)

==

1. Commit message

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
because it leads to full table scan per tuple change on the subscription.
This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
some small number of use cases.

~

The "often not feasible" part seems repeated by the "impracticable" part.

SUGGESTION
Using `REPLICA IDENTITY FULL` on the publication leads to a full table
scan per tuple change on the subscription. This makes `REPLICA
IDENTITY FULL` impracticable -- probably other than some small number
of use cases.

~~~

2.

The Majority of the logic on the subscriber side already exists in
the code.

"Majority" -> "majority"

~~~

3.

The ones familiar
with this part of the code could realize that the sequential scan
code on the subscriber already implements the `tuples_equal()`
function.

SUGGESTION
Anyone familiar with this part of the code might recognize that...

~~~

4.

In short, the changes on the subscriber is mostly
combining parts of (unique) index scan and sequential scan codes.

"is mostly" -> "are mostly"

~~~

5.

>From the performance point of view, there are few things to note.

"are few" -> "are a few"

==

6. src/backend/executor/execReplication.c - build_replindex_scan_key

+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)
 {
- int attoff;
+ int index_attoff;
+ int scankey_attoff = 0;

Should it be called 'skey_attoff' for consistency with the param 'skey'?

~~~

7.

  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

Maybe the 'optype' should be adjacent to the other Oid opXXX
declarations just to keep them all together?

~~~

8.

+ if (!AttributeNumberIsValid(table_attno))
+ {
+ IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we are dealing with is not these.
+ */
+ Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+ /*
+ * At this point, we are also sure that the index is not consisting
+ * of only expressions.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

I was a bit confused by the comment. IIUC the code has already called
FilterOutNotSuitablePathsForReplIdentFull at some point prior, so all
the unwanted indexes are already filtered out. Therefore these
assertions exist for no reason other than sanity-checking that fact,
right? If my understanding is correct, perhaps a simpler single
comment is possible:

SUGGESTION (or something like this)
This attribute is an expression, however
FilterOutNotSuitablePathsForReplIdentFull was called earlier during
[...] and the indexes comprising only expressions have already been
eliminated. We sanity check this now. Furthermore, because primary key
and unique key indexes can't include expressions we also sanity check
the index is neither of those kinds.

~~~

9.
- return hasnulls;
+ /* We should always use at least one attribute for the index scan */
+ Assert (scankey_attoff > 0);

SUGGESTION
There should always be at least one attribute for the index scan.

~~~

10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

ScanKeyData skey[INDEX_MAX_KEYS];
IndexScanDesc scan;
SnapshotData snap;
TransactionId xwait;
Relation idxrel;
bool found;
TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
bool indisunique;
int scankey_attoff;

10a.
Should 'scankey_attoff' be called 'skey_attoff' for consistency with
the 'skey' array?

~

10b.
Also, it might be tidier to declare the 'skey_attoff' adjacent to the 'skey'.

==

11. src/backend/replication/logical/relation.c

For LogicalRepPartMap, I was wondering if it should keep a small
comment to xref back to the long comment which was moved to
logicalreplication.h

e.g.
/* Refer to the LogicalRepPartMapEntry comment in logicalrelation.h */

~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when
+ * an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
  entry->localrelvalid = true;

Should there be a blank line between those assignments? (just for
consistency with the other code

Re: why can't a table be part of the same publication as its schema

2022-09-15 Thread Peter Smith
On Thu, Sep 15, 2022 at 10:57 PM houzj.f...@fujitsu.com
 wrote:
>
...
> Attach the new version patch which added suggested restriction for column list
> and merged Vignesh's patch.
>
> Some other document might need to be updated. I will update them soon.
>
> Best regards,
> Hou zj

Hi Hou-san.

FYI, I found your v3 patch apply was broken due to a very recent changes pushed:

e.g.
error: patch failed: doc/src/sgml/logical-replication.sgml:1142

~~

PSA a rebase of your same patch (I left the version number the same)

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-Allow-creation-of-publication-with-schema-and-tab.patch
Description: Binary data


Re: walmethods.c/h are doing some strange things

2022-09-15 Thread Kyotaro Horiguchi
At Fri, 2 Sep 2022 11:52:38 -0400, Robert Haas  wrote in 
> that type that can ever exist, and the pointer to that object is
> stored in a global variable managed by walmethods.c. So whereas in
> other cases we give you the object and then a way to get the
> corresponding set of callbacks, here we only give you the callbacks,
> and we therefore have to impose the artificial restriction that there
> can only ever be one object.

Makes sense to me.

> There is no real benefit in having the same variable in two different
> structs and having to access it via a callback when we could just put
> it into a common struct and access it directly. There's also a
> compression_algorithm() method which has exactly the same issue,
..
> though that is an overall property of the WalWriteMethod rather than a
> per-Walfile property. There's also a getlasterr callback which is
> basically just duplicate code across the two implementations; we could
> unify that code. There's also a global variable current_walfile_name[]
> in receivelog.c which only needs to exist because the file name is
> inconveniently hidden inside the WalWriteMethod abstraction layer; we
> can just make it visible.

Sounds sensible.

> Attached are a couple of hastily-written patches implementing this.

> patches is that the existing code isn't very clear about whether
> "Walfile" is supposed to be an abstraction for a pointer to the
> implementation-specific struct, or the struct itself. From looking at
> walmethods.h, you'd think it's a pointer to the struct, because we
> declare typedef void *Walfile. walmethods.c agrees, but receivelog.c
> takes a different view, declaring all of its variables as type
> "Walfile *". This doesn't cause a compiler error because void * is
> just as interchangeable with void ** as it is with DirectoryMethodFile
> * or TarMethodFile *, but I think it is clearly a mistake, and the
> approach I'm proposing here makes such mistakes more difficult to
> make.

+1.  I remember thinking the same thing when I was looking at this
code before.

> Aside from the stuff that I am complaining about here which is mostly
> stylistic, I think that the division of labor between receivelog.c and
> walmethods.c is questionable in a number of ways. There are things
> which are specific to one walmethod or the other that are handled in
> the common code (receivelog.c) rather than the type-specific code
> (walmethod.c), and in general it feels like receivelog.c knows way too
> much about what is really happening beneath the abstraction layer that
> walmethods.c supposedly creates. This comment is one of the clearer
> examples of this:
> 
>  /*
>   * When streaming to files, if an existing file exists we verify that 
> it's
>   * either empty (just created), or a complete WalSegSz segment (in which
>   * case it has been created and padded). Anything else indicates a 
> corrupt
>   * file. Compressed files have no need for padding, so just ignore this
>   * case.
>   *
>   * When streaming to tar, no file with this name will exist before, so we
>   * never have to verify a size.
>   */
> 
> There's nothing generic here. We're not describing an algorithm that
> could be used with any walmethod that might exist now or in the
> future. We're describing something that will produce the right result
> given the two walmethods we actually have and the actual behavior of
> the callbacks of each one. I don't really know what to do about this
> part of the problem; these pieces of code are deeply intertwined in
> complex ways that don't seem simple to untangle. Maybe I'll have a

I agree with that view. That part seems to belong in the bodies of the
open_for_write() implementations. But I'm not sure how to untangle
them at a glance either.  In the first place, I'm not sure why we need
to do that at all, given that the file is going to be overwritten from
the beginning, though..

> better idea later, or perhaps someone else will. For now, I'd like to
> get some thoughts on the attached refactoring patches that deal with
> some more superficial aspects of the problem.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Assertion failure in WaitForWALToBecomeAvailable state machine

2022-09-15 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 09:05:46 +0900, Michael Paquier  wrote 
in 
> On Thu, Sep 15, 2022 at 06:58:43AM -0700, Noah Misch wrote:
> > Pushed that way.
> 
> Thanks for taking care of it, Noah.

+1. Thanks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




warning about missing format string annotations

2022-09-15 Thread Andres Freund
Hi,

On 2022-09-15 21:18:03 +, Tom Lane wrote:
> We're so used to having the compiler check this stuff for us that
> a printf-like function without pg_attribute_printf is a land mine.
> I wonder if there is a way to detect such omissions.

gcc has -Wsuggest-attribute=format - but unfortunately its heuristics appear
to be too simplistic to catch many omissions. It doesn't catch this one, for
example. But it may still be worth trying out in a few more cases.

   -Wsuggest-attribute=format
   -Wmissing-format-attribute
       Warn about function pointers that might be candidates for "format"
       attributes.  Note these are only possible candidates, not absolute
       ones.  GCC guesses that function pointers with "format" attributes
       that are used in assignment, initialization, parameter passing or
       return statements should have a corresponding "format" attribute in
       the resulting type.  I.e. the left-hand side of the assignment or
       initialization, the type of the parameter variable, or the return
       type of the containing function respectively should also have a
       "format" attribute to avoid the warning.

       GCC also warns about function definitions that might be candidates
       for "format" attributes.  Again, these are only possible candidates.
       GCC guesses that "format" attributes might be appropriate for any
       function that calls a function like "vprintf" or "vscanf", but this
       might not always be the case, and some functions for which "format"
       attributes are appropriate may not be detected.
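
For reference, the annotation the warning is meant to suggest looks like
this; log_msg here is a hypothetical printf-style wrapper, not a function
in the tree:

    /* argument 1 is the format string, argument 2 the first vararg */
    static void log_msg(const char *fmt, ...) pg_attribute_printf(1, 2);

With that in place the compiler checks every call site's arguments
against the format string, which is exactly what's lost when the
annotation is omitted.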

Greetings,

Andres Freund




Re: Improve description of XLOG_RUNNING_XACTS

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 17:39:17 +0530, Amit Kapila  wrote 
in 
> I see your point but I am still worried due to the concern raised by
> Horiguchi-San earlier in this thread that the total number could be as
> large as TOTAL_MAX_CACHED_SUBXIDS. I think if we want to include
> information only on the number of subxacts then that is clearly an
> improvement without any disadvantage.
> 
> Does anyone else have an opinion on this matter?

That doesn't seem to work for Sawada-san's case, but I'm fine with
that :p

Putting an arbitrary upper-bound on the number of subxids to print
might work? I'm not sure how we can determine the upper-bound, though.
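
For example, something along these lines in standby_desc_running_xacts()
might work; MAX_PRINTED_SUBXIDS is an assumed constant here, not
something that exists in the tree:

    #define MAX_PRINTED_SUBXIDS 64

    for (i = 0; i < Min(xlrec->subxcnt, MAX_PRINTED_SUBXIDS); i++)
        appendStringInfo(buf, " %u", xlrec->xids[xlrec->xcnt + i]);
    if (xlrec->subxcnt > MAX_PRINTED_SUBXIDS)
        appendStringInfo(buf, " (%d more subxacts omitted)",
                         xlrec->subxcnt - MAX_PRINTED_SUBXIDS);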

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_basebackup's --gzip switch misbehaves

2022-09-15 Thread Michael Paquier
On Wed, Sep 14, 2022 at 10:26:42AM +0200, Daniel Gustafsson wrote:
> Maybe the creation of $tempdir should take PG_TEST_NOCLEAN into account and 
> not
> register CLEANUP if set?

Agreed.  It sounds like a good idea to me to extend that to temporary
paths, and then check those rmtree() calls where the tests would not
retain too much data for small-ish machines.

By the way, should we document PG_TEST_TIMEOUT_DEFAULT and
PG_TEST_NOCLEAN not only in src/test/perl/README but also in doc/?  We
provide something in the docs about PROVE_FLAGS and PROVE_TESTS, for
example.
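
For reference, both variables are simply set in the environment when
running the tests, e.g.:

    $ PG_TEST_NOCLEAN=1 make -C src/bin/pg_basebackup check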
--
Michael


signature.asc
Description: PGP signature


Re: ICU for global collation

2022-09-15 Thread Michael Paquier
On Wed, Sep 14, 2022 at 05:19:34PM +0300, Marina Polyakova wrote:
> I was surprised that it is allowed to create clusters/databases where the
> default ICU collations do not actually work due to unsupported encodings:
> 
> $ initdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US -D
> data &&
> pg_ctl -D data -l logfile start &&
> psql -c "SELECT 'a' < 'b'" template1
> ...
> waiting for server to start done
> server started
> ERROR:  encoding "SQL_ASCII" not supported by ICU
> 
> $ createdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US
> --template template0 mydb &&
> psql -c "SELECT 'a' < 'b'" mydb
> ERROR:  encoding "SQL_ASCII" not supported by ICU
> 
> The patch diff_check_icu_encoding.patch prohibits the creation of such
> objects...

Agreed that it is a bit confusing to get this type of error after the
database has been created when querying it due to a mix of unsupported
options.  Peter?
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH]Feature improvement for MERGE tab completion

2022-09-15 Thread bt22kawamotok

Thanks for updating.

+   COMPLETE_WITH("UPDATE", "DELETE", "DO NOTHING");

"UPDATE" is always followed by "SET",  so why not complement it with
"UPDATE SET"?


Thanks for reviewing.
That's a good idea!
I created new patch v7.
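
To illustrate, with the patch a psql session should complete roughly
like this (table names made up for the example):

    postgres=# MERGE INTO tgt USING src ON tgt.id = src.id
    postgres-# WHEN MATCHED THEN <TAB>
    UPDATE SET   DELETE   DO NOTHING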

Regards,

Kotaro Kawamoto

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 62a39779b9..f899c6ab26 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -4063,23 +4063,25 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_mergetargets);
 	else if (TailMatches("MERGE", "INTO", MatchAny))
 		COMPLETE_WITH("USING", "AS");
-	else if (TailMatches("MERGE", "INTO", MatchAny, "USING"))
-		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
 	/* with [AS] alias */
-	else if (TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny))
-		COMPLETE_WITH("USING");
-	else if (TailMatches("MERGE", "INTO", MatchAny, MatchAny))
+	else if (TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny) ||
+			 TailMatches("MERGE", "INTO", MatchAny, MatchAnyExcept("AS")))
 		COMPLETE_WITH("USING");
-	else if (TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING"))
-		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
-	else if (TailMatches("MERGE", "INTO", MatchAny, MatchAny, "USING"))
-		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
+	else if (TailMatches("MERGE", "INTO", MatchAny, "USING") ||
+			 TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING") ||
+			 TailMatches("MERGE", "INTO", MatchAny, MatchAny, "USING"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_selectables);
+	else if (TailMatches("MERGE", "INTO", MatchAny, "USING", MatchAny) ||
+			 TailMatches("MERGE", "INTO", MatchAny, MatchAny, "USING", MatchAny) ||
+			 TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING", MatchAny))
+		COMPLETE_WITH("AS", "ON");
 	/* ON */
-	else if (TailMatches("MERGE", "INTO", MatchAny, "USING", MatchAny))
-		COMPLETE_WITH("ON");
-	else if (TailMatches("INTO", MatchAny, "AS", MatchAny, "USING", MatchAny, "AS", MatchAny))
-		COMPLETE_WITH("ON");
-	else if (TailMatches("INTO", MatchAny, MatchAny, "USING", MatchAny, MatchAny))
+	else if (TailMatches("MERGE", "INTO", MatchAny, "USING", MatchAny, MatchAnyExcept("ON|AS")) ||
+			 TailMatches("MERGE", "INTO", MatchAny, "USING", MatchAny, "AS", MatchAny) ||
+			 TailMatches("MERGE", "INTO", MatchAny, MatchAny, "USING", MatchAny, MatchAnyExcept("ON|AS")) ||
+			 TailMatches("MERGE", "INTO", MatchAny, MatchAny, "USING", MatchAny, "AS", MatchAny) ||
+			 TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING", MatchAny, MatchAnyExcept("ON|AS")) ||
+			 TailMatches("MERGE", "INTO", MatchAny, "AS", MatchAny, "USING", MatchAny, "AS", MatchAny))
 		COMPLETE_WITH("ON");
 	/* ON condition */
 	else if (TailMatches("INTO", MatchAny, "USING", MatchAny, "ON"))
@@ -4089,18 +4091,25 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("INTO", MatchAny, MatchAny, "USING", MatchAny, MatchAny, "ON"))
 		COMPLETE_WITH_ATTR(prev6_wd);
 	/* WHEN [NOT] MATCHED */
-	else if (TailMatches("USING", MatchAny, "ON", MatchAny))
-		COMPLETE_WITH("WHEN MATCHED", "WHEN NOT MATCHED");
-	else if (TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny))
+	else if (TailMatches("USING", MatchAny, "ON", MatchAny) ||
+			 TailMatches("USING", MatchAny, "ON", MatchAny, MatchAnyExcept("WHEN"), MatchAnyExcept("WHEN")) ||
+			 TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny) ||
+			 TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny, MatchAnyExcept("WHEN"), MatchAnyExcept("WHEN")) ||
+			 TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny) ||
+			 TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, MatchAnyExcept("WHEN"), MatchAnyExcept("WHEN")))
 		COMPLETE_WITH("WHEN MATCHED", "WHEN NOT MATCHED");
-	else if (TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny))
-		COMPLETE_WITH("WHEN MATCHED", "WHEN NOT MATCHED");
-	else if (TailMatches("WHEN", "MATCHED"))
-		COMPLETE_WITH("THEN", "AND");
-	else if (TailMatches("WHEN", "NOT", "MATCHED"))
+	else if (TailMatches("USING", MatchAny, "ON", MatchAny, "WHEN") ||
+			 TailMatches("USING", MatchAny, "ON", MatchAny, MatchAny, MatchAny, "WHEN") ||
+			 TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny, "WHEN") ||
+			 TailMatches("USING", MatchAny, "AS", MatchAny, "ON", MatchAny, MatchAny, MatchAny, "WHEN") ||
+			 TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, "WHEN") ||
+			 TailMatches("USING", MatchAny, MatchAny, "ON", MatchAny, MatchAny, MatchAny, "WHEN"))
+		COMPLETE_WITH("MATCHED", "NOT MATCHED");
+	else if (TailMatches("WHEN", "MATCHED") ||
+			 TailMatches("WHEN", "NOT", "MATCHED"))
 		COMPLETE_WITH("THEN", "AND");
 	else if (TailMatches("WHEN", "MATCHED", "THEN"))
-		COMPLETE_WITH("UPDATE", "DELETE");
+		COMPLETE_WITH("UPDATE SET", "DELETE", "DO NOTHING");
 	else if (TailMatches("WHEN", "NOT", "MATCHED", "THEN")

Re: Inconsistencies in error messages

2022-09-15 Thread John Naylor
On Wed, Sep 14, 2022 at 5:25 PM John Naylor
 wrote:
> Will commit this way unless there are objections.

I forgot to mention yesterday, but this is done.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Typo in xact.c

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 22:38:01 +0800, Japin Li  wrote in 
> 
> Hi hacker,
> 
> Recently, I find there might be a typo in xact.c comments.  The comments
> say "PG_PROC", however, it actually means "PGPROC" structure.  Since we
> have pg_proc catalog, and use PG_PROC to reference the catalog [1], so,
> we should use PGPROC to reference the structure.  Any thoughts?
> 
> [1] src/include/nodes/primnodes.h

The patch seems to cover all occurrences of PG_PROC as PGPROC.

I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
quite confusing, too..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Background writer and checkpointer in crash recovery

2022-09-15 Thread Justin Pryzby
I don't know that this warrants an Open Item, but I think some fix
ought to be applied to v15, whether that happens this week or next
month.




Re: why can't a table be part of the same publication as its schema

2022-09-15 Thread Amit Kapila
On Thu, Sep 15, 2022 at 8:18 AM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, September 15, 2022 3:37 AM Peter Eisentraut 
>  wrote:
> > > Another option could be just ingore the column list if table's schema
> > > is also part of publication. But it seems slightly inconsistent with
> > > the rule that we disallow using different column list for a table.
> >
> > Ignoring things doesn't seem like a good idea.
> >
> > A solution might be to disallow adding any schemas to a publication if 
> > column
> > lists on a table are specified.
>
> Thanks for the suggestion. If I understand correctly, you mean we can disallow
> publishing a table with a column list and any schema (a schema that the table
> might not be part of) in the same publication[1].
>
> something like--
> [1]CREATE PUBLICATION pub FOR TABLE public.test(a), ALL TABLES IN SCHEMA s2;
> ERROR: "cannot add schema to publication when column list is used in the 
> published table"
> --
>
> Personally, it looks acceptable to me as the user can anyway achieve the same
> purpose by creating several publications and combining them, and we can avoid
> the restriction at ALTER TABLE SET SCHEMA. Although it restricts some cases.
>

Yeah, I agree that it restricts more cases for how different
combinations can be specified for a publication but OTOH it helps to
uplift restriction in ALTER TABLE ... SET SCHEMA which seems like a
good trade-off.

> I will post a top-up patch about this soon.
>
>
> About the row filter handling, maybe we don't need to restrict row filter like
> above ? Because the rule is to simply merge the row filter with 'OR' among
> publications, so it seems we could ignore the row filter in the publication 
> when
> the table's schema is also published in the same publication(which means no 
> filter).
>

Yeah, this is what we are doing internally when combining multiple
publications but let me explain with an example the case of a single
publication so that if anybody has any objections to it, we can
discuss the same.

Case-1: When row filter is specified *without* ALL TABLES IN SCHEMA clause
postgres=# create table t1(c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# create publication pub1 for table t1 where (c1 > 10);
CREATE PUBLICATION
postgres=# select pubname, schemaname, tablename, rowfilter from
pg_publication_tables;
 pubname | schemaname | tablename | rowfilter
-++---+---
 pub1| public | t1| (c1 > 10)
(1 row)

Case-2: When row filter is specified *with* ALL TABLES IN SCHEMA clause
postgres=# create schema s1;
CREATE SCHEMA
postgres=# create table s1.t2(c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# create publication pub2 for table s1.t2 where (c1 > 10),
all tables in schema s1;
CREATE PUBLICATION
postgres=# select pubname, schemaname, tablename, rowfilter from
pg_publication_tables;
 pubname | schemaname | tablename | rowfilter
-++---+---
 pub1| public | t1| (c1 > 10)
 pub2| s1 | t2|
(2 rows)

So, for case-2, the rowfilter is not considered. Note that case-2 was
not possible before the patch discussed here; after the patch, the
behavior is the same as what we previously had when combining
publications.

-- 
With Regards,
Amit Kapila.




Re: Switching XLog source from archive to streaming when primary available

2022-09-15 Thread Bharath Rupireddy
On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
>  wrote in
> > I'm attaching the v6 patch that's rebased on to the latest HEAD.
> > Please consider this for review.
>
> Thaks for the new version!
>
> +#define StreamingReplRetryEnabled() \
> +   (streaming_replication_retry_interval > 0 && \
> +StandbyMode && \
> +currentSource == XLOG_FROM_ARCHIVE)
>
> It seems to me a bit too complex..

I don't think so, it just tells whether a standby is allowed to switch
its source from archive to streaming.

> +   /* Save the timestamp at which we're switching to 
> archive. */
> +   if (StreamingReplRetryEnabled())
> +   switched_to_archive_at = 
> GetCurrentTimestamp();
>
> Anyway we are going to open a file just after this so
> GetCurrentTimestamp() cannot cause a perceptible degradation.
> Coulnd't we do that unconditionally, to get rid of the macro?

Do we really need to do it unconditionally? I don't think so. And, we
can't get rid of the macro, as we need to check for the current
source, GUC and standby mode. When this feature is disabled, it
mustn't execute any extra code IMO.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Typo in xact.c

2022-09-15 Thread John Naylor
On Fri, Sep 16, 2022 at 10:11 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 15 Sep 2022 22:38:01 +0800, Japin Li  wrote in
> >
> > Hi hacker,
> >
> > Recently, I find there might be a typo in xact.c comments.  The comments
> > say "PG_PROC", however, it actually means "PGPROC" structure.  Since we
> > have pg_proc catalog, and use PG_PROC to reference the catalog [1], so,
> > we should use PGPROC to reference the structure.  Any thoughts?
> >
> > [1] src/include/nodes/primnodes.h
>
> The patch seems to cover all occurrences of PG_PROC as PGPROC.

+1 since this hinders grep-ability.

> I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
> quite confusing, too..

It's pretty obvious to me what that refers to in primnodes.h, although
the capitalization of (some, but not all) catalog names in comments in
that file is a bit strange. Maybe not worth changing there.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Typo in xact.c

2022-09-15 Thread Japin Li

On Fri, 16 Sep 2022 at 11:51, John Naylor  wrote:
> On Fri, Sep 16, 2022 at 10:11 AM Kyotaro Horiguchi
>  wrote:
>>
>> At Thu, 15 Sep 2022 22:38:01 +0800, Japin Li  wrote in
>> >
>> > Hi hacker,
>> >
>> > Recently, I find there might be a typo in xact.c comments.  The comments
>> > say "PG_PROC", however, it actually means "PGPROC" structure.  Since we
>> > have pg_proc catalog, and use PG_PROC to reference the catalog [1], so,
>> > we should use PGPROC to reference the structure.  Any thoughts?
>> >
>> > [1] src/include/nodes/primnodes.h
>>
>> The patch seems to cover all occurrences of PG_PROC as PGPROC.
>
> +1 since this hinders grep-ability.
>
>> I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
>> quite confusing, too..
>
> It's pretty obvious to me what that refers to in primnodes.h, although
> the capitalization of (some, but not all) catalog names in comments in
> that file is a bit strange. Maybe not worth changing there.

Thanks for the review.  I found that for system catalogs we often use lower
case.  Attached is a new patch to fix it.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

>From 1b1fd4742efe052dbca46006e72411b33a8766cd Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Fri, 16 Sep 2022 11:50:38 +0800
Subject: [PATCH v1 1/2] Use lower case to reference the system catalog

---
 src/include/nodes/primnodes.h | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 40661334bb..48827d291a 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -595,8 +595,8 @@ typedef enum CoercionForm
 typedef struct FuncExpr
 {
 	Expr		xpr;
-	Oid			funcid;			/* PG_PROC OID of the function */
-	Oid			funcresulttype; /* PG_TYPE OID of result value */
+	Oid			funcid;			/* pg_proc OID of the function */
+	Oid			funcresulttype; /* pg_type OID of result value */
 	bool		funcretset;		/* true if function returns set */
 	bool		funcvariadic;	/* true if variadic arguments have been
  * combined into an array last argument */
@@ -644,13 +644,13 @@ typedef struct OpExpr
 {
 	Expr		xpr;
 
-	/* PG_OPERATOR OID of the operator */
+	/* pg_operator OID of the operator */
 	Oid			opno;
 
-	/* PG_PROC OID of underlying function */
+	/* pg_proc OID of underlying function */
 	Oid			opfuncid pg_node_attr(equal_ignore_if_zero);
 
-	/* PG_TYPE OID of result value */
+	/* pg_type OID of result value */
 	Oid			opresulttype;
 
 	/* true if operator returns set */
@@ -721,16 +721,16 @@ typedef struct ScalarArrayOpExpr
 {
 	Expr		xpr;
 
-	/* PG_OPERATOR OID of the operator */
+	/* pg_operator OID of the operator */
 	Oid			opno;
 
-	/* PG_PROC OID of comparison function */
+	/* pg_proc OID of comparison function */
 	Oid			opfuncid pg_node_attr(equal_ignore_if_zero);
 
-	/* PG_PROC OID of hash func or InvalidOid */
+	/* pg_proc OID of hash func or InvalidOid */
 	Oid			hashfuncid pg_node_attr(equal_ignore_if_zero);
 
-	/* PG_PROC OID of negator of opfuncid function or InvalidOid.  See above */
+	/* pg_proc OID of negator of opfuncid function or InvalidOid.  See above */
 	Oid			negfuncid pg_node_attr(equal_ignore_if_zero);
 
 	/* true for ANY, false for ALL */
-- 
2.25.1

>From 4b9ec1f791cbe865866899e8864b6e96d8fa015c Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Fri, 16 Sep 2022 11:52:22 +0800
Subject: [PATCH v1 2/2] Fix a typo about PGPROC structure reference

---
 src/backend/access/transam/README | 2 +-
 src/backend/access/transam/xact.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README
index 734c39a4d0..72af656060 100644
--- a/src/backend/access/transam/README
+++ b/src/backend/access/transam/README
@@ -198,7 +198,7 @@ parent.  This maintains the invariant that child transactions have XIDs later
 than their parents, which is assumed in a number of places.
 
 The subsidiary actions of obtaining a lock on the XID and entering it into
-pg_subtrans and PG_PROC are done at the time it is assigned.
+pg_subtrans and PGPROC are done at the time it is assigned.
 
 A transaction that has no XID still needs to be identified for various
 purposes, notably holding locks.  For this purpose we assign a "virtual
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50f092d7eb..7abc6a0705 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -680,12 +680,12 @@ AssignTransactionId(TransactionState s)
 		log_unknown_top = true;
 
 	/*
-	 * Generate a new FullTransactionId and record its xid in PG_PROC and
+	 * Generate a new FullTransactionId and record its xid in PGPROC and
 	 * pg_subtrans.
 	 *
 	 * NB: we must make the subtrans entry BEFORE the Xid appears anywhere in
-	 * shared storage other than PG_PROC; because if there's no room for it in
-	 * PG_PROC, the subtrans entry is needed to ensure that other back

Re: Typo in xact.c

2022-09-15 Thread Japin Li


On Fri, 16 Sep 2022 at 11:11, Kyotaro Horiguchi  wrote:
> At Thu, 15 Sep 2022 22:38:01 +0800, Japin Li  wrote in 
>> 
>> Hi hacker,
>> 
>> Recently, I find there might be a typo in xact.c comments.  The comments
>> say "PG_PROC", however, it actually means "PGPROC" structure.  Since we
>> have pg_proc catalog, and use PG_PROC to reference the catalog [1], so,
>> we should use PGPROC to reference the structure.  Any thoughts?
>> 
>> [1] src/include/nodes/primnodes.h
>
> The patch seems to cover all occurrences of PG_PROC as PGPROC.
>
> I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
> quite confusing, too..
>
> regards.

Thanks for the review.  Attached is a new patch that replaces the upper-case
system catalog names with lower case [1].

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: missing indexes in indexlist with partitioned tables

2022-09-15 Thread David Rowley
On Wed, 3 Aug 2022 at 11:07, Arne Roland  wrote:
> Attached a rebased version of the patch.

Firstly, I agree that we should fix the issue of join removals not
working with partitioned tables.
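
As a minimal illustration of the join removal in question (schema made
up for this example):

    CREATE TABLE p (a int PRIMARY KEY) PARTITION BY RANGE (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (100);
    CREATE TABLE t (a int);

    -- Once the partitioned index is visible in indexlist, the planner
    -- can prove p.a unique and remove the left join entirely:
    EXPLAIN SELECT t.a FROM t LEFT JOIN p ON t.a = p.a;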

I had a quick look over this and the first thing that I thought was
the same as what Amit mentioned in:

On Tue, 25 Jan 2022 at 21:04, Amit Langote  wrote:
> Finally, I don't understand why we need a separate field to store
> indexes found in partitioned base relations.  AFAICS, nothing but the
> sites you are interested in (relation_has_unique_index_for() and
> rel_supports_distinctness()) would ever bother to look at a
> partitioned base relation's indexlist.  Do you think putting them into
> in indexlist might break something?

I kinda disagree with Alvaro's fix in 05fb5d661. I think indexlist is
the place to store these details.  That commit added the following
comment:

/*
* Ignore partitioned indexes, since they are not usable for
* queries.
*/

But hypothetical indexes aren't usable for queries either, yet they're
added to RelOptInfo.indexlist.

I think the patch should be changed so that the existing list is used
and we find another fix for the problems Alvaro fixed in 05fb5d661.
Unfortunately, there was no discussion marked on that commit message,
so it's not quite clear what the problem was. I'm unsure if there was
anything other than CLUSTER that was broken.  I see that cfdd03f45
added CLUSTER for partitioned tables in v15. I think the patch would
need to go over the usages of RelOptInfo.indexlist to make sure that
we don't need to add any further conditions to skip their usage for
partitioned tables.

I wrote the attached patch as I wanted to see what would break if we
did this. The only problem I got from running make check was in
get_actual_variable_range(), so I just changed that so it returns
false when the given rel is a partitioned table. I only quickly did
the changes to get_relation_info() and didn't give much thought to
what the can* bool flags should be set to.  I just mostly skipped all
that code because it was crashing on
relation->rd_tableam->scan_bitmap_next_block due to the rd_tableam
being NULL.

Also, just a friendly tip, Arne; I saw you named your patch 0006 for
version 6.  You'll see many 000n patches around the list, but those
are generally done with git format-patch. That number normally means
the patch in the patch series, not the version of the patch. I'm not
sure if it'll help any, but my workflow for writing new patches
against master tends to be:

git checkout master
git checkout -b some_feature_branch
# write some code
git commit -a
# maybe more code
git commit -a
git format-patch -v1 master

That'll create v1-0001 and v1-0002 patches. When I'm onto v2, I just
change the version number from -v1 to -v2.

David
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 5012bfe142..a2b7667fc7 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -156,10 +156,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	/*
 	 * Make list of indexes.  Ignore indexes on system catalogs if told to.
-	 * Don't bother with indexes for an inheritance parent, either.
+	 * Don't bother with indexes from traditional inheritance parents, either.
 	 */
-	if (inhparent ||
-		(IgnoreSystemIndexes && IsSystemRelation(relation)))
+	if ((inhparent && relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		|| (IgnoreSystemIndexes && IsSystemRelation(relation)))
 		hasindex = false;
 	else
 		hasindex = relation->rd_rel->relhasindex;
@@ -212,16 +212,6 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 			continue;
 		}
 
-		/*
-		 * Ignore partitioned indexes, since they are not usable for
-		 * queries.
-		 */
-		if (indexRelation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
-		{
-			index_close(indexRelation, NoLock);
-			continue;
-		}
-
 		/*
 		 * If the index is valid, but cannot yet be used, ignore it; but
 		 * mark the plan we are generating as transient. See
@@ -266,108 +256,127 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 		info->relam = indexRelation->rd_rel->relam;
 
-		/* We copy just the fields we need, not all of rd_indam */
-		amroutine = indexRelation->rd_indam;
-		info->amcanorderbyop = amroutine->amcanorderbyop;
-		info->amoptionalkey = amroutine->amoptionalkey;
-		info->amsearcharray =

Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-09-15 Thread Ken Kato

Regression is failing on all platforms; please correct that and
resubmit the patch.


Hi,

Thank you for the review!
I fixed it and am resubmitting the patch.

Regards,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1d9509a2f6..d58dbc36bc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4562,6 +4562,16 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
daemon
   
  
+
+ 
+  
+   last_vacuum_index_scans bigint
+  
+  
+   Number of index scans performed during the last vacuum
+   on this table
+  
+ 
 

   
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index dfbe37472f..c5b4405f4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -629,7 +629,8 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 		 rel->rd_rel->relisshared,
 		 Max(vacrel->new_live_tuples, 0),
 		 vacrel->recently_dead_tuples +
-		 vacrel->missed_dead_tuples);
+		 vacrel->missed_dead_tuples,
+		 vacrel->num_index_scans);
 	pgstat_progress_end_command();
 
 	if (instrument)
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 55f7ec79e0..f854576b20 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -675,7 +675,9 @@ CREATE VIEW pg_stat_all_tables AS
 pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
 pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
 pg_stat_get_analyze_count(C.oid) AS analyze_count,
-pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
+pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count,
+pg_stat_get_last_vacuum_index_scans(C.oid) AS last_vacuum_index_scans
+
 FROM pg_class C LEFT JOIN
  pg_index I ON C.oid = I.indrelid
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index a846d9ffb6..ffc9daf944 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -208,8 +208,8 @@ pgstat_drop_relation(Relation rel)
  * Report that the table was just vacuumed.
  */
 void
-pgstat_report_vacuum(Oid tableoid, bool shared,
-	 PgStat_Counter livetuples, PgStat_Counter deadtuples)
+pgstat_report_vacuum(Oid tableoid, bool shared, PgStat_Counter livetuples,
+	 PgStat_Counter deadtuples, PgStat_Counter num_index_scans)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
@@ -232,6 +232,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	tabentry->n_live_tuples = livetuples;
 	tabentry->n_dead_tuples = deadtuples;
+	tabentry->n_index_scans = num_index_scans;
 
 	/*
 	 * It is quite possible that a non-aggressive VACUUM ended up skipping
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index be15b4b2e5..d5fd34ee0e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -179,6 +179,20 @@ pg_stat_get_dead_tuples(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(result);
 }
 
+Datum
+pg_stat_get_last_vacuum_index_scans(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	int64		result;
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		result = 0;
+	else
+		result = tabentry->n_index_scans;
+
+	PG_RETURN_INT64(result);
+}
 
 Datum
 pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a07e737a33..36c6c53e65 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5331,6 +5331,10 @@
   proname => 'pg_stat_get_autoanalyze_count', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_autoanalyze_count' },
+{ oid => '3813', descr => 'statistics: last vacuum index scans for a table',
+  proname => 'pg_stat_get_last_vacuum_index_scans', provolatile => 's',
+  proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_last_vacuum_index_scans' },
 { oid => '1936', descr => 'statistics: currently active backend IDs',
   proname => 'pg_stat_get_backend_idset', prorows => '100', proretset => 't',
   provolatile => 's', proparallel => 'r', prorettype => 'int4',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ac28f813b4..89989c8c19 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -366,6 +366,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter n_live_tuples;
 	PgStat_Counter n_dead_tuples;
+	PgStat_Counter n_index_scans;
 	PgStat_Counter changes_since

Re: ICU for global collation

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova 
 wrote in 
> P.S. While working on the patch, I discovered that UTF8 encoding is
> always used for the ICU provider in initdb unless it is explicitly
> specified by the user:
> 
> if (!encoding && locale_provider == COLLPROVIDER_ICU)
>   encodingid = PG_UTF8;
> 
> IMO this creates additional errors for locales with other encodings:
> 
> $ initdb --locale de_DE.iso885915@euro --locale-provider icu
> --icu-locale de-DE
> ...
> initdb: error: encoding mismatch
> initdb: detail: The encoding you selected (UTF8) and the encoding that
> the selected locale uses (LATIN9) do not match. This would lead to
> misbehavior in various character string processing functions.
> initdb: hint: Rerun initdb and either do not specify an encoding
> explicitly, or choose a matching combination.
> 
> And ICU supports many encodings, see the contents of pg_enc2icu_tbl in
> encnames.c...

That seems to me the best default, since it fits almost all cases that
use ICU locales.

So, we need to specify the encoding explicitly in that case:

$ initdb --encoding iso-8859-15 --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE

However, I think this is hard to figure out from the documentation.

(I checked this using euc-jp [1] so it might be wrong..)

[1] initdb --encoding euc-jp --locale ja_JP.eucjp --locale-provider icu --icu-locale ja-x-icu

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: START_REPLICATION SLOT causing a crash in an assert build

2022-09-15 Thread Kyotaro Horiguchi
At Thu, 15 Sep 2022 11:15:12 -0500, Jaime Casanova 
 wrote in 
> It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside

Thanks for the info.  I reproduced it with make check.. stupid of me.

It was a thinko about the base address of reset_off.

So the attached doesn't crash.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 6224c498c2..ed3f3af4d9 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -263,6 +263,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.shared_size = sizeof(PgStatShared_Database),
 		.shared_data_off = offsetof(PgStatShared_Database, stats),
 		.shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats),
+		.reset_off = 0,
+		.reset_len = sizeof(((PgStatShared_Database *) 0)->stats),
 		.pending_size = sizeof(PgStat_StatDBEntry),
 
 		.flush_pending_cb = pgstat_database_flush_cb,
@@ -277,6 +279,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
 		.shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats),
+		.reset_off = 0,
+		.reset_len = sizeof(((PgStatShared_Relation *) 0)->stats),
 		.pending_size = sizeof(PgStat_TableStatus),
 
 		.flush_pending_cb = pgstat_relation_flush_cb,
@@ -291,6 +295,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
 		.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
+		.reset_off = 0,
+		.reset_len = sizeof(((PgStatShared_Function *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendFunctionEntry),
 
 		.flush_pending_cb = pgstat_function_flush_cb,
@@ -307,6 +313,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.shared_size = sizeof(PgStatShared_ReplSlot),
 		.shared_data_off = offsetof(PgStatShared_ReplSlot, stats),
 		.shared_data_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats),
+		/* reset doesn't wipe off slot name */
+		.reset_off = offsetof(PgStat_StatReplSlotEntry, spill_txns),
+		.reset_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats) -
+			offsetof(PgStat_StatReplSlotEntry, spill_txns),
 
 		.reset_timestamp_cb = pgstat_replslot_reset_timestamp_cb,
 		.to_serialized_name = pgstat_replslot_to_serialized_name_cb,
@@ -323,6 +333,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.shared_size = sizeof(PgStatShared_Subscription),
 		.shared_data_off = offsetof(PgStatShared_Subscription, stats),
 		.shared_data_len = sizeof(((PgStatShared_Subscription *) 0)->stats),
+		.reset_off = 0,
+		.reset_len = sizeof(((PgStatShared_Subscription *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendSubEntry),
 
 		.flush_pending_cb = pgstat_subscription_flush_cb,
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ac98918688..09a8c3873c 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -915,8 +915,9 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
-	memset(pgstat_get_entry_data(kind, header), 0,
-		   pgstat_get_entry_len(kind));
+	memset((char *) pgstat_get_entry_data(kind, header) +
+		   kind_info->reset_off, 0,
+		   kind_info->reset_len);
 
 	if (kind_info->reset_timestamp_cb)
 		kind_info->reset_timestamp_cb(header, ts);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 901d2041d6..144fe5814c 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -215,6 +215,13 @@ typedef struct PgStat_KindInfo
 	uint32		shared_data_off;
 	uint32		shared_data_len;
 
+	/*
+	 * The offset/size of the region to wipe out when the entry is reset.
+	 * reset_off is relative to shared_data_off.
+	 */
+	uint32		reset_off;
+	uint32		reset_len;
+
 	/*
 	 * The size of the pending data for this kind. E.g. how large
 	 * PgStat_EntryRef->pending is. Used for allocations.


Re: why can't a table be part of the same publication as its schema

2022-09-15 Thread Amit Kapila
On Thu, Sep 15, 2022 at 6:27 PM houzj.f...@fujitsu.com
 wrote:
>
> Attach the new version patch which added suggested restriction for column list
> and merged Vignesh's patch.
>

Few comments:

1.
 static void
-CheckPubRelationColumnList(List *tables, const char *queryString,
+CheckPubRelationColumnList(List *tables, bool publish_schema,
+const char *queryString,
 bool pubviaroot)

It is better to keep bool parameters together at the end.

2.
  /*
+ * Disallow using column list if any schema is in the publication.
+ */
+ if (publish_schema)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot use publication column list for relation \"%s.%s\"",
+get_namespace_name(RelationGetNamespace(pri->relation)),
+RelationGetRelationName(pri->relation)),
+ errdetail("Column list cannot be specified if any schema is part of
the publication or specified in the list."));

I think it would be better to explain why we disallow this case.

3.
+ if (!heap_attisnull(coltuple, Anum_pg_publication_rel_prattrs, NULL))
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot add schema to the publication"),
+ errdetail("Schema cannot be added if any table that specifies column
list is already part of the publication"));

A full stop is missing at the end of the errdetail message.

4. I have modified a few comments in the attached. Please check and if
you like the changes then please include those in the next version.

-- 
With Regards,
Amit Kapila.


change_comments_amit_1.patch
Description: Binary data


Re: Add last_vacuum_index_scans in pg_stat_all_tables

2022-09-15 Thread Fujii Masao




On 2022/09/16 13:23, Ken Kato wrote:

Regression is failing on all platforms; please correct that and
resubmit the patch.


Hi,

Thank you for the review!
I fixed it and am resubmitting the patch.


Could you tell me why the number of index scans should be tracked for
each table? Instead, isn't it enough to have one global counter, to
check whether the current setting of maintenance_work_mem is sufficient
or not? That is, I'm thinking of having something like a pg_stat_vacuum
view that reports, for example, the number of vacuum runs, the total
number of index scans, the maximum number of index scans by one
vacuum run, the number of cancellations of vacuum because of
lock conflicts, etc. If so, when these global counters are high or
increasing, we can infer that it may be worth tuning maintenance_work_mem.
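
In other words, something with a shape like this, where the view and
column names are only illustrative:

    SELECT vacuum_count,
           total_index_scans,
           max_index_scans_per_vacuum,
           vacuum_cancel_count
    FROM pg_stat_vacuum;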

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-15 Thread Masahiko Sawada
On Mon, Aug 15, 2022 at 10:39 PM John Naylor
 wrote:
>
> On Mon, Aug 15, 2022 at 12:39 PM Masahiko Sawada  
> wrote:
> >
> > On Fri, Jul 22, 2022 at 10:43 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jul 19, 2022 at 1:30 PM John Naylor
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Jul 19, 2022 at 9:11 AM Masahiko Sawada  
> > > > wrote:
> > > >
> > > > > I’d like to keep the first version simple. We can improve it and add
> > > > > more optimizations later. Using radix tree for vacuum TID storage
> > > > > would still be a big win comparing to using a flat array, even without
> > > > > all these optimizations. In terms of single-value leaves method, I'm
> > > > > also concerned about an extra pointer traversal and extra memory
> > > > > allocation. It's most flexible but multi-value leaves method is also
> > > > > flexible enough for many use cases. Using the single-value method
> > > > > seems to be too much as the first step for me.
> > > > >
> > > > > Overall, using 64-bit keys and 64-bit values would be a reasonable
> > > > > choice for me as the first step . It can cover wider use cases
> > > > > including vacuum TID use cases. And possibly it can cover use cases by
> > > > > combining a hash table or using tree of tree, for example.
> > > >
> > > > These two aspects would also bring it closer to Andres' prototype, 
> > > > which 1) makes review easier and 2) easier to preserve optimization 
> > > > work already done, so +1 from me.
> > >
> > > Thanks.
> > >
> > > I've updated the patch. It now implements 64-bit keys, 64-bit values,
> > > and the multi-value leaves method. I've tried to remove duplicated
> > > codes but we might find a better way to do that.
> > >
> >
> > With the recent changes related to simd, I'm going to split the patch
> > into at least two parts: introduce other simd optimized functions used
> > by the radix tree and the radix tree implementation. Particularly we
> > need two functions for radix tree: a function like pg_lfind32 but for
> > 8 bits integers and return the index, and a function that returns the
> > index of the first element that is >= key.
>
> I recommend looking at
>
> https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
>
> since I did the work just now for searching bytes and returning a
> bool, buth = and <=. Should be pretty close. Also, i believe if you
> left this for last as a possible refactoring, it might save some work.
> In any case, I'll take a look at the latest patch next month.

I've updated the radix tree patch. It's now separated into two patches.

0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find
better names) that are similar to the pg_lfind8() family but they
return the index of the key in the vector instead of true/false. The
patch includes regression tests.

0002 patch is the main radix tree implementation. I've removed some
duplicated code for node manipulation. For instance, since node-4,
node-16, and node-32 have a similar structure with different fanouts,
I introduced a common function for them.

In addition to two patches, I've attached the third patch. It's not
part of radix tree implementation but introduces a contrib module
bench_radix_tree, a tool for radix tree performance benchmarking. It
measures loading and lookup performance of both the radix tree and a
flat array.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 5d0115b068ecb01d791eab5f8a78a6d25b9cf45c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Wed, 14 Sep 2022 12:38:01 +
Subject: [PATCH v6 1/3] Support pg_lsearch8_eq and pg_lsearch8_ge

---
 src/include/port/pg_lfind.h   |  71 
 src/include/port/simd.h   | 155 +-
 .../test_lfind/expected/test_lfind.out|  12 ++
 .../modules/test_lfind/sql/test_lfind.sql |   2 +
 .../modules/test_lfind/test_lfind--1.0.sql|   8 +
 src/test/modules/test_lfind/test_lfind.c  | 139 
 6 files changed, 378 insertions(+), 9 deletions(-)

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index 0625cac6b5..583f204763 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -80,6 +80,77 @@ pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem)
 	return false;
 }
 
+/*
+ * pg_lsearch8
+ *
+ * Return the index of the element in 'base' that equals to 'key', otherwise return
+ * -1.
+ */
+static inline int
+pg_lsearch8(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector8) - 1);
+	Vector8		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector8))
+	{
+		int idx;
+
+		vector8_load(&chunk, &base[i]);
+		if ((idx = vector8_search_eq(chunk, key)) != -1)
+			return i + idx;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nele

Re: A question about wording in messages

2022-09-15 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 12:10:05 +1200, Thomas Munro  wrote 
in 
> On Wed, Sep 14, 2022 at 2:38 PM Tom Lane  wrote:
> > Kyotaro Horiguchi  writes:
> > > I saw the following message recently modified.
> > >> This controls the maximum distance we can read ahead in the WAL to 
> > >> prefetch referenced data blocks.
> > > Maybe the "we" means "PostgreSQL program and you" but I see it
> > > somewhat out of place.
> >
> > +1, I saw that today and thought it was outside our usual style.
> > The whole thing is awfully verbose for a GUC description, too.
> > Maybe
> >
> > "Maximum distance to read ahead in WAL to prefetch data blocks."
> 
> +1
> 
> For "we", I must have been distracted by code comment style.  For the
> extra useless verbiage, it's common for GUC description to begin "This
> control/affects/blah" like that, but I agree it's useless noise.
> 
> For the other cases, Amit's suggestion of 'server' seems sensible to me.

Thanks for the opinion. I'm fine with that, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 31df0a224d7d4d6cb619534f24940c4c3571cfd9 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 16 Sep 2022 15:07:21 +0900
Subject: [PATCH v2] Get rid of "we" from messages

It's not our usual style to use "we" in error messages.  Get rid of
such usages so that the messages sound more typical.  While fixing
them, also simplify a verbose message.
---
 src/backend/replication/pgoutput/pgoutput.c | 4 ++--
 src/backend/storage/file/fd.c   | 2 +-
 src/backend/utils/misc/guc_tables.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 62e0ffecd8..9c8faece10 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -441,13 +441,13 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 		if (data->protocol_version > LOGICALREP_PROTO_MAX_VERSION_NUM)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("client sent proto_version=%d but we only support protocol %d or lower",
+	 errmsg("client sent proto_version=%d but server only supports protocol %d or lower",
 			data->protocol_version, LOGICALREP_PROTO_MAX_VERSION_NUM)));
 
 		if (data->protocol_version < LOGICALREP_PROTO_MIN_VERSION_NUM)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("client sent proto_version=%d but we only support protocol %d or higher",
+	 errmsg("client sent proto_version=%d but server only supports protocol %d or higher",
 			data->protocol_version, LOGICALREP_PROTO_MIN_VERSION_NUM)));
 
 		if (data->publication_names == NIL)
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 20c3741aa1..073dab2be5 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -978,7 +978,7 @@ set_max_safe_fds(void)
 		ereport(FATAL,
 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
  errmsg("insufficient file descriptors available to start server process"),
- errdetail("System allows %d, we need at least %d.",
+ errdetail("System allows %d, server needs at least %d.",
 		   max_safe_fds + NUM_RESERVED_FDS,
 		   FD_MINFREE + NUM_RESERVED_FDS)));
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 550e95056c..5c340d471d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2591,7 +2591,7 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"wal_decode_buffer_size", PGC_POSTMASTER, WAL_RECOVERY,
 			gettext_noop("Buffer size for reading ahead in the WAL during recovery."),
-			gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced data blocks."),
+			gettext_noop("Maximum distance to read ahead in WAL to prefetch data blocks."),
 			GUC_UNIT_BYTE
 		},
 		&wal_decode_buffer_size,
-- 
2.31.1



Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-15 Thread Michael Paquier
On Wed, Sep 14, 2022 at 02:24:12PM +0530, Bharath Rupireddy wrote:
> I'm attaching the v4 patch that's rebased on to the latest HEAD.
> Please consider this for review.

I have been looking at this patch.

-   StringInfo  labelfile;
-   StringInfo  tblspc_map_file;
backup_manifest_info manifest;
+   BackupState backup_state;
You could just initialize the state here with a {0}.  That's simpler.

--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -380,6 +380,31 @@ GetRmgr(RmgrId rmid)
 }
 #endif
 
+/* Structure to hold backup state. */
+typedef struct BackupStateData
+{
Why is that in xlog_internal.h?  This header includes a lot of
declarations about the internals of WAL, but the backup state is not
that internal.  I'd like to think that we should begin separating the
backup-related routines into their own file, say a set of
xlogbackup.c/h in this case.  That's a split I have been wondering
about for some time now.  The internals of xlog.c for the start/stop
backups are tied to XLogCtlData, which makes such a switch more
complicated than it looks, so we can begin small and have the routines
to create, free and build the label file and the tablespace map in
this new file.

+   state->name = (char *) palloc0(len + 1);
+   memcpy(state->name, name, len);
Or just pstrdup()?
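
That is, the allocation and copy could presumably collapse to:

    state->name = pstrdup(name);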

+BackupState
+get_backup_state(const char *name)
+{
I would name this one create_backup_state() instead.

+void
+create_backup_content_str(BackupState state, bool forhistoryfile)
+{
This could be a build_backup_content().

It seems to me that there is no point in having the list of
tablespaces in BackupStateData?  This actually makes the code harder
to follow; see for example the changes with do_pg_backup_start(),
where the list of tablespaces may or may not be passed down as a
pointer in BackupStateData while BackupStateData is itself the first
argument of this routine.  These are independent from the label and
backup history file, as well.

We need to be careful about the file format (see read_backup_label()),
and create_backup_content_str() is careful about that, which is good.
Core does not care about the format of the backup history file, though
some community tools may.  I agree that what you are proposing here
makes the generation of these files easier to follow, but let's
document what forhistoryfile is here for, at least.  Saving the
backup label and history file strings in BackupState is a
confusing interface IMO.  It would be more intuitive to have the
backup state as input, and provide the string generated as output
depending on what we want from the backup state.
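
In other words, something shaped roughly like this, where the exact
signature is only a sketch of the suggestion:

    /* backup state in, generated file content out */
    extern char *build_backup_content(BackupState state, bool forhistoryfile);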

-   backup_started_in_recovery = RecoveryInProgress();
+   Assert(state != NULL);
+
+   in_recovery = RecoveryInProgress();
[...]
-   if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery)
+   if (state->started_in_recovery == true && in_recovery == false)

I would have kept the naming to backup_started_in_recovery here.  What
you are doing is logically right by relying on started_in_recovery to
check if recovery was running when the backup started, but this just
creates useless noise in the refactoring.

Something unrelated to your patch that I am noticing while scanning
the area is that we have always been lazy in freeing the label file
data allocated in TopMemoryContext when using the SQL functions if the
backup is aborted.  We are not talking about this much amount of
memory each time a backup is performed, but that could be a cause for
memory bloat in a server if the same session is used and backups keep
failing, as the data is freed only on a successful pg_backup_stop().
Base backups through the replication protocol don't care about that as
the code keeps around the same pointer for the whole duration of
perform_base_backup().  Trying to tie the cleanup of the label file
with the abort phase would be the cause of more complications with
do_pg_backup_stop(), and xlog.c has no need to know about that now.
Just a remark for later.
--
Michael


signature.asc
Description: PGP signature


Re: Switching XLog source from archive to streaming when primary available

2022-09-15 Thread Kyotaro Horiguchi
At Fri, 16 Sep 2022 09:15:58 +0530, Bharath Rupireddy 
 wrote in 
> On Thu, Sep 15, 2022 at 1:52 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 15 Sep 2022 10:28:12 +0530, Bharath Rupireddy 
> >  wrote in
> > > I'm attaching the v6 patch that's rebased on to the latest HEAD.
> > > Please consider this for review.
> >
> > Thanks for the new version!
> >
> > +#define StreamingReplRetryEnabled() \
> > +   (streaming_replication_retry_interval > 0 && \
> > +StandbyMode && \
> > +currentSource == XLOG_FROM_ARCHIVE)
> >
> > It seems to me a bit too complex..

In other words, it seems to me that the macro name doesn't reflect
the condition correctly.

> I don't think so, it just tells whether a standby is allowed to
> switch its source from archive to streaming.
> 
> > +   /* Save the timestamp at which we're switching to 
> > archive. */
> > +   if (StreamingReplRetryEnabled())
> > +   switched_to_archive_at = 
> > GetCurrentTimestamp();
> >
> > Anyway we are going to open a file just after this so
> > GetCurrentTimestamp() cannot cause a perceptible degradation.
> > Couldn't we do that unconditionally, to get rid of the macro?
> 
> Do we really need to do it unconditionally? I don't think so. And, we
> can't get rid of the macro, as we need to check for the current
> source, GUC and standby mode. When this feature is disabled, it
> mustn't execute any extra code IMO.

I don't particularly want to do that unconditionally.  I just wanted
to get rid of the macro at the usage site.  Even if the same
condition is used elsewhere, I think it is better to write out the
condition directly there.
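
To be concrete, at the call site I mean something like this (a sketch
reusing the quoted macro body):

	/* Save the timestamp at which we're switching to archive. */
	if (streaming_replication_retry_interval > 0 &&
		StandbyMode &&
		currentSource == XLOG_FROM_ARCHIVE)
		switched_to_archive_at = GetCurrentTimestamp();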

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PATCH] Simple code cleanup in tuplesort.c.

2022-09-15 Thread Richard Guo
On Wed, Jul 27, 2022 at 5:10 PM Xing Guo  wrote:

> The bounded heap sorting status flag is set twice in sort_bounded_heap()
> and tuplesort_performsort(). This patch helps remove one of them.
>

Revisiting this patch, I think it may be better to remove the setting
of the Tuplesort status from tuplesort_performsort() for the
TSS_BOUNDED case.  That keeps the heap manipulation routines,
make_bounded_heap and sort_bounded_heap, consistent in that each
updates the status accordingly inside the function.
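
Concretely, the TSS_BOUNDED arm of the switch in
tuplesort_performsort() would reduce to something like this (a
sketch; any other bookkeeping in that arm is elided):

		case TSS_BOUNDED:

			/*
			 * sort_bounded_heap() transforms the heap into a
			 * properly-sorted array and sets state->status to
			 * TSS_SORTEDINMEM itself, so no extra assignment is
			 * needed here.
			 */
			sort_bounded_heap(state);
			break;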

Also, would you please add it to the CF so we don't lose track of it?

Thanks
Richard


Re: Remove dead macro exec_subplan_get_plan

2022-09-15 Thread Richard Guo
On Tue, Sep 6, 2022 at 12:39 AM Zhang Mingli  wrote:

> Macro exec_subplan_get_plan is not used anymore.
> Attach a patch to remove it.
>

How about adding it to the CF so we don't lose track of it?

Thanks
Richard


Re: ICU for global collation

2022-09-15 Thread Marina Polyakova

On 2022-09-15 09:52, Kyotaro Horiguchi wrote:

If I executed initdb as follows, I would be told to specify the
--icu-locale option.


$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified


However, when I reran the command, it complained about the
incompatible encoding this time.  I think it's more user-friendly to
check for encoding compatibility before the check for the missing
--icu-locale option.

regards.


Continuing with the options checks: AFAICS the following checks in initdb

if (locale_provider == COLLPROVIDER_ICU)
{
if (!icu_locale)
pg_fatal("ICU locale must be specified");

/*
 * In supported builds, the ICU locale ID will be checked by the
 * backend during post-bootstrap initialization.
 */
#ifndef USE_ICU
pg_fatal("ICU is not supported in this build");
#endif
}

are executed at roughly the same point as their counterparts in
CREATE DATABASE, after getting all the necessary data from the
template database:


if (dblocprovider == COLLPROVIDER_ICU)
{
/*
 * This would happen if template0 uses the libc provider but the new
 * database uses icu.
 */
if (!dbiculocale)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 errmsg("ICU locale must be specified")));
}

if (dblocprovider == COLLPROVIDER_ICU)
check_icu_locale(dbiculocale);

But perhaps the check that --icu-locale cannot be specified unless
the icu locale provider is chosen should also be moved here?  Then
all these checks would be in one place, and the check would use the
provider from the template database (which could be icu), as in the
example and sketch below:


$ initdb --locale-provider icu --icu-locale en-US -D data &&
pg_ctl -D data -l logfile start &&
createdb --icu-locale ru-RU --template template0 mydb
...
createdb: error: database creation failed: ERROR:  ICU locale cannot be 
specified unless locale provider is ICU
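
A sketch of what the moved check might look like in createdb(), next
to the other provider checks (hypothetical placement; the message
matches the error shown above):

	if (dbiculocale && dblocprovider != COLLPROVIDER_ICU)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("ICU locale cannot be specified unless locale provider is ICU")));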


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Make ON_ERROR_STOP stop on shell script failure

2022-09-15 Thread bt22nakamorit

Hi,

"""\set ON_ERROR_STOP on""" stops any subsequent incoming query that 
comes after an error of an SQL, but does not stop after a shell script 
ran by """\! """ returning values other than 0, -1, or 
127, which suggests a failure in the result of the shell script.


For example, suppose that the following is an SQL file.
\set ON_ERROR_STOP on
SELECT 1;
\! false
SELECT 2;

The current design lets SELECT 2 run even though the shell command 
returns a value indicating a failure.


I think this behavior is rather unexpected since, based on the name 
"""ON_ERROR_STOP""", one may expect that shell command failures halt 
the subsequent instructions as well.
One clear solution is to let shell command failures stop subsequent 
queries just as SQL errors do currently.  Thoughts?


Tatsu

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index a141146e70..7445ca04ff 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4989,6 +4989,11 @@ do_shell(const char *command)
 		pg_log_error("\\!: failed");
 		return false;
 	}
+	else if (result != 0)
+	{
+		pg_log_error("command failed");
+		return false;
+	}
 	return true;
 }
 


missing PG_FREE_IF_COPY in textlike() and textnlike() ?

2022-09-15 Thread CK Tan
Hi Hackers,

I see that in the texteq() function, calls to DatumGetTextPP() are
followed by conditional calls to PG_FREE_IF_COPY(), e.g.

https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/varlena.c#L1792

   text *targ1 = DatumGetTextPP(arg1);
   text *targ2 = DatumGetTextPP(arg2);
   result = (memcmp(VARDATA_ANY(targ1), VARDATA_ANY(targ2), len1 -
VARHDRSZ) == 0);
   PG_FREE_IF_COPY(targ1, 0);
   PG_FREE_IF_COPY(targ2, 1);

However, in textlike(), PG_FREE_IF_COPY calls are missing.

https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/like.c#L283

Is this a memory leak bug?
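
If it is, the fix would presumably mirror texteq(), something like
this (a sketch; the variable names follow like.c, and the matching
call itself is unchanged and elided):

   text   *str = PG_GETARG_TEXT_PP(0);
   text   *pat = PG_GETARG_TEXT_PP(1);
   bool    result;

   /* ... existing matching logic sets result ... */

   PG_FREE_IF_COPY(str, 0);
   PG_FREE_IF_COPY(pat, 1);
   PG_RETURN_BOOL(result);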

Regards,
-cktan