Re: Question about savepoint level?

2022-10-24 Thread Japin Li

On Mon, 24 Oct 2022 at 12:19, Japin Li  wrote:
> Hi, hackers
>
> The TransactionStateData has a savepointLevel field; however, I do not
> understand what the savepoint level is. It seems all savepoints have the
> same savepointLevel, and I want to know how the savepoint level changes.

I tried to remove the savepointLevel, and it seems harmless.  Any thoughts?

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

From 1e5c015efc44bcf2bc93365e99740deb618eebfe Mon Sep 17 00:00:00 2001
From: Japin Li 
Date: Mon, 24 Oct 2022 14:54:03 +0800
Subject: [PATCH v1] Remove useless savepoint level

---
 src/backend/access/transam/xact.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index fd5103a78e..e8a90a3a30 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -190,7 +190,6 @@ typedef struct TransactionStateData
 	FullTransactionId fullTransactionId;	/* my FullTransactionId */
 	SubTransactionId subTransactionId;	/* my subxact ID */
 	char	   *name;			/* savepoint name, if any */
-	int			savepointLevel; /* savepoint level */
 	TransState	state;			/* low-level state */
 	TBlockState blockState;		/* high-level state */
 	int			nestingLevel;	/* transaction nesting depth */
@@ -3234,12 +3233,10 @@ CommitTransactionCommand(void)
 		case TBLOCK_SUBRESTART:
 			{
 char	   *name;
-int			savepointLevel;
 
 /* save name and keep Cleanup from freeing it */
 name = s->name;
 s->name = NULL;
-savepointLevel = s->savepointLevel;
 
 AbortSubTransaction();
 CleanupSubTransaction();
@@ -3247,7 +3244,6 @@ CommitTransactionCommand(void)
 DefineSavepoint(NULL);
 s = CurrentTransactionState;	/* changed by push */
 s->name = name;
-s->savepointLevel = savepointLevel;
 
 /* This is the same as TBLOCK_SUBBEGIN case */
 AssertState(s->blockState == TBLOCK_SUBBEGIN);
@@ -3263,19 +3259,16 @@ CommitTransactionCommand(void)
 		case TBLOCK_SUBABORT_RESTART:
 			{
 char	   *name;
-int			savepointLevel;
 
 /* save name and keep Cleanup from freeing it */
 name = s->name;
 s->name = NULL;
-savepointLevel = s->savepointLevel;
 
 CleanupSubTransaction();
 
 DefineSavepoint(NULL);
 s = CurrentTransactionState;	/* changed by push */
 s->name = name;
-s->savepointLevel = savepointLevel;
 
 /* This is the same as TBLOCK_SUBBEGIN case */
 AssertState(s->blockState == TBLOCK_SUBBEGIN);
@@ -4352,11 +4345,6 @@ ReleaseSavepoint(const char *name)
 (errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
  errmsg("savepoint \"%s\" does not exist", name)));
 
-	/* disallow crossing savepoint level boundaries */
-	if (target->savepointLevel != s->savepointLevel)
-		ereport(ERROR,
-(errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
- errmsg("savepoint \"%s\" does not exist within current savepoint level", name)));
 
 	/*
 	 * Mark "commit pending" all subtransactions up to the target
@@ -4461,11 +4449,6 @@ RollbackToSavepoint(const char *name)
 (errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
  errmsg("savepoint \"%s\" does not exist", name)));
 
-	/* disallow crossing savepoint level boundaries */
-	if (target->savepointLevel != s->savepointLevel)
-		ereport(ERROR,
-(errcode(ERRCODE_S_E_INVALID_SPECIFICATION),
- errmsg("savepoint \"%s\" does not exist within current savepoint level", name)));
 
 	/*
 	 * Mark "abort pending" all subtransactions up to the target
@@ -5253,7 +5236,6 @@ PushTransaction(void)
 	s->parent = p;
 	s->nestingLevel = p->nestingLevel + 1;
 	s->gucNestLevel = NewGUCNestLevel();
-	s->savepointLevel = p->savepointLevel;
 	s->state = TRANS_DEFAULT;
 	s->blockState = TBLOCK_SUBBEGIN;
 	GetUserIdAndSecContext(&s->prevUser, &s->prevSecContext);
-- 
2.25.1



Re: Improve tab completion for ALTER STATISTICS

2022-10-24 Thread Michael Paquier
On Wed, Oct 19, 2022 at 04:06:51PM +0530, vignesh C wrote:
> I noticed that the tab completion for ALTER STATISTICS .. SET was not
> handled. The attached patch displays SCHEMA and STATISTICS for tab
> completion of ALTER STATISTICS name SET.

Indeed, it is a bit strange as we would get a list of settable
parameters once the completion up to SET is done, rather than
STATISTICS and SCHEMA.  Your patch looks fine, so applied.  Thanks!
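
For reference, the two forms that now complete after ALTER STATISTICS name SET
are along these lines (a sketch with made-up object names):

    ALTER STATISTICS stats_ab SET SCHEMA other_schema;
    ALTER STATISTICS stats_ab SET STATISTICS 1000;
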
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] minor bug fix for pg_dump --clean

2022-10-24 Thread Frédéric Yhuel

On 10/24/22 03:01, Tom Lane wrote:

Frédéric Yhuel  writes:

When using pg_dump (or pg_restore) with option "--clean", there is some
SQL code to drop every object at the beginning.


Yup ...


The DROP statement for a view involving circular dependencies is:
CREATE OR REPLACE VIEW [...]
(see commit message of d8c05aff for a much better explanation)
If the view is not in the "public" schema, and the target database is
empty, this statement fails, because the schema hasn't been created yet.
The attached patches are a TAP test which can be used to reproduce the
bug, and a proposed fix. They apply to the master branch.


I am disinclined to accept this as a valid bug, because there's never
been any guarantee that a --clean script would execute error-free in
a database that doesn't match what the source database contains.

(The pg_dump documentation used to say that in so many words.
I see that whoever added the --if-exists option was under the
fond illusion that that fixes all cases, which it surely does not.
We need to back off the promises a bit there.)

An example of a case that won't execute error-free is if the view
having a circular dependency includes a column of a non-built-in
data type.  If you try to run that in an empty database, you'll
get an error from the CREATE OR REPLACE VIEW's reference to that
data type.  For instance, if I adjust your test case to make
the "payload" column be of type hstore, I get something like

psql:dumpresult.sql:22: ERROR:  type "public.hstore" does not exist
LINE 4: NULL::public.hstore AS payload;
   ^

The same type of failure occurs for user-defined functions and
operators that use a non-built-in type, and I'm sure there are
more cases in the same vein.  But it gets *really* messy if
the target database isn't completely empty, but contains objects
with different properties than the dump script expects; for example,
if the view being discussed here exists with a different column set
than the script thinks, or if the dependency chains aren't all the
same.
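
For reference, the failing statement in the --clean output looks roughly like
this (a sketch based on the error above; the schema name is made up):

    CREATE OR REPLACE VIEW s1.v AS
     SELECT NULL::integer AS id,
        NULL::public.hstore AS payload;

On an empty target database this fails because neither the schema s1 nor the
type public.hstore exists at that point in the script.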

If this fix were cleaner I might be inclined to accept it anyway,
but it's not very clean at all --- for example, it's far from
obvious to me what are the side-effects of changing the filter
in RestoreArchive like that.  Nor am I sure that the schema
you want to create is guaranteed to get dropped again later in
every use-case.



Hi Tom, Viktoria,

Thank you for your review Viktoria!

Thank you for this detailed explanation, Tom! I didn't have great hope 
for this patch. I thought that the TAP test could be accepted, but now I 
can see that it is clearly useless.




So I think mainly what we ought to do here is to adjust the
documentation to make it clearer that --clean is not guaranteed
to work without errors unless the target database has the same
set of objects as the source.  --if-exists can reduce the set
of error cases, but not eliminate it.  Possibly we should be
more enthusiastic about recommending --create --clean (ie,
drop and recreate the whole database) instead.



I believe a documentation patch would be useful, indeed.

Best regards,
Frédéric




Re: Allow file inclusion in pg_hba and pg_ident files

2022-10-24 Thread Michael Paquier
On Mon, Oct 24, 2022 at 01:33:12PM +0800, Julien Rouhaud wrote:
> v12 attached, fixing multiple conflicts with recent activity.

 typedef struct TokenizedAuthLine
 {
List   *fields; /* List of lists of AuthTokens */
+   char   *file_name;  /* File name */

Hmm.  While putting my eyes on this patch set for the first time in a
few months (sorry!), the addition of file_name to TokenizedAuthLine in
0002 stands out.  This impacts all the macros used for the validation
of the ident and HBA lines, leading as well to a lot of bloat in the
patch with patterns like that in 20~25 places:
 errcontext("line %d of configuration file \"%s\"", \
-   line_num, IdentFileName))); \
+   line_num, file_name))); \
[...]
 errcontext("line %d of configuration file \"%s\"",
-   line_num, HbaFileName)));
+   line_num, file_name)));

Do you think that it would make sense to split that into its own
patch?  That would move the code toward fewer references to HbaFileName
and IdentFileName, which is one step toward what we want for this
thread.  This tokenization of HBA and ident files is already shared,
so moving this knowledge into TokenizedAuthLine looks helpful
independently of the rest.
--
Michael


signature.asc
Description: PGP signature


Re: Move backup-related code to xlogbackup.c/.h

2022-10-24 Thread Michael Paquier
On Wed, Oct 19, 2022 at 09:07:04PM +0530, Bharath Rupireddy wrote:
> XLogBackupResetRunning() seemed better. +1 for above function names.

I see what you are doing here.  XLogCtl would still live in xlog.c,
but we want to have functions that are able to manipulate some of its
fields.  I am not sure I like that much because it introduces a
circular dependency between xlog.c and xlogbackup.c.  As of HEAD,
xlog.c calls build_backup_content() from xlogbackup.c, which is fine
as xlog.c is kind of a central piece that feeds on the insert and
recovery pieces.  However your patch makes some code paths of
xlogbackup.c call routines from xlog.c, and I don't think that we
should do that.

> I'm okay either way.
> 
> Please see the attached v8 patch set.

Among all that, CleanupBackupHistory() is different, though it still has a
dependency on some of the archiving pieces.
--
Michael


signature.asc
Description: PGP signature


PGDOCS - Logical replication GUCs - added some xrefs

2022-10-24 Thread Peter Smith
Hi hackers.

There is a docs Logical Replication section "31.10 Configuration
Settings" [1] which describes some logical replication GUCs, and
details on how they interact with each other and how to take that into
account when setting their values.

There is another docs Server Configuration section "20.6 Replication"
[2] which lists the replication-related GUC parameters, and what they
are for.

Currently AFAIK those two pages are unconnected, but I felt it might
be helpful if some of the parameters in the list [2] had xref links to
the additional logical replication configuration information [1]. PSA
a patch to do that.

~~

Meanwhile, I also suspect that the main blurb at the top of [1] is not
entirely correct... it says "These settings control the behaviour of
the built-in streaming replication feature", although some of the GUCs
mentioned later in this section are clearly for "logical replication".

Thoughts?

--
[1] 31.10 Configuration Settings -
https://www.postgresql.org/docs/current/logical-replication-config.html
[2] 20.6 Replication -
https://www.postgresql.org/docs/current/runtime-config-replication.html

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Logical-replication-GUCs-added-some-docs-xrefs.patch
Description: Binary data


Re: Allow file inclusion in pg_hba and pg_ident files

2022-10-24 Thread Julien Rouhaud
Hi,

On Mon, Oct 24, 2022 at 04:13:51PM +0900, Michael Paquier wrote:
> On Mon, Oct 24, 2022 at 01:33:12PM +0800, Julien Rouhaud wrote:
> > v12 attached, fixing multiple conflicts with recent activity.
>
>  typedef struct TokenizedAuthLine
>  {
> List   *fields; /* List of lists of AuthTokens */
> +   char   *file_name;  /* File name */
>
> Hmm.  While putting my eyes on this patch set for the first time in a
> few months (sorry!), the addition of file_name to TokenizedAuthLine in
> 0002 stands out.  This impacts all the macros used for the validation
> of the ident and HBA lines, leading as well to a lot of bloat in the
> patch with patterns like that in 20~25 places:
>  errcontext("line %d of configuration file \"%s\"", \
> -   line_num, IdentFileName))); \
> +   line_num, file_name))); \
> [...]
>  errcontext("line %d of configuration file \"%s\"",
> -   line_num, HbaFileName)));
> +   line_num, file_name)));
>
> Do you think that it would make sense to split that into its own
> patch?  That would move the code toward fewer references to HbaFileName
> and IdentFileName, which is one step toward what we want for this
> thread.  This tokenization of HBA and ident files is already shared,
> so moving this knowledge into TokenizedAuthLine looks helpful
> independently of the rest.

It would also require bringing in HbaLine->sourcefile.  I'm afraid it would be
weird to introduce such a refactoring in a separate commit just to pass a
constant down multiple levels of indirection, as all the macros will remain
specific to either hba or ident anyway.

I agree that there are quite a lot of s/XXXFileName/file_name/, but those
aren't complicated, and keeping them in the same commit makes it easy to
validate that none has been forgotten since the regression tests covering those
messages are in that commit too.

And of course, while double-checking that none was forgotten, I realized that
I had missed the new regcomp_auth_token(), which introduces a couple of new
usages of HbaFileName.




Re: effective_multixact_freeze_max_age issue

2022-10-24 Thread Anton A. Melnikov

Hello!

On 18.10.2022 20:56, Peter Geoghegan wrote:
 

The term "removable cutoff" is how VACUUM VERBOSE reports OldestXmin.
I think that it's good to use the same terminology here.


Thanks for the explanation! It was exactly this term that confused me at
first. Sure, using the same terminology makes it all easier to understand.




Could you clarify this point, please? I would be very grateful.


While this WARNING is triggered when a threshold controlled by
autovacuum_freeze_max_age is crossed, it's not just a problem with
freezing. It's convenient to use autovacuum_freeze_max_age to
represent "a wildly excessive number of XIDs for OldestXmin to be held
back by, no matter what". In practice it is usually already a big
problem when OldestXmin is held back by far fewer XIDs than that, but
it's hard to reason about when exactly we need to consider that a
problem. However, we can at least be 100% sure of real problems when
OldestXmin age reaches autovacuum_freeze_max_age. There is no longer
any doubt that we need to warn the user, since antiwraparound
autovacuum cannot work as designed at that point. But the WARNING is
nevertheless not primarily (or not exclusively) about not being able
to freeze. It's also about not being able to remove bloat.

Freezing can be thought of as roughly the opposite process to removing
dead tuples deleted by now committed transactions. OldestXmin is the
cutoff both for removing dead tuples (which we want to get rid of
immediately), and freezing live all-visible tuples (which we want to
keep long term). FreezeLimit is usually 50 million XIDs before
OldestXmin (the freeze_min_age default), but that's just how we
implement lazy freezing, which is just an optimization.



That's clear. Thanks a lot!


As a variant, maybe split these checks for the freeze cutoffs and the oldest
xmins for clarity?
The patch attached tries to do this.


I don't think that this is an improvement. For one thing the
FreezeLimit cutoff will have been held back (relative to nextXID-wise
table age) by more than the freeze_min_age setting for a long time
before this WARNING is hit -- so we're not going to show the WARNING
in most cases where freeze_min_age has been completely ignored (it
must be ignored in extreme cases because FreezeLimit must always be <=
OldestXmin).



Also, the proposed new WARNING is only seen when we're
bound to also see the existing OldestXmin WARNING already. Why have 2
WARNINGs about exactly the same problem?


I didn't understand this point.

If the FreezeLimit is always older than OldestXmin or equal to it according to:


FreezeLimit is usually 50 million XIDs before
OldestXmin (the freeze_min_age default),
 
can't there be a situation like this?


                        | autovacuum_freeze_max_age|
past <-----+------------+---------------+----------+---> future
      FreezeLimit safeOldestXmin   oldestXmin   nextXID
           |<----- freeze_min_age ----->|

In that case the existing OldestXmin WARNING will not fire while the new one
will. As the FreezeLimit is only an optimization, it would probably be not a
WARNING but a NOTICE message, issued before the OldestXmin WARNING and possible
real problems in the future. Maybe it could be useful in such a form?

With best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Pluggable toaster

2022-10-24 Thread Aleksander Alekseev
Hi Nikita,

I don't argue with most of what you say. I am just pointing out the
reason why the chosen approach "N TOASTers x M TableAMs" will not
work:

> Don't you think that this is an arguable design decision? Basically
> all we know about the underlying TableAM is that it stores tuples
> _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> it even has any sort of pages, whether they are fixed in size or not,
> whether it uses shared buffers, etc. It may not even require TOAST.
> [...]

Also I completely agree with:

> Implementing another Table AM just to implement another TOAST strategy seems 
> too
> much, the TAM API is very heavy and complex, and you would have to add it as 
> a contrib.

This is what I meant above when talking about the framework for
simplifying this task:

> It looks like the idea should be actually turned inside out. I.e. what
> would be nice to have is some sort of _framework_ that helps TableAM
> authors to implement TOAST (alternatively, the rest of the TableAM
> except for TOAST) if the TableAM is similar to the default one.

From the user perspective it's much easier to think about one entity -
TableAM, and choosing from heapam_with_default_toast and
heapam_with_different_toast.

From the extension implementer's point of view, creating TableAMs is a
difficult task. This is what the framework should address. Ideally the
interface should be as simple as:

CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other
arguments, in the future...)

Where the extension author should be worried only about an alternative
TOAST implementation.

I think at some point such a framework may address at least one more
issue we have - an inability to change the page size on the table
level. As it was shown by Tomas Vondra [1] the default 8 KB page size
can be suboptimal depending on the load. So it would be nice if the
user could change it without rebuilding PostgreSQL. Naturally this is
out of scope of this particular patchset. I just wanted to point out
opportunities we have here.

[1]: 
https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com

--
Best regards,
Aleksander Alekseev




Re: Question about savepoint level?

2022-10-24 Thread Richard Guo
On Mon, Oct 24, 2022 at 3:00 PM Japin Li  wrote:

> On Mon, 24 Oct 2022 at 12:19, Japin Li  wrote:
> > The TransactionStateData has a savepointLevel field; however, I do not
> > understand what the savepoint level is. It seems all savepoints have the
> > same savepointLevel, and I want to know how the savepoint level changes.
>
> I tried to remove the savepointLevel, and it seems harmless.  Any thoughts?


ISTM the savepointLevel always remains the same as what is in
TopTransactionStateData after looking at the code. Now I also get
confused. Maybe what we want is nestingLevel?

Thanks
Richard


Re: Crash after a call to pg_backup_start()

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-24, Michael Paquier wrote:

> On the contrary, it seems to me that putting the assertion within the
> if() block makes the assertion weaker, because we would never check
> for an incorrect state after do_pg_abort_backup() is registered (aka
> any pg_backup_start() call) when not entering in this if() block.

Reading it again, I agree with your conclusion, so I'll push as you
proposed with some extra tests, after they complete running.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"




pg_dump: Refactor code that constructs ALTER ... OWNER TO commands

2022-10-24 Thread Peter Eisentraut
Avoid having to list all the possible object types twice.  Instead, only 
_getObjectDescription() needs to know about specific object types.  It 
communicates back to _printTocEntry() whether an owner is to be set.


In passing, remove the logic to use ALTER TABLE to set the owner of 
views and sequences.  This is no longer necessary.  Furthermore, if 
pg_dump doesn't recognize the object type, this is now a fatal error, 
not a warning.

From ef05e1dcecbdd51974c2dad0c552b314a7c00f4c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 24 Oct 2022 11:46:37 +0200
Subject: [PATCH] pg_dump: Refactor code that constructs ALTER ... OWNER TO
 commands

Avoid having to list all the possible object types twice.  Instead,
only _getObjectDescription() needs to know about specific object
types.  It communicates back to _printTocEntry() whether an owner is
to be set.

In passing, remove the logic to use ALTER TABLE to set the owner of
views and sequences.  This is no longer necessary.  Furthermore, if
pg_dump doesn't recognize the object type, this is now a fatal error,
not a warning.
---
 src/bin/pg_dump/pg_backup_archiver.c | 130 ++-
 1 file changed, 47 insertions(+), 83 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 233198afc0c9..f39c0fa36fdc 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -72,7 +72,7 @@ typedef struct _parallelReadyList
 static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
   const int compression, bool dosync, ArchiveMode mode,
   SetupWorkerPtrType setupWorkerPtr);
-static void _getObjectDescription(PQExpBuffer buf, TocEntry *te);
+static void _getObjectDescription(PQExpBuffer buf, const TocEntry *te);
 static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData);
 static char *sanitize_line(const char *str, bool want_hyphen);
 static void _doSetFixedOutputState(ArchiveHandle *AH);
@@ -3398,27 +3398,27 @@ _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
  * Extract an object description for a TOC entry, and append it to buf.
  *
  * This is used for ALTER ... OWNER TO.
+ *
+ * If the object type has no owner, do nothing.
  */
 static void
-_getObjectDescription(PQExpBuffer buf, TocEntry *te)
+_getObjectDescription(PQExpBuffer buf, const TocEntry *te)
 {
const char *type = te->desc;
 
-   /* Use ALTER TABLE for views and sequences */
-   if (strcmp(type, "VIEW") == 0 || strcmp(type, "SEQUENCE") == 0 ||
-   strcmp(type, "MATERIALIZED VIEW") == 0)
-   type = "TABLE";
-
/* objects that don't require special decoration */
if (strcmp(type, "COLLATION") == 0 ||
strcmp(type, "CONVERSION") == 0 ||
strcmp(type, "DOMAIN") == 0 ||
-   strcmp(type, "TABLE") == 0 ||
-   strcmp(type, "TYPE") == 0 ||
strcmp(type, "FOREIGN TABLE") == 0 ||
+   strcmp(type, "MATERIALIZED VIEW") == 0 ||
+   strcmp(type, "SEQUENCE") == 0 ||
+   strcmp(type, "STATISTICS") == 0 ||
+   strcmp(type, "TABLE") == 0 ||
strcmp(type, "TEXT SEARCH DICTIONARY") == 0 ||
strcmp(type, "TEXT SEARCH CONFIGURATION") == 0 ||
-   strcmp(type, "STATISTICS") == 0 ||
+   strcmp(type, "TYPE") == 0 ||
+   strcmp(type, "VIEW") == 0 ||
/* non-schema-specified objects */
strcmp(type, "DATABASE") == 0 ||
strcmp(type, "PROCEDURAL LANGUAGE") == 0 ||
@@ -3427,33 +3427,28 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te)
strcmp(type, "FOREIGN DATA WRAPPER") == 0 ||
strcmp(type, "SERVER") == 0 ||
strcmp(type, "PUBLICATION") == 0 ||
-   strcmp(type, "SUBSCRIPTION") == 0 ||
-   strcmp(type, "USER MAPPING") == 0)
+   strcmp(type, "SUBSCRIPTION") == 0)
{
appendPQExpBuffer(buf, "%s ", type);
if (te->namespace && *te->namespace)
appendPQExpBuffer(buf, "%s.", fmtId(te->namespace));
appendPQExpBufferStr(buf, fmtId(te->tag));
-   return;
}
-
/* BLOBs just have a name, but it's numeric so must not use fmtId */
-   if (strcmp(type, "BLOB") == 0)
+   else if (strcmp(type, "BLOB") == 0)
{
appendPQExpBuffer(buf, "LARGE OBJECT %s", te->tag);
-   return;
}
-
/*
 * These object types require additional decoration.  Fortunately, the
 * information needed is exactly what's in the DROP command.
 */
-   if (strcmp(type, "AGGREGATE") == 0 ||
-   strcmp(type, "FUNCTION") == 0 ||
-   strcmp(type, "OPERATO

Re: Reducing the chunk header sizes on all memory context types

2022-10-24 Thread John Naylor
On Thu, Oct 20, 2022 at 1:55 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-11 10:21:17 +0700, John Naylor wrote:
> > On Tue, Oct 11, 2022 at 5:31 AM David Rowley 
wrote:
> > >
> > > The proposed patches in [1] do aim to make additional usages of the
> > > slab allocator, and I have a feeling that we'll want to fix the
> > > performance of slab.c before those. Perhaps the Asserts are a better
> > > option if we're to get the proposed radix tree implementation.
> >
> > Going by [1], that use case is not actually a natural fit for slab because
> > of memory fragmentation.
> >
> > [1]
> > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
>
> Not so sure about that - IIRC I made one slab for each different size class,
> which seemed to work well and suit slab well?

If that's the case, then great! The linked message didn't give me that
impression, but I won't worry about it.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Question about savepoint level?

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-24, Richard Guo wrote:

> On Mon, Oct 24, 2022 at 3:00 PM Japin Li  wrote:
> 
> > I tried to remove the savepointLevel, and it seems harmless.  Any thoughts?
> 
> ISTM the savepointLevel always remains the same as what is in
> TopTransactionStateData after looking at the code. Now I also get
> confused. Maybe what we want is nestingLevel?

This has already been discussed:
https://postgr.es/m/1317297307-sup-7...@alvh.no-ip.org
Now that we have transaction-controlling procedures, I think the next
step is to add the SQL-standard feature that allows savepoint level
control for them, which would make the savepointLevel no longer dead
code.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-24 Thread Drouvot, Bertrand

Hi,

On 10/24/22 5:34 AM, Michael Paquier wrote:

On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:

On 10/21/22 2:58 AM, Michael Paquier wrote:


I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists.  The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes).  The CI and all my machines were green, and
the test coverage looked sufficient.  So, applied. 

Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Question about savepoint level?

2022-10-24 Thread Japin Li


On Mon, 24 Oct 2022 at 17:56, Alvaro Herrera  wrote:
> This has already been discussed:
> https://postgr.es/m/1317297307-sup-7...@alvh.no-ip.org

Sorry for my lazy search.

> Now that we have transaction-controlling procedures, I think the next
> step is to add the SQL-standard feature that allows savepoint level
> control for them, which would make the savepointLevel no longer dead
> code.

So the savepoint level is used for CREATE PROCEDURE ... OLD/NEW SAVEPOINT LEVEL
syntax [1], right?

[1] https://www.ibm.com/docs/en/db2/10.1.0?topic=statements-create-procedure-sql

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Some comments that should've covered MERGE

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-19, Richard Guo wrote:

> Hi hackers,
> 
> I happened to notice $subject. Attach a trivial patch for that.

Thanks, applied.  I did change the comment atop setTargetTable, which I
thought could use a little bit more detail on what is happening, and
also in its callsite in transformMergeStmt.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)




Re: Question about savepoint level?

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-24, Japin Li wrote:

> On Mon, 24 Oct 2022 at 17:56, Alvaro Herrera  wrote:

> > Now that we have transaction-controlling procedures, I think the next
> > step is to add the SQL-standard feature that allows savepoint level
> > control for them, which would make the savepointLevel no longer dead
> > code.
> 
> So the savepoint level is used for CREATE PROCEDURE ... OLD/NEW SAVEPOINT 
> LEVEL
> syntax [1], right?
> 
> [1] 
> https://www.ibm.com/docs/en/db2/10.1.0?topic=statements-create-procedure-sql

Yeah, that's what I understand.  The default behavior is the current
behavior (OLD SAVEPOINT LEVEL).  In a procedure that specifies NEW
SAVEPOINT LEVEL, trying to roll back to a savepoint that was defined before
the procedure was called is an error, which sounds like a useful protection.
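
For illustration, a rough sketch in the DB2-style syntax from the link above
(this is not PostgreSQL syntax today, and the names are made up):

    CREATE PROCEDURE p ()
      LANGUAGE SQL
      NEW SAVEPOINT LEVEL
    BEGIN
      -- savepoints created here live at the procedure's own level;
      -- ROLLBACK TO SAVEPOINT for a savepoint defined by the caller is an error
      SAVEPOINT sp1 ON ROLLBACK RETAIN CURSORS;
    END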

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)




Re: Mingw task for Cirrus CI

2022-10-24 Thread Melih Mutlu
Hi,


> > +
> > +  on_failure:
> > +<<: *on_failure_meson
> > +cores_script: |
> > +  %BASH_EXE% -lc "cd build src/tools/ci/cores_backtrace.sh msys
> build/tmp_install"
> > +
>
> This is wrong - it should just archive the same files that the current
> windows
> task does.
>

Changed it to use the on_failure from the other windows task.


> Other than that, I think this is basically ready?
>

If you say so, then I think it's ready.

Best,
Melih


v10-0001-Added-Windows-with-MinGW-environment-in-Cirrus-CI.patch
Description: Binary data


Re: Pluggable toaster

2022-10-24 Thread Nikita Malakhov
Hi!

>I don't argue with most of what you say. I am just pointing out the
>reason why the chosen approach "N TOASTers x M TableAMs" will not
>work:

We assume that the TAM used by a custom Toaster works as it should, and
leave TAM internals to that TAM's developer - say, we do not want to
change the internals of Heap AM.

We don't want to create some kind of silver bullet. There are already existing
and widely-known (from production environments) problems with TOAST
mechanics, and we suggest a not-too-complex way to solve them.

As I mentioned before, Pluggable TOAST does not change Heap AM, and it is
not intended to change TAMs.

>This is what I meant above when talking about the framework for
>simplifying this task:

That's a kind of generalization of custom TOAST implementations. It is a very
good intention, but keep in mind that different kinds of data require very
different approaches to external storage - say, JSON TOAST works with
maps of keys and values, while the super binary object (experimental name) does
not work with the internals of TOASTed data except for searching. But we
thought about that too, and reusable code resides in the toast_internals.c
source - any custom Toaster working with Heap could use its insert, update
and fetch methods, but deal with the data on its own.

Even with the general framework there must be a common interface which
would be the entry point for those custom methods developed with the
framework. That's what the TOAST API is - just an interface that all custom
TOAST implementations use to have a common entry point from any TAM,
with syntax support to plug in custom TOAST implementations from SQL.
No less, but no more.

Moreover, our patches show that even the Generic (default) TOAST
implementation could still be left as-is, without the necessity of routing it
via our API, though that is logically wrong because a common API is meant to
be common for all TOAST implementations without exception.

Have I answered your question? Please don't hesitate to point out any unclear
parts, I'd be glad to explain them.

The main idea in the TOAST API is very elegant and light, and it is designed
similarly to Pluggable Storage (Table AM API).


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Testing DDL Deparser

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-20, Runqi Tian wrote:

> My question regarding subcommand is actually on commands other than
> ALTER TABLE. Let me give an example (You can also find this example in
> the patch), when a command like
> 
> CREATE SCHEMA element_test CREATE TABLE foo (id int)
> 
> is caught by the ddl_command_end trigger, the function
> pg_event_trigger_ddl_commands() will return 2 records, which I called
> subcommands in the previous email.

Ah, right.

I don't remember why we made these commands be separate; but for
instance if you try to add a SERIAL column you'll also see one command
to create the sequence, then the table, then the sequence gets its OWNED
BY the column.

I think the point is that we want to have some regularity so that an
application can inspect the JSON blobs and perhaps modify them; if you
make a bunch of sub-objects, this becomes more difficult.  For DDL
replication purposes perhaps this isn't very useful (since you just grab
it and execute on the other side as-is), but other use cases might have
other ideas.

> Is this behavior expected? I thought the deparser is designed to
> deparse the entire command to a single command instead of dividing
> them into 2 commands.

It is expected.
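
As a concrete illustration, an event trigger along these lines (a sketch; the
function and trigger names are made up, and the output is approximate) shows
the two separate records for the example above:

    CREATE FUNCTION show_ddl_commands() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        -- one row per command collected at ddl_command_end
        FOR r IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
            RAISE NOTICE 'tag: %, object: %', r.command_tag, r.object_identity;
        END LOOP;
    END;
    $$;

    CREATE EVENT TRIGGER show_ddl ON ddl_command_end
        EXECUTE FUNCTION show_ddl_commands();

    CREATE SCHEMA element_test CREATE TABLE foo (id int);
    -- NOTICE:  tag: CREATE SCHEMA, object: element_test
    -- NOTICE:  tag: CREATE TABLE, object: element_test.foo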

> It seems that keeping separate test cases in deparser tests folder is
> better than using the test cases in core regression tests folder
> directly. I will write some test cases in the new deparser test
> folder.

Well, the reason to use the regular regression tests rather than
separate, is that when a developer adds a new feature, they will add
test cases for it in regular regression tests, so deparsing of their
command will be automatically picked up by the DDL-deparse testing
framework.  We discussed at the time that another option would be to
have patch reviewers ensure that the added DDL commands are also tested
in the DDL-deparse framework, but nobody wants to add yet another thing
that we have to remember (or, more likely, forget).

> I see, it seems that a function to deparse a DROP command to JSON output
> is needed but not provided yet. I implemented a function
> deparse_drop_ddl() in the testing harness; maybe we could consider
> exposing a function in the engine to deparse DROP commands as
> deparse_ddl_command() does.

No objection against this idea.

> updated to:
> 
> 1, The deparsed JSON is the same as the expected string

I would rather say "has the same effects as".

> 1, Build DDL event triggers and DDL deparser into pg_regress tests so
> that DDL commands in these tests can be captured and deparsed.
> 2, Let the goal 3 implementation, aka the TAP test, execute test
> cases from pg_regress; if the sub and pub nodes don't dump the same
> results, some DDL commands must be changed.
> 
> Solution 1 is more lightweight as we only need to run pg_regress
> once. Any other thoughts?

We have several runs of pg_regress already -- apart from the normal one,
we run it once in recovery/t/027_stream_regress.pl and once in the
pg_upgrade test.  I'm not sure that we necessarily need to avoid another
one here, particularly if avoiding it would potentially pollute the
results for the regular tests.  I am okay with solution 2 personally.

If we really wanted to optimize this, perhaps we should try to drive all
three uses (pg_upgrade, stream_regress, this new test) from a single
pg_regress run.  But ISTM we don't *have* to.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Hay dos momentos en la vida de un hombre en los que no debería
especular: cuando puede permitírselo y cuando no puede" (Mark Twain)




Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-24 Thread Amit Kapila
On Fri, Oct 21, 2022 at 3:02 PM houzj.f...@fujitsu.com
 wrote:
>

Few comments on the 0001 and 0003 patches:

v40-0001*
==
1.
+ /*
+ * The queue used to transfer messages from the parallel apply worker to
+ * the leader apply worker.
+ */
+ shm_mq_handle *error_mq_handle;

Shall we say error messages instead of messages?

2.
+/*
+ * Is there a message pending in parallel apply worker which we need to
+ * receive?
+ */
+volatile sig_atomic_t ParallelApplyMessagePending = false;

Can we slightly change above comment to: "Is there a message sent by
parallel apply worker which we need to receive?"

3.
+
+ ThrowErrorData(&edata);
+
+ /* Should not reach here after rethrowing an error. */
+ error_context_stack = save_error_context_stack;

Should we instead do Assert(false) after ThrowErrorData?

4.
+ * apply worker (c) necessary information to be shared among parallel apply
+ * workers and leader apply worker (i.e. in_parallel_apply_xact flag and the
+ * corresponding LogicalRepWorker slot information).

I don't think the comment here needs to say exactly which variables are
shared. How about: "necessary information to synchronize between parallel
apply workers and the leader apply worker"?

5.
+ * The dynamic shared memory segment will contain (a) a shm_mq that can be
+ * used to send changes in the transaction from leader apply worker to parallel
+ * apply worker (b) another shm_mq that can be used to send errors

In both (a) and (b), instead of "can be", we can use "is".

6.
Note that we cannot skip the streaming transactions when using
+ * parallel apply workers because we cannot get the finish LSN before
+ * applying the changes.

This comment is unclear about the action of parallel apply worker when
finish LSN is set. We can add something like: "So, we don't start
parallel apply worker when finish LSN is set by the user."

v40-0003
==
7. The function RelationGetUniqueKeyBitmap() should be defined in
relcache.c next to RelationGetIdentityKeyBitmap().

8.
+RelationGetUniqueKeyBitmap(Relation rel)
{
...
+ if (!rel->rd_rel->relhasindex)
+ return NULL;

It would be better to use "if
(!RelationGetForm(relation)->relhasindex)" so as to be consistent with
similar usage in RelationGetIdentityKeyBitmap().

9. In RelationGetUniqueKeyBitmap(), we must assert here that the
historic snapshot is set as we are not taking a lock on index rels.
The same is already ensured in RelationGetIdentityKeyBitmap(), is
there a reason to be different here?

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-24 Thread Masahiko Sawada
On Wed, Oct 12, 2022 at 3:04 PM Amit Kapila  wrote:
>
> On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada  wrote:
> >
> > On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila  wrote:
> > >
> > > About your point that having different partition structures for
> > > publisher and subscriber, I don't know how common it will be once we
> > > have DDL replication. Also, the default value of
> > > publish_via_partition_root is false which doesn't seem to indicate
> > > that this is a quite common case.
> >
> > So how can we consider these concurrent issues that could happen only
> > when streaming = 'parallel'? Can we restrict some use cases to avoid
> > the problem or can we have a safeguard against these conflicts?
> >
>
> Yeah, right now the strategy is to disallow parallel apply for such
> cases as you can see in *0003* patch.

Tightening the restrictions could work in some cases but there might
still be corner cases and it could reduce the usability. I'm not really
sure that we can ensure such a deadlock won't happen with the current
restrictions. I think we need some safeguard just in case. For
example, if the leader apply worker is waiting for a lock acquired by
its parallel worker, it cancels the parallel worker's transaction,
commits its transaction, and restarts logical replication. Or the
leader can log the deadlock to let the user know.

>
> > We
> > could find a new problematic scenario in the future and if it happens,
> > logical replication gets stuck, it cannot be resolved only by apply
> > workers themselves.
> >
>
> I think users can change streaming option to on/off and internally the
> parallel apply worker can detect and restart to allow replication to
> proceed. Having said that, I think that would be a bug in the code and
> we should try to fix it. We may need to disable parallel apply in the
> problematic case.
>
> The other ideas that occurred to me in this regard are (a) provide a
> reloption (say parallel_apply) at table level and we can use that to
> bypass various checks like different Unique Key between
> publisher/subscriber, constraints/expressions having mutable
> functions, Foreign Key (when enabled on subscriber), operations on
> Partitioned Table. We can't detect whether those are safe or not
> (primarily because of a different structure in publisher and
> subscriber) so we prohibit parallel apply but if users use this
> option, we can allow it even in those cases.

The parallel apply worker is assigned per transaction, right? If so,
how can we know which tables are modified in the transaction in
advance? And what if two tables, one with the reloption set to true and
the other to false, are modified in the same transaction?

> (b) While enabling the
> parallel option in the subscription, we can try to match all the
> table(s) information of the publisher/subscriber. It will be tricky to
> make this work because say even if match some trigger function name,
> we won't be able to match the function body. The other thing is when
> at a later point the table definition is changed on the subscriber, we
> need to again validate the information between publisher and
> subscriber which I think would be difficult as we would be already in
> between processing some message and getting information from the
> publisher at that stage won't be possible.

Indeed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Pluggable toaster

2022-10-24 Thread Aleksander Alekseev
Hi Nikita,

> >I don't argue with most of what you say. I am just pointing out the
> >reason why the chosen approach "N TOASTers x M TableAMs" will not
> >work:
>
> We assume that the TAM used by a custom Toaster works as it should, and
> leave TAM internals to that TAM's developer - say, we do not want to
> change the internals of Heap AM.
>
> We don't want to create some kind of silver bullet.

This is exactly the point. In order not to create a silver bullet,
TOASTers should be limited to a single TableAM. The one we know uses
pages of a known fixed size, the one that actually requires TOAST
because pages are relatively small, etc.

> That's what the TOAST API is - just an interface that all custom
> TOAST implementations use to have a common entry point from any TAM,
> [...]

I believe this is the source of misunderstanding. Note that not _any_
TableAM needs TOAST to begin with. As an example, if one chooses to
implement a column-organized TableAM that stores all text/bytea
attributes in a separate dictionary file this person doesn't need
TOAST and doesn't want to be constrained by the need of choosing one.

For this reason the "N TOASTers x M TableAMs" approach is
architecturally broken.

> keep in mind that different kinds of data require very
> different approaches to external storage - say, JSON TOAST works with
> maps of keys and values, [...]

To clarify: is an ability to specify TOASTers for given columns and/or
types also part of the plan?

> Have I answered your question? Please don't hesitate to point to any unclear
> parts, I'd be glad to explain that.

No. To be honest, it looks like you are merely discarding most/any
feedback the community provided so far.

I really think that pluggable TOASTers would be a great feature.
However if the goal is to get it into the core I doubt that we are
going to make much progress with the current approach.

-- 
Best regards,
Aleksander Alekseev




Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-24 Thread Amit Kapila
On Mon, Oct 24, 2022 at 11:41 AM Peter Smith  wrote:
>
> Here are my review comments for v40-0001.
>
> ==
>
> src/backend/replication/logical/worker.c
>
>
> 1. should_apply_changes_for_rel
>
> + else if (am_parallel_apply_worker())
> + {
> + if (rel->state != SUBREL_STATE_READY)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication parallel apply worker for subscription
> \"%s\" will stop",
> + MySubscription->name),
> + errdetail("Cannot handle streamed replication transaction using parallel "
> +"apply workers until all tables are synchronized.")));
>
> 1a.
> "transaction" -> "transactions"
>
> 1b.
> "are synchronized" -> "have been synchronized."
>
> e.g. "Cannot handle streamed replication transactions using parallel
> apply workers until all tables have been synchronized."
>
> ~~~
>
> 2. maybe_reread_subscription
>
> + if (am_parallel_apply_worker())
> + ereport(LOG,
> + (errmsg("logical replication parallel apply worker for subscription
> \"%s\" will "
> + "stop because the subscription was removed",
> + MySubscription->name)));
> + else
> + ereport(LOG,
> + (errmsg("logical replication apply worker for subscription \"%s\" will "
> + "stop because the subscription was removed",
> + MySubscription->name)));
>
> Maybe there is an easier way to code this instead of if/else and
> cut/paste message text:
>
> SUGGESTION
>
> ereport(LOG,
> (errmsg("logical replication %s for subscription \"%s\" will stop
> because the subscription was removed",
> am_parallel_apply_worker() ? "parallel apply worker" : "apply worker",
> MySubscription->name)));
> ~~~
>

If we want to go this way then it may be better to record the
appropriate string beforehand and use that here.

-- 
With Regards,
Amit Kapila.




Re: PATCH: Using BRIN indexes for sorted output

2022-10-24 Thread Tomas Vondra



On 10/24/22 06:32, Justin Pryzby wrote:
> On Sat, Oct 15, 2022 at 02:33:50PM +0200, Tomas Vondra wrote:
>> Of course, if there are e.g. BTREE indexes this is going to be slower,
>> but people are unlikely to have both index types on the same column.
> 
> On Sun, Oct 16, 2022 at 05:48:31PM +0200, Tomas Vondra wrote:
>> I don't think it's all that unfair. How likely is it to have both a BRIN
>> and btree index on the same column? And even if you do have such indexes
> 
> Note that we (at my work) use unique, btree indexes on multiple columns
> for INSERT ON CONFLICT into the most-recent tables: UNIQUE(a,b,c,...),
> plus a separate set of indexes on all tables, used for searching:
> BRIN(a) and BTREE(b).  I'd hope that the costing is accurate enough to
> prefer the btree index for searching the most-recent table, if that's
> what's faster (for example, if columns b and c are specified).
> 

Well, the costing is very crude at the moment - it's pretty much just a copy
of the existing BRIN costing. And the cost is likely going to increase,
because brinsort needs to do a regular BRIN bitmap scan (more or less) and
then also a sort (which is an extra cost, of course). So if it works now, I
don't see why brinsort would break it. Moreover, if you don't have ORDER BY
in the query, I don't see why we would create a brinsort at all.

But if you could test this once the costing gets improved, that'd be
very valuable.
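
For instance, once the costing is improved, a quick check could look roughly
like this (a sketch modelled on the setup described above; enable_brinsort is
the GUC added by the patch, and whether the planner actually picks the new
node depends entirely on that costing):

    CREATE TABLE events (a timestamptz, b int, c int);
    CREATE INDEX ON events USING brin (a);
    CREATE INDEX ON events USING btree (b);

    SET enable_brinsort = on;
    EXPLAIN SELECT * FROM events WHERE b = 42 ORDER BY a;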

>> +/* There must not be any TID scan in progress yet. */
>> +Assert(node->ss.ss_currentScanDesc == NULL);
>> +
>> +/* Initialize the TID range scan, for the provided block range. */
>> +if (node->ss.ss_currentScanDesc == NULL)
>> +{
> 
> Why is this conditional on the condition that was just Assert()ed ?
> 

Yeah, that's a mistake, due to how the code evolved.

>>  
>> +void
>> +cost_brinsort(BrinSortPath *path, PlannerInfo *root, double loop_count,
>> +   bool partial_path)
> 
> It'd be nice to refactor existing code to avoid this part being so
> duplicative.
> 
>> + * In some situations (particularly with OR'd index conditions) we may
>> + * have scan_clauses that are not equal to, but are logically implied 
>> by,
>> + * the index quals; so we also try a predicate_implied_by() check to see
> 
> Isn't that somewhat expensive ?
> 
> If that's known, then it'd be good to say that in the documentation.
> 

Some of this is probably a residue from create_indexscan_path and may
not be needed for this new node.

>> +{
>> +{"enable_brinsort", PGC_USERSET, QUERY_TUNING_METHOD,
>> +gettext_noop("Enables the planner's use of BRIN sort 
>> plans."),
>> +NULL,
>> +GUC_EXPLAIN
>> +},
>> +&enable_brinsort,
>> +false,
> 
> I think new GUCs should be enabled during patch development.
> Maybe in a separate 0002 patch "for CI only not for commit".
> That way "make check" at least has a chance to hit that new code paths.
> 
> Also, note that indxpath.c had the var initialized to true.
> 

Good point.

>> +attno = (i + 1);
>> +   nranges = (nblocks / pagesPerRange);
>> +   node->bs_phase = (nullsFirst) ? 
>> BRINSORT_LOAD_NULLS : BRINSORT_LOAD_RANGE;
> 
> I'm curious why you have parenthesis these places ?
> 

Not sure, it seemed more readable when writing the code I guess.

>> +#ifndef NODEBrinSort_H
>> +#define NODEBrinSort_H
> 
> NODEBRIN_SORT would be more consistent with NODEINCREMENTALSORT.
> But I'd prefer NODE_* - otherwise it looks like NO DEBRIN.
> 

Yeah, stupid search/replace on the indexscan code, which was used as a
starting point.

> This needed a bunch of work needed to pass any of the regression tests -
> even with the feature set to off.
> 
>  . meson.build needs the same change as the corresponding ./Makefile.
>  . guc missing from postgresql.conf.sample
>  . brin_validate.c is missing support for the opr function.
>I gather you're planning on changing this part (?) but this allows to
>pass tests for now.
>  . mingw is warning about OidIsValid(pointer) in nodeBrinSort.c.
>https://cirrus-ci.com/task/5771227447951360?logs=mingw_cross_warning#L969
>  . Uninitialized catalog attribute.
>  . Some typos in your other patches: "heuristics heuristics".  ste.
>lest (least).
> 

Thanks, I'll get this fixed. I've posted the patch as a PoC to showcase
it and gather some feedback, I should have mentioned it's incomplete in
these ways.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-13, Michael Paquier wrote:

> On Wed, Oct 12, 2022 at 08:54:34PM -0400, Tom Lane wrote:
> > Don't we need to back-patch these fixes?
> 
> I guess I should, though I have finished by not doing due to the
> unlikeliness of the problem, where we would need the combination of a
> page eviction with an error in the critical section to force a PANIC,
> or a crash before the WAL gets inserted.  Other opinions?

I suppose it's a matter of whether any bugs are possible outside of
Neon.  If yes, then definitely this should be backpatched.  Offhand, I
don't see any.  On the other hand, even if no bugs are known, then it's
still valuable to have all code paths do WAL insertion in the same way,
rather than having a single place that is alone in doing it in a
different way.  But if we don't know of any bugs, then backpatching
might be more risk than not doing so.

I confess I don't understand why it is important that XLogBeginInsert is
called inside the critical section.  It seems to me that that part is
only a side-effect of having to acquire the buffer locks in the critical
section.  Right?

I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD;
is the GIN metapage really honoring pd_upper?  I see only pg_lower being
set.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La grandeza es una experiencia transitoria.  Nunca es consistente.
Depende en gran parte de la imaginación humana creadora de mitos"
(Irulan)




[patch] Have psql's \d+ indicate foreign partitions

2022-10-24 Thread Ian Lawrence Barwick
Hi

Recently I have been working a lot with partitioned tables which contain a mix
of local and foreign partitions, and find it would be very useful to be able to
easily obtain an overview of which partitions are foreign and where they are
located.
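
Such a mixed setup can be created along these lines (a sketch; it assumes a
foreign server "fdw_node2", e.g. via postgres_fdw, already exists):

    CREATE TABLE parttest (id int NOT NULL, val1 text, val2 text)
        PARTITION BY HASH (id);

    CREATE TABLE parttest_10_0 PARTITION OF parttest
        FOR VALUES WITH (modulus 10, remainder 0);

    CREATE FOREIGN TABLE parttest_10_1 PARTITION OF parttest
        FOR VALUES WITH (modulus 10, remainder 1)
        SERVER fdw_node2;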

Currently, executing "\d+" on a partitioned table lists the partitions
like this:

postgres=# \d+ parttest
   Partitioned table "public.parttest"
 Column |  Type   | Collation | Nullable | Default | Storage  |
Compression | Stats target | Description

+-+---+--+-+--+-+--+-
 id | integer |   | not null | | plain|
 |  |
 val1   | text|   |  | | extended |
 |  |
 val2   | text|   |  | | extended |
 |  |
Partition key: HASH (id)
Partitions: parttest_10_0 FOR VALUES WITH (modulus 10, remainder 0),
parttest_10_1 FOR VALUES WITH (modulus 10, remainder 1),
parttest_10_2 FOR VALUES WITH (modulus 10, remainder 2),
parttest_10_3 FOR VALUES WITH (modulus 10, remainder 3),
parttest_10_4 FOR VALUES WITH (modulus 10, remainder 4),
parttest_10_5 FOR VALUES WITH (modulus 10, remainder 5),
parttest_10_6 FOR VALUES WITH (modulus 10, remainder 6),
parttest_10_7 FOR VALUES WITH (modulus 10, remainder 7),
parttest_10_8 FOR VALUES WITH (modulus 10, remainder 8),
parttest_10_9 FOR VALUES WITH (modulus 10, remainder 9)

which doesn't help much in that respect.

Attached patch changes this output to:

postgres=# \d+ parttest
   Partitioned table "public.parttest"
 Column |  Type   | Collation | Nullable | Default | Storage  |
Compression | Stats target | Description

+-+---+--+-+--+-+--+-
 id | integer |   | not null | | plain|
 |  |
 val1   | text|   |  | | extended |
 |  |
 val2   | text|   |  | | extended |
 |  |
Partition key: HASH (id)
Partitions: parttest_10_0 FOR VALUES WITH (modulus 10, remainder 0),
parttest_10_1 FOR VALUES WITH (modulus 10, remainder 1), server: "fdw_node2",
parttest_10_2 FOR VALUES WITH (modulus 10, remainder 2),
parttest_10_3 FOR VALUES WITH (modulus 10, remainder 3), server: "fdw_node2",
parttest_10_4 FOR VALUES WITH (modulus 10, remainder 4),
parttest_10_5 FOR VALUES WITH (modulus 10, remainder 5), server: "fdw_node2",
parttest_10_6 FOR VALUES WITH (modulus 10, remainder 6),
parttest_10_7 FOR VALUES WITH (modulus 10, remainder 7), server: "fdw_node2",
parttest_10_8 FOR VALUES WITH (modulus 10, remainder 8),
parttest_10_9 FOR VALUES WITH (modulus 10, remainder 9), server: "fdw_node2"

which is much more informative, albeit a little more cluttered, but short of
using emojis I can't see any better way (suggestions welcome).

For completeness, output with child tables could look like this:

postgres=# \d+ inhtest
 Table "public.inhtest"
 Column |  Type   | Collation | Nullable | Default | Storage  |
Compression | Stats target | Description

+-+---+--+-+--+-+--+-
 id | integer |   | not null | | plain|
 |  |
 val1   | text|   |  | | extended |
 |  |
 val2   | text|   |  | | extended |
 |  |
Child tables: inhtest_10_0,
  inhtest_10_1 (server: "fdw_node2"),
  inhtest_10_2,
  inhtest_10_3 (server: "fdw_node2"),
  inhtest_10_4,
  inhtest_10_5 (server: "fdw_node2"),
  inhtest_10_6,
  inhtest_10_7 (server: "fdw_node2"),
  inhtest_10_8,
  inhtest_10_9 (server: "fdw_node2")
Access method: heap

Will add to next CF.


Regards

Ian Barwick
commit d5f5de96381b93a6ea1066d4abb4c6617e0af758
Author: Ian Barwick 
Date:   Thu Oct 20 12:45:28 2022 +0900

psql: in \d+, indicate foreign partitions

Currently with a partitioned table, \d+ lists the partitions and their
partition key, but it would be useful to see which ones, if any, are
foreign partitions.

A simple way of doing this is, for foreign partitions, to display the
name of the partition's foreign server.

Re: Pluggable toaster

2022-10-24 Thread Nikita Malakhov
Hi,

>This is exactly the point. In order to not to create a silver bullet,
>TOASTers should be limited to a single TableAM. The one we know uses
>pages of a known fixed size, the one that actually requires TOAST
>because pages are relatively small, etc.

Currently all our TOAST implementations use Heap AM, except the ones that use
external (truly external, i.e. files outside the DB) storage. Using the Table
AM routine and routing AM method calls via it is a topic for further
discussion, if Pluggable TOAST is committed. And even then it would be an
open issue.

>I believe this is the source of misunderstanding. Note that not _any_
>TableAM needs TOAST to begin with. As an example, if one chooses to
>implement a column-organized TableAM that stores all text/bytea
>attributes in a separate dictionary file this person doesn't need
>TOAST and doesn't want to be constrained by the need of choosing one.

>For this reason the "N TOASTers x M TableAMs" approach is
>architecturally broken.

A TOAST implementation is not necessary for a Table AM. And the TOAST API is
just an optional open interface - SET TOASTER is an option for the
CREATE/ALTER TABLE command. In the previous discussion we did not mention an
"N TOASTers x M TableAMs" approach.

>To clarify: is an ability to specify TOASTers for given columns and/or
>types also part of the plan?

For table columns it is already supported by the syntax part of the TOAST
API. For data types we reserved the validation part of the API, but this
support is still a subject for discussion, although we think it would be very
handy for DB users: we could issue something like
CREATE TYPE ... TOASTER=jsonb_toaster ... ;
or
ALTER TYPE JSONB SET TOASTER jsonb_toaster;
and not have to set a special toaster for a jsonb column each time we create
or alter a table that uses it.

>No. To be honest, it looks like you are merely discarding most/any
>feedback the community provided so far.

Very sorry to read that. Almost all of the requests in this discussion have
been taken into account in the patches, and the most serious one - I mean the
pg_attribute expansion which was mentioned by Tom Lane and Robert Haas - is
being fixed right now and will be ready very soon.

>I really think that pluggable TOASTers would be a great feature.
>However if the goal is to get it into the core I doubt that we are
>going to make much progress with the current approach.

We hope we will. This feature is in high demand among end users, and will be
even more so as time goes by - the current TOAST limitations and how they
affect DBMS performance are a serious drawback in comparison to competitors.

On Mon, Oct 24, 2022 at 2:55 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > >I don't argue with most of what you say. I am just pointing out the
> > >reason why the chosen approach "N TOASTers x M TableAMs" will not
> > >work:
> >
> > We assume that TAM used in custom Toaster works as it is should work,
> > and leave TAM internals to this TAM developer - say, we do not want to
> > change internals of Heap AM.
> >
> > We don't want to create some kind of silver bullet.
>
> This is exactly the point. In order to not to create a silver bullet,
> TOASTers should be limited to a single TableAM. The one we know uses
> pages of a known fixed size, the one that actually requires TOAST
> because pages are relatively small, etc.
>
> > That's what the TOAST API is - just an interface that all custom
> > TOAST implementations use to have a common entry point from any TAM,
> > [...]
>
> I believe this is the source of misunderstanding. Note that not _any_
> TableAM needs TOAST to begin with. As an example, if one chooses to
> implement a column-organized TableAM that stores all text/bytea
> attributes in a separate dictionary file this person doesn't need
> TOAST and doesn't want to be constrained by the need of choosing one.
>
> For this reason the "N TOASTers x M TableAMs" approach is
> architecturally broken.
>
> > keep in mind that different kinds of data require very
> > different approach to external storage - say, JSON TOAST works with
> > maps of keys and values, [...]
>
> To clarify: is an ability to specify TOASTers for given columns and/or
> types also part of the plan?
>
> > Have I answered your question? Please don't hesitate to point to any
> unclear
> > parts, I'd be glad to explain that.
>
> No. To be honest, it looks like you are merely discarding most/any
> feedback the community provided so far.
>
> I really think that pluggable TOASTers would be a great feature.
> However if the goal is to get it into the core I doubt that we are
> going to make much progress with the current approach.
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: [patch] Have psql's \d+ indicate foreign partitions

2022-10-24 Thread Justin Pryzby
On Mon, Oct 24, 2022 at 09:44:18PM +0900, Ian Lawrence Barwick wrote:
> Recently I have been working a lot with partitioned tables which contain a mix
> of local and foreign partitions, and find it would be very useful to be able 
> to
> easily obtain an overview of which partitions are foreign and where they are
> located.

> Partitions: parttest_10_0 FOR VALUES WITH (modulus 10, remainder 0),
> parttest_10_1 FOR VALUES WITH (modulus 10, remainder 1), 
> server: "fdw_node2",

> which is much more informative, albeit a little more cluttered, but

> @@ -3445,6 +3451,10 @@ describeOneTableDetails(const char *schemaname,
>   if (child_relkind == RELKIND_PARTITIONED_TABLE 
> ||
>   child_relkind == 
> RELKIND_PARTITIONED_INDEX)
>   appendPQExpBufferStr(&buf, ", 
> PARTITIONED");
> + else if (child_relkind == RELKIND_FOREIGN_TABLE 
> && is_partitioned)
> + appendPQExpBuffer(&buf, ", server: 
> \"%s\"", PQgetvalue(result, i, 4));
> + else if (child_relkind == RELKIND_FOREIGN_TABLE 
> && !is_partitioned)
> + appendPQExpBuffer(&buf, " (server: 
> \"%s\")", PQgetvalue(result, i, 4));
>   if (strcmp(PQgetvalue(result, i, 2), "t") == 0)
>   appendPQExpBufferStr(&buf, " (DETACH 
> PENDING)");
>   if (i < tuples - 1)

To avoid the clutter that you mentioned, I suggest that this should show
that the table *is* foreign, but without the server - if you want to
know the server (or its options), you can run another \d command for
that (or run a SQL query).
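
(For reference, one such SQL query, using the parttest example from upthread
- this is only an illustration, not part of the patch - could be:

SELECT c.relname AS partition_name, fs.srvname AS server
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
JOIN pg_foreign_table ft ON ft.ftrelid = c.oid
JOIN pg_foreign_server fs ON fs.oid = ft.ftserver
WHERE i.inhparent = 'parttest'::regclass
ORDER BY c.relname;
)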

That's similar to what's shown if the child is partitioned: a suffix
like ", PARTITIONED", but without showing the partition strategy.

I had a patch to allow \d++, and maybe showing the foreign server would
be reasonable for that.  But the patch got closed, evidently for lack of
interest.




Re: Pluggable toaster

2022-10-24 Thread Aleksander Alekseev
Hi Nikita,

> Using Table AM Routine and routing AM methods calls via it is a topic for 
> further discussion,
> if Pluggable TOAST will be committed. [...] And even then it would be an open 
> issue.

>From personal experience with the project I have serious doubts this
is going to happen. Before such invasive changes are going to be
accepted there should be a clear understanding of how exactly TOASTers
are supposed to be used. This should be part of the documentation in
the patchset. Additionally there should be an open-soruce or
source-available extension that actually demonstrates the benefits of
TOASTers with reproducible benchmarks (we didn't even get to that part
yet).

> TOAST implementation is not necessary for Table AM.

What other use cases for TOAST do you have in mind?

>> > Have I answered your question? Please don't hesitate to point to any 
>> > unclear
>> > parts, I'd be glad to explain that.
>>
>> No. To be honest, it looks like you are merely discarding most/any
>> feedback the community provided so far.
>>
>> I really think that pluggable TOASTers would be a great feature.
>> However if the goal is to get it into the core I doubt that we are
>> going to make much progress with the current approach.

To clarify, the concern about "N TOASTers vs M TableAM" was expressed
by Robert Haas back in Jan 2022:

> I agree ... but I'm also worried about what happens when we have
> multiple table AMs. One can imagine a new table AM that is
> specifically optimized for TOAST which can be used with an existing
> heap table. One can imagine a new table AM for the main table that
> wants to use something different for TOAST. So, I don't think it's
> right to imagine that the choice of TOASTer depends solely on the
> column data type. I'm not really sure how this should work exactly ...
> but it needs careful thought.

This is the most important open question so far, to my knowledge. It
was never addressed, there doesn't seem to be a plan for addressing it,
the suggested alternative approach was ignored, and there are no
strong arguments that would defend this design choice and/or criticize
the alternative one (other than the general words "don't worry, we know
what we are doing").

This is what I mean by the community feedback being discarded.

-- 
Best regards,
Aleksander Alekseev




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-24 Thread Peter Geoghegan
On Sun, Oct 23, 2022 at 9:32 PM Jeff Davis  wrote:
> It's possible this would be easier for users to understand: one process
> that does cleanup work over time in a way that minimizes interference;
> and another process that activates in more urgent situations (perhaps
> due to misconfiguration of the first process).

I think that the new "early" version of antiwraparound autovacuum
(that can still be autocancelled) would simply be called autovacuum.
It wouldn't appear as "autovacuum to prevent wraparound" in places
like pg_stat_activity. For the most part users wouldn't have to care
about the difference between these autovacuums and traditional
non-antiwraparound autovacuums. They really would be exactly the same
thing, so it would make sense if users typically noticed no difference
whatsoever (at least in contexts like pg_stat_activity).

> But we should be careful that we don't end up with more confusion. For
> something like that to work, we'd probably want the second process to
> not be configurable at all, and we'd want it to be issuing WARNINGs
> pointing to what might be misconfigured, and otherwise just be
> invisible.

There should be some simple scheme for determining when an
antiwraparound autovacuum (non-cancellable autovacuum to advance
relfrozenxid/relminmxid) should run (applied by the autovacuum.c
scheduling logic). Something like "table has attained an age that's
now 2x autovacuum_freeze_max_age, or 1/2 of vacuum_failsafe_age,
whichever is less".

The really important thing is giving a regular/early autocancellable
autovacuum triggered by age(relfrozenxid) *some* opportunity to run. I
strongly suspect that the exact details won't matter too much,
provided we manage to launch at least one such autovacuum before
escalating to traditional antiwraparound autovacuum (which cannot be
autocancelled). Even if regular/early autovacuum had just one
opportunity to run to completion, we'd already be much better off. The
hazards from blocking automated DDL in a way that leads to a very
disruptive traffic jam (like in the Joyent Manta postmortem) would go
way down.

> >  That way we wouldn't be fighting against the widely held perception
> > that antiwraparound autovacuums are scary.
>
> There's certainly a terminology problem there. Just to brainstorm on
> some new names, we might want to call it something like "xid
> reclamation" or "xid horizon advancement".

I think that we should simply call it autovacuum. Under this scheme,
antiwraparound autovacuum would be a qualitatively different kind of
operation to users (though not to vacuumlazy.c), because it would not
be autocancellable in the standard way. And because users should take
it as a signal that things aren't really working well (otherwise we
wouldn't have reached the point of requiring a scary antiwraparound
autovacuum in the first place). Right now antiwraparound autovacuums
are both an emergency thing (or at least described as such in one or
two areas of the source code), and a completely routine occurrence.
This is deeply confusing.

Separately, I plan on breaking out insert-triggered autovacuums from
traditional dead tuple triggered autovacuums [1], which creates a need
to invent some kind of name to differentiate the new table age
triggering criteria from both insert-driven and dead tuple driven
autovacuums. These are all fundamentally the same operations with the
same urgency to users, though. We'd only need to describe the
*criteria* that *triggered* the autovacuum in our autovacuum log
report (actually we'd still report autovacuums as antiwraparound
autovacuums in cases where that still happened, which won't be
presented as just another triggering criterion in the report).

[1] 
https://www.postgresql.org/message-id/flat/cah2-wzneqmkmry8feudk8xdph37-4anygf7a04bwxoc1gkd...@mail.gmail.com
-- 
Peter Geoghegan




Re: parse partition strategy string in gram.y

2022-10-24 Thread Finnerty, Jim
Is there a reason why HASH partitioning does not currently support range 
partition bounds, where the values in the partition bounds would refer to the 
hashed value?

The advantage of hash  partition bounds is that they are not domain-specific, 
as they are for ordinary RANGE partitions, but they are more flexible than 
MODULUS/REMAINDER partition bounds.
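
For concreteness, today a hash partition can only be declared with
modulus/remainder, e.g.:

CREATE TABLE t (id bigint) PARTITION BY HASH (id);
CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 4, REMAINDER 0);

whereas what I have in mind would be bounds over the hashed value, something
like the following (purely hypothetical syntax, not accepted today):

-- hypothetical, for illustration only:
CREATE TABLE t_p0 PARTITION OF t FOR VALUES FROM (MINVALUE) TO (1073741824);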

On 10/21/22, 9:48 AM, "Japin Li"  wrote:


On Fri, 21 Oct 2022 at 20:34, Justin Pryzby  wrote:
> On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
>> Is there any way to get the regression tests diffs from Cirrus CI?
>> I did not find the diffs in [1].
>>
>> [1] https://cirrus-ci.com/build/4721735111540736
>
> They're called "main".
> I'm planning on submitting a patch to rename it to "regress", someday.
> See also: 
https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

Oh, thank you very much!  I found it in testrun/build/testrun/main/regress
[1].

[1] 
https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.





Re: Pluggable toaster

2022-10-24 Thread Nikita Malakhov
Hi!

>From personal experience with the project I have serious doubts this
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-soruce or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).

Actually, there's a documentation part in the patchset. Also, there is a
README file explaining the API.
In addition, we have several custom TOAST implementations with some results -
they were published and presented at PGCon. I was asked to exclude the custom
TOAST implementations and some further improvements for the first iteration,
which is why the patchset currently consists of only 3 patches - the base
core changes, the default TOAST implementation via the TOAST API, and the
documentation package.

>What other use cases for TOAST do you have in mind?

The main use case is the same as for the TOAST mechanism - storing and
retrieving oversized data. But we expanded this case with some details:
- update TOASTed data (yes, the current TOAST implementation cannot update
stored data - it marks the whole TOASTed object as dead and stores a new
one);
- retrieve part of the stored data chunks without fully de-TOASTing the
stored data (even with the existing TOAST this is painful if you have to get
just a small part of an object several hundred MB in size);
- be able to store objects larger than 1 GB;
- store more than 4 TB of TOASTed data for one table;
- optimize storage for fast search and retrieval of parts of a TOASTed
object - this is a must-have for using JSON effectively; PostgreSQL is
already in a catching-up position in the JSON performance field.

For all these cases we have test results that show improvements in storage
and performance.

>To clarify, the concern about "N TOASTers vs M TableAM" was expressed
>by Robert Haas back in Jan 2022:

>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.

>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").

>This what I mean by the community feedback being discarded.

Maybe there was some misunderstanding - I was new to this project and
company at that time (I was introduced to it in the middle of December
2021) - but Theodor Sigaev gave an answer to Mr. Haas:

>Right. that's why we propose a validate method  (may be, it's a wrong
>name, but I don't known better one) which accepts several arguments, one
>of which is table AM oid. If that method returns false then toaster
>isn't useful with current TAM, storage or/and compression kinds, etc.

And this is a generalized means, correct from the OOP point of view, to
provide a way to ensure that a concrete TOAST implementation is valid for
the Table AM calling it.


On Mon, Oct 24, 2022 at 4:53 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > Using Table AM Routine and routing AM methods calls via it is a topic
> for further discussion,
> > if Pluggable TOAST will be committed. [...] And even then it would be an
> open issue.
>
> From personal experience with the project I have serious doubts this
> is going to happen. Before such invasive changes are going to be
> accepted there should be a clear understanding of how exactly TOASTers
> are supposed to be used. This should be part of the documentation in
> the patchset. Additionally there should be an open-soruce or
> source-available extension that actually demonstrates the benefits of
> TOASTers with reproducible benchmarks (we didn't even get to that part
> yet).
>
> > TOAST implementation is not necessary for Table AM.
>
> What other use cases for TOAST do you have in mind?
>
> >> > Have I answered your question? Please don't hesitate to point to any
> unclear
> >> > parts, I'd be glad to explain that.
> >>
> >> No. To be honest, it looks like you are merely discarding most/any
> >> feedback the community provided so far.
> >>
> >> I really think that pluggable TOASTers would be a great feature.
> >> However if the goal is to get it into the core I doubt that we are
> >> going to make much progress with the current approach.
>
> 

Re: effective_multixact_freeze_max_age issue

2022-10-24 Thread Peter Geoghegan
On Mon, Oct 24, 2022 at 1:18 AM Anton A. Melnikov  wrote:
> > Also, the proposed new WARNING is only seen when we're
> > bound to also see the existing OldestXmin WARNING already. Why have 2
> > WARNINGs about exactly the same problem?>
>
> I didn't understand this moment.
>
> If the FreezeLimit is always older than OldestXmin or equal to it according 
> to:
>
> > FreezeLimit is usually 50 million XIDs before
> > OldestXmin (the freeze_min_age default),
>
> can't there be a situation like this?

I don't understand what you mean. FreezeLimit *isn't* always exactly
50 million XIDs before OldestXmin -- not anymore. In fact that's the
main benefit of this work (commit c3ffa731). That detail has changed
(and changed for the better). Though it will only be noticeable to
users when an old snapshot holds back OldestXmin by a significant
amount.

It is true that we must always respect the classic "FreezeLimit <=
OldestXmin" invariant. So naturally vacuum_set_xid_limits() continues
to make sure that the invariant holds in all cases, if necessary by
holding back FreezeLimit:

+   /* freezeLimit must always be <= oldestXmin */
+   if (TransactionIdPrecedes(*oldestXmin, *freezeLimit))
+   *freezeLimit = *oldestXmin;

When we *don't* have to do this (very typical when
vacuum_freeze_min_age is set to its default of 50 million), then
FreezeLimit won't be affected by old snapshots. Overall, FreezeLimit
must either be:

1. *Exactly* freeze_min_age XIDs before nextXID (note that it is
nextXID instead of OldestXmin here, as of commit c3ffa731).

or:

2. *Exactly* OldestXmin.

FreezeLimit must always be either exactly 1 or exactly 2, regardless
of anything else (like long running transactions/snapshots).
Importantly, we still never interpret freeze_min_age as more than
"autovacuum_freeze_max_age / 2" when determining FreezeLimit. While
the safeOldestXmin cutoff is "nextXID - autovacuum_freeze_max_age".
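
(Just to illustrate that cap with the current settings - this only reads the
GUCs and is not the actual vacuum_set_xid_limits() computation:

SELECT least(current_setting('vacuum_freeze_min_age')::bigint,
             current_setting('autovacuum_freeze_max_age')::bigint / 2)
       AS effective_freeze_min_age;
)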

Before commit c3ffa731, FreezeLimit would sometimes be interpreted as
exactly OldestXmin -- it would be set to OldestXmin directly when the
WARNING was given. But now we get smoother behavior, without any big
discontinuities in how FreezeLimit is set over time when OldestXmin is
held back generally.

-- 
Peter Geoghegan




Re: New docs chapter on Transaction Management and related changes

2022-10-24 Thread Simon Riggs
On Sun, 16 Oct 2022 at 02:08, Bruce Momjian  wrote:
>
> On Fri, Oct 14, 2022 at 05:46:55PM -0400, Robert Treat wrote:
> > On Fri, Oct 14, 2022 at 3:51 PM Bruce Momjian  wrote:
> > > Attached is the merged patch from all the great comments I received.  I
> > > have also rebuilt the docs with the updated patch:
> > >
> > > https://momjian.us/tmp/pgsql/
> > >
> >
> > +   RELEASE SAVEPOINT also subcommits and destroys
> > +   all savepoints that were established after the named savepoint was
> > +   established. This means that any subtransactions of the named savepoint
> > +   will also be subcommitted and destroyed.
> >
> > Wonder if we should be more explicit that data changes are preserved,
> > not destroyed... something like:
> > "This means that any changes within subtransactions of the named
> > savepoint will be subcommitted and those subtransactions will be
> > destroyed."
>
> Good point.  I reread the section and there was just too much confusion
> over subtransactions, partly because the behavior just doesn't map
> easily to subtransaction.  I therefore merged all three paragraphs into
> one and tried to make the text saner;  release_savepoint.sgml diff
> attached, URL content updated.

Just got around to reading this, thanks for changes.

The rewording doesn't work for me. The use of the word "destroy" is
very misleading, since the effect is to commit.

So I think we must avoid use of the word destroy completely. Possible
rewording:

RELEASE SAVEPOINT will subcommit the subtransaction
established by the named savepoint, releasing any resources held by
it. If there were any subtransactions created by the named savepoint,
these will also be subcommitted.

The point is that savepoints create subtransactions, but they are not
the only way to create them, so we cannot equate savepoint and
subtransaction completely.
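
As a concrete illustration of the proposed wording (assuming a table
t(x int) exists):

BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT a;
INSERT INTO t VALUES (2);
SAVEPOINT b;
INSERT INTO t VALUES (3);
RELEASE SAVEPOINT a;    -- subcommits the subtransactions for both a and b
COMMIT;                 -- rows 1, 2 and 3 are all committed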

-- 
Simon Riggs                http://www.EnterpriseDB.com/




[PATCH] minor optimization for ineq_histogram_selectivity()

2022-10-24 Thread Frédéric Yhuel

Hello,

When studying the weird planner issue reported here [1], I came up with 
the attached patch. It reduces the probability of calling 
get_actual_variable_range().


The patch applies to the master branch.

How to test :

CREATE TABLE foo (a bigint, b TEXT) WITH (autovacuum_enabled = off);
INSERT INTO foo SELECT i%213, md5(i::text) from
generate_series(1,1000000) i;

VACUUM ANALYZE foo;
SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname='a'\gx
CREATE INDEX ON foo(a);
DELETE FROM foo WHERE a = 212;
EXPLAIN (BUFFERS) SELECT count(a) FROM foo WHERE a > 208;

Without this patch, you will observe at least 4694 shared hits (which 
are mostly heap fetches). If you apply the patch, you will observe very 
few of them.


You should run the EXPLAIN on a standby, if you want to observe the heap 
fetches more than one time (because of killed index tuples being ignored).


Best regards,
Frédéric

[1] 
https://www.postgresql.org/message-id/flat/CAECtzeVPM4Oi6dTdqVQmjoLkDBVChNj7ed3hNs1RGrBbwCJ7Cw%40mail.gmail.com

From d7602f0731a3c5cac59cf2fc81bc336f800cb11a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Yhuel?= 
Date: Mon, 24 Oct 2022 16:33:26 +0200
Subject: [PATCH] minor optimization for ineq_histogram_selectivity()

Avoid calling of get_actual_variable_range() when possible.

With this change, 'probe' can still reach sslot.nvalues - 1 if needed,
so the algorithm still works correctly.

But if the right end of the histogram bin is equal
to sslot.values[sslot.nvalues - 2], 'probe' will stay strictly less
than sslot.nvalues - 1 in the while loop, and we save a call to
get_actual_variable_range().

Use of get_actual_variable_range() can sometimes lead to surprisingly
slow planning times (see [1] and [2]).

[1] https://www.postgresql.org/message-id/flat/CAECtzeVPM4Oi6dTdqVQmjoLkDBVChNj7ed3hNs1RGrBbwCJ7Cw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/36adf760-680b-7a4a-e019-64f4eaaf6ff7%40gmail.com
---
 src/backend/utils/adt/selfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 69e0fb98f5..c1bc67705c 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1108,7 +1108,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
 
 			while (lobound < hibound)
 			{
-int			probe = (lobound + hibound) / 2;
+int			probe = (lobound + hibound - 1) / 2;
 bool		ltcmp;
 
 /*
-- 
2.30.2



Re: Add the ability to limit the amount of memory that can be allocated to backends.

2022-10-24 Thread Arne Roland
Hello Reid,


could you rebase the patch again? It doesn't apply currently 
(http://cfbot.cputube.org/patch_40_3867.log). Thanks!


You mention that you want to prevent the compiler from getting cute.

I don't think these comments are exactly helpful in their current state. It's
probably fine to just omit them.


I don't understand the purpose of the result variable in 
exceeds_max_total_bkend_mem. What purpose does it serve?


I really like the simplicity of the suggestion here to prevent OOM.

I intend to play around with a lot of backends once I get a rebased patch.


Regards

Arne



From: Reid Thompson 
Sent: Thursday, September 15, 2022 4:58:19 PM
To: Ibrar Ahmed; pgsql-hackers@lists.postgresql.org
Cc: reid.thomp...@crunchydata.com; Justin Pryzby
Subject: Re: Add the ability to limit the amount of memory that can be 
allocated to backends.

On Thu, 2022-09-15 at 12:07 +0400, Ibrar Ahmed wrote:
>
> The patch does not apply; please rebase the patch.
>
> patching file src/backend/utils/misc/guc.c
> Hunk #1 FAILED at 3664.
> 1 out of 1 hunk FAILED -- saving rejects to file
> src/backend/utils/misc/guc.c.rej
>
> patching file src/backend/utils/misc/postgresql.conf.sample
>

rebased patches attached.

Thanks,
Reid



Re: effective_multixact_freeze_max_age issue

2022-10-24 Thread Peter Geoghegan
On Mon, Oct 24, 2022 at 7:56 AM Peter Geoghegan  wrote:
> I don't understand what you mean. FreezeLimit *isn't* always exactly
> 50 million XIDs before OldestXmin -- not anymore. In fact that's the
> main benefit of this work (commit c3ffa731). That detail has changed
> (and changed for the better). Though it will only be noticeable to
> users when an old snapshot holds back OldestXmin by a significant
> amount.

I meant that the new behavior will only have a noticeable impact when
OldestXmin is significantly earlier than nextXID. Most of the time
there won't be any old snapshots, which means that there will only be
a negligible difference between OldestXmin and nextXID when things are
operating normally (OldestXmin will still probably be a tiny bit
earlier than nextXID, but not enough to matter). And so most of the
time the difference between the old approach and the new approach will
be completely negligible; 50 million XIDs is usually a huge number (it
is usually far far larger than the difference between OldestXmin and
nextXID).

BTW, I have some sympathy for the argument that the WARNINGs that we
have here may not be enough -- we only warn when the situation is
already extremely serious. I just don't see any reason why that
problem should be treated as a regression caused by commit c3ffa731.
The WARNINGs may be inadequate, but that isn't new.

-- 
Peter Geoghegan




Re: Pluggable toaster

2022-10-24 Thread Aleksander Alekseev
Hi Nikita,

> > > TOAST implementation is not necessary for Table AM.
>
> >What other use cases for TOAST do you have in mind?
>
> The main use case is the same as for the TOAST mechanism - storing and 
> retrieving
> oversized data. But we expanded this case with some details -
> - update TOASTed data (yes, current TOAST implementation cannot update stored
> data - is marks whole TOASTED object as dead and stores new one);
> - retrieve part of the stored data chunks without fully de-TOASTing stored 
> data (even
> with existing TOAST this will be painful if you have to get just a small part 
> of the several
>  hundreds Mb sized object);
> - be able to store objects of size larger than 1 Gb;
> - store more than 4 Tb of TOASTed data for one table;
> - optimize storage for fast search and retrieval of parts of TOASTed object - 
> this is
> must-have for effectively using JSON, PostgreSQL already is in catching-up 
> position
> in JSON performance field.

I see. Actually most of this is what TableAM does. We just happened to
give this part of TableAM a separate name. The only exception is the
last case, when you create custom TOASTers for particular types and
potentially specify them for the given column.

All in all, this makes sense.

> Right. that's why we propose a validate method  (may be, it's a wrong
> name, but I don't known better one) which accepts several arguments, one
> of which is table AM oid. If that method returns false then toaster
> isn't useful with current TAM, storage or/and compression kinds, etc.

OK, I missed this message. So there was some misunderstanding after
all, sorry for this.

That's exactly what I wanted to know. It's much better than allowing
any TOASTer to run on top of any TableAM.

-- 
Best regards,
Aleksander Alekseev




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-24 Thread Jeff Davis
On Mon, 2022-10-24 at 07:25 -0700, Peter Geoghegan wrote:
> The really important thing is giving a regular/early autocancellable
> autovacuum triggered by age(relfrozenxid) *some* opportunity to run.

+1. That principle seems both reasonable from a system standpoint and
understandable to a user.

> Even if regular/early autovacuum had just one
> opportunity to run to completion, we'd already be much better off.

By "opportunity", you mean that, regardless of configuration, the
cancellable autovacuum would at least start; though it still might be
cancelled by DDL. Right?

> These are all fundamentally the same operations with the
> same urgency to users, though. We'd only need to describe the
> *criteria* that *triggered* the autovacuum in our autovacuum log
> report

Hmm... I'm worried that could be a bit confusing depending on how it's
done. Let's be clear that it was merely the triggering criteria and
doesn't necessarily represent the work that is being done.

There are enough cases that it would be good to start a document and
outline the end behavior that your patch series is designed to
accomplish. In other words, a before/after of the interesting cases.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-24 Thread Peter Geoghegan
On Mon, Oct 24, 2022 at 8:43 AM Jeff Davis  wrote:
> > Even if regular/early autovacuum had just one
> > opportunity to run to completion, we'd already be much better off.
>
> By "opportunity", you mean that, regardless of configuration, the
> cancellable autovacuum would at least start; though it still might be
> cancelled by DDL. Right?

Yes, exactly.

It might be difficult as a practical matter to make sure that we
*reliably* give autovacuum.c the opportunity to launch a "standard"
autovacuum tasked with advancing relfrozenxid (just after
autovacuum_freeze_max_age is first crossed) before the point that a
scary antiwraparound autovacuum is launched. So we might end up giving
it more XID slack than it's likely to ever need (say by only launching
a traditional antiwraparound autovacuum against tables that attain an
age that is twice the value of autovacuum_freeze_max_age). These are
all just details, though -- the important principle is that we try our
utmost to give the less disruptive strategy a chance to succeed before
concluding that it has failed, and then "escalating" to a traditional
antiwraparound autovacuum.

> > These are all fundamentally the same operations with the
> > same urgency to users, though. We'd only need to describe the
> > *criteria* that *triggered* the autovacuum in our autovacuum log
> > report
>
> Hmm... I'm worried that could be a bit confusing depending on how it's
> done. Let's be clear that it was merely the triggering criteria and
> doesn't necessarily represent the work that is being done.

Maybe it could be broken out into a separate "autovacuum triggered by:
" line, seen only in the autovacuum log instrumentation (and not in
the similar report output by a manual VACUUM VERBOSE). When we still
end up "escalating" to an antiwraparound autovacuum, the "triggered
by:" line would match whatever we'd show in the benign the
non-cancellable-but-must-advance-relfrozenxid autovacuum case. The
difference would be that we'd now be reporting on a different
operation entirely (not just a regular autovacuum, a scary
antiwraparound autovacuum).

(Again, even these distinctions wouldn't be meaningful to vacuumlazy.c
itself -- it would just need to handle the details around logging in a
way that gave users the right idea. There wouldn't be any special
discrete aggressive mode of operation anymore, assuming my big patch
set gets into Postgres 16 too.)

> There are enough cases that it would be good to start a document and
> outline the end behavior that your patch series is designed to
> accomplish. In other words, a before/after of the interesting cases.

That's on my TODO list. Mostly it's an independent thing to this
(antiwraparound) autovacuum stuff, despite the fact that both projects
share the same underlying philosophy.

-- 
Peter Geoghegan




Re: parse partition strategy string in gram.y

2022-10-24 Thread Alvaro Herrera
On 2022-Oct-24, Finnerty, Jim wrote:

> Is there a reason why HASH partitioning does not currently support
> range partition bounds, where the values in the partition bounds would
> refer to the hashed value?

Just lack of an implementation, I suppose.

> The advantage of hash partition bounds is that they are not
> domain-specific, as they are for ordinary RANGE partitions, but they
> are more flexible than MODULUS/REMAINDER partition bounds.

Well, modulus/remainder is what we have.  If you have ideas for a
different implementation, let's hear them.  I suppose we would have to
know about both the user interface and how it would work internally, from
two perspectives: how does tuple routing work (i.e. how to match a tuple's
values to a set of bound values), and how does partition pruning work
(i.e. how do partition bounds match a query's restriction clauses).

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: New docs chapter on Transaction Management and related changes

2022-10-24 Thread Robert Treat
On Mon, Oct 24, 2022 at 11:02 AM Simon Riggs
 wrote:
>
> On Sun, 16 Oct 2022 at 02:08, Bruce Momjian  wrote:
> >
> > On Fri, Oct 14, 2022 at 05:46:55PM -0400, Robert Treat wrote:
> > > On Fri, Oct 14, 2022 at 3:51 PM Bruce Momjian  wrote:
> > > > Attached is the merged patch from all the great comments I received.  I
> > > > have also rebuilt the docs with the updated patch:
> > > >
> > > > https://momjian.us/tmp/pgsql/
> > > >
> > >
> > > +   RELEASE SAVEPOINT also subcommits and destroys
> > > +   all savepoints that were established after the named savepoint was
> > > +   established. This means that any subtransactions of the named 
> > > savepoint
> > > +   will also be subcommitted and destroyed.
> > >
> > > Wonder if we should be more explicit that data changes are preserved,
> > > not destroyed... something like:
> > > "This means that any changes within subtransactions of the named
> > > savepoint will be subcommitted and those subtransactions will be
> > > destroyed."
> >
> > Good point.  I reread the section and there was just too much confusion
> > over subtransactions, partly because the behavior just doesn't map
> > easily to subtransaction.  I therefore merged all three paragraphs into
> > one and tried to make the text saner;  release_savepoint.sgml diff
> > attached, URL content updated.
>
> Just got around to reading this, thanks for changes.
>
> The rewording doesn't work for me. The use of the word "destroy" is
> very misleading, since the effect is to commit.
>
> So I think we must avoid use of the word destroy completely. Possible
> rewording:
>
> RELEASE SAVEPOINT will subcommit the subtransaction
> established by the named savepoint, releasing any resources held by
> it. If there were any subtransactions created by the named savepoint,
> these will also be subcommitted.
>

I think it should be "If there were any subtransactions of the named
savepoint, these will also be subcommitted", but otherwise I think
this wording should work.


Robert Treat
https://xzilla.net




fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Zhihong Yu
Hi,
When I was looking at src/backend/optimizer/util/restrictinfo.c, I found a
typo in one of the comments.

I also took the chance to simplify the code a little bit.

Please take a look at the patch.

Thanks


is-or.patch
Description: Binary data


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-24 Thread Melanie Plageman
On Thu, Oct 20, 2022 at 1:31 PM Andres Freund  wrote:
>
> Hi,
>
> - we shouldn't do pgstat_count_io_op() while the buffer header lock is held,
>   if possible.

I've changed this locally. It will be fixed in the next version I share.

>
>   I wonder if we should add a "source" output argument to
>   StrategyGetBuffer(). Then nearly all the counting can happen in
>   BufferAlloc().

I think we can just check for BM_VALID being set before invalidating it
in order to claim the buffer at the end of BufferAlloc(). Then we can
count it as an eviction or reuse.

>
> - "repossession" is a very unintuitive name for me. If we want something like
>   it, can't we just name it reuse_failed or such?

Repossession could be called eviction_failed or reuse_failed.
Do we think we will ever want to use it to count buffers we released
in other IOContexts (thus making the name eviction_failed better than
reuse_failed)?

> - Is it actually correct to count evictions in StrategyGetBuffer()? What if we
>   then decide to not use that buffer in BufferAlloc()? Yes, that'll be counted
>   via rejected, but that still leaves the eviction count to be "misleading"?

I agree that counting evictions in StrategyGetBuffer() is incorrect.
Checking BM_VALID at bottom of BufferAlloc() should be better.

> On 2022-10-19 15:26:51 -0400, Melanie Plageman wrote:
> > I have made some major changes in this area to make the columns more
> > useful. I have renamed and split "clocksweeps". It is now "evicted" and
> > "freelist acquired". This makes it clear when a block must be evicted
> > from a shared buffer must be and may help to identify misconfiguration
> > of shared buffers.
>
> I'm not sure freelist acquired is really that useful? If we don't add it, we
> should however definitely not count buffers from the freelist as evictions.
>
>
> > There is some nuance here that I tried to make clear in the docs.
> > "freelist acquired" in a shared context is straightforward.
> > "freelist acquired" in a strategy context is counted when a shared
> > buffer is added to the strategy ring (not when it is reused).
>
> Not sure what the second half here means - why would a buffer that's not from
> the freelist ever be counted as being from the freelist?
>
>
> > "freelist_acquired" is confusing for local buffers but I wanted to
> > distinguish between reuse/eviction of local buffers and initial
> > allocation. "freelist_acquired" seemed more fitting because there is a
> > clocksweep to find a local buffer and if it hasn't been allocated yet it
> > is allocated in a place similar to where shared buffers acquire a buffer
> > from the freelist. If I didn't count it here, I would need to make a new
> > column only for local buffers called "allocated" or something like that.
>
> I think you're making this too granular. We need to have more detail than
> today. But we don't necessarily need to catch every nuance.
>

I am fine with cutting freelist_acquired. The same actionable
information that it could provide could be provided by "read", right?
Also, removing it means I can remove the complicated explanation of how
freelist_acquired should be interpreted in IOCONTEXT_LOCAL.

Speaking of IOCONTEXT_LOCAL, I was wondering if it is confusing to call
it IOCONTEXT_LOCAL since it refers to IO done for temporary tables. What
if, in the future, we want to track other IO done using data in local
memory? Also, what if we want to track other IO done using data from
shared memory that is not in shared buffers? Would IOCONTEXT_SB and
IOCONTEXT_TEMP be better? Should IOContext literally describe the
context of the IO being done and there be a separate column which
indicates the source of the data for the IO?
Like wal_buffer, local_buffer, shared_buffer? Then if it is not
block-oriented, it could be shared_mem, local_mem, or bypass?

If we had another dimension to the matrix, "data_src", which, with
block-oriented IO, is equivalent to "buffer type", this could help with
some of the clarity problems.

We could remove the "reused" column and that becomes:

IOCONTEXT | DATA_SRC| IOOP

strategy  | strategy_buffer | EVICT

Having data_src and iocontext simplifies the meaning of all io
operations involving a strategy. Some operations are done on shared
buffers and some on existing strategy buffers and this would be more
clear without the addition of special columns for strategies.


> > I have also added the columns "repossessed" and "rejected". "rejected"
> > is when a bulkread rejects a strategy buffer because it is dirty and
> > requires flush. Seeing a lot of rejections could indicate you need to
> > vacuum. "repossessed" is the number of times a strategy buffer was
> > pinned or in use by another backend and had to be removed from the
> > strategy ring and replaced with a new shared buffer. This gives you some
> > indication that there is contention on blocks recently used by a
> > strategy.
>
> I don't immediately see a real use case f

Re: Pluggable toaster

2022-10-24 Thread Nikita Malakhov
Hi,

Aleksander, thanks for the discussion! It seems to me that I have to add
some parts
of it to API documentation, to clarify the details on API purpose and
use-cases.

On Mon, Oct 24, 2022 at 6:37 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > > > TOAST implementation is not necessary for Table AM.
> >
> > >What other use cases for TOAST do you have in mind?
> >
> > The main use case is the same as for the TOAST mechanism - storing and
> retrieving
> > oversized data. But we expanded this case with some details -
> > - update TOASTed data (yes, current TOAST implementation cannot update
> stored
> > data - is marks whole TOASTED object as dead and stores new one);
> > - retrieve part of the stored data chunks without fully de-TOASTing
> stored data (even
> > with existing TOAST this will be painful if you have to get just a small
> part of the several
> >  hundreds Mb sized object);
> > - be able to store objects of size larger than 1 Gb;
> > - store more than 4 Tb of TOASTed data for one table;
> > - optimize storage for fast search and retrieval of parts of TOASTed
> object - this is
> > must-have for effectively using JSON, PostgreSQL already is in
> catching-up position
> > in JSON performance field.
>
> I see. Actually most of this is what TableAM does. We just happened to
> give this part of TableAM a separate name. The only exception is the
> last case, when you create custom TOASTers for particular types and
> potentially specify them for the given column.
>
> All in all, this makes sense.
>
> > Right. that's why we propose a validate method  (may be, it's a wrong
> > name, but I don't known better one) which accepts several arguments, one
> > of which is table AM oid. If that method returns false then toaster
> > isn't useful with current TAM, storage or/and compression kinds, etc.
>
> OK, I missed this message. So there was some misunderstanding after
> all, sorry for this.
>
> That's exactly what I wanted to know. It's much better than allowing
> any TOASTer to run on top of any TableAM.
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-10-24 Thread Melanie Plageman
On Sun, Oct 23, 2022 at 6:35 PM Maciek Sakrejda  wrote:
>
> On Wed, Oct 19, 2022 at 12:27 PM Melanie Plageman
>  wrote:
> >
> > v34 is attached.
> > I think the column names need discussion. Also, the docs need more work
> > (I added a lot of new content there). I could use feedback on the column
> > names and definitions and review/rephrasing ideas for the docs
> > additions.
>
> Nice! I think the expanded docs are great, and make this information
> much easier to interpret.
>
> >+   io_context bulkread, existing
> >+   dirty buffers in the ring requirng flush are
>
> "requiring"

Thanks!

>
> >+   shared buffers were acquired from the freelist and added to the
> >+   fixed-size strategy ring buffer. Shared buffers are added to the
> >+   strategy ring lazily. If the current buffer in the ring is pinned or 
> >in
>
> This is the first mention of the term "strategy" in these docs. It's
> not totally opaque, since there's some context, but maybe we should
> either try to avoid that term or define it more explicitly?
>

I am thinking it might be good to define the term strategy for use in
this view documentation.
In the IOContext column documentation, I've added this
...
avoid occupying an undue portion of the main shared buffer pool. This
pattern is called a Buffer Access Strategy and the fixed-size ring
buffer can be referred to as a strategy ring buffer.


I was thinking this would allow me to refer to the strategy ring buffer
more easily. I fear simply referring to "the" ring buffer throughout
this view documentation will be confusing.

> >+   io_contexts. This is equivalent to
> >+   evicted for shared buffers in
> >+   io_context shared, as the 
> >contents
> >+   of the buffer are evicted but refers to the case when 
> >the
>
> I don't quite follow this: does this mean that I should expect
> 'reused' and 'evicted' to be equal in the 'shared' context, because
> they represent the same thing? Or will 'reused' just be null because
> it's not distinct from 'evicted'? It looks like it's null right now,
> but I find the wording here confusing.

You should only see evictions when the strategy evicts shared buffers
and reuses when the strategy evicts existing strategy buffers.

How about this instead in the docs?

The number of times an existing buffer in the strategy ring was reused
as part of an operation in the bulkread,
bulkwrite, or vacuum
io_contexts. When a buffer access strategy
reuses a buffer in the strategy ring, it must evict its
contents, incrementing reused. When a buffer access
strategy adds a new shared buffer to the strategy ring and this shared
buffer is occupied, the buffer access strategy must evict the contents
of the shared buffer, incrementing evicted.


> > I've implemented a change using the same function pg_settings uses to
> > turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name())
> > using the flag GUC_UNIT_BLOCKS. I am unsure if this is better or worse
> > than "block_size". I am feeling very conflicted about this column.
>
> Yeah, I guess it feels less natural here than in pg_settings, but it
> still kind of feels like one way of doing this is better than two...

So, Andres pointed out that it would be nice to be able to multiply the
unit column by the operation column (e.g. select unit * reused from
pg_stat_io...) and get a number of bytes. Then you can use
pg_size_pretty to convert it to something more human readable.

It probably shouldn't be called unit, then, since that would be the same
name as pg_settings but a different meaning. I thought of
"bytes_conversion". Then, non-block-oriented IO also wouldn't have to be
in bytes. They could put 1000 or 1 for bytes_conversion.

What do you think?
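
For example, assuming the column does end up being called bytes_conversion
(and the view pg_stat_io, as in the patch under discussion), one could then
write something like:

SELECT io_context, pg_size_pretty(reused * bytes_conversion) AS reused_size
FROM pg_stat_io
WHERE reused IS NOT NULL;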

- Melanie




Question about "compound" queries.

2022-10-24 Thread Anton A. Melnikov

Hello!

Please, could somebody explain what the "compound" queries were created for?
Maybe i'm calling them wrong. It's about queries like:
SELECT 1 + 2 \; SELECT 2.0 AS "float" \;  SELECT 1;

Such queries can neither be prepared nor used in the extended protocol with
 ERROR:  cannot insert multiple commands into a prepared statement.
What are their advantages?
And what is the proper name for such queries? "Compound" or something else?
Would be very grateful for clarification.

Best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-10-24 Thread Justin Pryzby
On Mon, Sep 26, 2022 at 05:29:58PM +0900, Kyotaro Horiguchi wrote:
> At Sat, 17 Sep 2022 23:53:07 -0500, Justin Pryzby  
> wrote in 
> > This is an alternative implementation, which still relies on adding the
> > GUC_DYNAMIC, flag but doesn't require adding a new, sql-accessible
> > function to convert the GUC to a pretty/human display value.
> 
> Thanks!
> 
> I'm not sure shared_buffer is GUC_DYNAMIC_DEFAULT, and we need to read

It's set during initdb.

> postgresql.conf.sample using SQL, but +1 for the direction.
> 
> + AND NOT (sc.sample_value ~ '^0' AND current_setting(name) ~ '^0') -- 
> zeros may be written differently
> + AND NOT (sc.sample_value='60s' AND current_setting(name) = '1min') -- 
> two ways to write 1min
> 
> Mmm. Couldn't we get away from that explicit exceptions?

Suggestions are welcomed.

Rebased the patch.

I also split the flag into DEFAULTS_COMPILE and DEFAULTS_INITDB, since
that makes it easier to understand what the flags mean and the intent of
the patch.  And maybe allows fewer exclusions in patches like Peter's,
which I think would only want to exclude compile-time defaults.

-- 
Justin
>From c3ac7bd40d26eb692f939d58935415ceb1cf308e Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Tue, 19 Jul 2022 08:31:56 -0500
Subject: [PATCH 1/2] add DYNAMIC_DEFAULT for settings which vary by
 ./configure or platform or initdb

---
 doc/src/sgml/func.sgml  | 15 ++
 src/backend/utils/misc/guc_funcs.c  |  6 ++-
 src/backend/utils/misc/guc_tables.c | 78 ++---
 src/include/utils/guc.h |  2 +
 4 files changed, 70 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6e0425cb3dc..b8e05073d1a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24374,6 +24374,21 @@ SELECT collation for ('foo' COLLATE "de_DE");
  FlagDescription
 
 
+
+ 
+  DEFAULT_COMPILE
+  Parameters with this flag have default values which vary by
+  platform, or compile-time options.
+  
+ 
+
+ 
+  DEFAULT_INITDB
+  Parameters with this flag have default values which are
+  determined dynamically during initdb.
+  
+ 
+
  
   EXPLAIN
   Parameters with this flag are included in
diff --git a/src/backend/utils/misc/guc_funcs.c b/src/backend/utils/misc/guc_funcs.c
index 108b3bd1290..2b666e8d014 100644
--- a/src/backend/utils/misc/guc_funcs.c
+++ b/src/backend/utils/misc/guc_funcs.c
@@ -538,7 +538,7 @@ ShowAllGUCConfig(DestReceiver *dest)
 Datum
 pg_settings_get_flags(PG_FUNCTION_ARGS)
 {
-#define MAX_GUC_FLAGS	6
+#define MAX_GUC_FLAGS	8
 	char	   *varname = TextDatumGetCString(PG_GETARG_DATUM(0));
 	struct config_generic *record;
 	int			cnt = 0;
@@ -551,6 +551,10 @@ pg_settings_get_flags(PG_FUNCTION_ARGS)
 	if (record == NULL)
 		PG_RETURN_NULL();
 
+	if (record->flags & GUC_DEFAULT_COMPILE)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_COMPILE");
+	if (record->flags & GUC_DEFAULT_INITDB)
+		flags[cnt++] = CStringGetTextDatum("DEFAULT_INITDB");
 	if (record->flags & GUC_EXPLAIN)
 		flags[cnt++] = CStringGetTextDatum("EXPLAIN");
 	if (record->flags & GUC_NO_RESET)
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 6454d034e7e..885d3dd8b81 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -601,7 +601,7 @@ StaticAssertDecl(lengthof(GucContext_Names) == (PGC_USERSET + 1),
 const char *const GucSource_Names[] =
 {
 	 /* PGC_S_DEFAULT */ "default",
-	 /* PGC_S_DYNAMIC_DEFAULT */ "default",
+	 /* PGC_S_DYNAMIC_DEFAULT */ "default", //
 	 /* PGC_S_ENV_VAR */ "environment variable",
 	 /* PGC_S_FILE */ "configuration file",
 	 /* PGC_S_ARGV */ "command line",
@@ -1191,7 +1191,7 @@ struct config_bool ConfigureNamesBool[] =
 		{"debug_assertions", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows whether the running server has assertion checks enabled."),
 			NULL,
-			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
+			GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE | GUC_DEFAULT_COMPILE
 		},
 		&assert_enabled,
 #ifdef USE_ASSERT_CHECKING
@@ -1367,7 +1367,8 @@ struct config_bool ConfigureNamesBool[] =
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
 			gettext_noop("Updates the process title to show the active SQL command."),
-			gettext_noop("Enables updating of the process title every time a new SQL command is received by the server.")
+			gettext_noop("Enables updating of the process title every time a new SQL command is received by the server."),
+			GUC_DEFAULT_COMPILE
 		},
 		&update_process_title,
 #ifdef WIN32
@@ -2116,7 +2117,8 @@ struct config_int ConfigureNamesInt[] =
 	{
 		{"max_connections", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
 			gettext_noop("Sets the maximum number of concurrent connections."),
-			NULL
+			NULL,
+			GUC_DEFAULT_INITDB
 		},
 		&MaxConnections,
 		100, 1, MAX_BACKENDS,
@@ -2153,7 +2155,7 @@ struct config_int ConfigureNamesI

Re: Interesting areas for beginners

2022-10-24 Thread Matheus Alcantara
Thanks so much for the answers, I'll try to start looking at some patches.


--
Matheus Alcantara




Re: Question about "compound" queries.

2022-10-24 Thread David G. Johnston
On Mon, Oct 24, 2022 at 3:02 PM Anton A. Melnikov 
wrote:

> Hello!
>
> Please, could somebody explain what the "compound" queries were created
> for?
> Maybe i'm calling them wrong. It's about queries like:
> SELECT 1 + 2 \; SELECT 2.0 AS "float" \;  SELECT 1;
>
> Such queries can neither be prepared nor used in the extended protocol with
>   ERROR:  cannot insert multiple commands into a prepared statement.
> What are their advantages?
> And what is the proper name for such queries? "Compound" or something else?
> Would be very grateful for clarification.
>

I suspect they came about out of simplicity - being able to simply take a
text file with a bunch of SQL commands in a script and send them as-is to
the server, without any client-side parsing, and let the server just deal
with it.  It works because the system needs to do those kinds of things
anyway, so why not make it user-facing, even if most users would find its
restrictions make it undesirable to use.

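The same difference is easy to see from libpq (a minimal sketch; connection
parameters come from the environment, and the semicolons stay inside one
PQexec string, so psql's \; trick is not needed here): the simple-query path
takes the multi-statement string and returns the result of the last
statement, while preparing the same string fails with the error quoted above.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* connection settings from environment */
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* Simple query protocol: several statements in one string are accepted,
	 * and PQexec hands back the result of the last one. */
	res = PQexec(conn, "SELECT 1 + 2; SELECT 2.0 AS \"float\"; SELECT 1;");
	printf("simple query: %s, last value = %s\n",
		   PQresStatus(PQresultStatus(res)), PQgetvalue(res, 0, 0));
	PQclear(res);

	/* Extended protocol: preparing the same string is rejected with
	 * "cannot insert multiple commands into a prepared statement". */
	res = PQprepare(conn, "multi", "SELECT 1 + 2; SELECT 2.0; SELECT 1;",
					0, NULL);
	printf("prepare: %s %s", PQresStatus(PQresultStatus(res)),
		   PQresultErrorMessage(res));
	PQclear(res);

	PQfinish(conn);
	return 0;
}
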
David J.


Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert

2022-10-24 Thread Jacob Champion
On Mon, Oct 4, 2021 at 9:14 PM Bruce Momjian  wrote:
> On Tue, Sep 28, 2021 at 02:54:39AM -0700, tho...@habets.se wrote:
> > And you say for complex setups. Fair enough. But currently I'd say the
> > default is wrong, and what should be default is not configurable.
>
> Agreed, I think this needs much more discussion and documentation.

I'd like to try to get this conversation started again. To pique
interest I've attached a new version of 0001, which implements
`sslrootcert=system` instead as suggested upthread. In 0002 I went
further and switched the default sslmode to `verify-full` when using
the system CA roots, because I feel pretty strongly that anyone
interested in using public CA systems is also interested in verifying
hostnames. (Otherwise, why make the switch?)

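As a usage sketch of the intended behaviour (not part of the patches;
"db.example.com" is a placeholder host, sslrootcert=system only exists with
0001 applied, and sslmode is deliberately left unset so that 0002 can resolve
it):

#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("host=db.example.com dbname=postgres "
								   "sslrootcert=system");
	PQconninfoOption *opts = PQconninfo(conn);

	/* With 0002 applied this should print "verify-full", even though the
	 * connection string never mentions sslmode. */
	for (PQconninfoOption *o = opts; o && o->keyword; o++)
	{
		if (strcmp(o->keyword, "sslmode") == 0)
			printf("effective sslmode: %s\n", o->val ? o->val : "(unset)");
	}

	PQconninfoFree(opts);
	PQfinish(conn);
	return 0;
}
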
Notes:
- 0001, like Thomas' original patch, uses
SSL_CTX_set_default_verify_paths(). This will load both a default file
and a default directory. This is probably what most people want if
they're using the system roots -- just give me whatever the local
system wants me to use! -- but sslrootcert currently deals with files
only, I think. Is that a problem?
- The implementation in 0002 goes all the way down to
conninfo_add_defaults(). Maybe this is overly complex. Should I just
make sslmode a derived option, via connectOptions2()?

Thanks,
--Jacob
From 14311929a443f25f5064cdb01b57fae8d575e66d Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 24 Oct 2022 15:24:11 -0700
Subject: [PATCH v2 2/2] libpq: default to verify-full for system CAs

Weaker verification modes don't make much sense for public CAs.
---
 src/interfaces/libpq/fe-connect.c | 25 +
 src/interfaces/libpq/t/001_uri.pl | 30 --
 src/test/ssl/t/001_ssltests.pl|  6 ++
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 746e9b4f1e..92b87d3318 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -5876,6 +5876,7 @@ static bool
 conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage)
 {
 	PQconninfoOption *option;
+	PQconninfoOption *sslmode_default = NULL, *sslrootcert = NULL;
 	char	   *tmp;
 
 	/*
@@ -5892,6 +5893,9 @@ conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage)
 	 */
 	for (option = options; option->keyword != NULL; option++)
 	{
+		if (strcmp(option->keyword, "sslrootcert") == 0)
+			sslrootcert = option; /* save for later */
+
 		if (option->val != NULL)
 			continue;			/* Value was in conninfo or service */
 
@@ -5936,6 +5940,13 @@ conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage)
 }
 continue;
 			}
+
+			/*
+			 * sslmode is not specified. Let it be filled in with the compiled
+			 * default for now, but if sslrootcert=system, we'll override the
+			 * default later before returning.
+			 */
+			sslmode_default = option;
 		}
 
 		/*
@@ -5969,6 +5980,20 @@ conninfo_add_defaults(PQconninfoOption *options, PQExpBuffer errorMessage)
 		}
 	}
 
+	/*
+	 * Special handling for sslrootcert=system with no sslmode explicitly
+	 * defined. In this case we want to strengthen the default sslmode to
+	 * verify-full.
+	 */
+	if (sslmode_default && sslrootcert)
+	{
+		if (sslrootcert->val && strcmp(sslrootcert->val, "system") == 0)
+		{
+			free(sslmode_default->val);
+			sslmode_default->val = strdup("verify-full");
+		}
+	}
+
 	return true;
 }
 
diff --git a/src/interfaces/libpq/t/001_uri.pl b/src/interfaces/libpq/t/001_uri.pl
index beaf13b49c..ccfcd77680 100644
--- a/src/interfaces/libpq/t/001_uri.pl
+++ b/src/interfaces/libpq/t/001_uri.pl
@@ -8,7 +8,9 @@ use IPC::Run;
 
 
 # List of URIs tests. For each test the first element is the input string, the
-# second the expected stdout and the third the expected stderr.
+# second the expected stdout and the third the expected stderr. Optionally,
+# additional arguments may specify key/value pairs which will override
+# environment variables for the duration of the test.
 my @tests = (
 	[
 		q{postgresql://uri-user:secret@host:12345/db},
@@ -209,20 +211,44 @@ my @tests = (
 		q{postgres://%2Fvar%2Flib%2Fpostgresql/dbname},
 		q{dbname='dbname' host='/var/lib/postgresql' (local)},
 		q{},
+	],
+	# Usually the default sslmode is 'prefer' (for libraries with SSL) or
+	# 'disable' (for those without). This default changes to 'verify-full' if
+	# the system CA store is in use.
+	[
+		q{postgresql://host?sslmode=disable},
+		q{host='host' sslmode='disable' (inet)},
+		q{},
+		PGSSLROOTCERT => "system",
+	],
+	[
+		q{postgresql://host?sslmode=prefer},
+		q{host='host' sslmode='prefer' (inet)},
+		q{},
+		PGSSLROOTCERT => "system",
+	],
+	[
+		q{postgresql://host?sslmode=verify-full},
+		q{host='host' (inet)},
+		q{},
+		PGSSLROOTCERT => "system",
 	]);
 
 # test to run for each of the above test definitions
 sub test_uri
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;

Re: Crash after a call to pg_backup_start()

2022-10-24 Thread Michael Paquier
On Mon, Oct 24, 2022 at 11:39:19AM +0200, Alvaro Herrera wrote:
> Reading it again, I agree with your conclusion, so I'll push as you
> proposed with some extra tests, after they complete running.

Thanks for the fix, Álvaro!
--
Michael




Re: parse partition strategy string in gram.y

2022-10-24 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Oct-24, Finnerty, Jim wrote:
>> The advantage of hash partition bounds is that they are not
>> domain-specific, as they are for ordinary RANGE partitions, but they
>> are more flexible than MODULUS/REMAINDER partition bounds.

I'm more than a bit skeptical of that claim.  Under what
circumstances (other than a really awful hash function,
perhaps) would it make sense to not use equi-sized hash
partitions?  If you can predict that more stuff is going
to go into one partition than another, then you need to
fix your hash function, not invent more complication for
the core partitioning logic.

regards, tom lane




Re: [PATCHES] Post-special page storage TDE support

2022-10-24 Thread Julien Rouhaud
Hi,

On Mon, Oct 24, 2022 at 12:55:53PM -0500, David Christensen wrote:
>
> Explicitly
> locking (assuming you stay in your lane) should only need to guard
> against access from other
> backends of this type if using shared buffers, so will be use-case dependent.

I'm not sure what you mean here?

> This does have a runtime overhead due to moving some offset
> calculations from compile time to
> runtime. It is thought that the utility of this feature will outweigh
> the costs here.

Have you done some benchmarking to give an idea of how much overhead we're
talking about?

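For readers following along, a generic illustration of the kind of change
being discussed (hypothetical names, not code from the patch set): a
page-space quantity that the compiler folds into a constant today turns into
arithmetic on a value that is only known at runtime.

#include "postgres.h"
#include "storage/bufpage.h"

/* Today: folds to a compile-time constant. */
#define USABLE_PAGE_SPACE_STATIC	(BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

/* With reserved per-page feature space, the reserved amount is fixed per
 * cluster but only known at runtime (e.g. read from pg_control at startup),
 * so the same quantity becomes a subtraction involving a global. */
extern Size reserved_page_space;	/* hypothetical, set once at startup */

static inline Size
usable_page_space_dynamic(void)
{
	return BLCKSZ - MAXALIGN(SizeOfPageHeaderData) - reserved_page_space;
}
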
> Candidates for page features include 32-bit or 64-bit checksums,
> encryption tags, or additional
> per-page metadata.
>
> While we are not currently getting rid of the pd_checksum field, this
> mechanism could be used to
> free up that 16 bits for some other purpose.

IIUC there's a hard requirement of initdb-time initialization, as there's
otherwise no guarantee that you will find enough free space in each page at
runtime.  It seems like a very hard requirement for a full replacement of the
current checksum approach (even if I agree that the current implementation
limitations are far from ideal), especially since there's no technical reason
that would prevent us from dynamically enabling data-checksums without doing
all the work when the cluster is down.




Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread John Naylor
On Tue, Oct 25, 2022 at 12:19 AM Zhihong Yu  wrote:
>
> Hi,
> When I was looking at src/backend/optimizer/util/restrictinfo.c, I found
a typo in one of the comments.

Using "t" as an abbreviation for "true" was probably intentional, so not a
typo. There is no doubt what the behavior is.

> I also took the chance to simplify the code a little bit.

It's perfectly clear and simple now, even if it doesn't win at "code golf".

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Question about savepoint level?

2022-10-24 Thread Richard Guo
On Mon, Oct 24, 2022 at 6:01 PM Alvaro Herrera 
wrote:

> On 2022-Oct-24, Richard Guo wrote:
> > ISTM the savepointLevel always remains the same as what is in
> > TopTransactionStateData after looking at the codes. Now I also get
> > confused. Maybe what we want is nestingLevel?
>
> This has already been discussed:
> https://postgr.es/m/1317297307-sup-7...@alvh.no-ip.org
> Now that we have transaction-controlling procedures, I think the next
> step is to add the SQL-standard feature that allows savepoint level
> control for them, which would make the savepointLevel no longer dead
> code.


Now I see the context. Thanks for pointing that out.

Thanks
Richard


Re: Some comments that should've covered MERGE

2022-10-24 Thread Richard Guo
On Mon, Oct 24, 2022 at 6:59 PM Alvaro Herrera 
wrote:

> On 2022-Oct-19, Richard Guo wrote:
>
> > Hi hackers,
> >
> > I happened to notice $subject. Attach a trivial patch for that.
>
> Thanks, applied.  I did change the comment atop setTargetTable, which I
> thought could use a little bit more detail on what is happening, and
> also in its callsite in transformMergeStmt.


Thanks for the fix!

Thanks
Richard


Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Richard Guo
On Tue, Oct 25, 2022 at 10:05 AM John Naylor 
wrote:

>
> On Tue, Oct 25, 2022 at 12:19 AM Zhihong Yu  wrote:
> >
> > Hi,
> > When I was looking at src/backend/optimizer/util/restrictinfo.c, I found
> a typo in one of the comments.
>
> Using "t" as an abbreviation for "true" was probably intentional, so not a
> typo. There is no doubt what the behavior is.
>
> > I also took the chance to simplify the code a little bit.
>
> It's perfectly clear and simple now, even if it doesn't win at "code golf".
>

Agree with your point.  Do you think we can further make the one-line
function a macro or an inline function in the .h file?  I think this
function is called quite frequently during planning, so maybe doing that
would bring a little bit of efficiency.

Thanks
Richard


Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Japin Li


On Tue, 25 Oct 2022 at 10:48, Richard Guo  wrote:
> On Tue, Oct 25, 2022 at 10:05 AM John Naylor 
> wrote:
>
>>
>> On Tue, Oct 25, 2022 at 12:19 AM Zhihong Yu  wrote:
>> >
>> > Hi,
>> > When I was looking at src/backend/optimizer/util/restrictinfo.c, I found
>> a typo in one of the comments.
>>
>> Using "t" as an abbreviation for "true" was probably intentional, so not a
>> typo. There is no doubt what the behavior is.
>>
>> > I also took the chance to simplify the code a little bit.
>>
>> It's perfectly clear and simple now, even if it doesn't win at "code golf".
>>
>
> Agree with your point.  Do you think we can further make the one-line
> function a macro or an inline function in the .h file?  I think this
> function is called quite frequently during planning, so maybe doing that
> would bring a little bit of efficiency.
>

+1, same goes for restriction_is_securely_promotable.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.





Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Zhihong Yu
On Mon, Oct 24, 2022 at 7:58 PM Japin Li  wrote:

>
> On Tue, 25 Oct 2022 at 10:48, Richard Guo  wrote:
> > On Tue, Oct 25, 2022 at 10:05 AM John Naylor <
> john.nay...@enterprisedb.com>
> > wrote:
> >
> >>
> >> On Tue, Oct 25, 2022 at 12:19 AM Zhihong Yu  wrote:
> >> >
> >> > Hi,
> >> > When I was looking at src/backend/optimizer/util/restrictinfo.c, I
> found
> >> a typo in one of the comments.
> >>
> >> Using "t" as an abbreviation for "true" was probably intentional, so
> not a
> >> typo. There is no doubt what the behavior is.
> >>
> >> > I also took the chance to simplify the code a little bit.
> >>
> >> It's perfectly clear and simple now, even if it doesn't win at "code
> golf".
> >>
> >
> > Agree with your point.  Do you think we can further make the one-line
> > function a macro or an inline function in the .h file?  I think this
> > function is called quite frequently during planning, so maybe doing that
> > would bring a little bit of efficiency.
> >
>
> +1, same goes for restriction_is_securely_promotable.
>
> Hi,
Thanks for the comments.

Please take a look at patch v2.


v2-is-or.patch
Description: Binary data


Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-24 Thread Masahiko Sawada
On Fri, Oct 21, 2022 at 6:32 PM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, October 20, 2022 5:49 PM Amit Kapila  
> wrote:
> > On Thu, Oct 20, 2022 at 2:08 PM Peter Smith 
> > wrote:
> > >
> > > 7. get_transaction_apply_action
> > >
> > > > 12. get_transaction_apply_action
> > > >
> > > > I still felt like there should be some tablesync checks/comments in
> > > > this function, just for sanity, even if it works as-is now.
> > > >
> > > > For example, are you saying ([3] #22b) that there might be rare
> > > > cases where a Tablesync would call to parallel_apply_find_worker?
> > > > That seems strange, given that "for streaming transactions that are
> > > > being applied in the parallel ... we disallow applying changes on a
> > > > table that is not in the READY state".
> > > >
> > > > --
> > >
> > > Houz wrote [2] -
> > >
> > > I think because we won't try to start parallel apply worker in table
> > > sync worker(see the check in parallel_apply_can_start()), so we won't
> > > find any worker in parallel_apply_find_worker() which means
> > > get_transaction_apply_action will return TRANS_LEADER_SERIALIZE. And
> > > get_transaction_apply_action is a function which can be invoked for
> > > all kinds of workers(same is true for all apply_handle_xxx functions),
> > > so not sure if table sync check/comment is necessary.
> > >
> > > ~
> > >
> > > Sure, and I believe you when you say it all works OK - but IMO there
> > > is something still not quite right with this current code. For
> > > example,
> > >
> > > e.g.1 the functional will return TRANS_LEADER_SERIALIZE for Tablesync
> > > worker, and yet the comment for TRANS_LEADER_SERIALIZE says "means
> > > that we are in the leader apply worker" (except we are not)
> > >
> > > e.g.2 we know for a fact that Tablesync workers cannot start their own
> > > parallel apply workers, so then why do we even let the Tablesync
> > > worker make a call to parallel_apply_find_worker() looking for
> > > something we know will not be found?
> > >
> >
> > I don't see much benefit in adding an additional check for tablesync workers
> > here. It will unnecessarily make this part of the code look bit ugly.
>
> Thanks for the review, here is the new version patch set which addressed 
> Peter[1]
> and Kuroda-san[2]'s comments.

I've started to review this patch. I tested the v40-0001 patch and have
one question:

IIUC even when most of the changes in the transaction are filtered out
in pgoutput (eg., by relation filter or row filter), the walsender
sends STREAM_START. This means that the subscriber could end up
launching parallel apply workers also for almost empty (and streamed)
transactions. For example, I created three subscriptions each of which
subscribes to a different table. When I loaded a large amount of data
into one table, all three (leader) apply workers received START_STREAM
and launched their parallel apply workers. However, two of them
finished without applying any data. I think this behaviour is
problematic since it wastes workers and can even decrease the apply
performance when the changes are not large. Is it worth considering a
way to delay launching a parallel apply worker until we find out the
amount of changes is actually large? For example, the leader worker
writes the streamed changes to files as usual and launches a parallel
worker if the amount of changes exceeds a threshold or the leader
receives the second segment. After that, the leader worker switches to
send the streamed changes to parallel workers via shm_mq instead of
files.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve tab completion for ALTER STATISTICS

2022-10-24 Thread vignesh C
On Mon, 24 Oct 2022 at 12:30, Michael Paquier  wrote:
>
> On Wed, Oct 19, 2022 at 04:06:51PM +0530, vignesh C wrote:
> > I noticed that the tab completion for ALTER STATISTICS .. SET was not
> > handled. The attached patch displays SCHEMA and STATISTICS for tab
> > completion of ALTER STATISTICS name SET.
>
> Indeed, it is a bit strange as we would get a list of settable
> parameters once the completion up to SET is done, rather than
> STATISTICS and SCHEMA.  Your patch looks fine, so applied.  Thanks!

Thanks for pushing this.

Regards,
Vignesh




Outdated comments in AssignTransactionId?

2022-10-24 Thread Japin Li

The AssignTransactionId has the following comments:

/*
 * ensure this test matches similar one in
 * RecoverPreparedTransactions()
 */
if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
log_unknown_top)
{
...
}

However, commit 49e9281549 removed the corresponding check from
RecoverPreparedTransactions, so this comment is now outdated.  The attached
patch removes the reference from AssignTransactionId.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index fd5103a78e..fe0cf5527b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -741,10 +741,6 @@ AssignTransactionId(TransactionState s)
 		unreportedXids[nUnreportedXids] = XidFromFullTransactionId(s->fullTransactionId);
 		nUnreportedXids++;
 
-		/*
-		 * ensure this test matches similar one in
-		 * RecoverPreparedTransactions()
-		 */
 		if (nUnreportedXids >= PGPROC_MAX_CACHED_SUBXIDS ||
 			log_unknown_top)
 		{


Re: GUC values - recommended way to declare the C variables?

2022-10-24 Thread Peter Smith
On Thu, Oct 20, 2022 at 6:52 PM Peter Smith  wrote:
>
> On Thu, Oct 20, 2022 at 3:16 PM Justin Pryzby  wrote:
> >
> > On Thu, Oct 20, 2022 at 11:56:58AM +1100, Peter Smith wrote:
> > > Patch 0002 adds a sanity-check function called by
> > > InitializeGUCOptions, as suggested by Tom [2]. This is to ensure that
> > > the GUC C variable initial values are sensible and/or have not gone
> > > stale compared with the compiled-in defaults of guc_tables.c. This
> > > patch also changes some GUC C variable initial values which were
> > > already found (by this sanity-checker) to be different.
> >
> > I like it.
> >
> > However it's fails on windows:
> >
> > https://cirrus-ci.com/task/5545965036765184
> >
> > running bootstrap script ... FATAL:  GUC (PGC_BOOL) update_process_title, 
> > boot_val=0, C-var=1
> >
> > Maybe you need to exclude dynamically set gucs ?
> > See also this other thread, where I added a flag identifying exactly
> > that.  https://commitfest.postgresql.org/40/3736/
> > I need to polish that patch some, but maybe it'll be useful for you, too.
> >

PSA patch set v3.

This is essentially the same as before except now, utilizing the
GUC_DEFAULT_COMPILE flag added by Justin's patch [1], the sanity-check
skips over any dynamic compiler-dependent GUCs.

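For anyone skimming the thread, the idea behind the sanity check is roughly
the following (a simplified sketch written as if it lived in guc_tables.c
next to the ConfigureNamesBool array; only the bool case is shown, the
function name is made up, and GUC_DEFAULT_COMPILE comes from Justin's patch):

/* Called once at startup, e.g. from InitializeGUCOptions(). */
static void
check_GUC_C_variable_initial_values(void)
{
	struct config_bool *conf;

	for (conf = ConfigureNamesBool; conf->gen.name != NULL; conf++)
	{
		/* Defaults that legitimately vary by platform/compiler are skipped. */
		if (conf->gen.flags & GUC_DEFAULT_COMPILE)
			continue;

		if (conf->variable && *conf->variable != conf->boot_val)
			elog(FATAL, "GUC (PGC_BOOL) %s, boot_val=%d, C-var=%d",
				 conf->gen.name, (int) conf->boot_val, (int) *conf->variable);
	}
}
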
Patch 0001 - GUC trivial mods to logical replication GUC C var declarations
Patch 0002 - (TMP) Justin's patch adds the GUC_DEFAULT_COMPILE flag
support -- this is now a prerequisite for 0003
Patch 0003 - GUC sanity-check comparisons of GUC C var declarations
with the GUC defaults from guc_tables.c

--
[1] Justin's patch of 24/Oct -
https://www.postgresql.org/message-id/20221024220544.GJ16921%40telsasoft.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0003-GUC-C-variable-sanity-check.patch
Description: Binary data


v3-0001-GUC-tidy-some-C-variable-declarations.patch
Description: Binary data


v3-0002-TMP-Justins-DYNAMIC_DEFAULT-patch.patch
Description: Binary data


Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Japin Li

On Tue, 25 Oct 2022 at 11:07, Zhihong Yu  wrote:
> Please take a look at patch v2.

Maybe we should define those functions in headers.  See patch v3.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index ef8df3d098..8a6812b4b1 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -375,41 +375,6 @@ commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op)
 	return result;
 }
 
-/*
- * restriction_is_or_clause
- *
- * Returns t iff the restrictinfo node contains an 'or' clause.
- */
-bool
-restriction_is_or_clause(RestrictInfo *restrictinfo)
-{
-	if (restrictinfo->orclause != NULL)
-		return true;
-	else
-		return false;
-}
-
-/*
- * restriction_is_securely_promotable
- *
- * Returns true if it's okay to evaluate this clause "early", that is before
- * other restriction clauses attached to the specified relation.
- */
-bool
-restriction_is_securely_promotable(RestrictInfo *restrictinfo,
-   RelOptInfo *rel)
-{
-	/*
-	 * It's okay if there are no baserestrictinfo clauses for the rel that
-	 * would need to go before this one, *or* if this one is leakproof.
-	 */
-	if (restrictinfo->security_level <= rel->baserestrict_min_security ||
-		restrictinfo->leakproof)
-		return true;
-	else
-		return false;
-}
-
 /*
  * get_actual_clauses
  *
diff --git a/src/include/optimizer/restrictinfo.h b/src/include/optimizer/restrictinfo.h
index 6d30bd5e9d..dbfbebff48 100644
--- a/src/include/optimizer/restrictinfo.h
+++ b/src/include/optimizer/restrictinfo.h
@@ -31,9 +31,6 @@ extern RestrictInfo *make_restrictinfo(PlannerInfo *root,
 	   Relids outer_relids,
 	   Relids nullable_relids);
 extern RestrictInfo *commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op);
-extern bool restriction_is_or_clause(RestrictInfo *restrictinfo);
-extern bool restriction_is_securely_promotable(RestrictInfo *restrictinfo,
-			   RelOptInfo *rel);
 extern List *get_actual_clauses(List *restrictinfo_list);
 extern List *extract_actual_clauses(List *restrictinfo_list,
 	bool pseudoconstant);
@@ -46,4 +43,36 @@ extern bool join_clause_is_movable_into(RestrictInfo *rinfo,
 		Relids currentrelids,
 		Relids current_and_outer);
 
+/*
+ * restriction_is_or_clause
+ *
+ * Returns true iff the restrictinfo node contains an 'or' clause.
+ */
+static inline bool
+restriction_is_or_clause(RestrictInfo *restrictinfo)
+{
+	return (restrictinfo->orclause != NULL);
+}
+
+/*
+ * restriction_is_securely_promotable
+ *
+ * Returns true if it's okay to evaluate this clause "early", that is before
+ * other restriction clauses attached to the specified relation.
+ */
+static inline bool
+restriction_is_securely_promotable(RestrictInfo *restrictinfo,
+   RelOptInfo *rel)
+{
+	/*
+	 * It's okay if there are no baserestrictinfo clauses for the rel that
+	 * would need to go before this one, *or* if this one is leakproof.
+	 */
+	if (restrictinfo->security_level <= rel->baserestrict_min_security ||
+		restrictinfo->leakproof)
+		return true;
+	else
+		return false;
+}
+
 #endif			/* RESTRICTINFO_H */


Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-10-24 Thread Peter Smith
On Tue, Oct 25, 2022 at 9:05 AM Justin Pryzby  wrote:
>
> On Mon, Sep 26, 2022 at 05:29:58PM +0900, Kyotaro Horiguchi wrote:
> > At Sat, 17 Sep 2022 23:53:07 -0500, Justin Pryzby  
> > wrote in
...

> Rebased the patch.
>
> I also split the flag into DEFAULTS_COMPILE and DEFAULTS_INITDB, since
> that makes it easier to understand what the flags mean and the intent of
> the patch.  And maybe allows fewer exclusions in patches like Peter's,
> which I think would only want to exclude compile-time defaults.
>

Thanks!

FYI, I'm making use of this patch now as a prerequisite for my GUC C
var sanity-checker [1].

--
[1]  
https://www.postgresql.org/message-id/CAHut%2BPss16YBiYYKyrZBvSp_4uSQfCy7aYfDXU0N8w5VZ5dd_g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Richard Guo
On Tue, Oct 25, 2022 at 11:46 AM Japin Li  wrote:

>
> On Tue, 25 Oct 2022 at 11:07, Zhihong Yu  wrote:
> > Please take a look at patch v2.
>
> Maybe we should define those functions in headers.  See patch v3.


Yes, putting them in .h file is better to me. For the v3 patch, we can
do the same one-line trick for restriction_is_securely_promotable.

Thanks
Richard


Re: Understanding, testing and improving our Windows filesystem code

2022-10-24 Thread Thomas Munro
I pushed the bug fixes from this series, without their accompanying
tests.  Here's a rebase of the test suite, with all those tests now
squashed into the main test patch, and also the
tell-Windows-to-be-more-like-Unix patch.  Registered in the
commitfest.
From 1b70db7f4068df22b29e02d0c1b759195e0187ff Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 11 Oct 2022 17:26:02 +1300
Subject: [PATCH v3 1/4] Add suite of macros for writing TAP tests in C.

Historically we had to load test modules into a PostgreSQL backend or
write an ad-hoc standalone program to test C code.  Let's provide a
convenient way to write stand-alone tests for low-level C code.

This commit defines a small set of macros that spit out results in TAP
format.  These can be improved and extended as required.

Discussion: https://postgr.es/m/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
---
 src/include/lib/pg_tap.h | 138 +++
 1 file changed, 138 insertions(+)
 create mode 100644 src/include/lib/pg_tap.h

diff --git a/src/include/lib/pg_tap.h b/src/include/lib/pg_tap.h
new file mode 100644
index 00..3da46ac5d5
--- /dev/null
+++ b/src/include/lib/pg_tap.h
@@ -0,0 +1,138 @@
+/*
+ * Simple macros for writing tests in C that print results in TAP format,
+ * as consumed by "prove".
+ *
+ * Specification for the output format: https://testanything.org/
+ */
+
+#ifndef PG_TAP_H
+#define PG_TAP_H
+
+#include 
+#include 
+#include 
+
+/* Counters are global, so we can break our tests into multiple functions. */
+static int	pg_test_count;
+static int	pg_fail_count;
+static int	pg_todo_count;
+
+/*
+ * Require an expression to be true.  Used for set-up steps that are not
+ * reported as a test.
+ */
+#define PG_REQUIRE(expr) \
+if (!(expr)) { \
+	printf("Bail out! requirement (" #expr ") failed at %s:%d\n", \
+		__FILE__, __LINE__); \
+	exit(EXIT_FAILURE); \
+}
+
+/*
+ * Like PG_REQUIRE, but log strerror(errno) before bailing.
+ */
+#define PG_REQUIRE_SYS(expr) \
+if (!(expr)) { \
+	printf("Bail out! requirement (" #expr ") failed at %s:%d, error: %s\n", \
+		__FILE__, __LINE__, strerror(errno)); \
+	exit(EXIT_FAILURE); \
+}
+
+/*
+ * Test that an expression is true.  An optional message can be provided,
+ * defaulting to the expression itself if not provided.
+ */
+#define PG_EXPECT(expr, ...) \
+do { \
+	const char *messages[] = {#expr, __VA_ARGS__}; \
+	const char *message = messages[lengthof(messages) - 1]; \
+	pg_test_count++; \
+	if (expr) { \
+		printf("ok %d - %s\n", pg_test_count, message); \
+	} else { \
+		if (pg_todo_count == 0) \
+			pg_fail_count++; \
+		printf("not ok %d - %s (at %s:%d)%s\n", pg_test_count, \
+			message, __FILE__, __LINE__, \
+			pg_todo_count > 0 ? " # TODO" : ""); \
+	} \
+} while (0)
+
+/*
+ * Like PG_EXPECT(), but also log strerror(errno) on failure.
+ */
+#define PG_EXPECT_SYS(expr, ...) \
+do { \
+	const char *messages[] = {#expr, __VA_ARGS__}; \
+	const char *message = messages[lengthof(messages) - 1]; \
+	pg_test_count++; \
+	if (expr) { \
+		printf("ok %d - %s\n", pg_test_count, message); \
+	} else { \
+		if (pg_todo_count == 0) \
+			pg_fail_count++; \
+		printf("not ok %d - %s (at %s:%d), error: %s%s\n", pg_test_count, \
+			message, __FILE__, __LINE__, strerror(errno), \
+			pg_todo_count > 0 ? " # TODO" : ""); \
+	} \
+} while (0)
+
+
+/*
+ * Test that one int expression is equal to another, logging the values if not.
+ */
+#define PG_EXPECT_EQ(expr1, expr2, ...) \
+do { \
+	const char *messages[] = {#expr1 " == " #expr2, __VA_ARGS__}; \
+	const char *message = messages[lengthof(messages) - 1]; \
+	int expr1_val = (expr1); \
+	int expr2_val = (expr2); \
+	pg_test_count++; \
+	if (expr1_val == expr2_val) { \
+		printf("ok %d - %s\n", pg_test_count, message); \
+	} else { \
+		if (pg_todo_count == 0) \
+			pg_fail_count++; \
+		printf("not ok %d - %s: %d != %d (at %s:%d)%s\n", pg_test_count, \
+			message, expr1_val, expr2_val, __FILE__, __LINE__, \
+			pg_todo_count > 0 ? " # TODO" : ""); \
+	} \
+} while (0)
+
+/*
+ * Test that one C string expression is equal to another, logging the values if
+ * not.
+ */
+#define PG_EXPECT_EQ_STR(expr1, expr2, ...) \
+do { \
+	const char *messages[] = {#expr1 " matches " #expr2, __VA_ARGS__}; \
+	const char *message = messages[lengthof(messages) - 1]; \
+	const char *expr1_val = (expr1); \
+	const char *expr2_val = (expr2); \
+	pg_test_count++; \
+	if (strcmp(expr1_val, expr2_val) == 0) { \
+		printf("ok %d - %s\n", pg_test_count, message); \
+	} else { \
+		if (pg_todo_count == 0) \
+			pg_fail_count++; \
+		printf("not ok %d - %s: \"%s\" vs \"%s\" (at %s:%d) %s\n", \
+			pg_test_count, \
+			message, expr1_val, expr2_val, __FILE__, __LINE__, \
+			pg_todo_count > 0 ? " # TODO" : ""); \
+	} \
+} while (0)
+
+/*
+ * The main function of a test program should begin and end with these
+ * functions.
+ */
+#define PG_BEGIN_TESTS() \
+	setbuf(stdout, NULL);		/* disable buffering for Windows */
+
+#

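As a usage sketch (hypothetical; the tail of the header is cut off in this
message, so the TAP plan line is printed by hand here rather than by whatever
end-of-tests macro the patch actually provides), a standalone test program
would look something like this:

/* pg_tap.h relies on lengthof(), so include postgres_fe.h (or c.h) first. */
#include "postgres_fe.h"
#include "lib/pg_tap.h"

#include <unistd.h>

int
main(void)
{
	PG_BEGIN_TESTS();

	PG_EXPECT(1 + 1 == 2);
	PG_EXPECT_EQ(2 * 3, 6, "multiplication works");
	PG_EXPECT_EQ_STR("postgres", "postgres");
	PG_EXPECT_SYS(access(".", F_OK) == 0, "current directory is accessible");

	/* TAP plan, emitted manually in this sketch. */
	printf("1..%d\n", pg_test_count);
	return pg_fail_count > 0 ? EXIT_FAILURE : EXIT_SUCCESS;
}
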
Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

2022-10-24 Thread Michael Paquier
On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote:
> I suppose it's a matter of whether any bugs are possible outside of
> Neon.  If yes, then definitely this should be backpatched.  Offhand, I
> don't see any.  On the other hand, even if no bugs are known, then it's
> still valuable to have all code paths do WAL insertion in the same way,
> rather than having a single place that is alone in doing it in a
> different way.  But if we don't know of any bugs, then backpatching
> might be more risk than not doing so.

I have been putting my mind on that once again, and I don't see how
this would cause an issue in vanilla PG code.  XLogBeginInsert() does
its checks, meaning that we'd get a PANIC instead of an ERROR now that
these calls are within a critical section, but that should not matter
because we know that recovery has ended or we would not be able to do
GIN insertions like that.  Then, it only switches begininsert_called to
true, which we use for sanity checks in the various WAL insert paths.
As Matthias has outlined, Neon relies on
begininsert_called more than we do currently.  FWIW, I think that
we're still fine not backpatching that, even considering the
consistency of the code with stable branches.  Now, if there is a
strong trend in favor of a backpatch, I'd be fine with this conclusion
as well.

> I confess I don't understand why is it important that XLogBeginInsert is
> called inside the critical section.  It seems to me that that part is
> only a side-effect of having to acquire the buffer locks in the critical
> section.  Right?

Yeah, you are right that it would not matter for XLogBeginInsert(),
though I'd like to think that this is a good practice on consistency
grounds with anywhere else, and we respect what's documented in the
README.

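For reference, the ordering in question is what the transam README
prescribes; a stripped-down sketch of the documented pattern (placeholder
rmgr/info values and function name, not taken from the GIN code) looks like
this:

#include "postgres.h"

#include "access/xloginsert.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

#define PLACEHOLDER_RMGR_ID	RM_GENERIC_ID	/* stand-in for a real rmgr */
#define PLACEHOLDER_INFO	0				/* stand-in for an info flag */

static void
foo_apply_and_log(Relation rel, Buffer buffer, char *payload, int payload_len)
{
	Page		page = BufferGetPage(buffer);

	/* The caller holds a pin and exclusive lock on the buffer. */
	START_CRIT_SECTION();

	/* ... modify the page contents here ... */
	MarkBufferDirty(buffer);

	if (RelationNeedsWAL(rel))
	{
		XLogRecPtr	recptr;

		/* XLogBeginInsert() goes inside the critical section, per README. */
		XLogBeginInsert();
		XLogRegisterData(payload, payload_len);
		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
		recptr = XLogInsert(PLACEHOLDER_RMGR_ID, PLACEHOLDER_INFO);
		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();
}
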
> I noticed that line 427 logs the GIN metapage with flag REGBUF_STANDARD;
> is the GIN metapage really honoring pd_upper?  I see only pg_lower being
> set.

Hmm.  Not sure.
--
Michael




Re: Issue in GIN fast-insert: XLogBeginInsert + Read/LockBuffer ordering

2022-10-24 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Oct 24, 2022 at 02:22:16PM +0200, Alvaro Herrera wrote:
>> I confess I don't understand why is it important that XLogBeginInsert is
>> called inside the critical section.  It seems to me that that part is
>> only a side-effect of having to acquire the buffer locks in the critical
>> section.  Right?

> Yeah, you are right that it would not matter for XLogBeginInsert(),
> though I'd like to think that this is a good practice on consistency
> grounds with anywhere else, and we respect what's documented in the
> README.

Yeah --- it's documented that way, and there doesn't seem to be
a good reason not to honor that here.

regards, tom lane




Re: GUC values - recommended way to declare the C variables?

2022-10-24 Thread Michael Paquier
On Tue, Oct 25, 2022 at 02:43:43PM +1100, Peter Smith wrote:
> This is essentially the same as before except now, utilizing the
> GUC_DEFAULT_COMPILE flag added by Justin's patch [1], the sanity-check
> skips over any dynamic compiler-dependent GUCs.

Yeah, this is a self-reminder that I should try to look at what's on
the other thread.

> Patch 0001 - GUC trivial mods to logical replication GUC C var declarations

This one seems fine, so done.
--
Michael




Re: Getting rid of SQLValueFunction

2022-10-24 Thread Michael Paquier
On Fri, Oct 21, 2022 at 02:27:07PM +0900, Michael Paquier wrote:
> I have looked at that, and the attribute mapping remains compatible
> with past versions once the appropriate pg_proc entries are added.
> The updated patch set attached does that (with a user() function as
> well to keep the code a maximum simple), with more tests to cover the
> attribute case mentioned upthread.

Attached is a rebased patch set, as of the conflicts from 2e0d80c.
--
Michael
From 1d2d6e4803f3ab3e9d2c29efcc8dee0f8a53d699 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 25 Oct 2022 14:15:14 +0900
Subject: [PATCH v3 1/2] Remove from SQLValueFunction all the entries using
 "name" as return type

This commit changes six SQL keywords to use COERCE_SQL_SYNTAX rather
than SQLValueFunction:
- CURRENT_ROLE
- CURRENT_USER
- USER
- SESSION_USER
- CURRENT_CATALOG
- CURRENT_SCHEMA
Among the six, "user", "current_role" and "current_catalog" require
specific SQL functions to allow ruleutils.c to map them to the SQL
keywords these require when using COERCE_SQL_SYNTAX.  Having
pg_proc.proname match with the keyword ensures that the compatibility
remains the same when projecting any of these keywords in a FROM clause
to an attribute name when an alias is not specified.  Tests are added to
make sure that the mapping from the SQL keyword is correct,
using a view definition that feeds on these keywords in one of the tests
introduced by 40c24bf (both in a SELECT clause and when using each
keyword in a FROM clause).

SQLValueFunction is reduced to half its contents after this change,
simplifying its logic a bit as there is no need to enforce a C collation
anymore.  I have made a few performance tests, with a million-ish calls
to these keywords without seeing a difference in run-time or in perf
profiles.  The remaining SQLValueFunctions relate to timestamps.

Bump catalog version.

Reviewed-by: Corey Huinker
Discussion: https://postgr.es/m/yzag3morycguu...@paquier.xyz
---
 src/include/catalog/pg_proc.dat   |  9 +++
 src/include/nodes/primnodes.h |  8 +-
 src/backend/executor/execExprInterp.c | 27 
 src/backend/nodes/nodeFuncs.c | 11 +++-
 src/backend/parser/gram.y | 30 +-
 src/backend/parser/parse_expr.c   |  8 --
 src/backend/parser/parse_target.c | 18 --
 src/backend/utils/adt/ruleutils.c | 36 +--
 8 files changed, 55 insertions(+), 92 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62a5b8e655..241366fc8e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1505,6 +1505,15 @@
 { oid => '745', descr => 'current user name',
   proname => 'current_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'current_user' },
+{ oid => '9695', descr => 'current role name',
+  proname => 'current_role', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_user' },
+{ oid => '9696', descr => 'user name',
+  proname => 'user', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_user' },
+{ oid => '9697', descr => 'name of the current database',
+  proname => 'current_catalog', provolatile => 's', prorettype => 'name',
+  proargtypes => '', prosrc => 'current_database' },
 { oid => '746', descr => 'session user name',
   proname => 'session_user', provolatile => 's', prorettype => 'name',
   proargtypes => '', prosrc => 'session_user' },
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index f71f551782..f6dd27edcc 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -1313,13 +1313,7 @@ typedef enum SQLValueFunctionOp
 	SVFOP_LOCALTIME,
 	SVFOP_LOCALTIME_N,
 	SVFOP_LOCALTIMESTAMP,
-	SVFOP_LOCALTIMESTAMP_N,
-	SVFOP_CURRENT_ROLE,
-	SVFOP_CURRENT_USER,
-	SVFOP_USER,
-	SVFOP_SESSION_USER,
-	SVFOP_CURRENT_CATALOG,
-	SVFOP_CURRENT_SCHEMA
+	SVFOP_LOCALTIMESTAMP_N
 } SQLValueFunctionOp;
 
 typedef struct SQLValueFunction
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 9b9bbf00a9..6ebf5c287e 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2495,15 +2495,10 @@ ExecEvalParamExtern(ExprState *state, ExprEvalStep *op, ExprContext *econtext)
 void
 ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep *op)
 {
-	LOCAL_FCINFO(fcinfo, 0);
 	SQLValueFunction *svf = op->d.sqlvaluefunction.svf;
 
 	*op->resnull = false;
 
-	/*
-	 * Note: current_schema() can return NULL.  current_user() etc currently
-	 * cannot, but might as well code those cases the same way for safety.
-	 */
 	switch (svf->op)
 	{
 		case SVFOP_CURRENT_DATE:
@@ -2525,28 +2520,6 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep *op)
 		case SVFOP_LOCALTIMESTAMP_N:
 			*op->resvalue = TimestampGetDatum(GetSQLLocalTimestamp(svf->typmo

Some regression tests for the pg_control_*() functions

2022-10-24 Thread Michael Paquier
Hi all,

As mentioned in [1], there are no regression tests for the SQL control
functions: pg_control_checkpoint, pg_control_recovery,
pg_control_system and pg_control_init.

At a minimum we could check that they execute, as in a "SELECT FROM
func()", but some validation can also be done on their output as long as
the tests are portable enough (they need to be insensitive to wal_level,
commit timestamps, etc.).

Attached is a proposal to provide some coverage.  Some of the checks
could be just removed, like the ones for non-NULL fields, but I have
written out everything to show how much could be done.

Thoughts?

[1]: https://www.postgresql.org/message-id/yzy0ilxnbmaxh...@paquier.xyz
--
Michael
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 9f106c2a10..a5df523109 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -594,3 +594,89 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
  Index Cond: (unique1 = g.g)
 (4 rows)
 
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid IS NOT NULL AS next_xid,
+next_oid > 0 AS next_oid,
+next_multixact_id != '0'::xid AS next_multixact_id,
+next_multi_offset IS NOT NULL AS next_multi_offset,
+oldest_xid != '0'::xid AS oldest_xid,
+oldest_xid_dbid > 0 AS oldest_xid_dbid,
+oldest_active_xid IS NOT NULL AS oldest_active_xid,
+oldest_multi_xid != '0'::xid AS oldest_multi_xid,
+oldest_multi_dbid > 0 AS oldest_multi_dbid,
+oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid,
+newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid
+  FROM pg_control_checkpoint();
+-[ RECORD 1 ]+--
+checkpoint_lsn   | t
+redo_lsn | t
+redo_wal_file| t
+timeline_id  | t
+prev_timeline_id | t
+next_xid | t
+next_oid | t
+next_multixact_id| t
+next_multi_offset| t
+oldest_xid   | t
+oldest_xid_dbid  | t
+oldest_active_xid| t
+oldest_multi_xid | t
+oldest_multi_dbid| t
+oldest_commit_ts_xid | t
+newest_commit_ts_xid | t
+
+SELECT max_data_alignment > 0 AS max_data_alignment,
+database_block_size > 0 AS database_block_size,
+blocks_per_segment > 0 AS blocks_per_segment,
+wal_block_size > 0 AS wal_block_size,
+max_identifier_length > 0 AS max_identifier_length,
+max_index_columns > 0 AS max_index_columns,
+max_toast_chunk_size > 0 AS max_toast_chunk_size,
+large_object_chunk_size > 0 AS large_object_chunk_size,
+float8_pass_by_value IS NOT NULL AS float8_pass_by_value,
+data_page_checksum_version >= 0 AS data_page_checksum_version
+  FROM pg_control_init();
+-[ RECORD 1 ]--+--
+max_data_alignment | t
+database_block_size| t
+blocks_per_segment | t
+wal_block_size | t
+max_identifier_length  | t
+max_index_columns  | t
+max_toast_chunk_size   | t
+large_object_chunk_size| t
+float8_pass_by_value   | t
+data_page_checksum_version | t
+
+SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn,
+min_recovery_end_timeline >= 0 AS min_recovery_end_timeline,
+backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn,
+backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn,
+end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required
+  FROM pg_control_recovery();
+-[ RECORD 1 ]-+--
+min_recovery_end_lsn  | t
+min_recovery_end_timeline | t
+backup_start_lsn  | t
+backup_end_lsn| t
+end_of_backup_record_required | t
+
+SELECT pg_control_version > 0 AS pg_control_version,
+catalog_version_no > 0 AS catalog_version_no,
+system_identifier >= 0 AS system_identifier,
+pg_control_last_modified <= now() AS pg_control_last_modified
+  FROM pg_control_system();
+-[ RECORD 1 ]+--
+pg_control_version   | t
+catalog_version_no   | t
+system_identifier| t
+pg_control_last_modified | t
+
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 639e9b352c..e5e75b82f3 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -223,3 +223,47 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
+
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid

Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread Japin Li

On Tue, 25 Oct 2022 at 12:01, Richard Guo  wrote:
> On Tue, Oct 25, 2022 at 11:46 AM Japin Li  wrote:
>
>>
>> On Tue, 25 Oct 2022 at 11:07, Zhihong Yu  wrote:
>> > Please take a look at patch v2.
>>
>> Maybe we should define those functions in headers.  See patch v3.
>
>
> Yes, putting them in .h file is better to me. For the v3 patch, we can
> do the same one-line trick for restriction_is_securely_promotable.
>

Fixed.  Please consider the v4 for further review.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index ef8df3d098..8a6812b4b1 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -375,41 +375,6 @@ commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op)
 	return result;
 }
 
-/*
- * restriction_is_or_clause
- *
- * Returns t iff the restrictinfo node contains an 'or' clause.
- */
-bool
-restriction_is_or_clause(RestrictInfo *restrictinfo)
-{
-	if (restrictinfo->orclause != NULL)
-		return true;
-	else
-		return false;
-}
-
-/*
- * restriction_is_securely_promotable
- *
- * Returns true if it's okay to evaluate this clause "early", that is before
- * other restriction clauses attached to the specified relation.
- */
-bool
-restriction_is_securely_promotable(RestrictInfo *restrictinfo,
-   RelOptInfo *rel)
-{
-	/*
-	 * It's okay if there are no baserestrictinfo clauses for the rel that
-	 * would need to go before this one, *or* if this one is leakproof.
-	 */
-	if (restrictinfo->security_level <= rel->baserestrict_min_security ||
-		restrictinfo->leakproof)
-		return true;
-	else
-		return false;
-}
-
 /*
  * get_actual_clauses
  *
diff --git a/src/include/optimizer/restrictinfo.h b/src/include/optimizer/restrictinfo.h
index 6d30bd5e9d..b61707fd26 100644
--- a/src/include/optimizer/restrictinfo.h
+++ b/src/include/optimizer/restrictinfo.h
@@ -31,9 +31,6 @@ extern RestrictInfo *make_restrictinfo(PlannerInfo *root,
 	   Relids outer_relids,
 	   Relids nullable_relids);
 extern RestrictInfo *commute_restrictinfo(RestrictInfo *rinfo, Oid comm_op);
-extern bool restriction_is_or_clause(RestrictInfo *restrictinfo);
-extern bool restriction_is_securely_promotable(RestrictInfo *restrictinfo,
-			   RelOptInfo *rel);
 extern List *get_actual_clauses(List *restrictinfo_list);
 extern List *extract_actual_clauses(List *restrictinfo_list,
 	bool pseudoconstant);
@@ -46,4 +43,33 @@ extern bool join_clause_is_movable_into(RestrictInfo *rinfo,
 		Relids currentrelids,
 		Relids current_and_outer);
 
+/*
+ * restriction_is_or_clause
+ *
+ * Returns true iff the restrictinfo node contains an 'or' clause.
+ */
+static inline bool
+restriction_is_or_clause(RestrictInfo *restrictinfo)
+{
+	return (restrictinfo->orclause != NULL);
+}
+
+/*
+ * restriction_is_securely_promotable
+ *
+ * Returns true if it's okay to evaluate this clause "early", that is before
+ * other restriction clauses attached to the specified relation.
+ */
+static inline bool
+restriction_is_securely_promotable(RestrictInfo *restrictinfo,
+   RelOptInfo *rel)
+{
+	/*
+	 * It's okay if there are no baserestrictinfo clauses for the rel that
+	 * would need to go before this one, *or* if this one is leakproof.
+	 */
+	return (restrictinfo->security_level <= rel->baserestrict_min_security ||
+			restrictinfo->leakproof);
+}
+
 #endif			/* RESTRICTINFO_H */


Re: GUC values - recommended way to declare the C variables?

2022-10-24 Thread Peter Smith
On Tue, Oct 25, 2022 at 4:09 PM Michael Paquier  wrote:
>
> On Tue, Oct 25, 2022 at 02:43:43PM +1100, Peter Smith wrote:
> > This is essentially the same as before except now, utilizing the
> > GUC_DEFAULT_COMPILE flag added by Justin's patch [1], the sanity-check
> > skips over any dynamic compiler-dependent GUCs.
>
> Yeah, this is a self-reminder that I should try to look at what's on
> the other thread.
>
> > Patch 0001 - GUC trivial mods to logical replication GUC C var declarations
>
> This one seems fine, so done.
> --

Thanks for pushing v3-0001.

PSA v4. Rebased the remaining 2 patches so the cfbot can still work.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v4-0001-TMP-Justins-DYNAMIC_DEFAULT-patch.patch
Description: Binary data


v4-0002-GUC-C-variable-sanity-check.patch
Description: Binary data


Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread John Naylor
On Tue, Oct 25, 2022 at 9:48 AM Richard Guo  wrote:
>
>
> On Tue, Oct 25, 2022 at 10:05 AM John Naylor 
wrote:
>>
>> It's perfectly clear and simple now, even if it doesn't win at "code
golf".
>
>
> Agree with your point.  Do you think we can further make the one-line
> function a macro or an inline function in the .h file?  I think this
> function is called quite frequently during planning, so maybe doing that
> would bring a little bit of efficiency.

My point was misunderstood, which is: I don't think we need to do anything
at all here if the goal was purely about aesthetics.

If the goal has now changed to efficiency, I have no opinion about that yet
since no evidence has been presented.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Perform streaming logical transactions by background workers and parallel apply

2022-10-24 Thread Peter Smith
FYI - After a recent push, the v40-0001 patch no longer applies on the
latest HEAD.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v40-0001-Perform-streaming-logical-transactions-by-parall.patch
error: patch failed: src/backend/replication/logical/launcher.c:54
error: src/backend/replication/logical/launcher.c: patch does not apply

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Allow file inclusion in pg_hba and pg_ident files

2022-10-24 Thread Michael Paquier
On Mon, Oct 24, 2022 at 04:03:03PM +0800, Julien Rouhaud wrote:
> It would also require to bring HbaLine->sourcefile.  I'm afraid it would be
> weird to introduce such a refactoring in a separate commit just to pass a
> constant down multiple level of indirection, as all the macro will remain
> specific to either hba or ident anyway.

Putting my hands on it, I am not really afraid of doing that
independently.  From what I can see, this cuts 23kB worth of
diffs from 0002, reducing it from 94kB to 71kB.

> I agree that there are quite a lot of s/XXXFileName/file_name/, but those
> aren't complicated, and keeping them in the same commit makes it easy to
> validate that none has been forgotten since the regression tests covering 
> those
> messages are in that commit too.

Another advantage is that it minimizes the presence of the hardcoded
HbaFileName and IdentFileName in hba.c, which is one thing we are
trying to achieve here for the inclusion of more files.  I found it a bit
strange that IdentLine had no sourcefile, actually.  We track the file
number but use it nowhere, and it seems to me that having more
symmetry between both would be a good thing.

So, the key part of the logic is how we are going to organize the
tokenization of the HBA and ident lines through all the inclusions.
As far as I get it, tokenize_auth_file() is the root call and
tokenize_file_with_context() with its depth is able to work on each
individual file, and it can optionally recurse depending on what's
included.  Why do you need to switch to the old context in
tokenize_file_with_context()?  Could it be simpler to switch once to
linecxt outside of the internal routine?

It looks like GetDirConfFiles() is another piece that can be
refactored and reviewed on its own, as we use it in
ParseConfigDirectory()@guc.c.
--
Michael
From 7c12a9dd2b23765c0e0d38da8140051a89c45fb4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 25 Oct 2022 15:17:27 +0900
Subject: [PATCH v13 1/3] Refactor knowledge of origin file in hba.c

This limits the footprint of HbaFileName and IdentFileName to their
entry loading point, easing the introduction of the inclusion logic.
---
 src/include/libpq/hba.h |   3 ++
 src/backend/libpq/hba.c | 114 +---
 2 files changed, 63 insertions(+), 54 deletions(-)

diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index cec2e2665f..bf896ac084 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -93,6 +93,7 @@ typedef struct AuthToken
 
 typedef struct HbaLine
 {
+	char	   *sourcefile;
 	int			linenumber;
 	char	   *rawline;
 	ConnType	conntype;
@@ -138,6 +139,7 @@ typedef struct HbaLine
 
 typedef struct IdentLine
 {
+	char	   *sourcefile;
 	int			linenumber;
 
 	char	   *usermap;
@@ -157,6 +159,7 @@ typedef struct IdentLine
 typedef struct TokenizedAuthLine
 {
 	List	   *fields;			/* List of lists of AuthTokens */
+	char	   *file_name;		/* File name of origin */
 	int			line_num;		/* Line number */
 	char	   *raw_line;		/* Raw line text */
 	char	   *err_msg;		/* Error message if any */
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index ea92f02a47..6524b60610 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -641,6 +641,7 @@ tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
 
 			tok_line = (TokenizedAuthLine *) palloc(sizeof(TokenizedAuthLine));
 			tok_line->fields = current_line;
+			tok_line->file_name = pstrdup(filename);
 			tok_line->line_num = line_number;
 			tok_line->raw_line = pstrdup(buf.data);
 			tok_line->err_msg = err_msg;
@@ -984,7 +985,7 @@ do { \
 			 errmsg("authentication option \"%s\" is only valid for authentication methods %s", \
 	optname, _(validmethods)), \
 			 errcontext("line %d of configuration file \"%s\"", \
-	line_num, HbaFileName))); \
+	line_num, file_name))); \
 	*err_msg = psprintf("authentication option \"%s\" is only valid for authentication methods %s", \
 		optname, validmethods); \
 	return false; \
@@ -1004,7 +1005,7 @@ do { \
  errmsg("authentication method \"%s\" requires argument \"%s\" to be set", \
 		authname, argname), \
  errcontext("line %d of configuration file \"%s\"", \
-		line_num, HbaFileName))); \
+		line_num, file_name))); \
 		*err_msg = psprintf("authentication method \"%s\" requires argument \"%s\" to be set", \
 			authname, argname); \
 		return NULL; \
@@ -1027,7 +1028,7 @@ do { \
 (errcode(ERRCODE_CONFIG_FILE_ERROR), \
  errmsg("missing entry at end of line"), \
  errcontext("line %d of configuration file \"%s\"", \
-			line_num, IdentFileName))); \
+			line_num, file_name))); \
 		*err_msg = pstrdup("missing entry at end of line"); \
 		return NULL; \
 	} \
@@ -1040,7 +1041,7 @@ do { \
 (errcode(ERRCODE_CONFIG_FILE_ERROR), \
  errmsg("multiple values in ident field"), \
  errcontext("line %d of configuration file \"%s\"", \
-			line_num, IdentFileName))); \
+