[PATCH] Add min() and max() aggregate functions for xid8

2022-02-02 Thread Ken Kato

Hi hackers,

Unlike xid, xid8 increases monotonically and cannot be reused.
This trait makes it possible to support min() and max() aggregate 
functions for xid8.

I thought they would be useful for monitoring.

So I made a patch for this.
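
As a usage sketch (the audit_log table is made up purely for illustration;
pg_current_xact_id() already returns xid8):

    CREATE TABLE audit_log (fxid xid8 DEFAULT pg_current_xact_id(), note text);
    -- with this patch:
    SELECT min(fxid), max(fxid) FROM audit_log;
    -- today one has to spell it as, e.g.:
    SELECT fxid FROM audit_log ORDER BY fxid DESC LIMIT 1;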

Best wishes,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8754f2f89b..1b064b4feb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19973,7 +19973,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
@@ -19992,7 +19992,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 9b4ceaea47..6e24697782 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -286,6 +286,24 @@ xid8cmp(PG_FUNCTION_ARGS)
 		PG_RETURN_INT32(-1);
 }
 
+Datum
+xid8_larger(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	PG_RETURN_FULLTRANSACTIONID((U64FromFullTransactionId(fxid1) > U64FromFullTransactionId(fxid2)) ? fxid1 : fxid2);
+}
+
+Datum
+xid8_smaller(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	PG_RETURN_FULLTRANSACTIONID((U64FromFullTransactionId(fxid1) < U64FromFullTransactionId(fxid2)) ? fxid1 : fxid2);
+}
+
 /*
  *	 COMMAND IDENTIFIER ROUTINES			 *
  */
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 137f6eef69..2843f4b415 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -149,6 +149,9 @@
 { aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger',
   aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger',
+  aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -214,6 +217,9 @@
 { aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller',
   aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller',
+  aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7024dbe10a..62f36daa98 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -203,6 +203,12 @@
 { oid => '5071', descr => 'convert xid8 to xid',
   proname => 'xid', prorettype => 'xid', proargtypes => 'xid8',
   prosrc => 'xid8toxid' },
+{ oid => '5097', descr => 'larger of two',
+  proname => 'xid8_larger', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_larger' },
+{ oid => '5098', descr => 'smaller of two',
+  proname => 'xid8_smaller', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_smaller' },
 { oid => '69',
   proname => 'cideq', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'cid cid', prosrc => 'cideq' },
@@ -6564,6 +6570,9 @@
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
+{ oid => '5099', descr => 'maximum value of all xid8 input values',
+  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
@@ -6631,6 +6640,9 @@
 { oid => '4190', descr => 'minimum value of all pg_lsn input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
+{ oid => '5100', descr => 'minimum value of all xid8 input values',
+  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 # count has two 

Re: [PATCH] Accept IP addresses in server certificate SANs

2022-02-02 Thread Kyotaro Horiguchi
At Wed, 2 Feb 2022 19:46:13 +, Jacob Champion  wrote 
in 
> On Mon, 2022-01-31 at 17:29 +0900, Kyotaro Horiguchi wrote:
> > +#define PGSQL_AF_INET  (AF_INET + 0)
> > +#define PGSQL_AF_INET6 (AF_INET + 1)
> > +
> > 
> > Now we have the same definition thrice in frontend code. Couldn't we
> > define them in, say, libpq-fe.h or inet-fe.h (nonexistent) then
> > include it from the three files?
> 
> I started down the inet-fe.h route, and then realized I didn't know
> where that should go. Does it need to be included in (or part of)
> port.h? And should it be installed as part of the logic in
> src/include/Makefile?

I don't think it should be a part of port.h.  Though I suggested a
frontend-only header file by that name, isn't it enough to separate the
definitions out of utils/inet.h into common/inet-common.h and then include
inet-common.h from inet.h?

> > When IP address is embedded in URI, it won't be translated to another
> > IP address. Concretely https://192.0.1.5/hoge cannot reach to the host
> > 192.0.1.8.  On the other hand, as done in the test, libpq allows that
> > when "host=192.0.1.5 hostaddr=192.0.1.8".  I can't understand what we
> > are doing in that case.  Don't we need to match the SAN IP address
> > with hostaddr instead of host?
> 
> I thought that host, not hostaddr, was the part that corresponded to
> the URI. So in a hypothetical future where postgresqls:// exists, the
> two URIs
> 
> postgresqls://192.0.2.2:5432/db
> postgresqls://192.0.2.2:5432/db?hostaddr=127.0.0.1
> 
> should both be expecting the same certificate. That seems to match the
> libpq documentation as well.

Hmm. Well, considering that the objective of the validation is to check
whether the server is actually the one the client intends to connect to, it
is fine.  Sorry for the noise.

> (Specifying a host parameter is also allowed... that seems like it
> could cause problems for a hypothetical postgresqls:// scheme, but it's
> probably not relevant for this thread.)

Yeah.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: support for CREATE MODULE

2022-02-02 Thread Pavel Stehule
On Thu, Feb 3, 2022 at 5:46, Julien Rouhaud wrote:

> Hi,
>
> On Thu, Feb 03, 2022 at 05:25:27AM +0100, Pavel Stehule wrote:
> >
> > On Thu, Feb 3, 2022 at 3:28, Swaha Miller wrote:
> >
> > > Hi,
> > >
> > > I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
> > >
> > > My proposal implements modules as schema objects to be stored in a new
> > > system catalog pg_module with new syntax for CREATE [OR REPLACE]
> MODULE,
> > > ALTER MODULE, DROP MODULE and for GRANT and REVOKE for privileges on
> > > modules and module routines. I am attempting to follow the SQL spec.
> > > However, for right now, I'm proposing to support only routines as
> module
> > > contents, with local temporary tables and path specifications as
> defined
> > > in the SQL spec, to be supported in a future submission. We could also
> > > include support for variables depending on its status. [2]
> >
> > I dislike this feature. The modules are partially redundant to schemas
> and
> > to extensions in Postgres, and I am sure, so there is no reason to
> > introduce this.
> >
> > What is the benefit against schemas and extensions?
>
> I agree with Pavel.  It seems that it's mainly adding another namespacing
> layer
> between schemas and objects, and it's likely going to create a mess.
> That's also going to be problematic if you want to add support for module
> variables, as you won't be able to use e.g.
> dbname.schemaname.modulename.variablename.fieldname.
>
> Also, my understanding was that the major interest of modules (at least
> for the
> routines part) was the ability to make some of them private to the module,
> but
> it doesn't look like it's doing that, so I also don't see any real benefit
> compared to schemas and extensions.
>

The biggest problem is the coexistence of Postgres's search_path-based
object identification with the local and public scopes used in MODULEs or in
Oracle's packages.

I can imagine MODULES as a third level of database object grouping, with the
following functionality:

1. it should support all database objects, like schemas do
2. all public objects should be accessible directly when the outer schema is
in search_path (see the sketch after this list)
3. private objects cannot be accessed from other modules
4. modules should be movable between schemas and databases without a loss of
functionality
5. modules should support renaming without a loss of functionality
6. some visibility rules would have to be redefined, because there can be new
identifier collisions and ambiguities
7. the relation between module objects and schema objects would have to be
defined. Maybe introducing a default module would be a good idea.
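
A sketch of points 2 and 6 (purely hypothetical syntax, loosely following the
examples elsewhere in this thread):

    CREATE SCHEMA s1;
    CREATE MODULE s1.m1
        CREATE FUNCTION f() RETURNS int LANGUAGE sql RETURN 1;
    SET search_path = s1;
    SELECT m1.f();   -- point 2: reachable because s1 is in search_path
    SELECT f();      -- point 6: is an unqualified public module function
                     -- visible, and what happens when s1.f() also exists?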

I have had the opportunity to watch someone modify routines in pgAdmin. It
can be hell, but if we introduce a new concept (and it is an important
concept), then there should be strong benefits - for example, the possibility
of strong encapsulation of code inside modules (or some other unit - the name
is not important).

The problem with pgAdmin can maybe be solved better by adding some
annotations to database objects that allow a more user-friendly organization
of the object tree in pgAdmin (and similar tools). Maybe it would be useful
to have more than one tree (defined by locality, semantics, quality, ...).

Regards

Pavel


Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-02-02 Thread Kyotaro Horiguchi
At Thu, 3 Feb 2022 13:59:03 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2022/02/02 23:46, Bharath Rupireddy wrote:
> > On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao
> >  wrote:
> >> I found that CreateRestartPoint() already reported the redo lsn as
> >> follows after emitting the restartpoint log message. To avoid
> >> duplicated logging of the same information, we should update this
> >> code?
> >>
> >>  ereport((log_checkpoints ? LOG : DEBUG2),
> >>  (errmsg("recovery restart point at %X/%X",
> >>  
> >> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
> >>   xtime ? errdetail("Last completed transaction was
> >>   at log time %s.",
> >> 
> >> timestamptz_to_str(xtime))
> >> : 0));
> >>
> >> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the
> >> patch, LogCheckpointEnd() reports
> >> ControlFile->checkPointCopy.redo. They may be different, for example,
> >> when the current DB state is not DB_IN_ARCHIVE_RECOVERY. In this case,
> >> which lsn should we report as redo lsn?
> > Do we ever reach CreateRestartPoint when ControlFile->stat !=
> > DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
> > DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
> > regression tests.
> 
> ISTM that CreateRestartPoint() can reach the condition
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY. Please imagine the case
> where CreateRestartPoint() has already started and calls
> CheckPointGuts(). If the standby server is promoted while
> CreateRestartPoint() is flushing the data to disk at CheckPointGuts(),
> the state would be updated to DB_IN_PRODUCTION and
> CreateRestartPoint() can see the state != DB_IN_ARCHIVE_RECOVERY
> later.

By the way, the comment there:
> * this is a quick hack to make sure nothing really bad happens if somehow
> * we get here after the end-of-recovery checkpoint.

now looks a bit wrong, since it is now normal for a restartpoint to end
after promotion.

> As far as I read the code, this case seems to be able to make the
> server unrecoverable. If this case happens, since pg_control is not
> updated, pg_control still indicates the REDO LSN of last valid
> restartpoint. But CreateRestartPoint() seems to delete old WAL files
> based on its "current" REDO LSN not pg_control's REDO LSN. That is,
> WAL files required for the crash recovery starting from pg_control's
> REDO LSN would be removed.

Seems right. (I didn't confirm the behavior, though.)

> If this understanding is right, to address this issue, probably we
> need to make CreateRestartPoint() do nothing (return immediately) when
> the state != DB_IN_ARCHIVE_RECOVERY?

That is one way to go. In that case, should we log something like
"restartpoint canceled"?

By the way, a restartpoint should start only while recovering, and at the
time of the start both checkpoint.redo and the checkpoint LSN are already in
the past. We shouldn't update minRecoveryPoint after promotion, but is there
any reason for not updating checkPoint and checkPointCopy?  If we update them
after promotion, the which-LSN-to-show problem would be gone.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: XTS cipher mode for cluster file encryption

2022-02-02 Thread Sasasu
On 2022/2/2 01:50, Bruce Momjian wrote:
> Well, I sent an email a week ago asking if people want to advance this
> feature forward, and so far you are the only person to reply, which I
> think means there isn't enough interest in this feature to advance it.

I am still focused on this thread.

And I have a small patch to solve the current BufFile problem:
https://www.postgresql.org/message-id/a859a753-70f2-bb17-6830-19dbcad11c17%40sasa.su




Re: Add header support to text format and matching feature

2022-02-02 Thread Peter Eisentraut

On 31.01.22 07:54, Peter Eisentraut wrote:

On 30.01.22 23:56, Rémi Lapeyre wrote:
I notice in the 0002 patch that there is no test case for the error 
"wrong header for column \"%s\": got \"%s\"", which I think is really 
the core functionality of this patch.  So please add that.




I added a test for it in this new version of the patch.


The file_fdw.sql tests contain this

+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');   -- ERROR


but no actual error is generated.  Please review the additions to the
file_fdw tests to see that they make sense.
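
I think the mismatch can only surface when the file is actually read, since
file_fdw opens the file at scan time, so the test presumably needs a SELECT
as well; something like this (the got "b" value is only illustrative):

    CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER file_server
        OPTIONS (format 'csv', filename :'filename', delimiter ',', header 'match');
    SELECT * FROM header_doesnt_match;
    -- ERROR:  wrong header for column "foo": got "b"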


A few more comments on your latest patch:

- The DefGetCopyHeader() function seems very bulky and might not be 
necessary.  I think you can just check for the string "match" first and 
then use defGetBoolean() as before if it didn't match.


- If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the 
existing truth checks don't need to be changed.


- In NextCopyFromRawFields(), it looks like you have code that replaces 
the null_print value if the supplied column name is null.  I don't think 
we should allow null column values.  So this should be an error.
(Do we use null_print on output and make the column name null if it 
matches?)






Re: Doc: CREATE_REPLICATION_SLOT command requires the plugin name

2022-02-02 Thread Amit Kapila
On Wed, Feb 2, 2022 at 12:41 PM Antonin Houska  wrote:
>
> Amit Kapila  wrote:
>
> > On Tue, Feb 1, 2022 at 5:43 PM Antonin Houska  wrote:
> > >
> > > Amit Kapila  wrote:
> > >
> > > > On Tue, Feb 1, 2022 at 3:44 PM Antonin Houska  wrote:
> > > > >
> > > > > I got a syntax error when using the command according to the existing
> > > > > documentation. The output_plugin parameter needs to be passed too.
> > > > >
> > > >
> > > > Why do we need it for physical slots?
> > >
> > > Sure we don't, the missing curly brackets seem to be the problem. I used 
> > > the
> > > other form of the command for reference, which therefore might need a 
> > > minor
> > > fix too.
> > >
> >
> > Instead of adding additional '{}', can't we simply use:
> > { PHYSICAL [ RESERVE_WAL ] |
> > LOGICAL output_plugin } [
> > EXPORT_SNAPSHOT |
> > NOEXPORT_SNAPSHOT | USE_SNAPSHOT
> > | TWO_PHASE ]
>
> Do you mean changing
>
> CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | 
> LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT | 
> TWO_PHASE ] }
>
> to
>
> CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | 
> LOGICAL output_plugin } [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT | USE_SNAPSHOT 
> | TWO_PHASE ]
>
> ?
>
> I'm not sure, IMHO that would still allow for commands like
>
> CREATE_REPLICATION_SLOT slot_name PHYSICAL output_plugin
>

I don't think so. The '|' symbol differentiates that; check its usage in
other places like the CREATE TABLE doc page [1]. Considering that, it seems
to me that the way it is currently written in the docs is correct.

CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [
RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT |
NOEXPORT_SNAPSHOT | USE_SNAPSHOT | TWO_PHASE ] }

As I read it, this means output_plugin and all the snapshot-related options
can be specified only for logical slots.

[1] - https://www.postgresql.org/docs/devel/sql-createtable.html
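
For concreteness, the forms the current synopsis allows (as issued on a
replication connection; slot names are arbitrary):

    CREATE_REPLICATION_SLOT slot_p PHYSICAL RESERVE_WAL
    CREATE_REPLICATION_SLOT slot_l LOGICAL pgoutput EXPORT_SNAPSHOT

whereas a plugin name after PHYSICAL, e.g.
CREATE_REPLICATION_SLOT slot_p PHYSICAL pgoutput, is not covered by the
synopsis and is rejected by the walsender grammar.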

-- 
With Regards,
Amit Kapila.




Re: Replace pg_controldata output fields with macros for better code manageability

2022-02-02 Thread Kyotaro Horiguchi
At Wed, 2 Feb 2022 18:02:39 -0500, Bruce Momjian  wrote in 
> On Sat, Jan 29, 2022 at 08:00:53PM +0530, Bharath Rupireddy wrote:
> > Hi,
> > 
> > While working on another pg_control patch, I observed that the
> > pg_controldata output fields such as "Latest checkpoint's
> > TimeLineID:", "Latest checkpoint's NextOID:'' and so on, are being
> > used in pg_resetwal.c, pg_controldata.c and pg_upgrade/controldata.c.
> > Direct usage of these fields in many places doesn't look good and can
> > be error prone if we try to change the text in one place and forget in
> > another place. I'm thinking of having those fields as macros in
> > pg_control.h, something like [1] and use it all the places. This will
> > be a good idea for better code manageability IMO.
> > 
> > Thoughts?
> > 
> > [1]
> > #define PG_CONTROL_FIELD_VERSION_NUMBER "pg_control version number:"
> > #define PG_CONTROL_FIELD_CATALOG_NUMBER "Catalog version number:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_TLI "Latest checkpoint's TimeLineID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_PREV_TLI "Latest checkpoint's
> > PrevTimeLineID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID "Latest checkpoint's 
> > oldestXID:"
> > #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID_DB "Latest checkpoint's
> > oldestXID's DB:"
> > and so on.
> 
> That seems like a very good idea.

+1 for consolidating the texts in some shape.

But simply replacing the literal text labels with symbols doesn't sound like
the way we should take.  Significant downsides of doing that are
untranslatability and the difficulty of adding the proper indentation before
the value field.  So we would need to define the texts including the
indentation spaces and the format placeholder.

But if we define the strings separately, I would rather define them in
an array.



typedef enum pg_control_fields
{
  PGCNTRL_FIELD_ERROR = -1,
  PGCNTRL_FIELD_VERSION = 0,
  PGCNTRL_FIELD_CATALOG_VERSION,
  ...
  PGCNTRL_NUM_FIELDS
} pg_control_fields;

const char *pg_control_item_formats[] = {
gettext_noop("pg_control version number:%u\n"),
gettext_noop("Catalog version number:   %u\n"),
...
};

const char *
get_pg_control_item_format(pg_control_fields item_id)
{
    return _(pg_control_item_formats[item_id]);
}

pg_control_fields
get_pg_control_item(const char *str, const char **valp)
{
int i;
for (i = 0 ; i < PGCNTRL_NUM_FIELDS ; i++)
{
const char *fmt = pg_control_item_formats[i];
const char *p = strchr(fmt, ':');

Assert(p);
if (strncmp(str, fmt, p - fmt) == 0)
{
p = str + (p - fmt);
for(p++ ; *p == ' ' ; p++);
*valp = p;
return i;
}
}

return PGCNTRL_FIELD_ERROR;
}

Printer side does:

    printf(get_pg_control_item_format(PGCNTRL_FIELD_VERSION),
           ControlFile->pg_control_version);

And the reader side would do:

    switch (get_pg_control_item(str, &v))
    {
        case PGCNTRL_FIELD_VERSION:
            cluster->controldata.ctrl_ver = str2uint(v);
            break;
        ...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

  




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-02 Thread Masahiko Sawada
On Thu, Feb 3, 2022 at 1:48 PM David G. Johnston
 wrote:
>
> On Wednesday, February 2, 2022, Masahiko Sawada  wrote:
>>
>> and have other error
>> information in pg_stat_subscription_workers view.
>
>
> What benefit is there to keeping the existing collector-based 
> pg_stat_subscripiton_workers view?  If we re-write it using shmem IPC then we 
> might as well put everything there and forego using a catalog.  Then it 
> behaves in a similar manner to pg_stat_activity but for logical replication 
> workers.

Yes, but if we use shmem IPC, we need to allocate shared memory for
them based on the number of subscriptions, not logical replication
workers (i.e., max_logical_replication_workers). So we cannot estimate
memory in the beginning. Also, IIUC the number of subscriptions that
are concurrently working is limited by max_replication_slots (see
ReplicationStateCtl) but I think we need to remember the state of
disabled subscriptions too.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Extensible Rmgr for Table AMs

2022-02-02 Thread Julien Rouhaud
Hi,

On Tue, Feb 01, 2022 at 03:38:32PM -0800, Jeff Davis wrote:
> On Tue, 2022-02-01 at 20:45 +0800, Julien Rouhaud wrote:
> > 
> > One last thing, did you do some benchmark with a couple custom rmgr
> > to see how
> > much the O(n) access is showing up in profiles?
> 
> What kind of a test case would be reasonable there? You mean having a
> lot of custom rmgrs?
> 
> I was expecting that few people would have more than one custom rmgr
> loaded anyway, so a sparse array or hashtable seemed wasteful. If
> custom rmgrs become popular we probably need to have a larger ID space
> anyway, but it seems like overengineering to do so now.

I agree that having dozens of custom rmgrs doesn't seem likely, but I also
have no idea how much overhead you get by not doing a direct array access.  I
think it would be informative to benchmark something like a simple OLTP write
workload on fast storage (or a ramdisk, or with fsync off...), with the used
rmgr being the 1st and then the 2nd custom rmgr.  Both scenarios still seem
plausible and shouldn't degenerate on good hardware.




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-02-02 Thread Fujii Masao




On 2022/02/02 23:46, Bharath Rupireddy wrote:

On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao  wrote:

I found that CreateRestartPoint() already reported the redo lsn as follows 
after emitting the restartpoint log message. To avoid duplicated logging of the 
same information, we should update this code?

 ereport((log_checkpoints ? LOG : DEBUG2),
 (errmsg("recovery restart point at %X/%X",
 LSN_FORMAT_ARGS(lastCheckPoint.redo)),
  xtime ? errdetail("Last completed transaction was at log 
time %s.",

timestamptz_to_str(xtime)) : 0));

This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, 
LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be 
different, for example, when the current DB state is not DB_IN_ARCHIVE_RECOVERY. 
In this case, which lsn should we report as redo lsn?


Do we ever reach CreateRestartPoint when ControlFile->stat !=
DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint doesn't fail any
regression tests.


ISTM that CreateRestartPoint() can reach the condition ControlFile->state != 
DB_IN_ARCHIVE_RECOVERY. Please imagine the case where CreateRestartPoint() has 
already started and calls CheckPointGuts(). If the standby server is promoted 
while CreateRestartPoint() is flushing the data to disk at CheckPointGuts(), the 
state would be updated to DB_IN_PRODUCTION and CreateRestartPoint() can see the 
state != DB_IN_ARCHIVE_RECOVERY later.

As far as I read the code, this case seems to be able to make the server unrecoverable. 
If this case happens, since pg_control is not updated, pg_control still indicates the 
REDO LSN of last valid restartpoint. But CreateRestartPoint() seems to delete old WAL 
files based on its "current" REDO LSN not pg_control's REDO LSN. That is, WAL 
files required for the crash recovery starting from pg_control's REDO LSN would be 
removed.

If this understanding is right, to address this issue, probably we need to make 
CreateRestartPoint() do nothing (return immediately) when the state != 
DB_IN_ARCHIVE_RECOVERY?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: 2022-01 Commitfest

2022-02-02 Thread Julien Rouhaud
On Wed, Feb 02, 2022 at 01:10:39PM -0500, Jaime Casanova wrote:
> On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote:
> > 
> > Anyway, thanks to Julien for doing this mostly-thankless task
> > this time!
> > 
> 
> Agreed, great work!

Thanks a lot :)




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-02 Thread David G. Johnston
On Wednesday, February 2, 2022, Masahiko Sawada 
wrote:

> and have other error
> information in pg_stat_subscription_workers view.
>

What benefit is there to keeping the existing collector-based
pg_stat_subscripiton_workers view?  If we re-write it using shmem IPC then
we might as well put everything there and forego using a catalog.  Then it
behaves in a similar manner to pg_stat_activity but for logical replication
workers.

David J.


Re: support for CREATE MODULE

2022-02-02 Thread Julien Rouhaud
Hi,

On Thu, Feb 03, 2022 at 05:25:27AM +0100, Pavel Stehule wrote:
> 
> On Thu, Feb 3, 2022 at 3:28, Swaha Miller wrote:
> 
> > Hi,
> >
> > I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
> >
> > My proposal implements modules as schema objects to be stored in a new
> > system catalog pg_module with new syntax for CREATE [OR REPLACE] MODULE,
> > ALTER MODULE, DROP MODULE and for GRANT and REVOKE for privileges on
> > modules and module routines. I am attempting to follow the SQL spec.
> > However, for right now, I'm proposing to support only routines as module
> > contents, with local temporary tables and path specifications as defined
> > in the SQL spec, to be supported in a future submission. We could also
> > include support for variables depending on its status. [2]
> 
> I dislike this feature. The modules are partially redundant to schemas and
> to extensions in Postgres, and I am sure, so there is no reason to
> introduce this.
> 
> What is the benefit against schemas and extensions?

I agree with Pavel.  It seems that it's mainly adding another namespacing layer
between schemas and objects, and it's likely going to create a mess.
That's also going to be problematic if you want to add support for module
variables, as you won't be able to use e.g.
dbname.schemaname.modulename.variablename.fieldname.

Also, my understanding was that the major interest of modules (at least for the
routines part) was the ability to make some of them private to the module, but
it doesn't look like it's doing that, so I also don't see any real benefit
compared to schemas and extensions.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-02 Thread Masahiko Sawada
On Wed, Feb 2, 2022 at 4:36 PM David G. Johnston
 wrote:
>
> On Tue, Feb 1, 2022 at 11:55 PM Amit Kapila  wrote:
>>
>> On Wed, Feb 2, 2022 at 9:41 AM David G. Johnston
>>  wrote:
>> >
>> > On Tue, Feb 1, 2022 at 8:07 PM Amit Kapila  wrote:
>> >>
>> >> On Tue, Feb 1, 2022 at 11:47 AM Masahiko Sawada  
>> >> wrote:
>> >>
>> >> >
>> >> > I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP
>> >> > feature to pass error-XID or error-LSN information to the worker
>> >> > whereas I'm also not sure of the advantages in storing all error
>> >> > information in a system catalog. Since what we need to do for this
>> >> > purpose is only error-XID/LSN, we can store only error-XID/LSN in the
>> >> > catalog? That is, the worker stores error-XID/LSN in the catalog on an
>> >> > error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip
>> >> > the transaction in question. The worker clears the error-XID/LSN after
>> >> > successfully applying or skipping the first non-empty transaction.
>> >> >
>> >>
>> >> Where do you propose to store this information?
>> >
>> >
>> > pg_subscription_worker
>> >
>> > The error message and context is very important.  Just make sure it is 
>> > only non-null when the worker state is "syncing failed" (or whatever term 
>> > we use).
>> >
>> >
>>
>> Sure, but is this the reason you want to store all the error info in
>> the system catalog? I agree that providing more error info could be
>> useful and also possibly the previously failed (apply) xacts info as
>> well but I am not able to see why you want to have that sort of info
>> in the catalog. I could see storing info like err_lsn/err_xid that can
>> allow to proceed to apply worker automatically or to slow down the
>> launch of errored apply worker but not all sort of other error info
>> (like err_cnt, err_code, err_message, err_time, etc.). I want to know
>> why you are insisting to make all the error info persistent via the
>> system catalog?
>
>
> I look at the catalog and am informed that the worker has stopped because of 
> an error.  I'd rather simply read the error message right then instead of 
> having to go look at the log file.  And if I am going to take an action in 
> order to overcome the error I would have to know what that error is; so the 
> error message is not something I can ignore.  The error is an attribute of 
> system state, and the catalog stores the current state of the (workers) 
> system.
>
> I already explained that the concept of err_cnt is not useful.  The fact that 
> you include it here makes me think you are still thinking that this all 
> somehow is meant to keep track of history.  It is not.  The workers are state 
> machines and "error" is one of the states - with relevant attributes to 
> display to the user, and system, while in that state.  The state machine 
> reporting does not care about historical states nor does it report on them.  
> There is some uncertainty if we continue with the automatic re-launch; which, 
> now that I write this, I can see where what you call err_cnt is effectively a 
> count of how many times the worker re-launched without the underlying problem 
> being resolved and thus encountered the same error.  If we persist with the 
> re-launch behavior then maybe err_cnt should be left in place - with the 
> description for it basically being the ah-ha! comment I just made. In a world 
> where we do not typically re-launch and simply re-try without being informed 
> there is a change - such a count remains of minimal value.
>
> I don't really understand the confusion here though - this error data already 
> exists in the pg_stat_subscription_workers stat collector view - the fact 
> that I want to keep it around (just changing the reset behavior) - doesn't 
> seem like it should be controversial.  I, thinking as a user, really don't 
> care about all of these implementation details.  Whether it is a pg_stat_* 
> view (collector or shmem IPC) or a pg_* catalog is immaterial to me.  The 
> behavior I observe is what matters.  As a developer I don't want to use the 
> statistics collector because these are not statistics and the collector is 
> unreliable.  I don't know enough about the relevant differences between 
> shared memory IPC and catalog tables to decide between them.  But catalog 
> tables seem like a lower bar to meet and seem like they can implement the 
> user-facing requirements as I envision them.

I see that important information such as error-XID that can be used
for ALTER SUBSCRIPTION SKIP needs to be stored in a reliable way, and
using system catalogs is a reasonable way for this purpose. But it's
still unclear to me why all error information that is currently shown
in pg_stat_subscription_workers view, including error-XID and the
error message, relation OID, action, etc., needs to be stored in the
catalog. The information other than error-XID doesn't necessarily need
to be as reliable as error-XID. I think we can have

Re: Unclear problem reports

2022-02-02 Thread Julien Rouhaud
Hi,

On Wed, Feb 02, 2022 at 07:35:36PM -0500, Bruce Momjian wrote:
> The Postgres community is great at diagnosing problems and giving users
> feedback.  In most cases, we can either diagnose a problem and give a
> fix, or at least give users a hint at finding the cause.
> 
> However, there is a class of problems that are very hard to help with,
> and I have perhaps seen an increasing number of them recently, e.g.:
> 
>   
> https://www.postgresql.org/message-id/17384-f50f2eedf541e512%40postgresql.org
>   
> https://www.postgresql.org/message-id/CALTnk7uOrMztNfzjNOZe3TdquAXDPD3vZKjWFWj%3D-Fv-gmROUQ%40mail.gmail.com
> 
> I consider these as problems that need digging to find the cause, and
> users are usually unable to do sufficient digging, and we don't have
> time to give them instructions, so they never get a reply.
> 
> Is there something we can do to improve this situation?  Should we just
> tell them they need to hire a Postgres expert?  I assume these are users
> who do not already have access to such experts.

There's also only so much you can do to help interacting by mail without
access to the machine, even if you're willing to spend time helping people for
free.

One thing that might help would be to work on a troubleshooting section on the
wiki with some common problems and some clear instructions on either how to try
to fix the problem or provide needed information for further debugging.  At
least it would save the time for those steps and one could quickly respond with
that link, similarly to how we do for the slow query questions wiki entry.




Re: support for CREATE MODULE

2022-02-02 Thread Pavel Stehule
Hi

On Thu, Feb 3, 2022 at 3:28, Swaha Miller wrote:

> Hi,
>
> I'm following up from Jim's POC for adding MODULE to PostgreSQL. [1]
>
> My proposal implements modules as schema objects to be stored in a new
> system catalog pg_module with new syntax for CREATE [OR REPLACE] MODULE,
> ALTER MODULE, DROP MODULE and for GRANT and REVOKE for privileges on
> modules and module routines. I am attempting to follow the SQL spec.
> However, for right now, I'm proposing to support only routines as module
> contents, with local temporary tables and path specifications as defined
> in the SQL spec, to be supported in a future submission. We could also
> include support for variables depending on its status. [2]
>
> Following are some examples of what the new module syntax would look
> like. The attached patch has detailed documentation.
>
> CREATE MODULE mtest1 CREATE FUNCTION m1testa() RETURNS text
> LANGUAGE sql
> RETURN '1x';
> SELECT mtest1.m1testa();
> ALTER MODULE mtest1 CREATE FUNCTION m1testd() RETURNS text
> LANGUAGE sql
> RETURN 'm1testd';
> SELECT mtest1.m1testd();
> ALTER MODULE mtest1 RENAME TO mtest1renamed;
> SELECT mtest1renamed.m1testd();
> REVOKE ON MODULE mtest1 REFERENCES ON FUNCTION m1testa() FROM public;
> GRANT ON MODULE mtest1 REFERENCES ON FUNCTION m1testa() TO
> regress_priv_user1;
>
> I am new to the PostgreSQL community and would really appreciate your
> input and feedback.
>

I dislike this feature. Modules are partially redundant with schemas and
with extensions in Postgres, and I am sure there is no reason to introduce
this.

What is the benefit against schemas and extensions?

Regards

Pavel


>
> Thanks,
> Swaha Miller
> Amazon Web Services
>
> [1]
> https://www.postgresql.org/message-id/CAB_5SRebSCjO12%3DnLsaLCBw2vnkiNH7jcNchirPc0yQ2KmiknQ%40mail.gmail.com
>
> [2]
> https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com
>
>


Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Bharath Rupireddy
On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart  wrote:
>
> On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart  
> > wrote:
> >> However, I'm not sure about the change to ReadDirExtended().  That might be
> >> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> >> are flushed to disk for the checkpoint.  If we stop reading the directory
> >> after an error and let the checkpoint continue, isn't it possible that some
> >> mappings files won't be persisted to disk?
> >
> > Unless I mis-read your above statement, with LOG level in
> > ReadDirExtended, I don't think we stop reading the files in
> > CheckPointLogicalRewriteHeap. Am I missing something here?
>
> ReadDirExtended() has the following comment:
>
>  * If elevel < ERROR, returns NULL after any error.  With the normal coding
>  * pattern, this will result in falling out of the loop immediately as
>  * though the directory contained no (more) entries.
>
> If there is a problem reading the directory, we will LOG and then exit the
> loop.  If we didn't scan through all the entries in the directory, there is
> a chance that we didn't fsync() all the files that need it.

Thanks. I get it. For syncing the map files we don't want to tolerate any
errors, whereas errors while removing old map files (those below the cutoff
LSN) can be tolerated in CheckPointLogicalRewriteHeap.

Here's the v7 version using ReadDir for CheckPointLogicalRewriteHeap.

Regards,
Bharath Rupireddy.


v7-0001-Replace-ReadDir-with-ReadDirExtended.patch
Description: Binary data


Re: Adding CI to our tree

2022-02-02 Thread Justin Pryzby
On Tue, Jan 18, 2022 at 03:08:47PM -0600, Justin Pryzby wrote:
> On Mon, Jan 17, 2022 at 12:16:19PM -0800, Andres Freund wrote:
> > I think it might still be worth adding stopgap way of running all tap tests 
> > on
> > windows though. Having a vcregress.pl function to find all directories with 
> > t/
> > and run the tests there, shouldn't be a lot of code...
> 
> I started doing that, however it makes CI/windows even slower.  I think it'll
> be necessary to run prove with all the tap tests to parallelize them, rather
> than looping around directories, many of which have only a single file, and 
> are
> run serially.

FYI: I've rebased these against your cirrus/windows changes.

A recent cirrus result is here; this has other stuff in the branch, so you can
see the artifacts with HTML docs and code coverage.

https://github.com/justinpryzby/postgres/runs/5046465342

95793675633 cirrus: spell ccache_maxsize
e5286ede1b4 cirrus: avoid unnecessary double star **
03f6de4643e cirrus: include hints how to install OS packages..
39cc2130e76 cirrus/linux: script test.log..
398b7342349 cirrus/macos: uname -a and export at end of sysinfo
9d0f03d3450 wip: cirrus: code coverage
bff64e8b998 cirrus: upload html docs as artifacts
80f52c3b172 wip: only upload changed docs
7f3b50f1847 pg_upgrade: show list of files copied only in vebose mode
ba229165fe1 wip: run pg_upgrade tests with data-checksums
97d7b039b9b vcregress/ci: test modules/contrib with NO_INSTALLCHECK=1
654b6375401 wip: vcsregress: add alltaptests

BTW, I think the double star added in f3feff825 probably doesn't need to be
double, either:

path: "crashlog-**.txt"

-- 
Justin




Re: psql tab completion versus Debian's libedit

2022-02-02 Thread Tom Lane
Andres Freund  writes:
> On 2022-02-02 22:28:31 -0500, Tom Lane wrote:
>> I conclude that there are no extant versions of readline/libedit
>> that don't have rl_completion_append_character, so we could
>> drop that configure test and save a cycle or two.

> Sounds good to me!

On it now.

regards, tom lane




Re: psql tab completion versus Debian's libedit

2022-02-02 Thread Andres Freund
On 2022-02-02 22:28:31 -0500, Tom Lane wrote:
> I conclude that there are no extant versions of readline/libedit
> that don't have rl_completion_append_character, so we could
> drop that configure test and save a cycle or two.

Sounds good to me!




Re: psql tab completion versus Debian's libedit

2022-02-02 Thread Tom Lane
Andres Freund  writes:
> I think this is caused by the feature flag detection being broken in the meson
> branch - unrelated to your commit - ending up with falsely believing that none
> of the rl_* variables exist (below for more on that aspect).
> Do we care that the tests would fail when using a readline without any of the
> rl_* variables? I don't know if those even exist.

Hm, interesting.  I reproduced this by #undef'ing
HAVE_RL_COMPLETION_APPEND_CHARACTER in pg_config.h.  It's not too
surprising, because without that, we have no way to prevent readline
from appending a space to a completion.  In the test at hand, that
causes two failures:

1. After we do "pub", we get "public. " not just "public.";
and now, that is no longer part of the word-to-be-completed.
That breaks the test because FROM is no longer the previous word
but the one before that, so we won't try to do table completion.

2. After we do "tab", we normally get "tab1 ", but since no
completion is attempted, we should get just "tab".  But we
get "tab " because the dummy-completion code at the bottom
of psql_completion fails to suppress adding a space.  In general,
*any* failed completion would nonetheless append a space.

Each of these is sufficiently bad to prompt user complaints,
I think, but we've seen none.  And I observe that every buildfarm
animal reports having rl_completion_append_character.

I conclude that there are no extant versions of readline/libedit
that don't have rl_completion_append_character, so we could
drop that configure test and save a cycle or two.

regards, tom lane




Re: Bugs in pgoutput.c

2022-02-02 Thread Tom Lane
Amit Kapila  writes:
> Tom, is it okay for you if I go ahead with this patch after some testing?

I've been too busy to get back to it, so sure.

regards, tom lane




Re: Bugs in pgoutput.c

2022-02-02 Thread Amit Kapila
On Sat, Jan 29, 2022 at 8:32 AM Amit Kapila  wrote:
>
> On Thu, Jan 6, 2022 at 3:42 AM Tom Lane  wrote:
> >
> > Commit 6ce16088b caused me to look at pgoutput.c's handling of
> > cache invalidations, and I was pretty appalled by what I found.
> >
> > * rel_sync_cache_relation_cb does the wrong thing when called for
> > a cache flush (i.e., relid == 0).  Instead of invalidating all
> > RelationSyncCache entries as it should, it will do nothing.
> >
> > * When rel_sync_cache_relation_cb does invalidate an entry,
> > it immediately zaps the entry->map structure, even though that
> > might still be in use (as per the adjacent comment that carefully
> > explains why this isn't safe).  I'm not sure if this could lead
> > to a dangling-pointer core dump, but it sure seems like it could
> > lead to failing to translate tuples that are about to be sent.
> >
> > * Similarly, rel_sync_cache_publication_cb is way too eager to
> > reset the pubactions flags, which would likely lead to failing
> > to transmit changes that we should transmit.
> >
>
> Are you planning to proceed with this patch?
>

Tom, is it okay for you if I go ahead with this patch after some testing?

-- 
With Regards,
Amit Kapila.




Re: Windows crash / abort handling

2022-02-02 Thread Andres Freund
Hi,

On 2022-02-02 11:24:19 +1300, Thomas Munro wrote:
> On Sun, Jan 30, 2022 at 10:02 AM Andres Freund  wrote:
> > On 2022-01-09 16:57:04 -0800, Andres Freund wrote:
> > > I've attached a patch implementing these changes.
> >
> > Unless somebody is planning to look at this soon, I'm planning to push it to
> > master. It's too annoying to have these hangs and not see backtraces.
> 
> +1, I don't know enough about Windows development to have an opinion
> on the approach but we've got to try *something*, these hangs are
> terrible.

I've pushed the patch this thread is about now. Let's see what the buildfarm
says. I could only test one Windows version.  Separately, I've also pushed a patch
to run the windows tests under a timeout. I hope in combination these patches
address the hangs.

Greetings,

Andres Freund




Re: Unclear problem reports

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 5:35 PM Bruce Momjian  wrote:

> I consider these as problems that need digging to find the cause, and
> users are usually unable to do sufficient digging, and we don't have
> time to give them instructions, so they never get a reply.
>
> Is there something we can do to improve this situation?  Should we just
> tell them they need to hire a Postgres expert?  I assume these are users
> who do not already have access to such experts.
>
>
Have pg_lister queue up a check for, say, two or three days after the bug
reporting form is filled out.  If the report hasn't been responded to by
someone other than the OP send out a reply that basically says:

We're sorry your message hasn't yet attracted a response.  If your report
falls into the category of "tech support for a malfunctioning server", and
this includes error messages that are difficult or impossible to replicate
in a development environment (maybe give some examples), you may wish to
consider eliciting paid professional help.  Please see this page on our
website for a directory of companies that provide such services.  The
PostgreSQL Core Project itself refrains from making recommendations since
many, if not all, of these companies contribute back to the project in
order to keep it both free and open source.

I would also consider adding a form that directs messages to pg_admin
instead of pg_bugs.  On that form we could put the above message upfront -
basically saying that here is a place to ask for help but the core project
(this website) doesn't directly provide such services: so if you don't
receive a reply, or your needs are immediate, consider enlisting a paid
support from our directory.

The problem with the separate form is that we would need users to
self-select to report their problem to the support list instead of using a
bug reporting form.  Neither of the mentioned emails are good examples of
bug reports.  Either we make it easier and hope self-selection works, or
just resign ourselves to seeing these messages on -bugs and use the first
option to at least be a bit more responsive.  The risks related to having a
rote response, namely it not really being applicable to the situation, seem
minimal and others seeing that response (assuming we'd send it to the list
and not just the OP) would likely encourage someone to at least give better
suggestions for next steps should that be necessary.

David J.


Re: psql tab completion versus Debian's libedit

2022-02-02 Thread Andres Freund
Hi,

On 2022-02-01 16:30:11 -0500, Tom Lane wrote:
> I chased down the failure that kittiwake has been showing since
> 02b8048ba [1].

I just rebased my meson branch across the commit d33a81203e9. And on freebsd
the meson based build failed in the expanded tests, while autoconf succeeded.

The failure is:

not ok 22 - complete schema-qualified name
#   Failed test 'complete schema-qualified name'
#   at /tmp/cirrus-ci-build/src/bin/psql/t/010_tab_completion.pl line 236.
# Actual output was "tab "
# Did not match "(?^:tab1 )"


I think this is caused by the feature flag detection being broken in the meson
branch - unrelated to your commit - ending up with falsely believing that none
of the rl_* variables exist (below for more on that aspect).

Do we care that the tests would fail when using a readline without any of the
rl_* variables? I don't know if those even exist.


The reason for meson not detecting the variables is either an "andres" or
freebsd / readline issue. The tests fail with:

/usr/local/include/readline/rltypedefs.h:71:36: error: unknown type name 'FILE'
typedef int rl_getc_func_t PARAMS((FILE *));
   ^
apparently the readline header on freebsd somehow has a dependency on stdio.h
being included.

Looks like it's easy enough to work around. My local copy of readline.h (8.1
on debian sid) has an explicit stdio.h include, but it looks like that's a
debian addition...

Greetings,

Andres Freund




Re: ci/cfbot: run windows tests under a timeout

2022-02-02 Thread Andres Freund
Hi,

On 2022-02-02 10:31:07 -0800, Andres Freund wrote:
> The attached test adds a timeout (using git's timeout binary) to all vcregress
> invocations. I've not re-added it to the other OSs, but I'm on the fence about
> doing so.

I've pushed this now.

Greetings,

Andres Freund




Unclear problem reports

2022-02-02 Thread Bruce Momjian
The Postgres community is great at diagnosing problems and giving users
feedback.  In most cases, we can either diagnose a problem and give a
fix, or at least give users a hint at finding the cause.

However, there is a class of problems that are very hard to help with,
and I have perhaps seen an increasing number of them recently, e.g.:


https://www.postgresql.org/message-id/17384-f50f2eedf541e512%40postgresql.org

https://www.postgresql.org/message-id/CALTnk7uOrMztNfzjNOZe3TdquAXDPD3vZKjWFWj%3D-Fv-gmROUQ%40mail.gmail.com

I consider these as problems that need digging to find the cause, and
users are usually unable to do sufficient digging, and we don't have
time to give them instructions, so they never get a reply.

Is there something we can do to improve this situation?  Should we just
tell them they need to hire a Postgres expert?  I assume these are users
who do not already have access to such experts.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Replace pg_controldata output fields with macros for better code manageability

2022-02-02 Thread Bruce Momjian
On Sat, Jan 29, 2022 at 08:00:53PM +0530, Bharath Rupireddy wrote:
> Hi,
> 
> While working on another pg_control patch, I observed that the
> pg_controldata output fields such as "Latest checkpoint's
> TimeLineID:", "Latest checkpoint's NextOID:'' and so on, are being
> used in pg_resetwal.c, pg_controldata.c and pg_upgrade/controldata.c.
> Direct usage of these fields in many places doesn't look good and can
> be error prone if we try to change the text in one place and forget in
> another place. I'm thinking of having those fields as macros in
> pg_control.h, something like [1] and use it all the places. This will
> be a good idea for better code manageability IMO.
> 
> Thoughts?
> 
> [1]
> #define PG_CONTROL_FIELD_VERSION_NUMBER "pg_control version number:"
> #define PG_CONTROL_FIELD_CATALOG_NUMBER "Catalog version number:"
> #define PG_CONTROL_FIELD_CHECKPOINT_TLI "Latest checkpoint's TimeLineID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_PREV_TLI "Latest checkpoint's
> PrevTimeLineID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID "Latest checkpoint's oldestXID:"
> #define PG_CONTROL_FIELD_CHECKPOINT_OLDESTXID_DB "Latest checkpoint's
> oldestXID's DB:"
> and so on.

That seems like a very good idea.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Tom Lane
I wrote:
> The Windows animals don't like this:
> pg_basebackup: error: connection to server at "127.0.0.1", port 59539 failed: 
> FATAL:  SSPI authentication failed for user "backupuser"

> Not sure whether we have a standard method to get around that.

Ah, right, we do.  Looks like adding something like

auth_extra => [ '--create-role', 'backupuser' ]

to the $node->init call would do it, or you could mess with
invoking pg_regress --config-auth directly.

regards, tom lane




Re: archive modules

2022-02-02 Thread Nathan Bossart
Thanks for the review!

On Wed, Feb 02, 2022 at 01:42:55PM -0500, Robert Haas wrote:
> I think avoiding ERROR is going to be impractical. Catching it in the
> contrib module seems OK. Catching it in the generic code is probably
> also possible to do in a reasonable way. Not catching the error also
> seems like it would be OK, since we expect errors to be infrequent.
> I'm not objecting to anything you did here, but I'm uncertain why
> adding basic_archive along shell_archive changes the calculus here in
> any way. It just seems like a separate problem.

The main scenario I'm thinking about is when there is no space left for
archives.  The shell archiving logic is pretty good about avoiding ERRORs,
so when there is a problem executing the command, the archiver will retry
the command a few times before giving up for a while.  If basic_archive
just ERROR'd due to ENOSPC, it would cause the archiver to restart.  Until
space frees up, I believe the archiver will end up restarting every 10
seconds.

I thought some more about adding a generic exception handler for the
archiving callback.  I think we'd need to add a new callback function that
would perform any required cleanup (e.g., closing any files that might be
left open).  That part isn't too bad.  However, module authors will also
need to keep in mind that the archiving callback runs in its own transient
memory context.  If the module needs to palloc() something that needs to
stick around for a while, it will need to do so in a different memory
context.  With sufficient documentation, maybe this part isn't too bad
either, but in the end, all of this is to save an optional ~15 lines of
code in the module.  It's not crucial to do your own ERROR handling in your
archive module, but if you want to, you can use basic_archive as a good
starting point.

tl;dr - I left it the same.

> +   /* Perform logging of memory contexts of this process */
> +   if (LogMemoryContextPending)
> +   ProcessLogMemoryContextInterrupt();
> 
> Any special reason for moving this up higher? Not really an issue, just 
> curious.

Since archive_library changes cause the archiver to restart, I thought it
might be good to move this before the process might exit in case
LogMemoryContextPending and ConfigReloadPending are both true.

> +   gettext_noop("This is unused if
> \"archive_library\" does not indicate archiving via shell is
> enabled.")
> 
> This contains a double negative. We could describe it more positively:
> This is used only if \"archive_library\" specifies archiving via
> shell. But that's actually a little confusing, because the way you've
> set it up, archiving via shell can be specified by writing either
> archive_library = '' or archive_library = 'shell'. I don't see any
> particularly good reason to allow that to be spelled in two ways.
> Let's pick one. Then here we can write either:
> 
> (a) This is used only if archive_library = 'shell'.
> -or-
> (b) This is used only if archive_library is not set.
> 
> IMHO, either of those would be clearer than what you have right now,
> and it would definitely be shorter.

I went with (b).  That felt a bit more natural to me, and it was easier to
code because I don't have to add error checking for an empty string.

> +/*
> + * Callback that gets called to determine if the archive module is
> + * configured.
> + */
> +typedef bool (*ArchiveCheckConfiguredCB) (void);
> +
> +/*
> + * Callback called to archive a single WAL file.
> + */
> +typedef bool (*ArchiveFileCB) (const char *file, const char *path);
> +
> +/*
> + * Called to shutdown an archive module.
> + */
> +typedef void (*ArchiveShutdownCB) (void);
> 
> I think that this is the wrong amount of comments. One theory is that
> the reader should refer to the documentation to understand how these
> callbacks work. In that case, having a separate comment for each one
> that doesn't really say anything is just taking up space. It would be
> better to have one comment for all three lines referring the reader to
> the documentation. Alternatively, one could take the position that the
> explanation should go into these comments, and then perhaps we don't
> even really need documentation. A one-line comment that doesn't really
> say anything non-obvious seems like the worst amount.

In my quest to write well-commented code, I sometimes overdo it.  I
adjusted these comments in the new revision.

> + 
> +  
> +   There are considerable robustness and security risks in using
> archive modules
> +   because, being written in the C language, they
> have access
> +   to many server resources.  Administrators wishing to enable archive 
> modules
> +   should exercise extreme caution.  Only carefully audited modules should be
> +   loaded.
> +  
> + 
> 
> Maybe I'm just old and jaded, but do we really need this? I know we
> have the same thing for background workers, but if anything that seems
> like an argument against duplicating it elsewhere. Lots of 

Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Here's a follow-on patch that adds a test for non-superuser server-side
> basebackup, which crashes without your patch and passes with it.

The Windows animals don't like this:

# Running: pg_basebackup --no-sync -cfast -U backupuser --target 
server:C:\\prog\\bf\\root\\HEAD\\pgsql.build\\src\\bin\\pg_basebackup\\tmp_check\\tmp_test_VGMM/backuponserver
 -X none
pg_basebackup: error: connection to server at "127.0.0.1", port 59539 failed: 
FATAL:  SSPI authentication failed for user "backupuser"
not ok 108 - backup target server

#   Failed test 'backup target server'
#   at t/010_pg_basebackup.pl line 527.

Not sure whether we have a standard method to get around that.

regards, tom lane




Re: [PATCH] Add extra statistics to explain for Nested Loop

2022-02-02 Thread Ekaterina Sokolova

Hi, hackers.

I'm attaching the new version of the patch.

Justin Pryzby  wrote:
> I'm curious to hear what you and others think of the refactoring.

Thank you so much. With your changes, the patch has become more
understandable and readable.

> It'd be nice if there's a good way to add a test case for verbose output
> involving parallel workers, but the output is unstable ...

Done!

Lukas Fittl  wrote:
> I've briefly thought whether this needs documentation (currently the
> patch includes none), but there does not appear to be a good place to add
> documentation about this from a quick glance, so it seems acceptable to
> leave this out given the lack of more detailed EXPLAIN documentation in
> general.

You're right! I added a feature description to the patch header.

> Whilst no specific bad cases were provided, I wonder if even a simple
> pgbench with auto_explain (and log_analyze=1) would be a way to test this?

I wanted to measure the overhead, but couldn't choose a correct way to do it.
Thanks for the idea of using auto_explain.
I loaded it and made 10 runs of pgbench (number of clients: 1, number of
threads: 1).
I'm not sure I chose the right way to measure overhead, so any
suggestions are welcome.

Current results are in file overhead_v0.txt.

Please feel free to share your suggestions and comments. Regards,

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From: Ekaterina Sokolova 
Subject: [PATCH] Add extra statistics to explain for Nested Loop

For some distributions of data in tables, different loops in a nested loop
join can take different amounts of time and process different numbers of
entries. That makes the average statistics returned by EXPLAIN ANALYZE not
very useful for a DBA.

This patch adds collection of min, max and total statistics for time and rows
across all loops to EXPLAIN ANALYZE. You need to set the VERBOSE flag to
display this information. The patch contains regression tests.

Example of results in TEXT format:
   ->  Append (actual rows=5 loops=5)
 Loop Min Rows: 2  Max Rows: 6  Total Rows: 23

Reviewed-by: Lukas Fittl, Justin Pryzby, Yugo Nagata, Julien Rouhaud.

---
 src/backend/commands/explain.c| 147 +++---
 src/backend/executor/instrument.c |  31 ++
 src/include/executor/instrument.h |   6 ++
 src/test/regress/expected/explain.out |  10 ++
 src/test/regress/expected/partition_prune.out | 117 
 src/test/regress/sql/partition_prune.sql  |  32 ++
 6 files changed, 280 insertions(+), 63 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 10644dfac44..458f37c3fc6 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -118,6 +118,8 @@ static void show_instrumentation_count(const char *qlabel, int which,
 	   PlanState *planstate, ExplainState *es);
 static void show_foreignscan_info(ForeignScanState *fsstate, ExplainState *es);
 static void show_eval_params(Bitmapset *bms_params, ExplainState *es);
+static void show_loop_info(Instrumentation *instrument, bool isworker,
+		   ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
 static void show_buffer_usage(ExplainState *es, const BufferUsage *usage,
 			  bool planning);
@@ -1606,42 +1608,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 
 	if (es->analyze &&
 		planstate->instrument && planstate->instrument->nloops > 0)
-	{
-		double		nloops = planstate->instrument->nloops;
-		double		startup_ms = 1000.0 * planstate->instrument->startup / nloops;
-		double		total_ms = 1000.0 * planstate->instrument->total / nloops;
-		double		rows = planstate->instrument->ntuples / nloops;
-
-		if (es->format == EXPLAIN_FORMAT_TEXT)
-		{
-			if (es->timing)
-appendStringInfo(es->str,
- " (actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
- startup_ms, total_ms, rows, nloops);
-			else
-appendStringInfo(es->str,
- " (actual rows=%.0f loops=%.0f)",
- rows, nloops);
-		}
-		else
-		{
-			if (es->timing)
-			{
-ExplainPropertyFloat("Actual Startup Time", "ms", startup_ms,
-	 3, es);
-ExplainPropertyFloat("Actual Total Time", "ms", total_ms,
-	 3, es);
-			}
-			ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-			ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-		}
-	}
+		show_loop_info(planstate->instrument, false, es);
 	else if (es->analyze)
 	{
 		if (es->format == EXPLAIN_FORMAT_TEXT)
 			appendStringInfoString(es->str, " (never executed)");
 		else
 		{
+			/* without min and max values because actual result is 0 */
 			if (es->timing)
 			{
 ExplainPropertyFloat("Actual Startup Time", "ms", 0.0, 3, es);
@@ -1664,44 +1638,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		for (int n = 0; n < w->num_workers; n++)
 		{
			Instrumentation *instrument = &w->instrument[n];
-			double		nloops = instrument->nloops;
-			double		startup_ms;
-			

Re: Support for NSS as a libpq TLS backend

2022-02-02 Thread Bruce Momjian
On Tue, Feb  1, 2022 at 01:52:09PM -0800, Andres Freund wrote:
> There's https://hg.mozilla.org/projects/nspr/file/tip/pr/src - which is I
> think the upstream source.
> 
> A project without even a bare-minimal README at the root does have a "internal
> only" feel to it...

I agree --- it is a library --- if they don't feel the need to publish
the API, it seems to mean they want to maintain the ability to change it
at any time, and therefore it is inappropriate for other software to
rely on that API.

This is not the same as Postgres extensions needing to read the Postgres
source code --- they are an important but edge use case and we never saw
the need to standardize or publish the internal functions that must be
studied and adjusted possibly for major releases.

This kind of feels like the Chrome JavaScript code that used to be able
to be built separately for PL/v8, but has gotten much harder to do in
the past few years.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: CREATEROLE and role ownership hierarchies

2022-02-02 Thread Robert Haas
On Wed, Feb 2, 2022 at 3:23 PM Mark Dilger  wrote:
> It's perfectly reasonable (in my mind) that Robert, acting as superuser, may 
> want to create a creator who acts like a superuser over the sandbox, while at 
> the same time Stephen, acting as superuser, may want to create a creator who 
> acts as a low privileged bot that only adds and removes roles, but cannot 
> read their tables, SET ROLE to them, etc.
>
> I don't see any reason that Robert and Stephen can't both get what they want. 
>  We just have to make Thing 1 flexible enough.

Hmm, that would be fine with me. I don't mind a bit if other people
get what they want, as long as I can get what I want, too! In fact,
I'd prefer it if other people also get what they want...

That having been said, I have some reservations if it involves tightly
coupling new features that we're trying to add to existing things that
may or may not be that well designed, like the role-level INHERIT
flag, or WITH ADMIN OPTION, or the not-properly maintained
pg_auth_members.grantor column, or even the SQL standard. I'm not
saying we should ignore any of those things and I don't think that we
should ... but at the same time, we can't ignore whether the feature does
what people want it to do, either. If we do, this whole thing is
really a complete waste of time. If a patch achieves infinitely large
amounts of backward compatibility, standards compliance, and
conformity with existing design but doesn't do the right stuff, forget
it!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Why is INSERT-driven autovacuuming based on pg_class.reltuples?

2022-02-02 Thread Peter Geoghegan
On Tue, Feb 1, 2022 at 3:02 PM David Rowley  wrote:
> If we wanted a more current estimate for the number of tuples in a
> relation then we could use reltuples / relpages *
> RelationGetNumberOfBlocks(r).  However, I still don't see why an
> INSERT driven auto-vacuums are a particularly special case. ANALYZE
> updating the reltuples estimate had an effect on when auto-vacuum
> would trigger for tables that generally grow in the number of live
> tuples but previously only (i.e before insert vacuums existed)
> received auto-vacuum attention due to UPDATEs/DELETEs.

Sorry for the delay in my response; I've been travelling.

I admit that I jumped the gun here -- I now believe that it's
operating as originally designed. And that the design is reasonable.
It couldn't hurt to describe the design in a little more detail in the
user docs, though.

> Nothing there seems to indicate the scale is based on the historical
> table size when the table was last vacuumed/analyzed, so you could
> claim that the 3 usages of relpages when deciding if the table should
> be vacuumed and/or analyzed are all wrong and should take into account
> RelationGetNumberOfBlocks too.

The root of my confusion might interest you. vacuumlazy.c's call to
vac_update_relstats() provides relpages and reltuples values that are
very accurate relative to the VACUUM operation's view of things
(relative to its OldestXmin, which uses the size of the table at the
start of the VACUUM, not the end). When the vac_update_relstats() call
is made, the same accurate-relative-to-OldestXmin values might actually
*already* be out of date relative to the physical table as it is at
the end of the same VACUUM.

The fact that these accurate values could be out of date like this is
practically inevitable for a certain kind of table. But that might not
matter at all, if they were later *interpreted* in a way that took
this into account later on -- context matters.

For some reason I made the leap to thinking that everybody else
believed the same thing, too, and that the intention with the design
of the insert-driven autovacuum stuff was to capture that. But that's
just not the case, at all. I apologize for the confusion.

BTW, this situation with the relstats only being accurate "relative to
the last VACUUM's OldestXmin" is *especially* important with
new_dead_tuples values, which are basically recently dead tuples
relative to OldestXmin. In the common case where there are very few
recently dead tuples, we have an incredibly distorted "sample" of dead
tuples (it tells us much more about VACUUM implementation details than
the truth of what's going on in the table). So that also recommends
"relativistically interpreting" the values later on. This specific
issue has even less to do with autovacuum_vacuum_scale_factor than the
main point, of course. I agree with you that the insert-driven stuff
isn't a special case.

--
Peter Geoghegan




Re: CREATEROLE and role ownership hierarchies

2022-02-02 Thread Mark Dilger



> On Feb 2, 2022, at 11:52 AM, Stephen Frost  wrote:
> 
> The question that we need to solve is how to give
> users the ability to choose what roles have which of the privileges that
> we've outlined above and agreed should be separable.

Ok, there are really two different things going on here, and the conversation 
keeps conflating them.  Maybe I'm wrong, but I think the conflation of these 
things is the primary problem preventing us from finishing up the design.

Thing 1:   The superuser needs to be able to create roles who can create other 
roles.  Let's call them "creators".  Not every organization will want the same 
level of privilege to be given to a creator, or even that all creators have 
equal levels of privilege.  So when the superuser creates a creator, the 
superuser needs to be able to configure exactly what that creator can do.  
This includes which attributes the creator can give to new roles.  It *might* 
include whether the creator maintains a dependency link with the created role, 
called "ownership" or somesuch.  It *might* include whether the creator can 
create roles into which the creator is granted membership/administership.  But 
there really isn't any reason that these things should be all-or-nothing.  
Maybe one creator maintains a dependency link with created roles, and that 
dependency link entails some privileges.  Maybe other creators do not maintain 
such a link.  It seems like superuser can define a creator in many different 
ways, as long as we nail down what those ways are, and what they mean.

Thing 2:  The creator needs to be able to specify which attributes and role 
memberships are set up for the roles the creator creates.  To the extent that 
the creator has been granted the privilege to create yet more creators, this 
recurses to Thing 1.  But not all creators will have that ability.


I think the conversation gets off topic and disagreement abounds when Thing 1 
is assumed to be hardcoded, leaving just the details of Thing 2 to be discussed.

It's perfectly reasonable (in my mind) that Robert, acting as superuser, may 
want to create a creator who acts like a superuser over the sandbox, while at 
the same time Stephen, acting as superuser, may want to create a creator who 
acts as a low privileged bot that only adds and removes roles, but cannot read 
their tables, SET ROLE to them, etc.

I don't see any reason that Robert and Stephen can't both get what they want.  
We just have to make Thing 1 flexible enough.

Do you agree at least with this much?  If so, I think we can hammer out what to 
do about Thing 1 and get something committed in time for postgres 15.  If not, 
then I'm probably going to stop working on this until next year, because at 
this point, we don't have enough time to finish.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: speed up text_position() for utf-8

2022-02-02 Thread John Naylor
I wrote:

> I tried this test on a newer CPU, and 0003 had no regression. Both
> systems used gcc 11.2. Rather than try to figure out why an experiment
> had unexpected behavior, I plan to test 0001 and 0002 on a couple
> different compilers/architectures and call it a day. It's also worth
> noting that 0002 by itself seemed to be decently faster on the newer
> machine, but not as fast as 0001 and 0002 together.

I tested four machines with various combinations of patches, and it
seems the only thing they all agree on is that 0002 is a decent
improvement (full results attached). The others can be faster or
slower. 0002 also simplifies things, so it has that going for it. I
plan to commit that this week unless there are objections.
-- 
John Naylor
EDB: http://www.enterprisedb.com
haswell xeon / gcc 8

master:
Time: 608.047 ms
Time: 2859.939 ms (00:02.860)
Time: 2255.402 ms (00:02.255)

0001 only:
Time: 603.026 ms
Time: 3077.163 ms (00:03.077)
Time: 2377.189 ms (00:02.377)

0002 only:
Time: 615.444 ms
Time: 2388.939 ms (00:02.389)
Time: 2227.665 ms (00:02.228)

0001-0002:
Time: 608.757 ms
Time: 2746.711 ms (00:02.747)
Time: 2156.545 ms (00:02.157)

0001-0003:
Time: 609.236 ms
Time: 1466.130 ms (00:01.466)
Time: 1353.658 ms (00:01.354)


power8 / gcc 4.8

master:
Time: 559.375 ms
Time: 7336.079 ms (00:07.336)
Time: 5049.222 ms (00:05.049)

0001 only:
Time: 559.466 ms
Time: 7272.719 ms (00:07.273)
Time: 4878.817 ms (00:04.879)

0002 only:
Time: 555.255 ms
Time: 6061.435 ms (00:06.061)
Time: 4440.829 ms (00:04.441)

0001-0002:
Time: 555.144 ms
Time: 5378.270 ms (00:05.378)
Time: 5745.331 ms (00:05.745)

0001-0003:
Time: 554.707 ms
Time: 2189.187 ms (00:02.189)
Time: 2193.870 ms (00:02.194)


comet lake / gcc 11

master:
Time: 383.196 ms
Time: 2323.875 ms (00:02.324)
Time: 1866.594 ms (00:01.867)

0001 only:
Time: 379.193 ms
Time: 2046.540 ms (00:02.047)
Time: 1720.601 ms (00:01.721)

0002 only:
Time: 374.072 ms
Time: 1769.163 ms (00:01.769)
Time: 1485.612 ms (00:01.486)

0001-0002:
Time: 374.454 ms
Time: 1208.933 ms (00:01.209)
Time: 1018.236 ms (00:01.018)

0001-0003:
Time: 376.698 ms
Time: 1278.807 ms (00:01.279)
Time: 1177.005 ms (00:01.177)


3-year old core i5 / gcc 11

master:
Time: 378.955 ms
Time: 2329.763 ms (00:02.330)
Time: 1891.931 ms (00:01.892)

0001 only:
Time: 384.547 ms
Time: 2342.837 ms (00:02.343)
Time: 1906.436 ms (00:01.906)

0002 only:
Time: 379.096 ms
Time: 2051.755 ms (00:02.052)
Time: 1641.376 ms (00:01.641)

0001-0002:
Time: 377.758 ms
Time: 1293.624 ms (00:01.294)
Time: 1096.991 ms (00:01.097)

0001-0003:
Time: 377.767 ms
Time: 2052.826 ms (00:02.053)
Time: 1329.185 ms (00:01.329)


Re: CREATEROLE and role ownership hierarchies

2022-02-02 Thread Stephen Frost
Greetings,

* Mark Dilger (mark.dil...@enterprisedb.com) wrote:
> > On Jan 31, 2022, at 10:50 AM, Stephen Frost  wrote:
> > Supporting that through ADMIN is one option, another would be a
> > 'DROPROLE' attribute, though we'd want a way to curtail that from being
> > able to be used for just any role and that does lead down a path similar
> > to ownership or just generally the concept that some roles have certain
> > rights over certain other roles (whether you create them or not...).
> 
> I've been operating under the assumption that I have a lot more freedom to 
> create new features than to change how existing features behave, for two 
> reasons: backwards compatibility and sql-spec compliance.

I agree that those are concerns that need to be considered, though I'm
more concerned about the SQL compliance and less about backwards
compatibility in this case.  For one thing, I'm afraid that we're not as
compliant as we really should be and that should really drive us to make
changes here anyway, to get closer to what the spec calls for.

> Changing how having ADMIN on a role works seems problematic for both those 
> reasons.  My family got me socks for Christmas, not what I actually wanted, a 
> copy of the SQL-spec.  So I'm somewhat guessing here.  But I believe we'd 
> have problems if we "fixed" the part where a role can revoke ADMIN from 
> others on themselves.  Whatever we have, whether we call it "ownership", it 
> can't be something a role can unilaterally revoke.
> 
> As for a 'DROPROLE' attribute, I don't think that gets us anywhere.  You 
> don't seem to think so, either.  So that leaves us with "ownership", perhaps 
> by another word?  I only chose that word because it's what we use elsewhere, 
> but if we want to call it "managementship" and "manager" or whatever, that's 
> fine.  I'm not to the point of debating the terminology just yet.  I'm still 
> trying to get the behavior nailed down.

Yeah, didn't mean to imply that those were great ideas or that I was
particularly advocating for them, but just to bring up some other ideas
to try and get more thought going into this.

> > I do think there's a lot of value in being able to segregate certain
> > rights- consider that you may want a role that's able to create other
> > roles, perhaps grant them into some set of roles, can lock those roles
> > (prevent them from logging in, maybe do a password reset, something like
> > that), but which *isn't* able to drop those roles (and all their
> > objects) as that's dangerous and mistakes can certainly happen, or be
> > able to become that role because the creating role simply doesn't have
> > any need to be able to do that (or desire to in many cases, as we
> > discussed in the landlord-vs-tenant sub-thread).
> 
> I'm totally on the same page.  Your argument upthread about wanting any 
> malfeasance on the part of a service provider showing up in the audit logs 
> was compelling.  Even for those things the "owner"/"manager" has the rights 
> to do, we might want to make them choose to do it explicitly and not merely 
> do it by accident.

Glad to hear that.

> > Naturally, you'd want *some* role to be able to drop that role (and one
> > that doesn't have full superuser access) but that might be a role that's
> > not able to create new roles or take over accounts.
> 
> I think it's important to go beyond the idea of a role attribute here.  It's 
> not that role "bob" can drop roles.  It's that "bob" can drop *specific* 
> roles, and for that, there has to be some kind of dependency tracked between 
> "bob" and those other roles.  I'm calling that "ownership".  I think that 
> language isn't just arbitrary, but actually helpful (technically, not 
> politically) because REASSIGN OWNED should treat this kind of relationship 
> exactly the same as it treats ownership of schemas, tables, functions, etc.

I agree that role attributes isn't a good approach and that we should be
moving away from it.

I'm less sure that the existence of REASSIGN OWNED for schemas and
tables and such should be the driver for what this capability of one
role being able to drop another role needs to be called.

> > Separation of concerns and powers and all of that is what we want to be
> > going for here, more generically, which is why I was opposed to the
> > blanket "owners have all rights of all roles they own" implementation.
> 
> I'm hoping to bring back, in v9, the idea of ownership/managership.  The real 
> sticking point here is that we (Robert, Andrew, I, and possibly others) want 
> to be able to drop in a non-superuser-creator-role into existing systems that 
> use superuser for role management.  We'd like it to be as transparent a 
> switch as possible.

That description itself really makes me wonder about the sense of what
was proposed.  Specifically "existing systems that use superuser for
role management" doesn't make me picture a system where this manager
role has any need to run SELECT statements against the 

Re: Refactoring SSL tests

2022-02-02 Thread Daniel Gustafsson
> On 2 Feb 2022, at 17:09, Andrew Dunstan  wrote:
> On 2/2/22 08:26, Daniel Gustafsson wrote:

>> Thoughts?  I'm fairly sure there are many crimes against Perl in this patch,
>> I'm happy to take pointers on how to improve that.
> 
> It feels a bit odd to me from a perl POV. I think it needs to more along
> the lines of standard OO patterns. I'll take a stab at that based on
> this, might be a few days.

That would be great, thanks!

--
Daniel Gustafsson   https://vmware.com/





Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 2 Feb 2022, at 19:58, Robert Haas  wrote:
>> And one thing that concretely stinks about it is the progress reporting
>> you get while the tests are running:
>> 
>> t/010_pg_basebackup.pl ... 142/?
>> 
>> That's definitely less informative than 142/330 or whatever.

> There is that.  That's less informative, but only when looking at the tests
> while they are running.  There is no difference once the tests have finished so
> CI runs etc are no less informative.  This however is something to consider.

TBH I don't see that that's worth much.  None of our tests run so long
that you'll be sitting there trying to estimate when it'll be done.
I'd rather have the benefit of not having to maintain the test counts.

regards, tom lane




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-02-02 Thread Jacob Champion
On Mon, 2022-01-31 at 17:29 +0900, Kyotaro Horiguchi wrote:
> However, 0002,
> 
> +/*
> + * In a frontend build, we can't include inet.h, but we still need to have
> + * sensible definitions of these two constants.  Note that pg_inet_net_ntop()
> + * assumes that PGSQL_AF_INET is equal to AF_INET.
> + */
> +#define PGSQL_AF_INET  (AF_INET + 0)
> +#define PGSQL_AF_INET6 (AF_INET + 1)
> +
> 
> Now we have the same definition thrice in frontend code. Couldn't we
> define them in, say, libpq-fe.h or inet-fe.h (nonexistent) then
> include it from the three files?

I started down the inet-fe.h route, and then realized I didn't know
where that should go. Does it need to be included in (or part of)
port.h? And should it be installed as part of the logic in
src/include/Makefile?

> +$node->connect_fails(
> +   "$common_connstr host=192.0.2.2",
> +   "host not matching an IPv4 address (Subject Alternative Name 1)",
> 
> It is not the real IP address of the server.
> 
> https://datatracker.ietf.org/doc/html/rfc6125
> > In some cases, the URI is specified as an IP address rather than a
> > hostname.  In this case, the iPAddress subjectAltName must be
> > present in the certificate and must exactly match the IP in the URI.
> 
> When IP address is embedded in URI, it won't be translated to another
> IP address. Concretely https://192.0.1.5/hoge cannot reach to the host
> 192.0.1.8.  On the other hand, as done in the test, libpq allows that
> when "host=192.0.1.5 hostaddr=192.0.1.8".  I can't understand what we
> are doing in that case.  Don't we need to match the SAN IP address
> with hostaddr instead of host?

I thought that host, not hostaddr, was the part that corresponded to
the URI. So in a hypothetical future where postgresqls:// exists, the
two URIs

postgresqls://192.0.2.2:5432/db
postgresqls://192.0.2.2:5432/db?hostaddr=127.0.0.1

should both be expecting the same certificate. That seems to match the
libpq documentation as well.

(Specifying a host parameter is also allowed... that seems like it
could cause problems for a hypothetical postgresqls:// scheme, but it's
probably not relevant for this thread.)

--Jacob




Re: [PoC] Delegating pg_ident to a third party

2022-02-02 Thread Jacob Champion
On Mon, 2022-01-10 at 15:09 -0500, Stephen Frost wrote:
> Greetings,

Sorry for the delay, the last few weeks have been insane.

> * Jacob Champion (pchamp...@vmware.com) wrote:
> > On Tue, 2022-01-04 at 22:24 -0500, Stephen Frost wrote:
> > > On Tue, Jan 4, 2022 at 18:56 Jacob Champion  wrote:
> > > > Could you talk more about the use cases for which having the "actual
> > > > user" is better? From an auditing perspective I don't see why
> > > > "authenticated as ja...@example.net, logged in as admin" is any worse
> > > > than "logged in as jacob".
> > > 
> > > The above case isn’t what we are talking about, as far as I
> > > understand anyway. You’re suggesting “authenticated as 
> > > ja...@example.net, logged in as sales” where the user in the database
> > > is “sales”.  Consider triggers which only have access to “sales”, or
> > > a tool like pgaudit which only has access to “sales”.
> > 
> > Okay. So an additional getter function in miscadmin.h, and surfacing
> > that function to trigger languages, are needed to make authn_id more
> > generally useful. Any other cases you can think of?
> 
> That would help but now you've got two different things that have to be
> tracked, potentially, because for some people you might not want to use
> their system auth'd-as ID.  I don't see that as a great solution and
> instead as a workaround.

There's nothing to be worked around. If you have a user mapping set up
using the features that exist today, and you want to audit who logged
in at some point in the past, then you need to log both the
authenticated ID and the authorized role. There's no getting around
that. It's not enough to say "just check the configuration" because the
config can change over time.

> > But to elaborate: *forcing* one-user-per-role is wasteful, because if I
> > have a thousand employees, and I want to give all my employees access
> > to a guest role in the database, then I have to administer a thousand
> > roles: maintaining them through dump/restores and pg_upgrades, auditing
> > them to figure out why Bob in Accounting somehow got a different
> > privilege GRANT than the rest of the users, adding new accounts,
> > purging old ones, maintaining the inevitable scripts that will result.
> 
> pg_upgrade just handles it, no?  pg_dumpall -g does too.  Having to deal
> with roles in general is a pain but the number of them isn't necessarily
> an issue.  A guest role which doesn't have any auditing requirements
> might be a decent use-case for what you're talking about here but I
> don't know that we'd implement this for just that case.  Part of this
> discussion was specifically about addressing the other challenges- like
> having automation around the account addition/removal and sync'ing role
> membership too.  As for auditing privileges, that should be done
> regardless and the case you outline isn't somehow different from others
> (the same could be as easily said for how the 'guest' account got access
> to whatever it did).

I think there's a difference between auditing a small fixed number of
roles and auditing many thousands of them that change on a weekly or
daily basis. I'd rather maintain the former, given the choice. It's
harder for things to slip through the cracks with fewer moving pieces.

> > If none of the users need to be "special" in any way, that's all wasted
> > overhead. (If they do actually need to be special, then at least some
> > of that overhead becomes necessary. Otherwise it's waste.) You may be
> > able to mitigate the cost of the waste, or absorb the mitigations into
> > Postgres so that the user can't see the waste, or decide that the waste
> > is not costly enough to care about. It's still waste.
> 
> Except the amount of 'wasted' overhead being claimed here seems to be
> hardly any.  The biggest complaint levied at this seems to really be
> just the issues around the load on the ldap systems from having to deal
> with the frequent sync queries, and that's largely a solvable issue in
> the majority of environments out there today.

As long as we're in agreement that there is waste, I don't think I'm
going to convince you about the cost. It's tangential anyway if you're
not going to remove many-to-many maps.

> > > Not sure exactly what you’re referring to here by “administer role
> > > privileges externally too”..?  Curious to hear what you are imagining
> > > specifically.
> > 
> > Just that it would be nice to centrally provision role GRANTs as well
> > as role membership, that's all. No specifics in mind, and I'm not even
> > sure if LDAP would be a helpful place to put that sort of config.
> 
> GRANT's on objects, you mean?  I agree, that would be interesting to
> consider though it would involve custom entries in the LDAP directory,
> no?  Role membership would be able to be sync'd as part of group
> membership and that was something I was thinking would be handled as
> part of this in a similar manner to what the 3rd party solutions provide
> today using 

Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Daniel Gustafsson
> On 2 Feb 2022, at 19:58, Robert Haas  wrote:
> 
> On Wed, Feb 2, 2022 at 1:50 PM Robert Haas  wrote:
>> On Wed, Feb 2, 2022 at 1:46 PM Tom Lane  wrote:
>>> Well, if someone wants to step up and provide a patch that changes 'em
>>> all at once, that'd be great.  But we've discussed this before and
>>> nothing's happened.
>> 
>> I mean, I don't understand why it's even better. And I would go so far
>> as to say that if nobody can be bothered to do the work to convert
>> everything at once, it probably isn't better.

I personally think it's better, so I went and did the work.  The attached is a
first pass over the tree to see what such a patch would look like.  This should
get a thread of its own and not be hidden here, but as it was discussed I piled
on for now.

> And one thing that concretely stinks about it is the progress reporting
> you get while the tests are running:
> 
> t/010_pg_basebackup.pl ... 142/?
> 
> That's definitely less informative than 142/330 or whatever.

There is that.  That's less informative, but only when looking at the tests
while they are running.  There is no difference once the tests have finished so
CI runs etc are no less informative.  This however is something to consider.

--
Daniel Gustafsson   https://vmware.com/



0001-Replace-Test-More-plans-with-done_testing.patch
Description: Binary data


Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Robert Haas
On Wed, Feb 2, 2022 at 1:50 PM Robert Haas  wrote:
> On Wed, Feb 2, 2022 at 1:46 PM Tom Lane  wrote:
> > Well, if someone wants to step up and provide a patch that changes 'em
> > all at once, that'd be great.  But we've discussed this before and
> > nothing's happened.
>
> I mean, I don't understand why it's even better. And I would go so far
> as to say that if nobody can be bothered to do the work to convert
> everything at once, it probably isn't better.

And one thing that concretely stinks about it is the progress reporting
you get while the tests are running:

t/010_pg_basebackup.pl ... 142/?

That's definitely less informative than 142/330 or whatever.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Robert Haas
On Wed, Feb 2, 2022 at 1:46 PM Tom Lane  wrote:
> Well, if someone wants to step up and provide a patch that changes 'em
> all at once, that'd be great.  But we've discussed this before and
> nothing's happened.

I mean, I don't understand why it's even better. And I would go so far
as to say that if nobody can be bothered to do the work to convert
everything at once, it probably isn't better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Deparsing rewritten query

2022-02-02 Thread Pavel Stehule
On Wed, Feb 2, 2022 at 15:18, Julien Rouhaud  wrote:

> Hi,
>
> On Tue, Feb 01, 2022 at 08:35:00PM +0100, Pavel Stehule wrote:
> >
> > I tested your patch, and it looks so it is working without any problem.
> All
> > tests passed.
> >
> > There is just one question. If printalias = true will be active for all
> > cases or just with some flag?
>
> Sorry, as I just replied to Bharath I sent the wrong patch.  The new patch
> has
> the same modification with printalias = true though, so I can still answer
> that
> question.  The change is active for all cases, however it's a no-op for any
> in-core case, as a query sent by a client should be valid, and thus should
> have
> an alias attached to all subqueries.  It's only different if you call
> get_query_def() on the result of pg_analyze_and_rewrite(), since this code
> doesn't add the subquery aliases as those aren't needed for the execution
> part.
>

ok.

I checked this trivial patch, and I don't see any problem. I ran check-world
again with success. Documentation for this feature is not necessary. But I am
not sure about the regression tests. Without any other code, enforcing
printalias will be invisible. What do you think about transforming your
extension into a new module in src/test/modules? Maybe it could be used for
other checks in the future.

Regards

Pavel


Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 2, 2022 at 12:55 PM Tom Lane  wrote:
>> Actually, it seemed that the consensus in the nearby thread [1]
>> was to start doing exactly that, rather than try to convert them
>> all in one big push.

> Urk. Well, OK then.

> Such an approach seems to me to have essentially nothing to recommend
> it, but I just work here.

Well, if someone wants to step up and provide a patch that changes 'em
all at once, that'd be great.  But we've discussed this before and
nothing's happened.

regards, tom lane




Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Robert Haas
On Wed, Feb 2, 2022 at 12:55 PM Tom Lane  wrote:
> Robert Haas  writes:
> > This seems like a good idea, but I'm not going to slip a change from
> > an exact test count to done_testing() into a commit on some other
> > topic...
>
> Actually, it seemed that the consensus in the nearby thread [1]
> was to start doing exactly that, rather than try to convert them
> all in one big push.

Urk. Well, OK then.

Such an approach seems to me to have essentially nothing to recommend
it, but I just work here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: archive modules

2022-02-02 Thread Robert Haas
On Mon, Jan 31, 2022 at 8:36 PM Nathan Bossart  wrote:
> If basic_archive is to be in contrib, we probably want to avoid restarting
> the archiver every time the module ERRORs.  I debated trying to add a
> generic exception handler that all archive modules could use, but I suspect
> many will have unique cleanup requirements.  Plus, AFAICT restarting the
> archiver isn't terrible, it just causes most of the normal retry logic to
> be skipped.
>
> I also looked into rewriting basic_archive to avoid ERRORs and return false
> for all failures, but this was rather tedious.  Instead, I just introduced
> a custom exception handler for basic_archive's archive callback.  This
> allows us to ERROR as necessary (which helps ensure that failures show up
> in the server logs), and the archiver can treat it like a normal failure
> and avoid restarting.

I think avoiding ERROR is going to be impractical. Catching it in the
contrib module seems OK. Catching it in the generic code is probably
also possible to do in a reasonable way. Not catching the error also
seems like it would be OK, since we expect errors to be infrequent.
I'm not objecting to anything you did here, but I'm uncertain why
adding basic_archive alongside shell_archive changes the calculus here in
any way. It just seems like a separate problem.
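
Just to illustrate, catching it in the module could be as simple as something
like this (a bare-bones sketch against the ArchiveFileCB signature;
copy_one_file() is a made-up helper, not anything from basic_archive):

#include "postgres.h"

/* hypothetical helper that does the actual copying and may ERROR */
extern void copy_one_file(const char *file, const char *path);

static bool
my_archive_file_cb(const char *file, const char *path)
{
	MemoryContext oldcxt = CurrentMemoryContext;
	bool		ok = false;

	PG_TRY();
	{
		copy_one_file(file, path);
		ok = true;
	}
	PG_CATCH();
	{
		/* switch back to our context, log the ERROR, clear it, report failure */
		MemoryContextSwitchTo(oldcxt);
		EmitErrorReport();
		FlushErrorState();
	}
	PG_END_TRY();

	return ok;
}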

+   /* Perform logging of memory contexts of this process */
+   if (LogMemoryContextPending)
+   ProcessLogMemoryContextInterrupt();

Any special reason for moving this up higher? Not really an issue, just curious.

+   gettext_noop("This is unused if
\"archive_library\" does not indicate archiving via shell is
enabled.")

This contains a double negative. We could describe it more positively:
This is used only if \"archive_library\" specifies archiving via
shell. But that's actually a little confusing, because the way you've
set it up, archiving via shell can be specified by writing either
archive_library = '' or archive_library = 'shell'. I don't see any
particularly good reason to allow that to be spelled in two ways.
Let's pick one. Then here we can write either:

(a) This is used only if archive_library = 'shell'.
-or-
(b) This is used only if archive_library is not set.

IMHO, either of those would be clearer than what you have right now,
and it would definitely be shorter.

+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Called to shutdown an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);

I think that this is the wrong amount of comments. One theory is that
the reader should refer to the documentation to understand how these
callbacks work. In that case, having a separate comment for each one
that doesn't really say anything is just taking up space. It would be
better to have one comment for all three lines referring the reader to
the documentation. Alternatively, one could take the position that the
explanation should go into these comments, and then perhaps we don't
even really need documentation. A one-line comment that doesn't really
say anything non-obvious seems like the worst amount.

+ 
+  
+   There are considerable robustness and security risks in using
archive modules
+   because, being written in the C language, they
have access
+   to many server resources.  Administrators wishing to enable archive modules
+   should exercise extreme caution.  Only carefully audited modules should be
+   loaded.
+  
+ 

Maybe I'm just old and jaded, but do we really need this? I know we
have the same thing for background workers, but if anything that seems
like an argument against duplicating it elsewhere. Lots of copies of
essentially identical warnings aren't the way to great documentation;
if we copy this here, we'll probably copy it to more places. And also,
it seems a bit like warning people that they shouldn't give their
complete financial records to total strangers about whom they have
little or no information. Do tell.

+
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+
+If true is returned, the server will proceed with
+archiving the file by calling the archive_file_cb
+callback.  If false is returned, archiving will not
+proceed.  In the latter case, the server will periodically call this
+function, and archiving will proceed if it eventually returns
+true.

It's not obvious from reading this why anyone would want to provide
this callback, or have it do anything other than 'return true'. But
there actually is a behavior difference if you provide this and have
it return false, vs. just having archiving itself fail. At least, the
message "archive_mode enabled, yet archiving is not configured" will
be emitted. So that's something we could mention here.
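
For example, a module that archives into a directory controlled by a GUC might
implement it as little more than this (sketch only, names invented):

#include "postgres.h"

/* hypothetical GUC set by the module, e.g. my_module.archive_directory */
static char *my_archive_directory = NULL;

static bool
my_check_configured_cb(void)
{
	/*
	 * Returning false makes the archiver emit "archive_mode enabled, yet
	 * archiving is not configured" and retry later, instead of calling the
	 * archive_file callback.
	 */
	return my_archive_directory != NULL && my_archive_directory[0] != '\0';
}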

I would suggest s/if it 

Re: A qsort template

2022-02-02 Thread John Naylor
I wrote:

> > 0010 - Thresholds on my TODO list.
>
> I did some basic tests on the insertion sort thresholds, and it looks
> like we could safely and profitably increase the current value from 7
> to 20 or so, in line with other more recent implementations. I've
> attached an addendum on top of 0012 and the full test results on an
> Intel Coffee Lake machine with gcc 11.1. I found that the object test
> setup in 0012 had some kind of bug that was comparing the pointer of
> the object array. Rather than fix that, I decided to use Datums, but
> with the two extremes in comparator: simple branching with machine
> instructions vs. a SQL-callable function. The papers I've read
> indicate the results for Datum sizes would not be much different for
> small structs. The largest existing sort element is SortTuple, but
> that's only 24 bytes and has a bulky comparator as well.
>
> The first thing to note is that I rejected outright any testing of a
> "middle value" where the pivot is simply the middle of the array. Even
> the Bently and McIlroy paper which is the reference for our
> implementation says "The range that consists of the single integer 7
> could be eliminated, but has been left adjustable because on some
> machines larger ranges are a few percent better".
>
> I tested thresholds up to 64, which is where I guessed results to get
> worse (most implementations are smaller than that). Here are the best
> thresholds at a quick glance:
>
> - elementary comparator:
>
> random: 16 or greater
> decreasing, rotate: get noticeably better all the way up to 64
> organ: little difference, but seems to get better all the way up to 64
> 0/1: seems to get worse above 20
>
> - SQL-callable comparator:
>
> random: between 12 and 20, but slight differences until 32
> decreasing, rotate: get noticeably better all the way up to 64
> organ: seems best at 12, but slight differences until 32
> 0/1: slight differences
>
> Based on these tests and this machine, it seems 20 is a good default
> value. I'll repeat this test on one older Intel and one non-Intel
> platform with older compilers.

The above was an Intel Comet Lake / gcc 11, and I've run the same test
on a Haswell-era Xeon / gcc 8 and a Power8 machine / gcc 4.8. The
results on those machines are pretty close to the above (full results
attached). The noticeable exception is the Power8 on random input with
a slow comparator -- those measurements there are more random than
others so we can't draw conclusions from them, but the deviations are
small in any case. I'm still thinking 20 or so is about right.

I've put a lot out here recently, so I'll take a break now and come
back in a few weeks.

(no running tally here because the conclusions haven't changed since
last message)
-- 
John Naylor
EDB: http://www.enterprisedb.com
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=0, time=0.130188
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=1, time=0.124503
NOTICE:  [direct] size=8MB, order=random, threshold=7, test=2, time=0.124557
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=0, time=0.119103
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=1, time=0.119035
NOTICE:  [direct] size=8MB, order=random, threshold=12, test=2, time=0.086948
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=0, time=0.082710
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=1, time=0.082771
NOTICE:  [direct] size=8MB, order=random, threshold=16, test=2, time=0.083004
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=0, time=0.080768
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=1, time=0.080764
NOTICE:  [direct] size=8MB, order=random, threshold=20, test=2, time=0.080635
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=0, time=0.080678
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=1, time=0.080555
NOTICE:  [direct] size=8MB, order=random, threshold=24, test=2, time=0.080604
NOTICE:  [direct] size=8MB, order=random, threshold=28, test=0, time=0.079002
NOTICE:  [direct] size=8MB, order=random, threshold=28, test=1, time=0.078901
NOTICE:  [direct] size=8MB, order=random, threshold=28, test=2, time=0.082200
NOTICE:  [direct] size=8MB, order=random, threshold=32, test=0, time=0.079317
NOTICE:  [direct] size=8MB, order=random, threshold=32, test=1, time=0.079164
NOTICE:  [direct] size=8MB, order=random, threshold=32, test=2, time=0.079308
NOTICE:  [direct] size=8MB, order=random, threshold=64, test=0, time=0.078604
NOTICE:  [direct] size=8MB, order=random, threshold=64, test=1, time=0.078718
NOTICE:  [direct] size=8MB, order=random, threshold=64, test=2, time=0.078689
NOTICE:  [direct] size=8MB, order=decreasing, threshold=7, test=0, time=0.025097
NOTICE:  [direct] size=8MB, order=decreasing, threshold=7, test=1, time=0.025078
NOTICE:  [direct] size=8MB, order=decreasing, threshold=7, test=2, time=0.025095
NOTICE:  [direct] size=8MB, order=decreasing, threshold=12, test=0, 

Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Nathan Bossart
On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart  
> wrote:
>> However, I'm not sure about the change to ReadDirExtended().  That might be
>> okay for CheckPointSnapBuild(), which is just trying to remove old files,
>> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
>> are flushed to disk for the checkpoint.  If we stop reading the directory
>> after an error and let the checkpoint continue, isn't it possible that some
>> mappings files won't be persisted to disk?
> 
> Unless I mis-read your above statement, with LOG level in
> ReadDirExtended, I don't think we stop reading the files in
> CheckPointLogicalRewriteHeap. Am I missing something here?

ReadDirExtended() has the following comment:

 * If elevel < ERROR, returns NULL after any error.  With the normal coding
 * pattern, this will result in falling out of the loop immediately as
 * though the directory contained no (more) entries.

If there is a problem reading the directory, we will LOG and then exit the
loop.  If we didn't scan through all the entries in the directory, there is
a chance that we didn't fsync() all the files that need it.
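
In other words, the usual coding pattern looks roughly like this (a simplified
sketch, not the exact CheckPointLogicalRewriteHeap() code):

#include "postgres.h"

#include "storage/fd.h"

static void
fsync_logical_mapping_files(void)
{
	DIR		   *mappings_dir = AllocateDir("pg_logical/mappings");
	struct dirent *mapping_de;

	/*
	 * With elevel LOG, an error inside ReadDirExtended() is logged and NULL
	 * is returned, so the loop simply ends early: any remaining mapping
	 * files are never visited (and never fsync'd), yet the checkpoint keeps
	 * going as if they had been.
	 */
	while ((mapping_de = ReadDirExtended(mappings_dir, "pg_logical/mappings",
										 LOG)) != NULL)
	{
		/* open and fsync this mapping file ... */
	}
	FreeDir(mappings_dir);
}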

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




ci/cfbot: run windows tests under a timeout

2022-02-02 Thread Andres Freund
Hi,

On windows cfbot currently regularly hangs / times out. Presumably this is due
to the issues discussed in 
https://postgr.es/m/CA%2BhUKG%2BG5DUNJfdE-qusq5pcj6omYTuWmmFuxCvs%3Dq1jNjkKKA%40mail.gmail.com
which led to reverting [1] some networking related changes everywhere but
master.

But it's hard to tell - because the entire test task times out, we don't get
to see debugging information.

In earlier versions of the CI script I had tests run under a timeout command,
that killed the entire test run. I found that to be helpful when working on
AIO. But I removed that, in an attempt to simplify things, before
submitting. Turns out it was needed complexity.

The attached patch adds a timeout (using git's timeout binary) to all vcregress
invocations. I've not re-added it to the other OSs, but I'm on the fence about
doing so.

The diff is a bit larger than one might think necessary: Yaml doesn't like % -
from the windows command variable syntax - at the start of an unquoted
string...


Separately, we should probably make Cluster.pm::psql() etc always use a
"fallback" timeout (rather than just when the test writer thought it's
necessary). Or perhaps Utils.pm's INIT should set up a timer after which an
individual test is terminated?


Greetings,

Andres Freund

[1]
commit 75674c7ec1b1607e7013b5cebcb22d9c8b4b2cb6
Author: Tom Lane 
Date:   2022-01-25 12:17:40 -0500

Revert "graceful shutdown" changes for Windows, in back branches only.

This reverts commits 6051857fc and ed52c3707, but only in the back
branches.  Further testing has shown that while those changes do fix
some things, they also break others; in particular, it looks like
walreceivers fail to detect walsender-initiated connection close
reliably if the walsender shuts down this way.  We'll keep trying to
improve matters in HEAD, but it now seems unwise to push these changes
into stable releases.

Discussion: 
https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zwfuyp...@mail.gmail.com
>From e9795000729f649a861523c65376b978cea4a94c Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 1 Feb 2022 15:25:54 -0800
Subject: [PATCH] wip: time out tests on windows

ci-os-only: windows
---
 .cirrus.yml | 81 +
 1 file changed, 45 insertions(+), 36 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 677bdf0e65e..8e0ae69beea 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -366,6 +366,14 @@ task:
 #   build
 MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
 
+# If tests hang forever, cirrus eventually times out. In that case log
+# output etc is not uploaded, making the problem hard to debug. Of course
+# tests internally should have shorter timeouts, but that's proven to not
+# be sufficient. 15min currently is fast enough to finish individual test
+# "suites".
+T_C: "\"C:/Program Files/Git/usr/bin/timeout.exe\" -v -k60s 15m"
+
+
   only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
 
   windows_container:
@@ -391,42 +399,43 @@ task:
 # Installation on windows currently only completely works from src/tools/msvc
 - cd src/tools/msvc && perl install.pl %CIRRUS_WORKING_DIR%/tmp_install
 
-  test_regress_parallel_script:
-- perl src/tools/msvc/vcregress.pl check parallel
-  startcreate_script:
-# paths to binaries need backslashes
-- tmp_install\bin\pg_ctl.exe initdb -D tmp_check/db -l tmp_check/initdb.log --options=--no-sync
-- echo include '%TEMP_CONFIG%' >> tmp_check/db/postgresql.conf
-- tmp_install\bin\pg_ctl.exe start -D tmp_check/db -l tmp_check/postmaster.log
-  test_pl_script:
-- perl src/tools/msvc/vcregress.pl plcheck
-  test_isolation_script:
-- perl src/tools/msvc/vcregress.pl isolationcheck
-  test_modules_script:
-- perl src/tools/msvc/vcregress.pl modulescheck
-  test_contrib_script:
-- perl src/tools/msvc/vcregress.pl contribcheck
-  stop_script:
-- tmp_install\bin\pg_ctl.exe stop -D tmp_check/db -l tmp_check/postmaster.log
-  test_ssl_script:
-- set with_ssl=openssl
-- perl src/tools/msvc/vcregress.pl taptest ./src/test/ssl/
-  test_subscription_script:
-- perl src/tools/msvc/vcregress.pl taptest ./src/test/subscription/
-  test_authentication_script:
-- perl src/tools/msvc/vcregress.pl taptest ./src/test/authentication/
-  test_recovery_script:
-- perl src/tools/msvc/vcregress.pl recoverycheck
-  test_bin_script:
-- perl src/tools/msvc/vcregress.pl bincheck
-  test_pg_upgrade_script:
-- perl src/tools/msvc/vcregress.pl upgradecheck
-  test_ecpg_script:
-# tries to build additional stuff
-- vcvarsall x64
-# References ecpg_regression.proj in the current dir
-- cd src/tools/msvc
-- perl vcregress.pl ecpgcheck
+  test_regress_parallel_script: |
+%T_C% perl src/tools/msvc/vcregress.pl 

Re: 2022-01 Commitfest

2022-02-02 Thread Jaime Casanova
On Wed, Feb 02, 2022 at 01:00:18PM -0500, Tom Lane wrote:
> 
> Anyway, thanks to Julien for doing this mostly-thankless task
> this time!
> 

Agreed, great work!

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: 2022-01 Commitfest

2022-02-02 Thread Tom Lane
Jaime Casanova  writes:
> On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote:
>> If we close all patches that had a review just because they weren't perfect 
>> in
>> their initial submission, we're just going to force everyone to re-register
>> their patch for every single commit fest.  I don't see that doing anything
>> apart from making sure that everyone stops contributing.

> I had the same problem last time, "Returned with feedback" didn't feel
> fine in some cases.

Agreed, we're not here to cause make-work for submitters.  RWF is
appropriate if the patch has been in Waiting On Author for awhile
and doesn't seem to be going anywhere, but otherwise we should
just punt it to the next CF.

Anyway, thanks to Julien for doing this mostly-thankless task
this time!

regards, tom lane




Re: 2022-01 Commitfest

2022-02-02 Thread Julien Rouhaud
On Wed, Feb 02, 2022 at 12:45:40PM -0500, Jaime Casanova wrote:
> On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote:
> > 
> > My understanding of "Returned with Feedback" is that the patch implements
> > something wanted, but as proposed won't be accepted without a major 
> > redesign or
> > something like that.  Not patches that are going through normal "review /
> > addressing reviews" cycles.  And definitely not bug fixes either.
> > 
> > If we close all patches that had a review just because they weren't perfect 
> > in
> > their initial submission, we're just going to force everyone to re-register
> > their patch for every single commit fest.  I don't see that doing anything
> > apart from making sure that everyone stops contributing.
> > 
> 
> I had the same problem last time, "Returned with feedback" didn't feel
> fine in some cases.
> 
> After reading this I started to wish there was some kind of guide about
> this, and of course the wiki has that guide (outdated yes but something
> to start with).
> 
> https://wiki.postgresql.org/wiki/CommitFest_Checklist#Sudden_Death_Overtime
>
> This needs some love, still mentions rrreviewers for example

Yes, I looked at it but to be honest it doesn't make any sense.

It feels like this is punishing patches that get reviewed at the end of the
commitfest or that previously got an incorrect review, and somehow tries to
salvage patches from authors that don't review anything.

> but if we
> updated and put here a clear definition of the states maybe it could
> help to do CF management.

I'm all for it, but looking at the current commit fest, focusing on unresponsive
authors (e.g. closing, one way or another, patches that have been waiting on
author for more than X days) should already help quite a lot.




Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Tom Lane
Robert Haas  writes:
> This seems like a good idea, but I'm not going to slip a change from
> an exact test count to done_testing() into a commit on some other
> topic...

Actually, it seemed that the consensus in the nearby thread [1]
was to start doing exactly that, rather than try to convert them
all in one big push.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/9D4FFB61-392B-4A2C-B7E4-911797B4AC14%40yesql.se




Re: 2022-01 Commitfest

2022-02-02 Thread Jaime Casanova
On Thu, Feb 03, 2022 at 01:28:53AM +0800, Julien Rouhaud wrote:
> 
> My understanding of "Returned with Feedback" is that the patch implements
> something wanted, but as proposed won't be accepted without a major redesign or
> something like that.  Not patches that are going through normal "review /
> addressing reviews" cycles.  And definitely not bug fixes either.
> 
> If we close all patches that had a review just because they weren't perfect in
> their initial submission, we're just going to force everyone to re-register
> their patch for every single commit fest.  I don't see that doing anything
> apart from making sure that everyone stops contributing.
> 

I had the same problem last time, "Returned with feedback" didn't feel
fine in some cases.

After reading this I started to wish there was some kind of guide about
this, and of course the wiki has that guide (outdated, yes, but something
to start with).

https://wiki.postgresql.org/wiki/CommitFest_Checklist#Sudden_Death_Overtime

This needs some love (it still mentions rrreviewers, for example), but if we
updated it and put a clear definition of the states there, maybe it could
help with CF management.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: 2022-01 Commitfest

2022-02-02 Thread Julien Rouhaud
Hi,

On Wed, Feb 02, 2022 at 12:09:06PM -0500, Greg Stark wrote:
> I gave two reviews and received one review but the patches have been
> "Moved to next CF".

For now I only moved to the next commit fest the patches that were in "Needs
Review" or "Ready for Committer".  I'm assuming that you failed to update the
cf entry accordingly after your reviews, so yeah the patches were moved.

I unfortunately don't have a lot of time right now and the commit fest still
needs to be closed, so I prefer to use my time triaging the patches that were
marked as Waiting on Author rather than going through a couple hundred threads
yet another time.

> Should I update them to "Returned with Feedback"
> given they all did get feedback? I was under the impression "Moved to
> next CF" was only for patches that didn't get feedback in a CF and
> were still waiting for feedback.

My understanding of "Returned with Feedback" is that the patch implements
something wanted, but as proposed won't be accepted without a major redesign or
something like that.  Not patches that are going through normal "review /
addressing reviews" cycles.  And definitely not bug fixes either.

If we close all patches that had a review just because they weren't perfect in
their initial submission, we're just going to force everyone to re-register
their patch for every single commit fest.  I don't see that doing anything
apart from making sure that everyone stops contributing.




Re: [PATCH] Add reloption for views to enable RLS

2022-02-02 Thread Christoph Heiss

Hi Laurenz,

thank you again for the review!

On 1/20/22 15:20, Laurenz Albe wrote:

[..]
I gave the new patch a spin, and got a surprising result:

   [..]

   INSERT INTO v VALUES (1);
   INSERT 0 1

Huh?  "duff" has no permission to insert into "tab"!
That really should not happen; thanks for finding it and helping me
investigate how to fix it!


This is now solved by checking the security_invoker property on the view 
in rewriteTargetView().


I've also added a test case for this in v4 to catch it in the future.



[..]

About the documentation:

--- a/doc/src/sgml/ref/create_view.sgml
+++ b/doc/src/sgml/ref/create_view.sgml
+   
+security_invoker (boolean)
+
+ 
+  If this option is set, it will cause all access to the underlying
+  tables to be checked as referenced by the invoking user, rather than
+  the view owner.  This will only take effect when row level security is
+  enabled on the underlying tables (using 
+  ALTER TABLE ... ENABLE ROW LEVEL SECURITY).
+ 

Why should this *only* take effect if (not "when") RLS is enabled?
The above test shows that there is an effect even without RLS.

+ This option can be changed on existing views using ALTER VIEW. See
+   for more details on row level security.
+ 

I don't think that it is necessary to mention that this can be changed with
ALTER VIEW - all storage parameters can be.  I guess you copied that from
the "check_option" documentation, but I would say it need not be mentioned
there either.

Exactly, I tried to fit it in with the existing parameters.
I moved the link to ALTER VIEW to the end of the paragraph, as it
applies to all options anyway.




+   
+If the security_invoker option is set on the view,
+access to tables is determined by permissions of the invoking user, rather
+than the view owner.  This can be used to provide stricter permission
+checking to the underlying tables than by default.
 

Since you are talking about use cases here, RLS might deserve a mention.

Expanded upon a little bit in v4.



--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
+   {
+   {
+   "security_invoker",
+   "View subquery in invoked within the current security context.",
+   RELOPT_KIND_VIEW,
+   AccessExclusiveLock
+   },
+   false
+   },

That doesn't seem to be proper English.

Yes, that happened when rewriting this for v1 -> v2.
Fixed.

Thanks,
Christoph Heiss

From 01437a45bfd069080ffe0eb45288bfddd3de6009 Mon Sep 17 00:00:00 2001
From: Christoph Heiss 
Date: Wed, 2 Feb 2022 16:44:38 +0100
Subject: [PATCH v4 1/3] Add new boolean reloption security_invoker to views

When this reloption is set to true, all references to the underlying tables will
be checked against the invoking user rather than the view owner, as is currently
implemented.

Signed-off-by: Christoph Heiss 
---
 src/backend/access/common/reloptions.c| 11 
 src/backend/nodes/copyfuncs.c |  1 +
 src/backend/nodes/equalfuncs.c|  1 +
 src/backend/nodes/outfuncs.c  |  1 +
 src/backend/nodes/readfuncs.c |  1 +
 src/backend/optimizer/plan/subselect.c|  1 +
 src/backend/optimizer/prep/prepjointree.c |  1 +
 src/backend/rewrite/rewriteHandler.c  | 17 --
 src/backend/utils/cache/relcache.c| 63 +--
 src/include/nodes/parsenodes.h|  1 +
 src/include/utils/rel.h   | 11 
 11 files changed, 77 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index d592655258..c7c62a0076 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -140,6 +140,15 @@ static relopt_bool boolRelOpts[] =
 		},
 		false
 	},
+	{
+		{
+			"security_invoker",
+			"Check permissions against underlying tables as the calling user, not as view owner",
+			RELOPT_KIND_VIEW,
+			AccessExclusiveLock
+		},
+		false
+	},
 	{
 		{
 			"vacuum_truncate",
@@ -1996,6 +2005,8 @@ view_reloptions(Datum reloptions, bool validate)
 	static const relopt_parse_elt tab[] = {
 		{"security_barrier", RELOPT_TYPE_BOOL,
 		offsetof(ViewOptions, security_barrier)},
+		{"security_invoker", RELOPT_TYPE_BOOL,
+		offsetof(ViewOptions, security_invoker)},
 		{"check_option", RELOPT_TYPE_ENUM,
 		offsetof(ViewOptions, check_option)}
 	};
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 90b5da51c9..b171992ba8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2465,6 +2465,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_NODE_FIELD(tablesample);
 	COPY_NODE_FIELD(subquery);
 	COPY_SCALAR_FIELD(security_barrier);
+	COPY_SCALAR_FIELD(security_invoker);
 	COPY_SCALAR_FIELD(jointype);
 	COPY_SCALAR_FIELD(joinmergedcols);
 	

Re: 2022-01 Commitfest

2022-02-02 Thread Greg Stark
I gave two reviews and received one review but the patches have been
"Moved to next CF". Should I update them to "Returned with Feedback"
given they all did get feedback? I was under the impression "Moved to
next CF" was only for patches that didn't get feedback in a CF and
were still waiting for feedback.

On Wed, 2 Feb 2022 at 11:16, Julien Rouhaud  wrote:
>
> Hi,
>
> It's now at least Feb. 1st anywhere on earth, so the commit fest is now over.
>
> Since last week 5 entries were committed, 1 withdrawn, 3 returned with
> feedback, 2 already moved to the next commitfest and 1 rejected.
>
> This gives a total of 211 patches still alive, most of them ready for the next
> and final pg15 commitfest.
>
> Status summary:
> - Needs review: 147.
> - Waiting on Author: 38.
> - Ready for Committer: 26.
> - Committed: 59.
> - Moved to next CF: 3.
> - Returned with Feedback: 5.
> - Rejected: 5.
> - Withdrawn: 9.
> - Total: 292.
>
> I will take care of closing the current commit fest and moving the entries to
> the next one shortly.
>
>


-- 
greg




Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-02-02 Thread Greg Stark
Additionally, I've looked at the tests, and I'm not sure this
arrangement is going to work. I don't have the time to run
CLOBBER_CACHE and CLOBBER_CACHE_ALWAYS tests, but I know they run
*really* slowly. So the test can't just do a CHECKPOINT and then trust
that the next few transactions will still be in the WAL to decode
later. There could have been many more timed checkpoints in between.

I think the way to do it is to create either a backup label or a
replication slot. Then you can inspect the lsn of the label or slot
and decode all transactions between that point and the current point.

I also think you should try to have a wider set of WAL records in that
range, to test decoding records with and without full page writes, with
DDL records, etc.

I do like that the tests don't actually embed the decoded record info
in the test, though. But they could make a minimal effort to check that
the records they think they're testing are actually being tested: insert
into a temporary table and then run a few queries with WHERE clauses
to test for a heap insert and a btree insert, test that the right relid
is present, and test that a full-page write is present (if full page
writes are enabled, I guess). You don't need an exhaustive set of
checks because you're not testing that WAL logging works properly,
just that the tests aren't accidentally passing because they're not
finding any interesting records.
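
For illustration, a rough sketch of the kind of test I have in mind (the
decoding function name and its output columns are assumptions about what the
extension exposes, not something checked against the patch):

    -- reserve a starting point that checkpoints won't recycle
    SELECT lsn AS start_lsn
      FROM pg_create_physical_replication_slot('walinspect_slot', true) \gset

    -- generate a few records we know should be in the range
    CREATE TABLE walinspect_t (a int PRIMARY KEY);
    INSERT INTO walinspect_t VALUES (1);

    -- decode everything between the reserved LSN and the current position,
    -- and verify that at least one heap insert was actually seen
    SELECT count(*) > 0 AS found_heap_insert
      FROM pg_get_wal_records_info(:'start_lsn', pg_current_wal_lsn())
     WHERE resource_manager = 'Heap';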




Re: CREATEROLE and role ownership hierarchies

2022-02-02 Thread Robert Haas
On Tue, Feb 1, 2022 at 6:38 PM Andrew Dunstan  wrote:
> > In existing postgresql releases, having CREATEROLE means you can give away 
> > most attributes, including ones you yourself don't have (createdb, login).  
> > So we already have the concept of NOFOO WITH ADMIN OPTION, we just don't 
> > call it that.  In pre-v8 patches on this thread, I got rid of that; you 
> > *must* have the attribute to give it away.  But maybe that was too 
> > restrictive, and we need a way to specify, attribute by attribute, how this 
> > works.  Is this just a problem of surprising grammar?  Is it surprising 
> > behavior?  If the latter, I'm inclined to give up this WIP as having been a 
> > bad move.  If the former, I'll try to propose some less objectionable 
> > grammar.
> >
>
> Certainly the grammar would need to be better. But I'm not sure any
> grammar that expresses what is supported here is not going to be
> confusing, because the underlying scheme seems complex. But I'm
> persuadable. I'd like to hear from others on the subject.

Well, we've been moving more and more in the direction of using
predefined roles to manage access. The things that are basically
Boolean flags on the role are mostly legacy stuff. So my tentative
opinion (and I'm susceptible to being persuaded that I'm wrong here)
is that putting a lot of work into fleshing out that infrastructure
does not necessarily make a ton of sense. Are we ever going to add
even one more flag that works that way?

Also, any account that can create roles is a pretty high-privilege
account. Maybe it's superuser, or maybe not, but it's certainly
powerful. In my opinion, that makes fine distinctions here less
important. Is there really an argument for saying "well, we're going
to let you bypass RLS, but we're not going to let you give that
privilege to others"? It seems contrived to think of restricting a
role that is powerful enough to create whole new accounts in such a
way. I'm not saying that someone couldn't have a use case for it, but
I think it'd be a pretty thin use case.

In short, I think it makes tons of sense to say that CREATEROLE lets
you give to others those role flags which you have, but not the ones
you lack. However, to me, it feels like overengineering to distinguish
between things you have and can give away, things you have and can't
give away, and things you don't even have.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: autovacuum prioritization

2022-02-02 Thread Robert Haas
On Wed, Jan 26, 2022 at 6:46 PM Greg Stark  wrote:
> One thing I've been wanting to do something about is I think
> autovacuum needs to be a little cleverer about when *not* to vacuum a
> table because it won't do any good.

I agree.

> I was thinking of just putting a check in before kicking off a vacuum
> and if the globalxmin is a significant fraction of the distance to the
> relfrozenxid then instead log a warning. Basically it means "we can't
> keep the bloat below the threshold due to the idle transactions et al,
> not because there's insufficient i/o bandwidth".

Unfortunately, XID distances don't tell us much, because the tuples
need not be uniformly distributed across the XID space. In fact, it
seems highly likely that they will be very non-uniformly distributed,
with a few transactions having created a lot of dead tuples and most
having created none. Therefore, it's pretty plausible that a vacuum
that permits relfrozenxid++ could solve every problem we have. If we
knew something about the distribution of dead XIDs in the table, then
we could make an intelligent judgement about whether vacuuming would
be useful. But otherwise I feel like we're just guessing, so instead
of really fixing the problem we'll just be making it happen in a set
of cases that's even harder to grasp.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Alvaro Herrera
On 2022-Feb-02, Andrew Dunstan wrote:

> On 2/2/22 11:01, Dagfinn Ilmari Mannsåker wrote:

> > Rather than waiting for Someone™ to find a suitably-shaped tuit to do a
> > whole sweep converting everything to done_testing(), I think we should
> > make a habit of converting individual scripts whenever a change breaks
> > the count.

+1.

> Or when anyone edits a script, even if the number of tests doesn't get
> broken.

Sure, if the committer is up to doing it, but let's not make that a hard
requirement for any commit that touches a test script.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ni aún el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior" (Goethe)




Re: 2022-01 Commitfest

2022-02-02 Thread Julien Rouhaud
Hi,

It's now at least Feb. 1st anywhere on earth, so the commit fest is now over.

Since last week 5 entries were committed, 1 withdrawn, 3 returned with
feedback, 2 already moved to the next commitfest and 1 rejected.

This gives a total of 211 patches still alive, most of them ready for the next
and final pg15 commitfest.

Status summary:
- Needs review: 147.
- Waiting on Author: 38.
- Ready for Committer: 26.
- Committed: 59.
- Moved to next CF: 3.
- Returned with Feedback: 5.
- Rejected: 5.
- Withdrawn: 9.
- Total: 292.

I will take care of closing the current commit fest and moving the entries to
the next one shortly.




Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Andrew Dunstan


On 2/2/22 11:01, Dagfinn Ilmari Mannsåker wrote:
> Tom Lane  writes:
>
>> Daniel Gustafsson  writes:
>>
>>> While I prefer to not plan at all and instead run done_testing(),
>>> doing that consistently is for another patch, keeping this with the
>>> remainder of the suites.
>> +1 to that too, counting the tests is a pretty useless exercise.
> Rather than waiting for Someone™ to find a suitably-shaped tuit to do a
> whole sweep converting everything to done_testing(), I think we should
> make a habit of converting individual scripts whenever a change breaks
> the count.


Or when anyone edits a script, even if the number of tests doesn't get
broken.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Refactoring SSL tests

2022-02-02 Thread Andrew Dunstan


On 2/2/22 08:26, Daniel Gustafsson wrote:
> Thoughts?  I'm fairly sure there are many crimes against Perl in this patch,
> I'm happy to take pointers on how to improve that.


It feels a bit odd to me from a Perl POV. I think it needs to be more along
the lines of standard OO patterns. I'll take a stab at that based on
this; it might be a few days.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Daniel Gustafsson  writes:
>
>> While I prefer to not plan at all and instead run done_testing(),
>> doing that consistently is for another patch, keeping this with the
>> remainder of the suites.
>
> +1 to that too, counting the tests is a pretty useless exercise.

Rather than waiting for Someone™ to find a suitably-shaped tuit to do a
whole sweep converting everything to done_testing(), I think we should
make a habit of converting individual scripts whenever a change breaks
the count.
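
For anyone not familiar with the idiom, the conversion is mechanical; a
minimal sketch (the test bodies are obviously made up):

    use Test::More;            # instead of: use Test::More tests => 42;

    ok(1, 'first check');
    ok(1 + 1 == 2, 'second check');
    # ...more tests can be added freely, with no plan count to keep in sync...

    done_testing();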

- ilmari




Re: pg_receivewal - couple of improvements

2022-02-02 Thread Julien Rouhaud
On Wed, Feb 02, 2022 at 09:14:03PM +0530, Bharath Rupireddy wrote:
> 
> FYI that thread is closed; the change allowing pg_receivewal to read from
> its replication slot's restart LSN was committed there (f61e1dd [1]).
> 
> I know that providing the start pos as an option came up there [2],
> but I wanted to start the discussion fresh as that thread got closed.

Ah sorry I misunderstood your email.

I'm not sure it's a good idea.  If you have missing WALs in your target
directory but have an alternative backup location, you will have to restore the
WAL from that alternative location anyway, so I'm not sure how accepting a
different start position is going to help in that scenario.  On the other hand,
allowing a position at the command line can also lead to accepting a bogus
position, which could possibly make things worse.

> 2) Currently, RECONNECT_SLEEP_TIME is 5sec - but I may want to have
> more reconnect time as I know that the primary can go down at any time
> for whatever reasons in production environments which can take some
> time till I bring up primary and I don't want to waste compute cycles
> in the node on which pg_receivewal is running

I don't think that attempting a connection is really costly.  Also, increasing
this retry time also increases the amount of time you're not streaming WAL,
and thus the amount of data you can lose, so I'm not sure that's actually a
good idea.  But you might also want to make it more aggressive, so no
objection to making it configurable.




Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Robert Haas
On Wed, Feb 2, 2022 at 10:42 AM Dagfinn Ilmari Mannsåker
 wrote:
> Here's a follow-on patch that adds a test for non-superuser server-side
> basebackup, which crashes without your patch and passes with it.

This seems like a good idea, but I'm not going to slip a change from
an exact test count to done_testing() into a commit on some other
topic...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: refactoring basebackup.c

2022-02-02 Thread Robert Haas
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas  wrote:
> 0001 adds "server" and "blackhole" as backup targets. It now has some
> tests. This might be more or less ready to ship, unless somebody else
> sees a problem, or I find one.

I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.

0001 adds an extensibility framework for backup targets. The idea is
that an extension loaded via shared_preload_libraries can call
BaseBackupAddTarget() to define a new base backup target, which the
user can then access via pg_basebackup --target TARGET_NAME, or if
they want to pass a detail string, pg_basebackup --target
TARGET_NAME:DETAIL. There might be slightly better ways of hooking
this into the system. I'm not unhappy with this approach, but there
might be a better idea out there.

0002 adds an example contrib module called basebackup_to_shell. The
system administrator can set basebackup_to_shell.command='SOMETHING'.
A backup directed to the 'shell' target will cause the server to
execute the configured command once per generated archive, and once
for the backup_manifest, if any. When executing the command, %f gets
replaced with the archive filename (e.g. base.tar) and %d gets
replaced with the detail. The actual contents of the file are passed
to the command's standard input, and it can then do whatever it likes
with that data. Clearly, this is not state of the art; for instance,
if what you really want is to upload the backup files someplace via
HTTP, using this to run 'curl' is probably not so good of an idea as
using an extension module that links with libcurl. That would likely
lead to better error checking, better performance, nicer
configuration, and just generally fewer things that can go wrong. On
the other hand, writing an integration in C is kind of tricky, and
this thing is quite easy to use -- and it does work.
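
To make the intended usage concrete, here is a hypothetical setup based on the
description above (the GUC and target names are from the patch as described;
the command and paths are made up):

    # postgresql.conf
    shared_preload_libraries = 'basebackup_to_shell'
    basebackup_to_shell.command = 'cat > /var/backups/%d.%f'

    # then, from a role with replication privileges:
    pg_basebackup --target shell:nightly -X none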

There are a couple of things to be concerned about with 0002 from a
security perspective. First, in a backend environment, we have a
function to spawn a subprocess via popen(), namely OpenPipeStream(),
but there is no function to spawn a subprocess with execve() and end
up with a socket connected to its standard input. And that means that
whatever command the administrator configures is being interpreted by
the shell, which is a potential problem given that we're interpolating
the target detail string supplied by the user, who must have at least
replication privileges but need not be the superuser. I chose to
handle this by allowing the target detail to contain only alphanumeric
characters. Refinement is likely possible, but whether the effort is
worthwhile seems questionable. Second, what if the superuser wants to
allow the use of this module to only some of the users who have
replication privileges? That seems a bit unlikely but it's possible,
so I added a GUC basebackup_to_shell.required_role. If set, the
functionality is only usable by members of the named role. If unset,
anyone with replication privilege can use it. I guess someone could
criticize this as defaulting to the least secure setting, but
considering that you have to have replication privileges to use this
at all, I don't find that argument much to get excited about.

I have to say that I'm incredibly happy with how easy these patches
were to write. I think this is going to make adding new base backup
targets as accessible as we can realistically hope to make it. There
is some boilerplate code, as an examination of the patches will
reveal, but it's not a lot, and at least IMHO it's pretty
straightforward. Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Allow-extensions-to-add-new-backup-targets.patch
Description: Binary data


0002-Add-basebackup_to_shell-contrib-module.patch
Description: Binary data


Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Alvaro Herrera
On 2022-Feb-02, Daniel Gustafsson wrote:

> Making this a subtest in order not to have to change the callers turns the
> patch into the attached.  For this we must group the new test with one already
> existing test; if we group more into it (which would make more sense) then we
> need to change callers, as that reduces the test count across the tree.

Well, it wasn't my intention that this patch would not have to change
the callers.  What I was thinking about is making connect_ok() have a
subtest for *all* the tests it contains (and changing the callers to
match), so that *in the future* we can add more tests there without
having to change the callers *again*.

> Or did I misunderstand you?

I think you did.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)




Re: pg_receivewal - couple of improvements

2022-02-02 Thread Bharath Rupireddy
On Wed, Feb 2, 2022 at 9:05 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Wed, Feb 02, 2022 at 08:53:13PM +0530, Bharath Rupireddy wrote:
> >
> > Here are some improvements we can make to pg_receivewal that emerged from
> > working with it in production environments:
> >
> > 1) As a user, I, sometimes, want my pg_receivewal to start streaming
> > from the LSN that I provide as an input i.e. startpos instead of it
> > calculating the stream start position 1) from its target directory or
> > 2) from its replication slot's restart_lsn or 3) after sending
> > IDENTIFY_SYSTEM on to the primary.
>
> This is already being discussed (at least part of) as of
> https://www.postgresql.org/message-id/flat/18708360.4lzOvYHigE%40aivenronan.

FYI that thread is closed; the change allowing pg_receivewal to read from
its replication slot's restart LSN was committed there (f61e1dd [1]).

I know that providing the start pos as an option came up there [2],
but I wanted to start the discussion fresh as that thread got closed.

[1]
commit f61e1dd2cee6b1a1da75c2bb0ca3bc72f18748c1
Author: Michael Paquier 
Date:   Tue Oct 26 09:30:37 2021 +0900

Allow pg_receivewal to stream from a slot's restart LSN

Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan

[2] 
https://www.postgresql.org/message-id/20210902.144554.1303720268994714850.horikyota.ntt%40gmail.com

Regards,
Bharath Rupireddy.




Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Daniel Gustafsson


> On 2 Feb 2022, at 16:01, Alvaro Herrera  wrote:
> 
> On 2022-Feb-02, Daniel Gustafsson wrote:
> 
>> As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by
>> inspecting STDERR in connect_ok and require it to be empty.  This is not 
>> really
>> NSS specific, and could help find issues in other libraries as well so I
>> propose to apply it regardless of the fate of the NSS patchset.
>> 
>> (The change in the SCRAM tests stems from this now making all testruns have 
>> the
>> same number of tests.  While I prefer to not plan at all and instead run
>> done_testing(), doing that consistently is for another patch, keeping this 
>> with
>> the remainder of the suites.)
> 
> Since commit 405f32fc4960 we can rely on subtests for this, so perhaps
> we should group all the tests in connect_ok() (including your new one)
> into a subtest; then each connect_ok() call counts as a single test, and
> we can add more tests in it without having to change the callers.

Disclaimer: longer term I would prefer to remove test plan counting, per the
above and Tom's comment downthread.  This is just to make this patch more
palatable on the way there.

Making this a subtest in order not to have to change the callers turns the
patch into the attached.  For this we must group the new test with one already
existing test; if we group more into it (which would make more sense) then we
need to change callers, as that reduces the test count across the tree.

This makes the patch smaller, but not more readable IMO (given that we can't
make it fully use subtests), so my preference is still the v1 patch.

Or did I misunderstand you?

--
Daniel Gustafsson   https://vmware.com/



v2-0001-Ensure-that-STDERR-is-empty-in-connect_ok-tests.patch
Description: Binary data


Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

> On Fri, Jan 28, 2022 at 12:35 PM Dagfinn Ilmari Mannsåker
>  wrote:
>> On Fri, 28 Jan 2022, at 17:33, Robert Haas wrote:
>> > LGTM. Committed.
>>
>> Thanks!
>
> It appears that neither of us actually tested that this works.

Oops!

> For me, it works when I test as a superuser, but if I test as a
> non-superuser with or without pg_write_server_files, it crashes,
> because we end up trying to do syscache lookups without a transaction
> environment. I *think* that the attached is a sufficient fix; at
> least, it passes simple testing.

Here's a follow-on patch that adds a test for non-superuser server-side
basebackup, which crashes without your patch and passes with it.

- ilmari

>From e88af403706338d068a7156d2a9c02e27196ce12 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Wed, 2 Feb 2022 15:40:55 +
Subject: [PATCH] Test server-side basebackup as non-superuser

---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index a827be5e59..2283a8c42d 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -10,7 +10,7 @@
 use Fcntl qw(:seek);
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
-use Test::More tests => 143;
+use Test::More;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -521,6 +521,15 @@
 ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created');
 rmtree("$tempdir/backuponserver");
 
+$node->command_ok(
+	[qw(createuser --replication --role=pg_write_server_files backupuser)],
+	'create backup user');
+$node->command_ok(
+	[ @pg_basebackup_defs, '-U', 'backupuser', '--target', "server:$real_tempdir/backuponserver", '-X', 'none' ],
+	'backup target server');
+ok(-f "$tempdir/backuponserver/base.tar", 'backup tar was created as non-superuser');
+rmtree("$tempdir/backuponserver");
+
 $node->command_fails(
 	[
 		@pg_basebackup_defs, '-D',
@@ -768,3 +777,5 @@
 	rmtree("$tempdir/backup_gzip2");
 	rmtree("$tempdir/backup_gzip3");
 }
+
+done_testing();
-- 
2.30.2



Re: warn if GUC set to an invalid shared library

2022-02-02 Thread David G. Johnston
On Tue, Feb 1, 2022 at 11:06 PM Maciek Sakrejda 
wrote:

> I tried running ALTER SYSTEM and got the warnings as expected:
>
> postgres=# alter system set shared_preload_libraries =
> no_such_library,not_this_one_either;
> WARNING:  could not access file "$libdir/plugins/no_such_library"
> WARNING:  could not access file "$libdir/plugins/not_this_one_either"
> ALTER SYSTEM
>
> I think this is great, but it would be really helpful to also indicate
> that at this point the server will fail to come back up after a restart. In
> my mind, that's a big part of the reason for having a warning here. Having
> made this mistake a couple of times, I would be able to read between the
> lines, as would many other users, but if you're not familiar with Postgres
> this might still be pretty opaque.


+1

I would at least consider having the UX go something like:

postgres=# ALTER SYSTEM SET shared_preload_libraries = no_such_library;
ERROR: .
HINT: to bypass the error please add FORCE before SET
postgres=# ALTER SYSTEM FORCE SET shared_preload_libraries =
no_such_library;
NOTICE: Error suppressed while setting shared_preload_libraries.

That is, have the user express their desire to leave the system in a
precarious state explicitly before actually doing so.

Upon startup, if the system can already track each separate location where
shared_preload_libraries is set, printing out those locations and current
values would be useful context.  Seeing ALTER SYSTEM in that listing would
be helpful.

David J.


Re: pg_receivewal - couple of improvements

2022-02-02 Thread Julien Rouhaud
Hi,

On Wed, Feb 02, 2022 at 08:53:13PM +0530, Bharath Rupireddy wrote:
> 
> Here are some improvements we can make to pg_receivewal that emerged from
> working with it in production environments:
> 
> 1) As a user, I, sometimes, want my pg_receivewal to start streaming
> from the LSN that I provide as an input i.e. startpos instead of it
> calculating the stream start position 1) from its target directory or
> 2) from its replication slot's restart_lsn or 3) after sending
> IDENTIFY_SYSTEM on to the primary.

This is already being discussed (at least part of) as of
https://www.postgresql.org/message-id/flat/18708360.4lzOvYHigE%40aivenronan.




pg_receivewal - couple of improvements

2022-02-02 Thread Bharath Rupireddy
Hi,

Here are some improvements we can make to pg_receivewal that emerged from
working with it in production environments:

1) As a user, I sometimes want my pg_receivewal to start streaming
from the LSN that I provide as an input, i.e. a startpos, instead of it
calculating the stream start position a) from its target directory, or
b) from its replication slot's restart_lsn, or c) after sending
IDENTIFY_SYSTEM to the primary.
This will be particularly useful when the primary is down for some
time (for whatever reason) and the WAL files that pg_receivewal needs
may have been removed by the primary (I know this situation is a bit
messy, but it is quite possible in production environments). In that case
pg_receivewal will calculate the start position from its target directory
and request it from the primary, which the primary may no longer have. I
have to intervene, manually delete/move the WAL files in the pg_receivewal
target directory, and restart pg_receivewal so that it can continue.
Instead, if pg_receivewal can accept a startpos as an option, it can
just go ahead and stream from the primary.

2) Currently, RECONNECT_SLEEP_TIME is 5 seconds, but I may want a longer
reconnect interval: in production environments the primary can go down at
any time for whatever reason, and it can take some time until I bring it
back up. I don't want to waste compute cycles on the node where
pg_receivewal is running, so I should be able to just set it to a higher
value, say 5 minutes or so, after which pg_receivewal can try to perform
StreamLog() and attempt a connection to the primary.

Thoughts?

Regards,
Bharath Rupireddy.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-02 Thread David G. Johnston
On Wed, Feb 2, 2022 at 5:08 AM Amit Kapila  wrote:

> On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston
>  wrote:
>
> ...
> >
> > I already explained that the concept of err_cnt is not useful.  The fact
> that you include it here makes me think you are still thinking that this
> all somehow is meant to keep track of history.  It is not.  The workers are
> state machines and "error" is one of the states - with relevant attributes
> to display to the user, and system, while in that state.  The state machine
> reporting does not care about historical states nor does it report on
> them.  There is some uncertainty if we continue with the automatic
> re-launch;
> >
>
> I think automatic retry will help with some transient errors, say
> network glitches, that can be resolved on retry, and will keep the
> behavior transparent. This is also consistent with what we do in
> standby mode, where if there is an error on the primary due to which
> the standby is not able to fetch some data, it will just retry. We can't
> fix any error that occurred on the server side, so the way out is to
> retry, which is true for both standbys and subscribers.
>

Good points.  In short, there are two subsets of problems to deal with
here.  We should address them separately, though the pg_subscription_worker
table should provide relevant information for both cases.  If we are in a
retry situation, relevant information like next_scheduled_retry
(estimated) should be provided (if there is some kind of delay involved).
In a situation like "unique constraint violation" the
"next_scheduled_retry" would be null; or make the field a text field and
print "Manual Intervention Required".  Likewise, the XID/LSN would be null
in a retry situation, since we haven't received a wholly intact transaction
from the publisher (we may know of such an ID, but if the final COMMIT
message is never even seen before the feed dies, we should not be exposing
that incomplete information to the user).

A standby is not expected to encounter any user data constraint problems, so
even a system requiring manual intervention for such will work for standbys,
because they will never hit that code path.  And you cannot simply skip
applying the failed transaction and move on to the next one - that data also
never came over.

David J.


Re: Server-side base backup: why superuser, not pg_write_server_files?

2022-02-02 Thread Robert Haas
On Fri, Jan 28, 2022 at 12:35 PM Dagfinn Ilmari Mannsåker
 wrote:
> On Fri, 28 Jan 2022, at 17:33, Robert Haas wrote:
> > LGTM. Committed.
>
> Thanks!

It appears that neither of us actually tested that this works. For me,
it works when I test as a superuser, but if I test as a non-superuser
with or without pg_write_server_files, it crashes, because we end up
trying to do syscache lookups without a transaction environment. I
*think* that the attached is a sufficient fix; at least, it passes
simple testing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


0001-Fix-server-crash-bug-in-server-backup-target.patch
Description: Binary data


Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Tom Lane
Daniel Gustafsson  writes:
> As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by
> inspecting STDERR in connect_ok and require it to be empty.  This is not 
> really
> NSS specific, and could help find issues in other libraries as well so I
> propose to apply it regardless of the fate of the NSS patchset.

+1

> (The change in the SCRAM tests stems from this now making all testruns have 
> the
> same number of tests.  While I prefer to not plan at all and instead run
> done_testing(), doing that consistently is for another patch, keeping this 
> with
> the remainder of the suites.)

+1 to that too, counting the tests is a pretty useless exercise.

regards, tom lane




Re: Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Alvaro Herrera
On 2022-Feb-02, Daniel Gustafsson wrote:

> As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by
> inspecting STDERR in connect_ok and require it to be empty.  This is not 
> really
> NSS specific, and could help find issues in other libraries as well so I
> propose to apply it regardless of the fate of the NSS patchset.
>
> (The change in the SCRAM tests stems from this now making all testruns have 
> the
> same number of tests.  While I prefer to not plan at all and instead run
> done_testing(), doing that consistently is for another patch, keeping this 
> with
> the remainder of the suites.)

Since commit 405f32fc4960 we can rely on subtests for this, so perhaps
we should group all the tests in connect_ok() (including your new one)
into a subtest; then each connect_ok() call counts as a single test, and
we can add more tests in it without having to change the callers.
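
A rough sketch of what I mean (the method and variable names here are only
illustrative, not an actual patch):

    subtest $test_name => sub {
        my ($ret, $stdout, $stderr) =
          $self->psql('postgres', 'SELECT 1', connstr => $connstr,
                      on_error_stop => 0);
        is($ret, 0, 'connection succeeded');
        is($stderr, '', 'no stderr output');
        # future checks go here without changing the callers' test counts
    };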

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-02-02 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 9:39 PM Fujii Masao  wrote:
> I found that CreateRestartPoint() already reported the redo lsn as follows 
> after emitting the restartpoint log message. To avoid duplicated logging of 
> the same information, we should update this code?
>
> ereport((log_checkpoints ? LOG : DEBUG2),
> (errmsg("recovery restart point at %X/%X",
> LSN_FORMAT_ARGS(lastCheckPoint.redo)),
>  xtime ? errdetail("Last completed transaction was at 
> log time %s.",
>
> timestamptz_to_str(xtime)) : 0));
>
> This code reports lastCheckPoint.redo as redo lsn. OTOH, with the patch, 
> LogCheckpointEnd() reports ControlFile->checkPointCopy.redo. They may be 
> different, for example, when the current DB state is not 
> DB_IN_ARCHIVE_RECOVERY. In this case, which lsn should we report as redo lsn?

Do we ever reach CreateRestartPoint() when ControlFile->state !=
DB_IN_ARCHIVE_RECOVERY? Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); in CreateRestartPoint() doesn't fail any
regression tests.

Here's what can happen:
1) lastCheckPoint.redo is 100 and ControlFile->checkPointCopy.redo is 105:
we log "skipping restartpoint, already performed at %X/%X" and
LogCheckpointEnd isn't reached.
2) lastCheckPoint.redo is 105, ControlFile->checkPointCopy.redo is 100 and
ControlFile->state == DB_IN_ARCHIVE_RECOVERY: the control file gets updated
and LogCheckpointEnd prints the right redo lsn.
3) lastCheckPoint.redo is 105, ControlFile->checkPointCopy.redo is 100 and
ControlFile->state != DB_IN_ARCHIVE_RECOVERY: the control file doesn't get
updated and LogCheckpointEnd just prints the control file's redo lsn.  This
case looks rare, given that Assert(ControlFile->state ==
DB_IN_ARCHIVE_RECOVERY); doesn't fail any tests.

I think we should just let LogCheckpointEnd print the redo lsn from
the control file. We can just remove the above errmsg("recovery
restart point at %X/%X" message altogether or just print it only in
the rare scenario, something like below:

if (ControlFile->state != DB_IN_ARCHIVE_RECOVERY)
{
    ereport((log_checkpoints ? LOG : DEBUG2),
            (errmsg("performed recovery restart point at %X/%X while the database state is %s",
                    LSN_FORMAT_ARGS(lastCheckPoint.redo),
                    getDBState(ControlFile->state))));
}

And the last commit/abort record's timestamp will always get logged
even before we reach here in the main redo loop (errmsg("last
completed transaction was at log time %s")).

Or, another way is to just pass the redo lsn to LogCheckpointEnd(), passing
lastCheckPoint.redo in the ControlFile->state != DB_IN_ARCHIVE_RECOVERY
case, or whenever the control file isn't updated but a restart point
happened.

Thoughts?

Regards,
Bharath Rupireddy.




Ensure that STDERR is empty during connect_ok

2022-02-02 Thread Daniel Gustafsson
As part of the NSS patchset, quite a few bugs (and NSS quirks) were found by
inspecting STDERR in connect_ok and requiring it to be empty.  This is not really
NSS specific, and could help find issues in other libraries as well, so I
propose to apply it regardless of the fate of the NSS patchset.

(The change in the SCRAM tests stems from this now making all testruns have the
same number of tests.  While I prefer to not plan at all and instead run
done_testing(), doing that consistently is for another patch, keeping this with
the remainder of the suites.)

--
Daniel Gustafsson   https://vmware.com/



0001-Ensure-that-STDERR-is-empty-in-connect_ok-tests.patch
Description: Binary data


Re: Deparsing rewritten query

2022-02-02 Thread Julien Rouhaud
Hi,

On Tue, Feb 01, 2022 at 08:35:00PM +0100, Pavel Stehule wrote:
> 
> I tested your patch, and it looks like it is working without any problem.
> All tests passed.
> 
> There is just one question: will printalias = true be active for all
> cases or just with some flag?

Sorry, as I just replied to Bharath I sent the wrong patch.  The new patch has
the same modification with printalias = true though, so I can still answer that
question.  The change is active for all cases, however it's a no-op for any
in-core case, as a query sent by a client should be valid, and thus should have
an alias attached to all subqueries.  It's only different if you call
get_query_def() on the result of pg_analyze_and_rewrite(), since this code
doesn't add the subquery aliases as those aren't needed for the execution part.




Re: Deparsing rewritten query

2022-02-02 Thread Julien Rouhaud
Hi,

On Wed, Feb 02, 2022 at 07:09:35PM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 1, 2022 at 9:08 AM Julien Rouhaud  wrote:
> >
> > Hi,
> 
> Thanks. +1 for this work. Some comments on v3:
> 
> 1) How about pg_get_rewritten_query()?

Argh, I just realized that I sent the patch from the wrong branch.  Per a
previous complaint from Tom, I'm not proposing that function anymore (I will
publish an extension for that if the patch gets committed) but only exposing
get_query_def().

I'm attaching the correct patch this time, sorry about that.
>From 0485ea1b507e8f2f1df782a97f11184276d7fca7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Tue, 29 Jun 2021 00:07:04 +0800
Subject: [PATCH v3] Expose get_query_def()

This function can be useful for external modules, for instance if they want to
display a statement after the rewrite stage.

In order to emit valid SQL, make sure that any subquery RTE comes with an
alias.

Author: Julien Rouhaud
Reviewed-by: Gilles Darold
Reviewed-by: Pavel Stehule
Discussion: https://postgr.es/m/20210627041138.zklczwmu3ms4ufnk@nol
---
 src/backend/utils/adt/ruleutils.c | 15 +++
 src/include/utils/ruleutils.h |  3 +++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c 
b/src/backend/utils/adt/ruleutils.c
index 039b1d2b95..3db2948984 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -389,9 +389,6 @@ static void make_ruledef(StringInfo buf, HeapTuple ruletup, 
TupleDesc rulettc,
 int prettyFlags);
 static void make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc rulettc,
 int prettyFlags, int 
wrapColumn);
-static void get_query_def(Query *query, StringInfo buf, List *parentnamespace,
- TupleDesc resultDesc,
- int prettyFlags, int 
wrapColumn, int startIndent);
 static void get_values_def(List *values_lists, deparse_context *context);
 static void get_with_clause(Query *query, deparse_context *context);
 static void get_select_query_def(Query *query, deparse_context *context,
@@ -5344,7 +5341,7 @@ make_viewdef(StringInfo buf, HeapTuple ruletup, TupleDesc 
rulettc,
  * the view represented by a SELECT query.
  * --
  */
-static void
+void
 get_query_def(Query *query, StringInfo buf, List *parentnamespace,
  TupleDesc resultDesc,
  int prettyFlags, int wrapColumn, int startIndent)
@@ -10989,6 +10986,16 @@ get_from_clause_item(Node *jtnode, Query *query, 
deparse_context *context)
if (strcmp(refname, rte->ctename) != 0)
printalias = true;
}
+   else if (rte->rtekind == RTE_SUBQUERY)
+   {
+   /*
+* For a subquery RTE, always print alias.  A 
user-specified query
+* should only be valid if an alias is provided, but 
our view
+* expansion doesn't generate aliases, so a rewritten 
query might
+* not be valid SQL.
+*/
+   printalias = true;
+   }
if (printalias)
appendStringInfo(buf, " %s", quote_identifier(refname));
 
diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
index e8090c96d7..f512bb6867 100644
--- a/src/include/utils/ruleutils.h
+++ b/src/include/utils/ruleutils.h
@@ -39,6 +39,9 @@ extern List *select_rtable_names_for_explain(List *rtable,

 Bitmapset *rels_used);
 extern char *generate_collation_name(Oid collid);
 extern char *generate_opclass_name(Oid opclass);
+void   get_query_def(Query *query, StringInfo buf, List 
*parentnamespace,
+ TupleDesc resultDesc,
+ int prettyFlags, int 
wrapColumn, int startIndent);
 extern char *get_range_partbound_string(List *bound_datums);
 
 extern char *pg_get_statisticsobjdef_string(Oid statextid);
-- 
2.35.0



Re: Schema variables - new implementation for Postgres 15

2022-02-02 Thread Julien Rouhaud
Hi,

On Sun, Jan 30, 2022 at 08:09:18PM +0100, Pavel Stehule wrote:
> 
> rebase after 02b8048ba5dc36238f3e7c3c58c5946220298d71

Here are a few comments, mostly about pg_variable.c and sessionvariable.c.  I
stopped before reading the whole patch, as I have some concerns about the sinval
mechanism, which could change the rest of the patch a bit.  I'm also attaching a
patch (with a .txt extension to avoid problems with the cfbot) with some
proposed comment updates.

In sessionvariable.c, why VariableEOXAction and VariableEOXActionCodes?  Can't
the parser emit the char value directly, like e.g. relpersistence?

extraneous returns for 2 functions:

+void
+get_session_variable_type_typmod_collid(Oid varid, Oid *typid, int32 *typmod,
+   Oid *collid)
+{
[...]
+   return;
+}

+void
+initVariable(Variable *var, Oid varid, bool fast_only)
+{
[...]
+   return;
+}

VariableCreate():

Maybe add a bunch of AssertArg() for all the mandatory parameters?

Also, the check for variable already existing should be right after the
AssertArg(), and using SearchSysCacheExistsX().

Maybe also adding an Assert(OidIsValid(xxxoid)) just after the
CatalogTupleInsert(), similarly to some other creation functions?


event-triggers.sgml needs updating for the firing matrix, as session variables
are compatible with event triggers.

+typedef enum SVariableXActAction
+{
+   ON_COMMIT_DROP, /* used for ON COMMIT DROP */
+   ON_COMMIT_RESET,/* used for DROP VARIABLE */
+   RESET,  /* used for ON TRANSACTION END RESET */
+   RECHECK /* recheck if session variable is living */
+} SVariableXActAction;

The names seem a bit generic, maybe add a prefix like SVAR_xxx?

ON_COMMIT_RESET is also confusing as it looks like an SQL clause.  Maybe
PERFORM_DROP or something?
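
Something along these lines is what I have in mind (the exact names are of
course just a suggestion):

    typedef enum SVariableXActAction
    {
        SVAR_ON_COMMIT_DROP,     /* used for ON COMMIT DROP */
        SVAR_PERFORM_DROP,       /* used for DROP VARIABLE */
        SVAR_RESET,              /* used for ON TRANSACTION END RESET */
        SVAR_RECHECK             /* recheck if session variable is living */
    } SVariableXActAction;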

+static List *xact_drop_actions = NIL;
+static List *xact_reset_actions = NIL;

Maybe add a comment saying both are lists of SVariableXActAction?

+typedef SVariableData * SVariable;

looks like a missing bump to typedefs.list.

+char *
+get_session_variable_name(Oid varid)
+{
+   HeapTuple   tup;
+   Form_pg_variable varform;
+   char   *varname;
+
+   tup = SearchSysCache1(VARIABLEOID, ObjectIdGetDatum(varid));
+
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "cache lookup failed for session variable %u", varid);
+
+   varform = (Form_pg_variable) GETSTRUCT(tup);
+
+   varname = NameStr(varform->varname);
+
+   ReleaseSysCache(tup);
+
+   return varname;
+}

This kind of function should return a palloc'd copy of the name.
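
For example (sketch only), the tail of the function could become:

    /*
     * Copy the name into the current memory context before releasing the
     * syscache entry, so the caller doesn't point into freed memory.
     */
    varname = pstrdup(NameStr(varform->varname));

    ReleaseSysCache(tup);

    return varname;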

+void
+ResetSessionVariables(void)
[...]
+   list_free_deep(xact_drop_actions);
+   xact_drop_actions = NIL;
+
+   list_free_deep(xact_reset_actions);
+   xact_drop_actions = NIL;
+}

The 2nd chunk should be xact_reset_actions = NIL

+static void register_session_variable_xact_action(Oid varid, 
SVariableXActAction action);
+static void delete_session_variable_xact_action(Oid varid, SVariableXActAction 
action);

The naming is a bit confusing, maybe unregister_session_variable_xact_action()
for consistency?

+void
+register_session_variable_xact_action(Oid varid,
+ SVariableXActAction action)

the function is missing the static keyword.

In AtPreEOXact_SessionVariable_on_xact_actions(), those 2 instructions are
executed twice (once in the middle and once at the end):

list_free_deep(xact_drop_actions);
xact_drop_actions = NIL;



+* If this entry was created during the current transaction,
+* creating_subid is the ID of the creating subxact; if created in a prior
+* transaction, creating_subid is zero.

I don't see any place in the code where creating_subid can be zero? It looks
like it's only there for future transactional implementation, but for now this
attribute seems unnecessary?


/* at transaction end recheck sinvalidated variables */
RegisterXactCallback(sync_sessionvars_xact_callback, NULL);

I don't think it's ok to use xact callback for in-core code.  The function
explicitly says:

> * These functions are intended for use by dynamically loaded modules.
> * For built-in modules we generally just hardwire the appropriate calls
> * (mainly because it's easier to control the order that way, where needed).

Also, this function and AtPreEOXact_SessionVariable_on_xact_actions() are
skipping all or part of the processing if there is no active transaction.  Is
that really ok?

I'm particularly sceptical about AtPreEOXact_SessionVariable_on_xact_actions
and the RECHECK actions, as the xact_reset_actions list is reset whether the
recheck was done or not, so it seems to me that it could be leaking some
entries in the hash table.  If the database has a lot of objects, it seems
possible (while unlikely) that a subsequent CREATE VARIABLE could get the same
oid, leading to incorrect results?

If that's somehow ok, wouldn't it be better to rearrange the code to call those
functions less often, and only when 

Re: Deparsing rewritten query

2022-02-02 Thread Bharath Rupireddy
On Tue, Feb 1, 2022 at 9:08 AM Julien Rouhaud  wrote:
>
> Hi,

Thanks. +1 for this work. Some comments on v3:

1) How about pg_get_rewritten_query()?
2) Docs missing?
3) How about allowing only users who are explicitly granted permission to
use this function, like pg_log_backend_memory_contexts,
pg_log_query_plan (in discussion), pg_log_backtrace (in discussion)?
4) initStringInfo() in the for loop will palloc every time and leak
memory. You probably need to do resetStringInfo() in the loop instead
(see the sketch after this list).
+ foreach(lc, querytree_list)
+ {
+ query = (Query *) lfirst(lc);
+ initStringInfo();
5) I would even suggest using a temporary memory context for this function
alone, because it will ensure we don't leak any memory that the parser,
analyzer, or rewriter might allocate.
6) Why can't query be for loop variable?
+ Query *query;
7) Why can't the function check for an empty query string and emit an error
immediately (empty string isn't allowed, or some other better error
message), rather than depending on pg_parse_query()?
+ parsetree_list = pg_parse_query(sql);
+
+ /* only support one statement at a time */
+ if (list_length(parsetree_list) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("a single statement should be provided")));
8) Show rewritten query given raw query string
+{ oid => '9246', descr => 'show a query as rewritten',
9) Doesn't the input need a ; at the end of the query? Not sure if the
parser assumes it is provided.
+SELECT pg_get_query_def('SELECT * FROM shoe') as def;
10) For pg_get_viewdef it makes sense to have the test case in
rules.sql, but shouldn't this function be in misc_functions.sql?
11) Missing bump cat version note in the commit message.
12) I'm just thinking that adding an extra option to EXPLAIN, which would
then print the rewritten query in the explain output, would be more useful
than having a separate function to do this?
13) Somebody might also be interested in just getting the complete
planned query, i.e. the output of pg_plan_query()? Or the other way around:
given the query plan as input to a function, can we get the query back,
something like postgres_fdw/deparse.c does?
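
Re point 4 above, a sketch of the StringInfo reuse I mean (variable names are
illustrative):

    StringInfoData buf;

    initStringInfo(&buf);          /* allocate the buffer once */
    foreach(lc, querytree_list)
    {
        Query  *query = (Query *) lfirst(lc);

        resetStringInfo(&buf);     /* reuse it instead of palloc'ing again */
        /* ... deparse query into buf ... */
    }
    pfree(buf.data);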

Regards,
Bharath Rupireddy.




Re: Make relfile tombstone files conditional on WAL level

2022-02-02 Thread Dilip Kumar
On Wed, Feb 2, 2022 at 6:57 PM Robert Haas  wrote:
>
> On Mon, Jan 31, 2022 at 9:37 AM Dilip Kumar  wrote:
> > I agree that we are using 8 bytes unsigned int multiple places in code
> > as uint64.  But I don't see it as an exposed data type and not used as
> > part of any exposed function.  But we will have to use the relfilenode
> > in the exposed c function e.g.
> > binary_upgrade_set_next_heap_relfilenode().
>
> Oh, I thought we were talking about the C data type uint8 i.e. an
> 8-bit unsigned integer. Which in retrospect was a dumb thought because
> you said you wanted to store the relfilenode AND the fork number
> there, which only make sense if you were talking about SQL data types
> rather than C data types. It is confusing that we have an SQL data
> type called int8 and a C data type called int8 and they're not the
> same.
>
> But if you're talking about SQL data types, why? pg_class only stores
> the relfilenode and not the fork number currently, and I don't see why
> that would change. I think that the data type for the relfilenode
> column would change to a 64-bit signed integer (i.e. bigint or int8)
> that only ever uses the low-order 56 bits, and then when you need to
> store a relfilenode and a fork number in the same 8-byte quantity
> you'd do that using either a struct with bit fields or by something
> like combined = ((uint64) signed_representation_of_relfilenode) |
> (((int) forknumber) << 56);

Yeah, you're right.  I think whenever we are using the combined value we can
use the uint64 C type, and in pg_class we can keep it as int64 because that
only represents the relfilenode part.  I think I was just confused and tried
to use the same data type everywhere, whether or not it is combined with the
fork number.  Thanks for your input, I will change this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Make relfile tombstone files conditional on WAL level

2022-02-02 Thread Robert Haas
On Mon, Jan 31, 2022 at 9:37 AM Dilip Kumar  wrote:
> I agree that we are using 8 bytes unsigned int multiple places in code
> as uint64.  But I don't see it as an exposed data type and not used as
> part of any exposed function.  But we will have to use the relfilenode
> in the exposed c function e.g.
> binary_upgrade_set_next_heap_relfilenode().

Oh, I thought we were talking about the C data type uint8 i.e. an
8-bit unsigned integer. Which in retrospect was a dumb thought because
you said you wanted to store the relfilenode AND the fork number
there, which only make sense if you were talking about SQL data types
rather than C data types. It is confusing that we have an SQL data
type called int8 and a C data type called int8 and they're not the
same.

But if you're talking about SQL data types, why? pg_class only stores
the relfilenode and not the fork number currently, and I don't see why
that would change. I think that the data type for the relfilenode
column would change to a 64-bit signed integer (i.e. bigint or int8)
that only ever uses the low-order 56 bits, and then when you need to
store a relfilenode and a fork number in the same 8-byte quantity
you'd do that using either a struct with bit fields or by something
like combined = ((uint64) signed_representation_of_relfilenode) |
(((int) forknumber) << 56);
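
Spelled out, that could look something like the following sketch (the 56/8
split and the helper names are purely illustrative, and note that the fork
number needs to be widened to 64 bits before the shift):

    /* Illustrative only: pack a 56-bit relfilenode and a fork number into uint64. */
    #define RELFILENODE_BITS    56
    #define RELFILENODE_MASK    ((UINT64CONST(1) << RELFILENODE_BITS) - 1)

    static inline uint64
    pack_relfilenode_fork(uint64 relfilenode, ForkNumber forknum)
    {
        Assert(relfilenode <= RELFILENODE_MASK);
        return relfilenode | ((uint64) forknum << RELFILENODE_BITS);
    }

    static inline uint64
    unpack_relfilenode(uint64 combined)
    {
        return combined & RELFILENODE_MASK;
    }

    static inline ForkNumber
    unpack_forknum(uint64 combined)
    {
        return (ForkNumber) (combined >> RELFILENODE_BITS);
    }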

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Refactoring SSL tests

2022-02-02 Thread Daniel Gustafsson
As part of the NSS patchset (and Secure Transport before that), I had to
refactor the SSL tests to handle different SSL libraries.  The current tests
and test module are quite tied to how OpenSSL works with respect to setting up
the server; the attached refactors this and abstracts the OpenSSL specifics,
more in line with how the rest of the codebase is set up.

The tight coupling is of course not a problem right now, but I think this patch
has enough benefits on its own to make it a candidate for going in regardless of
the fate of the NSS patchset.  This is essentially the 0002 patch from that
patchset with additional cleanup and documentation:

* switch_server_cert takes a set of named parameters rather than a fixed set
with defaults depending on each other, which made adding ssl_passphrase to it
cumbersome.  It also improves readability IMO.

* SSLServer is renamed SSL::Server, which in turn uses SSL::Backend::X where X
is the backend pointed to by with_ssl.  Each backend will implement its own
module responsible for setting up keys/certs and resolving sslkey values to
their full paths.  The idea is that the namespace will also allow for an
SSL::Client in the future when we implement running client tests against
different servers etc.

* The modules are POD documented.

* While not related to the refactor per se, the hardcoded number of planned
tests is removed in favor of calling done_testing().

With this, adding a new SSL library is quite straightforward; I've done the
legwork to test that =)

I opted to avoid overly invasive changes, leaving the tests somewhat easy to
compare to the back branches.

Thoughts?  I'm fairly sure there are many crimes against Perl in this patch;
I'm happy to take pointers on how to improve that.

--
Daniel Gustafsson   https://vmware.com/



0001-Refactor-SSL-tests.patch
Description: Binary data


Re: make MaxBackends available in _PG_init

2022-02-02 Thread Robert Haas
On Tue, Feb 1, 2022 at 5:36 PM Nathan Bossart  wrote:
> I can work on a new patch if this is the direction we want to go.  There
> were a couple of functions that called GetMaxBackends() repetitively that I
> should probably fix before the patch should be seriously considered.

Sure, that sort of thing should be tidied up. It's unlikely to make
any real difference, but as a fairly prominent PostgreSQL hacker once
said, a cycle saved is a cycle earned.
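
For what it's worth, the tidy-up presumably just means hoisting the call out
of any loops, along these lines (illustrative, not the actual patch):

    int         max_backends = GetMaxBackends();

    for (int i = 0; i < max_backends; i++)
    {
        /* ... per-backend work, rather than calling GetMaxBackends() each time ... */
    }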

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ICU for global collation

2022-02-02 Thread Peter Eisentraut


On 27.01.22 09:10, Peter Eisentraut wrote:

On 21.01.22 17:13, Julien Rouhaud wrote:

On Fri, Jan 21, 2022 at 03:24:02PM +0100, Peter Eisentraut wrote:

On 21.01.22 14:51, Julien Rouhaud wrote:
Is that change intended?  There isn't any usage of the collversionstr before
the possible error when actual_versionstr is missing.


I wanted to move it closer to the SysCacheGetAttr() where the "datum" value
is obtained.  It seemed weird to get the datum, then do other things, then
decode the datum.


Oh ok.  It won't make much difference performance-wise, so no objection.


I have committed this and will provide follow-up patches in the next few 
days.


Here is the main patch rebased on the various changes that have been 
committed in the meantime.  There is still some work to be done on the 
user interfaces of initdb, createdb, etc.


I have split out the database-level collation version tracking into a 
separate patch [0].  I think we should get that one in first and then 
refresh this one.


[0]: 
https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com

From 4028980f6be3662c0302575ed92de77e941e5a9e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 2 Feb 2022 13:54:04 +0100
Subject: [PATCH v4] Add option to use ICU as global collation provider

This adds the option to use ICU as the default collation provider for
either the whole cluster or a database.  New options for initdb,
createdb, and CREATE DATABASE are used to select this.

Discussion: 
https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
---
 doc/src/sgml/catalogs.sgml|   9 +
 doc/src/sgml/ref/create_database.sgml |  16 ++
 doc/src/sgml/ref/createdb.sgml|   9 +
 doc/src/sgml/ref/initdb.sgml  |  23 ++
 src/backend/catalog/pg_collation.c|  18 +-
 src/backend/commands/collationcmds.c  |  93 ---
 src/backend/commands/dbcommands.c |  72 +-
 src/backend/utils/adt/pg_locale.c | 242 +++---
 src/backend/utils/init/postinit.c |  26 ++
 src/bin/initdb/Makefile   |   2 +
 src/bin/initdb/initdb.c   |  64 -
 src/bin/initdb/t/001_initdb.pl|  18 +-
 src/bin/pg_dump/pg_dump.c |  19 ++
 src/bin/pg_upgrade/check.c|  10 +
 src/bin/pg_upgrade/info.c |  18 +-
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/bin/psql/describe.c   |  23 +-
 src/bin/psql/tab-complete.c   |   2 +-
 src/bin/scripts/Makefile  |   2 +
 src/bin/scripts/createdb.c|   9 +
 src/bin/scripts/t/020_createdb.pl |  20 +-
 src/include/catalog/pg_collation.dat  |   3 +-
 src/include/catalog/pg_collation.h|   6 +-
 src/include/catalog/pg_database.dat   |   4 +-
 src/include/catalog/pg_database.h |   6 +
 src/include/utils/pg_locale.h |   6 +
 .../regress/expected/collate.icu.utf8.out |  10 +-
 src/test/regress/sql/collate.icu.utf8.sql |   8 +-
 28 files changed, 572 insertions(+), 168 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 7d5b0b1656..5a5779b9a3 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2384,6 +2384,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicucoll text
+  
+  
+   ICU collation string
+  
+ 
+
  
   
collversion text
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index f22e28dc81..403010cddf 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -28,6 +28,7 @@
[ LOCALE [=] locale ]
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
+   [ COLLATION_PROVIDER [=] collation_provider ]
[ TABLESPACE [=] tablespace_name ]
[ ALLOW_CONNECTIONS [=] allowconn ]
[ CONNECTION LIMIT [=] connlimit ]
@@ -158,6 +159,21 @@ Parameters

   
  
+
+ 
+  collation_provider
+
+  
+   
+Specifies the provider to use for the default collation in this
+database.  Possible values are:
+icu,ICU
+libc.  libc is the default.  The
+available choices depend on the operating system and build options.
+   
+  
+ 
+
  
   tablespace_name
   
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 86473455c9..4b07363fcc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -83,6 +83,15 @@ Options
   
  
 
+ 
+  
--collation-provider={libc|icu}
+  
+   
+Specifies the collation provider for the database's default collation.
+   

Re: RFC: Logging plan of the running query

2022-02-02 Thread torikoshia

2022-02-01 01:51, Fujii Masao wrote:
Thanks for reviewing and suggestions!

   +    Note that there is the case where the request to log a query
   +    plan is skipped even while the target backend is running a
   +    query due to lock conflict avoidance.
   +    If this happens, users can just retry pg_log_query_plan().


This may cause users to misunderstand that pg_log_query_plan() fails
while the target backend is holding *any* locks? Isn't it better to
mention "page-level locks", instead? So how about the following?

--
Note that the request to log the query plan will be ignored if it's
received during a short period while the target backend is holding a
page-level lock.
--


Agreed.


+    ereport(LOG_SERVER_ONLY,
+    errmsg("backend with PID %d is holding a page lock. Try again",
+    MyProcPid));

It seems more proper to output this message in DETAIL or HINT,
instead. So how about something like the following messages?

LOG: could not log the query plan
DETAIL: query plan cannot be logged while page level lock is being held
HINT: Try pg_log_query_plan() after a few ...

Agreed.
I felt the HINT message 'after a few ...' is difficult to describe,
and wrote it as below.

| HINT: Retrying pg_log_query_plan() might succeed since the lock duration of page level locks are usually short


How do you think?


Or we don't need HINT message?


Removed the HINT message.


+ errmsg("could not log the query plan"),
+ errdetail("query plan cannot be logged 
while page level lock is

being held"),

In detail message, the first word of sentences should be capitalized.
How about "Cannot log the query plan while holding page-level lock.",
instead?


Agreed.


Thanks for updating the patch! Here are some review comments.

+  
+   
+
+ pg_log_query_plan
+

This entry is placed before one for pg_log_backend_memory_contexts().
But it should be *after* that since those entries seem to be placed in
alphabetical order in the table?


Modified it.


+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.

Other descriptions about logging of query string seem not to mention
something like "untruncated query string". For example, auto_explain,
log_statement, etc. Why do we need to mention "along with the
untruncated query string" specially for pg_log_query_plan()?


Modified it as below:

 Requests to log the plan of the query currently running on the
-backend with specified process ID along with the untruncated
-query string.
-They will be logged at LOG message level and
+backend with specified process ID.
+It will be logged at LOG message level and


+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.

Now the plan of even nested statement can be logged. So this
description needs to be updated?


Modified it as below:

-Note that nested statements (statements executed inside a function) are not
-considered for logging. Only the plan of the most deeply nested query is logged.
+Note that when the statements are executed inside a function, only the
+plan of the most deeply nested query is logged.


@@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
  MemoryContextSwitchTo(oldcontext);
 +ActiveQueryDesc = NULL;

ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be
reset to NULL in standard_ExecutorFinish()?


ActiveQueryDesc should not be reset in standard_ExecutorFinish().
Removed it.


Currently, even during ProcessLogQueryPlanInterrupt(),
pg_log_query_plan() can be called and another
ProcessLogQueryPlanInterrupt() can be executed. So repeated
re-entrances to ProcessLogQueryPlanInterrupt() could cause a "stack
depth limit exceeded" error. To avoid this, shouldn't we make
ProcessLogQueryPlanInterrupt() do nothing and return immediately if
it's called during another ProcessLogQueryPlanInterrupt()?

pg_log_backend_memory_contexts() also might have the same issue.


As you pointed out offlist, the issue could occur even when both 
pg_log_backend_memory_contexts and pg_log_query_plan are called, so it 
may be better to address it in a separate patch.



You also pointed out offlist that it would be necessary to call 
LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur an ERROR.
AFAICS it can call ereport(ERROR), e.g., via palloc0() in NewExplainState(), 
so I added PG_TRY/CATCH and made it call LockErrorCleanup() when an ERROR 
occurs.
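
To make both points concrete, here is roughly the shape I have in mind (a
sketch only, not the actual patch code):

    void
    ProcessLogQueryPlanInterrupt(void)
    {
        static bool in_log_query_plan = false;  /* illustrative re-entrance guard */

        /* Do nothing if we are already in the middle of handling a request. */
        if (in_log_query_plan)
            return;
        in_log_query_plan = true;

        PG_TRY();
        {
            /* ... NewExplainState(), page-level lock check, log the plan ... */
        }
        PG_CATCH();
        {
            /* Clean up any lock-wait state before propagating the error. */
            LockErrorCleanup();
            in_log_query_plan = false;
            PG_RE_THROW();
        }
        PG_END_TRY();

        in_log_query_plan = false;
    }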




On 2022-02-01 17:27, Kyotaro Horiguchi wrote:

Thanks for reviewing Horiguchi-san!


By the way, I'm anxious about the following part 

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-02 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for good suggestions.

> This logic sounds complicated to me. I'm afraid that FDW developers may a bit
> easily misunderstand the logic and make the bug in their FDW.
> Isn't it simpler to just disable the timeout in core whenever the transaction ends
> whether committed or aborted, like statement_timeout is disabled after each
> command? For example, call something like DisableForeignCheckTimeout() in
> CommitTransaction() etc.

Your idea is to stop the timer at the end of each transaction, right?
I had not adopted that because I thought some developers might not want to stop
the timer even when the transaction ends. That led to a complicated situation
without a good use case, however, so your logic is now implemented.
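
For reference, the transaction-end hookup could be as small as the following
sketch (the timeout id and the wrapper name are assumptions here, mirroring
how statement_timeout is disarmed; the actual patch may differ):

    /*
     * Hypothetical wrapper; FOREIGN_CHECK_TIMEOUT is an assumed id that would
     * have been registered earlier via RegisterTimeout().
     */
    static void
    DisableForeignCheckTimeout(void)
    {
        disable_timeout(FOREIGN_CHECK_TIMEOUT, false);
    }

    /* ... then call it from CommitTransaction()/AbortTransaction() in xact.c. */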

> > You are right. I think this suggests that error-reporting should be done in the
> > core layer.
> > For meaningful error reporting, I changed a callback specification
> > that it should return ForeignServer* which points to downed remote server.
> 
> This approach seems to assume that FDW must manage all the ForeignServer
> information so that the callback can return it. Is this assumption valid for
> all the FDW?

Not sure, the assumption might be too optimistic. 
mysql_fdw can easily return ForeignServer* because it caches serverid,
but dblink and maybe oracle_fdw cannot.

> How about making FDW trigger a query cancel interrupt by signaling SIGINT to
> the backend, instead?

I understood that the error should be caught by QueryCancelPending.
Could you check 0003? Does it follow that?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v08_0001_add_checking_infrastracture.patch
Description: v08_0001_add_checking_infrastracture.patch


v08_0002_add_doc.patch
Description: v08_0002_add_doc.patch


Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-02 Thread Amit Kapila
On Wed, Feb 2, 2022 at 1:06 PM David G. Johnston
 wrote:
>
> On Tue, Feb 1, 2022 at 11:55 PM Amit Kapila  wrote:
>>
>> On Wed, Feb 2, 2022 at 9:41 AM David G. Johnston
>>  wrote:
>> >
>> > On Tue, Feb 1, 2022 at 8:07 PM Amit Kapila  wrote:
>> >>
>> >> On Tue, Feb 1, 2022 at 11:47 AM Masahiko Sawada  
>> >> wrote:
>> >>
>> >> >
>> >> > I see that it's better to use a better IPC for ALTER SUBSCRIPTION SKIP
>> >> > feature to pass error-XID or error-LSN information to the worker
>> >> > whereas I'm also not sure of the advantages in storing all error
>> >> > information in a system catalog. Since what we need to do for this
>> >> > purpose is only error-XID/LSN, we can store only error-XID/LSN in the
>> >> > catalog? That is, the worker stores error-XID/LSN in the catalog on an
>> >> > error, and ALTER SUBSCRIPTION SKIP command enables the worker to skip
>> >> > the transaction in question. The worker clears the error-XID/LSN after
>> >> > successfully applying or skipping the first non-empty transaction.
>> >> >
>> >>
>> >> Where do you propose to store this information?
>> >
>> >
>> > pg_subscription_worker
>> >
>> > The error message and context is very important.  Just make sure it is 
>> > only non-null when the worker state is "syncing failed" (or whatever term 
>> > we use).
>> >
>> >
>>
>> Sure, but is this the reason you want to store all the error info in
>> the system catalog? I agree that providing more error info could be
>> useful and also possibly the previously failed (apply) xacts info as
>> well but I am not able to see why you want to have that sort of info
>> in the catalog. I could see storing info like err_lsn/err_xid that can
>> allow to proceed to apply worker automatically or to slow down the
>> launch of errored apply worker but not all sort of other error info
>> (like err_cnt, err_code, err_message, err_time, etc.). I want to know
>> why you are insisting to make all the error info persistent via the
>> system catalog?
>
>
...
...
>
> I already explained that the concept of err_cnt is not useful.  The fact that 
> you include it here makes me think you are still thinking that this all 
> somehow is meant to keep track of history.  It is not.  The workers are state 
> machines and "error" is one of the states - with relevant attributes to 
> display to the user, and system, while in that state.  The state machine 
> reporting does not care about historical states nor does it report on them.  
> There is some uncertainty if we continue with the automatic re-launch;
>

I think automatic retry will help with transient errors, say network
glitches, that can be resolved on retry, and will keep the behavior
transparent. This is also consistent with what we do in standby mode:
if there is an error on the primary because of which the standby is not
able to fetch some data, it will just retry. We can't fix any error that
occurred on the server side, so the way forward is to retry, which is
true for both standbys and subscribers. Basically, I don't think every
kind of error demands user intervention. We can allow users to control
it via some parameter, say disable_on_error, as is discussed in the
CF entry [1], but I don't think that should be the default.

[1] - https://commitfest.postgresql.org/36/3407/

-- 
With Regards,
Amit Kapila.




Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

2022-02-02 Thread Bharath Rupireddy
On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart  wrote:
>
> On Mon, Jan 31, 2022 at 10:42:54AM +0530, Bharath Rupireddy wrote:
> > After an off-list discussion with Andreas, proposing here a patch that
> > basically replaces ReadDir call with ReadDirExtended and gets rid of
> > lstat entirely. With this chance, the checkpoint will only care about
> > the snapshot and mapping files and not fail if it finds other files in
> > the directories. Removing lstat enables us to make things faster as we
> > avoid a bunch of extra system calls - one lstat call per each mapping
> > or snapshot file.
>
> I think removing the lstat() is probably reasonable.  We currently aren't
> doing proper error checking, and the chances of a non-regular file matching
> the prefix are likely pretty low.  In the worst case, we'll LOG or ERROR
> when unlinking or fsyncing fails.
>
> However, I'm not sure about the change to ReadDirExtended().  That might be
> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> are flushed to disk for the checkpoint.  If we stop reading the directory
> after an error and let the checkpoint continue, isn't it possible that some
> mappings files won't be persisted to disk?

Unless I misread your statement above, with LOG level in
ReadDirExtended() I don't think we stop reading the files in
CheckPointLogicalRewriteHeap(). Am I missing something here?

We also continue in CheckPointLogicalRewriteHeap() if we can't
parse or delete some files, now that the "could not parse
filename"/"could not remove file" messages are demoted to LOG level.

I'm attaching v6; the only change is from elog(LOG, ...) to ereport(LOG, ...)
in CheckPointLogicalRewriteHeap(), other things remain the same.
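
For clarity, the pattern in question looks roughly like this (a sketch of the
v6 approach, not the literal patch hunk):

    /* Sketch: scan pg_logical/mappings without hard-failing the checkpoint. */
    DIR        *mappings_dir;
    struct dirent *mapping_de;

    mappings_dir = AllocateDir("pg_logical/mappings");
    while ((mapping_de = ReadDirExtended(mappings_dir, "pg_logical/mappings",
                                         LOG)) != NULL)
    {
        /* Ignore anything that doesn't look like a mapping file. */
        if (strncmp(mapping_de->d_name, "map-", 4) != 0)
            continue;

        /*
         * Parse the file name; if that fails, report it at LOG level and move
         * on, then unlink or fsync the file as appropriate.
         */
    }
    FreeDir(mappings_dir);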

Regards,
Bharath Rupireddy.


v6-0001-Replace-ReadDir-with-ReadDirExtended.patch
Description: Binary data


Re: CREATE TABLE ( .. STORAGE ..)

2022-02-02 Thread Teodor Sigaev

Hi!


Are they both set to name or ColId? Although they are the same.



Thank you, fixed, that was just an oversight.


2 For the new ColumnDef member storage_name, did you miss the functions
_copyColumnDef() and _equalColumnDef()?


Thank you, fixed
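
For reference, handling a new string member in the node support functions
usually amounts to additions like these (a sketch assuming storage_name is a
char * field; the actual patch may differ):

    /* In _copyColumnDef() in src/backend/nodes/copyfuncs.c: */
        COPY_STRING_FIELD(storage_name);

    /* In _equalColumnDef() in src/backend/nodes/equalfuncs.c: */
        COMPARE_STRING_FIELD(storage_name);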




Regards
Wenjing



2021年12月27日 15:51,Teodor Sigaev  写道:

Hi!

Working on pluggable toaster (mostly, for JSONB improvements, see links below) 
I had found that STORAGE attribute on column is impossible to set  in CREATE 
TABLE command but COMPRESS option is possible. It looks unreasonable. Suggested 
patch implements this possibility.

[1] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfnyc-2021.pdf
[2] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgvision-2021.pdf
[3] http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconfonline-2021.pdf
[4] http://www.sai.msu.su/~megera/postgres/talks/bytea-pgconfonline-2021.pdf

PS I will propose pluggable toaster patch a bit later
--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: 
http://www.sigaev.ru/




--
Teodor Sigaev  E-mail: teo...@sigaev.ru
  WWW: http://www.sigaev.ru/

create_table_storage-v2.patch.gz
Description: application/gzip

