Re: Add checkpoint and redo LSN to LogCheckpointEnd log message

2022-02-07 Thread Kyotaro Horiguchi
At Mon, 7 Feb 2022 13:51:31 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2022/02/07 12:02, Kyotaro Horiguchi wrote:
> > - If any later checkpoint/restartpoint has been established, just skip
> >remaining task then return false. (!chkpt_was_latest)
> >(I'm not sure this can happen, though.)
> > - we update control file only when archive recovery is still ongoing.
> 
> This comment seems to conflict with what your PoC patch does. Because
> with the patch, ControlFile->checkPoint and
> ControlFile->checkPointCopy seem to be updated even when
> ControlFile->state != DB_IN_ARCHIVE_RECOVERY.

Ah, yeah, by "update" I meant "move forward". Sorry for the confusing
wording.

> I agree with what your PoC patch does for now. When we're not in
> archive recovery state, checkpoint and REDO locations in pg_control
> should be updated but min recovery point should be reset to invalid
> one (which instruments that subsequent crash recovery should replay
> all available WAL files).

Yes.  All buffers before the last recovery point's end have been
flushed out, so the recovery point is valid as a checkpoint.  On the
other hand, minRecoveryPoint is no longer needed and is actually
ignored at the next crash recovery.  We could leave it alone, but it
is more consistent to clear it.
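
For reference, the pg_control locations being discussed here can also be
inspected from SQL through the stock pg_control_checkpoint() and
pg_control_recovery() functions; a minimal sketch, just to show which
fields are involved:

    -- Checkpoint and REDO locations currently recorded in pg_control.
    SELECT checkpoint_lsn, redo_lsn FROM pg_control_checkpoint();

    -- minRecoveryPoint as recorded in pg_control.
    SELECT min_recovery_end_lsn FROM pg_control_recovery();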

> > - Otherwise reset minRecoveryPoint then continue.
> > Do you have any thoughts or opinions?
> 
> Regarding chkpt_was_latest, whether the state is
> DB_IN_ARCHIVE_RECOVERY or not, if checkpoint and redo locations in
> pg_control are updated, IMO we don't need to skip the "remaining
> tasks". Since those locations are updated and subsequent crash
> recovery will start from that redo location, for example, ISTM that we
> can safely delete old WAL files prior to the redo location as the
> "remaining tasks". Thought?

If I read you correctly, the PoC already works that way. It updates
pg_control if the restart point is the latest, then performs the
remaining cleanup tasks in that case. The recovery state doesn't affect
this process.

I reexamined the possibility of concurrent checkpoints.

Both CreateCheckPoint and CreateRestartPoint are called only from the
checkpointer loop, the checkpointer's shutdown handler, and standalone
backends.  So I can't see any possibility of concurrent checkpoints.

In the past, the startup process called CreateCheckPoint directly in
the crash-recovery case where the checkpointer is not running, but
since 7ff23c6d27 the checkpointer is started before the startup process
starts.  So I conclude that this cannot happen.

So the attached patch removes the code path for the case where the
restart point is overtaken by a concurrent checkpoint.

Thus the attached also removes the ambiguity in the proposed patch
about the LSNs in the restartpoint-ending log message.

Thoughts?

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 4369846b6de00fbddb215300a8ff774bbc04 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 8 Feb 2022 16:42:53 +0900
Subject: [PATCH v1] Get rid of unused path to handle concurrent checkpoints

CreateRestartPoint considered the case where a concurrent checkpoint is
running.  But after 7ff23c6d27 we no longer launch multiple checkpoints
simultaneously.  That code path, if taken, may leave the database
unrecoverable by removing WAL segments that are still required by the
last established restartpoint.
---
 src/backend/access/transam/xlog.c | 53 +++
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 958220c495..01a345e8bd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9752,29 +9752,30 @@ CreateRestartPoint(int flags)
PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
 	/*
-	 * Update pg_control, using current time.  Check that it still shows
-	 * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-	 * this is a quick hack to make sure nothing really bad happens if somehow
-	 * we get here after the end-of-recovery checkpoint.
+	 * Update pg_control, using current time.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+
+	/* We mustn't have a concurrent checkpoint that advances checkpoint LSN */
+	Assert(lastCheckPoint.redo > ControlFile->checkPointCopy.redo);
+
+	ControlFile->checkPoint = lastCheckPointRecPtr;
+	ControlFile->checkPointCopy = lastCheckPoint;
+
+	/*
+	 * Ensure minRecoveryPoint is past the checkpoint record while archive
+	 * recovery is still ongoing.  Normally, this will have happened already
+	 * while writing out dirty buffers, but not necessarily - e.g. because no
+	 * buffers were dirtied.  We do this because a non-exclusive base backup
+	 * uses minRecoveryPoint to determine 

Re: [BUG]Update Toast data failure in logical replication

2022-02-07 Thread Dilip Kumar
On Tue, Feb 8, 2022 at 12:48 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
> > Right, and it is getting changed. We are just printing the first 200
> > characters (by using SQL [1]) from the decoded tuple so what is shown
> > in the results is the initial 200 bytes.
>
> Ah, I knew I must have been missing something.
>
>
> > The complete decoded data after the patch is as follows:
>
> Hm. I think we should change the way the strings are shortened - otherwise we
> don't really verify much in that test. Perhaps we could just replace the long
> repetitive strings with something shorter in the output?
>
> E.g. using something like regexp_replace(data, 
> '(1234567890|9876543210){200}', '\1{200}','g')
> inside the substr().


IMHO, in this particular case using regexp_replace as you explained
would be a good option, as we would then be verifying the complete data
instead of just the first 200 characters.
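
For example, a rough sketch of the query shape (the slot name and the
test_decoding options here are illustrative, not necessarily what the
actual test uses):

    -- Collapse the long repetitive strings so that the distinctive parts of
    -- the decoded tuple fit inside the 200-character window being compared.
    SELECT substr(regexp_replace(data, '(1234567890|9876543210){200}',
                                 '\1{200}', 'g'), 1, 200) AS data
      FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
                                       'include-xids', '0',
                                       'skip-empty-xacts', '1');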

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements

2022-02-07 Thread Anton A. Melnikov

Hello!

On 26.01.2022 16:43, Andrei Zubkov wrote:

>>
>> If you're worried about some external table having a NOT NULL
>> constraint for
>> those fields, how about returning NaN instead?  That's a valid value
>> for a
>> double precision data type.
>
> I don't know for sure what we can expect to be wrong here. My opinion
> is to use NULLs, as they seems more suitable here. But this can be done
> only if we are not expecting significant side effects.

Let me suggest for your consideration an additional reset-request flag 
that can be used to synchronize resets in a way similar to interrupt 
handling.
An external reset would set this flag immediately. pg_stat_statements 
would then wait until the required query next hits the statistics, and 
only at that moment actually reset the aux statistics,
write a new timestamp, and clear the flag. At the time of the real reset, 
total_time is already known, so pg_stat_statements can immediately 
initialize min and max correctly.
Between the reset request and the next execution of the query, the aux 
view would keep returning the old, still-correct values, so neither NaNs 
nor NULLs would be required.
We could also expose the value of the reset-request flag in the aux view 
to give the external application feedback that a reset was requested.
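
To make the interaction concrete from the external application's side,
here is a rough SQL-level sketch; the function and column names below are
hypothetical, purely to illustrate the proposed flow:

    -- Request a reset of the aux (min/max) statistics; this only sets the
    -- request flag, the real reset happens on the query's next execution.
    SELECT hypothetical_aux_stats_reset_request();

    -- Until then the aux view keeps returning the old, still-correct values,
    -- and could expose the pending-request flag as feedback.
    SELECT queryid, aux_min_time, aux_max_time, aux_reset_requested
      FROM hypothetical_pg_stat_statements_aux;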


With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH] Add min() and max() aggregate functions for xid8

2022-02-07 Thread Fujii Masao



On 2022/02/08 13:23, Ken Kato wrote:


Thank you for the comments!


if (FullTransactionIdFollows(fxid1, fxid2))
    PG_RETURN_FULLTRANSACTIONID(fxid1);
else
    PG_RETURN_FULLTRANSACTIONID(fxid2);



Isn't it better to use '0x'::xid8 instead of
'18446744073709551615'::xid8, to more easily understand that this test
uses maximum number allowed as xid8?


I updated these two parts as you suggested.



In addition to those two xid8 values, IMO it's better to insert also
the xid8 value neither minimum nor maximum xid8 ones, for example,
'42'::xid8.


I added '010'::xid8, '42'::xid8, and '-1'::xid8
in addition to '0'::xid8 and '0x'::xid8
just to have more varieties.


Thanks for updating the patch! It basically looks good to me. I applied the 
following small changes to the patch. Updated version of the patch attached. 
Could you review this version?

+   if (FullTransactionIdFollowsOrEquals(fxid1, fxid2))
+   PG_RETURN_FULLTRANSACTIONID(fxid1);

I used FullTransactionIdFollows() and FullTransactionIdPrecedes() in xid8_larger() and xid8_smaller() 
because other xxx_larger() and xxx_smaller() functions also use ">" operator instead of 
">=".

+create table xid8_tab (x xid8);
+insert into xid8_tab values ('0'::xid8), ('010'::xid8),
+('42'::xid8), ('0x'::xid8), ('-1'::xid8);

Since "::xid8" is not necessary here, I got rid of it from the above query.

I also merged this xid8_tab with the existing xid8_t1 table, to reduce the 
number of tables created.
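
As a quick usage sketch of what the test ends up exercising once this
patch is applied (the standalone table here follows the earlier version of
the test; the values are illustrative, and 0xffffffffffffffff is the
hexadecimal spelling of the maximum xid8 value 18446744073709551615
mentioned upthread):

    CREATE TABLE xid8_tab (x xid8);
    INSERT INTO xid8_tab VALUES ('0'), ('42'), ('0xffffffffffffffff');
    -- min() should return 0 and max() the largest value inserted above.
    SELECT min(x), max(x) FROM xid8_tab;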

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8754f2f89b..1b064b4feb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19973,7 +19973,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
@@ -19992,7 +19992,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 9b4ceaea47..e4b4952a28 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -286,6 +286,30 @@ xid8cmp(PG_FUNCTION_ARGS)
PG_RETURN_INT32(-1);
 }
 
+Datum
+xid8_larger(PG_FUNCTION_ARGS)
+{
+   FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+   FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+   if (FullTransactionIdFollows(fxid1, fxid2))
+   PG_RETURN_FULLTRANSACTIONID(fxid1);
+   else
+   PG_RETURN_FULLTRANSACTIONID(fxid2);
+}
+
+Datum
+xid8_smaller(PG_FUNCTION_ARGS)
+{
+   FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+   FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+   if (FullTransactionIdPrecedes(fxid1, fxid2))
+   PG_RETURN_FULLTRANSACTIONID(fxid1);
+   else
+   PG_RETURN_FULLTRANSACTIONID(fxid2);
+}
+
 /*
  *  COMMAND IDENTIFIER ROUTINES
 *
  */
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 137f6eef69..2843f4b415 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -149,6 +149,9 @@
 { aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger',
   aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger',
+  aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -214,6 +217,9 @@
 { aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller',
   aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller',
+  aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7024dbe10a..62f36daa98 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -203,6 +203,12 @@
 { oid => '5071', descr => 'convert xid8 to xid',
   proname => 'xid', prorettype => 'xid', proargtypes => 'xid8',
  

Re: [PATCH] Add min() and max() aggregate functions for xid8

2022-02-07 Thread Ken Kato


Thank you for the comments!


if (FullTransactionIdFollows(fxid1, fxid2))
PG_RETURN_FULLTRANSACTIONID(fxid1);
else
PG_RETURN_FULLTRANSACTIONID(fxid2);



Isn't it better to use '0x'::xid8 instead of
'18446744073709551615'::xid8, to more easily understand that this test
uses maximum number allowed as xid8?


I updated these two parts as you suggested.



In addition to those two xid8 values, IMO it's better to insert also
the xid8 value neither minimum nor maximum xid8 ones, for example,
'42'::xid8.


I added '010'::xid8, '42'::xid8, and '-1'::xid8
in addition to '0'::xid8 and '0x'::xid8
just to have more varieties.


Best wishes,

--
Ken Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8754f2f89b..1b064b4feb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19973,7 +19973,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
@@ -19992,7 +19992,7 @@ SELECT NULLIF(value, '(none)') ...
 values.  Available for any numeric, string, date/time, or enum type,
 as well as inet, interval,
 money, oid, pg_lsn,
-tid,
+tid, xid8,
 and arrays of any of these types.

Yes
diff --git a/src/backend/utils/adt/xid.c b/src/backend/utils/adt/xid.c
index 9b4ceaea47..9f7e1816b0 100644
--- a/src/backend/utils/adt/xid.c
+++ b/src/backend/utils/adt/xid.c
@@ -286,6 +286,30 @@ xid8cmp(PG_FUNCTION_ARGS)
 		PG_RETURN_INT32(-1);
 }
 
+Datum
+xid8_larger(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	if (FullTransactionIdFollowsOrEquals(fxid1, fxid2))
+		PG_RETURN_FULLTRANSACTIONID(fxid1);
+	else
+		PG_RETURN_FULLTRANSACTIONID(fxid2);
+}
+
+Datum
+xid8_smaller(PG_FUNCTION_ARGS)
+{
+	FullTransactionId fxid1 = PG_GETARG_FULLTRANSACTIONID(0);
+	FullTransactionId fxid2 = PG_GETARG_FULLTRANSACTIONID(1);
+
+	if (FullTransactionIdPrecedesOrEquals(fxid1, fxid2))
+		PG_RETURN_FULLTRANSACTIONID(fxid1);
+	else
+		PG_RETURN_FULLTRANSACTIONID(fxid2);
+}
+
 /*
  *	 COMMAND IDENTIFIER ROUTINES			 *
  */
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 137f6eef69..2843f4b415 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -149,6 +149,9 @@
 { aggfnoid => 'max(pg_lsn)', aggtransfn => 'pg_lsn_larger',
   aggcombinefn => 'pg_lsn_larger', aggsortop => '>(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger',
+  aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -214,6 +217,9 @@
 { aggfnoid => 'min(pg_lsn)', aggtransfn => 'pg_lsn_smaller',
   aggcombinefn => 'pg_lsn_smaller', aggsortop => '<(pg_lsn,pg_lsn)',
   aggtranstype => 'pg_lsn' },
+{ aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller',
+  aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)',
+  aggtranstype => 'xid8' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7024dbe10a..62f36daa98 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -203,6 +203,12 @@
 { oid => '5071', descr => 'convert xid8 to xid',
   proname => 'xid', prorettype => 'xid', proargtypes => 'xid8',
   prosrc => 'xid8toxid' },
+{ oid => '5097', descr => 'larger of two',
+  proname => 'xid8_larger', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_larger' },
+{ oid => '5098', descr => 'smaller of two',
+  proname => 'xid8_smaller', prorettype => 'xid8', proargtypes => 'xid8 xid8',
+  prosrc => 'xid8_smaller' },
 { oid => '69',
   proname => 'cideq', proleakproof => 't', prorettype => 'bool',
   proargtypes => 'cid cid', prosrc => 'cideq' },
@@ -6564,6 +6570,9 @@
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
+{ oid => '5099', descr => 'maximum value of all xid8 input values',
+  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', 

Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-07 Thread Julien Rouhaud
On Tue, Feb 08, 2022 at 12:56:59PM +0900, Michael Paquier wrote:
> On Tue, Feb 08, 2022 at 11:48:15AM +0800, Julien Rouhaud wrote:
> > That's already been discussed in [1] and rejected, as it would also mean 
> > losing
> > the ability to have pg_stat_statements (or any similar extension) coverage
> > using the regression tests.  I personally rely on regression tests for such
> > custom extensions quite a lot, so I'm still -1 on that.
> 
> Well, I can see that this is a second independent complain after a few
> months.

> If you wish to keep this capability, wouldn't it be better to
> add a "regress" mode to compute_query_id, where we would mask
> automatically this information in the output of EXPLAIN but still run
> the computation?

Did you just realize you had some script broken because of that (or do you
have an actual need), or did you only try to see which settings break the
regression tests?  If the latter, the complaint seems a bit artificial.

No one has complained about it since, and I'm assuming that pgpro could easily
fix their test workflow, since they didn't try to add such a new mode.

I suggested something like that in the thread but Tom didn't seem
interested.  Note that it would be much cleaner to do now that we rely on an
internal query_id_enabled variable rather than changing the compute_query_id
value, but I don't know whether it's really worth the effort given the number
of things that already break the regression tests.




Re: GUC flags

2022-02-07 Thread Michael Paquier
On Mon, Feb 07, 2022 at 09:07:28PM -0600, Justin Pryzby wrote:
> I think this is the regex I wrote to handle either "name = value" or "name
> value", which was needed between f47ed79cc..4d7c3e344.   See skip_equals.

Yes, I took it from there, noticing that it was working just fine for
this purpose.

> It's fine the way it is, but could also remove the 2nd half of the alternation
> (|), since GUCs shouldn't be added to sample.conf without '='.

Makes sense.  check_guc also checks using this pattern.
--
Michael




Re: WaitLatchOrSocket seems to not count to 4 right...

2022-02-07 Thread Fujii Masao




On 2022/02/08 9:51, Thomas Munro wrote:

an index instead of a pointer...) or, since it's a bit silly to add
both of those events, maybe we should do:

-   if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
-   AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
 if ((wakeEvents & WL_EXIT_ON_PM_DEATH) && IsUnderPostmaster)
 AddWaitEventToSet(set, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
   NULL, NULL);
+   else if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
+   AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
+


+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-07 Thread Michael Paquier
On Tue, Feb 08, 2022 at 11:48:15AM +0800, Julien Rouhaud wrote:
> That's already been discussed in [1] and rejected, as it would also mean 
> losing
> the ability to have pg_stat_statements (or any similar extension) coverage
> using the regression tests.  I personally rely on regression tests for such
> custom extensions quite a lot, so I'm still -1 on that.

Well, I can see that this is a second independent complaint after a few
months.  If you wish to keep this capability, wouldn't it be better to
add a "regress" mode to compute_query_id, where we would automatically
mask this information in the output of EXPLAIN but still run the
computation?
--
Michael




Re: shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-07 Thread Julien Rouhaud
Hi,

On Tue, Feb 08, 2022 at 12:38:46PM +0900, Michael Paquier wrote:
> 
> While testing installcheck with various server configurations to see
> how the main regression test suites could break, I found that loading
> pg_stat_statements into the backend is enough to break installcheck
> as compute_query_id = auto, the default, lets the decision to compute
> query IDs to pg_stat_statements itself.  In short, loading
> pg_stat_statements breaks EXPLAIN outputs of any SQL-based regression
> test.

Indeed, that's unfortunate but that's also a known behavior.

> Running installcheck on existing installations is a popular sanity
> check, as much as is enabling pg_stat_statements by default, so it
> seems to me that we'd better disable compute_query_id by default in
> the databases created for the sake of the regression tests, enabling
> it only in places where it is relevant.  We do that in explain.sql for
> a test with compute_query_id, but pg_stat_statements does not do
> that.

I agree that enabling pg_stat_statements is common when you're actually
going to use your instance, but I'm not sure that also applies to
environments used for running regression tests.

> I'd like to suggest a fix for that, by tweaking the tests of
> pg_stat_statements to use compute_query_id = auto, so as we would
> still stress the code paths where the module takes the decision to
> compute query IDs, while the default regression databases would
> disable it.  Please note that this also fixes the case of any
> out-of-core modules that have EXPLAIN cases.

That's already been discussed in [1] and rejected, as it would also mean losing
the ability to have pg_stat_statements (or any similar extension) coverage
using the regression tests.  I personally rely on regression tests for such
custom extensions quite a lot, so I'm still -1 on that.

[1] 
https://www.postgresql.org/message-id/flat/1634283396.372373993%40f75.i.mail.ru




shared_preload_libraries = 'pg_stat_statements' failing with installcheck (compute_query_id = auto)

2022-02-07 Thread Michael Paquier
Hi all,
(Added Bruce and Julien in CC)

While testing installcheck with various server configurations to see
how the main regression test suites could break, I found that loading
pg_stat_statements into the backend is enough to break installcheck,
because compute_query_id = auto, the default, leaves the decision to
compute query IDs to pg_stat_statements itself.  In short, loading
pg_stat_statements breaks the EXPLAIN output of any SQL-based regression
test.
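
To make the breakage concrete, a minimal illustration (this assumes
pg_stat_statements is listed in shared_preload_libraries and
compute_query_id is left at its default of auto; the identifier value
itself is arbitrary):

    EXPLAIN (VERBOSE, COSTS OFF) SELECT 1;
    -- With the module loaded, the plan output gains an extra line like
    --   Query Identifier: <some 64-bit value>
    -- which no longer matches the expected regression output.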

Running installcheck on existing installations is a popular sanity
check, as much as is enabling pg_stat_statements by default, so it
seems to me that we'd better disable compute_query_id by default in
the databases created for the sake of the regression tests, enabling
it only in places where it is relevant.  We do that in explain.sql for
a test with compute_query_id, but pg_stat_statements does not do
that.

I'd like to suggest a fix for that, by tweaking the tests of
pg_stat_statements to use compute_query_id = auto, so that we would
still stress the code paths where the module takes the decision to
compute query IDs, while the default regression databases would
disable it.  Please note that this also fixes the case of any
out-of-core modules that have EXPLAIN cases.

The attached is enough to pass installcheck-world, on an instance
where pg_stat_statements is loaded.

Thoughts?
--
Michael
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index e6f71c7582..f305a4e57a 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1925,8 +1925,9 @@ create_database(const char *dbname)
 	 "ALTER DATABASE \"%s\" SET lc_numeric TO 'C';"
 	 "ALTER DATABASE \"%s\" SET lc_time TO 'C';"
 	 "ALTER DATABASE \"%s\" SET bytea_output TO 'hex';"
+	 "ALTER DATABASE \"%s\" SET compute_query_id TO 'off';"
 	 "ALTER DATABASE \"%s\" SET timezone_abbreviations TO 'Default';",
-	 dbname, dbname, dbname, dbname, dbname, dbname);
+	 dbname, dbname, dbname, dbname, dbname, dbname, dbname);
 	psql_end_command(buf, "postgres");
 
 	/*
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index f18c08838f..ff20286db7 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -1,4 +1,5 @@
 -- test old extension version entry points
+SET compute_query_id = auto;
 CREATE EXTENSION pg_stat_statements WITH VERSION '1.4';
 -- Execution of pg_stat_statements_reset() is granted only to
 -- superusers in 1.4, so this fails.
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..e8d22f791b 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_stat_statements;
+SET compute_query_id = auto;
 --
 -- simple and compound statements
 --
diff --git a/contrib/pg_stat_statements/sql/oldextversions.sql b/contrib/pg_stat_statements/sql/oldextversions.sql
index f2e822acd3..773e5f55f9 100644
--- a/contrib/pg_stat_statements/sql/oldextversions.sql
+++ b/contrib/pg_stat_statements/sql/oldextversions.sql
@@ -1,4 +1,5 @@
 -- test old extension version entry points
+SET compute_query_id = auto;
 
 CREATE EXTENSION pg_stat_statements WITH VERSION '1.4';
 -- Execution of pg_stat_statements_reset() is granted only to
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index dffd2c8c18..74ced1a7f4 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -1,4 +1,5 @@
 CREATE EXTENSION pg_stat_statements;
+SET compute_query_id = auto;
 
 --
 -- simple and compound statements




Re: [RFC] building postgres with meson - perl embedding

2022-02-07 Thread Andres Freund
Hi,

On 2022-02-07 20:42:09 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > What is the reason behind subtracting ccdlflags?
>
> It looks like the coding actually originated here:
>
> commit f5d0c6cad5bb2706e0e63f3f8f32e431ea428100
> Author: Bruce Momjian 
> Date:   Wed Jun 20 00:26:06 2001 +
>
> Apparently, on some systems, ExtUtils::Embed and MakeMaker are slightly
> broken, and its impossible to make a shared library when compiling with
> both CCDLFLAGS and LDDLFAGS, you have to pick one or the other.
>
> Alex Pilosov
>
> and Peter just copied the logic in 7662419f1.  Considering that
> the point of 7662419f1 was to get rid of MakeMaker, maybe we no
> longer needed that at that point.

Yea. And maybe what was broken in 2001 isn't broken anymore either ;)


Looking at a number of OSs:

debian sid:
embed:  -Wl,-E  -fstack-protector-strong -L/usr/local/lib  
-L/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -lperl -ldl -lm -lpthread -lc -lcrypt
ldopts: -Wl,-E

fedora:
embed:  -Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,-z,relro -Wl,--as-needed 
-Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -fstack-protector-strong 
-L/usr/local/lib  -L/usr/lib64/perl5/CORE -lperl -lpthread -lresolv -ldl -lm 
-lcrypt -lutil -lc
ldopts: -Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1

suse tumbleweed:
embed:  -Wl,-E -Wl,-rpath,/usr/lib/perl5/5.34.0/x86_64-linux-thread-multi/CORE  
-L/usr/local/lib64 -fstack-protector-strong  
-L/usr/lib/perl5/5.34.0/x86_64-linux-thread-multi/CORE -lperl -lm -ldl -lcrypt 
-lpthread
ldopts: -Wl,-E -Wl,-rpath,/usr/lib/perl5/5.34.0/x86_64-linux-thread-multi/CORE

freebsd:
embed:  -Wl,-R/usr/local/lib/perl5/5.30/mach/CORE -pthread -Wl,-E  
-fstack-protector-strong -L/usr/local/lib  
-L/usr/local/lib/perl5/5.30/mach/CORE -lperl -lpthread -lm -lcrypt -lutil
ldopts: -Wl,-R/usr/local/lib/perl5/5.30/mach/CORE

netbsd:
embed:  -Wl,-E  -Wl,-R/usr/pkg/lib/perl5/5.34.0/x86_64-netbsd-thread-multi/CORE 
 -pthread -L/usr/lib -Wl,-R/usr/lib -Wl,-R/usr/pkg/lib -L/usr/pkg/lib  
-L/usr/pkg/lib/perl5/5.34.0/x86_64-netbsd-thread-multi/CORE -lperl -lm -lcrypt 
-lpthread
ldopts: -Wl,-E  -Wl,-R/usr/pkg/lib/perl5/5.34.0/x86_64-netbsd-thread-multi/CORE

openbsd:
embed:  -Wl,-R/usr/libdata/perl5/amd64-openbsd/CORE -Wl,-E  
-fstack-protector-strong -L/usr/local/lib  
-L/usr/libdata/perl5/amd64-openbsd/CORE -lperl -lm -lc
ldopts: -Wl,-R/usr/libdata/perl5/amd64-openbsd/CORE

aix:
embed:  -bE:/usr/opt/perl5/lib64/5.28.1/aix-thread-multi-64all/CORE/perl.exp 
-bE:/usr/opt/perl5/lib64/5.28.1/aix-thread-multi-64all/CORE/perl.exp -brtl 
-bdynamic -b64  -L/usr/opt/perl5/lib64/5.28.1/aix-thread-multi-64all/CORE 
-lperl -lpthread -lbind -lnsl -ldl -lld -lm -lcrypt -lpthreads -lc
ldopts: -bE:/usr/opt/perl5/lib64/5.28.1/aix-thread-multi-64all/CORE/perl.exp 
-bE:/usr/opt/perl5/lib64/5.28.1/aix-thread-multi-64all/CORE/perl.exp

mac m1 monterey:
embed:  -fstack-protector-strong  
-L/System/Library/Perl/5.30/darwin-thread-multi-2level/CORE -lperl
ldopts:

windows msys install ucrt perl:
embed:  -s -L"C:\dev\msys64\ucrt64\lib\perl5\core_perl\CORE" 
-L"C:\dev\msys64\ucrt64\lib"  
"C:\dev\msys64\ucrt64\lib\perl5\core_perl\CORE\libperl532.a"
ldopts:

windows strawberrry perl:
embed:  -s -L"C:\STRAWB~1\perl\lib\CORE" -L"C:\STRAWB~1\c\lib"  
"C:\STRAWB~1\perl\lib\CORE\libperl530.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libmoldname.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libkernel32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libuser32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libgdi32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libwinspool.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libcomdlg32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libadvapi32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libshell32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libole32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\liboleaut32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libnetapi32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libuuid.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libws2_32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libmpr.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libwinmm.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libversion.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libodbc32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libodbccp32.a" 
"C:\STRAWB~1\c\x86_64-w64-mingw32\lib\libcomctl32.a"
ldopts:


So on Windows and macOS it makes no difference, because ldopts is empty.

On various Linuxes, except the Red Hat and Debian based ones, as well as on
the BSDs, it removes the rpath, which we then add back in various places (PL
and transform modules). On Debian the added rpath will never contain the
library.

AIX is the one exception. Specifying -bE... doesn't seem right for building
plperl etc. 

Re: GUC flags

2022-02-07 Thread Justin Pryzby
On Tue, Feb 08, 2022 at 10:44:07AM +0900, Michael Paquier wrote:
> What do you think about the updated version attached?  I have applied
> the addition of config_data() separately.

Looks fine

> + # Check if this line matches a GUC parameter.
> + if ($line =~ m/^#?([_[:alpha:]]+) (= .*|[^ ]*$)/)

I think this is the regex I wrote to handle either "name = value" or "name
value", which was needed between f47ed79cc..4d7c3e344.   See skip_equals.

It's fine the way it is, but could also remove the 2nd half of the alternation
(|), since GUCs shouldn't be added to sample.conf without '='.

-- 
Justin




RE: Logical replication timeout problem

2022-02-07 Thread wangw.f...@fujitsu.com
On Wed, Jan 26, 2022 at 11:37 AM I wrote:
> On Sat, Jan 22, 2022 at 7:12 PM Amit Kapila  wrote:
> > Now, one idea to solve this problem could be that whenever we skip
> > sending any change we do try to update the plugin progress via
> > OutputPluginUpdateProgress(for walsender, it will invoke
> > WalSndUpdateProgress), and there it tries to process replies and send
> > keep_alive if necessary as we do when we send some data via
> > OutputPluginWrite(for walsender, it will invoke WalSndWriteData). I
> > don't know whether it is a good idea to invoke such a mechanism for
> > every change we skip to send or we should do it after we skip sending
> > some threshold of continuous changes. I think later would be
> > preferred. Also, we might want to introduce a new parameter
> > send_keep_alive to this API so that there is flexibility to invoke
> > this mechanism as we don't need to invoke it while we are actually
> > sending data and before that, we just update the progress via this
> > API.
> ..
> Based on above, I think the second idea that sending some threshold of
> continuous changes might be better, I will do some research about this
> approach.
Based on the second idea, I wrote a new patch (see attachment).

Regards,
Wang wei


0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch
Description:  0001-Fix-the-timeout-of-subscriber-in-long-transactions.patch


RE: Logical replication timeout problem

2022-02-07 Thread wangw.f...@fujitsu.com
On Sat, Jan 28, 2022 at 19:36 PM Fabrice Chapuis  
wrote:
> shouldn't we use receiver_timeout in place of wal_sender_timeout because de
> problem comes from the consummer.
Thanks for your review.

IMO, since this is a bug fix on the publisher side, and the keepalive message
is already sent based on wal_sender_timeout in the existing code, we should
keep it consistent with the existing code.

Regards,
Wang wei


Re: Refactoring the regression tests for more independence

2022-02-07 Thread Julien Rouhaud
On Mon, Feb 07, 2022 at 02:00:25PM -0500, Tom Lane wrote:
> Not too surprisingly, these patches broke during the commitfest.
> Here's a rebased version.
> 
> I'm not sure that anyone wants to review these in detail ...
> should I just go ahead and push them?

I don't see anything shocking after a quick glance, and I don't think any
review is going to give any more confidence compared to the script-dep-testing
script, so +1 for pushing them since the cf bot is green again.




Re: [RFC] building postgres with meson - perl embedding

2022-02-07 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> What is the reason behind subtracting ccdlflags?

> It looks like the coding actually originated here:
> commit f5d0c6cad5bb2706e0e63f3f8f32e431ea428100

Ah, here's the thread leading up to that:

https://www.postgresql.org/message-id/flat/200106191206.f5JC6R108371%40candle.pha.pa.us

The use of ldopts rather than hand-hacked link options seems to date to
0ed7864d6, only a couple days before that.  I don't think we had a
buildfarm then, but I'd bet against the problem being especially
widespread even then, or more people would've complained.


BTW, the business with zapping arch options seems to not be necessary
anymore either on recent macOS:

$ perl -MExtUtils::Embed -e ldopts
   -fstack-protector-strong  
-L/System/Library/Perl/5.30/darwin-thread-multi-2level/CORE -lperl
$ perl -MConfig -e 'print $Config{ccdlflags}'
 $

(same results on either Intel or ARM Mac).  However, it looks like it
is still necessary to keep locust happy, and I have no idea just when
Apple stopped using arch switches here, so we'd better keep that.

regards, tom lane




Add tag/category to the commitfest app

2022-02-07 Thread Julien Rouhaud
Hi,

During the last commitfest, some new contributors mentioned to me that they
were having trouble finding patches to review.

And indeed, the commitfest app doesn't provide any kind of information about
a patch's difficulty (which is different from the patch size, so you can't
even try to estimate it that way), whether there's more need for acceptance
of the feature/approach than for a code review, whether it's a POC, and so
on.  That's not welcoming.

I was thinking that we could add a tag system, probably with some limited
predefined tags, to address that problem.

Do you think it's worth adding, or do you have a better idea to help newer
contributors?




Re: GUC flags

2022-02-07 Thread Michael Paquier
On Sun, Feb 06, 2022 at 09:04:14PM -0600, Justin Pryzby wrote:
> Your test is checking that stuff in sample.conf is actually a GUC and not
> marked NOT_IN_SAMPLE.  But those are both unlikely mistakes to make.

Yeah, you are right.  Still, I don't see any reason to not include both.

> I'd first parse the GUC-like lines in the file, making a list of gucs_in_file
> and then compare the two lists.

This is a good idea, and it makes the tests faster because there is no
need to test each GUC separately.  While testing a bit more, I was
reminded that config_file is not marked as NOT_IN_SAMPLE and yet is not
in postgresql.conf.sample, so the new case you suggested was failing.

What do you think about the updated version attached?  I have applied
the addition of config_data() separately.
--
Michael
From dd9f0cb34960e89dd1ea5c12ab03ee1bc6884e1f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 8 Feb 2022 10:10:42 +0900
Subject: [PATCH v2] Add TAP test to automate check_guc

---
 src/test/modules/test_misc/t/003_check_guc.pl | 105 ++
 1 file changed, 105 insertions(+)
 create mode 100644 src/test/modules/test_misc/t/003_check_guc.pl

diff --git a/src/test/modules/test_misc/t/003_check_guc.pl b/src/test/modules/test_misc/t/003_check_guc.pl
new file mode 100644
index 00..01f954e09c
--- /dev/null
+++ b/src/test/modules/test_misc/t/003_check_guc.pl
@@ -0,0 +1,105 @@
+# Tests to cross-check the consistency of GUC parameters with
+# postgresql.conf.sample.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 3;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+# Grab the names of all the parameters that can be listed in the
+# configuration sample file.  config_file is an exception.  It is not
+# in postgresql.conf.sample but is part of the lists from guc.c.
+my $all_params = $node->safe_psql(
+	'postgres',
+	"SELECT name
+ FROM pg_settings
+   WHERE NOT 'NOT_IN_SAMPLE' = ANY (pg_settings_get_flags(name)) AND
+   name <> 'config_file'
+ ORDER BY 1");
+# Note the lower-case conversion, for consistency.
+my @all_params_array = split("\n", lc($all_params));
+
+# Grab the names of all parameters marked as NOT_IN_SAMPLE.
+my $not_in_sample = $node->safe_psql(
+	'postgres',
+	"SELECT name
+ FROM pg_settings
+   WHERE 'NOT_IN_SAMPLE' = ANY (pg_settings_get_flags(name))
+ ORDER BY 1");
+# Note the lower-case conversion, for consistency.
+my @not_in_sample_array = split("\n", lc($not_in_sample));
+
+# Find the location of postgresql.conf.sample, based on the information
+# provided by pg_config.
+my $sample_file =
+  $node->config_data('--sharedir') . '/postgresql.conf.sample';
+
+# List of all the GUCs found in the sample file.
+my @gucs_in_file;
+
+# Read the sample file line-by-line, checking its contents to build a list
+# of everything known as a GUC.
+my $num_tests = 0;
+open(my $contents, '<', $sample_file)
+  || die "Could not open $sample_file: $!";
+while (my $line = <$contents>)
+{
+	# Check if this line matches a GUC parameter.
+	if ($line =~ m/^#?([_[:alpha:]]+) (= .*|[^ ]*$)/)
+	{
+		# Lower-case conversion matters for some of the GUCs.
+		my $param_name = lc($1);
+
+		# Ignore some exceptions.
+		next if $param_name eq "include";
+		next if $param_name eq "include_dir";
+		next if $param_name eq "include_if_exists";
+
+		# Update the list of GUCs found in the sample file, for the
+		# follow-up tests.
+		push @gucs_in_file, $param_name;
+	}
+}
+
+close $contents;
+
+# Cross-check that all the GUCs found in the sample file match the ones
+# fetched above.  This maps the arrays to a hash, making the creation of
+# each exclude and intersection list easier.
+my %gucs_in_file_hash  = map { $_ => 1 } @gucs_in_file;
+my %all_params_hash= map { $_ => 1 } @all_params_array;
+my %not_in_sample_hash = map { $_ => 1 } @not_in_sample_array;
+
+my @missing_from_file = grep(!$gucs_in_file_hash{$_}, @all_params_array);
+is(scalar(@missing_from_file),
+	0, "no parameters missing from postgresql.conf.sample");
+
+my @missing_from_list = grep(!$all_params_hash{$_}, @gucs_in_file);
+is(scalar(@missing_from_list), 0, "no parameters missing from guc.c");
+
+my @sample_intersect = grep($not_in_sample_hash{$_}, @gucs_in_file);
+is(scalar(@sample_intersect),
+	0, "no parameters marked as NOT_IN_SAMPLE in postgresql.conf.sample");
+
+# These would log some information only on errors.
+foreach my $param (@missing_from_file)
+{
+	print("found GUC $param in guc.c, missing from postgresql.conf.sample\n");
+}
+foreach my $param (@missing_from_list)
+{
+	print(
+		"found GUC $param in postgresql.conf.sample, with incorrect info in guc.c\n"
+	);
+}
+foreach my $param (@sample_intersect)
+{
+	print(
+		"found GUC $param in postgresql.conf.sample, marked as NOT_IN_SAMPLE\n"
+	);
+}
-- 
2.34.1





Re: 2022-02-10 release announcement draft

2022-02-07 Thread Jonathan S. Katz

On 2/6/22 10:20 PM, Justin Pryzby wrote:

On Sun, Feb 06, 2022 at 08:01:02PM -0500, Jonathan S. Katz wrote:

Hi,

Attached is a draft for the release announcement for the 2022-02-10
cumulative update release.

Please review for technical accuracy or if you believe any items should be
added/removed. Please provide feedback no later than 2020-02-10 0:00 AoE[1].


I guess you mean 2022 ;)


Maybe ;)


* The [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
`\password` command now defaults to setting the password for to the role defined
by `CURRENT_USER`. Additionally, the role name is now included in the password
prompt.


s/for to/for/


Done.


* Build extended statistics for partitioned tables. If you previously added
extended statistics to a partitioned table, you can fix this by running
[`ANALYZE`](https://www.postgresql.org/docs/current/sql-analyze.html).
[`autovacuum`](https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM)
currently does not process these statistics, so you must periodically run
`ANALYZE` on any partitioned tables that are using extended statistics.


Instead of "you can fix this" , I suggest to say "you should run ANALYZE on
those tables to collect updated statistics."


I opted for most of this suggestion.


It should say "not process these TABLES" (not stats), and should not say "that
are using extended statistics" since partitioned tables should be manually
analyzed whether or not they have extended stats.




It's good to say "..you should periodically run ANALYZE on partitioned tables
to collect updated statistics." (even though this patch doesn't change that).


I reread the release note and agree; I misinterpreted this. I updated the 
language.





* Fix crash with 
[multiranges](https://www.postgresql.org/docs/current/rangetypes.html)
when extracting a variable-length data types.


remove "a" or s/types/types/


Fixed.


* Disallow altering data type of a partitioned table's columns when the
partitioned table's row type is used as a composite type elsewhere.


the data types ?


This is correct, per 7ead9925f

I've attached the updated copy. Thanks!

Jonathan
The PostgreSQL Global Development Group has released an update to all supported
versions of PostgreSQL, including 14.2, 13.6, 12.10, 11.15, and 10.20. This
release fixes over 55 bugs reported over the last three months.

For the full list of changes, please review the
[release notes](https://www.postgresql.org/docs/release/).

Bug Fixes and Improvements
--

This update fixes over 55 bugs that were reported in the last several months.
The issues listed below affect PostgreSQL 14. Some of these issues may also
affect other supported versions of PostgreSQL.

Included in this release:

* Fix for a low probability scenario of index corruption when a HOT
([heap-only 
tuple](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD))
chain changes state during `VACUUM`. Encountering this issue is unlikely, but if
you are concerned, please consider
[reindexing](https://www.postgresql.org/docs/current/sql-reindex.html).
* Fix for using [`REINDEX 
CONCURRENTLY`](https://www.postgresql.org/docs/current/sql-reindex.html)
on [TOAST](https://www.postgresql.org/docs/current/storage-toast.html) table
indexes to prevent corruption. You can fix any TOAST indexes by
[reindexing](https://www.postgresql.org/docs/current/sql-reindex.html) them
again.
* The [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
`\password` command now defaults to setting the password for the role defined
by `CURRENT_USER`. Additionally, the role name is now included in the password
prompt.
* Build extended statistics for partitioned tables. If you previously added
extended statistics to a partitioned table, you should run
[`ANALYZE`](https://www.postgresql.org/docs/current/sql-analyze.html) on those
tables.
As 
[`autovacuum`](https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM)
currently does not process partitioned tables, you must periodically run
`ANALYZE` on any partitioned tables to update their statistics.
* Fix crash with [`ALTER 
STATISTICS`](https://www.postgresql.org/docs/current/sql-alterstatistics.html)
when the statistics object is dropped concurrently.
* Fix crash with 
[multiranges](https://www.postgresql.org/docs/current/rangetypes.html)
when extracting variable-length data types.
* Several fixes to the query planner that lead to incorrect query results.
* Several fixes for query plan 
[memoization](https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-ENABLE-MEMOIZE).
* Fix startup of a physical replica to tolerate transaction ID wraparound.
* When using logical replication, avoid duplicate transmission of a
partitioned table's data when the publication includes both the child and parent
tables.
* Disallow altering data type of a partitioned table's columns when the
partitioned table's row type 

Re: [RFC] building postgres with meson - perl embedding

2022-02-07 Thread Tom Lane
Andres Freund  writes:
> What is the reason behind subtracting ccdlflags?

It looks like the coding actually originated here:

commit f5d0c6cad5bb2706e0e63f3f8f32e431ea428100
Author: Bruce Momjian 
Date:   Wed Jun 20 00:26:06 2001 +

Apparently, on some systems, ExtUtils::Embed and MakeMaker are slightly
broken, and its impossible to make a shared library when compiling with
both CCDLFLAGS and LDDLFAGS, you have to pick one or the other.

Alex Pilosov

and Peter just copied the logic in 7662419f1.  Considering that
the point of 7662419f1 was to get rid of MakeMaker, maybe we no
longer needed that at that point.

On my RHEL box, the output of ldopts is sufficiently redundant
that the subtraction doesn't actually accomplish much:

$ perl -MExtUtils::Embed -e ldopts
-Wl,--enable-new-dtags -Wl,-z,relro -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld -Wl,-z,relro -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld -fstack-protector-strong 
-L/usr/local/lib  -L/usr/lib64/perl5/CORE -lperl -lpthread -lresolv -ldl -lm 
-lcrypt -lutil -lc

$ perl -MConfig -e 'print $Config{ccdlflags}'
-Wl,--enable-new-dtags -Wl,-z,relro -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld

which leads to

$ grep perl_embed_ldflags src/Makefile.global
perl_embed_ldflags  =  -Wl,-z,relro -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld -fstack-protector-strong 
-L/usr/local/lib  -L/usr/lib64/perl5/CORE -lperl -lpthread -lresolv -ldl -lm 
-lcrypt -lutil -lc

so the only thing we actually got rid of was -Wl,--enable-new-dtags,
which I think we'll put back anyway.

Things might be different elsewhere of course, but I'm tempted
to take out the ccdlflags subtraction and see what the buildfarm
says.

regards, tom lane




Re: [RFC] building postgres with meson - autogenerated headers

2022-02-07 Thread Andres Freund
Hi,

On 2022-02-07 16:30:53 -0500, Tom Lane wrote:
> > A second aspect that I'm wondering about is whether we should try to split
> > pg_config.h output a bit:
> 
> TBH I can't get excited about that.  I do not think that rebuilding
> with different options is a critical path.  ccache already does most
> of the heavy lifting when you do that sort of thing, anyway.

I've found it to be pretty painful when building with msvc, which doesn't have
ccache (yet at least), and where the process startup overhead is bigger.

Even on some other platforms it's useful - it takes a while on net/openbsd to
recompile postgres, even if everything is in ccache. If I test on some
platform I'll often install the most basic set, get the tests to run, and then
incrementally figure out what other packages need to be installed etc.

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - perl embedding

2022-02-07 Thread Andres Freund
Hi,

I was trying to fix a few perl embedding oddities in the meson
patchset.

Whenever I have looked at the existing code, I've been a bit confused about
the following code/comment in perl.m4:

# PGAC_CHECK_PERL_EMBED_LDFLAGS
# -
# We are after Embed's ldopts, but without the subset mentioned in
# Config's ccdlflags; [...]

pgac_tmp1=`$PERL -MExtUtils::Embed -e ldopts`
pgac_tmp2=`$PERL -MConfig -e 'print $Config{ccdlflags}'`
perl_embed_ldflags=`echo X"$pgac_tmp1" | sed -e "s/^X//" -e "s%$pgac_tmp2%%" -e ["s/ -arch [-a-zA-Z0-9_]*//g"]`

What is the reason behind subtracting ccdlflags?


The comment originates in:

commit d69a419e682c2d39c2355105a7e5e2b90357c8f0
Author: Tom Lane 
Date:   2009-09-08 18:15:55 +

Remove any -arch switches given in ExtUtils::Embed's ldopts from our
perl_embed_ldflags setting.  On OS X it seems that ExtUtils::Embed is
trying to force a universal binary to be built, but you need to specify
that a lot further upstream if you want Postgres built that way; the only
result of including -arch in perl_embed_ldflags is some warnings at the
plperl.so link step.  Per my complaint and Jan Otto's suggestion.

but the subtraction goes all the way back to

commit 7662419f1bc1a994193c319c9304dfc47e121c98
Author: Peter Eisentraut 
Date:   2002-05-28 16:57:53 +

Change PL/Perl and Pg interface build to use configured compiler and
Makefile.shlib system, not MakeMaker.


Greetings,

Andres Freund




Re: [PATCH] Add UPDATE WHERE OFFSET IN clause

2022-02-07 Thread Tom Lane
Chapman Flack  writes:
> On 02/07/22 00:59, Böszörményi Zoltán wrote:
>> UPDATE ... WHERE OFFSET n IN cursor;

> If added to UPDATE, should this be added to DELETE also?

FWIW, I think this is a really horrid hack.  For one thing, it's not
robust against not-strictly-linear FETCH/MOVE of the cursor.  It looks
to me like "OFFSET n" really means "the row that we read N reads ago",
not "the row N before the current cursor position".  I see that the
documentation does explain it that way, but it fails to account for
optimizations such as whether we implement moves by reading backwards
or rewind-and-read-forwards.  I don't think we want to expose that
sort of implementation detail.

I'm also pretty displeased with causing unbounded memory consumption for
every use of nodeLockRows, whether it has anything to do with a cursor or
not (never mind whether the cursor will ever be used for WHERE OFFSET IN).
Yeah, it's only a few bytes per row, but that will add up in queries that
process lots of rows.

regards, tom lane




Re: WaitLatchOrSocket seems to not count to 4 right...

2022-02-07 Thread Thomas Munro
On Tue, Feb 8, 2022 at 1:48 PM Fujii Masao  wrote:
> On 2022/02/08 7:00, Greg Stark wrote:
> > Unless I'm misreading this code I think the nevents in
> > WaitLatchOrSocket should really be 4 not 3. At least there are 4 calls
> > to AddWaitEventToSet in it and I think it's possible to trigger all 4.
>
> Good catch! I think you're right.
>
> As the quick test, I confirmed that the assertion failure happened when I 
> passed four possible events to WaitLatchOrSocket() in postgres_fdw.
>
> TRAP: FailedAssertion("set->nevents < set->nevents_space", File: "latch.c", 
> Line: 868, PID: 54424)

Yeah, the assertion shows there's no problem, but we should change it
to 4 (and one day we should make it dynamic and change udata to hold
an index instead of a pointer...) or, since it's a bit silly to add
both of those events, maybe we should do:

-   if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
-   AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
if ((wakeEvents & WL_EXIT_ON_PM_DEATH) && IsUnderPostmaster)
AddWaitEventToSet(set, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
  NULL, NULL);
+   else if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
+   AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
+




Re: WaitLatchOrSocket seems to not count to 4 right...

2022-02-07 Thread Fujii Masao




On 2022/02/08 7:00, Greg Stark wrote:

Unless I'm misreading this code I think the nevents in
WaitLatchOrSocket should really be 4 not 3. At least there are 4 calls
to AddWaitEventToSet in it and I think it's possible to trigger all 4.


Good catch! I think you're right.

As a quick test, I confirmed that the assertion failure happens when I 
pass four possible events to WaitLatchOrSocket() in postgres_fdw.

TRAP: FailedAssertion("set->nevents < set->nevents_space", File: "latch.c", Line: 868, PID: 54424)
0   postgres            0x000107efa49f ExceptionalCondition + 223
1   postgres            0x000107cbca0c AddWaitEventToSet + 76
2   postgres            0x000107cbd86e WaitLatchOrSocket + 430
3   postgres_fdw.so     0x00010848b1aa pgfdw_get_result + 218
4   postgres_fdw.so     0x00010848accb do_sql_command + 75
5   postgres_fdw.so     0x00010848c6b8 configure_remote_session + 40
6   postgres_fdw.so     0x00010848c32d connect_pg_server + 1629
7   postgres_fdw.so     0x00010848aa06 make_new_connection + 566
8   postgres_fdw.so     0x000108489a06 GetConnection + 550
9   postgres_fdw.so     0x000108497ba4 postgresBeginForeignScan + 260
10  postgres            0x000107a7d79f ExecInitForeignScan + 943
11  postgres            0x000107a5c8ab ExecInitNode + 683
12  postgres            0x000107a5028a InitPlan + 1386
13  postgres            0x000107a4fb66 standard_ExecutorStart + 806
14  postgres            0x000107a4f833 ExecutorStart + 83
15  postgres            0x000107d0277f PortalStart + 735
16  postgres            0x000107cfe150 exec_simple_query + 1168
17  postgres            0x000107cfd39e PostgresMain + 2110
18  postgres            0x000107c07e72 BackendRun + 50
19  postgres            0x000107c07438 BackendStartup + 552
20  postgres            0x000107c0621c ServerLoop + 716
21  postgres            0x000107c039f9 PostmasterMain + 6441
22  postgres            0x000107ae20d9 main + 809
23  libdyld.dylib       0x7fff2045cf3d start + 1
24  ???                 0x0003 0x0 + 3

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH] Add UPDATE WHERE OFFSET IN clause

2022-02-07 Thread Chapman Flack
Hi!

On 02/07/22 00:59, Böszörményi Zoltán wrote:
> UPDATE ... WHERE OFFSET n IN cursor;

If added to UPDATE, should this be added to DELETE also?

Regards,
-Chap




RE: logical replication empty transactions

2022-02-07 Thread osumi.takami...@fujitsu.com
Hi,


Thank you for updating the patch.

I'll quote one of the past discussions in order to move this thread
forward and make it more active.
On Friday, August 13, 2021 8:01 PM Ajin Cherian  wrote:
> On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila 
> wrote:
> >
> > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian  wrote:
> > >
> >
> > Let's first split the patch for prepared and non-prepared cases as
> > that will help to focus on each of them separately. BTW, why haven't
> > you considered implementing point 1b as explained by Andres in his
> > email [1]? I think we can send a keepalive message in case of
> > synchronous replication when we skip an empty transaction, otherwise,
> > it might delay in responding to transactions synchronous_commit mode.
> > I think in the tests done in the thread, it might not have been shown
> > because we are already sending keepalives too frequently. But what if
> > someone disables wal_sender_timeout or kept it to a very large value?
> > See WalSndKeepaliveIfNecessary. The other thing you might want to look
> > at is if the reason for frequent keepalives is the same as described
> > in the email [2].
> >
> 
> I have tried to address the comment here by modifying the
> ctx->update_progress callback function (WalSndUpdateProgress) provided
> for plugins. I have added an option
> by which the callback can specify if it wants to send keep_alives. And when
> the callback is called with that option set, walsender updates a flag
> force_keep_alive_syncrep.
> The Walsender in the WalSndWaitForWal for loop, checks this flag and if
> synchronous replication is enabled, then sends a keep alive.
> Currently this logic
> is added as an else to the current logic that is already there in
> WalSndWaitForWal, which is probably considered unnecessary and a source of
> the keep alive flood that you talked about. So, I can change that according to
> how that fix shapes up there. I have also added an extern function in 
> syncrep.c
> that makes it possible for walsender to query if synchronous replication is
> turned on.
Changing the timing so that the keepalive is sent at decoding-commit
time didn't look impossible to me, although my suggestion may be
somewhat ad hoc.

After sentPtr is initialized (from the confirmed_flush LSN), it is
updated from logical_decoding_ctx->reader->EndRecPtr in XLogSendLogical.
In XLogSendLogical, we update it after executing
LogicalDecodingProcessRecord.
This ordering forces the current implementation to wait for the next
iteration of WalSndWaitForWal before it can send a keepalive.

But I felt we could utilize the end_lsn passed to ReorderBufferCommit for
updating sentPtr. That end_lsn is the same LSN as ctx->reader->EndRecPtr,
which means advancing the timing of the sentPtr update for the commit case.
Then, if the transaction is empty in synchronous mode, the keepalive could
be sent directly from WalSndUpdateProgress, instead of setting the
force_keepalive_syncrep flag and checking it later.
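
Roughly, I imagine something like the sketch below (the skipped_xact
argument and the SyncRepEnabled() helper are placeholder names I made up
for illustration; they are not what the posted patch uses):

static void
WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
                     TransactionId xid, bool skipped_xact)
{
    static TimestampTz sendTime = 0;
    TimestampTz now = GetCurrentTimestamp();

    /*
     * If the output plugin skipped an empty transaction and synchronous
     * replication is in use, answer the waiting committer right away
     * instead of waiting for the next WalSndWaitForWal() iteration.
     */
    if (skipped_xact && SyncRepEnabled())
        WalSndKeepalive(false);

    /*
     * Track lag no more than once per
     * WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS, as before.
     */
#define WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS 1000
    if (!TimestampDifferenceExceeds(sendTime, now,
                                    WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS))
        return;

    LagTrackerWrite(lsn, now);
    sendTime = now;
}

This way a committer waiting in synchronous mode gets an answer for the
skipped transaction without waiting for the next WalSndWaitForWal() cycle.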


Best Regards,
Takamichi Osumi



Re: fix crash with Python 3.11

2022-02-07 Thread Tom Lane
Peter Eisentraut  writes:
> I've been struggling to make progress on this.  First, throwing the 
> Python exception suggested in #1 above would require a significant 
> amount of new code.  (We have code to create an exception out of 
> ErrorData, but no code to make one up from nothing.)  This would need 
> further thought on how to arrange the code sensibly.  Second, calling 
> AbortOutOfAnyTransaction() appears to clean up so much stuff that even 
> the data needed for error handling in PL/Python is wiped out.

Yeah, it's messy.

> One way to way to address the first problem is to not handle the "cannot 
> commit while a subtransaction is active" and similar cases manually but 
> let _SPI_commit() throw the error and then check in PL/Python for the 
> error code ERRCODE_INVALID_TRANSACTION_TERMINATION.  (The code in 
> _SPI_commit() is there after all explicitly for PLs, so if we're not 
> using it, then we're doing it wrong.)

TBH, I've thought from day one that _SPI_commit and friends are unusably
simplistic, precisely because they throw all this error-recovery
complexity onto the caller, and provide no tools to handle it without
dropping down to a lower level of abstraction.

> But in any case, for either implementation it seems then you'd get 
> different error behavior from .commit and .rollback calls depending on 
> the specific error, which seems weird.

Well, I think we *have to* do that.  The INVALID_TRANSACTION_TERMINATION
errors mean that we're in some kind of atomic execution context that
we mustn't destroy.  Other errors mean that the commit failed, and that
has to be cleaned up somehow.

(Note: there is also an INVALID_TRANSACTION_TERMINATION error in
HoldPinnedPortals, which is now seeming like a mistake to me; we should
change that to some other errcode, perhaps.  There are no others
besides _SPI_commit/_SPI_rollback.)

> Altogether, maybe the response to

>>> The existing test cases apparently fail to trip over that
>>> because Python just throws the exception again, but what if someone
>>> writes a plpython function that catches the exception and then tries
>>> to perform new SPI actions?

> perhaps should be, don't do that then?

That seems far south of acceptable to me.  I experimented with the
attached script, which in HEAD just results in aborting the Python
script -- not good, but at least the general state of the session
is okay.  With your patch, I get

INFO:  sqlstate: 23503
INFO:  after exception
WARNING:  buffer refcount leak: [8760] (rel=base/16384/38401, blockNum=0, 
flags=0x9380, refcount=1 1)
WARNING:  relcache reference leak: relation "testfk" not closed
WARNING:  relcache reference leak: relation "testpk" not closed
WARNING:  TupleDesc reference leak: TupleDesc 0x7f3a12e0de80 (38403,-1) still 
referenced
WARNING:  snapshot 0x1cba150 still active
DO
 f1 
----
  0
  1
(2 rows)

So aside from all the internal problems, we've actually managed to commit
an invalid database state.  I do not find that OK.

I think that maybe we could get somewhere by having _SPI_commit/rollback
work like this:

1. Throw the INVALID_TRANSACTION_TERMINATION errors if appropriate.
The caller can catch those and proceed if desired, knowing that the
transaction system is in the same state it was.

2. Use a PG_TRY block to do the commit or rollback.  On error,
roll back (knowing that we are not in a subtransaction) and then
re-throw the error.

If the caller catches an error other than INVALID_TRANSACTION_TERMINATION,
it can proceed, but it's still on the hook to do SPI_start_transaction
before it attempts any new database access (just as it would be if
there had not been an error).
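
Concretely, I'm imagining something of this shape (only a sketch of the
two steps, ignoring the chain case and other details; certainly not
final spi.c code):

static void
_SPI_commit(void)
{
    /*
     * Step 1: complain, without changing any transaction state, if we're
     * in a context where the transaction must not be ended.  Callers can
     * catch this error and continue with the transaction intact.
     */
    if (IsSubTransaction())
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_TERMINATION),
                 errmsg("cannot commit while a subtransaction is active")));

    /*
     * Step 2: try the commit.  On failure, abort the transaction ourselves
     * (we know we're not in a subtransaction here) and re-throw, so the
     * caller sees the original error with the transaction system in a
     * known state.
     */
    PG_TRY();
    {
        CommitTransactionCommand();
    }
    PG_CATCH();
    {
        AbortCurrentTransaction();
        PG_RE_THROW();
    }
    PG_END_TRY();
}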

We could provide a simplified API in which SPI_start_transaction is
automatically included for either success or failure, so that the caller
doesn't have the delayed-cleanup responsibility.  I'm not actually sure
that there is any situation where that couldn't be done, but putting it
into the existing functions would be a API break (... unless we turn
SPI_start_transaction into a no-op, which might be OK?)  Basically this'd
be making the behavior of SPI_commit_and_chain the norm, except you'd
have the option of reverting to default transaction settings instead
of the previous ones.

So basically where we'd end up is that plpython would do about what
you show in your patch, but then there's additional work needed in
spi.c so that we're not leaving the rest of the system in a bad state.

regards, tom lane

create extension if not exists plpython3u;

drop table if exists testpk, testfk;

create table testpk (id int primary key);

create table testfk(f1 int references testpk deferrable initially deferred);

DO
LANGUAGE plpython3u
$$
  plpy.execute("INSERT INTO testfk VALUES (0)")
  try:
plpy.commit()
  except Exception as e:
plpy.info('sqlstate: %s' % (e.sqlstate))
  plpy.info('after exception')
  plpy.execute("INSERT INTO testpk VALUES (1)")
  plpy.execute("INSERT INTO testfk VALUES 

WaitLatchOrSocket seems to not count to 4 right...

2022-02-07 Thread Greg Stark
Unless I'm misreading this code I think the nevents in
WaitLatchOrSocket should really be 4 not 3. At least there are 4 calls
to AddWaitEventToSet in it and I think it's possible to trigger all 4.

I guess it's based on knowing that nobody would actually set
WL_EXIT_ON_PM_DEATH and WL_POSTMASTER_DEATH on the same waitset?

-- 
greg




Re: style for typedef of function that will be pointed to

2022-02-07 Thread Chapman Flack
On 10/05/21 14:00, Chapman Flack wrote:
> On 10/05/21 13:47, Tom Lane wrote:
>>> An alternative I've sometimes used elsewhere is to typedef the function
>>> type itself, and use the * when declaring a pointer to it:
>>> typedef void Furbinator(char *furbee);
>>
>> Is that legal C?  I doubt that it was before C99 or so.  As noted
>> in the Ghostscript docs you came across, it certainly wouldn't have
>> been portable back in the day.
> 
> It compiles silently for me with gcc --std=c89 -Wpedantic
> 
> I think that's the oldest standard I can ask gcc about. Per the manpage,
> 'c89' is ISO C90 without its amendment 1, and without any gnuisms.

There are some places in the tree where AssertVariableIsOfType is being
cleverly used to achieve the same thing:

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
  AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);


void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{
  AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);


While clever, doesn't it seem like a strained way to avoid just saying:

typedef void ArchiveModuleInit(ArchiveModuleCallbacks *cb);


ArchiveModuleInit _PG_archive_module_init;

void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{


if indeed compilers C90 and later are happy with the straight typedef?

Not that one would go changing existing declarations. But perhaps it
could be on the table for new ones?

Regards,
-Chap




Re: [RFC] building postgres with meson - autogenerated headers

2022-02-07 Thread Tom Lane
Andres Freund  writes:
> I've been wondering whether we should try to have the generated pg_config.h
> look as similar as possible to autoconf/autoheader's, or not. And whether the
> way autoconf/autoheader define symbols makes sense when not using either
> anymore.

> To be honest, I do not really understand the logic behind when autoconf ends
> up with #defines that define a macro to 0/1, when a macro ends up defined or
> not, and when we end up with a macro defined to 1 or not defined at all.

Agreed, that always seemed entirely random to me too.  I'd be content
to end up with "defined or not defined" as the standard.  I think
we have way more #ifdef tests than #if tests, so changing the latter
seems more sensible than changing the former.

> A second aspect that I'm wondering about is whether we should try to split
> pg_config.h output a bit:

TBH I can't get excited about that.  I do not think that rebuilding
with different options is a critical path.  ccache already does most
of the heavy lifting when you do that sort of thing, anyway.

regards, tom lane




Re: Synchronizing slots from primary to standby

2022-02-07 Thread Andres Freund
Hi,

On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> +static void
> +ApplyLauncherStartSlotSync(TimestampTz *last_start_time, long *wait_time)
> +{
> [...]
> +
> + foreach(lc, slots)
> + {
> + WalRecvReplicationSlotData *slot_data = lfirst(lc);
> + LogicalRepWorker *w;
> +
> + if (!OidIsValid(slot_data->database))
> + continue;
> +
> + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> + w = logicalrep_worker_find(slot_data->database, InvalidOid,
> +InvalidOid, 
> false);
> + LWLockRelease(LogicalRepWorkerLock);
> +
> + if (w == NULL)
> + {
> + *last_start_time = now;
> + *wait_time = wal_retrieve_retry_interval;
> +
> + logicalrep_worker_launch(slot_data->database, 
> InvalidOid, NULL,
> +  
> BOOTSTRAP_SUPERUSERID, InvalidOid);

Do we really need a dedicated worker for each single slot? That seems
excessively expensive.


> +++ b/src/backend/replication/logical/reorderbuffer.c
> [...]
> +static void
> +wait_for_standby_confirmation(XLogRecPtr commit_lsn)
> +{
> + char   *rawname;
> + List   *namelist;
> + ListCell   *lc;
> + XLogRecPtr  flush_pos = InvalidXLogRecPtr;
> +
> + if (strcmp(standby_slot_names, "") == 0)
> + return;
> +
> + rawname = pstrdup(standby_slot_names);
> + SplitIdentifierString(rawname, ',', &namelist);
> +
> + while (true)
> + {
> + int wait_slots_remaining;
> + XLogRecPtr  oldest_flush_pos = InvalidXLogRecPtr;
> + int rc;
> +
> + wait_slots_remaining = list_length(namelist);
> +
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + for (int i = 0; i < max_replication_slots; i++)
> + {
> + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> + boolinlist;
> +
> + if (!s->in_use)
> + continue;
> +
> + inlist = false;
> + foreach (lc, namelist)
> + {
> + char *name = lfirst(lc);
> + if (strcmp(name, NameStr(s->data.name)) == 0)
> + {
> + inlist = true;
> + break;
> + }
> + }
> + if (!inlist)
> + continue;
> +
> + SpinLockAcquire(&s->mutex);

It doesn't seem like a good idea to perform O(max_replication_slots *
#standby_slot_names) work on each decoded commit. Nor to
SplitIdentifierString(pstrdup(standby_slot_names)) every time.
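
As a sketch of what I mean (the function name, the cache variables and the
error handling here are made up purely for illustration), the GUC could be
split once and the result cached:

static char *cached_names_guc = NULL;   /* pristine copy, for change detection */
static char *cached_names_buf = NULL;   /* chopped up by SplitIdentifierString() */
static List *cached_standby_slots = NIL;

static List *
get_standby_slot_names(void)
{
    if (cached_names_guc == NULL ||
        strcmp(cached_names_guc, standby_slot_names) != 0)
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
        char       *buf = pstrdup(standby_slot_names);
        List       *slots = NIL;

        /* SplitIdentifierString() scribbles on buf; the list points into it */
        if (!SplitIdentifierString(buf, ',', &slots))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("invalid list syntax in parameter \"%s\"",
                            "standby_slot_names")));

        /* replace the old cache only once parsing has succeeded */
        list_free(cached_standby_slots);
        if (cached_names_buf)
            pfree(cached_names_buf);
        if (cached_names_guc)
            pfree(cached_names_guc);

        cached_standby_slots = slots;
        cached_names_buf = buf;
        cached_names_guc = pstrdup(standby_slot_names);

        MemoryContextSwitchTo(oldcxt);
    }

    return cached_standby_slots;
}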


> + if (s->data.database == InvalidOid)
> + /* Physical slots advance restart_lsn on flush 
> and ignore confirmed_flush_lsn */
> + flush_pos = s->data.restart_lsn;
> + else
> + /* For logical slots we must wait for commit 
> and flush */
> + flush_pos = s->data.confirmed_flush;
> +
> + SpinLockRelease(&s->mutex);
> +
> + /* We want to find out the min(flush pos) over all 
> named slots */
> + if (oldest_flush_pos == InvalidXLogRecPtr
> + || oldest_flush_pos > flush_pos)
> + oldest_flush_pos = flush_pos;
> +
> + if (flush_pos >= commit_lsn && wait_slots_remaining > 0)
> + wait_slots_remaining --;
> + }
> + LWLockRelease(ReplicationSlotControlLock);
> +
> + if (wait_slots_remaining == 0)
> + return;
> +
> + rc = WaitLatch(MyLatch,
> +WL_LATCH_SET | WL_TIMEOUT | 
> WL_POSTMASTER_DEATH,
> +1000L, PG_WAIT_EXTENSION);

I don't think it's a good idea to block here like this - no walsender specific
handling is going to happen. E.g. not sending replies to receivers will cause
them to time out.   And for the SQL functions this will cause blocking even
though the interface expects to return when reaching the end of the WAL -
which this pretty much is.


I think this needs to be restructured so that you only do the checking of the
"up to this point" position when needed, rather than every commit. We already
*have* a check for not replaying further than the flushed WAL position, see
the GetFlushRecPtr() calls in WalSndWaitForWal(),
pg_logical_slot_get_changes_guts(). I think you'd basically need to 

Re: Synchronizing slots from primary to standby

2022-02-07 Thread Andres Freund
Hi,

On 2022-02-07 13:38:38 +0530, Ashutosh Sharma wrote:
> Are you talking about this scenario - what if the logical replication
> slot on the publisher is dropped, but is being referenced by the
> standby where the slot is synchronized?

It's a bit hard to say, because neither in this thread nor in the patch have I
found a clear description of what the syncing needs to, and tries to,
guarantee. It might be that that was discussed in one of the precursor
threads, but...

Generally I don't think we can permit scenarios where a slot can be in a
"corrupt" state, i.e. missing required catalog entries, after "normal"
administrative commands (i.e. not mucking around in catalog entries / on-disk
files). Even if the sequence of commands may be a bit weird. All such cases
need to be either prevented or detected.


As far as I can tell, the way this patch keeps slots on physical replicas
"valid" is solely by reorderbuffer.c blocking during replay via
wait_for_standby_confirmation().

Which means that if e.g. the standby_slot_names GUC differs from
synchronize_slot_names on the physical replica, the slots synchronized on the
physical replica are not going to be valid.  Or if the primary drops its
logical slots.


> Should the redo function for the drop replication slot have the capability
> to drop it on standby and its subscribers (if any) as well?

Slots are not WAL logged (and shouldn't be).

I think you pretty much need the recovery conflict handling infrastructure I
referenced upthread, which recognizes during replay whether a record conflicts
with a slot on a standby.  And then on top of that you can build something like
this patch.

Greetings,

Andres Freund




Re: [PATCH] nodeindexscan with reorder memory leak

2022-02-07 Thread Alexander Korotkov
Hi!

On Mon, Feb 7, 2022 at 11:42 AM Aliaksandr Kalenik  wrote:
> Thanks for your review!
>
> On Sun, Jan 30, 2022 at 7:24 PM Tom Lane  wrote:
> > Actually, that code has got worse problems than that.  I tried to improve
> > our regression tests to exercise that code path, as attached.  What I got
> > was
> >
> > +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1
> > <-> point(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x;
> > +ERROR:  index returned tuples in wrong order
>
> I tried to figure out what is the problem with this query. This error
> happens when actual distance is less than estimated distance.
> For this specific query it happened while comparing these values:
> 50.263279680219099532223481219262 (actual distance returned by
> dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266
> (bounding box distance returned by computeDistance in gistproc.c).
>
> So for me it looks like this error is not really related to KNN scan
> code but to some floating-arithmetic issue in distance calculation
> functions for the geometry type. It would be great to figure out a fix, but
> for now I didn't manage to find a better way than comparing the
> difference of distances with FLT_EPSILON, which definitely doesn't seem
> like the right way to fix it :(

Probably, this is caused by some compiler optimization.  Could you
re-check the issue with different compilers and optimization levels?

Regarding the memory leak, could you add a corresponding regression
test to the patch (probably similar to Tom's query upthread)?

--
Regards,
Alexander Korotkov




Re: Documentation about PL transforms

2022-02-07 Thread Chapman Flack
On 02/07/22 10:59, Peter Eisentraut wrote:
> On 05.02.22 00:55, Chapman Flack wrote:
>> I'm thinking plhandler.sgml is the only place that really needs to be
>> ...
>> worth mentioning there, though, that it isn't possible to develop
>> transforms for an arbitrary PL unless that PL applies transforms.)
> 
> makes sense
> 
>> (argument_type [, ...]) even though check_transform_function will reject
>> any argument list of length other than 1 or with type other than internal.
> 
> That could be corrected.

I'll work on some doc patches.

>> As long as a PL handler has the sole responsibility for invoking
>> its transforms, I wonder if it would make sense to allow a PL to
>> define what its transforms should look like, maybe replacing
>> check_transform_function with a transform_validator for the PL.
> 
> Maybe.  This kind of thing would mostly be driven what a PL wants in actual
> practice, and then how that could be generalized to many/all PLs.

It has since occurred to me that another benefit of having a
transform_validator per PL would be immediate error reporting
if someone, for whatever reason, tries out CREATE TRANSFORM
for a PL that doesn't grok transforms.

Regards,
-Chap




Re: [RFC] building postgres with meson - autogenerated headers

2022-02-07 Thread Andres Freund
Hi,

I've been wondering whether we should try to have the generated pg_config.h
look as similar as possible to autoconf/autoheader's, or not. And whether the
way autoconf/autoheader define symbols makes sense when not using either
anymore.

To be honest, I do not really understand the logic behind when autoconf ends
up with #defines that define a macro to 0/1, when a macro ends up defined or
not, and when we end up with a macro defined to 1 or not defined at all.

So far I've tried to mirror the logic, but not the description / comment
formatting of the individual macros.

The "defined to 1 or not defined at all" behaviour is a mildly awkward to
achieve with meson, because it doesn't match the behaviour for booleans
options meson has (there are two builtin behaviours, one to define/undefine a
macro, the other to set the macro to 0/1. But there's none that defines a
macro to 1 or undefines it).

Probably best to initially have the macros defined as similar as reasonably
possible, but subsequently clean things up a bit.


A second aspect that I'm wondering about is whether we should try to split
pg_config.h output a bit:

With meson it's easy to change options like whether to build with some
dependency in an existing build tree and then still get a reliable build
result (ninja rebuilds if the commandline changed from the last invocation).

But right now doing so often ends up with way bigger rebuilds than necessary,
because for a lot of options we add #define USE_LDAP 1 etc to pg_config.h,
which of course requires rebuilding a lot of files. Even though most of these
symbols are only checked in a handful of files, often only .c files.

It seems like it might make sense to separate out defines that depend on the
compiler / "standard libraries" (e.g. {SIZEOF,ALIGNOF}_*,
HAVE_DECL_{STRNLEN,...}, HAVE_*_H) from feature defines (like
USE_{LDAP,ICU,...}). The header containing the latter could then included in
the places needing it (or we could have one header for each of the places
using it).

Perhaps we should also separate out configure-time settings like BLCKSZ,
DEF_PGPORT, etc. Realistically most of them are going to require a "full tree"
recompile anyway, but it seems like it might make things easier to understand.

I think a split into pg_config_{platform,features,settings}.h could make sense.
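
I.e. roughly something like this, with the file names above being purely
illustrative:

/* hypothetical split, instead of one big pg_config.h */
#include "pg_config_platform.h"     /* SIZEOF_*, ALIGNOF_*, HAVE_*_H, ... */
#include "pg_config_features.h"     /* USE_LDAP, USE_ICU, ... */
#include "pg_config_settings.h"     /* BLCKSZ, DEF_PGPORT, ... */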

Similar to above, it's probably best to do this separately after merging meson
support. But knowing what the split should eventually look like would be
helpful before, to ensure it's easy to do.

Greetings,

Andres Freund




Re: [BUG]Update Toast data failure in logical replication

2022-02-07 Thread Andres Freund
Hi,

On 2022-02-07 08:44:00 +0530, Amit Kapila wrote:
> Right, and it is getting changed. We are just printing the first 200
> characters (by using SQL [1]) from the decoded tuple so what is shown
> in the results is the initial 200 bytes.

Ah, I knew I must have been missing something.


> The complete decoded data after the patch is as follows:

Hm. I think we should change the way the strings are shortened - otherwise we
don't really verify much in that test. Perhaps we could just replace the long
repetitive strings with something shorter in the output?

E.g. using something like regexp_replace(data, '(1234567890|9876543210){200}', 
'\1{200}','g')
inside the substr().

Wonder if we should deduplicate the number of different toasted strings in the
file to something that'd allow us to have a single "redact_toast" function or
such. There are too many different ones to have a reasonably simple redaction
function right now. But that's perhaps better done separately. 

Greetings,

Andres Freund




Re: should vacuum's first heap pass be read-only?

2022-02-07 Thread Peter Geoghegan
Yes, that's what I meant. That's always how I thought that it would work,
for over a year now. I might have jumped to the conclusion that that's what
you had in mind all along. Oops.

Although this design is simpler, which is an advantage, that's not really
the point. The point is that it makes sense, and that doing extra heap
vacuuming concurrently with pruning doesn't seem useful at all.
-- 
Peter Geoghegan


Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit

2022-02-07 Thread Fujii Masao




On 2022/02/07 14:35, Etsuro Fujita wrote:

0001 patch failed to be applied. Could you rebase the patch?


Done.  Attached is an updated version of the patch set.


Thanks for updating the patch! Here are the review comments for 0001 patch.

I got the following compiler warning.

[16:58:07.120] connection.c: In function ‘pgfdw_finish_pre_commit_cleanup’:
[16:58:07.120] connection.c:1726:4: error: ISO C90 forbids mixed declarations 
and code [-Werror=declaration-after-statement]
[16:58:07.120]  1726 |PGresult   *res;
[16:58:07.120]   |^~~~


+   /* Ignore errors in the DEALLOCATE (see note above) */
+   if ((res = PQgetResult(entry->conn)) != NULL)

Doesn't PQgetResult() need to be called repeatedly until it returns NULL or the 
connection is lost, because there can be more than one message to receive?
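
That is, something like this (just a sketch, with error handling elided):

    PGresult   *res;

    /* Ignore errors in the DEALLOCATE (see note above) */
    while ((res = PQgetResult(entry->conn)) != NULL)
        PQclear(res);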


+   if (pending_deallocs)
+   {
+   foreach(lc, pending_deallocs)

If pending_deallocs is NIL, we don't enter this foreach loop. So the "if
(pending_deallocs)" check is probably unnecessary.


entry->keep_connections = defGetBoolean(def);
+   if (strcmp(def->defname, "parallel_commit") == 0)
+   entry->parallel_commit = defGetBoolean(def);

Isn't it better to use "else if" here, instead?
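
I mean something like this (the surrounding check for keep_connections is
paraphrased, not copied from the patch):

    if (strcmp(def->defname, "keep_connections") == 0)
        entry->keep_connections = defGetBoolean(def);
    else if (strcmp(def->defname, "parallel_commit") == 0)
        entry->parallel_commit = defGetBoolean(def);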


+static void do_sql_command_begin(PGconn *conn, const char *sql);
+static void do_sql_command_end(PGconn *conn, const char *sql);

To simplify the code more, I'm tempted to change do_sql_command() so that it 
just calls the above two functions, instead of calling PQsendQuery() and 
pgfdw_get_result() directly. Thought? If we do this, we probably also need to
change do_sql_command_end() so that it accepts a boolean flag which specifies
whether PQconsumeInput() is called or not, as follows.

do_sql_command_end(PGconn *conn, const char *sql, bool consumeInput)
{
/*
 * If any data is expected to be available from the socket, consume it.
 * ...
 * When parallel_commit is enabled, since there can be a time window
 * between sending the query and receiving the result, we can expect the
 * data to already be available from the socket. In this case we try to
 * consume it first. Otherwise ...
 */
if (consumeInput && !PQconsumeInput(conn))
...
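
For instance, a slightly more complete shape of that split might look like
the following (still only a sketch, not tested):

static void
do_sql_command_begin(PGconn *conn, const char *sql)
{
    if (!PQsendQuery(conn, sql))
        pgfdw_report_error(ERROR, NULL, conn, false, sql);
}

static void
do_sql_command_end(PGconn *conn, const char *sql, bool consumeInput)
{
    PGresult   *res;

    /*
     * If the command was sent earlier (parallel commit), its result may
     * already be waiting on the socket, so consume it before waiting.
     */
    if (consumeInput && !PQconsumeInput(conn))
        pgfdw_report_error(ERROR, NULL, conn, false, sql);

    res = pgfdw_get_result(conn, sql);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        pgfdw_report_error(ERROR, res, conn, true, sql);
    PQclear(res);
}

static void
do_sql_command(PGconn *conn, const char *sql)
{
    do_sql_command_begin(conn, sql);
    do_sql_command_end(conn, sql, false);
}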

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-07 Thread Peter Geoghegan
On Mon, Feb 7, 2022 at 12:21 PM Robert Haas  wrote:
>
> On Mon, Feb 7, 2022 at 11:43 AM Peter Geoghegan  wrote:
> > > That's because, if VACUUM is only ever getting triggered by XID
> > > age advancement and not by bloat, there's no opportunity for your
> > > patch set to advance relfrozenxid any sooner than we're doing now.
> >
> > We must distinguish between:
> >
> > 1. "VACUUM is fundamentally never going to need to run unless it is
> > forced to, just to advance relfrozenxid" -- this applies to tables
> > like the stock and customers tables from the benchmark.
> >
> > and:
> >
> > 2. "VACUUM must sometimes run to mark newly appended heap pages
> > all-visible, and maybe to also remove dead tuples, but not that often
> > -- and yet we currently only get expensive and inconveniently timed
> > anti-wraparound VACUUMs, no matter what" -- this applies to all the
> > other big tables in the benchmark, in particular to the orders and
> > order lines tables, but also to simpler cases like pgbench_history.
>
> It's not really very understandable for me when you refer to the way
> table X behaves in Y benchmark, because I haven't studied that in
> enough detail to know. If you say things like insert-only table, or a
> continuous-random-updates table, or whatever the case is, it's a lot
> easier to wrap my head around it.

What I've called category 2 tables are the vast majority of big tables
in practice. They include pure append-only tables, but also tables
that grow and grow from inserts, but also have some updates. The point
of the TPC-C order + order lines examples was to show how broad the
category really is. And how mixtures of inserts and bloat from updates
on one single table confuse the implementation in general.

> > Does that make sense? It's pretty subtle, admittedly, and you no doubt
> > have (very reasonable) concerns about the extremes, even if you accept
> > all that. I just want to get the general idea across here, as a
> > starting point for further discussion.
>
> Not really. I think you *might* be saying tables which currently get
> only wraparound vacuums will end up getting other kinds of vacuums
> with your patch because things will improve enough for other tables in
> the system that they will be able to get more attention than they do
> currently.

Yes, I am.

> But I'm not sure I am understanding you correctly, and even
> if I am I don't understand why that would be so, and even if it is I
> think it doesn't help if essentially all the tables in the system are
> suffering from the problem.

When I say "relfrozenxid advancement has been qualitatively improved
by the patch", what I mean is that we are much closer to a rate of
relfrozenxid advancement that is far closer to the theoretically
optimal rate for our current design, with freezing and with 32-bit
XIDs, and with the invariants for freezing.

Consider the extreme case, and generalize. In the simple append-only
table case, it is most obvious. The final relfrozenxid is very close
to OldestXmin (only tiny noise level differences appear), regardless
of XID consumption by the system in general, and even within the
append-only table in particular. Other cases are somewhat trickier,
but have roughly the same quality, to a surprising degree. Lots of
things that never really should have affected relfrozenxid to begin
with do not, for the first time.

-- 
Peter Geoghegan




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear David

There are two approaches for storing temporal information in a relational
database, explored since the 1980s following the work of Richard Snodgrass
http://www2.cs.arizona.edu/~rts/publications.html
tuple-timestamping vs attribute-timestamping.  The SQL standard used the
tuple-timestamping approach, but in MobilityDB we decided to use the
attribute-timestamping approach. As you rightly pointed out,
tuple-timestamping follows the traditional relational normalization theory.

The main advantage of attribute timestamping for mobility data is that
we only need to store the changes in value of a temporal attribute. In
the example of the gear of a car, even if we receive high-frequency
observations, there will be very few gear changes during a trip, while
there will be many more position changes. Therefore in MobilityDB we only
store the changes of values (e.g., no change of position is stored
during a red light or traffic jam), which constitutes a huge lossless
compression with respect to the raw format storing every observation in a
single row. We have measured 450% lossless compression on real IoT
data.

In addition, MobilityDB supports all temporal operations and allows
determining the value of any temporal attribute at any timestamp (e.g., using
linear interpolation between observations for speed or GPS position),
independently of the actual stored observations.

I hope this clarifies things a little.


Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-07 Thread Joshua Brindle
On Mon, Feb 7, 2022 at 12:09 PM Robert Haas  wrote:
>
> On Mon, Feb 7, 2022 at 11:13 AM Joe Conway  wrote:
> > Easily worked around with one additional level of role:
>
> Interesting.
>
> > > But in the absence of that, it seems clearly better for predefined
> > > roles to disregard INHERIT and just always grant the rights they are
> > > intended to give. Because if we don't do that, then we end up with
> > > people having to SET ROLE to the predefined role and perform actions
> > > directly as that role, which seems like it can't be what we want. I
> > > almost feel like we ought to be looking for ways of preventing people
> > > from doing SET ROLE to a predefined role altogether, not encouraging
> > > them to do it.
> > I disagree with this though.
> >
> > It is confusing and IMHO dangerous that the predefined roles currently
> > work differently than regular roles with respect to privilege inheritance.
>
> I feel like that's kind of a conclusory statement, as opposed to
> making an argument. I mean that this tells me something about how you
> feel, but it doesn't really help me understand why you feel that way.
>
> I suppose one argument in favor of your position is that if it
> happened to be sri who was granted a predefined role, sunita would
> inherit the rest of sr's privileges only with SET ROLE, but the
> predefined role either way (IIUC, which I might not). If that's so,
> then I guess I see the point, but I'm still sort of inclined to think
> we're just trading one set of problems in for a different set. I just
> have such a hard time imagining anyone using NOINHERIT in anger and
> being happy with the result.
>

IMO this is inarguably a plain bug. The inheritance system works one
way for pre-defined roles and another way for other roles - and the
difference isn't even documented.

The question is whether there is a security issue warranting back
patching, which is a bit of a tossup I think. According to git history
it's always worked this way, and the possible breakage of pre-existing
clusters seems maybe not worth it.




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-07 Thread Robert Haas
On Mon, Feb 7, 2022 at 11:43 AM Peter Geoghegan  wrote:
> > That's because, if VACUUM is only ever getting triggered by XID
> > age advancement and not by bloat, there's no opportunity for your
> > patch set to advance relfrozenxid any sooner than we're doing now.
>
> We must distinguish between:
>
> 1. "VACUUM is fundamentally never going to need to run unless it is
> forced to, just to advance relfrozenxid" -- this applies to tables
> like the stock and customers tables from the benchmark.
>
> and:
>
> 2. "VACUUM must sometimes run to mark newly appended heap pages
> all-visible, and maybe to also remove dead tuples, but not that often
> -- and yet we currently only get expensive and inconveniently timed
> anti-wraparound VACUUMs, no matter what" -- this applies to all the
> other big tables in the benchmark, in particular to the orders and
> order lines tables, but also to simpler cases like pgbench_history.

It's not really very understandable for me when you refer to the way
table X behaves in Y benchmark, because I haven't studied that in
enough detail to know. If you say things like insert-only table, or a
continuous-random-updates table, or whatever the case is, it's a lot
easier to wrap my head around it.

> Does that make sense? It's pretty subtle, admittedly, and you no doubt
> have (very reasonable) concerns about the extremes, even if you accept
> all that. I just want to get the general idea across here, as a
> starting point for further discussion.

Not really. I think you *might* be saying tables which currently get
only wraparound vacuums will end up getting other kinds of vacuums
with your patch because things will improve enough for other tables in
the system that they will be able to get more attention than they do
currently. But I'm not sure I am understanding you correctly, and even
if I am I don't understand why that would be so, and even if it is I
think it doesn't help if essentially all the tables in the system are
suffering from the problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Julien Rouhaud
On Mon, Feb 07, 2022 at 10:10:53AM -0700, David G. Johnston wrote:
> On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi 
> wrote:
> 
> >
> > As suggested by David, this goes beyond the "traditional" usage of
> > PostgreSQL. Therefore my questions are
> > * What is the suggested strategy to splitting these 2K attributes into
> > vertically partitioned tables where the tables are linked by the primary
> > key (e.g. trip number in the example above). Are there any limitations/best
> > practices in the number/size of TOASTED attributes that a table should
> > contain.
> > * In each partitioned table containing N TOASTED attributes, given the
> > above requirements, are there any limitations/best practices in storing
> > them using extended storage or an alternative one such as external.
> >
> >
> Frankly, the best practice is "don't have that many columns".  Since you
> do, I posit that you are just going to have to make decisions (possibly
> with experimentation) on your own.  Or maybe ask around on a MobilityDB
> forum what people using that tool and having these kinds of data structures
> do.  From a core PostgreSQL perspective you've already deviated from the
> model structures that it was designed with in mind.
> I'm really confused that you'd want the data value itself to contain a
> timestamp that, on a per-row basis, should be the same timestamp that every
> other value on the row has.  Extracting the timestamp to its own column and
> using simpler and atomic data types is how core PostgreSQL and the
> relational model normalization recommend dealing with this situation.  Then
> you just break up the attributes of a similar nature into their own tables
> based upon their shared nature.  In almost all cases relying on "main"
> storage.

Actually looking at the original example:

>CREATE TYPE tint (
>  internallength = variable,
>  [...]
>  storage = extended,
>  alignment = double,
>  [...]
>);

I'm wondering if it's just some miscommunication here.  If the tint data type
only needs to hold a timestamp and an int, I don't see why it would be
varlena at all.

So if a single tint can internally hold thousands of (int, timestamptz), a bit
like pgpointcloud, then having it extended by default (so both possibly
out-of-line and compressed) seems like a good idea, as you can definitely hit
the 8k boundary, it should compress nicely and you also avoid some quite high
tuple header overhead.




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread David G. Johnston
On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi 
wrote:

>
> As suggested by David, this goes beyond the "traditional" usage of
> PostgreSQL. Therefore my questions are
> * What is the suggested strategy to splitting these 2K attributes into
> vertically partitioned tables where the tables are linked by the primary
> key (e.g. trip number in the example above). Are there any limitations/best
> practices in the number/size of TOASTED attributes that a table should
> contain.
> * In each partitioned table containing N TOASTED attributes, given the
> above requirements, are there any limitations/best practices in storing
> them using extended storage or an alternative one such as external.
>
>
Frankly, the best practice is "don't have that many columns".  Since you
do, I posit that you are just going to have to make decisions (possibly
with experimentation) on your own.  Or maybe ask around on a MobilityDB
forum what people using that tool and having these kinds of data structures
do.  From a core PostgreSQL perspective you've already deviated from the
model structures that it was designed with in mind.
I'm really confused that you'd want the data value itself to contain a
timestamp that, on a per-row basis, should be the same timestamp that every
other value on the row has.  Extracting the timestamp to its own column and
using simpler and atomic data types is how core PostgreSQL and the
relational model normalization recommend dealing with this situation.  Then
you just break up the attributes of a similar nature into their own tables
based upon their shared nature.  In almost all cases relying on "main"
storage.

David J.


Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-07 Thread Robert Haas
On Mon, Feb 7, 2022 at 11:13 AM Joe Conway  wrote:
> Easily worked around with one additional level of role:

Interesting.

> > But in the absence of that, it seems clearly better for predefined
> > roles to disregard INHERIT and just always grant the rights they are
> > intended to give. Because if we don't do that, then we end up with
> > people having to SET ROLE to the predefined role and perform actions
> > directly as that role, which seems like it can't be what we want. I
> > almost feel like we ought to be looking for ways of preventing people
> > from doing SET ROLE to a predefined role altogether, not encouraging
> > them to do it.
> I disagree with this though.
>
> It is confusing and IMHO dangerous that the predefined roles currently
> work differently than regular roles with respect to privilege inheritance.

I feel like that's kind of a conclusory statement, as opposed to
making an argument. I mean that this tells me something about how you
feel, but it doesn't really help me understand why you feel that way.

I suppose one argument in favor of your position is that if it
happened to be sri who was granted a predefined role, sunita would
inherit the rest of sr's privileges only with SET ROLE, but the
predefined role either way (IIUC, which I might not). If that's so,
then I guess I see the point, but I'm still sort of inclined to think
we're just trading one set of problems in for a different set. I just
have such a hard time imagining anyone using NOINHERIT in anger and
being happy with the result.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: libpq async duplicate error results

2022-02-07 Thread Tom Lane
[ cc'ing Alvaro for pipeline questions ]

Fabien COELHO  writes:
>> It is annoying that some of the text is duplicated in the second
>> report, but in the end that's cosmetic, and I'm not sure we can
>> improve it without breaking other things.  In particular, we
>> can't just think about what comes back in the PGgetResult calls,
>> we also have to consider what PQerrorMessage(conn) will report
>> afterwards.  So I don't think that resetting conn->errorMessage
>> during a PQgetResult series would be a good fix.
>> 
>> An idea that I don't have time to pursue right now is to track
>> how much of conn->errorMessage has been read out by PQgetResult,
>> and only report newly-added text in later PQgetResult calls of
>> a series, while PQerrorMessage would continue to report the
>> whole thing.  Buffer resets would occur where they do now.

> I agree that the message reset is not convincing, but it was the only way 
> to get the expected behavior without changing the existing functions.

I spent a bit more time poking at this issue.  I no longer like the
idea of reporting only the tail part of conn->errorMessage; I'm
afraid that would convert "edge cases sometimes return redundant text"
into "edge cases sometimes miss returning text at all", which is not
an improvement.

In any case, the particular example we're looking at here is connection
loss, which is not something that should greatly concern us as far as
pipeline semantics are concerned, because you're not getting any more
pipelined results anyway if that happens.  In the non-I/O-error case,
I see that PQgetResult does a hacky "resetPQExpBuffer(>errorMessage)"
in between pipelined queries.  It seems like if there are real usability
issues in this area, they probably come from that not being well placed.
In particular, I wonder why that's done with the pqPipelineProcessQueue
call in the PGASYNC_IDLE stanza, yet not with the pqPipelineProcessQueue
call in the PGASYNC_READY stanza (where we're returning a PIPELINE_SYNC
result).  It kinda looks to me like maybe pqPipelineProcessQueue
ought to have the responsibility for doing that, rather than having
two out of the three call sites do it.  Also it would seem more natural
to associate it with the reinitialization that happens inside
pqPipelineProcessQueue.

regards, tom lane




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Many thanks for your prompt reply, David. Allow me then to restate the
questions, hoping that this better fits this mailing list.

MobilityDB is a time-series extension to PostgreSQL/PostGIS in which
time-varying attributes (e.g., gear, GPS location of a car) are
semantically grouped into "units" (e.g., a trip of a car) and are stored as
temporal functions, e.g., a set of couples (integer, timestamptz) for gear
(a temporal integer) or a set of triples (lon, lat, timestamptz) for the
GPS location (a temporal point). All temporal types are stored using
extended format, e.g.,
CREATE TYPE tint (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);
When ingesting mobility (IoT) data into MobilityDB we receive very wide (2K
attributes) high-frequency (every tenth of a second) records in flat format
(e.g., CSV), and we need to store them in PostgreSQL tables using MobilityDB
temporal types. In the above scenario, the values of these temporal types
can be very wide (on average 30K timestamped couples/triples per trip).

As suggested by David, this goes beyond the "traditional" usage of
PostgreSQL. Therefore my questions are
* What is the suggested strategy to splitting these 2K attributes into
vertically partitioned tables where the tables are linked by the primary
key (e.g. trip number in the example above). Are there any limitations/best
practices in the number/size of TOASTED attributes that a table should
contain.
* In each partitioned table containing N TOASTED attributes, given the
above requirements, are there any limitations/best practices in storing
them using extended storage or an alternative one such as external.

Many thanks for your insight

Esteban


Re: Make relfile tombstone files conditional on WAL level

2022-02-07 Thread Robert Haas
On Mon, Feb 7, 2022 at 11:31 AM Dilip Kumar  wrote:
> For RelFileNode also we need to use 2, 32-bit integers so that we do
> not add extra alignment padding because there are a few more
> structures that include RelFileNode e.g. xl_xact_relfilenodes,
> RelFileNodeBackend, and many other structures.

Are you sure that kind of stuff is really important enough to justify
the code churn? I don't think RelFileNodeBackend is used widely enough
or in sufficiently performance-critical places that we really need to
care about a few bytes of alignment padding. xl_xact_relfilenodes is
more concerning because that goes into the WAL format, but I don't
know that we use it often enough for an extra 4 bytes per record to
really matter, especially considering that this proposal also adds 4
bytes *per relfilenode* which has to be a much bigger deal than a few
padding bytes after 'nrels'. The reason why BufferTag matters a lot is
because (1) we have an array of this struct that can easily contain a
million or eight entries, so the alignment padding adds up a lot more
and (2) access to that array is one of the most performance-critical
parts of PostgreSQL.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-07 Thread Peter Geoghegan
On Mon, Feb 7, 2022 at 10:08 AM Robert Haas  wrote:
> But ... if I'm not mistaken, in the kind of case that Greg is
> describing, relfrozenxid will be advanced exactly as often as it is
> today.

But what happens today in a scenario like Greg's is pathological,
despite being fairly common (common in large DBs). It doesn't seem
informative to extrapolate too much from current experience for that
reason.

> That's because, if VACUUM is only ever getting triggered by XID
> age advancement and not by bloat, there's no opportunity for your
> patch set to advance relfrozenxid any sooner than we're doing now.

We must distinguish between:

1. "VACUUM is fundamentally never going to need to run unless it is
forced to, just to advance relfrozenxid" -- this applies to tables
like the stock and customers tables from the benchmark.

and:

2. "VACUUM must sometimes run to mark newly appended heap pages
all-visible, and maybe to also remove dead tuples, but not that often
-- and yet we currently only get expensive and inconveniently timed
anti-wraparound VACUUMs, no matter what" -- this applies to all the
other big tables in the benchmark, in particular to the orders and
order lines tables, but also to simpler cases like pgbench_history.

As I've said a few times now, the patch doesn't change anything for 1.
But Greg's problem tables very much sound like they're from category
2. And what we see with the master branch for such tables is that they
always get anti-wraparound VACUUMs, past a certain size (depends on
things like exact XID rate and VACUUM settings, the insert-driven
autovacuum scheduling stuff matters). While the patch never reaches
that point in practice, during my testing -- and doesn't come close.

It is true that in theory, as the size of ones of these "category 2"
tables tends to infinity, the patch ends up behaving the same as
master anyway. But I'm pretty sure that that usually doesn't matter at
all, or matters less than you'd think. As I emphasized when presenting
the recent v7 TPC-C benchmark, neither of the two "TPC-C big problem
tables" (which are particularly interesting/tricky examples of tables
from category 2) come close to getting an anti-wraparound VACUUM
(plus, as I said in the same email, wouldn't matter if they did).

> So I think that people in this kind of situation will potentially be
> helped or hurt by other things the patch set does, but the eager
> relfrozenxid stuff won't make any difference for them.

To be clear, I think it would if everything was in place, including
the basic relfrozenxid advancement thing, plus the new freezing stuff
(though you wouldn't need the experimental FSM thing to get this
benefit).

Here is a thought experiment that may make the general idea a bit clearer:

Imagine I reran the same benchmark as before, with the same settings,
and the expectation that everything would be the same as first time
around for the patch series. But to make things more interesting, this
time I add an adversarial element: I add an adversarial gizmo that
burns XIDs steadily, without doing any useful work. This gizmo doubles
the rate of XID consumption for the database as a whole, perhaps by
calling "SELECT txid_current()" in a loop, followed by a timed sleep
(with a delay chosen with the goal of doubling XID consumption). I
imagine that this would also burn CPU cycles, but probably not enough
to make more than a noise level impact -- so we're severely stressing
the implementation by adding this gizmo, but the stress is precisely
targeted at XID consumption and related implementation details. It's a
pretty clean experiment. What happens now?

I believe (though haven't checked for myself) that nothing important
would change. We'd still see the same VACUUM operations occur at
approximately the same times (relative to the start of the benchmark)
that we saw with the original benchmark, and each VACUUM operation
would do approximately the same amount of physical work on each
occasion. Of course, the autovacuum log output would show that the
OldestXmin for each individual VACUUM operation had larger values than
first time around for this newly initdb'd TPC-C database (purely as a
consequence of the XID burning gizmo), but it would *also* show
*concomitant* increases for our newly set relfrozenxid. The system
should therefore hardly behave differently at all compared to the
original benchmark run, despite this adversarial gizmo.

It's fair to wonder: okay, but what if it was 4x, 8x, 16x? What then?
That does get a bit more complicated, and we should get into why that
is. But for now I'll just say that I think that even that kind of
extreme would make much less difference than you might think -- since
relfrozenxid advancement has been qualitatively improved by the patch
series. It is especially likely that nothing would change if you were
willing to increase autovacuum_freeze_max_age to get a bit more
breathing room -- room to allow the autovacuums to run at their
"natural" times. You 

Re: should vacuum's first heap pass be read-only?

2022-02-07 Thread Robert Haas
On Fri, Feb 4, 2022 at 4:12 PM Peter Geoghegan  wrote:
> I had imagined that we'd
> want to do heap vacuuming in the same way as today with the dead TID
> conveyor belt stuff -- it just might take several VACUUM operations
> until we are ready to do a round of heap vacuuming.

I am trying to understand exactly what you are imagining here. Do you
mean we'd continue to lazy_scan_heap() at the start of every vacuum,
and lazy_vacuum_heap_rel() at the end? I had assumed that we didn't
want to do that, because we might already know from the conveyor belt
that there are some dead TIDs that could be marked unused, and it
seems strange to just ignore that knowledge at a time when we're
scanning the heap anyway. However, on reflection, that approach has
something to recommend it, because it would be somewhat simpler to
understand what's actually being changed. We could just:

1. Teach lazy_scan_heap() that it should add TIDs to the conveyor
belt, if we're using one, unless they're already there, but otherwise
work as today.

2. Teach lazy_vacuum_heap_rel() that, if there is a conveyor belt,
it should try to clear from the indexes all of the dead TIDs that are
eligible.

3. If there is a conveyor belt, use some kind of magic to decide when
to skip vacuuming some or all indexes. When we skip one or more
indexes, the subsequent lazy_vacuum_heap_rel() can't possibly mark as
unused any of the dead TIDs we found this time, so we should just skip
it, unless somehow there are TIDs on the conveyor belt that were
already ready to be marked unused at the start of this VACUUM, in
which case we can still handle those.

Is this the kind of thing you had in mind?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade should truncate/remove its logs before running

2022-02-07 Thread Julien Rouhaud
On Mon, Feb 07, 2022 at 11:00:22AM -0500, Andrew Dunstan wrote:
> 
> On 2/6/22 19:39, Tom Lane wrote:
> >
> > I note, though, that there's still not been any email to the buildfarm
> > owners list about this update.
> 
> The announcement was held up in list moderation for 20 hours or so.

I've certainly experienced way more than that in the past.  Are volunteers
needed?




Re: Make relfile tombstone files conditional on WAL level

2022-02-07 Thread Dilip Kumar
On Mon, Feb 7, 2022 at 9:42 PM Robert Haas  wrote:
>
> On Mon, Feb 7, 2022 at 12:26 AM Dilip Kumar  wrote:
> > I have splitted the patch into multiple patches which can be
> > independently committable and easy to review. I have explained the
> > purpose and scope of each patch in the respective commit messages.
>
> Hmm. The parts of this I've looked at seem reasonably clean, but I
> don't think I like the design choice. You're inventing
> RelFileNodeSetFork(), but at present the RelFileNode struct doesn't
> include a fork number. I feel like we should leave that alone, and
> only change the definition of a BufferTag. What about adding accessors
> for all of the BufferTag fields in 0001, and then in 0002 change it to
> look like something this:
>
> typedef struct BufferTag
> {
> Oid dbOid;
> Oid tablespaceOid;
> uint32  fileNode_low;
> uint32  fileNode_hi:24;
> uint32  forkNumber:8;
> BlockNumber blockNumber;
> } BufferTag;

Okay, we can do that.  But we can not leave RelFileNode untouched I
mean inside RelFileNode also we will have to change the relNode as 2
32 bit integers, I mean like below.

> typedef struct RelFileNode
> {
> Oid spcNode;
> Oid dbNode;
> uint32  relNode_low;
> uint32  relNode_hi;
> } RelFileNode;

For RelFileNode also we need to use 2, 32-bit integers so that we do
not add extra alignment padding because there are a few more
structures that include RelFileNode e.g. xl_xact_relfilenodes,
RelFileNodeBackend, and many other structures.
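
For example (just a sketch, these helper names are made up and not part of
the posted patches), code that needs the combined value could recombine the
two halves like this:

static inline uint64
RelFileNodeGetRelNumber(const RelFileNode *rnode)
{
    return ((uint64) rnode->relNode_hi << 32) | rnode->relNode_low;
}

static inline void
RelFileNodeSetRelNumber(RelFileNode *rnode, uint64 relnumber)
{
    rnode->relNode_low = (uint32) relnumber;
    rnode->relNode_hi = (uint32) (relnumber >> 32);
}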

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Refactoring SSL tests

2022-02-07 Thread Andrew Dunstan

On 2/2/22 14:50, Daniel Gustafsson wrote:
>> On 2 Feb 2022, at 17:09, Andrew Dunstan  wrote:
>> On 2/2/22 08:26, Daniel Gustafsson wrote:
>>> Thoughts?  I'm fairly sure there are many crimes against Perl in this patch,
>>> I'm happy to take pointers on how to improve that.
>> It feels a bit odd to me from a perl POV. I think it needs to more along
>> the lines of standard OO patterns. I'll take a stab at that based on
>> this, might be a few days.
> That would be great, thanks!
>

Here's the result of that surgery.  It's a little incomplete in that it
needs some POD adjustment, but I think the code is right - it passes
testing for me.

One of the advantages of this, apart from being more idiomatic, is that
by avoiding the use of package level variables you can have two
SSL::Server objects, one for OpenSSL and (eventually) one for NSS. This
was the original motivation for the recent install_path additions to
PostgreSQL::Test::Cluster, so it complements that work nicely.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From fa9bd034210aae4e11eed463e1a1365231c944f5 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan 
Date: Mon, 7 Feb 2022 11:04:53 -0500
Subject: [PATCH] ssl test refactoring

---
 src/test/ssl/t/001_ssltests.pl| 144 +--
 src/test/ssl/t/002_scram.pl   |  21 +-
 src/test/ssl/t/003_sslinfo.pl |  33 ++-
 src/test/ssl/t/SSL/Backend/OpenSSL.pm | 227 ++
 src/test/ssl/t/SSL/Server.pm  | 332 ++
 src/test/ssl/t/SSLServer.pm   | 219 -
 6 files changed, 647 insertions(+), 329 deletions(-)
 create mode 100644 src/test/ssl/t/SSL/Backend/OpenSSL.pm
 create mode 100644 src/test/ssl/t/SSL/Server.pm
 delete mode 100644 src/test/ssl/t/SSLServer.pm

diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index b1fb15ce80..d5383a58ce 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -8,20 +8,24 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-use File::Copy;
-
 use FindBin;
 use lib $FindBin::RealBin;
 
-use SSLServer;
+use SSL::Server;
 
 if ($ENV{with_ssl} ne 'openssl')
 {
 	plan skip_all => 'OpenSSL not supported by this build';
 }
-else
+
+my $ssl_server = SSL::Server->new();
+sub sslkey
+{
+	return $ssl_server->sslkey(@_);
+}
+sub switch_server_cert
 {
-	plan tests => 110;
+	$ssl_server->switch_server_cert(@_);
 }
 
  Some configuration
@@ -36,39 +40,6 @@ my $SERVERHOSTCIDR = '127.0.0.1/32';
 # Allocation of base connection string shared among multiple tests.
 my $common_connstr;
 
-# The client's private key must not be world-readable, so take a copy
-# of the key stored in the code tree and update its permissions.
-#
-# This changes to using keys stored in a temporary path for the rest of
-# the tests. To get the full path for inclusion in connection strings, the
-# %key hash can be interrogated.
-my $cert_tempdir = PostgreSQL::Test::Utils::tempdir();
-my %key;
-my @keys = (
-	"client.key",   "client-revoked.key",
-	"client-der.key",   "client-encrypted-pem.key",
-	"client-encrypted-der.key", "client-dn.key");
-foreach my $keyfile (@keys)
-{
-	copy("ssl/$keyfile", "$cert_tempdir/$keyfile")
-	  or die
-	  "couldn't copy ssl/$keyfile to $cert_tempdir/$keyfile for permissions change: $!";
-	chmod 0600, "$cert_tempdir/$keyfile"
-	  or die "failed to change permissions on $cert_tempdir/$keyfile: $!";
-	$key{$keyfile} = PostgreSQL::Test::Utils::perl2host("$cert_tempdir/$keyfile");
-	$key{$keyfile} =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os;
-}
-
-# Also make a copy of that explicitly world-readable.  We can't
-# necessarily rely on the file in the source tree having those
-# permissions.
-copy("ssl/client.key", "$cert_tempdir/client_wrongperms.key")
-  or die
-  "couldn't copy ssl/client_key to $cert_tempdir/client_wrongperms.key for permission change: $!";
-chmod 0644, "$cert_tempdir/client_wrongperms.key"
-  or die "failed to change permissions on $cert_tempdir/client_wrongperms.key: $!";
-$key{'client_wrongperms.key'} = PostgreSQL::Test::Utils::perl2host("$cert_tempdir/client_wrongperms.key");
-$key{'client_wrongperms.key'} =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os;
  Set up the server.
 
 note "setting up data directory";
@@ -83,31 +54,31 @@ $node->start;
 
 # Run this before we lock down access below.
 my $result = $node->safe_psql('postgres', "SHOW ssl_library");
-is($result, 'OpenSSL', 'ssl_library parameter');
+is($result, $ssl_server->ssl_library(), 'ssl_library parameter');
 
-configure_test_server_for_ssl($node, $SERVERHOSTADDR, $SERVERHOSTCIDR,
-	'trust');
+$ssl_server->configure_test_server_for_ssl($node, $SERVERHOSTADDR,
+		   $SERVERHOSTCIDR,	'trust');
 
 note "testing password-protected keys";
 
-open my $sslconf, '>', $node->data_dir . "/sslconfig.conf";
-print $sslconf "ssl=on\n";
-print $sslconf 

Re: RFC: Logging plan of the running query

2022-02-07 Thread Fujii Masao




On 2022/02/02 21:59, torikoshia wrote:

This may cause users to misunderstand that pg_log_query_plan() fails
while the target backend is holding *any* locks? Isn't it better to
mention "page-level locks", instead? So how about the following?

--
Note that the request to log the query plan will be ignored if it's
received during a short period while the target backend is holding a
page-level lock.
--


Agreed.


On second thought, isn't this note more confusing than helpful? Users don't
know when and which operations need a page-level lock. So now I'm thinking
it's better to remove this note.



As you pointed out offlist, the issue could occur even when both
pg_log_backend_memory_contexts and pg_log_query_plan are called, so it may be
better to address it in a separate patch.


OK.



You also pointed out offlist that it would be necessary to call
LockErrorCleanup() if ProcessLogQueryPlanInterrupt() can incur an ERROR.
AFAICS it can call ereport(ERROR), i.e., palloc0() in NewExplainState(), so I
added PG_TRY/CATCH and made it call LockErrorCleanup() when an ERROR occurs.


As we discussed off-list, this treatment might not be necessary. Because when 
ERROR or FATAL error happens, AbortTransaction() is called and 
LockErrorCleanup() is also called inside there.



Since I'm not sure how long it will take to discuss this point, the attached 
patch is based on the current HEAD at this time.


Thanks for updating the patch!

@@ -5048,6 +5055,12 @@ AbortSubTransaction(void)
 */
PG_SETMASK();
 
+	/*
+	 * When ActiveQueryDesc is referenced after abort, some of its elements
+	 * are freed. To avoid accessing them, reset ActiveQueryDesc here.
+	 */
+	ActiveQueryDesc = NULL;

Shouldn't AbortSubTransaction() reset ActiveQueryDesc to the
save_ActiveQueryDesc that ExecutorRun() set, instead of to NULL? Otherwise the
ActiveQueryDesc of the top-level statement will be unavailable after a
subtransaction is aborted in a nested statement.

For example, no plan is logged while the following "select pg_sleep(test())"
is running, because the exception inside the test() function aborts the
subtransaction and resets ActiveQueryDesc to NULL.

create or replace function test () returns int as $$
begin
perform 1/0;
exception when others then
return 30;
end;
$$ language plpgsql;

select pg_sleep(test());


-CREATE ROLE regress_log_memory;
+CREATE ROLE regress_log;

Isn't this name too generic? How about regress_log_function, for example?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-07 Thread Joe Conway

On 2/7/22 10:35, Robert Haas wrote:

On Sun, Feb 6, 2022 at 12:24 PM Tom Lane  wrote:

Joe Conway  writes:
> I'd like to pick this patch up and see it through to commit/push.
> Presumably that will include back-patching to all supported pg versions.
> Before I go through the effort to back-patch, does anyone want to argue
> that this should *not* be back-patched?

Hm, I'm -0.5 or so.  I think changing security-related behaviors
in a stable branch is a hard sell unless you are closing a security
hole.  This is a fine improvement for HEAD but I'm inclined to
leave the back branches alone.


I think the threshold to back-patch a clear behavior change is pretty
high, so I do not think it should be back-patched.


ok


I am also not convinced that a sufficient argument has been made for
changing this in master. This isn't the only thread where NOINHERIT
has come up lately, and I have to admit that I'm not a fan. Let's
suppose that I have two users, let's say sunita and sri. In the real
world, Sunita is Sri's manager and needs to be able to perform actions
as Sri when Sri is out of the office, but it isn't desirable for
Sunita to have Sri's privileges in all situations all the time. So we
mark role sunita as NOINHERIT and grant sri to sunita. Then it turns
out that Sunita also needs to be able to COPY TO/FROM PROGRAM, so we give her
pg_execute_server_program. Now, if she can't exercise this privilege
without setting role to the predefined role, that's bad, isn't it? I
mean, we want her to be able to copy between *her* tables and various
shell commands, not the tables owned by pg_execute_server_program, of
which there are presumably none.


Easily worked around with one additional level of role:

8<---
nmx=# create user sunita;
CREATE ROLE
nmx=# create user sri superuser;
CREATE ROLE
nmx=# create user sri_alt noinherit;
CREATE ROLE
nmx=# grant sri to sri_alt;
GRANT ROLE
nmx=# grant sri_alt to sunita;
GRANT ROLE
nmx=# grant pg_execute_server_program to sunita;
GRANT ROLE
nmx=# set session authorization sri;
SET
nmx=# create table foo(id int);
CREATE TABLE
nmx=# insert into foo values(42);
INSERT 0 1
nmx=# set session authorization sunita;
SET
nmx=> select * from foo;
ERROR:  permission denied for table foo
nmx=> set role sri;
SET
nmx=# select * from foo;
 id
----
 42
(1 row)

nmx=# reset role;
RESET
nmx=> select current_user;
 current_user
--------------
 sunita
(1 row)

nmx=> create temp table sfoo(f1 text);
CREATE TABLE
nmx=> copy sfoo(f1) from program 'id';
COPY 1
nmx=> select f1 from sfoo;
                                            f1
------------------------------------------------------------------------------------------
 uid=1001(postgres) gid=1001(postgres) groups=1001(postgres),108(ssl-cert),1002(pgconf)
(1 row)
8<---


It seems to me that the INHERIT role flag isn't very well-considered.
Inheritance, or the lack of it, ought to be decided separately for
each inherited role. However, that would be a major architectural
change.


Agreed -- that would be useful.


But in the absence of that, it seems clearly better for predefined
roles to disregard INHERIT and just always grant the rights they are
intended to give. Because if we don't do that, then we end up with
people having to SET ROLE to the predefined role and perform actions
directly as that role, which seems like it can't be what we want. I
almost feel like we ought to be looking for ways of preventing people
from doing SET ROLE to a predefined role altogether, not encouraging
them to do it.

I disagree with this though.

It is confusing and IMHO dangerous that the predefined roles currently
work differently from regular roles with respect to privilege inheritance.


In fact, I would extend that argument to the pseudo-role PUBLIC.

Joe
--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




Re: Make relfile tombstone files conditional on WAL level

2022-02-07 Thread Robert Haas
On Mon, Feb 7, 2022 at 12:26 AM Dilip Kumar  wrote:
> I have splitted the patch into multiple patches which can be
> independently committable and easy to review. I have explained the
> purpose and scope of each patch in the respective commit messages.

Hmm. The parts of this I've looked at seem reasonably clean, but I
don't think I like the design choice. You're inventing
RelFileNodeSetFork(), but at present the RelFileNode struct doesn't
include a fork number. I feel like we should leave that alone, and
only change the definition of a BufferTag. What about adding accessors
for all of the BufferTag fields in 0001, and then in 0002 change it to
look like something this:

typedef struct BufferTag
{
Oid dbOid;
Oid tablespaceOid;
uint32  fileNode_low;
uint32  fileNode_hi:24;
uint32  forkNumber:8;
BlockNumber blockNumber;
} BufferTag;

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Database-level collation version tracking

2022-02-07 Thread Julien Rouhaud
On Mon, Feb 07, 2022 at 04:44:24PM +0100, Peter Eisentraut wrote:
> On 07.02.22 11:29, Julien Rouhaud wrote:
> 
> > - there is no test
> 
> Suggestions where to put it?  We don't really have tests for the
> collation-level versioning either, do we?

There's only limited testing in the collate.* regression tests, so I thought it
would be ok to add it there.  At least some ALTER DATABASE ... REFRESH VERSION
would be good, similar to the collation-level versioning.

> > - it's missing some updates in create_database.sgml, and psql tab completion
> >for CREATE DATABASE with the new collation_version defelem.
> 
> Added to create_database.sgml, but not to psql.  We don't have completion
> for the collation option either, since it's only meant to be used by
> pg_upgrade, not interactively.

Ok.

> 
> > - that's not really something new with this patch, but should we output the
> >collation version info or mismatch info in \l / \dO?
> 
> It's a possibility.  Perhaps there is a question of performance if we show
> it in \l and people have tons of databases and they have to make a locale
> call for each one.  As you say, it's more an independent feature, but it's
> worth looking into.

Agreed.

> > +Oid
> > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> > +{
> > +   Relationrel;
> > +   Oid dboid;
> > +   HeapTuple   tup;
> > +   Datum   datum;
> > +   boolisnull;
> > +   char   *oldversion;
> > +   char   *newversion;
> > +
> > +   rel = table_open(DatabaseRelationId, RowExclusiveLock);
> > +   dboid = get_database_oid(stmt->dbname, false);
> > +
> > +   if (!pg_database_ownercheck(dboid, GetUserId()))
> > +   aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
> > +  stmt->dbname);
> > +
> > +   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
> > +   if (!HeapTupleIsValid(tup))
> > +   elog(ERROR, "cache lookup failed for database %u", dboid);
> > 
> > Is that ok to not obtain a lock on the database when refreshing the 
> > collation?
> 
> That code takes a RowExclusiveLock on pg_database.  Did you have something
> else in mind?

But that lock won't prevent a concurrent DROP DATABASE, so it's entirely
possible to hit that "cache lookup failed" error.  There should either be a
shdepLockAndCheckObject() call, or the error message should be changed to
something like errmsg("database \"xxx\" does not exist").

> > +   /*
> > +* Check collation version.  See similar code in
> > +* pg_newlocale_from_collation().
> > +*/
> > +   datum = SysCacheGetAttr(DATABASEOID, tup, 
> > Anum_pg_database_datcollversion,
> > +   );
> > +   if (!isnull)
> > +   {
> > 
> > This (and pg_newlocale_from_collation()) reports a problem if a recorded
> > collation version is found but there's no reported collation version.
> > Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor 
> > to
> > make sure there won't be any check about the version anymore, and while it 
> > can
> > probably happen if you mess with the catalogs it still doesn't look great.
> 
> get_collation_actual_version() always returns either null or not null for a
> given installation.  So the situation that the stored version is null and
> the actual version is not null can only happen as part of a software
> upgrade.  In that case, all uses of collations after an upgrade would
> immediately start complaining about missing versions, which seems like a bad
> experience.  Users can explicitly opt in to version tracking by running
> REFRESH VERSION once.

Ah right, I do remember that point, which was also discussed in the collation
version tracking thread.  Sorry about the noise.

> > +   /*
> > +* template0 shouldn't have any collation-dependent objects, so 
> > unset
> > +* the collation version.  This avoids warnings when making a new
> > +* database from it.
> > +*/
> > +   "UPDATE pg_database SET datcollversion = NULL WHERE datname = 
> > 'template0';\n\n",
> > 
> > I'm not opposed, but shouldn't there indeed be a warning in case of 
> > discrepancy
> > in the source database (whether template or not)?
> > 
> > # update pg_database set datcollversion = 'meh' where datname in 
> > ('postgres', 'template1');
> > UPDATE 2
> > 
> > # create database test1 template postgres;
> > CREATE DATABASE
> > 
> > # create database test2 template template1;
> > CREATE DATABASE
> > 
> > # \c test2
> > WARNING:  database "test2" has a collation version mismatch
> 
> I don't understand what the complaint is here.  It seems to work ok?

The comment says that you explicitly set a NULL collation version to avoid a
warning when creating a database using template0 as a template, presumably
after a collation library upgrade.

But as far as I can see, the source database's collation version is not checked
when creating a new database, so it seems to me that either the comment is
wrong or we need another check for that.  The latter seems preferable to me.




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread David G. Johnston
On Mon, Feb 7, 2022 at 8:44 AM Esteban Zimanyi 
wrote:

> May I kindly ask your insight about a question I posted 1 month ago and
> for which I never received any answer ?
>

-hackers really isn't the correct place for usage questions like this -
even if you are creating a custom type (why you are doing this is left
unstated, I have no idea what it means to be a temporally extended version
of boolean, etc...)

You should read up on how TOAST works in PostgreSQL, since that is what
handles large data values.

If I understand your setup correctly, you are claiming to have 2k or more
columns.  That far exceeds PostgreSQL's column limit.

A "tablespace" is a particular piece of functionality provided by the server.
You are using "table space" in a different sense and I'm unsure exactly what
you mean; I presume "cell".  PostgreSQL has row-oriented storage (our
internals documentation goes over this).

I think your mention of mobilitydb also complicates receiving a useful
response, as this list is for the core project.  That you can exceed the
column count limit suggests that your environment is different enough from
core that you should be asking there.

You will need to await someone else to specifically answer the extended
storage question though - but I suspect you've provided insufficient
details in that regard.

David J.


Re: pg_upgrade should truncate/remove its logs before running

2022-02-07 Thread Andrew Dunstan


On 2/6/22 19:39, Tom Lane wrote:
> Michael Paquier  writes:
>> On Sun, Feb 06, 2022 at 08:32:59AM -0500, Andrew Dunstan wrote:
>>> But the commit really shouldn't have happened until we know that most
>>> buildfarm owners have installed it. It should have waited wait not just
>>> for the release but for widespread deployment. Otherwise we will just
>>> lose any logging for an error that might appear.
>> Would it be better if I just revert the change for now then and do it
>> again in one/two weeks?
> I don't see a need to revert it.
>
> I note, though, that there's still not been any email to the buildfarm
> owners list about this update.
>
>   


The announcement was held up in list moderation for 20 hours or so.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Documentation about PL transforms

2022-02-07 Thread Peter Eisentraut

On 05.02.22 00:55, Chapman Flack wrote:

I'm thinking plhandler.sgml is the only place that really needs to be
said; readers looking up CREATE TRANSFORM and using an existing PL that
supports it don't need to know how the sausage is made. (Maybe it is
worth mentioning there, though, that it isn't possible to develop
transforms for an arbitrary PL unless that PL applies transforms.)


makes sense


I noticed the CREATE TRANSFORM docs show the argument list as
(argument_type [, ...]) even though check_transform_function will reject
any argument list of length other than 1 or with type other than internal.
Is that an overly-generic way to show the syntax, or is that a style
with precedent elsewhere in the docs?


That could be corrected.


As long as a PL handler has the sole responsibility for invoking
its transforms, I wonder if it would make sense to allow a PL to
define what its transforms should look like, maybe replacing
check_transform_function with a transform_validator for the PL.
I'm not proposing such a patch here, but I am willing to prepare
one for plhandler.sgml and maybe pltemplate.c documenting the current
situation, if nobody tells me I'm wrong about something here.


Maybe.  This kind of thing would mostly be driven by what a PL wants in
actual practice, and then by how that could be generalized to many/all PLs.





Re: support for CREATE MODULE

2022-02-07 Thread Peter Eisentraut



On 04.02.22 23:12, Tom Lane wrote:

Right.  We've looked into that before --- when I made pg_namespace,
I called it that because I thought we might be able to support
nested namespaces --- but it'd really create a mess.  In particular,
the SQL standard says what a three-part name means, and this ain't it.


Modules are part of the SQL standard, so there is surely some 
name-resolution system specified there as well.





Re: Database-level collation version tracking

2022-02-07 Thread Peter Eisentraut

On 07.02.22 11:29, Julien Rouhaud wrote:

- there should be a mention to the need for a catversion bump in the message
   comment


done


- there is no test


Suggestions where to put it?  We don't really have tests for the 
collation-level versioning either, do we?



- it's missing some updates in create_database.sgml, and psql tab completion
   for CREATE DATABASE with the new collation_version defelem.


Added to create_database.sgml, but not to psql.  We don't have 
completion for the collation option either, since it's only meant to be 
used by pg_upgrade, not interactively.



- that's not really something new with this patch, but should we output the
   collation version info or mismatch info in \l / \dO?


It's a possibility.  Perhaps there is a question of performance if we 
show it in \l and people have tons of databases and they have to make a 
locale call for each one.  As you say, it's more an independent feature, 
but it's worth looking into.



+   if (!actual_versionstr)
+   ereport(ERROR,
+   (errmsg("database \"%s\" has no actual collation version, but a 
version was specified",
+   name)));-

this means you can't connect to such a database anymore.  The level is probably
ok for the collation version check, but isn't that too much for a database?


Right, changed to warning.


+Oid
+AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
+{
+   Relationrel;
+   Oid dboid;
+   HeapTuple   tup;
+   Datum   datum;
+   boolisnull;
+   char   *oldversion;
+   char   *newversion;
+
+   rel = table_open(DatabaseRelationId, RowExclusiveLock);
+   dboid = get_database_oid(stmt->dbname, false);
+
+   if (!pg_database_ownercheck(dboid, GetUserId()))
+   aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
+  stmt->dbname);
+
+   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "cache lookup failed for database %u", dboid);

Is that ok to not obtain a lock on the database when refreshing the collation?


That code takes a RowExclusiveLock on pg_database.  Did you have 
something else in mind?



+   /*
+* Check collation version.  See similar code in
+* pg_newlocale_from_collation().
+*/
+   datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
+   );
+   if (!isnull)
+   {

This (and pg_newlocale_from_collation()) reports a problem if a recorded
collation version is found but there's no reported collation version.
Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor to
make sure there won't be any check about the version anymore, and while it can
probably happen if you mess with the catalogs it still doesn't look great.


get_collation_actual_version() always returns either null or not null 
for a given installation.  So the situation that the stored version is 
null and the actual version is not null can only happen as part of a 
software upgrade.  In that case, all uses of collations after an upgrade 
would immediately start complaining about missing versions, which seems 
like a bad experience.  Users can explicitly opt in to version tracking 
by running REFRESH VERSION once.



+   /*
+* template0 shouldn't have any collation-dependent objects, so unset
+* the collation version.  This avoids warnings when making a new
+* database from it.
+*/
+   "UPDATE pg_database SET datcollversion = NULL WHERE datname = 
'template0';\n\n",

I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
in the source database (whether template or not)?

# update pg_database set datcollversion = 'meh' where datname in ('postgres', 
'template1');
UPDATE 2

# create database test1 template postgres;
CREATE DATABASE

# create database test2 template template1;
CREATE DATABASE

# \c test2
WARNING:  database "test2" has a collation version mismatch


I don't understand what the complaint is here.  It seems to work ok?

From c50f4561932f5ff23715a0fb6e3e9554ce395f16 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 7 Feb 2022 16:32:19 +0100
Subject: [PATCH v3] Database-level collation version tracking

This adds to database objects the same version tracking that collation
objects have.  There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.

This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported.  But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.

Discussion: 

Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear all

May I kindly ask your insight about a question I posted 1 month ago and for
which I never received any answer ?

Many thanks

On Thu, Jan 6, 2022 at 4:05 PM Esteban Zimanyi 
wrote:

> Dear all
>
> When ingesting mobility (IoT) data into MobilityDB
> https://mobilitydb.com/
> we transform very wide (2K attributes) car mobility data of high frequency
> (every tenth of a second) from flat format (e.g. CSV) into MobilityDB
> format in which there is a single record per trip and each of the signals
> is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
> tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
> PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
> geography). All temporal types are stored using extended format, e.g.,
> CREATE TYPE tfloat (
>   internallength = variable,
>   [...]
>   storage = extended,
>   alignment = double,
>   [...]
> );
>
> Given that each temporal value can be very wide (on average 30K
> timestamped  points/floats/text/... per trip) our first question is
> * Is extended the right storage for this ?
>
> Our second question is how all the 2K temporal attributes are stored,
> which may be
> * on a single table space
> * in one table space per attribute
> which in other words, relates to the question row vs column storage.
>
> Many thanks for your insight
>
> Esteban
>
>


Re: [PATCH v2] use has_privs_for_role for predefined roles

2022-02-07 Thread Robert Haas
On Sun, Feb 6, 2022 at 12:24 PM Tom Lane  wrote:
> Joe Conway  writes:
> > I'd like to pick this patch up and see it through to commit/push.
> > Presumably that will include back-patching to all supported pg versions.
> > Before I go through the effort to back-patch, does anyone want to argue
> > that this should *not* be back-patched?
>
> Hm, I'm -0.5 or so.  I think changing security-related behaviors
> in a stable branch is a hard sell unless you are closing a security
> hole.  This is a fine improvement for HEAD but I'm inclined to
> leave the back branches alone.

I think the threshold to back-patch a clear behavior change is pretty
high, so I do not think it should be back-patched.

I am also not convinced that a sufficient argument has been made for
changing this in master. This isn't the only thread where NOINHERIT
has come up lately, and I have to admit that I'm not a fan. Let's
suppose that I have two users, let's say sunita and sri. In the real
world, Sunita is Sri's manager and needs to be able to perform actions
as Sri when Sri is out of the office, but it isn't desirable for
Sunita to have Sri's privileges in all situations all the time. So we
mark role sunita as NOINHERIT and grant sri to sunita. Then it turns
out that Sunita also needs to be able to COPY TO/FROM PROGRAM, so we give her
pg_execute_server_program. Now, if she can't exercise this privilege
without setting role to the predefined role, that's bad, isn't it? I
mean, we want her to be able to copy between *her* tables and various
shell commands, not the tables owned by pg_execute_server_program, of
which there are presumably none.

It seems to me that the INHERIT role flag isn't very well-considered.
Inheritance, or the lack of it, ought to be decided separately for
each inherited role. However, that would be a major architectural
change. But in the absence of that, it seems clearly better for
predefined roles to disregard INHERIT and just always grant the rights
they are intended to give. Because if we don't do that, then we end up
with people having to SET ROLE to the predefined role and perform
actions directly as that role, which seems like it can't be what we
want. I almost feel like we ought to be looking for ways of preventing
people from doing SET ROLE to a predefined role altogether, not
encouraging them to do it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-07 Thread Robert Haas
On Fri, Feb 4, 2022 at 10:45 PM Peter Geoghegan  wrote:
> > While I've seen all the above cases triggering anti-wraparound cases
> > by far the majority of the cases are not of these pathological forms.
>
> Right - it's practically inevitable that you'll need an
> anti-wraparound VACUUM to advance relfrozenxid right now. Technically
> it's possible to advance relfrozenxid in any VACUUM, but in practice
> it just never happens on a large table. You only need to get unlucky
> with one heap page, either by failing to get a cleanup lock, or (more
> likely) by setting even one single page all-visible but not all-frozen
> just once (once in any VACUUM that takes place between anti-wraparound
> VACUUMs).

But ... if I'm not mistaken, in the kind of case that Greg is
describing, relfrozenxid will be advanced exactly as often as it is
today. That's because, if VACUUM is only ever getting triggered by XID
age advancement and not by bloat, there's no opportunity for your
patch set to advance relfrozenxid any sooner than we're doing now. So
I think that people in this kind of situation will potentially be
helped or hurt by other things the patch set does, but the eager
relfrozenxid stuff won't make any difference for them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-02-07 Thread Robert Haas
On Fri, Feb 4, 2022 at 10:21 PM Greg Stark  wrote:
> By far the majority of anti-wraparound vacuums are triggered by tables
> that are very large and so don't trigger regular vacuums for "long
> periods" of time and consistently hit the anti-wraparound threshold
> first.

That's interesting, because my experience is different. Most of the
time when I get asked to look at a system, it turns out that there is
a prepared transaction or a forgotten replication slot and nobody
noticed until the system hit the wraparound threshold. Or occasionally
a long-running transaction or a failing/stuck vacuum that has the same
effect.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Support escape sequence for cluster_name in postgres_fdw.application_name

2022-02-07 Thread Fujii Masao



On 2022/01/28 14:07, r.takahash...@fujitsu.com wrote:

I think %c of log_line_prefix (Session ID) is also useful for 
postgres_fdw.application_name.
Therefore, how about adding both %c (Session ID) and %C (cluster_name)?


+1

Attached is the updated version of the patch. It adds those escape sequences %c 
and %C.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index fc3ce6a53a..af38e956e7 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -489,6 +489,12 @@ process_pgfdw_appname(const char *appname)
case 'a':
appendStringInfoString(, application_name);
break;
+   case 'c':
+   appendStringInfo(, "%lx.%x", (long) 
(MyStartTime), MyProcPid);
+   break;
+   case 'C':
+   appendStringInfoString(, cluster_name);
+   break;
case 'd':
appendStringInfoString(, 
MyProcPort->database_name);
break;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 2bb31f1125..17cd90ab12 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -983,6 +983,20 @@ postgres=# SELECT postgres_fdw_disconnect_all();
  %a
  Application name on local server
 
+
+ %c
+ 
+  Session ID on local server
+  (see  for details)
+ 
+
+
+ %C
+ 
+  Cluster name in local server
+  (see  for details)
+ 
+
 
  %u
  User name on local server
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 6bb81707b0..f1bfe79feb 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -271,7 +271,7 @@ extern int  temp_file_limit;
 
 extern int num_temp_buffers;
 
-extern char *cluster_name;
+extern PGDLLIMPORT char *cluster_name;
 extern PGDLLIMPORT char *ConfigFileName;
 extern char *HbaFileName;
 extern char *IdentFileName;


Announcing Release 14 of the PostgreSQL Buildfarm client

2022-02-07 Thread Andrew Dunstan
Significant changes:

  * don't run TestUpgrade if TAP tests are present in src/bin/pg_upgrade
  * Add proposed new location of pg_upgrade logs in TestUpgrade module
  * Use an HMAC over the whole content as the signature.
  * Quote PROVE_FLAGS in case there are multiple settings
  * tighten failure detection for cross version upgrade
  * be more verbose about git mirror failures
  * Support symlinks on Windows where possible in SCM module
  * Document the rm_worktrees setting and make "on" the default.
  * Add new branches_to_build keywords STABLE and OLD

Because of the changes to how pg_upgrade logging works, owners are
strongly urged to upgrade to the new release as soon as possible if they
are running the TestUpgrade module.

The meanings of the branches_to_build keywords are as follows: STABLE
means all the live branches other than master/HEAD, while OLD means
those branches older than STABLE that we now support limited
builds for (currently REL9_2_STABLE through REL9_6_STABLE).

Downloads are available at
 and



cheers


andrew





Re: Database-level collation version tracking

2022-02-07 Thread Julien Rouhaud
On Tue, Feb 01, 2022 at 04:20:14PM +0100, Peter Eisentraut wrote:
> This patch adds to database objects the same version tracking that collation
> objects have.

This version conflicts with 87669de72c2 (Some cleanup for change of collate and
ctype fields to type text), so I'm attaching a simple rebase of your patch to
make the cfbot happy, no other changes.

> There is a new pg_database column datcollversion that stores
> the version, a new function pg_database_collation_actual_version() to get
> the version from the operating system, and a new subcommand ALTER DATABASE
> ... REFRESH COLLATION VERSION.
> 
> This was not originally added together with pg_collation.collversion, since
> originally version tracking was only supported for ICU, and ICU on a
> database-level is not currently supported.  But we now have version tracking
> for glibc (since PG13), FreeBSD (since PG14), and Windows (since PG13), so
> this is useful to have now.  And of course ICU on database-level is being
> worked on at the moment as well.
> This patch is pretty much complete AFAICT.

Agreed.  Here's a review of the patch:

- there should be a mention to the need for a catversion bump in the message
  comment
- there is no test
- it's missing some updates in create_database.sgml, and psql tab completion
  for CREATE DATABASE with the new collation_version defelem.

- that's not really something new with this patch, but should we output the
  collation version info or mismatch info in \l / \dO?

+   if (!actual_versionstr)
+   ereport(ERROR,
+   (errmsg("database \"%s\" has no actual collation version, 
but a version was specified",
+   name)));-

this means you can't connect to such a database anymore.  The level is probably
ok for the collation version check, but isn't that too much for a database?

+Oid
+AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
+{
+   Relationrel;
+   Oid dboid;
+   HeapTuple   tup;
+   Datum   datum;
+   boolisnull;
+   char   *oldversion;
+   char   *newversion;
+
+   rel = table_open(DatabaseRelationId, RowExclusiveLock);
+   dboid = get_database_oid(stmt->dbname, false);
+
+   if (!pg_database_ownercheck(dboid, GetUserId()))
+   aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_DATABASE,
+  stmt->dbname);
+
+   tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(dboid));
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "cache lookup failed for database %u", dboid);

Is that ok to not obtain a lock on the database when refreshing the collation?
I guess it's not worth bothering with, as it can only lead to spurious messages
for connections done concurrently, but there should be a comment to clarify it.
Also, it means that someone can drop the database concurrently, so it should be
a "database does not exist" error rather than a cache lookup failed error
message.

+   /*
+* Check collation version.  See similar code in
+* pg_newlocale_from_collation().
+*/
+   datum = SysCacheGetAttr(DATABASEOID, tup, Anum_pg_database_datcollversion,
+   );
+   if (!isnull)
+   {

This (and pg_newlocale_from_collation()) reports a problem if a recorded
collation version is found but there's no reported collation version.
Shouldn't it also complain if it's the opposite?  It's otherwise a backdoor to
make sure there won't be any check about the version anymore, and while it can
probably happen if you mess with the catalogs it still doesn't look great.

+   /*
+* template0 shouldn't have any collation-dependent objects, so unset
+* the collation version.  This avoids warnings when making a new
+* database from it.
+*/
+   "UPDATE pg_database SET datcollversion = NULL WHERE datname = 
'template0';\n\n",

I'm not opposed, but shouldn't there indeed be a warning in case of discrepancy
in the source database (whether template or not)?

# update pg_database set datcollversion = 'meh' where datname in ('postgres', 
'template1');
UPDATE 2

# create database test1 template postgres;
CREATE DATABASE

# create database test2 template template1;
CREATE DATABASE

# \c test2
WARNING:  database "test2" has a collation version mismatch
DETAIL:  The database was created using collation version meh, but the 
operating system provides version 2.34.
HINT:  Rebuild all objects affected by collation in this database and run ALTER 
DATABASE test2 REFRESH COLLATION VERSION, or build PostgreSQL with the right 
library version.
You are now connected to database "test2" as user "rjuju".

The rest of the patch looks good to me.  There's notably pg_dump and pg_upgrade
support so it indeed looks complete.

> One bonus thing would be to add
> a query to the ALTER DATABASE ref page similar to the one on the ALTER
> COLLATION ref page that identifies the objects that are affected by outdated
> collations.  The database version of that might just show all
> collation-using objects or 

Allow parallel plan for referential integrity checks?

2022-02-07 Thread Frédéric Yhuel

Hello,

I noticed that referential integrity checks aren't currently 
parallelized. Is it on purpose?


From the documentation [1], the planner will not generate a parallel 
plan for a given query if any of the following are true:


1) The system is running in single-user mode.
2) max_parallel_workers_per_gather = 0.
3) The query writes any data or locks any database rows.
4) The query might be suspended during execution (cursor or  PL/pgSQL loop).
5) The query uses any function marked PARALLEL UNSAFE.
6) The query is running inside of another query that is already parallel.

From my understanding of the code, it seems to me that the fourth 
condition is not always accurately detected. In particular, the query 
triggered by a foreign key addition (a Left Excluding JOIN between the 
two tables) isn't parallelized, because the flag CURSOR_OPT_PARALLEL_OK 
hasn't been set at some point. I couldn't find any reason why not, but 
maybe I missed something.


Attached is a (naive) patch that aims to fix the case of a FK addition, 
but the handling of the flag CURSOR_OPT_PARALLEL_OK, generally speaking, 
looks rather hackish.


Best regards,
Frédéric

[1] 
https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.htmlFrom 1353a4b7e7d08b13cbaa85a5f9ae26819b5cf2c8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Yhuel?= 
Date: Sun, 6 Feb 2022 13:31:55 +0100
Subject: [PATCH] Allow parallel plan for referential integrity checks

---
 src/backend/utils/adt/ri_triggers.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index 96269fc2ad..84c16b6962 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1317,6 +1317,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	char		workmembuf[32];
 	int			spi_result;
 	SPIPlanPtr	qplan;
+	SPIPrepareOptions options;
 
 	riinfo = ri_FetchConstraintInfo(trigger, fk_rel, false);
 
@@ -1483,10 +1484,12 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
 	 * Generate the plan.  We don't need to cache it, and there are no
 	 * arguments to the plan.
 	 */
-	qplan = SPI_prepare(querybuf.data, 0, NULL);
+	memset(, 0, sizeof(options));
+	options.cursorOptions = CURSOR_OPT_PARALLEL_OK;
+	qplan = SPI_prepare_extended(querybuf.data, );
 
 	if (qplan == NULL)
-		elog(ERROR, "SPI_prepare returned %s for %s",
+		elog(ERROR, "SPI_prepare_extended returned %s for %s",
 			 SPI_result_code_string(SPI_result), querybuf.data);
 
 	/*
-- 
2.30.2



Re: Fix CheckIndexCompatible comment

2022-02-07 Thread Yugo NAGATA
Hello, Fujii-san,

On Fri, 4 Feb 2022 09:08:22 +0900
Fujii Masao  wrote:

> 
> 
> On 2022/02/04 1:46, Yugo NAGATA wrote:
> > Hello,
> > 
> > I found a old parameter name 'heapRelation' in the comment
> > of CheckIndexCompatible. This parameter was removed by 5f173040.
> > 
> > Attached is a patch to remove it from the comment.
> 
> Thanks for the report! I agree to remove the mention of parameter already 
> dropped, from the comment. OTOH, I found CheckIndexCompatible() now has 
> "oldId" parameter but there is no comment about it though there are comments 
> about other parameters. Isn't it better to add the comment about "oldId"?

Agreed. I updated the patch to add a comment about 'oldId'.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 560dcc87a2..408ffc70f6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,7 +129,7 @@ typedef struct ReindexErrorInfo
  *		prospective index definition, such that the existing index storage
  *		could become the storage of the new index, avoiding a rebuild.
  *
- * 'heapRelation': the relation the index would apply to.
+ * 'oldId': the existing index's relation OID
  * 'accessMethodName': name of the AM to use.
  * 'attributeList': a list of IndexElem specifying columns and expressions
  *		to index on.


Re: row filtering for logical replication

2022-02-07 Thread Amit Kapila
On Mon, Feb 7, 2022 at 1:21 PM Peter Smith  wrote:
>
> 5. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (Simple?)
>
> +/*
> + * Is this a simple Node permitted within a row filter expression?
> + */
> +static bool
> +IsRowFilterSimpleExpr(Node *node)
> +{
>
> A lot has changed in this area recently and I feel that there is
> something not quite 100% right with the naming and/or logic in this
> expression validation. IMO there are several functions that seem to
> depend too much on each other in special ways...
>
> IIUC the "walker" logic now seems to be something like this:
> a) Check for special cases of the supported nodes
> b) Then check for supported (simple?) nodes (i.e.
> IsRowFilterSimpleExpr is now acting as a "catch-all" after the special
> case checks)
> c) Then check for some unsupported node embedded within a supported
> node (i.e. call expr_allowed_in_node)
> d) If any of a,b,c was bad then give an error.
>
> To achieve that logic the T_FuncExpr was added to the
> "IsRowFilterSimpleExpr". Meanwhile, other nodes like
> T_ScalarArrayOpExpr and T_NullIfExpr now are removed from
> IsRowFilterSimpleExpr - I don't quite know why these got removed
>

They are removed because those nodes need some special checks based on
which errors could be raised whereas other nodes don't need such
checks.

> but
> perhaps there is implicit knowledge that those node kinds were already
> checked by the "walker" before the IsRowFilterSimpleExpr function ever
> gets called.
>
> So, although I trust that everything is working OK,  I don't think
> IsRowFilterSimpleExpr is really just about simple nodes anymore. It is
> harder to see why some supported nodes are in there, and some
> supported nodes are not. It seems tightly entwined with the logic of
> check_simple_rowfilter_expr_walker; i.e. there seem to be assumptions
> about exactly when it will be called and what was checked before and
> what will be checked after calling it.
>
> IMO probably all the nodes we are supporting should be in the
> IsRowFilterSimpleExpr just for completeness (e.g. put T_NullIfExpr and
> T_ScalarArrayOpExpr back in there...), and maybe the function should
> be renamed (IsRowFilterAllowedNode?),
>

I am not sure that is a good idea, because then instead of just true/false we
would also need to get an error message back. But I think we can move all the
nodes handled in IsRowFilterSimpleExpr back to
check_simple_rowfilter_expr_walker() and change the handling to a
switch..case.

One more thing in this context: in the ScalarArrayOpExpr handling, we are not
checking a few parameters like hashfuncid. Can we please add a comment
explaining why some parameters are checked and others are not?

>
> ~~~
>
> 6. src/backend/commands/publicationcmds.c - IsRowFilterSimpleExpr (T_List)
>
> (From Amit's patch)
>
> @@ -395,6 +397,7 @@ IsRowFilterSimpleExpr(Node *node)
>   case T_NullTest:
>   case T_RelabelType:
>   case T_XmlExpr:
> + case T_List:
>   return true;
>   default:
>   return false;
>
>
> The case T_List should be moved to be alphabetical the same as all the
> other cases.
>

Hmm, I added it based on the order in which it is defined in nodes.h; T_List
is defined after T_XmlExpr there. I don't see these being handled in
alphabetical order in other places, such as check_functions_in_node().
I think the nodes that need the same handling should be kept together, in the
order they are defined in nodes.h, and all other nodes should likewise follow
the order in nodes.h. That way we will be consistent.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] nodeindexscan with reorder memory leak

2022-02-07 Thread Aliaksandr Kalenik
Thanks for your review!

On Sun, Jan 30, 2022 at 7:24 PM Tom Lane  wrote:
> Actually, that code has got worse problems than that.  I tried to improve
> our regression tests to exercise that code path, as attached.  What I got
> was
>
> +SELECT point(x,x), (SELECT circle_center(f1) FROM gcircle_tbl ORDER BY f1 
> <-> p
> oint(x,x) LIMIT 1) as c FROM generate_series(0,1000,1) x;
> +ERROR:  index returned tuples in wrong order

I tried to figure out what the problem with this query is. The error
happens when the actual distance is less than the estimated distance.
For this specific query it happened while comparing these values:
50.263279680219099532223481219262 (the actual distance returned by
dist_cpoint in geo_ops.c) and 50.263279680219113743078196421266
(the bounding-box distance returned by computeDistance in gistproc.c).

So it looks to me like this error is not really related to the KNN scan
code but to a floating-point arithmetic issue in the distance calculation
functions for the geometry type. It would be great to figure out a fix,
but for now I didn't manage to find a better way than comparing the
difference of the distances with FLT_EPSILON, which definitely doesn't
seem like the right fix :(
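
To make the size of the discrepancy concrete, here is a small standalone
illustration (not a proposed fix, and not code from the patch) using the two
values above; the bounding-box estimate exceeds the recomputed distance by
only a couple of ULPs of a double, which is orders of magnitude below
FLT_EPSILON:

#include <stdio.h>
#include <float.h>
#include <math.h>

int
main(void)
{
	double		actual = 50.263279680219099532223481219262;		/* dist_cpoint() */
	double		estimated = 50.263279680219113743078196421266;	/* bounding-box distance */

	/* The estimate is supposed to be a lower bound, but here it isn't. */
	printf("actual < estimated: %d\n", actual < estimated);

	printf("abs diff = %g, rel diff = %g (DBL_EPSILON = %g, FLT_EPSILON = %g)\n",
		   fabs(estimated - actual),
		   fabs(estimated - actual) / estimated,
		   DBL_EPSILON, FLT_EPSILON);
	return 0;
}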




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-02-07 Thread Kyotaro Horiguchi
At Fri, 4 Feb 2022 17:06:53 +, Jacob Champion  wrote 
in 
> That works a lot better than what I had in my head. Done that way in
> v4. Thanks!

Thanks!

0002:

+#define PGSQL_AF_INET  (AF_INET + 0)
+#define PGSQL_AF_INET6 (AF_INET + 1)
..
-#define PGSQL_AF_INET  (AF_INET + 0)
-#define PGSQL_AF_INET6 (AF_INET + 1)

I feel this should be a part of 0001.  (But the patches will eventually be
merged, so maybe there's no need to bother moving it.)



> * The use of inet_aton() instead of inet_pton() is deliberate; the
> * latter cannot handle alternate IPv4 notations ("numbers-and-dots").

I think we should be consistent in handling IP addresses.  We have
both inet_pton and inet_aton to parse IPv4 addresses.

We use inet_pton in the inet type (network_in).
We use inet_aton in server addresses.

# Hmm. I'm surprised to see listen_addresses accepts "0x7f.1".
# I think we should accept the same by network_in but it is another
# issue.

So, inet_aton seems to be the right choice there, but the comment doesn't
describe the reason for that behavior. I think we should add an explanation of
the reason for the behavior, maybe something like this:

> We accept alternative IPv4 address notations that are accepted by
> inet_aton but not by inet_pton as server address.
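
For illustration, a small standalone C snippet (not part of the patch) showing
the behavioral difference in question: inet_aton() accepts the legacy
numbers-and-dots notations such as "0x7f.1", while inet_pton(AF_INET, ...)
accepts only strict dotted-decimal input.

#include <stdio.h>
#include <arpa/inet.h>

int
main(void)
{
	const char *addr = "0x7f.1";	/* hex-and-dots spelling of 127.0.0.1 */
	struct in_addr in;

	/* inet_aton() understands numbers-and-dots notations: prints 1 (accepted) */
	printf("inet_aton: %d\n", inet_aton(addr, &in));

	/* inet_pton() requires ddd.ddd.ddd.ddd: prints 0 (rejected) */
	printf("inet_pton: %d\n", inet_pton(AF_INET, addr, &in));

	return 0;
}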



+* GEN_IPADD is an OCTET STRING containing an IP address in network byte
+* order.

+   /* OK to cast from unsigned to plain char, since it's all ASCII. */
+   return pq_verify_peer_name_matches_certificate_ip(conn, (const char *) 
addrdata, len, store_name);

Aren't the two comments contradicting each other? The returned general
name looks like an octet array, which is not a subset of an ASCII
string. So pq_verify_peer_name_matches_certificate_ip should receive
addrdata as "const unsigned char *", without casting.



+   if (name->type == host_type)
+   check_cn = false;

Don't we want a concise comment for this?



-   if (*names_examined == 0)
+   if ((rc == 0) && check_cn)

To me, it seems a bit hard to understand.  We can set check_cn to false in
the rc != 0 path in the loop over i, like this:

>   if (rc != 0)
> + {
> + /*
> +  * don't fall back to CN when we have a match 
> or have an error
> +  */
> + check_cn = false;
>   break;
> + }
...
> - if ((rc == 0) && check_cn)
> + if (check_cn)



The following existing code (CN fallback)

>   rc = openssl_verify_peer_name_matches_certificate_name(conn,
>   X509_NAME_ENTRY_get_data(X509_NAME_get_entry(subject_name, cn_index)),
>   first_name);

is expecting that first_name has not been set when it is visited.
However, with this patch, first_name can be set when the cert has any
SAN of a non-matching type (DNS/IPADD), and the already-set name leaks.  We
need to avoid that memory leak, since the path can be visited multiple
times from the user program of libpq. I came up with two directions.

1. Completely ignore entries of a non-matching type. first_name is not set by
 such entries, and such entries don't increase
 *names_examined.

2. Avoid overwriting first_name there.

I like 1, but since we don't make a distinction between DNS and IPADDR
in the error message emitted by the caller, should we take 2?


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Error "initial slot snapshot too large" in create replication slot

2022-02-07 Thread Dilip Kumar
On Mon, Jan 31, 2022 at 11:50 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar  wrote 
> in
>
> me> Mmm. The size of the array cannot be larger than the numbers the
> me> *Count() functions return.  Thus we cannot attach the oversized array
> me> to ->subxip.  (I don't recall clearly but that would lead to assertion
> me> failure somewhere..)
>
> Then, I fixed the v3 error and post v4.

Yeah, you are right, SetTransactionSnapshot() has that assertion.
Anyway, after looking again it appears that GetMaxSnapshotSubxidCount()
gives the correct size, because it is based on PGPROC_MAX_CACHED_SUBXIDS + 1,
i.e. it considers top-level transactions as well, so we don't need to add
them separately.

>
> SnapBuildInitialSnapshot tries to store the XIDs of both top and sub
> transactions into the snapshot->xip array, but the array is easily
> overflowed and the CREATE_REPLICATION_SLOT command ends with an error.
>
> To fix this, this patch does the following things.
>
> - Use the subxip array instead of the xip array to allow us to have a larger
>   array for xids.  So the snapshot is marked as takenDuringRecovery, which
>   is a kind of abuse but largely reduces the chance of getting the
>   "initial slot snapshot too large" error.

Right. I think the patch looks fine to me.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Synchronizing slots from primary to standby

2022-02-07 Thread Ashutosh Sharma
Hi Andres,

Are you talking about this scenario - what if the logical replication
slot on the publisher is dropped, but is being referenced by the
standby where the slot is synchronized? Should the redo function for
the drop replication slot have the capability to drop it on standby
and its subscribers (if any) as well?

--
With Regards,
Ashutosh Sharma.

On Sun, Feb 6, 2022 at 1:29 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
> > From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
> > From: Peter Eisentraut 
> > Date: Mon, 3 Jan 2022 14:43:36 +0100
> > Subject: [PATCH v3] Synchronize logical replication slots from primary to
> >  standby
>
> I've just skimmed the patch and the related threads. As far as I can tell this
> cannot be safely used without the conflict handling in [1], is that correct?
>
>
> Greetings,
>
> Andres Freund
>
> [1] 
> https://postgr.es/m/CA%2BTgmoZd-JqNL1-R3RJ0jQRD%2B-dc94X0nPJgh%2BdwdDF0rFuE3g%40mail.gmail.com
>
>




Use JOIN USING aliases in ruleutils.c

2022-02-07 Thread Peter Eisentraut
When reverse-compiling a query, ruleutils.c has some complicated code to 
handle the join output columns of a JOIN USING join.  There used to be 
no way to qualify those columns, and so if there was a naming conflict 
anywhere in the query, those output columns had to be renamed to be 
unique throughout the query.


Since PostgreSQL 14, we have a new feature that allows adding an alias 
to a JOIN USING clause.  This provides a better solution to this 
problem.  This patch changes the logic in ruleutils.c so that if naming 
conflicts with JOIN USING output columns are found in the query, these 
JOIN USING aliases with generated names are attached everywhere and the 
columns are then qualified everywhere.


The test output changes show the effects nicely.

Obviously, the copy-and-paste code in set_rtable_names() could be 
refactored a bit better, perhaps.  I also just named the generated 
aliases "ju" with numbers added, maybe there are other ideas for how to 
generate these names.From ae6aa961aeb0545bd05b9be3b3f79bc79f190293 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 7 Feb 2022 09:04:02 +0100
Subject: [PATCH] Use JOIN USING aliases in ruleutils.c

When reverse-compiling a query, ruleutils.c has some complicated code
to handle the join output columns of a JOIN USING join.  There used to
be no way to qualify those columns, and so if there was a naming
conflict anywhere in the query, those output columns had to be renamed
to be unique throughout the query.

Since PostgreSQL 14, we have a new feature that allows adding an alias
to a JOIN USING clause.  This provides a better solution to this
problem.  This patch changes the logic in ruleutils.c so that if
naming conflicts with JOIN USING output columns are found in the
query, these JOIN USING aliases with generated names are attached
everywhere and the columns are then qualified everywhere.
---
 src/backend/utils/adt/ruleutils.c | 102 
 src/test/regress/expected/create_view.out | 194 +++---
 2 files changed, 169 insertions(+), 127 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c 
b/src/backend/utils/adt/ruleutils.c
index b16526e65e..5f18995d3d 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -138,11 +138,6 @@ typedef struct
  * array of AppendRelInfo nodes, indexed by child relid.  We use that to map
  * child-table Vars to their inheritance parents.
  *
- * In some cases we need to make names of merged JOIN USING columns unique
- * across the whole query, not only per-RTE.  If so, unique_using is true
- * and using_names is a list of C strings representing names already assigned
- * to USING columns.
- *
  * When deparsing plan trees, there is always just a single item in the
  * deparse_namespace list (since a plan tree never contains Vars with
  * varlevelsup > 0).  We store the Plan node that is the immediate
@@ -157,13 +152,13 @@ typedef struct
 {
List   *rtable; /* List of RangeTblEntry nodes 
*/
List   *rtable_names;   /* Parallel list of names for RTEs */
+   List   *join_using_aliases; /* Parallel list of join using aliases 
*/
List   *rtable_columns; /* Parallel list of deparse_columns structs 
*/
List   *subplans;   /* List of Plan trees for SubPlans */
List   *ctes;   /* List of CommonTableExpr 
nodes */
AppendRelInfo **appendrels; /* Array of AppendRelInfo nodes, or NULL */
/* Workspace for column alias assignment: */
boolunique_using;   /* Are we making USING names globally 
unique */
-   List   *using_names;/* List of assigned names for USING 
columns */
/* Remaining fields are used only when deparsing a Plan tree: */
Plan   *plan;   /* immediate parent of current 
expression */
List   *ancestors;  /* ancestors of plan */
@@ -3841,6 +3836,7 @@ set_rtable_names(deparse_namespace *dpns, List 
*parent_namespaces,
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
char   *refname;
+   char   *join_using_alias;
 
/* Just in case this takes an unreasonable amount of time ... */
CHECK_FOR_INTERRUPTS();
@@ -3871,6 +3867,16 @@ set_rtable_names(deparse_namespace *dpns, List 
*parent_namespaces,
refname = rte->eref->aliasname;
}
 
+   if (rte->rtekind == RTE_JOIN && rte->joinmergedcols)
+   {
+   if (rte->join_using_alias)
+   join_using_alias = 
rte->join_using_alias->aliasname;
+   else
+   join_using_alias = "ju";
+   }
+   else
+   join_using_alias = NULL;
+
/*
 * If the selected name isn't