Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Eisentraut

On 23.02.23 07:36, Jim Jones wrote:

On 23.02.23 02:52, Peter Smith wrote:

Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


v18 adds a new line in the xml.sql file to separate the xmlserialize 
test cases from the rest.


In kwlist.h you have

PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)

but you can actually make it BARE_LABEL, which is preferable.

More importantly, you need to add the new keyword to the 
bare_label_keyword production in gram.y.  I thought we had some tooling 
in the build system to catch this kind of omission, but it's apparently 
not working right now.
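
For illustration, a minimal sketch of the two suggested changes (the
neighboring gram.y keyword entries are shown only for orientation):

    /* src/include/parser/kwlist.h */
    PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, BARE_LABEL)

    /* src/backend/parser/gram.y, in the bare_label_keyword production */
            | INCLUDING
            | INCREMENT
            | INDENT
            | INDEX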


Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or 
maybe something like xmlserialize_indent()), so the association is clearer.






Make some xlogreader messages more accurate

2023-02-22 Thread Peter Eisentraut
Here is a small patch to make some invalid-record error messages in 
xlogreader a bit more accurate (IMO).


My starting point was that when you have some invalid WAL, you often get 
a message like "wanted 24, got 0".  This is a bit incorrect, since it 
really wanted *at least* 24, not exactly 24.  So I have updated the 
messages to that effect, and also added that detail to one message where 
it was available but not printed.
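
For example, a report that previously read (the LSN here is made up)

    invalid record length at 0/1D01830: wanted 24, got 0

now reads

    invalid record length at 0/1D01830: wanted >=24, got 0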


Going through the remaining report_invalid_record() calls I then 
adjusted the use of "invalid" vs. "incorrect" in one case.  The message 
"record with invalid length" makes it sound like the length was 
something like -5, but really we know what the length should be and what 
we got wasn't it, so "incorrect" sounded better and is also used in 
other error messages in that file.

From 36dca712966093a932d86263629f3a596691b061 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Feb 2023 08:31:03 +0100
Subject: [PATCH] Make some xlogreader messages more accurate

---
 src/backend/access/transam/xlogreader.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index aa6c929477..36d711483c 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -623,8 +623,9 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	}
 	else if (targetRecOff < pageHeaderSize)
 	{
-		report_invalid_record(state, "invalid record offset at %X/%X",
-							  LSN_FORMAT_ARGS(RecPtr));
+		report_invalid_record(state, "invalid record offset at %X/%X: wanted >=%u, got %u",
+							  LSN_FORMAT_ARGS(RecPtr),
+							  pageHeaderSize, targetRecOff);
 		goto err;
 	}
 
@@ -672,7 +673,7 @@ XLogDecodeNextRecord(XLogReaderState *state, bool nonblocking)
 	if (total_len < SizeOfXLogRecord)
 	{
 		report_invalid_record(state,
-							  "invalid record length at %X/%X: wanted %u, got %u",
+							  "invalid record length at %X/%X: wanted >=%u, got %u",
 							  LSN_FORMAT_ARGS(RecPtr),
 							  (uint32) SizeOfXLogRecord, total_len);
 		goto err;
@@ -1119,7 +1120,7 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 	if (record->xl_tot_len < SizeOfXLogRecord)
 	{
 		report_invalid_record(state,
-							  "invalid record length at %X/%X: wanted %u, got %u",
+							  "invalid record length at %X/%X: wanted >=%u, got %u",
 							  LSN_FORMAT_ARGS(RecPtr),
 							  (uint32) SizeOfXLogRecord, record->xl_tot_len);
 		return false;
@@ -1942,7 +1943,7 @@ DecodeXLogRecord(XLogReaderState *state,
 
 shortdata_err:
 	report_invalid_record(state,
-						  "record with invalid length at %X/%X",
+						  "record with incorrect length at %X/%X",
						  LSN_FORMAT_ARGS(state->ReadRecPtr));
 err:
 	*errormsg = state->errormsg_buf;
-- 
2.39.2



Re: Add SHELL_EXIT_CODE to psql

2023-02-22 Thread Corey Huinker
On Wed, Feb 22, 2023 at 4:18 PM Thomas Munro  wrote:

> On Tue, Jan 31, 2023 at 9:23 AM Corey Huinker 
> wrote:
> >> Unfortunately, there is a fail in FreeBSD
> https://cirrus-ci.com/task/6466749487382528
> >>
> >> Maybe, this patch is need to be rebased?
> >>
> >
> > That failure is in postgres_fdw, which this code doesn't touch.
> >
> > I'm not able to get to
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out
> - It's not in either of the browseable zip files and the 2nd zip file isn't
> downloadable, so it's hard to see what's going on.
>
> Yeah, that'd be our current top flapping CI test[1].  These new
> "*-running" tests (like the old "make installcheck") are only enabled
> in the FreeBSD CI task currently, so that's why the failures only show
> up there.  I see[2] half a dozen failures of the "fdw_retry_check"
> thing in the past ~24 hours.
>
> [1]
> https://www.postgresql.org/message-id/flat/20221209001511.n3vqodxobqgscuil%40awork3.anarazel.de
> [2] http://cfbot.cputube.org/highlights/regress.html


Thanks for the explanation. I was baffled.


Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 23.02.23 02:52, Peter Smith wrote:

Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


v18 adds a new line in the xml.sql file to separate the xmlserialize 
test cases from the rest.


Thanks!

Best, Jim
From a37e8cea68e9e6032e29b555b986c28d12f4a16b Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v18] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |  11 ++-
 src/backend/parser/gram.y |  12 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  41 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 106 ++
 src/test/regress/expected/xml_1.out   |  74 ++
 src/test/regress/expected/xml_2.out   | 106 ++
 src/test/regress/sql/xml.sql  |  26 +++
 14 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 			{
 Datum	   *argvalue = op->d.xmlexpr.argvalue;
 bool	   *argnull = op->d.xmlexpr.argnull;
+bool	indent = op->d.xmlexpr.xexpr->indent;
+text	   *data;
 
 /* argument type is known to be xml */
 Assert(list_length(xexpr->args) == 1);
@@ -3837,9 +3839,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
 	return;
 value = argvalue[0];
 
-*op->resvalue = PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
-		 xexpr->xmloption));
 *op->resnull = false;
+
+data = xmltotext_with_xmloption(DatumGetXmlP(value),
+xexpr->xmloption);
+if(indent)
+	*op->resvalue = PointerGetDatum(xmlformat(data));
+else
+	*op->resvalue = PointerGetDatum(data);
 			}
 			break;
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a0138382a1..2814f16082 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -619,6 +619,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, 

Re: pgindent vs. git whitespace check

2023-02-22 Thread Tom Lane
John Naylor  writes:
> On Thu, Feb 23, 2023 at 5:03 AM Andrew Dunstan  wrote:
>> I suspect not allowing // is at least a minor annoyance to any new
>> developer we acquire under the age of about 40.

> pgindent changes those to our style, so it's not much of an annoyance if
> one prefers to type it that way during development.

Right, it's not like we reject patches for that (or at least, we shouldn't
reject patches for any formatting issues that pgindent can fix).

For my own taste, I really don't have any objection to // in isolation --
the problem with it is just that we've got megabytes of code in the other
style.  I fear it'd look really ugly to have an intermixture of // and /*
comment styles.  Mass conversion of /* to // style would answer that,
but would also create an impossible back-patching problem.

regards, tom lane




RE: Allow logical replication to copy tables in binary format

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Jelte,

> I don't think it's necessary to check versions. Yes, there are
> situations where binary will fail across major versions. But in many
> cases it does not. To me it seems the responsibility of the operator
> to evaluate this risk. And if the operator chooses wrong and uses
> binary copy across incompatible versions, then it will still fail hard
> in that case during the copy phase (so still a very early error). So I
> don't see a reason to check pre-emptively, afaict it will only
> disallow some valid usecases and introduce more code.
> 
> Furthermore no major version check is done for "binary = true" either
> (afaik). The only additional failure scenario that copy_format=binary
> introduces is when one of the types does not implement a send function
> on the source. With binary=true, this would continue to work, but with
> copy_format=binary this stops working. All other failure scenarios
> that binary encoding of types introduces apply to both binary=true and
> copy_format=binary (the only difference being in which phase of the
> replication these failures happen, the apply or the copy phase).

I thought the current specification was an oversight, but you mean it is an
intentional choice to preserve availability, right?
Indeed my suggestion may have been too pessimistic, but I would like to hear
more opinions from others...

> > I'm not sure the combination of "copy_format = binary" and "copy_data = 
> > false"
> > should be accepted or not. How do you think?
> 
> It seems quite useless indeed to specify the format of a copy that won't 
> happen.

I understood that the combination of "copy_format = binary" and "copy_data = false"
should be rejected in parse_subscription_options() and AlterSubscription(). Is that right?
I'm expecting that to be done in the next version.
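
For concreteness, the case in question would look roughly like this, using
the patch's proposed copy_format option (not released syntax; the
subscription name and connection string are placeholders):

    CREATE SUBSCRIPTION mysub
        CONNECTION '...'
        PUBLICATION mypub
        WITH (copy_data = false, copy_format = binary);
    -- under the suggested check this would raise an error, since there
    -- is no table copy whose format the option could apply to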

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: pgindent vs. git whitespace check

2023-02-22 Thread John Naylor
On Thu, Feb 23, 2023 at 5:03 AM Andrew Dunstan  wrote:
>
> I suspect not allowing // is at least a minor annoyance to any new
developer we acquire under the age of about 40.

pgindent changes those to our style, so it's not much of an annoyance if
one prefers to type it that way during development.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Improving inferred query column names

2023-02-22 Thread Tom Lane
Andres Freund  writes:
> On 2023-02-22 16:38:51 -0500, Tom Lane wrote:
>> The proposal so far was just to handle a function call wrapped
>> around something else by converting to the function name followed
>> by whatever we'd emit for the something else.

> SELECT sum(relpages), sum(reltuples), 1+1 FROM pg_class;
> ┌──┬───┬──┐
> │ sum_relpages │ sum_reltuples │ ?column? │
> ├──┼───┼──┤

So far so good, but what about multi-argument functions?
Do we do "f_x_y_z", and truncate wherever?  How well will this
work with nested function calls?

>> You cannot realistically
>> handle, say, operator expressions without emitting names that will
>> require quoting, which doesn't seem attractive.

> Well, it doesn't require much to be better than "?column?", which already
> requires quoting...

I think the point of "?column?" is to use something that nobody's going
to want to reference that way, quoted or otherwise.  The SQL spec says
(in SQL:2021, it's 7.16 <query specification> syntax rule 18) that if the
column expression is anything more complex than a simple column reference
(or SQL parameter reference, which I think we don't support) then the
column name is implementation-dependent, which is standards-ese for
"here be dragons".

BTW, SQL92 and SQL99 had a further constraint:

c) Otherwise, the <column name> of the i-th column of the <query
  specification> is implementation-dependent and different
  from the <column name> of any column, other than itself, of
  a table referenced by any <table reference> contained in the
  SQL-statement.

We never tried to implement that literally, and now I'm glad we didn't
bother, because recent spec versions only say "implementation-dependent",
full stop.  In any case, the spec is clearly in the camp of "don't depend
on these column names".

> We could just do something like printing <function>_<arg1>_<arg2>. So
> something like avg(reltuples / relpages) would end up as
> avg_reltuples_float48div_relpages.
> Whether that's worth it, or whether column name lengths would be too painful,
> IDK.

I think you'd soon be hitting NAMEDATALEN limits ...

>> And no, deduplication isn't on the table at all here.

> +1

I remembered while looking at the spec that duplicate column names
in SELECT output are not only allowed but *required* by the spec.
If you write, say, "SELECT 1 AS x, 2 AS x, ..." then the column
names of those two columns are both "x", no wiggle room at all.
So I see little point in trying to deduplicate generated names,
even aside from the points you made.
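
To illustrate with today's behavior (standard-conforming psql output):

    SELECT 1 AS x, 2 AS x;
     x | x
    ---+---
     1 | 2
    (1 row)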

regards, tom lane




Re: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-22 Thread Peter Smith
On Thu, Feb 23, 2023 at 11:28 AM Michael Paquier  wrote:
>
> On Wed, Feb 22, 2023 at 10:21:51AM +, shiy.f...@fujitsu.com wrote:
> > Thanks for your reply. Using two flags makes sense to me.
> > Attach the updated patch.
>
> Fine by me as far as it goes.  Any thoughts from others?
> --

Should the 'relation_callback_registered' variable name be plural?

Otherwise, LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Smith
Here are my review comments for patch v17-0001.

==
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...


e.g.

-- indent different encoding (returns UTF-8)
SELECT xmlserialize(DOCUMENT '' AS text INDENT);
SELECT xmlserialize(CONTENT  '' AS text INDENT);
-- 'no indent' = not using 'no indent'
SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT '42' AS text NO INDENT);
SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT '42' AS text NO INDENT);
SELECT xml 'bar' IS DOCUMENT;
SELECT xml 'barfoo' IS DOCUMENT;
SELECT xml '' IS NOT DOCUMENT;
SELECT xml 'abc' IS NOT DOCUMENT;
SELECT '<>' IS NOT DOCUMENT;

~~

Apart from that, patch v17 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: buildfarm + meson

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 18:23:44 -0500, Andrew Dunstan wrote:
> Here's a progress report on adapting the buildfarm client to meson
> 
> There is a development branch where I'm working on the changes. They can be
> seen here:
> 
> 
> 
> On my Linux box (Fedora 37, where crake runs) I can get a complete run.

Nice!


> There is work to do to make sure we pick up the right log files, and maybe
> adjust a module or two. I have adopted a design where instead of trying to
> know a lot about the testing regime the client needs to know a lot less.
> Instead, it gets meson to tell it the set of tests. I will probably work on
> enabling some sort of filter, but I think this makes things more
> future-proof. I have stuck with the design of making testing fairly
> fine-grained, so each suite runs separately.

I don't understand why you'd want to run each suite separately. Serially
executing the tests takes way longer than doing so in parallel. Why would we
want to enforce that?

Particularly because with meson the test log files and the failed tests can
directly be correlated? And it should be easy to figure out which log files
need to be kept: you can just skip the directories in testrun/ that contain
test.success.


> On a Windows instance, fairly similar to what's running drongo, I can get a
> successful build with meson+VS2019, but I'm getting an error in the
> regression tests, which don't like setting lc_time to 'de_DE'. Not sure
> what's going on there.

Huh, that's odd.


> meson apparently wants touch and cp installed, although I can't see why at
> first glance. For Windows I just copied them into the path from an msys2
> installation.

Those should probably be fixed.

Greetings,

Andres Freund




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-22 Thread Peter Smith
Patch v6 LGTM.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: XLogReadBufferExtended() vs disconnected segments

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 17:01:47 -0800, Andres Freund wrote:
> One way to defend against this would be to make mdextend(), whenever it
> extends into the last block of a segment, unlink the next segment - it can't
> contain valid contents.  But it seems scary to just unlink entire
> segments.

Another way might be for XLOG_SMGR_TRUNCATE record, as well as smgr unlinks in
commit/abort records, to include not just the "target size", as we do today,
but to also include the current size.

I'm not sure that'd fix all potential issues, but it seems like it'd fix a lot
of the more obvious issues, because it'd prevent scenarios like a base backup
copying segment N, without copying N - 1, due to a concurrent truncate/drop,
from causing harm. Due to the range being included in the WAL record, replay
would know that N needs to be unlinked, even if smgrnblocks() thinks the
relation is much smaller.
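
For the truncate case, a rough sketch of that record change
(xl_smgr_truncate's existing fields plus one hypothetical addition; the new
field name is invented here):

    typedef struct xl_smgr_truncate
    {
        BlockNumber blkno;        /* target size, as today */
        BlockNumber cur_nblocks;  /* hypothetical: size before truncating */
        RelFileLocator rlocator;
        int         flags;
    } xl_smgr_truncate;

Replay could then unlink any segment covered by [blkno, cur_nblocks), even
when smgrnblocks() reports a smaller relation.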

Greetings,

Andres Freund




XLogReadBufferExtended() vs disconnected segments

2023-02-22 Thread Andres Freund
Hi,

I was trying to implement ExtendRelationBufferedTo(), responding to a review
comment by Heikki, in
https://www.postgresql.org/message-id/2023003152.rh4s75aedj65h...@awork3.anarazel.de

Which lead me to stare at the P_NEW do while loop in
XLogReadBufferExtended(). I first started to reply on that thread, but it
seems like a big enough issue that it seemed worth starting a separate thread.

The relevant logic was added in 6f2aead1ffec, the relevant discussion is at
https://www.postgresql.org/message-id/32313.1392231107%40sss.pgh.pa.us

My understanding of what happened there is that we tried to extend a relation
sized one block below a segment boundary, and afterward the relation was much
larger, because the next segment file existed and had a non-zero size. And
because we extended blkno-lastblock times, we'd potentially blow up the
relation size much more than intended.

The actual cause of that in the reported case appears to have been a bug in
wal-e. But I suspect it's possible to hit something like that without such
problems, just due to crashes on the replica, or "skew" while taking a base
backup.


I find it pretty sketchy that we just leave the contents of the previously
"disconnected" segment around, without using log_invalid_page() for
the range, or warning, or ...

Most of the time this issue would be fixed due to later WAL replay
initializing the later segment. But I don't think there's a guarantee for
that (see below).

It'd be one thing if we accidentally used data in such a segment, if the
situation is only caused by a bug in base backup tooling, or filesystem
corruption, or ...

But I think we can encounter the issue without anything like that being
involved. Imagine this scenario:

1) filenode A gets extended to segment 3
2) basebackup starts, including performing a checkpoint
3) basebackup ends up copying A's segment 3 first, while in progress
4) filenode A is dropped
5) checkpoint happens, allowing filenode A to be used again
6) filenode A is created newly
7) basebackup ends

At that point A will have segment 0, segment 3. The WAL replay for 4) won't
drop segment 3, because an smgrnblocks() won't even see it, because segment 2
doesn't exist.

If a replica starts from this base backup, we'll be fine until A again grows
far enough to fill segment 2. At that point, we'll suddenly have completely
bogus contents in 3. Obviously accesses to those contents could trivially
crash at that point.


I suspect there's an easier to hit version of this: Consider this path in
ExecuteTruncateGuts():

/*
 * Normally, we need a transaction-safe truncation here.  
However, if
 * the table was either created in the current (sub)transaction 
or has
 * a new relfilenumber in the current (sub)transaction, then we 
can
 * just truncate it in-place, because a rollback would cause 
the whole
 * table or the current physical file to be thrown away anyway.
 */
if (rel->rd_createSubid == mySubid ||
rel->rd_newRelfilelocatorSubid == mySubid)
{
/* Immediate, non-rollbackable truncation is OK */
heap_truncate_one_rel(rel);



Afaict that could easily lead to a version of the above that doesn't even
require relfilenodes getting recycled.


One way to defend against this would be to make mdextend(), whenever it
extends into the last block of a segment, unlink the next segment - it can't
contain valid contents.  But it seems scary to just unlink entire
segments.


Greetings,

Andres Freund




Re: meson and sslfiles.mk in src/test/ssl/

2023-02-22 Thread Michael Paquier
On Wed, Feb 22, 2023 at 11:33:09AM -0800, Andres Freund wrote:
> sslfiles.mk doesn't depend on the rest of the buildsystem, and is a rarely
> executed command, so I don't see a problem with using make to update the
> files. At least for a long while.

Agreed to keep things simple for now, even if it means an implicit
dependency to make for developers.
--
Michael




Re: "out of relcache_callback_list slots" after multiple calls to pg_logical_slot_get_binary_changes

2023-02-22 Thread Michael Paquier
On Wed, Feb 22, 2023 at 10:21:51AM +, shiy.f...@fujitsu.com wrote:
> Thanks for your reply. Using two flags makes sense to me.
> Attach the updated patch.

Fine by me as far as it goes.  Any thoughts from others?
--
Michael




Re: buildfarm + meson

2023-02-22 Thread Michael Paquier
On Wed, Feb 22, 2023 at 06:23:44PM -0500, Andrew Dunstan wrote:
> On my Linux box (Fedora 37, where crake runs) I can get a complete run.
> There is work to do to make sure we pick up the right log files, and maybe
> adjust a module or two. I have adopted a design where instead of trying to
> know a lot about the testing regime the client needs to know a lot less.
> Instead, it gets meson to tell it the set of tests. I will probably work on
> enabling some sort of filter, but I think this makes things more
> future-proof. I have stuck with the design of making testing fairly
> fine-grained, so each suite runs separately.

Nice!

> On a Windows instance, fairly similar to what's running drongo, I can get a
> successful build with meson+VS2019, but I'm getting an error in the
> regression tests, which don't like setting lc_time to 'de_DE'. Not sure
> what's going on there.

What's the regression issue?  Some text-field ordering that ought to
be enforced with a C collation?
--
Michael




Re: We shouldn't signal process groups with SIGQUIT

2023-02-22 Thread Michael Paquier
On Wed, Feb 22, 2023 at 09:39:55AM -0500, Tom Lane wrote:
> Michael Paquier  writes:
>> What would be the advantage of doing that for groups other than
>> -StartupPID and -PgArchPID?  These are the two groups of processes we
>> need to worry about, AFAIK.
> 
> No, we have the issue for regular backends too, since they could be
> executing COPY FROM PROGRAM or the like (not to mention that functions
> in plperlu, plpythonu, etc could spawn child processes).

Indeed, right.  I completely forgot about these cases.
--
Michael




Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Jim Jones

On 22.02.23 23:45, Peter Smith wrote:

src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum*argvalue = op->d.xmlexpr.argvalue;
bool*argnull = op->d.xmlexpr.argnull;
-
+ boolindent = op->d.xmlexpr.xexpr->indent;
+ text*data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

I believe I see it now (it took me a while) :)

==
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT '42' AS text NO INDENT);
  ?column?
--
  t
(1 row)

test=# SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT '42' AS text NO INDENT);
  ?column?
--
  t
(1 row)


Actually NO INDENT just ignores this feature and doesn't call the 
function at all, so in this particular case the result sets will always 
be identical. But yes, I totally agree that a test case for that is also 
important.


v17 attached.

Thanks!

Best, Jim

From 98524ed5e39188c2ad177c3f22159d3aff301899 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Mon, 20 Feb 2023 23:35:22 +0100
Subject: [PATCH v17] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings. Although the INDENT feature is designed
to work with xml strings of type DOCUMENT, this implementation also allows
the usage of CONTENT type strings as long as it contains a well-formed xml -
note the XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |  11 ++-
 src/backend/parser/gram.y |  12 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  41 ++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   1 +
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 106 ++
 src/test/regress/expected/xml_1.out   |  74 ++
 src/test/regress/expected/xml_2.out   | 106 ++
 src/test/regress/sql/xml.sql  |  27 ++-
 14 files changed, 384 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 467b49b199..53d59662b9 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4460,14 +4460,18 @@ xml 'bar'
 xml, uses the function
 xmlserialize:xmlserialize
 
-XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type )
+XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [NO] INDENT ] )
 
 type can be
 character, character varying, or
 text (or an alias for one of those).  Again, according
 to the SQL standard, this is the only way to convert between type
 xml and character types, but PostgreSQL also allows
-you to simply cast the value.
+you to simply cast the value. The option INDENT allows to
+indent the serialized xml output - the default is NO INDENT.
+It is designed to indent XML strings of type DOCUMENT, but it can also
+   be used with CONTENT as long as value
+   contains a well-formed XML.

 

diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..2e196faeeb 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -619,7 +619,7 @@ X061	XMLParse: character string input and DOCUMENT option			YES
 X065	XMLParse: binary string input and CONTENT option			NO	
 X066	XMLParse: binary string input and DOCUMENT option			NO	
 X068	XMLSerialize: BOM			NO	
-X069	XMLSerialize: INDENT			NO	
+X069	XMLSerialize: INDENT			YES	
 X070	XMLSerialize: character string serialization and CONTENT option			YES	
 X071	XMLSerialize: character string serialization and DOCUMENT option			YES	
 X072	XMLSerialize: character string serialization			YES	
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 19351fe34b..7ba3131d92 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, 

buildfarm + meson

2023-02-22 Thread Andrew Dunstan

Here's a progress report on adapting the buildfarm client to meson

There is a development branch where I'm working on the changes. They can 
be seen here:




On my Linux box (Fedora 37, where crake runs) I can get a complete run. 
There is work to do to make sure we pick up the right log files, and 
maybe adjust a module or two. I have adopted a design where instead of 
trying to know a lot about the testing regime the client needs to know a 
lot less. Instead, it gets meson to tell it the set of tests. I will 
probably work on enabling some sort of filter, but I think this makes 
things more future-proof. I have stuck with the design of making testing 
fairly fine-grained, so each suite runs separately.


On a Windows instance, fairly similar to what's running drongo, I can 
get a successful build with meson+VS2019, but I'm getting an error in 
the regression tests, which don't like setting lc_time to 'de_DE'. Not 
sure what's going on there.


meson apparently wants touch and cp installed, although I can't see why 
at first glance. For Windows I just copied them into the path from an 
msys2 installation.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 16:34:44 -0500, Tom Lane wrote:
> I wrote:
> > Andres Freund  writes:
> >> Maybe it's worth sticking a StaticAssert() for the struct size
> >> somewhere.
> 
> > Indeed.  I thought we had one already.
> 
> >> I'm a bit wary about that being too noisy, there are some machines with
> >> odd alignment requirements. Perhaps worth restricting the assertion to
> >> x86-64 + armv8 or such?
> 
> > I'd put it in first and only reconsider if it shows unfixable problems.
> 
> Now that we've got the sizeof(ExprEvalStep) under control, shouldn't
> we do the attached?

Indeed. Pushed.

Let's hope there's no rarely used architecture with odd alignment rules.

Greetings,

Andres Freund




Re: [PATCH] Add pretty-printed XML output option

2023-02-22 Thread Peter Smith
Here are some review comments for patch v16-0001.

==
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>{
>Datum*argvalue = op->d.xmlexpr.argvalue;
>bool*argnull = op->d.xmlexpr.argnull;
> -
> + boolindent = op->d.xmlexpr.xexpr->indent;
> + text*data;
>/* argument type is known to be xml */
>Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

==
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '42' AS text) = xmlserialize(DOCUMENT '42' AS text NO INDENT);
 ?column?
--
 t
(1 row)

test=# SELECT xmlserialize(CONTENT '42' AS text) = xmlserialize(CONTENT '42' AS text NO INDENT);
 ?column?
--
 t
(1 row)

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Improving inferred query column names

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 16:38:51 -0500, Tom Lane wrote:
> Vladimir Churyukin  writes:
> > It doesn't need to be perfect, but it needs to be consistent. So far you
> > proposed a rule to replace () with _. What is the plan for expressions, how
> > to convert them to names (with deduplication I guess?, because there could
> > be 2 similar expressions mapped to the same name potentially).
> 
> I do not think we need to do anything for arbitrary expressions.

Exactly. It's not like they have a useful name today.  Nor are they unique.


> The proposal so far was just to handle a function call wrapped
> around something else by converting to the function name followed
> by whatever we'd emit for the something else.

Just to showcase that better, what I think we're discussing is changing:

SELECT sum(relpages), sum(reltuples), 1+1 FROM pg_class;
┌──┬┬──┐
│ sum  │  sum   │ ?column? │
├──┼┼──┤
│ 2774 │ 257896 │2 │
└──┴┴──┘
(1 row)

to

SELECT sum(relpages), sum(reltuples), 1+1 FROM pg_class;
┌──┬───┬──┐
│ sum_relpages │ sum_reltuples │ ?column? │
├──┼───┼──┤
│ 2774 │257896 │2 │
└──┴───┴──┘
(1 row)


> You cannot realistically
> handle, say, operator expressions without emitting names that will
> require quoting, which doesn't seem attractive.

Well, it doesn't require much to be better than "?column?", which already
requires quoting...

We could just do something like printing <function>_<arg1>_<arg2>. So
something like avg(reltuples / relpages) would end up as
avg_reltuples_float48div_relpages.

Whether that's worth it, or whether column name lengths would be too painful,
IDK.


> And no, deduplication isn't on the table at all here.

+1

Greetings,

Andres Freund




Re: Experiments with Postgres and SSL

2023-02-22 Thread Jacob Champion
On Wed, Feb 22, 2023 at 4:26 AM Heikki Linnakangas  wrote:
> On 20/01/2023 03:28, Jacob Champion wrote:
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

Unless you plan to make the next minor protocol version fundamentally
incompatible, I don't think there's much reason to add '.0'. (And even
if that does happen, 'postgres/3.1' is still distinct from
'postgres/3'. Or 'postgres' for that matter.) The Expert Review
process might provide some additional guidance?
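
For reference, a minimal sketch of client-side ALPN advertisement with
OpenSSL, assuming the registered id ends up being plainly "postgres" (the
string, the ctx variable, and handle_error() are all assumptions here):

    /* ALPN wire format: each protocol name is length-prefixed */
    static const unsigned char alpn_protos[] = "\x08postgres";

    /* note: this particular OpenSSL call returns 0 on success */
    if (SSL_CTX_set_alpn_protos(ctx, alpn_protos, sizeof(alpn_protos) - 1) != 0)
        handle_error();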

> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.

Yeah. You could choose to replace major version negotiation completely
with ALPN, I suppose, but there might not be any maintenance benefit
if you still have to support plaintext negotiation. Maybe there are
performance implications to handling the negotiation earlier vs.
later?

Note that older versions of TLS will expose the ALPN in plaintext...
but that may not be a factor by the time a postgres/4 shows up, and if
the next protocol is incompatible then it may not be feasible to hide
the differences via transport encryption anyway.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it

I like that such an option could eventually be leveraged for a
postgresqls:// URI scheme (which should not fall back, ever). There
would be other things we'd have to change first to make that a reality
-- postgresqls://example.com?host=evil.local is problematic, for
example -- but it'd be really nice to have an HTTPS equivalent.

--Jacob




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-21 11:22:26 -0800, Andres Freund wrote:
> On 2023-02-21 18:18:02 +0200, Heikki Linnakangas wrote:
> > Do other ReadBufferModes than RBM_ZERO_AND_LOCK make sense with
> > ExtendRelationBuffered?
> 
> Hm. That's a good point. Probably not. Perhaps it could be useful to support
> RBM_NORMAL as well? But even if, it'd just be a lock release away if we always
> used RBM_ZERO_AND_LOCK.

There's a fair number of callers using RBM_NORMAL, via ReadBuffer[Extended]()
right now. While some of them are trivial to convert, others aren't (e.g.,
brin_getinsertbuffer()).  So I'm inclined to continue allowing RBM_NORMAL.
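
For reference, the existing call shape those callers use today (rel, blkno,
and strategy stand in for whatever the caller has at hand):

    Buffer      buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                         RBM_NORMAL, strategy);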

Greetings,

Andres Freund




Re: pgindent vs. git whitespace check

2023-02-22 Thread Andrew Dunstan


On 2023-02-22 We 09:52, Tom Lane wrote:

pgindent has never been very kind to non-end-of-line comments, and
I'm not excited about working on making it do so.  As a thought
experiment, what would happen if we reversed course and started
allowing "//" comments?  Naive conversion of this comment could
break the code altogether.  (Plenty of programming languages
don't even *have* non-end-of-line comments.)





I suspect not allowing // is at least a minor annoyance to any new 
developer we acquire under the age of about 40.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Improving inferred query column names

2023-02-22 Thread Tom Lane
Vladimir Churyukin  writes:
> It doesn't need to be perfect, but it needs to be consistent. So far you
> proposed a rule to replace () with _. What is the plan for expressions, how
> to convert them to names (with deduplication I guess?, because there could
> be 2 similar expressions mapped to the same name potentially).

I do not think we need to do anything for arbitrary expressions.
The proposal so far was just to handle a function call wrapped
around something else by converting to the function name followed
by whatever we'd emit for the something else.  You cannot realistically
handle, say, operator expressions without emitting names that will
require quoting, which doesn't seem attractive.

And no, deduplication isn't on the table at all here.

regards, tom lane




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2023-02-22 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> Maybe it's worth sticking a StaticAssert() for the struct size
>> somewhere.

> Indeed.  I thought we had one already.

>> I'm a bit wary about that being too noisy, there are some machines with
>> odd alignment requirements. Perhaps worth restricting the assertion to
>> x86-64 + armv8 or such?

> I'd put it in first and only reconsider if it shows unfixable problems.

Now that we've got the sizeof(ExprEvalStep) under control, shouldn't
we do the attached?

regards, tom lane

diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index 86e1ac1e65..06c3adc0a1 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -669,6 +669,10 @@ typedef struct ExprEvalStep
 	}			d;
 } ExprEvalStep;
 
+/* Enforce the size rule given in the comment above */
+StaticAssertDecl(sizeof(ExprEvalStep) <= 64,
+ "size of ExprEvalStep exceeds 64 bytes");
+
 
 /* Non-inline data for container operations */
 typedef struct SubscriptingRefState


Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode

2023-02-22 Thread Melanie Plageman
Hi,

So, I attached a rough implementation of both the autovacuum failsafe
reverts to shared buffers and the vacuum option (no tests or docs or
anything).

The first three patches in the set are just for enabling use of shared
buffers in failsafe mode for autovacuum. I haven't actually ensured it
works (i.e. triggering failsafe mode and checking the stats for whether
or not shared buffers were used).

I was wondering about the status of the autovacuum wraparound failsafe
test suggested in [1]. I don't see it registered for the March's
commitfest. I'll probably review it since it will be useful for this
patchset.

The first patch in the set is to free the BufferAccessStrategy object
that is made in do_autovacuum() -- I don't see when the memory context
it is allocated in is destroyed, so it seems like it might be a leak?
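
Roughly the shape of that first patch, as a sketch (not the literal hunk;
placement inside do_autovacuum() assumed):

    BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_VACUUM);

    /* ... perform the autovacuum work that uses bstrategy ... */

    FreeAccessStrategy(bstrategy);  /* free explicitly, don't rely on context teardown */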

The last patch in the set is a trial implementation of the VACUUM option
suggested -- BUFFER_USAGE_LIMIT. More on that below.

On Wed, Jan 11, 2023 at 4:39 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-11 16:18:34 -0500, Tom Lane wrote:
> > Peter Geoghegan  writes:
> > > On Wed, Jan 11, 2023 at 11:18 AM Andres Freund 
> wrote:
> > >> I don't like that - it's also quite useful to disable use of
> ringbuffers when
> > >> you actually need to clean up indexes. Especially when we have a lot
> of dead
> > >> tuples we'll rescan indexes over and over...
> >
> > > That's a fair point.
> >
> > > My vote goes to "REUSE_BUFFERS", then.
> >
> > I wonder whether it could make sense to allow a larger ringbuffer size,
> > rather than just the limit cases of "on" and "off".
>
> I can see that making sense, particularly if we were to later extend this to
> other users of ringbuffers. E.g. COPY's use of the ringbuffer makes loading of
> data > 16MB but also << s_b vastly slower, but it can still be very important
> to use if there's lots of parallel processes loading data.
>
> Maybe BUFFER_USAGE_LIMIT, with a value from -1 to N, with -1 indicating the
> default value, 0 preventing use of a buffer access strategy, and 1..N
> indicating the size in blocks?
>
>
I have found the implementation you suggested very hard to use.
The attached fourth patch in the set implements it the way you suggest.
I had to figure out what number to set BUFFER_USAGE_LIMIT to -- and,
since shared_buffers isn't specified in units of buffers, it's pretty
annoying to have to figure out a valid number.

I think that it would be better to have it be either a percentage of
shared buffers or a size in units of bytes/kb/mb like that of shared
buffers.

Using a fraction or percentage appeals to me because you don't need to
reference your shared buffers setting and calculate what size you want
to set it to. Also, parsing the size in different units sounds like more
work.

Unfortunately, the fraction doesn't really work if we cap the ring size
of a buffer access strategy to NBuffers / 8. Also, there are other
issues like what would 0% and 100% mean.
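
For concreteness, the interface as sketched in the attached fourth patch
(proposed syntax only; values are in blocks, per the suggestion above, and
the table name is a placeholder):

    VACUUM (BUFFER_USAGE_LIMIT -1) mytable;    -- default ring size
    VACUUM (BUFFER_USAGE_LIMIT 0) mytable;     -- no access strategy at all
    VACUUM (BUFFER_USAGE_LIMIT 4096) mytable;  -- ring of 4096 buffers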

I have a list of other questions, issues, and TODOs related to the code
I wrote to implement BUFFER_USAGE_LIMIT, but I'm not sure those are
worth discussing until we shape up the interface.


> Would we want to set an upper limit lower than implied by the memory limit
> for
> the BufferAccessStrategy allocation?
>
>
So, I was wondering what you thought about NBuffers / 8 (the current
limit). Does it make sense?

If we clamp the user-specified value to this, I think we definitely need
to inform them through some kind of logging or message. I am sure there
are lots of other gucs doing this -- do you know any off the top of your
head?

- Melanie

 [1]
https://www.postgresql.org/message-id/flat/CAB8KJ%3Dj1b3kscX8Cg5G%3DQ39ZQsv2x4URXsuTueJLz%3DfcvJ3eoQ%40mail.gmail.com#ee67664e85c4d11596a92cc71780d29c
From 313deb07f729924da650eb01d31cdb722950a5b1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 22 Feb 2023 12:26:01 -0500
Subject: [PATCH v1 3/4] use shared buffers when failsafe active

---
 src/backend/access/heap/vacuumlazy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f3..b319a244d5 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2622,6 +2622,11 @@ lazy_check_wraparound_failsafe(LVRelState *vacrel)
	if (unlikely(vacuum_xid_failsafe_check(&vacrel->cutoffs)))
 	{
 		vacrel->failsafe_active = true;
+		/*
+		 * Assume the caller who allocated the memory for the
+		 * BufferAccessStrategy object will free it.
+		 */
+		vacrel->bstrategy = NULL;
 
 		/* Disable index vacuuming, index cleanup, and heap rel truncation */
 		vacrel->do_index_vacuuming = false;
-- 
2.37.2

From 49552b63a87365d9ea7c4b9d74f77007e08b54fc Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 22 Feb 2023 11:59:53 -0500
Subject: [PATCH v1 1/4] dont leak strategy object

---
 src/backend/postmaster/autovacuum.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git 

Re: Improving inferred query column names

2023-02-22 Thread Vladimir Churyukin
On Wed, Feb 22, 2023, 12:40 PM Andres Freund  wrote:

> Hi,
>
> On 2023-02-11 12:47:04 -0800, Vladimir Churyukin wrote:
> > That is a good idea for simple cases, I'm just curious how it would look
> > like for more complex cases (you can have all kinds of expressions as
> > parameters for aggregate function calls).
> > If it works only for simple cases, I think it would be confusing and not
> > very helpful.
>
> I don't think it needs to be perfect to be helpful.
>


It doesn't need to be perfect, but it needs to be consistent. So far you
proposed a rule to replace () with _. What is the plan for expressions, how
to convert them to names (with deduplication I guess?, because there could
be 2 similar expressions mapped to the same name potentially).


> > Wouldn't it make more sense to just deduplicate the names by adding
> > numerical postfixes, like sum_1, sum_2?
>
> That'd be considerably worse than what we do today imo, because any
> reordering
> / added aggregate would lead to everything else changing as well.
>


Ok, that I kinda agree with. Not necessarily worse overall, but worse for
some cases. Well, the proposal above about keeping the names exactly the
same as the full expressions is probably the best we can do then. It will
take care of possible duplications and won't be position-sensitive. And
will be consistent. The only issue is somewhat unusual column names that
you will have to use quotes to refer to. But is that a real issue?

-Vladimir Churyukin

>


Re: Add SHELL_EXIT_CODE to psql

2023-02-22 Thread Thomas Munro
On Tue, Jan 31, 2023 at 9:23 AM Corey Huinker  wrote:
>> Unfortunately, there is a fail in FreeBSD 
>> https://cirrus-ci.com/task/6466749487382528
>>
>> Maybe, this patch is need to be rebased?
>>
>
> That failure is in postgres_fdw, which this code doesn't touch.
>
> I'm not able to get to 
> /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out
>  - It's not in either of the browseable zip files and the 2nd zip file isn't 
> downloadable, so it's hard to see what's going on.

Yeah, that'd be our current top flapping CI test[1].  These new
"*-running" tests (like the old "make installcheck") are only enabled
in the FreeBSD CI task currently, so that's why the failures only show
up there.  I see[2] half a dozen failures of the "fdw_retry_check"
thing in the past ~24 hours.

[1] 
https://www.postgresql.org/message-id/flat/20221209001511.n3vqodxobqgscuil%40awork3.anarazel.de
[2] http://cfbot.cputube.org/highlights/regress.html




Re: [PATCH] Fix unbounded authentication exchanges during PQconnectPoll()

2023-02-22 Thread Jacob Champion
On Wed, Feb 22, 2023 at 11:43 AM Heikki Linnakangas  wrote:
> I separated the earlier message-length checks so that you get "received
> invalid authentication request" or "received invalid protocol
> negotiation message", depending on whether it was an 'R' or 'v' message.
> With that, "received invalid authentication request" becomes translatable
> anyway, which makes the point on translation overhead moot. I added a
> comment to mention that it's unreachable, though.

Looks good, thank you!

--Jacob




Re: pg_regress: Treat child process failure as test failure

2023-02-22 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 22 Feb 2023, at 21:33, Andres Freund  wrote:
>> On 2023-02-22 15:10:11 +0100, Daniel Gustafsson wrote:
>>> Rebased patch to handle breakage of v2 due to bd8d453e9b.

>> I think we probably should just apply this? The current behaviour doesn't 
>> seem
>> right, and I don't see a downside of the new behaviour?

> Agreed, I can't think of a regression test where we wouldn't want this.  My
> only concern was if any of the ECPG tests were doing something odd that would
> break from this but I can't see anything.

+1.  I was a bit surprised to realize that we might not count such
a case as a failure.

regards, tom lane




Re: Wrong query results caused by loss of join quals

2023-02-22 Thread Tom Lane
Richard Guo  writes:
> On Mon, Feb 20, 2023 at 4:56 AM Tom Lane  wrote:
>> * I suspect the other use of get_common_eclass_indexes, in
>> have_relevant_eclass_joinclause, is broken as well.

> It seems have_relevant_joinclause is broken for the presented case.  It
> does not get a change to call have_relevant_eclass_joinclause, because
> flag 'has_eclass_joins' is not set for t1 due to t1 being not in the
> EC's ec_relids.  As a result, have_relevant_joinclause thinks there is
> no joinclause that involves t1 and t2, which is not right.

I thought about this and decided that it's not really a problem.
have_relevant_joinclause is just a heuristic, and I don't think we
need to prioritize forming a join if the only relevant clauses look
like this.  We won't be able to use such clauses for merge or hash,
so we're going to end up with an unconstrained nestloop, which isn't
something to be eager to form.  The join ordering rules will take
care of forcing us to make the join when necessary.

>> * This fix throws away a fair bit of the optimization intended by
>> 3373c7155, since it will result in examining some irrelevant ECs.

> Yeah, this is also my concern that we'd lose some optimization about
> finding ECs.

The only easy improvement I can see to make here is to apply the old
rules at inner joins.  Maybe it's worth complicating the data structures
to be smarter at outer joins, but I rather doubt it: we could easily
expend more overhead than we'll save here by examining irrelevant ECs.
In any case, if there is a useful optimization here, it can be pursued
later.

>> BTW, while looking at this I saw that generate_join_implied_equalities'
>> calculation of nominal_join_relids is wrong for child rels, because it
>> fails to fold the join relid into that if appropriate.

> I dug a little into this and it seems this is all right as-is.

I changed it anyway after noting that (a) passing in the ojrelid is
needful to be able to distinguish inner and outer joins, and
(b) the existing comment about the join_relids input is now wrong.
Even if it happens to not be borked for current callers, that seems
like a mighty fragile assumption.

Less-hasty v2 patch attached.

regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index faabe4d065..18950351c3 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1366,15 +1366,19 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
  * we already have "b.y = a.x", we return the existing clause.
  *
- * join_relids should always equal bms_union(outer_relids, inner_rel->relids).
- * We could simplify this function's API by computing it internally, but in
- * most current uses, the caller has the value at hand anyway.
+ * If we are considering an outer join, ojrelid is the associated OJ relid,
+ * otherwise it's zero.
+ *
+ * join_relids should always equal bms_union(outer_relids, inner_rel->relids)
+ * plus ojrelid if that's not zero.  We could simplify this function's API by
+ * computing it internally, but most callers have the value at hand anyway.
  */
 List *
 generate_join_implied_equalities(PlannerInfo *root,
  Relids join_relids,
  Relids outer_relids,
- RelOptInfo *inner_rel)
+ RelOptInfo *inner_rel,
+ Index ojrelid)
 {
 	List	   *result = NIL;
 	Relids		inner_relids = inner_rel->relids;
@@ -1392,6 +1396,8 @@ generate_join_implied_equalities(PlannerInfo *root,
 		nominal_inner_relids = inner_rel->top_parent_relids;
 		/* ECs will be marked with the parent's relid, not the child's */
 		nominal_join_relids = bms_union(outer_relids, nominal_inner_relids);
+		if (ojrelid != 0)
+			nominal_join_relids = bms_add_member(nominal_join_relids, ojrelid);
 	}
 	else
 	{
@@ -1400,10 +1406,23 @@ generate_join_implied_equalities(PlannerInfo *root,
 	}
 
 	/*
-	 * Get all eclasses that mention both inner and outer sides of the join
+	 * Examine all potentially-relevant eclasses.
+	 *
+	 * If we are considering an outer join, we must include "join" clauses
+	 * that mention either input rel plus the outer join's relid; these
+	 * represent post-join filter clauses that have to be applied at this
+	 * join.  We don't have infrastructure that would let us identify such
+	 * eclasses cheaply, so just fall back to considering all eclasses
+	 * mentioning anything in nominal_join_relids.
+	 *
+	 * At inner joins, we can be smarter: only consider eclasses mentioning
+	 * both input rels.
 	 */
-	matching_ecs = get_common_eclass_indexes(root, nominal_inner_relids,
-			 outer_relids);
+	if (ojrelid != 0)
+		matching_ecs = get_eclass_indexes_for_relids(root, nominal_join_relids);
+	else
+		matching_ecs = get_common_eclass_indexes(root, nominal_inner_relids,
+ outer_relids);
 
 	i = -1;
 	while ((i = bms_next_member(matching_ecs, 

Re: pg_regress: Treat child process failure as test failure

2023-02-22 Thread Daniel Gustafsson
> On 22 Feb 2023, at 21:33, Andres Freund  wrote:
> On 2023-02-22 15:10:11 +0100, Daniel Gustafsson wrote:

>> Rebased patch to handle breakage of v2 due to bd8d453e9b.
> 
> I think we probably should just apply this? The current behaviour doesn't seem
> right, and I don't see a downside of the new behaviour?

Agreed, I can't think of a regression test where we wouldn't want this.  My
only concern was if any of the ECPG tests were doing something odd that would
break from this but I can't see anything.

--
Daniel Gustafsson





Re: Improving inferred query column names

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-11 12:47:04 -0800, Vladimir Churyukin wrote:
> That is a good idea for simple cases, I'm just curious how it would look
> like for more complex cases (you can have all kinds of expressions as
> parameters for aggregate function calls).
> If it works only for simple cases, I think it would be confusing and not
> very helpful.

I don't think it needs to be perfect to be helpful.


> Wouldn't it make more sense to just deduplicate the names by adding
> numerical postfixes, like sum_1, sum_2?

That'd be considerably worse than what we do today imo, because any reordering
/ added aggregate would lead to everything else changing as well.
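
To make that concrete (a hypothetical illustration, not current behavior):

    SELECT sum(reads), sum(writes) FROM pg_stat_io;
    -- would yield sum_1, sum_2; adding an aggregate up front:
    SELECT sum(fsyncs), sum(reads), sum(writes) FROM pg_stat_io;
    -- renumbers them to sum_1, sum_2, sum_3, so anything referencing
    -- sum_2 by name silently starts reading a different column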


Greetings,

Andres Freund




Re: Improving inferred query column names

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-20 16:08:00 +0100, Peter Eisentraut wrote:
> On 11.02.23 20:24, Andres Freund wrote:
> I think we should just do it and not care about what breaks.  There has
> never been any guarantee about these.
> 
> FWIW, "most" other SQL implementations appear to generate column names like
> 
> SELECT SUM(reads), SUM(writes) FROM pg_stat_io;
> column names: "SUM(reads)", "SUM(writes)"

Hm, personally I don't like leaving the parens in the names; that makes it
unnecessarily hard to reference the columns.  sum_reads imo is more usable
than "SUM(reads)".
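
For instance, if the inferred name kept the parens, as in

    SELECT sum(reads) AS "SUM(reads)" FROM pg_stat_io;

then any later reference needs a quoted identifier:

    SELECT "SUM(reads)" FROM (SELECT sum(reads) AS "SUM(reads)"
                              FROM pg_stat_io) AS s;

whereas sum_reads can be referenced without quoting.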


> We had a colleague look into this a little while ago, and it got pretty
> tedious to implement this for all the expression types.

Hm, any chance that colleague could be pointed at this discussion and chime
in? It doesn't immediately look that hard to do substantially better than
today. Of course there's an approximately endless amount of effort that could
be poured into this, but even some fairly basic improvements seem like a big
win.


> And, you know, the bikeshedding ...

Indeed. I already started above :)

Greetings,

Andres Freund




Re: pg_regress: Treat child process failure as test failure

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 15:10:11 +0100, Daniel Gustafsson wrote:
> > On 26 Nov 2022, at 22:46, Daniel Gustafsson  wrote:
> 
> > I've moved the statuses[i] check before the differ check, such that there
> > is a separate block for this not mixed up with the differs check and
> > printing.
> 
> Rebased patch to handle breakage of v2 due to bd8d453e9b.

I think we probably should just apply this? The current behaviour doesn't seem
right, and I don't see a downside of the new behaviour?

Greetings,

Andres Freund




Re: refactoring relation extension and BufferAlloc(), faster COPY

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 11:18:57 +0200, Heikki Linnakangas wrote:
> On 21/02/2023 21:22, Andres Freund wrote:
> > On 2023-02-21 18:18:02 +0200, Heikki Linnakangas wrote:
> > > Is it ever possible to call this without a relcache entry? WAL redo
> > > functions do that with ReadBuffer, but they only extend a relation
> > > implicitly, by replay a record for a particular block.
> > 
> > I think we should use it for crash recovery as well, but the patch doesn't
> > yet. We have some gnarly code there, see the loop using P_NEW in
> > XLogReadBufferExtended(). Extending the file one-by-one is a lot more
> > expensive than doing it in bulk.
> 
> Hmm, XLogReadBufferExtended() could use smgrzeroextend() to fill the gap,
> and then call ExtendRelationBuffered for the target page. Or the new
> ExtendRelationBufferedTo() function that you mentioned.

I don't think it's safe to just use smgrzeroextend(). Without the page-level
interlock from the buffer entry, a concurrent reader can read/write the
extended portion of the relation while we're extending. That can lead to
losing writes.

It also turns out that just doing smgrzeroextend(), without filling s_b, is
often bad for performance, because it may cause reads when trying to fill the
buffers. Although hopefully that's less of an issue during WAL replay, due to
REGBUF_WILL_INIT.


> In the common case that you load a lot of data to a relation extending it,
> and then crash, the WAL replay would still extend the relation one page at a
> time, which is inefficient. Changing that would need bigger changes, to
> WAL-log the relation extension as a separate WAL record, for example. I
> don't think we need to solve that right now, it can be addressed separately
> later.

Yea, that seems indeed something for later.

There's several things we could do without adding WAL logging of relation
extension themselves.

One relatively easy thing would be to add information about the number of
blocks we're extending by to XLOG_HEAP2_MULTI_INSERT records. Compared to the
insertions themselves, that'd barely be noticeable.

A slightly more complicated thing would be to peek ahead in the WAL (we have
infrastructure for that now) and extend by enough for the next few relation
extensions.

Greetings,

Andres Freund




Re: Non-superuser subscription owners

2023-02-22 Thread Joe Conway

On 2/22/23 14:12, Mark Dilger wrote:

On Feb 22, 2023, at 10:49 AM, Jeff Davis  wrote:
On Wed, 2023-02-22 at 09:27 -0800, Mark Dilger wrote:

Another option is to execute under the intersection of their
privileges, where both the definer and the invoker need the
privileges in order for the action to succeed.  That would be more
permissive than the proposed SECURITY NONE, while still preventing
either party from hijacking privileges of the other.


Interesting idea, I haven't heard of something like that being done
before. Is there some precedent for that or a use case where it's
helpful?

No current use case comes to mind, but I proposed it for event
triggers one or two development cycles ago, to allow for
non-superuser event trigger owners.  The problems associated with
allowing non-superusers to create and own event triggers were pretty
similar to the problems being discussed in this thread.



The intersection of privileges is used, for example, in multi-level 
security contexts where the intersection of the network-allowed levels 
and the subject allowed levels is used to bracket what can be accessed 
and how.


Other examples I found with a quick search:

https://docs.oracle.com/javase/8/docs/api/java/security/AccessController.html#doPrivileged-java.security.PrivilegedAction-java.security.AccessControlContext-

https://learn.microsoft.com/en-us/dotnet/api/system.security.permissions.dataprotectionpermission.intersect?view=dotnet-plat-ext-7.0


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Add SHELL_EXIT_CODE to psql

2023-02-22 Thread Corey Huinker
> The patch set seems to be in good shape and has been pretty stable for
> quite a while.
> Could you add one more minor improvement, a new line after the variable
> declarations?

As you wish. Attached.


>
> After these changes, I think, we can make this patch RfC, shall we?
>
>
Fingers crossed.
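
For the archives, here is roughly what the intended behavior looks like per
the docs in the attached patch (a hypothetical session):

    postgres=# \! exit 3
    postgres=# \echo :SHELL_EXIT_CODE
    3
    postgres=# \echo :SHELL_ERROR
    true
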
From bb55e0fd1cd2de6fa25b7fc738dd6482d0c008c4 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v9 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.2

From fe954f883f4ee65cbfe5dc2c20f012465d031ada Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:04:31 -0500
Subject: [PATCH v9 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..1eb7ff3fa2 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.2

From 0f91719c26b6201f4aaa436bd13141ba325e2f85 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 22 Feb 2023 14:55:34 -0500
Subject: [PATCH v9 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 15 +
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 34 +-
 src/test/regress/expected/psql.out | 29 +
 src/test/regress/sql/psql.sql  | 25 ++
 6 files changed, 123 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 955397ee9d..40ba2810ea 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5032,6 +5032,21 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	if (result == 0)
+	{
+		SetVariable(pset.vars, "SHELL_EXIT_CODE", "0");
+		SetVariable(pset.vars, "SHELL_ERROR", "false");
+	}
+	else
+	{
+		int		exit_code = wait_result_to_exit_code(result);
+		char	buf[32];
+
+		

Re: Rework of collation code, extensibility

2023-02-22 Thread Peter Eisentraut

On 14.02.23 00:45, Jeff Davis wrote:

Now the patches are:

 0001: pg_strcoll/pg_strxfrm
 0002: pg_locale_deterministic()
 0003: cleanup a USE_ICU special case
 0004: GUCs (only for testing, not for commit)


I haven't read the whole thing again, but this arrangement looks good to 
me.  I don't have an opinion on whether 0004 is actually useful.






Re: [PATCH] Fix unbounded authentication exchanges during PQconnectPoll()

2023-02-22 Thread Heikki Linnakangas

On 22/02/2023 20:49, Jacob Champion wrote:

On Tue, Feb 21, 2023 at 12:35 PM Heikki Linnakangas  wrote:

@@ -3370,6 +3389,7 @@ keep_going: /* We will come back to here until there is
 		/* Get the type of request. */
 		if (pqGetInt((int *) &msgLength, 4, conn))
 		{
+			libpq_append_conn_error(conn, "server sent truncated authentication request");
 			goto error_return;
 		}
 		msgLength -= 4;


This is unreachable, because we already checked the length. Better safe
than sorry I guess, but let's avoid the translation overhead of this at
least.


Should we just Assert() instead of an error message?


I separated the earlier message-length checks so that you get "received 
invalid authentication request" or "received invalid protocol 
negotiation message", depending on whether it was an 'R' or 'v' message. 
With that, "received invalid authentication request" becomes translatable 
anyway, which makes the point on translation overhead moot. I added a 
comment to mention that it's unreachable, though.


I also reformatted the comments a little more.

Pushed with those changes, thanks!

- Heikki





Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Nathan Bossart
On Wed, Feb 22, 2023 at 07:17:37PM +0100, Daniel Gustafsson wrote:
>> On 22 Feb 2023, at 19:14, Heikki Linnakangas  wrote:
> 
>> How about a new backslash command or psql variable to show how long the 
>> previous statement took? Something like:
>> 
>> postgres=# select 
>> ?column?
>> --
>>  123
>> (1 row)
>> 
>> postgres=# \time
>> 
>> Time: 14011.975 ms (00:14.012)
>> 
>> This would solve the "I forgot to time something" problem.
> 
> I don't have an opinion on adding a prompt option, but I've wanted this
> (without realizing this was the format of it) many times.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: meson and sslfiles.mk in src/test/ssl/

2023-02-22 Thread Andres Freund
Hi,

On 2023-02-22 09:42:54 -0800, Jacob Champion wrote:
> I'm happy to contribute cycles to a Meson port when you're ready for
> it. From a skim it seems like maybe in-source generation isn't a focus for
> Meson [1]. They might encourage us to write custom Python for it anyway?

You'd normally just invoke commands for updating sources as run_target()s, not
custom targets. Or generate them in the build tree via custom_target() and
then copy to the source tree via a run_target().

For this case I think it'd suffice to add a run target that does something
like
make -C ~/src/postgresql/src/test/ssl/ -f ~/src/postgresql/src/test/ssl/sslfiles.mk sslfiles OPENSSL=openssl

obviously with the necessary things being replaced by the relevant variables.


sslfiles.mk doesn't depend on the rest of the buildsystem, and is a rarely
executed command, so I don't see a problem with using make to update the
files. At least for a long while.

Greetings,

Andres Freund




Re: Reducing connection overhead in pg_upgrade compat check phase

2023-02-22 Thread Nathan Bossart
On Wed, Feb 22, 2023 at 10:37:35AM +0100, Daniel Gustafsson wrote:
>> On 18 Feb 2023, at 06:46, Nathan Bossart  wrote:
>> The last 4 are for supported versions and, from a very
>> quick glance, seem possible to consolidate.  That would bring us to a total
>> of 11 separate loops that we could consolidate into one.  However, the data
>> type checks seem to follow a nice pattern, so perhaps this is easier said
>> than done.
> 
> There is that, refactoring the data type checks leads to removal of duplicated
> code and a slight performance improvement.  Refactoring the other checks to
> reduce overhead would be an interesting thing to look at, but this point
> in the v16 cycle might not be ideal for that.

Makes sense.

>> I wonder if it'd be better to perform all of the data type
>> checks in all databases before failing so that all of the violations are
>> reported.  Else, users would have to run pg_upgrade, fix a violation, run
>> pg_upgrade again, fix another one, etc.
> 
> I think that's better, and have changed the patch to do it that way.

Thanks.  This seems to work as intended.  One thing I noticed is that the
"failed check" log is only printed once, even if multiple data type checks
failed.  I believe this is because this message uses PG_STATUS.  If I
change it to PG_REPORT, all of the "failed check" messages appear.  TBH I'm
not sure we need this message at all since a more detailed explanation will
be printed afterwards.  If we do keep it around, I think it should be
indented so that it looks more like this:

Checking for data type usage                                checking all databases
    failed check: incompatible aclitem data type in user tables
    failed check: reg* data types in user tables

> One change this brings is that check.c contains version specific checks in the
> struct.  Previously these were mostly contained in version.c (some, like the
> 9.4 jsonb check was in check.c) which maintained some level of separation.
> Splitting the array init is of course one option but it also seems a tad messy.
> Not sure what's best, but for now I've documented it in the array comment at
> least.

Hm.  We could move check_for_aclitem_data_type_usage() and
check_for_jsonb_9_4_usage() to version.c since those are only used for
determining whether the check applies now.  Otherwise, IMO things are in
roughly the right place.  I don't think it's necessary to split the array.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Non-superuser subscription owners

2023-02-22 Thread Mark Dilger



> On Feb 22, 2023, at 10:49 AM, Jeff Davis  wrote:
> 
> On Wed, 2023-02-22 at 09:27 -0800, Mark Dilger wrote:
>> Another option is to execute under the intersection of their
>> privileges, where both the definer and the invoker need the
>> privileges in order for the action to succeed.  That would be more
>> permissive than the proposed SECURITY NONE, while still preventing
>> either party from hijacking privileges of the other.
> 
> Interesting idea, I haven't heard of something like that being done
> before. Is there some precedent for that or a use case where it's
> helpful?

No current use case comes to mind, but I proposed it for event triggers one or 
two development cycles ago, to allow for non-superuser event trigger owners.  
The problems associated with allowing non-superusers to create and own event 
triggers were pretty similar to the problems being discussed in this thread.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Pavel Stehule
On Wed, 22 Feb 2023 at 18:59, Nikolay Samokhvalov wrote:

> On Wed, Feb 22, 2023 at 9:55 AM Tom Lane  wrote:
>
>> On the whole I'd rather not eat more of the limited namespace for
>> psql prompt codes for this.
>>
>
> It depends on personal preferences. When I work on a large screen, I can
> afford to spend some characters in prompts, if it gives convenience – and
> many do (looking, for example, at modern tmux/zsh prompts showing git
> branch context, etc).
>
> Default behavior might remain short – it wouldn't make sense to extend it
> for everyone.
>

+1


Re: [PATCH] Fix unbounded authentication exchanges during PQconnectPoll()

2023-02-22 Thread Jacob Champion
On Tue, Feb 21, 2023 at 12:35 PM Heikki Linnakangas  wrote:
> I don't think the "overlong" or "truncated" bit is helpful. For example,
> if the pre-v3.0 error message seems to be "overlong", it's not clear
> that's really what happened. More likely, it's just garbage.

I think this is maybe a distinction without a difference, at least at
the protocol level -- in the event of a missed terminator, any message
could be garbage independently of whether it's too long. But I also
don't mind the "invalid" wording you've proposed, so done that way in
v2. (You're probably going to break out Wireshark for this either
way.)

> It's useful to have a unique error message for every different error, so
> that if you see that error, you can point to the exact place in the code
> where it was generated. If we care about that, we could add some detail
> to the messages, like "received invalid error message; null-terminator
> not found before end-of-message". I don't think that's necessary,
> though, and we've re-used the "expected authentication request from
> server, but received %c" for two different checks already.

(Note that I've reworded the duplicate message in patch v2, if that
changes the calculus.)

> > @@ -3370,6 +3389,7 @@ keep_going: /* We will come back to here until there is
> > 		/* Get the type of request. */
> > 		if (pqGetInt((int *) &msgLength, 4, conn))
> > 		{
> > +			libpq_append_conn_error(conn, "server sent truncated authentication request");
> > 			goto error_return;
> > 		}
> > 		msgLength -= 4;
>
> This is unreachable, because we already checked the length. Better safe
> than sorry I guess, but let's avoid the translation overhead of this at
> least.

Should we just Assert() instead of an error message?

> This isn't from your patch, but a pre-existing message in the vicinity
> that caught my eye:
>
> > 	if ((beresp == 'R' || beresp == 'v') && (msgLength < 8 || msgLength > 2000))
> > 	{
> > 		libpq_append_conn_error(conn, "expected authentication request from server, but received %c",
> > 								beresp);
> > 		goto error_return;
> > 	}
>
> If we receive a 'R' or 'v' message that's too long or too short, the
> message is confusing because the 'beresp' that it prints is actually
> expected, but the length is unexpected.

Updated. I think there's room for additional improvement here, since
as of the protocol negotiation improvements, we don't just expect an
authentication request anymore.

> (Wow, that was a long message for such a simple patch. I may have fallen
> into the trap of bikeshedding, sorry :-) )

No worries :D This code is overdue for a tuneup, I think, and message
tweaks are cheap.

Thanks!
--Jacob
1:  04942e8e64 ! 1:  d7ff5c8f64 PQconnectPoll: fix unbounded authentication exchanges
    @@ Commit message
         long, and a v3 error should be bounded by its original message length.
     
         For the existing error_return cases, I added some additional error
    -    messages for symmetry with the new ones.
    +    messages for symmetry with the new ones, and cleaned up some message
    +    rot.
     
      ## src/interfaces/libpq/fe-connect.c ##
     @@ src/interfaces/libpq/fe-connect.c: keep_going: /* We will come back to here until there is
    +				 */
    +				if ((beresp == 'R' || beresp == 'v') && (msgLength < 8 || msgLength > 2000))
    +				{
    +-					libpq_append_conn_error(conn, "expected authentication request from server, but received %c",
    +-											beresp);
    ++					libpq_append_conn_error(conn, "received invalid authentication request");
     					goto error_return;
     				}
     
     @@ src/interfaces/libpq/fe-connect.c: keep_going: /* We will come back to here
    +					avail = conn->inEnd - conn->inCursor;
    +					if (avail > MAX_ERRLEN)
    +					{
    +-						libpq_append_conn_error(conn, "server sent overlong v2 error message");
    ++						libpq_append_conn_error(conn, "received invalid error message");
    +

Re: Non-superuser subscription owners

2023-02-22 Thread Jeff Davis
On Wed, 2023-02-22 at 09:27 -0800, Mark Dilger wrote:
> Another option is to execute under the intersection of their
> privileges, where both the definer and the invoker need the
> privileges in order for the action to succeed.  That would be more
> permissive than the proposed SECURITY NONE, while still preventing
> either party from hijacking privileges of the other.

Interesting idea, I haven't heard of something like that being done
before. Is there some precedent for that or a use case where it's
helpful?

Regards,
Jeff Davis





Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Kirk Wolak
On Wed, Feb 22, 2023 at 1:14 PM Heikki Linnakangas  wrote:

> On 22/02/2023 19:59, Nikolay Samokhvalov wrote:
> On Wed, Feb 22, 2023 at 9:55 AM Tom Lane wrote:
> >
> > On the whole I'd rather not eat more of the limited namespace for
> > psql prompt codes for this.
> >
> >
> > It depends on personal preferences. When I work on a large screen, I can
> > afford to spend some characters in prompts, if it gives convenience –
> > and many do (looking, for example, at modern tmux/zsh prompts showing
> > git branch context, etc).
> >
> > Default behavior might remain short – it wouldn't make sense to extend
> > it for everyone.
>
> I have no objections to adding a %T option, although deciding what
> format to use is a hassle. -1 for changing the default.
>
> But let's look at the original request:
>
> > This has been in sqlplus since I can remember, and I find it really
> > useful when I forgot to time something, or to review for Time spent
> > on a problem, or for how old my session is...
> I've felt that pain too. You run a query, and it takes longer than I
> expected. How long did it actually take? Too bad I didn't enable \timing
> beforehand..
>
> How about a new backslash command or psql variable to show how long the
> previous statement took? Something like:
>
> postgres=# select 
>   ?column?
> --
>123
> (1 row)
>
> postgres=# \time
>
> Time: 14011.975 ms (00:14.012)
>
> This would solve the "I forgot to time something" problem.
>
> - Heikki
>
TBH, I have that turned on by default.  Load a script.  Have 300 of those
lines, and tell me how long it took?
In my case, it's much easier.  There are other use cases too, like noticing
that I changed some configuration and should reconnect (because I use
multiple sessions, and I am in the early stages with lots of changes).


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Kirk Wolak
On Wed, Feb 22, 2023 at 12:55 PM Tom Lane  wrote:

> Kirk Wolak  writes:
> > Proposal:  Simply add the %T (PROMPT variable) to output the current time
> > (HH24:MI:SS) into the prompt.
>
> I'm not really convinced that %`date` isn't a usable solution for this,
> especially since it seems like a very niche requirement.  The next
> person who wants it might well have a different desire than you
> for exactly what gets shown.  The output of date can be customized,
> but a hard-wired prompt.c feature not so much.
>
> On the whole I'd rather not eat more of the limited namespace for
> psql prompt codes for this.
>
> regards, tom lane
>
Tom,
  I totally respect where you are coming from, and you are rightfully the
big dog!

In reverse order.  That limited namespace.  I assume you mean the 52 alpha
characters, of which we are using 7, and this change would make it 8.  Can
we agree that at the current pace of consumption it will be decades before
we get to 26, and they appear to be pretty well defended?

I already requested ONLY the HH24 format.  8 characters of output.  No
options; they would be a waste of time.
After all these years, sqlplus still has only one setting (show it, or
not).  I am asking the same here.
And I will gladly defend not changing it!  Ever!

I believe that leaves the real question:
Can't we just shell out? (which is what I do now, with the issues as stated,
and it's a lot harder to do from memory somewhere new)

It's far easier in linux than windows to get what you want.
It's much more complicated if you try to use the same pgsqlrc file for
multiple environments and users.

We are talking about adding this much code, and consuming 1 of the
remaining 45 namespace items.
			case 'T':
				{
					time_t		current_time = time(NULL);
					struct tm  *tm_info = localtime(&current_time);

					sprintf(buf, "%02d:%02d:%02d", tm_info->tm_hour,
							tm_info->tm_min, tm_info->tm_sec);
				}
				break;
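
Usage would then be a one-liner in psqlrc (hypothetical, assuming %T lands
as proposed):

    \set PROMPT1 '%T %n@%/%R%# '

which would render as something like "14:23:05 kirk@postgres=# ".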

Does this help my case at all?
If I crossed any lines, it's not my intention.  I was tired of dealing with
this, and helping others to set it up.

With Respect,

Kirk


allow meson to find ICU in non-standard location

2023-02-22 Thread Jeff Davis
I attached a simple patch that allows meson to find ICU in a non-
standard location if you specify -Dextra_lib_dirs and
-Dextra_include_dirs.

I'm not sure it's the right thing to do though. One downside is that it
doesn't output the version that it finds; it only outputs "YES".
From 8fd8d42a6ef9a4589776260e539f10d730e1c3f1 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Wed, 22 Feb 2023 10:24:52 -0800
Subject: [PATCH v1] Allow meson to find ICU in non-standard locations.

---
 meson.build | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index f534704452..250840b281 100644
--- a/meson.build
+++ b/meson.build
@@ -721,8 +721,11 @@ endif
 
 icuopt = get_option('icu')
 if not icuopt.disabled()
-  icu = dependency('icu-uc', required: icuopt.enabled())
-  icu_i18n = dependency('icu-i18n', required: icuopt.enabled())
+
+  icu = cc.find_library('icuuc', required: icuopt.enabled(),
+ dirs: postgres_lib_d, header_include_directories: postgres_inc)
+  icu_i18n = cc.find_library('icui18n', required: icuopt.enabled(),
+dirs: postgres_lib_d, header_include_directories: postgres_inc)
 
   if icu.found()
 cdata.set('USE_ICU', 1)
-- 
2.34.1



Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Pavel Stehule
On Wed, 22 Feb 2023 at 19:14, Heikki Linnakangas wrote:

> On 22/02/2023 19:59, Nikolay Samokhvalov wrote:
> > On Wed, Feb 22, 2023 at 9:55 AM Tom Lane  > > wrote:
> >
> > On the whole I'd rather not eat more of the limited namespace for
> > psql prompt codes for this.
> >
> >
> > It depends on personal preferences. When I work on a large screen, I can
> > afford to spend some characters in prompts, if it gives convenience –
> > and many do (looking, for example, at modern tmux/zsh prompts showing
> > git branch context, etc).
> >
> > Default behavior might remain short – it wouldn't make sense to extend
> > it for everyone.
>
> I have no objections to adding a %T option, although deciding what
> format to use is a hassle. -1 for changing the default.
>
> But let's look at the original request:
>
> > This has been in sqlplus since I can remember, and I find it really
> > useful when I forgot to time something, or to review for Time spent
> > on a problem, or for how old my session is...
> I've felt that pain too. You run a query, and it takes longer than I
> expected. How long did it actually take? Too bad I didn't enable \timing
> beforehand..
>
> How about a new backslash command or psql variable to show how long the
> previous statement took? Something like:
>
> postgres=# select 
>   ?column?
> --
>123
> (1 row)
>
> postgres=# \time
>
> Time: 14011.975 ms (00:14.012)
>
> This would solve the "I forgot to time something" problem.
>

It is a good idea; unfortunately, it doesn't help with more than one
command. But it is a nice idea, and it can be implemented.

I am not sure if \time is the best way - maybe we can display other runtime
data as well (when that becomes possible, like an I/O profile or query ID)

Regards

Pavel



>
> - Heikki
>
>


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Pavel Stehule
Hi

On Wed, 22 Feb 2023 at 18:55, Tom Lane wrote:

> Kirk Wolak  writes:
> > Proposal:  Simply add the %T (PROMPT variable) to output the current time
> > (HH24:MI:SS) into the prompt.
>
> I'm not really convinced that %`date` isn't a usable solution for this,
> especially since it seems like a very niche requirement.  The next
> person who wants it might well have a different desire than you
> for exactly what gets shown.  The output of date can be customized,
> but a hard-wired prompt.c feature not so much.
>
> On the whole I'd rather not eat more of the limited namespace for
> psql prompt codes for this.
>

Can we introduce some special syntax that allows using words (and maybe
some params)?

Regards

Pavel


>
> regards, tom lane
>


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Daniel Gustafsson
> On 22 Feb 2023, at 19:14, Heikki Linnakangas  wrote:

> How about a new backslash command or psql variable to show how long the 
> previous statement took? Something like:
> 
> postgres=# select 
> ?column?
> --
>  123
> (1 row)
> 
> postgres=# \time
> 
> Time: 14011.975 ms (00:14.012)
> 
> This would solve the "I forgot to time something" problem.

I don't have an opinion on adding a prompt option, but I've wanted this
(without realizing this was the format of it) many times.

--
Daniel Gustafsson





Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Heikki Linnakangas

On 22/02/2023 19:59, Nikolay Samokhvalov wrote:
On Wed, Feb 22, 2023 at 9:55 AM Tom Lane wrote:


On the whole I'd rather not eat more of the limited namespace for
psql prompt codes for this.


It depends on personal preferences. When I work on a large screen, I can 
afford to spend some characters in prompts, if it gives convenience – 
and many do (looking, for example, at modern tmux/zsh prompts showing 
git branch context, etc).


Default behavior might remain short – it wouldn't make sense to extend 
it for everyone.


I have no objections to adding a %T option, although deciding what 
format to use is a hassle. -1 for changing the default.


But let's look at the original request:


This has been in sqlplus since I can remember, and I find it really
useful when I forgot to time something, or to review for Time spent
on a problem, or for how old my session is...
I've felt that pain too. You run a query, and it takes longer than I 
expected. How long did it actually take? Too bad I didn't enable \timing 
beforehand..


How about a new backslash command or psql variable to show how long the 
previous statement took? Something like:


postgres=# select 
 ?column?
--
  123
(1 row)

postgres=# \time

Time: 14011.975 ms (00:14.012)

This would solve the "I forgot to time something" problem.

- Heikki





Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Nikolay Samokhvalov
On Wed, Feb 22, 2023 at 9:55 AM Tom Lane  wrote:

> On the whole I'd rather not eat more of the limited namespace for
> psql prompt codes for this.
>

It depends on personal preferences. When I work on a large screen, I can
afford to spend some characters in prompts, if it gives convenience – and
many do (looking, for example, at modern tmux/zsh prompts showing git
branch context, etc).

Default behavior might remain short – it wouldn't make sense to extend it
for everyone.


Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Tom Lane
Kirk Wolak  writes:
> Proposal:  Simply add the %T (PROMPT variable) to output the current time
> (HH24:MI:SS) into the prompt.

I'm not really convinced that %`date` isn't a usable solution for this,
especially since it seems like a very niche requirement.  The next
person who wants it might well have a different desire than you
for exactly what gets shown.  The output of date can be customized,
but a hard-wired prompt.c feature not so much.

On the whole I'd rather not eat more of the limited namespace for
psql prompt codes for this.

regards, tom lane




Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Nikolay Samokhvalov
On Wed, Feb 22, 2023 at 9:18 AM Kirk Wolak  wrote:

> Proposal:  Simply add the %T (PROMPT variable) to output the current time
> (HH24:MI:SS) into the prompt.  This has been in sqlplus since I can
> remember, and I find it really useful when I forgot to time something, or
> to review for Time spent on a problem, or for how old my session is...
>

This is a great idea, in my opinion. I usually do something involving ts to
track timestamps when executing something non-trivial via psql in
interactive (see below) or non-interactive mode.

But this is not a well-known thing to use (and ts is not installed by
default on Ubuntu, etc.) – having timestamps in the prompt would be convenient.

test=> \o | ts
test=> select 1;
test=> Feb 22 09:49:49  ?column?
Feb 22 09:49:49 --
Feb 22 09:49:49 1
Feb 22 09:49:49 (1 row)
Feb 22 09:49:49


Re: meson and sslfiles.mk in src/test/ssl/

2023-02-22 Thread Jacob Champion
On Wed, Feb 22, 2023 at 5:40 AM Peter Eisentraut wrote:
> I think the tradeoff here is, given how rarely those rules are used, is
> it worth maintaining duplicate implementations or complicated bridging?
>
> It's clearly something to deal with eventually, but it's not high priority.

Yeah... in the same vein, I originally thought that I'd need to
quickly add VPATH support to sslfiles.mk, but it seems like it just
hasn't been a problem in practice and now I'm glad I didn't spend much
time on it.

I'm happy to contribute cycles to a Meson port when you're ready for
it. From a skim it seems like maybe in-source generation isn't a focus
for Meson [1]. They might encourage us to write custom Python for it
anyway?

--Jacob

[1] https://github.com/mesonbuild/meson/issues/5434




Re: wrong query result due to wang plan

2023-02-22 Thread Tom Lane
Richard Guo  writes:
> I'm thinking that maybe we can do the strip-the-outer-joins work only
> when it's not the top JoinDomain.  When we are in the top JoinDomain, it
> seems to me that it's safe to push the qual to the top of the tree.

Yeah, because there's nothing to commute with.  Might as well do that
for consistency with the distribute_qual_to_rels behavior, although
I felt it was better to put it inside get_join_domain_min_rels.

Pushed, thanks for the report!

regards, tom lane




Re: Allow tailoring of ICU locales with custom rules

2023-02-22 Thread Peter Eisentraut

On 20.02.23 17:30, Daniel Verite wrote:

- pg_dump support needs to be added for CREATE COLLATION / DATABASE


I have added that.



- there doesn't seem to be a way to add rules to template1.
If someone wants to have icu rules and initial contents in their new
databases, I think they need to create a custom template database
(from template0) for that purpose, in addition to template1.
From a usability standpoint, this is a bit cumbersome, as it's
normally the role of template1.
To improve on that, shouldn't initdb be able to create template0 with
rules too?


Right, that would be an initdb option.  Is that too many initdb options 
then?  It would be easy to add, if we think it's worth it.
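
For reference, the new option would look roughly like this (a sketch; the
rule string uses ICU's tailoring syntax):

    CREATE COLLATION custom (provider = icu, locale = 'en',
                             rules = '&V << w <<< W');

    CREATE DATABASE mydb TEMPLATE template0 LOCALE_PROVIDER icu
        ICU_LOCALE 'en' ICU_RULES '&V << w <<< W';
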
From b566e8756fbf78da804f5538d68350cda7a9bab3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Feb 2023 18:33:47 +0100
Subject: [PATCH v6] Allow tailoring of ICU locales with custom rules

This exposes the ICU facility to add custom collation rules to a
standard collation.

Discussion: 
https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f7...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml|  18 
 doc/src/sgml/ref/create_collation.sgml|  22 
 doc/src/sgml/ref/create_database.sgml |  13 +++
 src/backend/catalog/pg_collation.c|   5 +
 src/backend/commands/collationcmds.c  |  23 +++-
 src/backend/commands/dbcommands.c |  51 -
 src/backend/utils/adt/pg_locale.c |  41 ++-
 src/backend/utils/init/postinit.c |  11 +-
 src/bin/pg_dump/pg_dump.c |  37 +++
 src/bin/psql/describe.c   | 100 +++---
 src/include/catalog/pg_collation.h|   2 +
 src/include/catalog/pg_database.h |   3 +
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out |  30 ++
 src/test/regress/expected/psql.out|  18 ++--
 src/test/regress/sql/collate.icu.utf8.sql |  13 +++
 16 files changed, 332 insertions(+), 56 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054..746baf5053 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -2428,6 +2428,15 @@ pg_collation 
Columns
   
  
 
+ 
+  
+   collicurules text
+  
+  
+   ICU collation rules for this collation object
+  
+ 
+
  
   
collversion text
@@ -3106,6 +3115,15 @@ pg_database 
Columns
   
  
 
+ 
+  
+   daticurules text
+  
+  
+   ICU collation rules for this database
+  
+ 
+
  
   
datcollversion text
diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 136976165c..289f8147f1 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -27,6 +27,7 @@
 [ LC_CTYPE = lc_ctype, ]
 [ PROVIDER = provider, ]
 [ DETERMINISTIC = boolean, ]
+[ RULES = rules, ]
 [ VERSION = version ]
 )
 CREATE COLLATION [ IF NOT EXISTS ] name FROM 
existing_collation
@@ -149,6 +150,19 @@ Parameters
  
 
 
+
+ rules
+
+ 
+  
+   Specifies additional collation rules to customize the behavior of the
+   collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+   for details on the syntax.
+  
+ 
+
+
 
  version
 
@@ -228,6 +242,14 @@ Examples
 
   
 
+  
+   To create a collation using the ICU provider, based on the English ICU
+   locale, with custom rules:
+
+
+
+  
+
   
To create a collation from an existing collation:
 
diff --git a/doc/src/sgml/ref/create_database.sgml 
b/doc/src/sgml/ref/create_database.sgml
index 57d13e34c2..6f62161b80 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -30,6 +30,7 @@
[ LC_COLLATE [=] lc_collate ]
[ LC_CTYPE [=] lc_ctype ]
[ ICU_LOCALE [=] icu_locale ]
+   [ ICU_RULES [=] icu_rules ]
[ LOCALE_PROVIDER [=] locale_provider ]
[ COLLATION_VERSION = collation_version ]
[ TABLESPACE [=] tablespace_name ]
@@ -192,6 +193,18 @@ Parameters
   
  
 
+ 
+  icu_rules
+  
+   
+Specifies additional collation rules to customize the behavior of the
+collation.  This is supported for ICU only.  See https://unicode-org.github.io/icu/userguide/collation/customization/"/>
+for details on the syntax.
+   
+  
+ 
+
  
   locale_provider
 
diff --git a/src/backend/catalog/pg_collation.c 
b/src/backend/catalog/pg_collation.c
index 287b13725d..fd022e6fc2 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -50,6 +50,7 @@ CollationCreate(const char *collname, Oid collnamespace,

Re: Non-superuser subscription owners

2023-02-22 Thread Mark Dilger



> On Feb 22, 2023, at 9:18 AM, Jeff Davis  wrote:
> 
> Another option is having some kind of SECURITY NONE that would run the
> code as a very limited-privilege user that can basically only access
> the catalog. That would be useful for running default expressions and
> the like without the definer or invoker needing to be careful.

Another option is to execute under the intersection of their privileges, where 
both the definer and the invoker need the privileges in order for the action to 
succeed.  That would be more permissive than the proposed SECURITY NONE, while 
still preventing either party from hijacking privileges of the other.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Raising the SCRAM iteration count

2023-02-22 Thread Jonathan S. Katz

On 2/22/23 8:39 AM, Daniel Gustafsson wrote:

On 17 Dec 2022, at 04:27, Michael Paquier  wrote:




Superuser-only GUCs should be documented as such, or do you intend to
make it user-settable like I suggested upthread :) ?


I don't really have strong feelings, so I reverted to being user-settable since
I can't really present a strong argument for superuser-only.


I was going to present some weak arguments, but not worth it. Anything 
around using up CPU cycles would be true of just writing plain old queries.



The attached is a rebase on top of master with no other additional hacking done
on top of the above review comments.


Generally LGTM. I read through earlier comments (sorry I missed 
replying) and have nothing to add or object to.


Jonathan




Re: Non-superuser subscription owners

2023-02-22 Thread Jeff Davis
On Mon, 2023-02-06 at 14:40 -0500, Robert Haas wrote:
> On Mon, Feb 6, 2023 at 2:18 PM Andres Freund 
> wrote:
> > It's decidedly not great, yes. I don't know if it's quite a CVE
> > type issue,
> > after all, the same is true for any other type of query the
> > superuser
> > executes. But at the very least the documentation needs to be
> > better, with a
> > big red box making sure the admin is aware of the problem.
> 
> I don't think that's the same thing at all. A superuser executing a
> query interactively can indeed cause all sorts of bad things to
> happen, but you don't have to log in as superuser and run DML queries
> on tables owned by unprivileged users, and you shouldn't.

There are two questions:

1. Is the security situation with logical replication bad? Yes. You
nicely summarized just how bad.

2. Is it the same situation as accessing a table owned by a user you
don't absolutely trust? 

Regardless of how the second question is answered, it won't diminish
your point that logical replication is in a bad state. If another
situation is also bad, we should fix that too.

And I think the DML situation is really bad, too. Anyone reading our
documentation would find extensive explanations about GRANT/REVOKE, and
puzzle over the fine details of exactly how much they trust user foo.
Do I trust foo enough for WITH GRANT OPTION? Does foo really need to
see all of the columns of this table, or just a subset?

But there's no obvious mention that user foo must trust you absolutely
in order to exercise the GRANT at all, because you (as table owner) can
trivially cause foo to execute arbitrary code. There's no warning or
hint or suggestion at runtime to know that you are about to execute
someone else's code with your privileges or that it might be dangerous.

It gets worse. Let's say that user foo figures that out, and they're
extra cautious to SET SESSION AUTHORIZATION or SET ROLE to drop their
privileges before accessing a table. No good: the table owner can just
craft their arbitrary code with a "RESET SESSION AUTHORIZATION" or a
"RESET ROLE" at the top, and the code will still execute with the
privileges of user foo.
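
Roughly, as a sketch with made-up names:

    -- as the table owner:
    CREATE TABLE owned_table (id int);
    CREATE FUNCTION trap() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
        RESET ROLE;   -- undoes the invoker's precaution
        -- anything here now runs with foo's own privileges
        RETURN NEW;
    END $$;
    CREATE TRIGGER t BEFORE INSERT ON owned_table
        FOR EACH ROW EXECUTE FUNCTION trap();

    -- as the cautious user foo:
    SET ROLE limited_role;
    INSERT INTO owned_table DEFAULT VALUES;   -- trap() runs, and RESET ROLE
                                              -- restores foo's privileges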

So I don't think "shouldn't" is quite good enough. In the first place,
the user needs to know that the risk exists. Second, what if they
actually do want to access a table owned by someone else for whatever
reason -- how do they do that safely?

I can't resist mentioning that these are all SECURITY INVOKER problems.
SECURITY INVOKER is insecure unless the invoker absolutely trusts the
definer, and that only really makes sense if the definer is a superuser
(or something very close). That's why we keep adding exceptions with
SECURITY_RESTRICTED_OPERATION, which is really just a way to silently
ignore the SECURITY INVOKER label and use SECURITY DEFINER instead.

At some point we need to ask: "when is SECURITY INVOKER both safe and
useful?" and contain it to those cases, rather than silently ignoring
it in an expanding list of cases.

I know that the response here is that SECURITY DEFINER is somehow
worse. Maybe for superuser-defined functions, it is. But basically, the
problems with SECURITY DEFINER all amount to "the author of the code
needs to be careful", which is a lot more intuitive than the problems
with SECURITY INVOKER.

Another option is having some kind of SECURITY NONE that would run the
code as a very limited-privilege user that can basically only access
the catalog. That would be useful for running default expressions and
the like without the definer or invoker needing to be careful.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-02-22 Thread Kirk Wolak
Proposal:  Simply add the %T (PROMPT variable) to output the current time
(HH24:MI:SS) into the prompt.  This has been in sqlplus since I can
remember, and I find it really useful when I forgot to time something, or
to review for Time spent on a problem, or for how old my session is...

I am recommending no formatting options, just keep it simple.  No, I don't
care about adding the date.  If I don't know the date of some line in my
history, it's already a problem!  (And date would logically be some other
variable)

Yes, I've found ways around it using the shell backquote.  This is hacky,
and it's also really ugly on Windows. I also found it impossible to share
my psqlrc file between Linux and Windows.

This would be current time on the local machine.  Keeping it simple.

It feels like a small change.  The simplest test would be to capture the
prompt, select pg_sleep(1.1); and make sure the prompt changed.  This code
should be trivially stable.

If it seems useful, I believe I can work with others to get it implemented,
and the documentation changed, and a patch generated.  (I need to develop
these skills)

What does the community say?  Is there support for this?

Regards, Kirk


Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy

2023-02-22 Thread Nathan Bossart
On Wed, Feb 22, 2023 at 12:30:20PM +0900, Michael Paquier wrote:
> Okay, applied and backpatched with a minimal test set, then.  I have
> kept the tweaks I did to the tests with extra comments.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c

2023-02-22 Thread Nathan Bossart
On Wed, Feb 22, 2023 at 12:40:07PM +, wangw.f...@fujitsu.com wrote:
> On Wed, Feb 22, 2023 at 8:37 AM Nathan Bossart wrote:
>> So, even with your patch applied, I don't think the formulas are correct.
>> I don't know if it's worth writing out the exact formula, though.  It
>> doesn't seem to be kept up-to-date, and I don't know if users would choose
>> different values for max_locks_per_transaction if it _was_ updated.
>> Perhaps max_connections is a good enough approximation of MaxBackends most
>> of the time...
> 
> Thanks very much for your careful review.
> 
> Yes, you are right. I think the formulas in the v1 patch are all 
> approximations.
> I think the exact formula (see function InitializeMaxBackends) is:
> ```
>   max_locks_per_transaction * (max_connections + autovacuum_max_workers + 1 +
>                                max_worker_processes + max_wal_senders +
>                                max_prepared_transactions)
> ```
> 
> After some rethinking, I think users can easily get the exact value from
> the exact formula, and using the accurate formula can help users adjust
> max_locks_per_transaction or max_pred_locks_per_transaction if needed.
> So, I used the exact formulas in the attached v2 patch.

IMHO this is too verbose.  Perhaps it could be simplified to something like

The shared lock table is sized on the assumption that at most
max_locks_per_transaction objects per eligible process or prepared
transaction will need to be locked at any one time.

But if others disagree and think the full formula is appropriate, I'm fine
with it.
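
As a quick worked example with stock defaults (max_connections = 100,
autovacuum_max_workers = 3, max_worker_processes = 8, max_wal_senders = 10,
max_prepared_transactions = 0) and max_locks_per_transaction = 64, the
formula above sizes the shared lock table for
64 * (100 + 3 + 1 + 8 + 10 + 0) = 7808 lockable objects.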

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: logical decoding and replication of sequences, take 2

2023-02-22 Thread Jonathan S. Katz

On 2/22/23 5:02 AM, Tomas Vondra wrote:


On 2/22/23 03:28, Jonathan S. Katz wrote:



Thanks for continuing to work on this patch! I tested the latest version
and have some feedback/clarifications.



Thanks!


Also I should mention I've been testing with both async/sync logical 
replication. I didn't have any specific comments on either as it seemed 
to just work and behaviors aligned with existing expectations.


Generally it's been a good experience and it seems to be working. :) At 
this point I'm trying to understand the limitations and tripwires so we 
can guide users appropriately.



Yes, this is due to how we WAL-log sequences. We don't log individual
increments, but every 32nd increment and we log the "future" sequence
state so that after a crash/recovery we don't generate duplicates.

So you do nextval() and it returns 1. But into WAL we record 32. And
there will be no WAL records until nextval reaches 32 and needs to
generate another batch.

And because logical replication relies on these WAL records, it inherits
this batching behavior with a "jump" on recovery/failover. IMHO it's OK,
it works for the "logical failover" use case and if you need gapless
sequences then regular sequences are not an issue anyway.

It's possible to reduce the jump a bit by reducing the batch size (from
32 to 0) so that every increment is logged. But it doesn't eliminate it
because of rollbacks.
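
To illustrate with the default batch size of 32:

    CREATE SEQUENCE s;
    SELECT nextval('s');   -- returns 1, but the WAL record already says 32
    -- calls 2..32 emit no WAL; call 33 logs the next batch
    -- after a crash or failover, the sequence resumes from the logged
    -- value, so nextval() can appear to jump ahead by up to the batch size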


I generally agree. I think it's mainly something we should capture in 
the user docs that they can be a jump on the subscriber side, so people 
are not surprised.


Interestingly, in systems that tend to have higher rates of failover 
(I'm thinking of a few distributed systems), this may cause int4 
sequences to exhaust numbers slightly (marginally?) more quickly. Likely 
not too big of an issue, but something to keep in mind.



2. Using origin=none with nonconflicting sequences.

I modified the example in [1] to set up two schemas with non-conflicting
sequences[2], e.g. on instance 1:

CREATE TABLE public.room (
     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
PRIMARY KEY,
     name text NOT NULL
);

and instance 2:

CREATE TABLE public.room (
     id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
PRIMARY KEY,
     name text NOT NULL
);



Well, yeah. We don't support active-active logical replication (at least
not with the built-in). You can easily get into similar issues without
sequences.


The "origin=none" feature lets you replicate tables bidirectionally. 
While it's not full "active-active", this is a starting point and a 
feature for v16. We'll definitely have users replicating data 
bidirectionally with this.



Replicating a sequence overwrites the state of the sequence on the other
side, which may result in it generating duplicate values with the other
node, etc.


I understand that we don't currently support global sequences, but I am 
concerned there may be a tripwire here in the origin=none case given 
it's fairly common to use serial/GENERATED BY to set primary keys. And 
it's fairly trivial to set them to be nonconflicting, or at least give 
the user the appearance that they are nonconflicting.


From my high-level understanding of how sequences work, this sounds like 
it would be a lift to support the example in [1]. Or maybe the answer is 
that you can bidirectionally replicate the changes in the tables, but 
not sequences?


In any case, we should update the restrictions in [2] to state: while 
sequences can be replicated, there is additional work required if you 
are bidirectionally replicating tables that use sequences, esp. if used 
in a PK or a constraint. We can provide alternatives to how a user could 
set that up, i.e. not replicating the sequences, or doing something like in [3].


Thanks,

Jonathan

[1] https://gist.github.com/jkatz/5c34bf1e401b3376dfe8e627fcd30af3
[2] 
https://www.postgresql.org/docs/devel/logical-replication-restrictions.html

[3] https://gist.github.com/jkatz/1599e467d55abec88ab487d8ac9dc7c3




Re: pgindent vs. git whitespace check

2023-02-22 Thread Tom Lane
Peter Eisentraut  writes:
> Commit e4602483e95 accidentally introduced a situation where pgindent
> disagrees with the git whitespace check.  The code is

>  conn = libpqsrv_connect_params(keywords, values,
>  /* expand_dbname = */ false,
> PG_WAIT_EXTENSION);

> where the current source file has 4 spaces before the /*, and the
> whitespace check says that that should be a tab.

Hmm, I don't think that's per project style in the first place.
Most places that annotate function arguments do it like

	conn = libpqsrv_connect_params(keywords, values,
	                               false, /* expand_dbname */
	                               PG_WAIT_EXTENSION);

pgindent has never been very kind to non-end-of-line comments, and
I'm not excited about working on making it do so.  As a thought
experiment, what would happen if we reversed course and started
allowing "//" comments?  Naive conversion of this comment could
break the code altogether.  (Plenty of programming languages
don't even *have* non-end-of-line comments.)

regards, tom lane




Re: pgindent vs. git whitespace check

2023-02-22 Thread Alvaro Herrera
On 2023-Feb-22, Peter Eisentraut wrote:

> In the meantime, I suggest we work around this, perhaps by
> 
>     conn = libpqsrv_connect_params(keywords, values, /* expand_dbname = */ false,
>                                    PG_WAIT_EXTENSION);

I suggest

	conn = libpqsrv_connect_params(keywords, values,
	                               false, /* expand_dbname */
	                               PG_WAIT_EXTENSION);

which is what we typically do elsewhere and doesn't go overlength.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)




Re: Add SHELL_EXIT_CODE to psql

2023-02-22 Thread Maxim Orlov
On Mon, 30 Jan 2023 at 23:23, Corey Huinker  wrote:

>
>
> I rebased, but there are no code differences.
>

The patch set seems to be in good shape and has been pretty stable for
quite a while.
Could you add one more minor improvement, a new line after the variable
declarations?

+   int exit_code =
wait_result_to_exit_code(result);
+   charbuf[32];
...here
+   snprintf(buf, sizeof(buf), "%d", exit_code);
+   SetVariable(pset.vars, "SHELL_EXIT_CODE", buf);
+   SetVariable(pset.vars, "SHELL_ERROR", "true");

+   charexit_code_buf[32];
... and here
+   snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
+wait_result_to_exit_code(exit_code));
+   SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+   SetVariable(pset.vars, "SHELL_ERROR", "true");
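
For reference, wait_result_to_exit_code() boils down to something like this
(a sketch built on the POSIX wait-status macros; the actual implementation in
the patch may differ, e.g. for Windows):

    #include <sys/wait.h>

    static int
    wait_result_to_exit_code(int exit_status)
    {
        if (WIFEXITED(exit_status))
            return WEXITSTATUS(exit_status);
        if (WIFSIGNALED(exit_status))
            return 128 + WTERMSIG(exit_status); /* common shell convention */
        return -1;                              /* unrecognized status */
    }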

After these changes, I think we can mark this patch RfC, shall we?

-- 
Best regards,
Maxim Orlov.


Re: LWLock deadlock in brinRevmapDesummarizeRange

2023-02-22 Thread Alvaro Herrera
On 2023-Feb-22, Tomas Vondra wrote:

> > ... and in there, I wrote that we would first write the brin tuple in
> > the regular page, unlock that, and then lock the revmap for the update,
> > without holding lock on the data page.  I don't remember why we do it
> > differently now, but maybe the fix is just to release the regular page
> > lock before locking the revmap page?  One very important change is that
> > in previous versions the revmap used a separate fork, and we had to
> > introduce an "evacuation protocol" when we integrated the revmap into
> > the main fork, which may have changed the locking considerations.
> 
> What would happen if two processes built the summary concurrently? How
> would they find the other tuple, so that we don't end up with two BRIN
> tuples for the same range?

Well, the revmap can only keep track of one tuple per range; if two
processes build summary tuples, and each tries to insert its tuple in a
regular page, that part may succeed; but then only one of them is going
to successfully register the summary tuple in the revmap: when the other
goes to do the same, it would find that a CTID is already present.

... Looking at the code (brinSetHeapBlockItemptr), I think what happens
here is that the second process would overwrite the TID with its own.
Not sure if it would work to see whether the item is empty and bail out
if it's not.

But in any case, it seems to me that the update of the regular page is
pretty much independent of the update of the revmap.

> > Another point: to desummarize a range, just unlinking the entry from
> > revmap should suffice, from the POV of other index scanners.  Maybe we
> > can simplify the whole procedure to: lock revmap, remove entry, remember
> > page number, unlock revmap; lock regular page, delete entry, unlock.
> > Then there are no two locks held at the same time during desummarize.
> 
> Perhaps, as long as it doesn't confuse anything else.

Well, I don't have the details fresh in mind, but I think it shouldn't,
because the only way to reach a regular tuple is coming from the revmap;
and we reuse "items" (lines) in a regular page only when they are
empty (so vacuuming should also be OK).
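
To spell out the idea, the simplified desummarize flow would be roughly the
following (a sketch only, not the current code; headers storage/bufmgr.h and
storage/bufpage.h assumed, WAL-logging, buffer pins and error handling
omitted):

    static void
    desummarize_range_sketch(Buffer revmapbuf, Buffer regbuf,
                             OffsetNumber off)
    {
        /* unlink the range from the revmap, then release that lock;
         * other scanners now see the range as unsummarized */
        LockBuffer(revmapbuf, BUFFER_LOCK_EXCLUSIVE);
        /* ... clear the revmap entry for the range here ... */
        LockBuffer(revmapbuf, BUFFER_LOCK_UNLOCK);

        /* only now touch the regular page; the two locks never overlap */
        LockBuffer(regbuf, BUFFER_LOCK_EXCLUSIVE);
        PageIndexTupleDeleteNoCompact(BufferGetPage(regbuf), off);
        LockBuffer(regbuf, BUFFER_LOCK_UNLOCK);
    }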

> > This comes from v16:
> 
> I don't follow - what do you mean by v16? I don't see anything like that
> anywhere in the repository.

I meant the minmax-proposal file in patch v16, the one that I linked to.


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."  (Andrew Sullivan)




Re: We shouldn't signal process groups with SIGQUIT

2023-02-22 Thread Tom Lane
Michael Paquier  writes:
> What would be the advantage of doing that for groups other than
> -StartupPID and -PgArchPID?  These are the two groups of processes we
> need to worry about, AFAIK.

No, we have the issue for regular backends too, since they could be
executing COPY FROM PROGRAM or the like (not to mention that functions
in plperlu, plpythonu, etc could spawn child processes).

regards, tom lane




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2023-02-22 Thread Maxim Orlov
On Tue, 21 Feb 2023 at 16:59, Aleksander Alekseev 
wrote:

> Hi,
>
> One thing that still bothers me is that during the upgrade we only
> migrate the CLOG segments (i.e. pg_xact / pg_clog) and completely
> ignore all the rest of SLRUs:
>
> * pg_commit_ts
> * pg_multixact/offsets
> * pg_multixact/members
> * pg_subtrans
> * pg_notify
> * pg_serial


Hi! We do ignore these, since in order to pg_upgrade the server it must be
properly stopped, and no transaction can survive across that point.


-- 
Best regards,
Maxim Orlov.


Re: pg_regress: Treat child process failure as test failure

2023-02-22 Thread Daniel Gustafsson
> On 26 Nov 2022, at 22:46, Daniel Gustafsson  wrote:

> I've moved the statuses[i] check before the differ check, such that there is a
> separate block for this, not mixed up with the differs check and printing.

Rebased patch to handle breakage of v2 due to bd8d453e9b.

--
Daniel Gustafsson



v3-0001-pg_regress-Consider-a-failed-test-process-as-a-fa.patch
Description: Binary data


Re: pg_rewind race condition just after promotion

2023-02-22 Thread Heikki Linnakangas

On 11/12/2022 02:01, Ian Lawrence Barwick wrote:

2021年11月9日(火) 20:31 Daniel Gustafsson :



On 14 Jul 2021, at 14:03, Aleksander Alekseev  wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

The v3 patch LGTM. I wonder if we should explicitly say in the pg_rewind tests that
they _don't_ have to call `checkpoint`; otherwise, we will lose the test
coverage for this scenario. But I don't have a strong opinion on this one.

The new status of this patch is: Ready for Committer


Heikki, do you have plans to address this patch during this CF?


Friendly reminder ping one year on; I haven't looked at this patch in
detail but going by the thread contents it seems it should be marked
"Ready for Committer"? Moved to the next CF anyway.


Here's an updated version of the patch.

I renamed the arguments to findCommonAncestorTimeline() so that the 
'targetHistory' argument doesn't shadow the global 'targetHistory' 
variable. No other changes, and this still looks good to me, so I'll 
wait for the cfbot to run on this and commit in the next few days.


- Heikki
From 046529d0ec039888b10ef65ff7d8228bba89666c Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 22 Feb 2023 15:58:17 +0200
Subject: [PATCH v4 1/1] pg_rewind: Fix determining TLI when server was just
 promoted.

If the source server was just promoted, and it hasn't written the
checkpoint record yet, pg_rewind considered the server to be still on
the old timeline. Because of that, it would claim incorrectly that no
rewind is required. Fix that by looking at minRecoveryPointTLI in the
control file in addition to the ThisTimeLineID on the checkpoint.

This has been a known issue since forever, and we had worked around it
in the regression tests by issuing a checkpoint after each promotion,
before running pg_rewind. But that was always quite hacky, so better
to fix this properly. This doesn't add any new tests for this, but
removes the previously-added workarounds from the existing tests, so
that they should occasionally hit this codepath again.

Backpatch to all supported versions.

Reviewed-by: Kyotaro Horiguchi, Ibrar Ahmed, Aleksander Alekseev
Discussion: https://www.postgresql.org/message-id/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi
---
 src/bin/pg_rewind/pg_rewind.c | 105 +++---
 src/bin/pg_rewind/t/007_standby_source.pl |   1 -
 src/bin/pg_rewind/t/008_min_recovery_point.pl |   9 --
 src/bin/pg_rewind/t/RewindTest.pm |   8 --
 4 files changed, 64 insertions(+), 59 deletions(-)

diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 858d8d9f2f..f7f3b8227f 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -45,7 +45,13 @@ static void digestControlFile(ControlFileData *ControlFile,
 			  const char *content, size_t size);
 static void getRestoreCommand(const char *argv0);
 static void sanityChecks(void);
-static void findCommonAncestorTimeline(XLogRecPtr *recptr, int *tliIndex);
+static TimeLineHistoryEntry *getTimelineHistory(TimeLineID tli, bool is_source,
+int *nentries);
+static void findCommonAncestorTimeline(TimeLineHistoryEntry *a_history,
+	   int a_nentries,
+	   TimeLineHistoryEntry *b_history,
+	   int b_nentries,
+	   XLogRecPtr *recptr, int *tliIndex);
 static void ensureCleanShutdown(const char *argv0);
 static void disconnect_atexit(void);
 
@@ -134,6 +140,8 @@ main(int argc, char **argv)
 	XLogRecPtr	chkptrec;
 	TimeLineID	chkpttli;
 	XLogRecPtr	chkptredo;
+	TimeLineID	source_tli;
+	TimeLineID	target_tli;
 	XLogRecPtr	target_wal_endrec;
 	size_t		size;
 	char	   *buffer;
@@ -332,14 +340,28 @@ main(int argc, char **argv)
 
 	sanityChecks();
 
+	/*
+	 * Usually, the TLI can be found in the latest checkpoint record. But if
+	 * the source server is just being promoted (or it's a standby that's
+	 * following a primary that's just being promoted), and the checkpoint
+	 * requested by the promotion hasn't completed yet, the latest timeline is
+	 * in minRecoveryPoint. So we check which is later, the TLI of the
+	 * minRecoveryPoint or the latest checkpoint.
+	 */
+	source_tli = Max(ControlFile_source.minRecoveryPointTLI,
+	 ControlFile_source.checkPointCopy.ThisTimeLineID);
+
+	/* Similarly for the target. */
+	target_tli = Max(ControlFile_target.minRecoveryPointTLI,
+	 ControlFile_target.checkPointCopy.ThisTimeLineID);
+
 	/*
 	 * Find the common ancestor timeline between the clusters.
 	 *
 	 * If both clusters are already on the same timeline, there's nothing to
 	 * do.
 	 */
-	if (ControlFile_target.checkPointCopy.ThisTimeLineID ==
-		ControlFile_source.checkPointCopy.ThisTimeLineID)
+	if (target_tli == source_tli)
 	{
 		pg_log_info("source and target cluster are on the same 

Unexpected abort at llvm::report_bad_alloc_error when load JIT library

2023-02-22 Thread Hugo Zhang
Hi hackers,
I hit a core dump when the backend did not have enough memory during dlopen(),
which wants to allocate memory for libLLVM-10.so.1.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7f10bde19859 in __GI_abort () at abort.c:79
#2  0x7f109c24cc33 in llvm::report_bad_alloc_error(char const*, bool) () 
from /lib/x86_64-linux-gnu/libLLVM-10.so.1
#3  0x7f109c23dc32 in ?? () from /lib/x86_64-linux-gnu/libLLVM-10.so.1
#4  0x7f109c23dd6c in ?? () from /lib/x86_64-linux-gnu/libLLVM-10.so.1
#5  0x7f109c2312db in llvm::cl::Option::addArgument() () from 
/lib/x86_64-linux-gnu/libLLVM-10.so.1
#6  0x7f109c18c08e in ?? () from /lib/x86_64-linux-gnu/libLLVM-10.so.1
#7  0x7f10c179fb9a in call_init (l=, argc=argc@entry=12, 
argv=argv@entry=0x7ffd8f53cc38,
env=env@entry=0x557459ba1ce0) at dl-init.c:72
#8  0x7f10c179fca1 in call_init (env=0x557459ba1ce0, argv=0x7ffd8f53cc38, 
argc=12, l=) at dl-init.c:30
#9  _dl_init (main_map=0x557459e6e9d0, argc=12, argv=0x7ffd8f53cc38, 
env=0x557459ba1ce0) at dl-init.c:119
#10 0x7f10bdf57985 in __GI__dl_catch_exception 
(exception=exception@entry=0x0,
operate=operate@entry=0x7f10c17a32d0 , 
args=args@entry=0x7ffd8f53ac70) at dl-error-skeleton.c:182
#11 0x7f10c17a443d in dl_open_worker (a=a@entry=0x7ffd8f53ae20) at 
dl-open.c:758
#12 0x7f10bdf57928 in __GI__dl_catch_exception 
(exception=exception@entry=0x7ffd8f53ae00,
operate=operate@entry=0x7f10c17a3c20 , 
args=args@entry=0x7ffd8f53ae20) at dl-error-skeleton.c:208
#13 0x7f10c17a360a in _dl_open (file=0x557459df2820 " 
/lib/postgresql/llvmjit.so", mode=-2147483390,
caller_dlopen=, nsid=-2, argc=12, argv=0x7ffd8f53cc38, 
env=0x557459ba1ce0) at dl-open.c:837
#14 0x7f10bfa5134c in dlopen_doit (a=a@entry=0x7ffd8f53b040) at dlopen.c:66
#15 0x7f10bdf57928 in __GI__dl_catch_exception 
(exception=exception@entry=0x7ffd8f53afe0,
operate=operate@entry=0x7f10bfa512f0 , 
args=args@entry=0x7ffd8f53b040) at dl-error-skeleton.c:208
#16 0x7f10bdf579f3 in __GI__dl_catch_error 
(objname=objname@entry=0x557459c9e450, errstring=errstring@entry=0x557459c9e458,
mallocedp=mallocedp@entry=0x557459c9e448, 
operate=operate@entry=0x7f10bfa512f0 , 
args=args@entry=0x7ffd8f53b040)
at dl-error-skeleton.c:227
#17 0x7f10bfa51b59 in _dlerror_run (operate=operate@entry=0x7f10bfa512f0 
, args=args@entry=0x7ffd8f53b040)
at dlerror.c:170
#18 0x7f10bfa513da in __dlopen (file=, mode=) 
at dlopen.c:87
#19 0x557458502d07 in ?? ()
#20 0x557458503526 in load_external_function ()
#21 0x557458569c3d in ?? ()
#22 0x557458569e0e in jit_compile_expr ()
#23 0x55745810d6c6 in ExecBuildProjectionInfoExt ()
#24 0x55745812921b in ExecConditionalAssignProjectionInfo ()
#25 0x55745814f12d in ExecInitSeqScanForPartition ()
#26 0x55745812205c in ExecInitNode ()
#27 0x5574581638ef in ExecInitMotion ()
#28 0x557458121ebc in ExecInitNode ()
#29 0x55745811a544 in standard_ExecutorStart ()
#30 0x55745838c059 in PortalStart ()
…

Platform: Ubuntu 20.04, x86_64, Linux 5.15.0-52-generic

Our llvmjit.h implements the function llvm_enter_fatal_on_oom() to FATAL out
when LLVM hits OOM, but when we load libLLVM.so we can hit an OOM situation
like the stack above, where merely loading the library initializes some
options and wants to allocate memory.
I haven't figured out a better way to set an error handler for this situation
when loading libLLVM.so.


-  Hugo.



RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! PSA new version.

> I think it would be better to say: "The minimum delay, in
> milliseconds, by the publisher before sending all the changes". If you
> agree then similar change is required in below comment as well:
> + /*
> + * The minimum delay, in milliseconds, by the publisher before sending
> + * COMMIT/PREPARE record.
> + */
> + int32 min_send_delay;

OK, both of them were fixed.

> > > Should the validation be also checking/asserting no negative numbers,
> > > or actually should the min_send_delay be defined as a uint32 in the
> > > first place?
> >
> > I think you are right because min_apply_delay does not have related code.
> > we must consider additional possibility that user may send
> START_REPLICATION
> > by hand and it has minus value.
> > Fixed.
> >
> 
> Your reasoning for adding the additional check seems good to me but I
> don't see it in the patch. The check as I see is as below:
> + if (delay_val > PG_INT32_MAX)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("min_send_delay \"%s\" out of range",
> + strVal(defel->arg;
> 
> Am, I missing something, and the new check is at some other place?

For extracting the value from the string, strtoul() is used.
This is the important point.

```
delay_val = strtoul(strVal(defel->arg), &endptr, 10);
```

If the user specifies min_send_delay as '-1', the value is read as the bit
string 0xFFFFFFFFFFFFFFFF and interpreted as PG_UINT64_MAX. After that, such a
strange value is rejected by the part you quoted. I have tested the case and
it is correctly rejected.

```
postgres=#  START_REPLICATION SLOT "sub" LOGICAL 0/0 (min_send_delay '-1');
ERROR:  min_send_delay "-1" out of range
CONTEXT:  slot "sub", output plugin "pgoutput", in the startup callback
```
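
To illustrate why strtoul() behaves that way, here is a self-contained
demonstration (assuming a 64-bit unsigned long, as on common LP64 platforms):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* strtoul() accepts a leading minus sign and negates the parsed
         * value in the unsigned type, so "-1" yields ULONG_MAX */
        unsigned long val = strtoul("-1", NULL, 10);

        printf("%lu\n", val);   /* prints 18446744073709551615 on LP64 */
        return 0;
    }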

> +  has been finished. However, there is a possibility that the table
> +  status written in <link linkend="catalog-pg-subscription-rel"><structname>pg_subscription_rel</structname></link>
> +  will be delayed in getting to "ready" state, and also two-phase
> +  (if specified) will be delayed in getting to "enabled".
> + 
> 
> There appears to be a special value <0x00> after "ready". I think that
> is added by mistake or probably you have used some editor which has
> added this value. Can we slightly reword this to: "However, there is a
> possibility that the table status updated in <link linkend="catalog-pg-subscription-rel"><structname>pg_subscription_rel</structname></link>
> could be delayed in getting to the "ready" state, and also two-phase
> (if specified) could be delayed in getting to "enabled"."?

Oh, my Visual Studio Code did not show the strange character.
I reworded the sentence accordingly.

Additionally, I modified the commit message to describe more clearly the reason
why we do not allow combining min_send_delay and streaming = parallel.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v6-0001-Time-delayed-logical-replication-on-publisher-sid.patch
Description:  v6-0001-Time-delayed-logical-replication-on-publisher-sid.patch


Re: Missing cases from SPI_result_code_string()

2023-02-22 Thread Dean Rasheed
On Mon, 20 Feb 2023 at 19:49, Dean Rasheed  wrote:
>
> On Mon, 20 Feb 2023 at 19:39, Tom Lane  wrote:
> >
> > Ugh.  Grepping around, it looks like pltcl_process_SPI_result
> > is missing a case for SPI_OK_MERGE as well.
>
> Yes, I was about to post a patch for that too. That's the last case
> that I found, looking around.
>

OK, I've pushed fixes for those.

Regards,
Dean




Re: meson and sslfiles.mk in src/test/ssl/

2023-02-22 Thread Peter Eisentraut

On 21.02.23 00:54, Michael Paquier wrote:

While playing with the SSL tests last week, I have noticed that we
don't have a way to regenerate the SSL files with meson for its TAP
suite.  It seems to me that we had better transfer the rules currently
stored in sslfiles.mk into something that meson can use?

Another approach may be to keep sslfiles.mk around, still allow meson
to invoke it with a specific target?  I am not exactly sure how this
would be handled, but it looks like a transfer would make sense in the
long-term if we target a removal of the dependency with make.


I think the tradeoff here is: given how rarely those rules are used, is
it worth maintaining duplicate implementations or complicated bridging?


It's clearly something to deal with eventually, but it's not high priority.





Re: Raising the SCRAM iteration count

2023-02-22 Thread Daniel Gustafsson
> On 17 Dec 2022, at 04:27, Michael Paquier  wrote:
> 
> On Thu, Dec 15, 2022 at 12:09:15PM +0100, Daniel Gustafsson wrote:
>>> On 15 Dec 2022, at 00:52, Michael Paquier  wrote:
>>>   conn->in_hot_standby = PG_BOOL_UNKNOWN;
>>> +   conn->scram_iterations = SCRAM_DEFAULT_ITERATIONS;
>>> 
>>> s/SCRAM_DEFAULT_ITERATIONS/SCRAM_SHA_256_DEFAULT_ITERATIONS/ and
>>> s/scram_iterations/scram_sha_256_iterations/ perhaps?
>> 
>> Distinct members in the conn object is only of interest if there is a way for
>> the user to select a different password method in \password right?  I can
>> rename it now but I think doing too much here is premature, awaiting work on
>> \password (should that materialize) seems reasonable no?
> 
> You could do that already, somewhat indirectly, with
> password_encryption, assuming that it supports more than one mode
> whose password build is influenced by it.  If you wish to keep it
> named this way, this is no big deal for me either way, so feel free to
> use what you think is best based on the state of HEAD.  I think that
> I'd value more the consistency with the backend in terms of naming,
> though.

ok, renamed.

>>> @@ -692,7 +697,7 @@ mock_scram_secret(const char *username, int 
>>> *iterations, char **salt,
>>>   encoded_salt[encoded_len] = '\0';
>>> 
>>>   *salt = encoded_salt;
>>> -   *iterations = SCRAM_DEFAULT_ITERATIONS;
>>> +   *iterations = scram_sha256_iterations;
>>> 
>>> This looks incorrect to me?  The mock authentication is here to
>>> produce a realistic verifier, still it will fail.  It seems to me that
>>> we'd better stick to the default in all the cases.
>> 
>> For avoiding revealing anything, I think a case can be argued for both.  I've
>> reverted back to the default though.
>> 
>> I also renamed the GUC sha_256 to match terminology we use.
> 
> +   "SET password_encryption='scram-sha-256';
> +SET scram_sha_256_iterations=10;
> Maybe use a lower value to keep the test cheap?

Fixed.

> +time of encryption. In order to make use of a changed value, new
> +password must be set.
> "A new password must be set".

Fixed.

> Superuser-only GUCs should be documented as such, or do you intend to
> make it user-settable like I suggested upthread :) ?

I don't really have strong feelings, so I reverted to being user-settable since
I can't really present a strong argument for superuser-only.

The attached is a rebase on top of master with no other additional hacking done
on top of the above review comments.
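
For reference, with this patch the new setting is used like this (a usage
sketch; "alice" is a hypothetical role, and per the docs above the changed
value only affects passwords set afterwards):

    SET password_encryption = 'scram-sha-256';
    SET scram_sha_256_iterations = 50000;
    ALTER ROLE alice PASSWORD 'new-secret';  -- hashed with 50000 iterations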

--
Daniel Gustafsson



v4-0001-Make-SCRAM-iteration-count-configurable.patch
Description: Binary data



Re: Seek for helper documents to implement WAL with an FDW

2023-02-22 Thread Peter Eisentraut

On 20.02.23 10:53, Komal Habura wrote:
         I have written an FDW, which is similar to file_fdw. I need
WAL support to perform logical and streaming replication. I know about
custom WAL records but do not have clarity about implementing WAL
(writing, redo, desc, identify, etc.) or the cases where WAL can be
applied.


A foreign-data wrapper manages *foreign* data, which almost by 
definition means that it does not participate in the transaction 
management of the local PostgreSQL instance, including in the WAL.  If 
you want to build a custom storage format that does participate in the 
local transaction management, you should probably look at building 
either a table access method or a storage manager.





Re: Improving inferred query column names

2023-02-22 Thread Peter Eisentraut

On 20.02.23 16:17, David G. Johnston wrote:

I think we should just do it and not care about what breaks.  There has
never been any guarantee about these.


I'm going to toss a -1 into the ring but if this does go through a 
strong request that it be disabled via a GUC.  The ugliness of that 
option is why we shouldn't do this.


De facto reality is still a reality we are on the hook for.

I too find the legacy design choice to be annoying but not so much that 
changing it seems like a good idea.


Well, a small backward compatibility GUC might not be too cumbersome.





Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)

2023-02-22 Thread Matthias van de Meent
On Wed, 22 Feb 2023 at 13:15, Tomas Vondra
 wrote:
>
> On 2/20/23 19:15, Matthias van de Meent wrote:
> > Thanks. Based on feedback, attached is v2 of the patch, with as
> > significant changes:
> >
> > - We don't store the columns we mention in predicates of summarized
> > indexes in the hotblocking column anymore, they are stored in the
> > summarized columns bitmap instead. This further reduces the chance of
> > failing to apply HOT with summarizing indexes.
>
> Interesting idea. I need to think about the correctness, but AFAICS it
> should work. Do we have any tests covering such cases?

There is a test that checks that an update to the predicated column
does update the index (on table brin_hot_2). However, the description
was out of date, so I've updated that in v4.
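
The shape of that test is roughly the following (a sketch; the actual columns
and predicate in the regression test differ):

    CREATE TABLE brin_hot_2 (a int, b int);
    CREATE INDEX brin_hot_2_idx ON brin_hot_2 USING brin (a) WHERE b = 1;
    INSERT INTO brin_hot_2 VALUES (1, 1);
    -- only the predicate column changes, yet the index must still
    -- reflect the update:
    UPDATE brin_hot_2 SET b = 2;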

> > - The heaptuple header bit for summarized update in inserted tuples is
> > replaced with passing an out parameter. This simplifies the logic and
> > decreases chances of accidentally storing incorrect data.
> >
>
> OK.
>
> 0002 proposes a minor RelationGetIndexPredicate() tweak, getting rid of
> the repeated if/else branches. Feel free to discard, if you think the v2
> approach is better.

I agree that this is better, it's included in v4 of the patch, as attached.

Kind regards,

Matthias van de Meent.


v4-0001-Ignore-BRIN-indexes-when-checking-for-HOT-updates.patch
Description: Binary data


Re: Allow ordered partition scans in more cases

2023-02-22 Thread Ronan Dunklau
Thank you for improving this optimization !

Le mardi 21 février 2023, 04:14:02 CET David Rowley a écrit :
> I still need to look to see if there's some small amount of data that
> can be loaded into the table to help coax the planner into producing
> the ordered scan for this one.  It works fine as-is for ORDER BY a,b
> and ORDER BY a; so I've put tests in for that.

I haven't looked too deeply into it, but it seems reasonable that the whole
sort would cost less than individual sorts on partitions + incremental
sorts, except when the whole sort would spill to disk much more than the
incremental ones. I find it quite difficult to reason about what that threshold
should be, but I managed to find a case which could fit in a test:

create table range_parted (a int, b int, c int) partition by range(a, b);
create table range_parted1 partition of range_parted
  for values from (0,0) to (10,10);
create table range_parted2 partition of range_parted
  for values from (10,10) to (20,20);
insert into range_parted(a, b, c)
  select i, j, k from generate_series(1, 19) i,
                      generate_series(1, 19) j,
                      generate_series(1, 5) k;
analyze range_parted;
set random_page_cost = 10;
set work_mem = '64kB';
explain (costs off) select * from range_parted order by a,b,c;

It's quite convoluted, because it needs all of the following:
 - the individual partition sorts must be estimated to fit into work_mem
   (even if that's not the case here at runtime)
 - the whole-table sort must be estimated to not fit into work_mem
 - the difference between the two must be big enough to compensate for the
   incremental sort penalty (hence raising random_page_cost)

This is completely tangential to the subject at hand, but maybe there are
improvements to be made to the way we estimate which type of sort will be
performed? We seem to underestimate the amount of memory needed. I'm not sure
it makes a real difference in actual use cases, though.

Regards,

--
Ronan Dunklau







Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-22 Thread Melih Mutlu
Hi Wang,

Thanks for reviewing.
Please see updated patches. [1]

wangw.f...@fujitsu.com wrote on Tue, 7 Feb 2023 at 10:28:

> 1. In the function ApplyWorkerMain.
> I think we need to keep the worker name as "leader apply worker" in the
> comment
> like the current HEAD.
>

Done.


> I think in this case we also need to pop the error context stack before
> returning. Otherwise, I think we might use the wrong callback
> (apply error_callback) after we return from this function.
>

Done.


> 3. About the function UpdateSubscriptionRelReplicationSlot.
> This newly introduced function UpdateSubscriptionRelReplicationSlot does
> not
> seem to be invoked. Do we need this function?


Removed.

I think if 'need_full_snapshot' is false, it means we will create a snapshot
> that can read only catalogs. (see SnapBuild->building_full_snapshot)


Fixed.

```
> 'use' will use the snapshot for the current transaction executing the
> command.
> This option must be used in a transaction, and CREATE_REPLICATION_SLOT
> must be
> the first command run in that transaction.
> ```

So I think in the function CreateDecodingContext, when "need_full_snapshot"
> is
> true, we seem to need the following check, just like in the function
> CreateInitDecodingContext:

```
> if (IsTransactionState() &&
> GetTopTransactionIdIfAny() != InvalidTransactionId)
> ereport(ERROR,
> (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
>  errmsg("cannot create logical replication
> slot in transaction that has performed writes")));
> ```


You're right: to "use" the snapshot, the command must be the first in the
transaction, and that check happens here [2]. CreateReplicationSnapshot also
has a similar check.
I think the check you're referring to is needed to actually create a
replication slot, and it is performed whether the snapshot will be "used" or
"exported". This is not the case for CreateReplicationSnapshot.

It seems that we also need to invoke the function
> CheckLogicalDecodingRequirements in the new function
> CreateReplicationSnapshot,
> just like the function CreateReplicationSlot and the function
> StartLogicalReplication.


Added this check.

3. The invocation of startup_cb_wrapper in the function
> CreateDecodingContext.
> I think we should change the third input parameter to true when invoke
> function
> startup_cb_wrapper for CREATE_REPLICATION_SNAPSHOT. BTW, after applying
> patch
> v10-0002*, these settings will be inconsistent when sync workers use
> "CREATE_REPLICATION_SLOT" and "CREATE_REPLICATION_SNAPSHOT" to take
> snapshots.
> This input parameter (true) will let us disable streaming and two-phase
> transactions in function pgoutput_startup. See the last paragraph of the
> commit
> message for 4648243 for more details.


I'm not sure if "is_init" should be set to true. CreateDecodingContext only
creates a context for an already existing logical slot and does not
initialize a new one.
I think inconsistencies between "CREATE_REPLICATION_SLOT" and
"CREATE_REPLICATION_SNAPSHOT" are expected since one creates a new
replication slot and the other does not.
CreateDecodingContext is also used in other places. I'm not sure how
this change would affect those places. I'll look into this more. Please let
me know if I'm missing something.


[1]
https://www.postgresql.org/message-id/CAGPVpCQmEE8BygXr%3DHi2N2t2kOE%3DPJwofn9TX0J9J4crjoXarQ%40mail.gmail.com
[2]
https://github.com/postgres/postgres/blob/master/src/backend/replication/walsender.c#L1108

Thanks,
-- 
Melih Mutlu
Microsoft


Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans

2023-02-22 Thread Aleksander Alekseev
Hi,

> I'm confused, what exactly is the benefit of this? What extension
> performs a direct table scan bypassing the executor, searching for NULLs
> or not-NULLs?

Basically any extension that accesses the tables without SPI in order
to avoid parsing and planning overhead for relatively simple cases.
One can specify *several* ScanKeys for a single scan, which is the
equivalent of WHERE condition(a) AND b IS NOT NULL /* AND ... */;

> If heapam can check for NULL/not-NULL more efficiently than the code
> that calls it [...]

This is done not for efficiency but rather for convenience.
Additionally, practice shows that for an extension author it's very
easy to miss a comment in skey.h:

"""
 * SK_SEARCHARRAY, SK_SEARCHNULL and SK_SEARCHNOTNULL are supported only
 * for index scans, not heap scans;
"""

... which results in many hours of debugging. The current interface is
misleading and counterintuitive.
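
For illustration, with this patch an extension could scan a heap with a
NOT NULL key roughly like this (a sketch; "rel" is assumed to be an
already-opened Relation, and ScanKeyEntryInitialize() with the SK_* flags is
the existing API from access/skey.h):

    ScanKeyData key;
    TableScanDesc scan;
    HeapTuple   tup;

    ScanKeyEntryInitialize(&key,
                           SK_ISNULL | SK_SEARCHNOTNULL,
                           1,               /* attnum of column to test */
                           InvalidStrategy,
                           InvalidOid,      /* no subtype */
                           InvalidOid,      /* no collation */
                           InvalidOid,      /* no comparison proc */
                           (Datum) 0);

    scan = table_beginscan(rel, GetActiveSnapshot(), 1, &key);
    while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* only tuples whose first attribute is not null arrive here */
    }
    table_endscan(scan);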

I did my best to add as few new assembly instructions as
possible, and only one extra if/else branch. I don't expect any
measurable performance difference since the bottleneck for SeqScans is
unlikely to be CPU in the affected piece of code but rather
disk/locks/network/etc. On top of that the scenario when somebody is
really worried about the performance AND is using seqscans (not index
scans) AND this particular seqscan is a bottleneck (not JOINs, etc)
seems rare, to me at least.

> For tableam extensions, which may or may not support checking for NULLs,
> we need to add an 'amsearchnulls' field to the table AM API.

This will result in an unnecessary complication of the code and
expensive extra checks that for the default heapam will always return
true. I would argue that what we actually want is to force any TAM to
support checking for NULLs. At least until somebody working on a real
TAM will complain about this limitation.

> But then let's also modify the caller in nodeSeqScan.c to actually make use 
> of it.

That could actually be a good point.

If memory serves I noticed that WHERE ... IS NULL queries don't even
hit HeapKeyTest() and I was curious where the check for NULLs is
actually made. As I understand, SeqNext() in nodeSeqscan.c simply
iterates over all the tuples it can find and pushes them to the parent
node. We could get slightly better performance for certain queries
if SeqNext() did the check internally.

Unfortunately I'm not very experienced with plan nodes in order to go
down this rabbit hole straight away. I suggest we make one change at a
time and keep the patchset small as it was previously requested by
many people on several occasions (the 64-bit XIDs story, etc). I will
be happy to propose a follow-up patch accompanied by the benchmarks if
and when we reach the consensus on this patch.

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-22 Thread Melih Mutlu
Hi Shveta,

Thanks for reviewing.
Please see attached patches.

shveta malik wrote on Thu, 2 Feb 2023 at 14:31:

> On Wed, Feb 1, 2023 at 5:37 PM Melih Mutlu  wrote:
> for (int64 i = 1; i <= lastusedid; i++)
> {
> charoriginname_to_drop[NAMEDATALEN] = {0};
> snprintf(originname_to_drop,
> sizeof(originname_to_drop), "pg_%u_%lld", subid, (long long) i);
>  ...
>   }
>
> --Is it better to use the function
> 'ReplicationOriginNameForLogicalRep' here instead of sprintf, just to
> be consistent everywhere to construct origin-name?
>

ReplicationOriginNameForLogicalRep constructs the origin name from the current
"lastusedid" and doesn't accept that id as a parameter. Here the patch needs
to check all possible ids.


> /* Drop replication origin */
> replorigin_drop_by_name(originname, true, false);
> }
>
> --Are we passing missing_ok as true (second arg) intentionally here in
> replorigin_drop_by_name? Once we fix the issue reported  in my earlier
> email (ASSERT), do you think it makes sense to  pass missing_ok as
> false here?
>

Yes, missing_ok is intentional. The user might be concurrently refreshing
the sub or the apply worker might already drop the origin at that point.
So, missing_ok is set to true.
This is also how origin drops before worker exit are handled on HEAD.
I only followed the same approach.


> --Do we need to palloc for each relation separately? Shall we do it
> once outside the loop and reuse it? Also pfree is not done for rstate
> here.
>

Removed palloc from the loop. No need to pfree here since the memory
context will be deleted with the next CommitTransactionCommand call.


> Can you please review the above flow (I have given line# along with),
> I think it could be problematic. We alloced prev_slotname, assigned it
> to slotname, freed prev_slotname and used slotname after freeing the
> prev_slotname.
> Also slotname is allocated some memory too, that is not freed.
>

Right, I used memcpy instead of assigning prev_slotname to slotname.
slotname is returned in the end and pfree'd later [1]

I also addressed the other review comments that I didn't explicitly mention in
this email; I simply applied the changes you pointed out. I also added some
more logging. I hope it's more useful now.

[1]
https://github.com/postgres/postgres/blob/master/src/backend/replication/logical/worker.c#L4359


Thanks,
-- 
Melih Mutlu
Microsoft


v8-0001-Add-replication-protocol-cmd-to-create-a-snapsho.patch
Description: Binary data


v11-0002-Reuse-Logical-Replication-Background-worker.patch
Description: Binary data


meson: Add equivalent of configure --disable-rpath option

2023-02-22 Thread Peter Eisentraut
The configure option --disable-rpath currently has no equivalent in
meson.  This option is used by packagers, so I think it would be good to
have it in meson as well.  I came up with the attached patch.
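
With the patch, packagers would disable it like so (a usage sketch):

    meson setup build -Drpath=false
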
From 2677bb5f7f808000e8a04e3b8c3193e3faa4de39 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 22 Feb 2023 13:54:23 +0100
Subject: [PATCH] meson: Add equivalent of configure --disable-rpath option

---
 doc/src/sgml/installation.sgml | 22 +-
 meson.build| 16 +---
 meson_options.txt  |  3 +++
 src/makefiles/meson.build  |  2 +-
 4 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 0ed35d99e9..6e1087c67f 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2216,7 +2216,9 @@ Installation Locations
   different subdirectories may render the installation non-relocatable,
   meaning you won't be able to move it after installation.
   (The man and doc locations are
-  not affected by this restriction.)
+  not affected by this restriction.)  For relocatable installs, you
+  might want to use the -Drpath=false option
+  described later.
  
 
  
@@ -2856,6 +2858,24 @@ Build Process Details
   
  
 
+ 
+  -Drpath={ true | false }
+  
+   
+Do not mark PostgreSQL's executables
+to indicate that they should search for shared libraries in the
+installation's library directory (see --libdir).
+On most platforms, this marking uses an absolute path to the
+library directory, so that it will be unhelpful if you relocate
+the installation later.  However, you will then need to provide
+some other way for the executables to find the shared libraries.
+Typically this requires configuring the operating system's
+dynamic linker to search the library directory; see
+ for more detail.
+   
+  
+ 
+
  
   
-DBINARY_NAME=PATH
   
diff --git a/meson.build b/meson.build
index f534704452..136a5f8b09 100644
--- a/meson.build
+++ b/meson.build
@@ -2572,7 +2572,6 @@ default_target_args = {
 
 default_lib_args = default_target_args + {
   'name_prefix': '',
-  'install_rpath': ':'.join(lib_install_rpaths),
 }
 
 internal_lib_args = default_lib_args + {
@@ -2583,14 +2582,25 @@ internal_lib_args = default_lib_args + {
 default_mod_args = default_lib_args + {
   'name_prefix': '',
   'install_dir': dir_lib_pkg,
-  'install_rpath': ':'.join(mod_install_rpaths),
 }
 
 default_bin_args = default_target_args + {
   'install_dir': dir_bin,
-  'install_rpath': ':'.join(bin_install_rpaths),
 }
 
+if get_option('rpath')
+  default_lib_args += {
+'install_rpath': ':'.join(lib_install_rpaths),
+  }
+
+  default_mod_args += {
+'install_rpath': ':'.join(mod_install_rpaths),
+  }
+
+  default_bin_args += {
+'install_rpath': ':'.join(bin_install_rpaths),
+  }
+endif
 
 
 # Helper for exporting a limited number of symbols
diff --git a/meson_options.txt b/meson_options.txt
index 9d3ef4aa20..7d33c9f1d4 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -67,6 +67,9 @@ option('extra_version', type : 'string', value: '',
 option('darwin_sysroot', type : 'string', value: '',
   description: 'select a non-default sysroot path')
 
+option('rpath', type : 'boolean', value: true,
+  description: 'whether to embed shared library search path in executables')
+
 
 # External dependencies
 
diff --git a/src/makefiles/meson.build b/src/makefiles/meson.build
index bf7303dc99..5a0032ab0d 100644
--- a/src/makefiles/meson.build
+++ b/src/makefiles/meson.build
@@ -53,7 +53,7 @@ pgxs_kv = {
   'abs_top_srcdir': meson.source_root(),
 
   'enable_thread_safety': 'yes',
-  'enable_rpath': 'yes',
+  'enable_rpath': get_option('rpath') ? 'yes' : 'no',
   'enable_nls': libintl.found() ? 'yes' : 'no',
   'enable_tap_tests': tap_tests_enabled ? 'yes' : 'no',
   'enable_debug': get_option('debug') ? 'yes' : 'no',
-- 
2.39.2



Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-02-22 Thread Melih Mutlu
Hi,

Melih Mutlu wrote on Thu, 16 Feb 2023 at 14:37:

> I see that setting originname in the catalog before actually creating it
> causes issues. My concern with setting originname when setting the state to
> FINISHEDCOPY is that if the worker waits until FINISHEDCOPY to update
> the slot/origin name but fails somewhere before reaching FINISHEDCOPY and
> after creating the slot/origin, then this newly created slot/origin will be left
> behind. It wouldn't be possible to find and drop them since their names are
> not stored in the catalog. Eventually, this might also cause hitting
> the max_replication_slots limit in case of such failures between origin
> creation and FINISHEDCOPY.
>
> One fix I can think of is to update the catalog right after creating a new
> origin. But this would also require committing the current transaction to
> actually persist the originname. I guess committing the
> transaction in the middle of the initial sync could hurt the copy process.
>

Here are more thoughts on this:
I still believe that updating originname when setting the state
to FINISHEDCOPY is not a good idea, since any failure
before FINISHEDCOPY prevents us from storing the originname in the catalog. If an
origin or slot is not in the catalog, it's not easily possible to find and
drop origins/slots that are left behind. And we definitely do not want to
keep unnecessary origins/slots, since we would hit the max_replication_slots
limit.
It's better to be safe and update origin/slot names when setting the state
to DATASYNC. At this point, the worker must be sure that it writes the correct
origin/slot names into the catalog.
The following part actually cleans up the catalog if a table is left behind in
DATASYNC state and its slot and origin cannot be used for sync.

ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, prev_slotname, true);
>
> StartTransactionCommand();
> /* Replication origin might still exist. Try to drop */
> replorigin_drop_by_name(originname, true, false);
>
> /*
> * Remove replication slot and origin name from the relation's
> * catalog record
> */
> UpdateSubscriptionRel(MyLogicalRepWorker->subid,
>  MyLogicalRepWorker->relid,
>  MyLogicalRepWorker->relstate,
>  MyLogicalRepWorker->relstate_lsn,
>  NULL,
>  NULL);
>

The patch needs to refresh the origin name before it begins copying the
table. It will try to read from the catalog but won't find any slot/origin,
since they have been cleaned up. Then it will move on with the correct origin
name, which is the one created (or to be created) for the current sync worker.

I tested refetching the origin name, and it seems to fix the errors you
reported.

Thanks,
-- 
Melih Mutlu
Microsoft


RE: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c

2023-02-22 Thread wangw.f...@fujitsu.com
On Wed, Feb 22, 2023 at 8:37 AM Nathan Bossart  wrote:
> On Wed, Feb 15, 2023 at 08:16:43AM +, wangw.f...@fujitsu.com wrote:
> > When I refer to the GUC "max_locks_per_transaction", I find that the
> description
> > of the shared lock table size in pg-doc[1] is inconsistent with the code
> > (guc_table.c). BTW, the GUC "max_predicate_locks_per_xact" has similar
> problems.
> >
> > I think the descriptions in pg-doc are correct.
> > - GUC "max_locks_per_transaction"
> > Please refer to the macro "NLOCKENTS" used to obtain max_table_size in the
> > function InitLocks.
> >
> > - GUC "max_predicate_locks_per_xact"
> > Please refer to the macro "NPREDICATELOCKTARGETENTS" used to obtain
> > max_table_size in the function InitPredicateLocks.
> 
> The GUC description for max_locks_per_transaction was first added in
> b700a67 (July 2003).  Neither the GUC description nor the documentation was
> updated when max_prepared_transactions was introduced in d0a8968 (July
> 2005).  However, the documentation was later fixed via 78ef2d3 (August
> 2005).  It looks like the GUC description for
> max_predicate_locks_per_transaction was wrong from the start.  In dafaa3e
> (February 2011), the GUC description does not include
> max_prepared_transactions, but the documentation does.
> 
> It's interesting that the documentation cites max_connections, as the
> tables are sized using MaxBackends, which includes more than just
> max_connections (e.g., autovacuum_max_workers, max_worker_processes,
> max_wal_senders).  After some digging, I see that MaxBackends was the
> original variable used for max_connections (which was called max_backends
> until 648677c (July 2000)), and it wasn't until autovacuum_max_workers was
> introduced in e2a186b (April 2007) before max_connections got its own
> MaxConnections variable and started diverging from MaxBackends.
> 
> So, even with your patch applied, I don't think the formulas are correct.
> I don't know if it's worth writing out the exact formula, though.  It
> doesn't seem to be kept up-to-date, and I don't know if users would choose
> different values for max_locks_per_transaction if it _was_ updated.
> Perhaps max_connections is a good enough approximation of MaxBackends most
> of the time...

Thanks very much for your careful review.

Yes, you are right. I think the formulas in the v1 patch are all approximations.
I think the exact formula (see function InitializeMaxBackends) is:
```
max_locks_per_transaction * (max_connections + autovacuum_max_workers + 1 +
                             max_worker_processes + max_wal_senders +
                             max_prepared_transactions)
```

After some rethinking, I think users can easily get the exact value from the
exact formula, and using the accurate formula can help users adjust
max_locks_per_transaction or max_predicate_locks_per_transaction if needed. So
I used the exact formulas in the attached v2 patch.

Regards,
Wang Wei


v2-0001-Fix-the-description-of-shared-lock-table-size-and.patch
Description:  v2-0001-Fix-the-description-of-shared-lock-table-size-and.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! A new patch set is available at [1].

> I rethought the pqSocketPoll part. Current interpretation of
> arguments seems a little bit confusing because a specific pattern
> of arguments has a different meaning. What do you think about
> introducing a new argument like `int forConnCheck`? This seems
> straightforward and readable.

I think it may be better, so fixed.
But now we must consider another thing: will we support combining conncheck
with {read|write} or a timeout? Especially regarding the timeout: if someone
calls pqSocketPoll() with forConnCheck = 1 and end_time = -1, the process may
get stuck, because it waits until the socket peer closes the connection.
Currently forConnCheck can be specified together with other requests, but the
timeout must be zero.
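
For clarity, the core of the connection check is just a zero-timeout poll for
POLLRDHUP, roughly like this (a sketch of the idea, not the patch's exact
code):

    #define _GNU_SOURCE             /* for POLLRDHUP */
    #include <poll.h>

    /* returns 1 if the peer closed, 0 if alive, -1 on error */
    static int
    peer_closed(int sock)
    {
        struct pollfd pfd;

        pfd.fd = sock;
        pfd.events = POLLRDHUP;
        pfd.revents = 0;

        if (poll(&pfd, 1, 0) < 0)   /* timeout 0: never block */
            return -1;
        return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR)) ? 1 : 0;
    }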

> I think there are cases where the given server name does exist,
> however the connection check is not performed (does not pass the
> first if statement above). 1) a case where the server name exist,
> however an open connection to the specified server is not found.
> 2) case where connection for specified server is invalidated.
> 3) case where there is not ConnectionHash entry for the specified
> server. Current implementation returns true in that case, however
> the check is not performed. Suppose the checking mechanism is
> supported on the platform, it does not seem reasonable to return
> true or false when the check is not performed. What do you think?

Thank you for the detailed explanation. I agree with your opinion and modified
it like that.

While making the patch, I came up with the idea that
postgres_fdw_verify_connection_states* should return NULL if PQconnCheck()
cannot work on the platform. This can be done by adding PQconnCheckable(). It
may be reasonable because the check is not really performed in that
environment. What do you think?

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB586664F5ED2128E572EA95B0F5AA9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> 1.
> PQconnCheck() function allows to check the status of the socket by polling
> the socket. This function is currently available only on systems that
> support the non-standard POLLRDHUP extension to the poll system call,
> including Linux.
> 
> ~
> 
> (missed fix from previous review)
> 
> "status of the socket" --> "status of the connection"

Sorry, fixed.

> 
> doc/src/sgml/libpq.sgml
> 
> 2. PQconnCheck
> +  
> +   This function check the health of the connection. Unlike  linkend="libpq-PQstatus"/>,
> +   this check is performed by polling the corresponding socket. This
> +   function is currently available only on systems that support the
> +   non-standard POLLRDHUP extension to the
> poll
> +   system call, including Linux. 
> +   returns greater than zero if the remote peer seems to be closed, 
> returns
> +   0 if the socket is valid, and returns
> -1
> +   if the connection has already been closed or an error has occurred.
> +  
> 
> "check the health" --> "checks the health"

Fixed.

> 3. PQcanConnCheck
> 
> +  
> +   Returns true (1) or false (0) to indicate if the PQconnCheck function
> +   is supported on this platform.
> 
> Should the reference to PQconnCheck be a link as it previously was?

Right, fixed.

> src/interfaces/libpq/fe-misc.c
> 
> 4. PQconnCheck
> 
> +/*
> + * Check whether the socket peer closed connection or not.
> + *
> + * Returns >0 if remote peer seems to be closed, 0 if it is valid,
> + * -1 if the input connection is bad or an error occurred.
> + */
> +int
> +PQconnCheck(PGconn *conn)
> +{
> + return pqSocketCheck(conn, 0, 0, (time_t) 0);
> +}
> 
> I'm confused. This comment says =0 means connection is valid. But the
> pqSocketCheck comment says =0 means it timed out.
> 
> So those two function comments don't seem compatible

I added further descriptions atop pqSocketCheck() and pqSocketPoll().
Do they help with understanding?

> 5. PQconnCheckable
> 
> +/*
> + * Check whether PQconnCheck() works well on this platform.
> + *
> + * Returns true (1) if this can use PQconnCheck(), otherwise false (0).
> + */
> +int
> +PQconnCheckable(void)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + return true;
> +#else
> + return false;
> +#endif
> +}
> 
> Why say "works well"? IMO it either works or doesn't work – there is no 
> "well".
> 
> SUGGESTION1
> Check whether PQconnCheck() works on this platform.
> 
> SUGGESTION2
> Check whether PQconnCheck() can work on this platform.

I chose SUGGESTION2.

> 6. pqSocketCheck
> 
>  /*
>   * Checks a socket, using poll or select, for data to be read, written,
> - * or both.  Returns >0 if one or more conditions are met, 0 if it timed
> + * or both. Moreover, when neither forRead nor forWrite is requested and
> + * timeout is disabled, try to check the health of socket.
> + *
> + * Returns >0 if one or more conditions are met, 0 if it timed
>   * out, -1 if an error occurred.
>   *
>   * If SSL is in use, the SSL buffer is checked prior to checking the socket
> 
> ~
> 
> See review comment #4. (e.g. This says =0 if it timed out).

Descriptions were added.

> 7. pqSocketPoll
> 
> + * When neither forRead nor forWrite are set and timeout is disabled,
> + *
> + * - If the timeout is disabled, try to check the health of the socket
> + * - Otherwise this immediately returns 0
> + *
> + * Return >0 if condition is met, 0 if a timeout occurred, -1 if an error
> + * or interrupt occurred.
> 
> Don't say "and timeout is disabled," because it clashes with the 1st
> bullet which also says "- If the timeout is disabled,".

These comments were reworded.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v33-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch
Description:  v33-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch


v33-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v33-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v33-0003-add-test.patch
Description: v33-0003-add-test.patch


Re: Experiments with Postgres and SSL

2023-02-22 Thread Heikki Linnakangas

On 20/01/2023 03:28, Jacob Champion wrote:

On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:

* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").


If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].


Good idea. Do we want to just require the protocol to be "postgres", or 
perhaps "postgres/3.0"? Need to register that with IANA, I guess.


We implemented a protocol version negotiation mechanism in the libpq
protocol itself; how would this interact with it? If it's just
"postgres", then I guess we'd still negotiate the protocol version and
list of extensions after the TLS handshake.



Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.


+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.


+1. We need to support explicit TLS for a long time, so we can't 
simplify by just removing it. But let's refactor the code somehow, to 
make it more clear.


Looking at the patch, I think it accepts an SSLRequest packet even if 
implicit TLS has already been established. That's surely wrong, and 
shows how confusing the code is. (Or I'm reading it incorrectly, which 
also shows how confusing it is :-) )


Regarding Vladimir's comments on how clients can migrate to this, I 
don't have any great suggestions. To summarize, there are several options:


- Add an "fast_tls" option that the user can enable if they know the 
server supports it


- First connect in old-fashioned way, and remember the server version. 
Later, if you reconnect to the same server, use implicit TLS if the 
server version was high enough. This would be most useful for connection 
pools.


- Connect both ways at the same time, and continue with the fastest, 
i.e. "happy eyeballs"


- Try implicit TLS first, and fall back to explicit TLS if it fails.

For libpq, we don't necessarily need to do anything right now. We can 
add the implicit TLS support in a later version. Not having libpq 
support makes it hard to test the server codepath, though. Maybe just 
test it with 'stunnel' or 'openssl s_client'.
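
For example, something along these lines could exercise the implicit-TLS code
path (a hypothetical invocation; the ALPN name "postgres" follows the
discussion above):

    openssl s_client -connect localhost:5432 -alpn postgres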


- Heikki





Re: Performance issues with parallelism and LIMIT

2023-02-22 Thread Tomas Vondra


On 2/20/23 19:18, David Geier wrote:
> Hi,
> 
> On 2/8/23 11:42, Tomas Vondra wrote:
>> On 2/1/23 14:41, David Geier wrote:
>>
>> Yeah, this is a pretty annoying regression. We already can hit poor
>> behavior when matching rows are not distributed uniformly in the tables
>> (which is what LIMIT costing assumes), and this makes it more likely to
>> hit similar issues. A bit like when doing many HTTP requests makes it
>> more likely to hit at least one 99% outlier.
> Are you talking about the use of ordering vs filtering indexes in
> queries where there's both an ORDER BY and a filter present (e.g. using
> an ordering index but then all rows passing the filter are at the end of
> the table)? If not, can you elaborate a bit more on that and maybe give
> an example.

Yeah, roughly. I don't think the explicit ORDER BY is a requirement for
this to happen - it's enough when the part of the plan below LIMIT
produces many rows, but the matching rows are at the end.

>> No opinion on these options, but worth a try. Alternatively, we could
>> try the usual doubling approach - start with a low threshold (and set
>> the latch frequently), and then gradually increase it up to the 1/4.
>>
>> That should work both for queries expecting only few rows and those
>> producing a lot of data.
> 
> I was thinking about this variant as well. One more alternative would be
> latching the leader once a worker has produced 1/Nth of the LIMIT where
> N is the number of workers. Both variants have the disadvantage that
> there are still corner cases where the latch is set too late; but it
> would for sure be much better than what we have today.
> 
> I also did some profiling and - at least on my development laptop with 8
> physical cores - the original example motivating the batching change is
> slower than when batching is disabled by commenting out:
> 
>     if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))
> 
> SET parallel_tuple_cost TO 0;
> CREATE TABLE b (a int);
> INSERT INTO b SELECT generate_series(1, 200000000);
> ANALYZE b;
> EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;
>
>  Gather  (cost=1000.00..1200284.61 rows=200375424 width=4) (actual
> rows=200000000 loops=1)
>    Workers Planned: 7
>    Workers Launched: 7
>    ->  Parallel Seq Scan on b  (cost=0.00..1199284.61 rows=28625061
> width=4) (actual rows=25000000 loops=8)
> 
> Always latch: 19055 ms
> Batching:     19575 ms
> 
> If I find some time, I'll play around a bit more and maybe propose a patch.
> 

OK. Once you have a WIP patch maybe share it and I'll try to do some
profiling too.

>>> ...
>>>
>>> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
>>> a NULL slot if a parallel worker is supposed to stop execution (we could
>>> e.g. check if the queue got detached). Or could we amend
>>> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
>>> got detached?
>>>
>> That sounds reasonable, but I'm not very familiar the leader-worker
>> communication, so no opinion on how it should be done.
> 
> I think an extra macro that needs to be called from dozens of places to
> check if parallel execution is supposed to end is the least preferred
> approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and if
> we cannot actively signal the workers that they should stop.
> 

IMHO if this requires adding another macro to a bunch of ad hoc places,
that's rather inconvenient. It'd be much better to fix this in a localized
manner (especially as it seems related to a fairly specific place).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Ignoring BRIN for HOT updates (was: -updates seems broken)

2023-02-22 Thread Tomas Vondra


On 2/20/23 19:15, Matthias van de Meent wrote:
> Hi,
> 
> On Sun, 19 Feb 2023 at 16:04, Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> On 2/19/23 02:03, Matthias van de Meent wrote:
>>> On Thu, 16 Jun 2022 at 15:05, Tomas Vondra
>>>  wrote:

 I've pushed the revert. Let's try again for PG16.
>>>
>>> As we discussed in person at the developer meeting, here's a patch to
>>> try again for PG16.
>>>
>>> It combines the committed patches with my fix, and adds some
>>> additional comments and polish. I am confident the code is correct,
>>> but not that it is clean (see the commit message of the patch for
>>> details).
>>>
>>
>> Thanks for the patch. I took a quick look, and I agree it seems correct,
>> and fairly clean too.
> 
> Thanks. Based on feedback, attached is v2 of the patch, with the most
> significant changes being:
> 
> - We don't store the columns we mention in predicates of summarized
> indexes in the hotblocking columns bitmap anymore; they are stored in
> the summarized columns bitmap instead. This further reduces the chance
> of failing to apply HOT with summarizing indexes.

Interesting idea. I need to think about the correctness, but AFAICS it
should work. Do we have any tests covering such cases?

I see both v1 and v2 had exactly this

 src/test/regress/expected/stats.out   | 110 ++
 src/test/regress/sql/stats.sql|  82 -

so I guess there are no new tests testing this for BRIN with predicates.
We should probably add some ...
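
Something like this, perhaps (a rough sketch with made-up names; a real
regression test would also need to wait for the stats to be flushed,
like the existing tests in stats.sql do):

-- column b appears only in the predicate of the summarizing index,
-- so with the patch an update of b should still be eligible for HOT
CREATE TABLE brin_hot_t (a int, b int) WITH (fillfactor = 50);
CREATE INDEX brin_hot_idx ON brin_hot_t USING brin (a) WHERE b > 0;
INSERT INTO brin_hot_t SELECT i, i FROM generate_series(1, 1000) i;

UPDATE brin_hot_t SET b = b + 1;

-- expect n_tup_hot_upd to account for (most of) the updates above
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables WHERE relname = 'brin_hot_t';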

> - The heaptuple header bit for summarized update in inserted tuples is
> replaced with passing an out parameter. This simplifies the logic and
> decreases chances of accidentally storing incorrect data.
> 

OK.

0002 proposes a minor RelationGetIndexPredicate() tweak, getting rid of
the repeated if/else branches. Feel free to discard, if you think the v2
approach is better.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From b238e9c4308e9f2b7f0f2ca068b1217d1f906604 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Mon, 20 Feb 2023 18:38:30 +0100
Subject: [PATCH v3 1/2] Ignore BRIN indexes when checking for HOT updates

When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.

This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.

A new type TU_UpdateIndexes is invented to provide a signal to the executor
to determine which indexes to update - no indexes, all indexes, or only
the summarizing indexes.

One otherwise unused bit in the heap tuple header is (ab)used to signal
that the HOT update would still update at least one summarizing index.
The bit is cleared immediately

Original patch by Josef Simanek, various fixes and improvements by
Tomas Vondra and me.

Authors: Josef Simanek, Tomas Vondra, Matthias van de Meent
Reviewed-by: Tomas Vondra, Alvaro Herrera
---
 doc/src/sgml/indexam.sgml |  13 +++
 src/backend/access/brin/brin.c|   1 +
 src/backend/access/gin/ginutil.c  |   1 +
 src/backend/access/gist/gist.c|   1 +
 src/backend/access/hash/hash.c|   1 +
 src/backend/access/heap/heapam.c  |  48 +++-
 src/backend/access/heap/heapam_handler.c  |  19 ++-
 src/backend/access/nbtree/nbtree.c|   1 +
 src/backend/access/spgist/spgutils.c  |   1 +
 src/backend/access/table/tableam.c|   2 +-
 src/backend/catalog/index.c   |   9 +-
 src/backend/catalog/indexing.c|  35 --
 src/backend/commands/copyfrom.c   |   5 +-
 src/backend/commands/indexcmds.c  |  10 +-
 src/backend/executor/execIndexing.c   |  37 --
 src/backend/executor/execReplication.c|   9 +-
 src/backend/executor/nodeModifyTable.c|  13 ++-
 src/backend/nodes/makefuncs.c |   7 +-
 src/backend/utils/cache/relcache.c|  73 
 src/include/access/amapi.h|   2 +
 src/include/access/heapam.h   |   5 +-
 src/include/access/tableam.h  |  19 ++-
 src/include/executor/executor.h   |   3 +-
 src/include/nodes/execnodes.h |   2 +
 src/include/nodes/makefuncs.h |   4 +-
 src/include/utils/rel.h   |   4 +-
 src/include/utils/relcache.h  |   5 +-
 .../modules/dummy_index_am/dummy_index_am.c   |   1 +
 src/test/regress/expected/stats.out   | 110 ++
 src/test/regress/sql/stats.sql|  82 -
 30 files changed, 445 insertions(+), 78 deletions(-)

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 4f83970c851..897419ec959 100644
--- a/doc/src/sgml/indexam.sgml

RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-02-22 Thread wangw.f...@fujitsu.com
On Sun, Feb 19, 2023 at 21:06 PM Wang, Wei/王 威  wrote:
> On Thur, Feb 14, 2023 at 2:03 AM Andres Freund  wrote:
> > On 2023-02-13 14:06:57 +0530, Amit Kapila wrote:
> > > > > The patch calls update_progress in change_cb_wrapper and other
> > > > > wrappers which will miss the case of DDLs that generates a lot of data
> > > > > that is not processed by the plugin. I think for that we either need
> > > > > to call update_progress from reorderbuffer.c similar to what the patch
> > > > > has removed or we need some other way to address it. Do you have any
> > > > > better idea?
> > > >
> > > > I don't mind calling something like update_progress() in the specific 
> > > > cases
> > > > that's needed, but I think those are just the
> > > >   if (!RelationIsLogicallyLogged(relation))
> > > >   if (relation->rd_rel->relrewrite && !rb->output_rewrites))
> > > >
> > > > To me it makes a lot more sense to call update_progress() for those, 
> > > > rather
> > > > than generally.
> > > >
> > >
> > > Won't it be better to call it wherever we don't invoke any wrapper
> > > function like for cases REORDER_BUFFER_CHANGE_INVALIDATION, sequence
> > > changes, etc.? I was thinking that wherever we don't call the wrapper
> > > function which means we don't have a chance to invoke
> > > update_progress(), the timeout can happen if there are a lot of such
> > > messages.
> >
> > ISTM that the likelihood of causing harm due to increased overhead is higher
> > than the gain.
> 
> I would like to do something for this thread. So, I am planning to update the
> patch as per discussion in the email chain unless someone is already
> working on it.

Thanks to Andres and Amit for the discussion.

Based on the discussion and Andres' WIP (in [1]), I made the following
modifications:
1. Some function renaming.
2. Added the threshold-related logic in the function
update_progress_and_keepalive.
3. Added the timeout-related processing of temporary data and
unlogged/foreign/system tables in the function ReorderBufferProcessTXN.
4. Improved error messages in the function OutputPluginPrepareWrite.
5. Invoked function update_progress_and_keepalive to fix sync-related problems
caused by filters such as origin in functions DecodeCommit(), DecodePrepare()
and ReorderBufferAbort().
6. Removed the invocation of function update_progress_and_keepalive in the
function begin_prepare_cb_wrapper().
7. Invoked the function update_progress_and_keepalive() in the function
stream_truncate_cb_wrapper(), just like we do in the function
truncate_cb_wrapper().
8. Removed the check of SyncRepRequested() in the syncrep logic in the function
WalSndUpdateProgressAndKeepAlive().
9. Added the check for wal_sender_timeout before using it in functions
WalSndUpdateProgressAndKeepAlive() and WalSndWriteData();

Attach the new patch.

[1] - 
https://www.postgresql.org/message-id/20230208200235.esfoggsmuvf4pugt%40awork3.anarazel.de

Regards,
Wang wei


v2-0001-Rework-LogicalOutputPluginWriterUpdateProgress.patch
Description:  v2-0001-Rework-LogicalOutputPluginWriterUpdateProgress.patch


Re: LWLock deadlock in brinRevmapDesummarizeRange

2023-02-22 Thread Tomas Vondra



On 2/22/23 12:35, Alvaro Herrera wrote:
> On 2023-Feb-22, Tomas Vondra wrote:
> 
>> But instead I almost immediately ran into a LWLock deadlock :-(
> 
> Ouch.
> 
>> I've managed to reproduce this on PG13+, but I believe it's there since
>> the brinRevmapDesummarizeRange was introduced in PG10. I just haven't
>> tried on pre-13 releases.
> 
> Hmm, I think that might just be an "easy" way to hit it, but the problem
> is actually older than that, since AFAICS brin_doupdate is careless
> regarding locking order of revmap page vs. regular page.
> 

That's certainly possible, although I ran a lot of BRIN stress tests and
it only started failing after I added the desummarization. That said, the
tests are "randomized" like this:

UPDATE t SET a = '...' WHERE random() < 0.05;

which is fairly sequential. Maybe reordering the CTIDs a bit would hit
additional deadlocks; I'll probably give it a try. OTOH the sequential
pattern is probably much more likely to be hit by users, and I don't
recall any such deadlock reports.
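
For example something like this (just a sketch - the CTE doesn't strictly
guarantee the order in which the join processes the TIDs, but it should
scatter the page access pattern well enough):

-- update the same ~5% sample, but visit the pages in random order
-- instead of roughly sequentially
WITH victims AS (
  SELECT ctid FROM t WHERE random() < 0.05 ORDER BY random()
)
UPDATE t SET a = '...' FROM victims WHERE t.ctid = victims.ctid;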

> Sadly, the README doesn't cover locking considerations.  I had that in a
> file called 'minmax-proposal' in version 16 of the patch here
> https://postgr.es/m/20140820225133.gb6...@eldon.alvh.no-ip.org
> but by version 18 (when 'minmax' became BRIN) I seem to have removed
> that file and replaced it with the README and apparently I didn't copy
> this material over.
> 

Yeah :-( There are a couple more things missing in the README, like
what oi_regular_nulls means.

> ... and in there, I wrote that we would first write the brin tuple in
> the regular page, unlock that, and then lock the revmap for the update,
> without holding lock on the data page.  I don't remember why we do it
> differently now, but maybe the fix is just to release the regular page
> lock before locking the revmap page?  One very important change is that
> in previous versions the revmap used a separate fork, and we had to
> introduce an "evacuation protocol" when we integrated the revmap into
> the main fork, which may have changed the locking considerations.
> 

What would happen if two processes built the summary concurrently? How
would they find the other tuple, so that we don't end up with two BRIN
tuples for the same range?
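
FWIW, when testing that, pageinspect can show what actually ended up on
disk (a sketch - the index name and block numbers are illustrative; in a
BRIN index, block 0 is the metapage and block 1 the first revmap page):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- revmap entries: one TID per page range, pointing at the summary tuple
SELECT * FROM brin_revmap_data(get_raw_page('brin_idx', 1));

-- summary tuples stored on a regular page (block 2 here)
SELECT itemoffset, blknum, attnum, value
  FROM brin_page_items(get_raw_page('brin_idx', 2), 'brin_idx');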

> Another point: to desummarize a range, just unlinking the entry from
> revmap should suffice, from the POV of other index scanners.  Maybe we
> can simplify the whole procedure to: lock revmap, remove entry, remember
> page number, unlock revmap; lock regular page, delete entry, unlock.
> Then there are no two locks held at the same time during desummarize.
> 

Perhaps, as long as it doesn't confuse anything else.

> This comes from v16:
> 

I don't follow - what do you mean by v16? I don't see anything like that
anywhere in the repository.

> + Locking considerations
> + --
> + 
> + To read the TID during an index scan, we follow this protocol:
> + 
> + * read revmap page
> + * obtain share lock on the revmap buffer
> + * read the TID
> + * obtain share lock on buffer of main fork
> + * LockTuple the TID (using the index as relation).  A shared lock is
> +   sufficient.  We need the LockTuple to prevent VACUUM from recycling
> +   the index tuple; see below.
> + * release revmap buffer lock
> + * read the index tuple
> + * release the tuple lock
> + * release main fork buffer lock
> + 
> + 
> + To update the summary tuple for a page range, we use this protocol:
> + 
> + * insert a new index tuple somewhere in the main fork; note its TID
> + * read revmap page
> + * obtain exclusive lock on revmap buffer
> + * write the TID
> + * release lock
> + 
> + This ensures no concurrent reader can obtain a partially-written TID.
> + Note we don't need a tuple lock here.  Concurrent scans don't have to
> + worry about whether they got the old or new index tuple: if they get the
> + old one, the tighter values are okay from a correctness standpoint
> + because, due to MVCC, they can't possibly see the just-inserted heap
> + tuples anyway.
> +
> + [vacuum stuff elided]
> 

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



