Re: pgsql: Doc: Explain about Column List feature.

2022-12-20 Thread Alvaro Herrera
On 2022-Dec-21, Peter Smith wrote:

> By "searching" I also meant just scanning visually, although I was
> thinking more about scanning the PDF.
> 
> Right now, the intention of any text box is obvious at a glance
> because of those titles like "Caution", "Tip", "Note", "Warning".
> Sure, the HTML rendering also uses colours to convey the purpose, but
> in the PDF version [1] everything is black-and-white so apart from the
> title all boxes look the same. That's why I felt using non-standard
> box titles might be throwing away some of the meaning - e.g. the
> reader of the PDF will no longer know at a glance whether they are
> looking at a warning or a tip.

Oh, I see.  It's been so long since I last looked at the PDFs that I
failed to realize they don't use color.  I agree that would be a
problem.  Maybe we can change the title to include the word:

Warning: Combining Column Lists from Multiple Publications

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [PATCH] random_normal function

2022-12-20 Thread Fabien COELHO


Bonjour Michaël,


Overall, I think that there should be a clearer discussion and plan about
which random functionS postgres should provide to complement the standard
instead of going there… randomly:-)


So, what does the specification tell us about seeds, normal and random
functions?  A bunch of DBMSs implement RAND, sometimes RANDOM, SEED or
even NORMAL, from time to time using specific SQL keywords to do the
work.


I do not have the SQL standard, so I have no idea about what is in there.

From a typical use-case point of view, I'd say uniform, normal and
exponential would make sense for floats. I'm also okay with generating
uniform bytes pseudo-randomly.


I'd be more at ease adding simple functions rather than a special
heavy-on-keywords syntax, even if the latter is standard.
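
For what it's worth, the math involved is tiny. A minimal standalone sketch
(not the submitted patch; drand48() merely stands in for whatever uniform
PRNG the backend actually uses) of deriving a normal variate from two
uniform ones with the Box-Muller transform:

#include <float.h>
#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Box-Muller: two uniform variates in [0,1) -> one N(mean, stddev) variate */
static double
random_normal_sketch(double mean, double stddev)
{
	double	u1 = drand48();
	double	u2 = drand48();

	if (u1 <= 0.0)
		u1 = DBL_MIN;		/* avoid log(0) */

	return mean + stddev * sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}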



Note that SQLValueFunction made the addition of more returning data
types a bit more complicated (not much, still) than the new
COERCE_SQL_SYNTAX by going through a mapping function, so the
keyword/function mapping is straight-forward.


I'm unclear about why this paragraph is here.

--
Fabien.

Re: (non) translatable string splicing

2022-12-20 Thread Kyotaro Horiguchi
At Mon, 19 Dec 2022 16:14:04 -0500, Robert Haas  wrote 
in 
> On Fri, Dec 16, 2022 at 8:25 AM Justin Pryzby  wrote:
> > Due to incomplete translation, that allows some pretty fancy output,
> > like:
> > | You must identify the directory where the residen los binarios del 
> > clúster antiguo.
> 
> I can't see how that could be mejor. :-)

It was quite annoying but not untranslatable. But the "the" before
"residen" looks badly misplaced :p  It should be part of the
inner text ("residen los..").

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: (non) translatable string splicing

2022-12-20 Thread Kyotaro Horiguchi
At Mon, 19 Dec 2022 13:20:55 +0900, Michael Paquier  wrote 
in 
> On Fri, Dec 16, 2022 at 07:24:52AM -0600, Justin Pryzby wrote:
> > Due to incomplete translation, that allows some pretty fancy output,
> > like:
> > | You must identify the directory where the residen los binarios del 
> > clúster antiguo.
> > 
> > That commit also does this a couple times:
> > 
> > +_(" which is an index on \"%s.%s\""),

For this specific case I didn't find it difficult, since it is
compatible with "This is blah" in that context.

> Ugh.  Perhaps we could just simplify the wordings as of "index on
> blah", "index on OID %u", "TOAST table for blah" and "TOAST table for
> OID %u" with newlines after each item?

I'm fine with just removing " which ", but I don't understand the point
of the extra newlines.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Force streaming every change in logical decoding

2022-12-20 Thread Masahiko Sawada
On Tue, Dec 20, 2022 at 7:49 PM Amit Kapila  wrote:
>
> On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear hackers,
> >
> > > We have discussed three different ways to provide GUC for these
> > > features. (1) Have separate GUCs like force_server_stream_mode,
> > > force_server_serialize_mode, force_client_serialize_mode (we can use
> > > different names for these) for each of these; (2) Have two sets of
> > > GUCs for server and client. We can have logical_decoding_mode with
> > > values as 'stream' and 'serialize' for the server and then
> > > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > > like logical_replication_mode with values as 'server_stream',
> > > 'server_serialize', 'client_serialize'.
> >
> > I also agreed with adding new GUC parameters (and have already done so
> > partially in parallel apply [1]), and option 2 basically made sense to me.
> > But is it OK that we can choose "serialize" mode even if subscribers
> > require streaming?
> >
> > Currently, reorder buffer transactions are serialized on the publisher
> > only when there is no streamable transaction. So what happens if
> > logical_decoding_mode = "serialize" but the streaming option is on? If we
> > break the first rule and serialize changes on the publisher anyway, it may
> > not be suitable for testing normal operation.
> >
>
> I think the change will be streamed as soon as the next change is
> processed even if we serialize based on this option. See
> ReorderBufferProcessPartialChange. However, I see your point that when
> the streaming option is given, the value 'serialize' for this GUC may
> not make much sense.
>
> > Therefore, I came up with a variant of (2): logical_decoding_mode can be
> > "normal" or "immediate".
> >
> > "normal" is the default value, which is the same as current HEAD. Changes
> > are streamed or serialized when the buffered size exceeds
> > logical_decoding_work_mem.
> >
> > When users set it to "immediate", the walsender starts to stream or
> > serialize all changes. The choice depends on the subscription option.
> >
>
> The other possibility to achieve what you are saying is that we allow
> a minimum value of logical_decoding_work_mem as 0 which would mean
> stream or serialize each change depending on whether the streaming
> option is enabled. I think we normally don't allow a minimum value
> below a certain threshold for other *_work_mem parameters (like
> maintenance_work_mem, work_mem), so we have followed the same here.
> And, I think it makes sense from the user's perspective because below
> a certain threshold it will just add overhead by either writing small
> changes to the disk or by sending those over the network. However, it
> can be quite useful for testing/debugging. So, not sure, if we should
> restrict setting logical_decoding_work_mem below a certain threshold.
> What do you think?

I agree with (2), having separate GUCs for publisher side and
subscriber side. Also, on the publisher side, Amit's idea, controlling
the logical decoding behavior by changing logical_decoding_work_mem,
seems like a good idea.

But I'm not sure it's a good idea to lower the minimum value of
logical_decoding_work_mem to 0. I agree it's helpful for testing and
debugging, but setting logical_decoding_work_mem = 0 doesn't benefit
users at all; rather, it brings risks.

I prefer the idea Kuroda-san previously proposed; setting
logical_decoding_mode = 'immediate' means setting
logical_decoding_work_mem = 0. We might not need to have it as an enum
parameter since it has only two values, though.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Use get_call_result_type() more widely

2022-12-20 Thread Bharath Rupireddy
On Wed, Dec 21, 2022 at 6:44 AM Michael Paquier  wrote:
>
> I have applied v2-0001.

Thanks for taking care of this.

Seeing the impact that get_call_result_type() can have on functions
that may be called repeatedly, I couldn't resist sharing a patch
(attached) that adds a note of caution and documents another way to
build a TupleDesc, to help developers out there. Thoughts?
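
For reference, a minimal sketch (the column names and types below are made
up) of a function that returns one row through such a manually built,
blessed descriptor; heap_form_tuple() and HeapTupleGetDatum() do the rest:

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(my_report_row);

Datum
my_report_row(PG_FUNCTION_ARGS)
{
	TupleDesc	tupdesc;
	Datum		values[2];
	bool		nulls[2] = {false, false};
	HeapTuple	tuple;

	/* Build and bless the descriptor; it must match the SQL declaration. */
	tupdesc = CreateTemplateTupleDesc(2);
	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "id", INT4OID, -1, 0);
	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "label", TEXTOID, -1, 0);
	tupdesc = BlessTupleDesc(tupdesc);

	values[0] = Int32GetDatum(42);
	values[1] = CStringGetTextDatum("hello");

	tuple = heap_form_tuple(tupdesc, values, nulls);
	PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
}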

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 3ee0c4a402bcf2d25a7c544ddd60d3e9742b9048 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 21 Dec 2022 06:38:27 +
Subject: [PATCH v1] Add another way to build TupleDesc in documentation

---
 doc/src/sgml/xfunc.sgml | 44 +
 1 file changed, 44 insertions(+)

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index cf5810b3c1..8b9c718fdf 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -2839,6 +2839,35 @@ TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
  cannot accept type record.)
 
 
+
+ Another way to set up TupleDesc is to use the
+ following functions:
+
+TupleDesc CreateTemplateTupleDesc(int natts);
+void TupleDescInitEntry(TupleDesc desc,
+AttrNumber attributeNumber,
+const char *attributeName,
+Oid oidtypeid,
+int32 typmod,
+int attdim);
+TupleDesc BlessTupleDesc(TupleDesc tupdesc);
+
+For example:
+
+/*
+ * Construct a tuple descriptor for the result row.  The number of output
+ * columns, column names and types must match the function's pg_proc entry
+ * or function's definition.
+ */
+tupdesc = CreateTemplateTupleDesc(4);
+TupleDescInitEntry(tupdesc, (AttrNumber) 1, "foo", INT4OID, -1, 0);
+TupleDescInitEntry(tupdesc, (AttrNumber) 2, "bar", BOOLOID, -1, 0);
+TupleDescInitEntry(tupdesc, (AttrNumber) 3, "baz", TEXTOID, -1, 0);
+TupleDescInitEntry(tupdesc, (AttrNumber) 4, "qux", TIMESTAMPTZOID, -1, 0);
+tupdesc = BlessTupleDesc(tupdesc);
+
+
+
 
  
   get_call_result_type can resolve the actual type of a
@@ -2849,6 +2878,21 @@ TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
  
 
 
+
+ 
+get_call_result_type is relatively expensive
+as it has to look up system catalogs and do other work to set up the
+TupleDesc. While it is recommended
+in most cases, consider setting up the
+TupleDesc with
+CreateTemplateTupleDesc,
+TupleDescInitEntry and
+BlessTupleDesc as shown above if the function
+returns a set or is likely to be called repeatedly
+(for monitoring purposes, for instance).
+ 
+
+
 
  
   get_call_result_type has a sibling
-- 
2.34.1



Re: Force streaming every change in logical decoding

2022-12-20 Thread shveta malik
Going with 'logical_decoding_work_mem' seems a reasonable solution, but since
we would be mixing developer and production functionality in one GUC, there
is a slight risk that customers/DBAs may set it to 0, forget about it, and
thus hamper the system's performance. I have seen many such cases at my
previous org.

Adding a new developer parameter seems slightly safer, considering we already
have such a category in postgres. It could be along the same lines as
'force_parallel_mode'. It would be a purely developer GUC, and if we want to
extend something in the future to add/automate heavier test cases or any
other streaming-related dev option, we can extend the same parameter without
disturbing the production one (see force_parallel_mode=regress for
reference).

thanks
Shveta


On Wed, Dec 21, 2022 at 11:25 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear Amit,
>
> > The other possibility to achieve what you are saying is that we allow
> > a minimum value of logical_decoding_work_mem as 0 which would mean
> > stream or serialize each change depending on whether the streaming
> > option is enabled.
>
> I understood that logical_decoding_work_mem may double as a normal option and
> a developer option. I think yours is smarter because we can reduce the number
> of GUCs.
>
> > I think we normally don't allow a minimum value
> > below a certain threshold for other *_work_mem parameters (like
> > maintenance_work_mem, work_mem), so we have followed the same here.
> > And, I think it makes sense from the user's perspective because below
> > a certain threshold it will just add overhead by either writing small
> > changes to the disk or by sending those over the network. However, it
> > can be quite useful for testing/debugging. So, not sure, if we should
> > restrict setting logical_decoding_work_mem below a certain threshold.
> > What do you think?
>
> You mean to say that there is a possibility that users may set a small
> value without deep
> considerations, right? If so, how about using the approach like
> autovacuum_work_mem?
>
> autovacuum_work_mem has a range [-1, MAX_KILOBYTES], and -1 means that it
> follows
> maintenance_work_mem. If it is set small value like 5KB, its working
> memory is rounded
> up to 1024KB. See check_autovacuum_work_mem().
>
> Based on that, I suggest followings. Can they solve the problem what you
> said?
>
> * If logical_decoding_work_mem is set to 0, all transactions are streamed
> or serialized
>   on publisher.
> * If logical_decoding_work_mem is set within [1, 63KB], the value is
> rounded up or ERROR
>   is raised.
> * If logical_decoding_work_mem  is set greater than or equal to 64KB, the
> set value
>   is used.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
> > Amit Kapila.
>


Re: Small miscellaneus fixes (Part II)

2022-12-20 Thread Michael Paquier
On Tue, Dec 20, 2022 at 06:51:34PM -0600, Justin Pryzby wrote:
> On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> > 5. Use boolean operator with boolean operands
> > (b/src/backend/commands/tablecmds.c)
> 
> tablecmds.c: right.  Since 074c5cfbf

Most of this does not seem to be really worth poking at.

newcons = AddRelationNewConstraints(rel, NIL,
list_make1(copyObject(constr)),
-   recursing | is_readd,   /* allow_merge */
+   recursing || is_readd,  /* allow_merge */
There is no damage here, but that looks like a typo so no objections
on this one.
--
Michael




RE: Force streaming every change in logical decoding

2022-12-20 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> The other possibility to achieve what you are saying is that we allow
> a minimum value of logical_decoding_work_mem as 0 which would mean
> stream or serialize each change depending on whether the streaming
> option is enabled.

I understood that logical_decoding_work_mem may double as a normal option and
a developer option. I think yours is smarter because we can reduce the number
of GUCs.

> I think we normally don't allow a minimum value
> below a certain threshold for other *_work_mem parameters (like
> maintenance_work_mem, work_mem), so we have followed the same here.
> And, I think it makes sense from the user's perspective because below
> a certain threshold it will just add overhead by either writing small
> changes to the disk or by sending those over the network. However, it
> can be quite useful for testing/debugging. So, not sure, if we should
> restrict setting logical_decoding_work_mem below a certain threshold.
> What do you think?

You mean that there is a possibility that users may set a small value without
much consideration, right? If so, how about using an approach like
autovacuum_work_mem?

autovacuum_work_mem has a range of [-1, MAX_KILOBYTES], and -1 means that it
follows maintenance_work_mem. If it is set to a small value like 5kB, its
working memory is rounded up to 1024kB. See check_autovacuum_work_mem().

Based on that, I suggest the following (a rough check-hook sketch is below).
Can it solve the problem you described?

* If logical_decoding_work_mem is set to 0, all transactions are streamed or
  serialized on the publisher.
* If logical_decoding_work_mem is set within [1kB, 63kB], the value is
  rounded up or an ERROR is raised.
* If logical_decoding_work_mem is set to 64kB or greater, the set value is
  used.
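
To be concrete, a rough sketch of such a check hook (the function name is
only illustrative and not from any actual patch; real integration would go
through guc_tables.c):

#include "postgres.h"
#include "utils/guc.h"

/*
 * Treat 0 as the special "stream/serialize every change" developer setting,
 * and round other too-small settings up to the usual 64kB floor, in the
 * same spirit as check_autovacuum_work_mem().
 */
static bool
check_logical_decoding_work_mem(int *newval, void **extra, GucSource source)
{
	if (*newval == 0)
		return true;		/* testing/debugging mode */

	if (*newval < 64)		/* the GUC unit is kB */
		*newval = 64;		/* round up to the normal minimum */

	return true;
}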

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
> Amit Kapila.


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-20 Thread Bharath Rupireddy
On Wed, Dec 21, 2022 at 5:39 AM Michael Paquier  wrote:
>
> On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> > On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander  wrote:
> >> Caught this thread late. To me, pg_dissect_walfile_name() is a
> >> really strange name for a function. Grepping our code I see the
> >> term dissect is used somewhere inside the regex code and exactly
> >> zero instances elsewhere. Which is why I definitely didn't
> >> recognize the term...
> >>
> >> Wouldn't something like pg_split_walfile_name() be a lot more
> >> consistent with the rest of our names?
>
> Fine by me to change that if there is little support for the current
> naming, though the current one does not sound that bad to me either.
>
> > Hm. FWIW, here's the patch.
>
> "split" is used a lot for the picksplit functions, but not in any of
> the existing functions as a name.  Some extra options: parse, read,
> extract, calculate, deduce, get.  "parse" would be something I would
> be OK with.

"dissect", "split" and "parse" -  I'm okay with either of these.

I read somewhere a saying that goes: "the hardest part of
coding is naming variables and functions" :).

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add SHELL_EXIT_CODE to psql

2022-12-20 Thread Corey Huinker
I've rebased and updated the patch to include documentation.

Regression tests have been moved to a separate patchfile because error
messages will vary by OS and configuration, so we probably can't do a
stable regression test, but having them handy at least demonstrates the
feature.
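
As an aside, the raw values like 32512 and 1024 in the oldest sample output
further down are undecoded wait statuses from system() (127 << 8 and 4 << 8
respectively); the patch now runs the status through WEXITSTATUS. A tiny
standalone illustration of that decoding, separate from the patch itself:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int
main(void)
{
	int		status = system("exit 4");	/* raw status is 1024 here */

	if (WIFEXITED(status))
		printf("exit code: %d\n", WEXITSTATUS(status));	/* prints 4 */
	else if (WIFSIGNALED(status))
		printf("terminated by signal %d\n", WTERMSIG(status));
	return 0;
}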

On Sun, Dec 4, 2022 at 12:35 AM Corey Huinker 
wrote:

> Rebased. Still waiting on feedback before working on documentation.
>
> On Fri, Nov 4, 2022 at 5:23 AM Corey Huinker 
> wrote:
>
>> Oops, that sample output was from a previous run, should have been:
>>
>> -- SHELL_EXIT_CODE is undefined
>> \echo :SHELL_EXIT_CODE
>> :SHELL_EXIT_CODE
>> -- bad \!
>> \! borp
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 127
>> -- bad backtick
>> \set var `borp`
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 127
>> -- good \!
>> \! true
>> \echo :SHELL_EXIT_CODE
>> 0
>> -- play with exit codes
>> \! exit 4
>> \echo :SHELL_EXIT_CODE
>> 4
>> \set var `exit 3`
>> \echo :SHELL_EXIT_CODE
>> 3
>>
>>
>> On Fri, Nov 4, 2022 at 5:08 AM Corey Huinker 
>> wrote:
>>
>>>
>>> Over in
>>> https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
>>>  Justin
>>> Pryzby suggested that psql might need the ability to capture the shell exit
>>> code.
>>>
>>> This is a POC patch that does that, but doesn't touch on the
>>> ON_ERROR_STOP stuff.
>>>
>>> I've added some very rudimentary tests, but haven't touched the
>>> documentation, because I strongly suspect that someone will suggest a
>>> better name for the variable.
>>>
>>> But basically, it works like this
>>>
>>> -- SHELL_EXIT_CODE is undefined
>>> \echo :SHELL_EXIT_CODE
>>> :SHELL_EXIT_CODE
>>> -- bad \!
>>> \! borp
>>> sh: line 1: borp: command not found
>>> \echo :SHELL_EXIT_CODE
>>> 32512
>>> -- bad backtick
>>> \set var `borp`
>>> sh: line 1: borp: command not found
>>> \echo :SHELL_EXIT_CODE
>>> 127
>>> -- good \!
>>> \! true
>>> \echo :SHELL_EXIT_CODE
>>> 0
>>> -- play with exit codes
>>> \! exit 4
>>> \echo :SHELL_EXIT_CODE
>>> 1024
>>> \set var `exit 3`
>>> \echo :SHELL_EXIT_CODE
>>> 3
>>>
>>>
>>> Feedback welcome.
>>>
>>
From 443be6f64ca89bbd6367a011f2a98aa9324f27a9 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 21 Dec 2022 00:24:47 -0500
Subject: [PATCH 1/2] Make the exit code of shell commands executed via psql
 visible via the variable SHELL_EXIT_CODE.

---
 doc/src/sgml/ref/psql-ref.sgml |  9 +
 src/bin/psql/command.c |  4 
 src/bin/psql/help.c|  2 ++
 src/bin/psql/psqlscanslash.l   | 24 +---
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 8a5285da9a..d0c80b4528 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4260,6 +4260,15 @@ bar
 
   
 
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command. 0 means no error.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index de6a3a71f8..f6d6a489a9 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4998,6 +4998,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5025,6 +5026,9 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index b4e0ec2687..caf13e2ed2 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -455,6 +455,8 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index a467b72144..ecbc3b9d8b 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -27,6 +27,8 @@
 
 %{
 #include "fe_utils/psqlscan_int.h"
+#include "settings.h"
+#include "variables.h"
 
 /*
  * We must have a typedef YYSTYPE for yylex's first argument, but this lexer
@@ -774,6 +776,8 @@ evaluate_backtick(PsqlScanState state)
 	bool		error = false;
 	char		buf[512];
 	size_t		result;
+	int			exit_code = 0;
+	char		exit_code_buf[32];
 
 	initPQExpBuffer(_output);
 
@@ -783,6 +787,7 @@ evaluate_backtick(PsqlScanState state)
 	{
 		pg_log_error("%s: %m", cmd);
 		error = true;
+		exit_code = -1;
 	}
 
 	if (!error)

Re: New strategies for freezing, advancing relfrozenxid early

2022-12-20 Thread Peter Geoghegan
On Tue, Dec 20, 2022 at 7:15 PM Peter Geoghegan  wrote:
> On Tue, Dec 20, 2022 at 5:44 PM Jeff Davis  wrote:
> > Next, the 'freeze_required' field suggests that it's more involved in
> > the control flow that causes freezing than it actually is. All it does
> > is communicate how the trackers need to be adjusted. The return value
> > of heap_prepare_freeze_tuple() (and underneath, the flags set by
> > FreezeMultiXactId()) are what actually control what happens. It would
> > be nice to make this more clear somehow.
>
> I'm not sure what you mean. Page-level freezing *doesn't* have to go
> ahead when freeze_required is not ever set to true for any tuple on
> the page (which is most of the time, in practice). lazy_scan_prune
> gets to make a choice about freezing the page, when the choice is
> available.

Oh wait, I think I see the point of confusion now.

When freeze_required is set to true, that means that lazy_scan_prune
literally has no choice -- it simply must freeze the page as
instructed by heap_prepare_freeze_tuple/FreezeMultiXactId. It's not
just a strong suggestion -- it's crucial that lazy_scan_prune freezes
the page as instructed.

The "no freeze" trackers (HeapPageFreeze.NoFreezePageRelfrozenXid and
HeapPageFreeze.NoFreezePageRelminMxid) won't have been maintained
properly when freeze_required was set, so lazy_scan_prune can't expect
to use them -- doing so would lead to VACUUM setting incorrect values
in pg_class later on.

Avoiding the work of maintaining those "no freeze" trackers isn't just
a nice-to-have microoptimization -- it is sometimes very important. We
kind of rely on this to be able to avoid getting too many MultiXact
member SLRU buffer misses inside FreezeMultiXactId. There is a comment
above FreezeMultiXactId that advises its caller that it had better not
call heap_tuple_should_freeze when freeze_required is set to true,
because that could easily lead to multixact member SLRU buffer misses
-- misses that FreezeMultiXactId set out to avoid itself.

It could actually be cheaper to freeze than to not freeze, in the case
of a Multi -- member space misses can sometimes be really expensive.
And so FreezeMultiXactId sometimes freezes a Multi even though it's
not strictly required to do so.

Note also that this isn't a new behavior -- it's actually an old one,
for the most part. It kinda doesn't look that way, because we haven't
passed down separate FreezeLimit/OldestXmin cutoffs (and separate
OldestMxact/MultiXactCutoff cutoffs) until now. But we often don't
need that granular information to be able to process Multis before the
multi value is < MultiXactCutoff.

If you look at how FreezeMultiXactId works, in detail, you'll see that
even on Postgres HEAD it can (say) set a tuple's xmax to
InvalidTransactionId long before the multi value is < MultiXactCutoff.
It just needs to detect that the multi is not still running, and
notice that it's HEAP_XMAX_IS_LOCKED_ONLY(). Stuff like that happens
quite a bit. So for the most part "eager processing of Multis as a
special case" is an old behavior, that has only been enhanced a little
bit (the really important, new change in FreezeMultiXactId is how the
FRM_NOOP case works with FreezeLimit, even though OldestXmin is used
nearby -- this is extra confusing because 0002 doesn't change how we
use FreezeLimit -- it actually changes every other use of FreezeLimit
nearby, making it OldestXmin).

-- 
Peter Geoghegan




Re: Simplifications for error messages related to compression

2022-12-20 Thread Justin Pryzby
On Wed, Dec 21, 2022 at 01:52:21PM +0900, Michael Paquier wrote:
> On Tue, Dec 20, 2022 at 08:29:32PM -0600, Justin Pryzby wrote:
> > -   pg_fatal("not built with zlib support");
> > +   pg_fatal("this build does not support compression 
> > with %s", "gzip");
> > 
> > I tried to say in the other thread that gzip != zlib.
> >
> > This message may be better for translation, but (for WriteDataToArchive
> > et al) the message is now less accurate, and I suspect will cause some
> > confusion.
> 
> Compression specifications use this term, so, like bbstreamer_gzip.c,

Yes, and its current users (basebackup) output a gzip file, right ?

pg_dump -Fc doesn't output a gzip file, but now it's using user-facing
compression specifications referring to it as "gzip".

> that does not sound like a big difference to me as everything depends
> on HAVE_LIBZ, still we use gzip for all the user-facing terms.

postgres is using -lz to write both gzip files and non-gzip libz files;
its associated compile-time constant has nothing to do with which header
format is being output.

* This file includes two APIs for dealing with compressed data. The first
* provides more flexibility, using callbacks to read/write data from the
* underlying stream. The second API is a wrapper around fopen/gzopen and
* friends, providing an interface similar to those, but abstracts away
* the possible compression. Both APIs use libz for the compression, but
* the second API uses gzip headers, so the resulting files can be easily
* manipulated with the gzip utility.
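
(To make the distinction concrete: gzip streams begin with the magic bytes
0x1f 0x8b, while pg_dump's custom format begins with the literal "PGDMP", as
the xxd output elsewhere in this thread shows. A few lines of C, unrelated
to the patch, are enough to tell them apart:)

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
	unsigned char buf[5] = {0};
	size_t		nread;
	FILE	   *fp;

	if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)
		return 1;
	nread = fread(buf, 1, sizeof(buf), fp);
	fclose(fp);
	if (nread < sizeof(buf))
		return 1;

	if (buf[0] == 0x1f && buf[1] == 0x8b)
		puts("gzip stream");
	else if (memcmp(buf, "PGDMP", 5) == 0)
		puts("pg_dump custom-format archive");
	else
		puts("something else");
	return 0;
}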

> > 5e73a6048 introduced a similar user-facing issue: pg_dump -Fc -Z gzip
> > does not output a gzip.
> 
> We've never mentioned any compression method in the past docs, just
> that things can be compressed.

What do you mean ?

The commit added:
+The compression method can be set to gzip or ...

And the docs still say:

-For plain text output, setting a nonzero compression level causes
-the entire output file to be compressed, as though it had been
-fed through gzip; but the default is not to 
compress.

If you tell someone they can write -Z gzip, they'll be justified in
expecting to see "gzip" as output.

-- 
Justin




Re: assertion failures on BuildFarm that happened in slab.c

2022-12-20 Thread David Rowley
On Wed, 21 Dec 2022 at 18:04, Michael Paquier  wrote:
> Good catch.  Yes, d21ded7 looks to have issues.  I am adding David in
> CC.

Thanks. I'll look now.

David




Re: assertion failures on BuildFarm that happened in slab.c

2022-12-20 Thread Michael Paquier
On Wed, Dec 21, 2022 at 03:28:01AM +, Takamichi Osumi (Fujitsu) wrote:
> The last failure for subscriptionCheck before those for HEAD happened 66 days 
> ago [3]
> and I checked all failures there within 90 days. There is no similar failure.
> 
> I'm not sure, but this could be related to a recent commit (d21ded75fd) in 
> [4].
> 
> [1] - 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=butterflyfish=2022-12-20%2013%3A00%3A19
> [2] - 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2022-12-20%2010%3A34%3A39
> [3] - 
> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=90=subscriptionCheck=Submit
> [4] - 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d21ded75fdbc18d68be6e6c172f0f842c50e9263

Good catch.  Yes, d21ded7 looks to have issues.  I am adding David in
CC.
--
Michael




Re: Refactor SCRAM code to dynamically handle hash type and key length

2022-12-20 Thread Michael Paquier
On Tue, Dec 20, 2022 at 08:45:29PM -0500, Jonathan S. Katz wrote:
> On 12/20/22 2:25 AM, Michael Paquier wrote:
>> 4) The case of MD5 is something that looks a bit tricky at quick
>> glance.  We know that if the role has a MD5 password stored, we will
>> fail anyway.  So we could just advertise the SHA-256 mechanisms in
>> this case and map the mock to that?
> 
> Is this the case where we specify "md5" as the auth method but the
> user-password is stored in SCRAM?

Yes.  A port storing uaMD5 with a SCRAM password makes the backend use
SASL for the whole exchange.  At quick glance, we could fallback to
look at the password of the user sent by the startup packet and
advertise the mechanisms based on that because we know that one user
=> one password currently.  I'd need to double-check on the RFCs to
see if there is anything specific here to worry about.  The recent
ones being worked on may tell more.

>> 5) The mechanism choice in libpq needs to be reworked a bit based on
>> what the backend sends.  There may be no point in advertising all the
>> SHA-256 and SHA-512 mechanisms at the same time, I guess.
> 
> Yeah, I think a user may want to select which ones they want to use (e.g.
> they may not want to advertise SHA-256).

Yep, they should be able to do so.  libpq should select the strongest
one if the server sends all of them, but things like
https://commitfest.postgresql.org/41/3716/ should be able to enforce
one over the other.
--
Michael




Re: Simplifications for error messages related to compression

2022-12-20 Thread Michael Paquier
On Tue, Dec 20, 2022 at 08:29:32PM -0600, Justin Pryzby wrote:
> -   pg_fatal("not built with zlib support");
> +   pg_fatal("this build does not support compression 
> with %s", "gzip");
> 
> I tried to say in the other thread that gzip != zlib.
>
> This message may be better for translation, but (for WriteDataToArchive
> et al) the message is now less accurate, and I suspect will cause some
> confusion.

Compression specifications use this term, so, like bbstreamer_gzip.c,
that does not sound like a big difference to me as everything depends
on HAVE_LIBZ, still we use gzip for all the user-facing terms.

> 5e73a6048 introduced a similar user-facing issue: pg_dump -Fc -Z gzip
> does not output a gzip.

We've never mentioned any compression method in the past docs, just
that things can be compressed.

> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Fc -Z gzip regression |xxd |head
> 00000000: 5047 444d 5001 0e00 0408 0101 0100   PGDMP...
> 
> I'm okay with it if you think this is no problem - maybe it's enough to
> document that the output is zlib and not gzip.

Perhaps.

> Otherwise, one idea was to reject "gzip" with -Fc.  It could accept
> integers only.

I am not sure what we would gain by doing that, except complications
with the code surrounding the handling of compression specifications,
which is a backward-compatible thing as it can handle integer-only
inputs.

> BTW I think it's helpful to include the existing participants when
> forking a thread.

Err, okay..  Sorry about that.
--
Michael




Optimization issue of branching UNION ALL

2022-12-20 Thread Andrey Lepikhov

Hi,

A client report on a corner case has shown up a possible minor
non-optimality in the transformation of a simple UNION ALL statement tree.
The complaint is about an auto-generated query with 1E4 simple UNION ALLs
(see t.sh, which generates a demo script). The reason: in REL_11_STABLE it
is planned and executed in a second, but REL_12_STABLE and beyond make
matters worse: planning such a query needs many gigabytes of RAM.


A superficial study revealed possibly unnecessary operations that could be
avoided:
1. Walking across the query by calling substitute_phv_relids() even if
lastPHId shows that no PHV is present.
2. Iterative passes along the append_rel_list to replace Vars in the
translated_vars field. I can't see a real necessity for passing the whole
append_rel_list while flattening a UNION ALL leaf subquery. No one can
reference this leaf, can they?


The attached sketch reduces the number of planner cycles/copies.


--
Regards
Andrey Lepikhov
Postgres Professional

t.sh
Description: application/shellscript
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 08a73fb9d86..3739e3fe7ba 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -131,7 +131,7 @@ static bool find_dependent_phvs_in_jointree(PlannerInfo *root,
 			Node *node, int varno);
 static void substitute_phv_relids(Node *node,
   int varno, Relids subrelids);
-static void fix_append_rel_relids(List *append_rel_list, int varno,
+static void fix_append_rel_relids(PlannerInfo *root, int varno,
   Relids subrelids);
 static Node *find_jointree_node_for_rel(Node *jtnode, int relid);
 
@@ -1156,8 +1156,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 		Relids		subrelids;
 
 		subrelids = get_relids_in_jointree((Node *) subquery->jointree, false);
-		substitute_phv_relids((Node *) parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) parse, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -2064,17 +2065,25 @@ perform_pullup_replace_vars(PlannerInfo *root,
 	 * use PHVs for safety.  (This analysis could be made tighter but it seems
 	 * unlikely to be worth much trouble.)
 	 */
-	foreach(lc, root->append_rel_list)
+	if (containing_appendrel)
 	{
-		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
-		bool		save_need_phvs = rvcontext->need_phvs;
+		bool save_need_phvs = rvcontext->need_phvs;
 
-		if (appinfo == containing_appendrel)
-			rvcontext->need_phvs = false;
-		appinfo->translated_vars = (List *)
-			pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		rvcontext->need_phvs = false;
+		containing_appendrel->translated_vars = (List *)
+			pullup_replace_vars((Node *) containing_appendrel->translated_vars,
+rvcontext);
 		rvcontext->need_phvs = save_need_phvs;
 	}
+	else
+	{
+		foreach(lc, root->append_rel_list)
+		{
+			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
+			appinfo->translated_vars = (List *)
+pullup_replace_vars((Node *) appinfo->translated_vars, rvcontext);
+		}
+	}
 
 	/*
 	 * Replace references in the joinaliasvars lists of join RTEs.
@@ -3273,7 +3282,7 @@ remove_result_refs(PlannerInfo *root, int varno, Node *newjtloc)
 		subrelids = get_relids_in_jointree(newjtloc, false);
 		Assert(!bms_is_empty(subrelids));
 		substitute_phv_relids((Node *) root->parse, varno, subrelids);
-		fix_append_rel_relids(root->append_rel_list, varno, subrelids);
+		fix_append_rel_relids(root, varno, subrelids);
 	}
 
 	/*
@@ -3492,7 +3501,7 @@ substitute_phv_relids(Node *node, int varno, Relids subrelids)
  * We assume we may modify the AppendRelInfo nodes in-place.
  */
 static void
-fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
+fix_append_rel_relids(PlannerInfo *root, int varno, Relids subrelids)
 {
 	ListCell   *l;
 	int			subvarno = -1;
@@ -3503,7 +3512,7 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 	 * AppendRelInfo nodes refer to it.  So compute it on first use. Note that
 	 * bms_singleton_member will complain if set is not singleton.
 	 */
-	foreach(l, append_rel_list)
+	foreach(l, root->append_rel_list)
 	{
 		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
 
@@ -3518,8 +3527,9 @@ fix_append_rel_relids(List *append_rel_list, int varno, Relids subrelids)
 		}
 
 		/* Also fix up any PHVs in its translated vars */
-		substitute_phv_relids((Node *) appinfo->translated_vars,
-			  varno, subrelids);
+		if (root->glob->lastPHId != 0)
+			substitute_phv_relids((Node *) appinfo->translated_vars,
+  varno, subrelids);
 	}
 }
 


assertion failures on BuildFarm that happened in slab.c

2022-12-20 Thread Takamichi Osumi (Fujitsu)
Hi, hackers


I've found two assertion failures on BuildFarm [1][2].
The call stack can be found in [2].

TRAP: failed Assert("dlist_is_empty(blocklist)"), File: "slab.c", Line: 564, PID: 16148
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(ExceptionalCondition+0x54)[0x983564]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(SlabAlloc+0x50d)[0x9b554d]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(MemoryContextAlloc+0x4c)[0x9b2b2c]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(ReorderBufferGetChange+0x15)[0x7d62d5]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(heap_decode+0x4d1)[0x7cb071]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(LogicalDecodingProcessRecord+0x63)[0x7ca033]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION[0x7f1962]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION[0x7f42e2]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(exec_replication_command+0xa88)[0x7f4ec8]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(PostgresMain+0x524)[0x847a44]
2022-12-20 13:18:39.725 UTC [16167:4] 015_stream.pl LOG:  disconnection: session time: 0:00:00.019 user=postgres database=postgres host=[local]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION[0x7b57e6]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(PostmasterMain+0xe27)[0x7b6757]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(main+0x683)[0x4c74e3]
/lib/libc.so.6(__libc_start_main+0xf0)[0x7f5179a8d070]
postgres: publisher: walsender postgres postgres [local] START_REPLICATION(_start+0x2a)[0x4c75aa]
2022-12-20 13:18:39.795 UTC [16107:4] LOG:  server process (PID 16148) was terminated by signal 6: Aborted


The last subscriptionCheck failure on HEAD before these happened 66 days
ago [3], and I checked all failures there within the past 90 days; there is
no similar failure.

I'm not sure, but this could be related to a recent commit (d21ded75fd) [4].

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=butterflyfish=2022-12-20%2013%3A00%3A19
[2] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit=2022-12-20%2010%3A34%3A39
[3] - 
https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=90=subscriptionCheck=Submit
[4] - 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d21ded75fdbc18d68be6e6c172f0f842c50e9263



Best Regards,
Takamichi Osumi





Re: Array initialisation notation in syscache.c

2022-12-20 Thread Thomas Munro
On Wed, Dec 21, 2022 at 1:33 PM Thomas Munro  wrote:
> KEY(Anum_pg_attribute_attrelid,
> Anum_pg_attribute_attnum),

I independently rediscovered that our VA_ARGS_NARGS() macro in c.h
always returns 1 on MSVC via trial-by-CI.  Derp.  Here is the same
patch, no change from v2, but this time accompanied by Victor Spirin's
fix, which I found via one of the tab-completion-is-busted-on-Windows
discussions.  I can't supply a useful commit message, because I
haven't understood why it works, but it does indeed seem to work and
this should make cfbot green.
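
For anyone unfamiliar with the counting trick, here is a cut-down standalone
sketch (the constant list shortened to 4): the variadic arguments shift the
list so that the right count lands in the N slot. Presumably it is this
re-expansion step that MSVC's traditional preprocessor mishandles without
the EXPAND() workaround in the first patch.

#include <stdio.h>

#define VA_ARGS_NARGS(...) \
	VA_ARGS_NARGS_(__VA_ARGS__, 4, 3, 2, 1, 0)
#define VA_ARGS_NARGS_(_1, _2, _3, _4, N, ...) N

int
main(void)
{
	printf("%d\n", VA_ARGS_NARGS(a));			/* 1 */
	printf("%d\n", VA_ARGS_NARGS(a, b, c));		/* 3 */
	printf("%d\n", VA_ARGS_NARGS(a, b, c, d));	/* 4 */
	return 0;
}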
From a875324d9aca79cfbc46924354dcb4631d418dff Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 21 Dec 2022 15:36:46 +1300
Subject: [PATCH v3 1/2] Fix VA_ARGS_NARGS() macro on MSVC.

The previous coding of VA_ARGS_NARGS() always returned 1 on MSVC.
XXX Explain?

Author: Victor Spirin 
Discussion: https://postgr.es/m/f450fc57-a147-19d0-e50c-33571c52cc13%40postgrespro.ru
---
 src/include/c.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index bd6d8e5bf5..06f49e7592 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -325,6 +325,19 @@
  * the call so that that is the appropriate one of the list of constants.
  * This idea is due to Laurent Deniau.
  */
+#ifdef _MSC_VER
+#define EXPAND(args) args
+#define VA_ARGS_NARGS(...) \
+	VA_ARGS_NARGS_ EXPAND((__VA_ARGS__, \
+   63,62,61,60,   \
+   59,58,57,56,55,54,53,52,51,50, \
+   49,48,47,46,45,44,43,42,41,40, \
+   39,38,37,36,35,34,33,32,31,30, \
+   29,28,27,26,25,24,23,22,21,20, \
+   19,18,17,16,15,14,13,12,11,10, \
+   9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
+#else
+
 #define VA_ARGS_NARGS(...) \
 	VA_ARGS_NARGS_(__VA_ARGS__, \
    63,62,61,60,   \
@@ -334,6 +347,8 @@
    29,28,27,26,25,24,23,22,21,20, \
    19,18,17,16,15,14,13,12,11,10, \
    9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+#endif
+
 #define VA_ARGS_NARGS_( \
 	_01,_02,_03,_04,_05,_06,_07,_08,_09,_10, \
 	_11,_12,_13,_14,_15,_16,_17,_18,_19,_20, \
-- 
2.38.1

From fe6420c519a6351a1a53b5e7e1f44418985c5609 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 20 Dec 2022 15:47:59 +1300
Subject: [PATCH v3 2/2] Improve notation of cacheinfo table in syscache.c.

Use C99 designated initializer syntax for the array elements, instead of
writing the enumerator name and position  in a comment.  Replace nkeys
and key with a local variadic macro, for a shorter notation.
---
 src/backend/utils/cache/syscache.c | 943 +
 1 file changed, 293 insertions(+), 650 deletions(-)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 5f17047047..99513aa336 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -89,8 +89,7 @@
 
 	Add your entry to the cacheinfo[] array below. All cache lists are
 	alphabetical, so add it in the proper place.  Specify the relation OID,
-	index OID, number of keys, key attribute numbers, and initial number of
-	hash buckets.
+	index OID, key attribute numbers, and initial number of hash buckets.
 
 	The number of hash buckets must be a power of 2.  It's reasonable to
 	set this to the number of entries that might be in the particular cache
@@ -122,920 +121,558 @@ struct cachedesc
 	int			nbuckets;		/* number of hash buckets for this cache */
 };
 
+/* Macro to provide nkeys and key array with convenient syntax. */
+#define KEY(...) VA_ARGS_NARGS(__VA_ARGS__), { __VA_ARGS__ }
+
 static const struct cachedesc cacheinfo[] = {
-	{AggregateRelationId,		/* AGGFNOID */
+	[AGGFNOID] = {
+		AggregateRelationId,
 		AggregateFnoidIndexId,
-		1,
-		{
-			Anum_pg_aggregate_aggfnoid,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_aggregate_aggfnoid),
 		16
 	},
-	{AccessMethodRelationId,	/* AMNAME */
+	[AMNAME] = {
+		AccessMethodRelationId,
 		AmNameIndexId,
-		1,
-		{
-			Anum_pg_am_amname,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_am_amname),
 		4
 	},
-	{AccessMethodRelationId,	/* AMOID */
+	[AMOID] = {
+		AccessMethodRelationId,
 		AmOidIndexId,
-		1,
-		{
-			Anum_pg_am_oid,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_am_oid),
 		4
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPOPID */
+	[AMOPOPID] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodOperatorIndexId,
-		3,
-		{
-			Anum_pg_amop_amopopr,
+		KEY(Anum_pg_amop_amopopr,
 			Anum_pg_amop_amoppurpose,
-			Anum_pg_amop_amopfamily,
-			0
-		},
+			Anum_pg_amop_amopfamily),
 		64
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPSTRATEGY */
+	[AMOPSTRATEGY] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodStrategyIndexId,
-		4,
-		{
-			Anum_pg_amop_amopfamily,
+		KEY(Anum_pg_amop_amopfamily,
 			Anum_pg_amop_amoplefttype,
 			Anum_pg_amop_amoprighttype,
-			Anum_pg_amop_amopstrategy
-		},
+			Anum_pg_amop_amopstrategy),
 		64
 	},
-	{AccessMethodProcedureRelationId,	/* AMPROCNUM */
+	[AMPROCNUM] = {
+		AccessMethodProcedureRelationId,
 		

Re: New strategies for freezing, advancing relfrozenxid early

2022-12-20 Thread Peter Geoghegan
On Tue, Dec 20, 2022 at 5:44 PM Jeff Davis  wrote:
> Comments on 0002:
>
> Can you explain the following portion of the diff:
>
>
>   - else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff))
>   + else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
>
>   ...
>
>   + /* Can't violate the MultiXactCutoff invariant, either */
>   + if (!need_replace)
>   + need_replace = MultiXactIdPrecedes(
>   +multi, cutoffs->MultiXactCutoff);

Don't forget the historic context: before Postgres 15's commit
0b018fab, VACUUM's final relfrozenxid always came from FreezeLimit.
Almost all of this code predates that work. So the general idea that
you can make a "should I freeze or should I ratchet back my
relfrozenxid tracker instead?" trade-off at the level of individual
tuples and pages is still a very new one. Right now it's only applied
within lazy_scan_noprune(), but 0002 leverages the same principles
here.

Before now, these heapam.c freezing routines had cutoffs called
cutoff_xid and cutoff_multi. These had values that actually came from
vacuumlazy.c's FreezeLimit and MultiXactCutoff cutoffs (which was
rather unclear). But cutoff_xid and cutoff_multi were *also* used as
inexact proxies for OldestXmin and OldestMxact (also kind of unclear,
but true). For example, there are some sanity checks in heapam.c that
kind of pretend that cutoff_xid is OldestXmin, even though it usually
isn't the same value (it can be, but only during VACUUM FREEZE, or
when the min freeze age is 0 in some other way).

So 0002 teaches the same heapam.c code about everything -- about all
of the different cutoffs, and about the true requirements of VACUUM
around relfrozenxid advancement. In fact, 0002 makes vacuumlazy.c cede
a lot of control of "XID stuff" to the same heapam.c code, freezing it
up to think about freezing as something that works at the level of
physical pages. This is key to allowing vacuumlazy.c to reason about
freezing at the level of the whole table. It thinks about physical
blocks, leaving logical XIDs up to heapam.c code.

This business that you asked about in FreezeMultiXactId() is needed so
that we can allow vacuumlazy.c to "think in terms of physical pages",
while at the same time avoiding allocating new Multis in VACUUM --
which requires "thinking about individual xmax fields" instead -- a
somewhat conflicting goal. We're really trying to have it both ways
(we get page-level freezing, with a little tuple level freezing on the
side, sufficient to to avoid allocating new Multis during VACUUMs in
roughly the same way as we do right now).

In most cases "freezing a page" removes all XIDs < OldestXmin, and all
MXIDs < OldestMxact. It doesn't quite work that way in certain rare
cases involving MultiXacts, though. It is convenient to define "freeze
the page" in a way that gives heapam.c's FreezeMultiXactId() the
leeway to put off the work of processing an individual tuple's xmax,
whenever it happens to be a MultiXactId that would require an
expensive second pass to process aggressively (allocating a new Multi
during VACUUM is especially worth avoiding here).

Our definition of "freeze the page" is a bit creative, at least if
you're used to thinking about it in terms of strict XID-wise cutoffs
like OldestXmin/FreezeLimit. But even if you do think of it in terms
of XIDs, the practical difference is extremely small in practice.

FreezeMultiXactId() effectively makes a decision on how to proceed
with processing at the level of each individual xmax field.  Its no-op
multi processing "freezes" an xmax in the event of a costly-to-process
xmax on a page when (for whatever reason) page-level freezing is
triggered. If, on the other hand, page-level freezing isn't triggered
for the page, then page-level no-op processing takes care of the multi
for us instead. Either way, the remaining Multi will ratchet back
VACUUM's relfrozenxid and/or relminmxid trackers as required, and we
won't need an expensive second pass over the multi (unless we really
have no choice, for example during a VACUUM FREEZE, where
OldestXmin==FreezeLimit).

> Regarding correctness, it seems like the basic structure and invariants
> are the same, and it builds on the changes already in 9e5405993c. Patch
> 0002 seems *mostly* about making choices within the existing framework.
> That gives me more confidence.

You're right that it's the same basic invariants as before, of course.
Turns out that those invariants can be pushed quite far.

Though note that I kind of invented a new invariant (not really, sort
of). Well, it's a postcondition, which is a sort of invariant: any
scanned heap page that can be cleanup locked must never have any
remaining XIDs < FreezeLimit, nor can any MXIDs < MultiXactCutoff
remain. But a cleanup-locked page does *not* need to get rid of all
XIDs < OldestXmin, nor MXIDs < OldestMxact.

This flexibility is mostly useful because it allows lazy_scan_prune to
just decide to not freeze. But, to a much lesser degree, it's useful

Re: Simplifications for error messages related to compression

2022-12-20 Thread Justin Pryzby
On Wed, Dec 21, 2022 at 10:43:19AM +0900, Michael Paquier wrote:
> On Mon, Dec 19, 2022 at 02:42:13PM +0900, Michael Paquier wrote:
> > Thoughts or objections?
> 
> Hearing nothing, done..

-   pg_fatal("not built with zlib support");
+   pg_fatal("this build does not support compression with 
%s", "gzip");

I tried to say in the other thread that gzip != zlib.

This message may be better for translation, but (for WriteDataToArchive
et al) the message is now less accurate, and I suspect will cause some
confusion.

5e73a6048 introduced a similar user-facing issue: pg_dump -Fc -Z gzip
does not output a gzip.

$ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Fc -Z gzip regression |xxd |head
00000000: 5047 444d 5001 0e00 0408 0101 0100   PGDMP...

I'm okay with it if you think this is no problem - maybe it's enough to
document that the output is zlib and not gzip.

Otherwise, one idea was to reject "gzip" with -Fc.  It could accept
integers only.

BTW I think it's helpful to include the existing participants when
forking a thread.

-- 
Justin




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-20 Thread Peter Smith
On Tue, Dec 20, 2022 at 5:22 PM Peter Smith  wrote:
>
> On Tue, Dec 20, 2022 at 2:20 PM Amit Kapila  wrote:
> >
> > On Tue, Dec 20, 2022 at 8:17 AM Peter Smith  wrote:
> > >
> > > Summary
> > > ---
> > >
> > > In summary, everything I have tested so far appeared to be working
> > > properly. In other words, for overlapping streamed transactions of
> > > different kinds, and regardless of whether zero/some/all of those
> > > transactions are getting processed by a PA worker, the resulting
> > > replicated data looked consistently OK.
> > >
> >
> > Thanks for doing the detailed testing of this patch. I think the one
> > area where we can focus more is the switch-to-serialization mode while
> > sending changes to the parallel worker.
> >
> > >
> > > NOTE - all testing described in this post above was using v58-0001
> > > only. However, the point of implementing these as a .spec test was to
> > > be able to repeat these same regression tests on newer versions with
> > > minimal manual steps required. Later I plan to fetch/apply the most
> > > recent patch version and repeat these same tests.
> > >
> >
> > That would be really helpful.
> >
>

FYI, my pub-sub.spec tests gave the same result (i.e. pass) when
re-run with the latest v63 (0001,0002,0003) applied.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add function to_oct

2022-12-20 Thread Ian Lawrence Barwick
On Wed, Dec 21, 2022 at 10:42, Eric Radman wrote:
>
> On Wed, Dec 21, 2022 at 08:36:40AM +0900, Ian Lawrence Barwick wrote:
> > On Wed, Dec 21, 2022 at 7:08, Eric Radman wrote:
> > > Hello!
> > >
> > > This patch is a new function based on the implementation of to_hex(int).
> > >
> > > Since support for octal integer literals was added, to_oct(int) allows
> > > octal values to be easily stored and returned in query results.
> > >
> > >   to_oct(0o755) = '755'
> > >
> > > This is probably most useful for storing file system permissions.
> >
> > Seems like it would be convenient to have. Any reason why there's
> > no matching "to_oct(bigint)" version?
>
> I couldn't think of a reason someone might want an octal
> representation of a bigint.  Certainly it would be easy to add
> if there is value in supporting all of the same argument types.

Yeah, I am struggling to think of a practical application other than
symmetry with to_hex().
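
Either way the function body is tiny; a rough sketch (not the submitted
patch) along the lines of to_hex(int) might look like this:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(to_oct32_sketch);

/* Sketch only: format a 32-bit integer in base 8 and return it as text. */
Datum
to_oct32_sketch(PG_FUNCTION_ARGS)
{
	uint32		value = (uint32) PG_GETARG_INT32(0);
	char		buf[12];	/* 11 octal digits plus terminating NUL */

	snprintf(buf, sizeof(buf), "%o", value);
	PG_RETURN_TEXT_P(cstring_to_text(buf));
}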


Regards

Ian Barwick




Re: Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-20 Thread David Rowley
On Wed, 21 Dec 2022 at 13:15, Ranier Vilela  wrote:
> IMO, commit a61b1f7 has an oversight:
> it currently loses the result of the recursion in
> translate_col_privs_multilevel(),
> because the result variable (a Bitmapset pointer) is reassigned.
>
> I have no test case for this patch,
> but there is also no test case for the current thinko in head.

hmm, that code looks a bit suspect to me too.

Are you able to write a test that shows the bug which fails before
your change and passes after applying it? I don't think it's quite
enough to claim that your changes pass make check, given that it didn't
fail before your change.

Also, I think it might be better to take the opportunity to rewrite
the function to not use recursion. I don't quite see the need for it
here and it looks like that might have helped contribute to the
reported issue.  Can't we just write this as a while loop instead of
having the function call itself? It's not as if we need stack space
for keeping track of multiple parents. A child relation can only have
1 parent. It seems to me that we can just walk there by looping.
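
To illustrate the shape I mean with a generic, non-Postgres example: when
every node has at most one parent, the self-call collapses naturally into a
loop, with no stack needed.

struct rel
{
	struct rel *parent;		/* NULL for a top-level parent */
	int			cols;		/* whatever is being accumulated */
};

/* recursive form: walks to the top by calling itself */
static int
collect_recursive(const struct rel *r)
{
	if (r->parent == NULL)
		return r->cols;
	return r->cols | collect_recursive(r->parent);
}

/* equivalent loop: only ever one parent per child, so just walk up */
static int
collect_loop(const struct rel *r)
{
	int			cols = 0;

	for (; r != NULL; r = r->parent)
		cols |= r->cols;
	return cols;
}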

David




Re: Refactor SCRAM code to dynamically handle hash type and key length

2022-12-20 Thread Jonathan S. Katz

On 12/20/22 2:25 AM, Michael Paquier wrote:

> On Tue, Dec 20, 2022 at 08:58:38AM +0900, Michael Paquier wrote:

>> Thanks!  I have applied what I have here.  There are other pieces to
>> think about in this area.


> FYI, I have spent a few hours looking at the remaining parts of the
> SCRAM code that could be simplified if a new hash method is added, and
> this b3bb7d1 has really made things easier.


Great! Thanks for doing a quick "stress test" on this.


> There are a few things
> that will need more thought.  Here are my notes, assuming that
> SHA-512 is done:
> 1) HBA entries had better use a new keyword for scram-sha-512, implying
> a new uaSCRAM512 to combine with the existing uaSCRAM.  One reason
> behind that is to advertise the mechanisms supported back to the
> client depending on the matching HBA entry.


This does seem like a challenge, particularly if we have to support 
multiple different SCRAM hashes.


Perhaps this can be done with an interface change in HBA. For example, 
we could rename the auth method from "scram-sha-256" to "scram" and 
support an option list of hashes (e.g. "hash=sha-512,sha-256"). We can 
then advertise the user-selected hashes as part of the handshake.


For backwards compatibility, we can take an auth method of 
"scram-sha-256" to mean "scram" + using a sha-256 hash. Similarly, if no 
hash map is defined, we can default to "scram-sha-256".


Anyway, I understand this point would require more discussion, but 
perhaps it is a way to simplify the amount of code we would need to 
write to support more hashes.



> 2) If a role has a SCRAM-SHA-256 password and the HBA entry matches
> scram-sha-512, the SASL exchange needs to go through the mock process
> with SHA-512 and fail.
> 3) If a role has a SCRAM-SHA-512 password and the HBA entry matches
> scram-sha-256, the SASL exchange needs to go through the mock process
> with SHA-256 and fail.


*nods*


4) The case of MD5 is something that looks a bit tricky at quick
glance.  We know that if the role has a MD5 password stored, we will
fail anyway.  So we could just advertise the SHA-256 mechanisms in
this case and map the mock to that?


Is this the case where we specify "md5" as the auth method but the
user's password is stored in SCRAM?



5) The mechanism choice in libpq needs to be reworked a bit based on
what the backend sends.  There may be no point in advertising all the
SHA-256 and SHA-512 mechanisms at the same time, I guess.


Yeah, I think a user may want to select which ones they want to use 
(e.g. they may not want to advertise SHA-256).



Attached is a WIP patch that I have played with.  This shows the parts
of the code that would need more thoughts if implementing such
things.  This works for the cases 1~3 (see the TAP tests).  I've given
up on the MD5 case 4 for now, but perhaps I just missed a simple trick.
5 in libpq uses dirty tricks.  I have marked this CF entry as
committed, and I'll come back to each relevant part on new separate
threads.


Thanks for starting this.

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: New strategies for freezing, advancing relfrozenxid early

2022-12-20 Thread Jeff Davis
On Sun, 2022-12-18 at 14:20 -0800, Peter Geoghegan wrote:
> On Thu, Dec 15, 2022 at 10:53 AM Peter Geoghegan  wrote:
> > I agree that the burden of catch-up freezing is excessive here (in
> > fact I already wrote something to that effect on the wiki page).
> > The
> > likely solution can be simple enough.
> 
> Attached is v10, which fixes this issue, but using a different
> approach to the one I sketched here.

Comments on 0002:

Can you explain the following portion of the diff:


  - else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff))
  + else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))

  ...

  + /* Can't violate the MultiXactCutoff invariant, either */
  + if (!need_replace)
  + need_replace = MultiXactIdPrecedes(
  +multi, cutoffs->MultiXactCutoff);

Regarding correctness, it seems like the basic structure and invariants
are the same, and it builds on the changes already in 9e5405993c. Patch
0002 seems *mostly* about making choices within the existing framework.
That gives me more confidence.

That being said, it does push harder against the limits on both sides.
If I understand correctly, that means pages with wider distributions of
xids are going to persist longer, which could expose pre-existing bugs
in new and interesting ways.

Next, the 'freeze_required' field suggests that it's more involved in
the control flow that causes freezing than it actually is. All it does
is communicate how the trackers need to be adjusted. The return value
of heap_prepare_freeze_tuple() (and underneath, the flags set by
FreezeMultiXactId()) are what actually control what happens. It would
be nice to make this more clear somehow.

The comment:

  /*  
   * If we freeze xmax, make absolutely sure that it's not an XID that
   * is important.  (Note, a lock-only xmax can be removed independent
   * of committedness, since a committed lock holder has released the 
   * lock).   
   */

caused me to go down a rabbit hole looking for edge cases where we
might want to freeze an xmax but not an xmin; e.g. tup.xmax <
OldestXmin < tup.xmin or the related case where tup.xmax < RecentXmin <
tup.xmin. I didn't find a problem, so that's good news.

I also tried some pgbench activity along with concurrent vacuums (and
vacuum freezes) along with periodic verify_heapam(). No problems there.
 
Did you already describe the testing you've done for 0001+0002
specifically? It's not radically new logic, but it would be good to try
to catch minor state-handling errors.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Simplifications for error messages related to compression

2022-12-20 Thread Michael Paquier
On Mon, Dec 19, 2022 at 02:42:13PM +0900, Michael Paquier wrote:
> Thoughts or objections?

Hearing nothing, done..
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add function to_oct

2022-12-20 Thread Eric Radman
On Wed, Dec 21, 2022 at 08:36:40AM +0900, Ian Lawrence Barwick wrote:
> 2022年12月21日(水) 7:08 Eric Radman :>
> > Hello!
> >
> > This patch is a new function based on the implementation of to_hex(int).
> >
> > Since support for octal integer literals was added, to_oct(int) allows
> > octal values to be easily stored and returned in query results.
> >
> >   to_oct(0o755) = '755'
> >
> > This is probably most useful for storing file system permissions.
> 
> Seems like it would be convenient to have. Any reason why there's
> no matching "to_oct(bigint)" version?

I couldn't think of a reason someone might want an octal
representation of a bigint.  Certainly it would be easy to add
if there is value in supporting all of the same argument types.
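
For what it's worth, a bigint variant would just mirror to_hex64(). A
sketch (untested, and the pg_proc.dat and documentation entries are
omitted) could look like:

Datum
to_oct64(PG_FUNCTION_ARGS)
{
	uint64		value = (uint64) PG_GETARG_INT64(0);
	char	   *ptr;
	const char *digits = "01234567";
	char		buf[32];		/* 22 octal digits cover 64 bits */

	ptr = buf + sizeof(buf) - 1;
	*ptr = '\0';

	do
	{
		*--ptr = digits[value % OCTBASE];
		value /= OCTBASE;
	} while (ptr > buf && value);

	PG_RETURN_TEXT_P(cstring_to_text(ptr));
}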




Re: Use get_call_result_type() more widely

2022-12-20 Thread Michael Paquier
On Tue, Dec 20, 2022 at 11:12:09AM -0500, Tom Lane wrote:
> On the whole, I'm content to leave the BlessTupleDesc calls in
> these callers.  They are cheap enough if the tupdesc is already
> blessed.

Yeah, agreed.

I have applied v2-0001, after fixing one error in wparser.c where some
of the previous style was not removed, leading to unnecessary work and
the same TupleDesc being built twice for the two ts_token_type()'s
(input of OID or text).
--
Michael


signature.asc
Description: PGP signature


Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-20 Thread Peter Smith
FYI - applying v63-0001 using the latest master does not work.

git apply 
../patches_misc/v63-0001-Perform-streaming-logical-transactions-by-parall.patch
error: patch failed: src/backend/replication/logical/meson.build:1
error: src/backend/replication/logical/meson.build: patch does not apply

Looks like a recent commit [1] to add copyrights broke the patch

--
[1] 
https://github.com/postgres/postgres/commit/8284cf5f746f84303eda34d213e89c8439a83a42

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Small miscellaneus fixes (Part II)

2022-12-20 Thread Justin Pryzby
On Fri, Nov 25, 2022 at 06:27:04PM -0300, Ranier Vilela wrote:
> 5. Use boolean operator with boolean operands
> (b/src/backend/commands/tablecmds.c)

tablecmds.c: right.  Since 074c5cfbf

pg_dump.c: right.  Since b08dee24a

> 4. Fix dead code (src/backend/utils/adt/formatting.c)
> Np->sign == '+', is different than "!= '-'" and is different than "!= '+'"
> So the else is never hit.

formatting.c: I don't see the problem.

if (Np->sign != '-')
...
else if (Np->sign != '+' && IS_PLUS(Np->Num))
...

You said that the "else" is never hit, but why?
What if sign is 0?

-- 
Justin




Re: Array initialisation notation in syscache.c

2022-12-20 Thread Thomas Munro
On Wed, Dec 21, 2022 at 12:05 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > Do you think this is better?
>
> I'm not at all on board with adding runtime overhead to
> save maintaining the nkeys fields.

I don't see how to do it at compile time without getting the
preprocessor involved.  What do you think about this version?

[ATTNUM] = {
AttributeRelationId,
AttributeRelidNumIndexId,
KEY(Anum_pg_attribute_attrelid,
Anum_pg_attribute_attnum),
128
},
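
(To spell out the trick: the KEY() macro in the attached patch leans on the
existing VA_ARGS_NARGS() helper,

	#define KEY(...) VA_ARGS_NARGS(__VA_ARGS__), { __VA_ARGS__ }

so the ATTNUM example above expands at compile time to

	2, { Anum_pg_attribute_attrelid, Anum_pg_attribute_attnum },

which means the nkeys field is still there, just no longer maintained by
hand.)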

> I'm kind of neutral on using "[N] = " as a substitute for
> ordering the entries correctly.  While that does remove
> one failure mode, it seems like it adds another (ie
> failure to provide an entry at all would be masked).

It fails very early in testing if you do that.  Admittedly, the
assertion is hard to understand, but if I add a new assertion close to
the cause with a new comment to say what you did wrong, I think that
should be good enough?
From 6a2425bdfccaa9a247657f3e148b87753213e746 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 20 Dec 2022 15:47:59 +1300
Subject: [PATCH v2] Improve notation of cacheinfo table in syscache.c.

Use C99 designated initializer syntax for the array elements, instead of
writing the enumerator name and position  in a comment.  Replace nkeys
and key with a local variadic macro, for a shorter notation.
---
 src/backend/utils/cache/syscache.c | 943 +
 1 file changed, 293 insertions(+), 650 deletions(-)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 5f17047047..99513aa336 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -89,8 +89,7 @@
 
 	Add your entry to the cacheinfo[] array below. All cache lists are
 	alphabetical, so add it in the proper place.  Specify the relation OID,
-	index OID, number of keys, key attribute numbers, and initial number of
-	hash buckets.
+	index OID, key attribute numbers, and initial number of hash buckets.
 
 	The number of hash buckets must be a power of 2.  It's reasonable to
 	set this to the number of entries that might be in the particular cache
@@ -122,920 +121,558 @@ struct cachedesc
 	int			nbuckets;		/* number of hash buckets for this cache */
 };
 
+/* Macro to provide nkeys and key array with convenient syntax. */
+#define KEY(...) VA_ARGS_NARGS(__VA_ARGS__), { __VA_ARGS__ }
+
 static const struct cachedesc cacheinfo[] = {
-	{AggregateRelationId,		/* AGGFNOID */
+	[AGGFNOID] = {
+		AggregateRelationId,
 		AggregateFnoidIndexId,
-		1,
-		{
-			Anum_pg_aggregate_aggfnoid,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_aggregate_aggfnoid),
 		16
 	},
-	{AccessMethodRelationId,	/* AMNAME */
+	[AMNAME] = {
+		AccessMethodRelationId,
 		AmNameIndexId,
-		1,
-		{
-			Anum_pg_am_amname,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_am_amname),
 		4
 	},
-	{AccessMethodRelationId,	/* AMOID */
+	[AMOID] = {
+		AccessMethodRelationId,
 		AmOidIndexId,
-		1,
-		{
-			Anum_pg_am_oid,
-			0,
-			0,
-			0
-		},
+		KEY(Anum_pg_am_oid),
 		4
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPOPID */
+	[AMOPOPID] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodOperatorIndexId,
-		3,
-		{
-			Anum_pg_amop_amopopr,
+		KEY(Anum_pg_amop_amopopr,
 			Anum_pg_amop_amoppurpose,
-			Anum_pg_amop_amopfamily,
-			0
-		},
+			Anum_pg_amop_amopfamily),
 		64
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPSTRATEGY */
+	[AMOPSTRATEGY] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodStrategyIndexId,
-		4,
-		{
-			Anum_pg_amop_amopfamily,
+		KEY(Anum_pg_amop_amopfamily,
 			Anum_pg_amop_amoplefttype,
 			Anum_pg_amop_amoprighttype,
-			Anum_pg_amop_amopstrategy
-		},
+			Anum_pg_amop_amopstrategy),
 		64
 	},
-	{AccessMethodProcedureRelationId,	/* AMPROCNUM */
+	[AMPROCNUM] = {
+		AccessMethodProcedureRelationId,
 		AccessMethodProcedureIndexId,
-		4,
-		{
-			Anum_pg_amproc_amprocfamily,
+		KEY(Anum_pg_amproc_amprocfamily,
 			Anum_pg_amproc_amproclefttype,
 			Anum_pg_amproc_amprocrighttype,
-			Anum_pg_amproc_amprocnum
-		},
+			Anum_pg_amproc_amprocnum),
 		16
 	},
-	{AttributeRelationId,		/* ATTNAME */
+	[ATTNAME] = {
+		AttributeRelationId,
 		AttributeRelidNameIndexId,
-		2,
-		{
-			Anum_pg_attribute_attrelid,
-			Anum_pg_attribute_attname,
-			0,
-			0
-		},
+		KEY(Anum_pg_attribute_attrelid,
+			Anum_pg_attribute_attname),
 		32
 	},
-	{AttributeRelationId,		/* ATTNUM */
+	[ATTNUM] = {
+		AttributeRelationId,
 		AttributeRelidNumIndexId,
-		2,
-		{
-			Anum_pg_attribute_attrelid,
-			Anum_pg_attribute_attnum,
-			0,
-			0
-		},
+		KEY(Anum_pg_attribute_attrelid,
+			Anum_pg_attribute_attnum),
 		128
 	},
-	{AuthMemRelationId,			/* AUTHMEMMEMROLE */
+	[AUTHMEMMEMROLE] = {
+		AuthMemRelationId,
 		AuthMemMemRoleIndexId,
-		3,
-		{
-			Anum_pg_auth_members_member,
+		KEY(Anum_pg_auth_members_member,
 			Anum_pg_auth_members_roleid,
-			Anum_pg_auth_members_grantor,
-			0
-		},
+			Anum_pg_auth_members_grantor),
 		8
 	},
-	

Avoid lost result of recursion (src/backend/optimizer/util/inherit.c)

2022-12-20 Thread Ranier Vilela
Hi.

IMO, I think that commit a61b1f7 has an oversight.
Currently it is losing the result of recursion of the function
translate_col_privs_multilevel.

This is because the variable result (Bitmapset pointer) is reassigned.

There is no test case for this patch,
but there is also no test case for the current thinko in head.

Pass regress check.

regards,
Ranier Vilela


avoid_lost_result_of_recursion.patch
Description: Binary data


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-20 Thread Michael Paquier
On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote:
> On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander  wrote:
>> Caught this thread late. To me, pg_dissect_walfile_name() is a
>> really strange name for a function. Grepping our code I see the
>> term dissect is used somewhere inside the regex code and exactly
>> zero instances elsewhere. Which is why I definitely didn't
>> recognize the term... 
>>
>> Wouldn't something like pg_split_walfile_name() be a lot more
>> consistent with the rest of our names? 

Fine by me to change that if there is little support for the current
naming, though the current one does not sound that bad to me either.

> Hm. FWIW, here's the patch.

"split" is used a lot for the picksplit functions, but not in any of
the existing functions as a name.  Some extra options: parse, read,
extract, calculate, deduce, get.  "parse" would be something I would
be OK with.
--
Michael


signature.asc
Description: PGP signature


Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-20 Thread Ian Lawrence Barwick
2022年12月20日(火) 21:35 Bharath Rupireddy :
>
> On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander  wrote:
> >
> > On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier  wrote:
> >>
> >> On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
> >> > Yeah, my mind was considering as well yesterday the addition of a note
> >> > in the docs about something among these lines, so fine by me.
> >>
> >> And applied that, after tweaking a few tiny things on a last lookup
> >> with a catversion bump.  Note that the example has been moved at the
> >> bottom of the table for these functions, which is more consistent with
> >> the surroundings.
> >>
> >
> > Hi!
> >
> > Caught this thread late. To me, pg_dissect_walfile_name() is a really 
> > strange name for a function. Grepping our code I see the term dissect is
> > used somewhere inside the regex code and exactly zero instances elsewhere.
> > Which is why I definitely didn't recognize the term...

Late to the party too, but I did wonder about the name when I saw it.

> > Wouldn't something like pg_split_walfile_name() be a lot more consistent 
> > with the rest of our names?
>
> Hm. FWIW, here's the patch.

Hmm, "pg_split_walfile_name()" sounds like it would return three 8
character strings.

Maybe something like "pg_walfile_name_elements()" ?

Regards

Ian Barwick




Re: Support logical replication of DDLs

2022-12-20 Thread Peter Smith
On Tue, Dec 20, 2022 at 2:29 AM Alvaro Herrera  wrote:
>
> On 2022-Oct-31, Peter Smith wrote:
>
> > 6. add_policy_clauses
> >
> > + else
> > + {
> > + append_bool_object(policyStmt, "present", false);
> > + }
> >
> > Something seems strange. Probably I'm wrong but just by code
> > inspection it looks like there is potential for there to be multiple
> > param {present:false} JSON objects:
> >
> > {"present" :false},
> > {"present" :false},
> > {"present" :false},
> >
> > Shouldn't those all be array elements or something? IIUC apart from
> > just DDL, the JSON idea was going to (in future) allow potential
> > machine manipulation of the values prior to the replication, but
> > having all these ambiguous-looking objects does not seem to lend
> > itself to that idea readily. How to know what are each of those params
> > representing?
>
> Do you mean that a single JSON object has multiple members with
> "present":"false"?  That sounds like something we should never produce,
> and post-processing to remove them does not sound good either.  Is that
> really what is happening, or do I misunderstand?
>

Yes, that is what I meant.

The add_policy_clauses code below is from latest patch v54-0001:

+static void
+add_policy_clauses(ObjTree *ret, Oid policyOid, List *roles, bool do_qual,
+bool do_with_check)
+{
+ Relation polRel = table_open(PolicyRelationId, AccessShareLock);
+ HeapTuple polTup = get_catalog_object_by_oid(polRel,
Anum_pg_policy_oid, policyOid);
+ Form_pg_policy polForm;
+
+ if (!HeapTupleIsValid(polTup))
+ elog(ERROR, "cache lookup failed for policy with OID %u", policyOid);
+
+ polForm = (Form_pg_policy) GETSTRUCT(polTup);
+
+ /* Add the "ON table" clause */
+ append_object_object(ret, "ON %{table}D",
+ new_objtree_for_qualname_id(RelationRelationId,
+ polForm->polrelid));
+
+ /*
+ * Add the "TO role" clause, if any.  In the CREATE case, it always
+ * contains at least PUBLIC, but in the ALTER case it might be empty.
+ */
+ if (roles)
+ {
+ List*list = NIL;
+ ListCell   *cell;
+
+ foreach(cell, roles)
+ {
+ RoleSpec   *spec = (RoleSpec *) lfirst(cell);
+
+ list = lappend(list,
+new_object_object(new_objtree_for_rolespec(spec)));
+ }
+ append_array_object(ret, "TO %{role:, }R", list);
+ }
+ else
+ append_not_present(ret);
+
+ /* Add the USING clause, if any */
+ if (do_qual)
+ {
+ Datum deparsed;
+ Datum storedexpr;
+ bool isnull;
+
+ storedexpr = heap_getattr(polTup, Anum_pg_policy_polqual,
+   RelationGetDescr(polRel), &isnull);
+ if (isnull)
+ elog(ERROR, "null polqual expression in policy %u", policyOid);
+ deparsed = DirectFunctionCall2(pg_get_expr, storedexpr, polForm->polrelid);
+ append_string_object(ret, "USING (%{expression}s)", "expression",
+ TextDatumGetCString(deparsed));
+ }
+ else
+ append_not_present(ret);
+
+ /* Add the WITH CHECK clause, if any */
+ if (do_with_check)
+ {
+ Datum deparsed;
+ Datum storedexpr;
+ bool isnull;
+
+ storedexpr = heap_getattr(polTup, Anum_pg_policy_polwithcheck,
+   RelationGetDescr(polRel), &isnull);
+ if (isnull)
+ elog(ERROR, "null polwithcheck expression in policy %u", policyOid);
+ deparsed = DirectFunctionCall2(pg_get_expr, storedexpr, polForm->polrelid);
+ append_string_object(ret, "WITH CHECK (%{expression}s)",
+ "expression", TextDatumGetCString(deparsed));
+ }
+ else
+ append_not_present(ret);
+
+ relation_close(polRel, AccessShareLock);
+}

Actually, I have not yet tried running this so maybe I am mistaken,
but looking at the code above I thought if 'roles' is NULL and
'do_qual' is false and 'do_with_check' is false then the logic could
end up doing:

+ append_not_present(ret);
+ append_not_present(ret);
+ append_not_present(ret);

> Obviously, if you have an object with several sub-objects, each of the
> sub-objects can have its own "present:false" label.  The idea is that
> the clause that each subobject represents may not be in the command as
> written by the user; but perhaps a post-processor of the JSON blob wants
> to change things so that the clause does appear in the final output.
> And this should be doable for each individual optional clause in each
> command, which means that, yeah, there should be multiple
> "present:false" pairs in a single JSON blob, in different paths.
>
> (For example, if the user writes "CREATE SEQUENCE foobar", we would get
> a tree that has {fmt: "CACHE %{value}", present: false, value: 32}, so
> if you just convert that to text DDL without further ado you would get
> the original command verbatim; but you can poke the "present" to true so
> you would get "CREATE SEQUENCE foobar CACHE 32".)
>
>
> Also, I think I came up with the idea of having "present:boolean" a bit
> late in the development of this code, so it's quite possible that there
> are commands that are inconsistent in their support of this pattern.
> That makes it especially important to review the representation of each
> command carefully.
>

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Add function to_oct

2022-12-20 Thread Ian Lawrence Barwick
2022年12月21日(水) 7:08 Eric Radman :>
> Hello!
>
> This patch is a new function based on the implementation of to_hex(int).
>
> Since support for octal integer literals was added, to_oct(int) allows
> octal values to be easily stored and returned in query results.
>
>   to_oct(0o755) = '755'
>
> This is probably most useful for storing file system permissions.

Seems like it would be convenient to have. Any reason why there's
no matching "to_oct(bigint)" version?

Patch has been added to the next commitfest [1], thanks.

[1] https://commitfest.postgresql.org/41/4071/

Regards

Ian Barwick




Re: Array initialisation notation in syscache.c

2022-12-20 Thread Tom Lane
Thomas Munro  writes:
> Do you think this is better?

I'm not at all on board with adding runtime overhead to
save maintaining the nkeys fields.

Getting rid of the useless trailing zeroes in the key[] arrays
is clearly a win, though.

I'm kind of neutral on using "[N] = " as a substitute for
ordering the entries correctly.  While that does remove
one failure mode, it seems like it adds another (ie
failure to provide an entry at all would be masked).

regards, tom lane




Re: meson files copyright

2022-12-20 Thread Thomas Munro
On Tue, Dec 20, 2022 at 6:27 PM Noah Misch  wrote:
> On Mon, Dec 19, 2022 at 09:09:25PM +0100, Peter Eisentraut wrote:
> > On 19.12.22 19:33, Robert Haas wrote:
> > >On Mon, Dec 19, 2022 at 1:03 PM Tom Lane  wrote:
> > >>Vik Fearing  writes:
> > >>>Perhaps a bit off-topic, but what is the point of the file identifiers?
> > >>
> > >>IMO, it helps to tell things apart when you've got a bunch of editor
> > >>windows open on some mighty samey-looking meson.build files.
> > >
> > >On the other hand, maintaining those identification lines in all of
> > >our files has a pretty high distributed cost. I never use them to
> > >figure out what file I'm editing because my editor can tell me that.
> > >But I do have to keep fixing those lines as I create new files. It's
> > >not the most annoying thing ever, but I wouldn't mind a bit if I
> > >didn't have to do it any more.
> >
> > I agree it's not very useful and a bit annoying.
>
> Agreed.  To me, it's just one more thing to get wrong.

+1

We're just cargo-culting the old CVS $Id$ tags.




Array initialisation notation in syscache.c

2022-12-20 Thread Thomas Munro
Hi,

While hacking on a new system catalogue for a nearby thread, it
occurred to me that syscache.c's table of entries could be made more
readable and less error prone.  They look like this:

{AttributeRelationId,   /* ATTNUM */
AttributeRelidNumIndexId,
2,
{
Anum_pg_attribute_attrelid,
Anum_pg_attribute_attnum,
0,
0
},
128
},

Do you think this is better?

[ATTNUM] = {
AttributeRelationId,
AttributeRelidNumIndexId,
{
Anum_pg_attribute_attrelid,
Anum_pg_attribute_attnum
},
128
},

We could also consider writing eg ".nbuckets  = 128", but it's not a
complicated struct that the eye gets lost in, so I didn't bother with
that in the attached.
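
Purely for illustration (not what the attached patch does), the fully
designated form of the same entry would be:

	[ATTNUM] = {
		.reloid = AttributeRelationId,
		.indoid = AttributeRelidNumIndexId,
		.key = {Anum_pg_attribute_attrelid, Anum_pg_attribute_attnum},
		.nbuckets = 128
	},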
From 3c554145b65bde834a1ff9f65396d2ab07f72120 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 20 Dec 2022 15:47:59 +1300
Subject: [PATCH] Improve notation of cacheinfo table in syscache.c.

Use C99 designated initializer syntax for the array elements, instead of
writing the position in a comment.  Remove the explicit trailing zeros
from the key.  Remove the human-supplied nkeys member, which can be
computed by searching for the last non-zero key.
---
 src/backend/utils/cache/syscache.c | 700 +++--
 1 file changed, 253 insertions(+), 447 deletions(-)

diff --git a/src/backend/utils/cache/syscache.c b/src/backend/utils/cache/syscache.c
index 5f17047047..4f22861926 100644
--- a/src/backend/utils/cache/syscache.c
+++ b/src/backend/utils/cache/syscache.c
@@ -89,8 +89,7 @@
 
 	Add your entry to the cacheinfo[] array below. All cache lists are
 	alphabetical, so add it in the proper place.  Specify the relation OID,
-	index OID, number of keys, key attribute numbers, and initial number of
-	hash buckets.
+	index OID, key attribute numbers, and initial number of hash buckets.
 
 	The number of hash buckets must be a power of 2.  It's reasonable to
 	set this to the number of entries that might be in the particular cache
@@ -117,59 +116,48 @@ struct cachedesc
 {
 	Oid			reloid;			/* OID of the relation being cached */
 	Oid			indoid;			/* OID of index relation for this cache */
-	int			nkeys;			/* # of keys needed for cache lookup */
 	int			key[4];			/* attribute numbers of key attrs */
 	int			nbuckets;		/* number of hash buckets for this cache */
 };
 
 static const struct cachedesc cacheinfo[] = {
-	{AggregateRelationId,		/* AGGFNOID */
+	[AGGFNOID] = {
+		AggregateRelationId,
 		AggregateFnoidIndexId,
-		1,
 		{
-			Anum_pg_aggregate_aggfnoid,
-			0,
-			0,
-			0
+			Anum_pg_aggregate_aggfnoid
 		},
 		16
 	},
-	{AccessMethodRelationId,	/* AMNAME */
+	[AMNAME] = {
+		AccessMethodRelationId,
 		AmNameIndexId,
-		1,
 		{
-			Anum_pg_am_amname,
-			0,
-			0,
-			0
+			Anum_pg_am_amname
 		},
 		4
 	},
-	{AccessMethodRelationId,	/* AMOID */
+	[AMOID] = {
+		AccessMethodRelationId,
 		AmOidIndexId,
-		1,
 		{
-			Anum_pg_am_oid,
-			0,
-			0,
-			0
+			Anum_pg_am_oid
 		},
 		4
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPOPID */
+	[AMOPOPID] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodOperatorIndexId,
-		3,
 		{
 			Anum_pg_amop_amopopr,
 			Anum_pg_amop_amoppurpose,
-			Anum_pg_amop_amopfamily,
-			0
+			Anum_pg_amop_amopfamily
 		},
 		64
 	},
-	{AccessMethodOperatorRelationId,	/* AMOPSTRATEGY */
+	[AMOPSTRATEGY] = {
+		AccessMethodOperatorRelationId,
 		AccessMethodStrategyIndexId,
-		4,
 		{
 			Anum_pg_amop_amopfamily,
 			Anum_pg_amop_amoplefttype,
@@ -178,9 +166,9 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		64
 	},
-	{AccessMethodProcedureRelationId,	/* AMPROCNUM */
+	[AMPROCNUM] = {
+		AccessMethodProcedureRelationId,
 		AccessMethodProcedureIndexId,
-		4,
 		{
 			Anum_pg_amproc_amprocfamily,
 			Anum_pg_amproc_amproclefttype,
@@ -189,131 +177,108 @@ static const struct cachedesc cacheinfo[] = {
 		},
 		16
 	},
-	{AttributeRelationId,		/* ATTNAME */
+	[ATTNAME] = {
+		AttributeRelationId,
 		AttributeRelidNameIndexId,
-		2,
 		{
 			Anum_pg_attribute_attrelid,
-			Anum_pg_attribute_attname,
-			0,
-			0
+			Anum_pg_attribute_attname
 		},
 		32
 	},
-	{AttributeRelationId,		/* ATTNUM */
+	[ATTNUM] = {
+		AttributeRelationId,
 		AttributeRelidNumIndexId,
-		2,
 		{
 			Anum_pg_attribute_attrelid,
-			Anum_pg_attribute_attnum,
-			0,
-			0
+			Anum_pg_attribute_attnum
 		},
 		128
 	},
-	{AuthMemRelationId,			/* AUTHMEMMEMROLE */
+	[AUTHMEMMEMROLE] = {
+		AuthMemRelationId,
 		AuthMemMemRoleIndexId,
-		3,
 		{
 			Anum_pg_auth_members_member,
 			Anum_pg_auth_members_roleid,
-			Anum_pg_auth_members_grantor,
-			0
+			Anum_pg_auth_members_grantor
 		},
 		8
 	},
-	{AuthMemRelationId,			/* AUTHMEMROLEMEM */
+	[AUTHMEMROLEMEM] = {
+		AuthMemRelationId,
 		AuthMemRoleMemIndexId,

Re: pgsql: Doc: Explain about Column List feature.

2022-12-20 Thread Peter Smith
On Tue, Dec 20, 2022 at 7:21 PM Alvaro Herrera  wrote:
>
> On 2022-Dec-20, Peter Smith wrote:
>
> > If you change this warning title then it becomes the odd one out -
> > every other warning in all the pg docs just says "Warning".  IMO
> > maintaining consistency throughout is best. e.g. I can imagine maybe
> > someone searching for "Warning" in the docs, and now they are not
> > going to find this one.
>
> Hmm, how do you propose that people search for warnings, and fail to
> notice one that is not titled "Warning"?  In my mind, it is much more
> likely that they scan a page visually until they hit a red box ("eh
> look, a red box that I can ignore because its title is not Warning" does
> not sound very credible).  On the other hand, if you're going over the
> source .sgml files, you're going to grep for the warning tag, and that's
> going to be there.
>
> (Maybe you'd say somebody would grep for "" and not find this
> one because the > is not there anymore.  I grant you that that could
> happen.  But then they're doing it wrong already.  I don't think we need
> to cater to that.)
>

By "searching" I also meant just scanning visually, although I was
thinking more about scanning the PDF.

Right now, the intention of any text box is obvious at a glance
because of those titles like "Caution", "Tip", "Note", "Warning".
Sure, the HTML rendering also uses colours to convey the purpose, but
in the PDF version [1] everything is black-and-white so apart from the
title all boxes look the same. That's why I felt using non-standard
box titles might be throwing away some of the meaning - e.g. the
reader of the PDF won't know anymore at a glance are they looking at a
warning or a tip.

>
> Now, I did notice that we don't have any other titled warning boxes,
> because I had a quick look at all the other warnings we have.  I was
> surprised to find out that we have very few, which I think is good,
> because warnings are annoying.  I was also surprised that most of them
> are right not to have a title, because they are very quick one-para
> boxes.  But I did find two others that should probably have a title:
>
> https://www.postgresql.org/docs/15/app-pgrewind.html
> Maybe "Failures while rewinding"
>
> https://www.postgresql.org/docs/15/applevel-consistency.html
> Maybe "Serializable Transactions and Data Replication"
> (and patch it to mention logical replication)
>

--
[1] PDF docs - 
https://www.postgresql.org/files/documentation/pdf/15/postgresql-15-A4.pdf

Kind Regards,
Peter Smith.
Fujitsu Australia




[PATCH] Add function to_oct

2022-12-20 Thread Eric Radman
Hello!

This patch is a new function based on the implementation of to_hex(int).

Since support for octal integer literals was added, to_oct(int) allows
octal values to be easily stored and returned in query results.

  to_oct(0o755) = '755'

This is probably most useful for storing file system permissions.

--
Eric Radman
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1f63dc6dba..b539e0dc0b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3699,6 +3699,23 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+Converts the number to its equivalent octal representation.
+   
+   
+to_oct(32)
+40
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 1c52deec55..16929ddeed 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -5229,6 +5229,31 @@ to_hex64(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(ptr));
 }
 
+#define OCTBASE 8
+/*
+ * Convert an int32 to a string containing a base 8 (oct) representation of
+ * the number.
+ */
+Datum
+to_oct32(PG_FUNCTION_ARGS)
+{
+   uint32  value = (uint32) PG_GETARG_INT32(0);
+   char   *ptr;
+   const char *digits = "01234567";
+   char    buf[32];        /* bigger than needed, but reasonable */
+
+   ptr = buf + sizeof(buf) - 1;
+   *ptr = '\0';
+
+   do
+   {
+   *--ptr = digits[value % OCTBASE];
+   value /= OCTBASE;
+   } while (ptr > buf && value);
+
+   PG_RETURN_TEXT_P(cstring_to_text(ptr));
+}
+
 /*
  * Return the size of a datum, possibly compressed
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 98d90d9338..824df6fdc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3687,6 +3687,9 @@
 { oid => '2090', descr => 'convert int8 number to hex',
   proname => 'to_hex', prorettype => 'text', proargtypes => 'int8',
   prosrc => 'to_hex64' },
+{ oid => '2173', descr => 'convert int4 number to oct',
+  proname => 'to_oct', prorettype => 'text', proargtypes => 'int4',
+  prosrc => 'to_oct32' },
 
 # for character set encoding support
 
diff --git a/src/test/regress/expected/strings.out 
b/src/test/regress/expected/strings.out
index f028c1f10f..5ebaeb4a80 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2143,6 +2143,15 @@ select 
to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS ""
  
 (1 row)
 
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "";
+  
+--
+ 
+(1 row)
+
 --
 -- SHA-2
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 932f71cbca..c4c40949e7 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -691,6 +691,11 @@ select to_hex(256*256*256 - 1) AS "ff";
 
 select to_hex(256::bigint*256::bigint*256::bigint*256::bigint - 1) AS 
"";
 
+--
+-- test to_oct
+--
+select to_oct(256*256*256 - 1) AS "";
+
 --
 -- SHA-2
 --


Re: appendBinaryStringInfo stuff

2022-12-20 Thread Robert Haas
On Tue, Dec 20, 2022 at 10:47 AM Andrew Dunstan  wrote:
> There are 5 uses in the jsonb code where the length param is a compile
> time constant:
>
> andrew@ub22:adt $ grep appendBinary.*[0-9] jsonb*
> jsonb.c:appendBinaryStringInfo(out, "null", 4);
> jsonb.c:appendBinaryStringInfo(out, "true", 4);
> jsonb.c:appendBinaryStringInfo(out, "false", 5);
> jsonb.c:appendBinaryStringInfo(out, ": ", 2);
> jsonb.c:appendBinaryStringInfo(out, "", 4);
>
> None of these really bother me much, TBH. In fact the last one is
> arguably nicer because it tells you without counting how many spaces
> there are.

+1. There are certainly cases where this kind of style can create
confusion, but I have a hard time putting any of these instances into
that category. It's obvious at a glance that null is 4 bytes, false is
5, etc.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2022-12-20 Thread Robert Haas
On Tue, Dec 20, 2022 at 3:39 PM Robert Haas  wrote:
> I think this might be the only WAL record type where there's a
> problem, but I haven't fully confirmed that yet.

It's not. GIST has the same issue. The same test case demonstrates the
problem there, if you substitute this test script for
kpt_hash_setup.sql and possibly also run it for somewhat longer. One
might think that this wouldn't be a problem, because the comments for
gistxlogDelete say this:

 /*
  * In payload of blk 0 : todelete OffsetNumbers
  */

But it's not in the payload of blk 0. It follows the main payload.

This is the reverse of xl_heap_freeze_page, which claims that freeze
plans and offset numbers follow, but they don't: they're in the data
for block 0. xl_btree_delete is also wrong, claiming that the data
follows when it's really attached to block 0. I guess whatever else we
do here, we should fix the comments.

Bottom line is that I think the two cases that have alignment issues
as coded are xl_hash_vacuum_one_page and gistxlogDelete. Everything
else is OK, as far as I can tell right now.
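
For the record, one general way a redo routine can cope with a trailing
array that isn't guaranteed to be aligned is simply to copy it into aligned
memory before use. This is a sketch of the technique only (xlrec and
noffsets stand in for whatever the real record provides), not a proposal
for how these two record types should actually be fixed:

	OffsetNumber *unaligned = (OffsetNumber *) ((char *) xlrec + sizeof(*xlrec));
	OffsetNumber *offsets = palloc(noffsets * sizeof(OffsetNumber));

	memcpy(offsets, unaligned, noffsets * sizeof(OffsetNumber));
	/* offsets[] can now be dereferenced without alignment faults */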

-- 
Robert Haas
EDB: http://www.enterprisedb.com


kpt_gist_setup.sql
Description: Binary data


Re: Common function for percent placeholder replacement

2022-12-20 Thread Corey Huinker
>
> How about this new one with variable arguments?


I like this a lot, but I also see merit in Alvaro's PERCENT_OPT variadic,
which at least avoids the two lists getting out of sync.

Initially, I was going to ask that we have shell-quote-safe equivalents of
whatever fixed parameters we baked in, but this allows the caller to do
that as needed. It seems we could now just copy quote_identifier and strip
out the keyword checks to get the desired effect. Has anyone else had a
need for quote-safe args in the shell commands?
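
To make that concrete, the kind of thing I mean is a small helper along
these lines -- quote_shell_arg() is a hypothetical name, not an existing
function -- which wraps the value in single quotes and escapes any embedded
quote for POSIX shells:

static char *
quote_shell_arg(const char *arg)
{
	StringInfoData buf;

	initStringInfo(&buf);
	appendStringInfoChar(&buf, '\'');
	for (const char *p = arg; *p; p++)
	{
		if (*p == '\'')
			appendStringInfoString(&buf, "'\\''");	/* close, escape, reopen */
		else
			appendStringInfoChar(&buf, *p);
	}
	appendStringInfoChar(&buf, '\'');

	return buf.data;
}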


Re: appendBinaryStringInfo stuff

2022-12-20 Thread David Rowley
On Wed, 21 Dec 2022 at 04:47, Andrew Dunstan  wrote:
> jsonb.c:appendBinaryStringInfo(out, "    ", 4);
>
> None of these really bother me much, TBH. In fact the last one is
> arguably nicer because it tells you without counting how many spaces
> there are.

appendStringInfoSpaces() might be even better.

David




Re: Minimal logical decoding on standbys

2022-12-20 Thread Robert Haas
On Tue, Dec 20, 2022 at 1:25 PM Robert Haas  wrote:
> I'm going to go through all the record types one by one before
> commenting further.

OK, so xl_hash_vacuum_one_page, at least, is a live issue. To reproduce:

./configure 
echo 'COPT=-fsanitize=alignment -fno-sanitize-recover=all' > src/Makefile.custom
make -j8
make install
initdb
postgres

Then in another window:

pg_basebackup -D pgstandby -R
# edit postgresql.conf, change port number
postgres -D pgstandby

Then in a third window, using the attached files:

pgbench -i -s 10
psql -f kpt_hash_setup.sql
pgbench -T 10 -c 4 -j 4 -f kpt_hash_pgbench.sql

With the patch, the standby falls over:

bufpage.c:1194:31: runtime error: load of misaligned address
0x7fa62f05d119 for type 'OffsetNumber' (aka 'unsigned short'), which
requires 2 byte alignment
0x7fa62f05d119: note: pointer points here
 00 00 00  00 e5 00 8f 00 00 00 00  87 00 ab 20 20 20 20 20  20 20 20
20 20 20 20 20  20 20 20 20 20
  ^

Without the patch, this doesn't occur.

I think this might be the only WAL record type where there's a
problem, but I haven't fully confirmed that yet.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


kpt_hash_setup.sql
Description: Binary data


kpt_hash_pgbench.sql
Description: Binary data


Re: Minimal logical decoding on standbys

2022-12-20 Thread Drouvot, Bertrand

Hi,

On 12/20/22 7:31 PM, Robert Haas wrote:

On Tue, Dec 20, 2022 at 1:19 PM Andres Freund  wrote:

I don't understand what the "may*" or "*Possible" really are
about. snapshotConflictHorizon is a conflict with a certain xid - there
commonly won't be anything to conflict with. If there's a conflict in
the logical-decoding-on-standby case, we won't be able to apply it only
sometimes or such.


The way I was imagining it is that snapshotConflictHorizon tells us
whether there is a conflict with this record and then, if there is,
this new Boolean tells us whether it's relevant to logical decoding as
well.



the "may*" or "*Possible" was, most probably, because I preferred to have the 
uncertainty of the conflict mentioned in the name.
But, somehow, I was forgetting about the relationship with 
snapshotConflictHorizon.
So, agree with both of you that mentioning about the uncertainty of the 
conflict is useless.


How about "affectsLogicalDecoding", "conflictsWithSlots" or
"isCatalogRel" or such?


isCatalogRel seems fine to me.


For me too, please find attached v34 using it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 46a728b39f7ea85f2dec60d72cb400094955b785 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 20 Dec 2022 19:56:22 +
Subject: [PATCH v34 6/6] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. Which means that typically walsenders will get woken up at the
same time that the startup process will be - which means that by the
time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
that the startup process already replayed the record and updated
XLogCtl->lastReplayedEndRecPtr.

Introducing a new condition variable to fix this corner case.
---
 src/backend/access/transam/xlogrecovery.c | 28 
 src/backend/replication/walsender.c   | 31 +--
 src/backend/utils/activity/wait_event.c   |  3 +++
 src/include/access/xlogrecovery.h |  3 +++
 src/include/replication/walsender.h   |  1 +
 src/include/utils/wait_event.h|  1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
  41.2% src/backend/access/transam/
  48.5% src/backend/replication/
   3.6% src/backend/utils/activity/
   3.4% src/include/access/

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index d5a81f9d83..ac8b169ab5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -358,6 +358,9 @@ typedef struct XLogRecoveryCtlData
RecoveryPauseState recoveryPauseState;
ConditionVariable recoveryNotPausedCV;
 
+   /* Replay state (see getReplayedCV() for more explanation) */
+   ConditionVariable replayedCV;
+
slock_t info_lck;   /* locks shared variables shown 
above */
 } XLogRecoveryCtlData;
 
@@ -467,6 +470,7 @@ XLogRecoveryShmemInit(void)
	SpinLockInit(&XLogRecoveryCtl->info_lck);
	InitSharedLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
	ConditionVariableInit(&XLogRecoveryCtl->recoveryNotPausedCV);
+   ConditionVariableInit(&XLogRecoveryCtl->replayedCV);
 }
 
 /*
@@ -1916,6 +1920,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
XLogRecoveryCtl->lastReplayedTLI = *replayTLI;
	SpinLockRelease(&XLogRecoveryCtl->info_lck);
 
+   /*
+* wake up walsender(s) used by logical decoding on standby.
+*/
+   ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);
+
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the
 * receiver so that it notices the updated lastReplayedEndRecPtr and 
sends
@@ -4916,3 +4925,22 @@ assign_recovery_target_xid(const char *newval, void 
*extra)
else
recoveryTarget = RECOVERY_TARGET_UNSET;
 }
+
+/*
+ * Return the ConditionVariable indicating that a replay has been done.
+ *
+ * This is needed for logical decoding on standby. Indeed the "problem" is that
+ * WalSndWaitForWal() waits for the *replay* LSN to increase, but gets woken up
+ * by walreceiver when new WAL has been flushed. Which means that typically
+ * walsenders will get woken up at the same time that the startup process
+ * will be - which means that by the time the logical walsender checks
+ * GetXLogReplayRecPtr() it's unlikely that the startup process already 
replayed
+ * the record and updated XLogCtl->lastReplayedEndRecPtr.
+ *
+ * The ConditionVariable XLogRecoveryCtl->replayedCV solves this corner case.
+ */
+ConditionVariable *
+getReplayedCV(void)
+{
+   return &XLogRecoveryCtl->replayedCV;
+}
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index 9662e316c9..8c8dbe812f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1548,6 +1548,7 @@ 

Re: MultiXact\SLRU buffers configuration

2022-12-20 Thread Andrey Borodin
On Sat, Jul 23, 2022 at 1:48 AM Thomas Munro  wrote:
>
> On Sat, Jul 23, 2022 at 8:41 PM Andrey Borodin  wrote:
> > Thomas, do you still have any doubts? Or is it certain that SLRU will be 
> > replaced by any better subsystem in 16?
>
> Hi Andrey,
>
> Sorry for my lack of replies on this and the other SLRU thread -- I'm
> thinking and experimenting.  More soon.
>

Hi Thomas,

PostgreSQL 16 feature freeze is approaching again. Let's choose
something from possible solutions, even if the chosen one is
temporary.
Thank you!

Best regards, Andrey Borodin.




Re: Minimal logical decoding on standbys

2022-12-20 Thread Robert Haas
On Tue, Dec 20, 2022 at 1:19 PM Andres Freund  wrote:
> I don't understand what the "may*" or "*Possible" really are
> about. snapshotConflictHorizon is a conflict with a certain xid - there
> commonly won't be anything to conflict with. If there's a conflict in
> the logical-decoding-on-standby case, we won't be able to apply it only
> sometimes or such.

The way I was imagining it is that snapshotConflictHorizon tells us
whether there is a conflict with this record and then, if there is,
this new Boolean tells us whether it's relevant to logical decoding as
well.

> How about "affectsLogicalDecoding", "conflictsWithSlots" or
> "isCatalogRel" or such?

isCatalogRel seems fine to me.
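
For concreteness, the shape I'm imagining is something like the following,
using xl_heap_prune purely as an example; which records get the flag and
where it sits in each struct is of course up to the patch:

typedef struct xl_heap_prune
{
	TransactionId snapshotConflictHorizon;	/* as today */
	uint16		nredirected;
	uint16		ndead;
	bool		isCatalogRel;	/* new: is the conflict also relevant to
								 * logical decoding slots on a standby? */
	/* OFFSET NUMBERS are in the block reference 0 */
} xl_heap_prune;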

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2022-12-20 Thread Robert Haas
On Wed, Dec 14, 2022 at 12:48 PM Robert Haas  wrote:
> No?

Nope, I was wrong. The block reference data is stored in the WAL
record *before* the main data, so it was wrong to imagine (as I did)
that the alignment of the main data would affect the alignment of the
block data. If anything, it's the other way around. That means that
the only records where this patch could conceivably cause a problem
are those where something is stored in the main data after the main
struct. And there aren't many of those, because an awful lot of record
types have moved to using the block data.

I'm going to go through all the record types one by one before
commenting further.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: meson and tmp_install

2022-12-20 Thread Andres Freund
Hi,

On 2022-12-20 21:11:26 +0300, Nikita Malakhov wrote:
> Didn't know where to ask, so I've chosen this thread - there is no
> documentation on the meson build platform in the PostgreSQL docs.

There is now:
https://www.postgresql.org/docs/devel/install-meson.html

Needs further work, but it's a start.


> Is this okay? For me it was a surprise when the meson platform was
> added

It's been discussed on the list for a year or so before it was
added. It's a large change, so unfortunately it's not something that I
could get done in a single day, with perfect docs from the get go.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2022-12-20 Thread Andres Freund
On 2022-12-16 11:38:33 -0500, Robert Haas wrote:
> On Fri, Dec 16, 2022 at 10:08 AM Drouvot, Bertrand
>  wrote:
> > > After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
> > > it should be a riff on snapshotConflictHorizon?
> >
> > Gotcha, what about logicalSnapshotConflictThreat?
> 
> logicalConflictPossible? checkDecodingConflict?
> 
> I think we should try to keep this to three words if we can. There's
> not likely to be enough value in a fourth word to make up for the
> downside of being more verbose.

I don't understand what the "may*" or "*Possible" really are
about. snapshotConflictHorizon is a conflict with a certain xid - there
commonly won't be anything to conflict with. If there's a conflict in
the logical-decoding-on-standby case, we won't be able to apply it only
sometimes or such.

How about "affectsLogicalDecoding", "conflictsWithSlots" or
"isCatalogRel" or such?

Greetings,

Andres Freund




Re: Checksum errors in pg_stat_database

2022-12-20 Thread Andres Freund
Hi,

On 2022-12-11 20:48:15 -0500, Tom Lane wrote:
> Michael Paquier  writes:
> > On Sun, Dec 11, 2022 at 04:51:49PM -0800, Andres Freund wrote:
> >> I think there's a good argument for starting to track some stats based on 
> >> the
> >> relfilenode, rather the oid, because it'd allow us to track e.g. the 
> >> number of
> >> writes for a relation too (we don't have the oid when writing out
> >> buffers). But that's a relatively large change...
> 
> > Yeah.  I was thinking among the lines of sync requests and sync
> > failures, as well.
> 
> I think a stats table indexed solely by relfilenode wouldn't be a great
> idea, because you'd lose all the stats about a table as soon as you
> do vacuum full/cluster/rewriting-alter-table/etc.

I don't think that'd be a huge issue - we already have code to keep some
stats as part of rewrites that change the oid of a relation. We could do
the same for rewrites that just change the relfilenode.

However, I'm not sure it's a good idea to keep the stats during
rewrites. Most rewrites end up not copying dead tuples, for example, so
keeping the old counts of updated tuples doesn't really make sense.


> Can we create two index structures over the same stats table entries,
> so you can look up by either relfilenode or OID?

We could likely do that, yes. I think we'd have one "stats body" and
multiple hash table entries pointing to one. The complicated bit would
likely be that we'd need some additional refcounting to know when
there's no references to the "stats body" left.
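
Very roughly, and purely illustratively (none of these type names exist,
and the locking/dshash details are glossed over), something like:

typedef struct PgStatSharedRelBody
{
	int			refcount;		/* one per lookup entry pointing here */
	PgStat_StatTabEntry stats;	/* the counters themselves */
} PgStatSharedRelBody;

/* one entry in the OID-keyed table, one in the relfilenode-keyed table */
typedef struct PgStatRelLookupEntry
{
	uint64		key;			/* table OID or relfilenumber */
	PgStatSharedRelBody *body;	/* shared, refcounted stats body */
} PgStatRelLookupEntry;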

Greetings,

Andres Freund




Re: meson and tmp_install

2022-12-20 Thread Nikita Malakhov
Hi!

Didn't know where to ask, so I've chosen this thread - there is no
documentation on the meson build platform in the PostgreSQL docs. Is this
okay? For me it was a surprise when the meson platform was added,
and I had to spend some time sweeping through the meson docs
when I added new source files, to build Postgres successfully.

Thanks!

On Tue, Dec 20, 2022 at 8:30 PM Andres Freund  wrote:

> Hi,
>
> On 2022-12-20 07:12:04 -0500, Andrew Dunstan wrote:
> > Yesterday when testing a patch I got annoyed when my test failed. I
> > tested it like this:
> >
> > meson test ldap_password_func/001_mutated_bindpasswd
> >
> > It turned out that I needed to do this:
> >
> > meson test tmp_install ldap_password_func/001_mutated_bindpasswd
> >
> > The Makefile equivalent ensures that you have a tmp_install without
> > having to request to explicitly. It would be nice if we could make meson
> > do the same thing, and also honor NO_TEMP_INSTALL if set.
>
> I would like that too, but there's no easy fix that doesn't have
> downsides as far as I am aware. We could make the temp install a build
> target that the tests depend on, but for historical reasons in meson
> that means that the 'all' target depends on temp-install. Which isn't
> great.
>
> My current thinking is that we should get away from needing the
> temporary install and instead allow to run the tests against the build
> directory itself. The temp-install adds a fair bit of overhead and
> failure potential. The only reason we need it is that a) initdb and a
> few other programs insist that postgres needs to be in the same
> directory b) contrib modules currently need to reside in one single
> directory.
>
> Greetings,
>
> Andres Freund
>
>
>

-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: New strategies for freezing, advancing relfrozenxid early

2022-12-20 Thread Nikita Malakhov
Hi!

I'll try to apply this patch onto my branch with Pluggable TOAST to test
these mechanics with the new TOAST, and I will reply with the result. It
could be difficult though, because both have a lot of changes that affect
the same code.

>I'm not sure how much this would help with bloat. I suspect that it
>could make a big difference with the right workload. If you always
>need frequent autovacuums, just to deal with bloat, then there is
>never a good time to run an aggressive antiwraparound autovacuum. An
>aggressive AV will probably end up taking much longer than the typical
>autovacuum that deals with bloat. While the aggressive AV will remove
>as much bloat as any other AV, in theory, that might not help much. If
>the aggressive AV takes as long as (say) 5 regular autovacuums would
>have taken, and if you really needed those 5 separate autovacuums to
>run, just to deal with the bloat, then that's a real problem.  The
>aggressive AV effectively causes bloat with such a workload.



On Tue, Dec 20, 2022 at 12:01 PM Jeff Davis  wrote:

> On Sun, 2022-12-18 at 14:20 -0800, Peter Geoghegan wrote:
> > Attached is v10, which fixes this issue, but using a different
> > approach to the one I sketched here.
>
> In 0001, it's fairly straightforward rearrangement and looks like an
> improvement to me. I have a few complaints, but they are about pre-
> existing code that you moved around, and I like that you didn't
> editorialize too much while just moving code around. +1 from me.
>
>
> --
> Jeff Davis
> PostgreSQL Contributor Team - AWS
>
>
>
>
>

-- 
Regards,

--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: appendBinaryStringInfo stuff

2022-12-20 Thread Andres Freund
Hi,

On 2022-12-19 21:29:10 +1300, David Rowley wrote:
> On Mon, 19 Dec 2022 at 21:12, Andres Freund  wrote:
> > Perhaps we should make appendStringInfoString() a static inline function
> > - most compilers can compute strlen() of a constant string at compile
> > time.
> 
> I had wondered about that, but last time I looked into it there was a
> small increase in the size of the binary from doing it. Perhaps it
> does not matter, but it's something to consider.

I'd not be too worried about that in this case.


> Re-thinking, I wonder if we could use the same macro trick used in
> ereport_domain(). Something like:
> 
> #ifdef HAVE__BUILTIN_CONSTANT_P
> #define appendStringInfoString(str, s) \
> __builtin_constant_p(s) ? \
> appendBinaryStringInfo(str, s, sizeof(s) - 1) : \
> appendStringInfoStringInternal(str, s)
> #else
> #define appendStringInfoString(str, s) \
> appendStringInfoStringInternal(str, s)
> #endif
> 
> and rename the existing function to appendStringInfoStringInternal.
> 
> Because __builtin_constant_p is a known compile-time constant, it
> should be folded to just call the corresponding function during
> compilation.

Several compilers can optimize away repeated strlen() calls, even if the
string isn't a compile-time constant. So I'm not really convinced that
tying inlining-strlen to __builtin_constant_p() is a good idea.

> Just looking at the binary sizes for postgres. I see:
> 
> unpatched = 9972128 bytes
> inline function = 9990064 bytes

That seems acceptable to me.


> macro trick = 9984968 bytes
>
> I'm currently not sure why the macro trick increases the binary at
> all. I understand why the inline function does.

I think Tom's explanation is on point.


I've in the past looked at stringinfo.c being the bottleneck in a bunch
of places and concluded that we really need to remove the function call
in the happy path entirely - we should have an enlargeStringInfo() that
we can call externally iff needed and then implement the rest of
appendBinaryStringInfo() etc in an inline function.  That allows the
compiler to e.g. optimize out the repeated maintenance of the \0 write
etc.
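
Roughly this shape, as a sketch only (name and details not thought
through):

static inline void
appendBinaryStringInfoFast(StringInfo str, const char *data, int datalen)
{
	/* out-of-line call only when the buffer actually has to grow */
	if (str->len + datalen >= str->maxlen)
		enlargeStringInfo(str, datalen);

	memcpy(str->data + str->len, data, datalen);
	str->len += datalen;
	str->data[str->len] = '\0';
}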

Greetings,

Andres Freund




Re: Call lazy_check_wraparound_failsafe earlier for parallel vacuum

2022-12-20 Thread Imseih (AWS), Sami
Attached is a patch to check scanned pages rather
than blockno. 

Regards,

Sami Imseih
Amazon Web Services (AWS)




v1-0001-fixed-when-wraparound-failsafe-is-checked.patch
Description: v1-0001-fixed-when-wraparound-failsafe-is-checked.patch


Re: Inconsistency in reporting checkpointer stats

2022-12-20 Thread Andres Freund
On 2022-12-20 08:18:36 -0500, Robert Haas wrote:
> I think that the SLRU information is potentially useful, but mixing it
> with the information about regular buffers just seems confusing.

+1

At least for now, it'd be different if/when we manage to move SLRUs to
the main buffer pool.

- Andres




Re: meson files copyright

2022-12-20 Thread Andres Freund
Hi,

On 2022-12-19 10:20:45 -0500, Tom Lane wrote:
> Their comment density is pretty awful too --- maybe I'm just
> not used to meson, but they seem just about completely undocumented.
> And there's certainly been no effort to transfer the accumulated wisdom
> of the makefile comments (where it's still relevant, of course).

I did try to retain comments that seemed useful. E.g.

toplevel meson.build:

# The separate ldap_r library only exists in OpenLDAP < 2.5, and if we
# have 2.5 or later, we shouldn't even probe for ldap_r (we might find a
# library from a separate OpenLDAP installation).  The most reliable
# way to check that is to check for a function introduced in 2.5.
...
# We are after Embed's ldopts, but without the subset mentioned in
# Config's ccdlflags and ldflags.  (Those are the choices of those who
# built the Perl installation, which are not necessarily appropriate
# for building PostgreSQL.)
...
# Functions introduced in OpenSSL 1.1.0. We used to check for
# OPENSSL_VERSION_NUMBER, but that didn't work with 1.1.0, because LibreSSL
# defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
# doesn't have these OpenSSL 1.1.0 functions. So check for individual
# functions.
...
# Check if the C compiler understands __builtin_$op_overflow(),
# and define HAVE__BUILTIN_OP_OVERFLOW if so.
#
# Check for the most complicated case, 64 bit multiplication, as a
# proxy for all of the operations.  To detect the case where the compiler
# knows the function but library support is missing, we must link not just
# compile, and store the results in global variables so the compiler doesn't
# optimize away the call.
...

src/backend/meson.build:
...
# As of 1/2010:
# The probes.o file is necessary for dtrace support on Solaris, and on recent
# versions of systemtap.  (Older systemtap releases just produce an empty
# file, but that's okay.)  However, macOS's dtrace doesn't use it and doesn't
# even recognize the -G option.  So, build probes.o except on macOS.
# This might need adjustment as other platforms add dtrace support.


I'm sure there are a lot of comments that could also have been useful
that I missed - but there's also a lot that just didn't seem
meaningful. E.g. stuff like

# The following targets are specified in make commands that appear in
# the make files in our subdirectories. Note that it's important we
# match the dependencies shown in the subdirectory makefiles!
# Also, in cases where a subdirectory makefile generates two files in
# what's really one step, such as bison producing both gram.h and gram.c,
# we must request making the one that is shown as the secondary (dependent)
# output, else the timestamp on it might be wrong.  By project convention,
# the .h file is the dependent one for bison output, so we need only request
# that; but in other cases, request both for safety.

which just doesn't apply to meson.

- Andres




Re: meson and tmp_install

2022-12-20 Thread Andres Freund
Hi,

On 2022-12-20 07:12:04 -0500, Andrew Dunstan wrote:
> Yesterday when testing a patch I got annoyed when my test failed. I
> tested it like this:
> 
> meson test ldap_password_func/001_mutated_bindpasswd
> 
> It turned out that I needed to do this:
> 
> meson test tmp_install ldap_password_func/001_mutated_bindpasswd
> 
> The Makefile equivalent ensures that you have a tmp_install without
> having to request to explicitly. It would be nice if we could make meson
> do the same thing, and also honor NO_TEMP_INSTALL if set.

I would like that too, but there's no easy fix that doesn't have
downsides as far as I am aware. We could make the temp install a build
target that the tests depend on, but for historical reasons in meson
that means that the 'all' target depends on temp-install. Which isn't
great.

My current thinking is that we should get away from needing the
temporary install and instead allow to run the tests against the build
directory itself. The temp-install adds a fair bit of overhead and
failure potential. The only reasons we need it are that a) initdb and a
few other programs insist that postgres needs to be in the same
directory, and b) contrib modules currently need to reside in one single
directory.

Greetings,

Andres Freund




Re: meson files copyright

2022-12-20 Thread Andres Freund
On 2022-12-19 13:33:46 -0500, Robert Haas wrote:
> On Mon, Dec 19, 2022 at 1:03 PM Tom Lane  wrote:
> > Vik Fearing  writes:
> > > Perhaps a bit off-topic, but what is the point of the file identifiers?
> >
> > IMO, it helps to tell things apart when you've got a bunch of editor
> > windows open on some mighty samey-looking meson.build files.
> 
> On the other hand, maintaining those identification lines in all of
> our files has a pretty high distributed cost. I never use them to
> figure out what file I'm editing because my editor can tell me that.
> But I do have to keep fixing those lines as I create new files. It's
> not the most annoying thing ever, but I wouldn't mind a bit if I
> didn't have to do it any more.

+1




Re: Use get_call_result_type() more widely

2022-12-20 Thread Tom Lane
Bharath Rupireddy  writes:
> On Tue, Dec 20, 2022 at 1:41 PM Tom Lane  wrote:
>> Hmm ... at least one of the paths through internal_get_result_type
>> is intentionally blessing the result tupdesc:
>> but it's not clear if they all do, and the comments certainly
>> aren't promising it.

> It looks to be safe to get rid of BlessTupleDesc() after
> get_call_result_type() for the functions that have OUT parameters and
> return 'record' type.

I think it's an absolutely horrid idea for callers to depend on
such details of get_call_result_type's behavior --- especially
when there is no function documentation promising it.

If we want to do something here, the thing to do would be to
guarantee in get_call_result_type's API spec that any returned
tupledesc is blessed.  However, that might make some other
cases slower, if they don't need that.

On the whole, I'm content to leave the BlessTupleDesc calls in
these callers.  They are cheap enough if the tupdesc is already
blessed.

regards, tom lane




Re: appendBinaryStringInfo stuff

2022-12-20 Thread Andrew Dunstan


On 2022-12-19 Mo 17:48, David Rowley wrote:
> On Tue, 20 Dec 2022 at 11:42, Tom Lane  wrote:
>> I think Peter is entirely right to question whether *this* type's
>> output function is performance-critical.  Who's got large tables with
>> jsonpath columns?  It seems to me the type would mostly only exist
>> as constants within queries.
> The patch touches code in the path of jsonb's output function too. I
> don't think you could claim the same for that.
>

I agree that some of the uses in the jsonpath code could reasonably just
be converted to use appendStringInfoString().

There are 5 uses in the jsonb code where the length param is a compile
time constant:

andrew@ub22:adt $ grep appendBinary.*[0-9] jsonb*
jsonb.c:            appendBinaryStringInfo(out, "null", 4);
jsonb.c:                appendBinaryStringInfo(out, "true", 4);
jsonb.c:                appendBinaryStringInfo(out, "false", 5);
jsonb.c:                appendBinaryStringInfo(out, ": ", 2);
jsonb.c:            appendBinaryStringInfo(out, "    ", 4);

None of these really bother me much, TBH. In fact the last one is
arguably nicer because it tells you without counting how many spaces
there are.

Changing the type of the second argument to appendBinaryStringInfo to
void* seems reasonable.
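
For what it's worth, a minimal sketch of what such a conversion looks like
(a toy illustration only, not taken from any proposed patch):

#include "lib/stringinfo.h"

static void
append_null_literal(StringInfo out)
{
	/* was: appendBinaryStringInfo(out, "null", 4); */
	appendStringInfoString(out, "null");	/* strlen() computed at run time */
}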


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Add LZ4 compression in pg_dump

2022-12-20 Thread Justin Pryzby
On Tue, Dec 20, 2022 at 11:19:15AM +, gkokola...@pm.me wrote:
> --- Original Message ---
> On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby 
>  wrote:
> > On Mon, Dec 19, 2022 at 05:03:21PM +, gkokola...@pm.me wrote:
> > 
> > > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on
> > > > > windows. Have you checked test results from cirrusci on your private
> > > > > github account ?
> > > 
> > > There are still known gaps in 0002 and 0003, for example documentation,
> > > and I have not been focusing too much on those. You are right, it is 
> > > helpful
> > > and kind to try to reduce the noise. The attached should have hopefully
> > > tackled the ci errors.
> > 
> > 
> > Yep. Are you using cirrusci under your github account ?
> 
> Thank you. To be very honest, I am not using github exclusively to post 
> patches.
> Sometimes I do, sometimes I do not. Is github a requirement?

Github isn't a requirement for postgres (but cirrusci only supports
github).  I wasn't trying to say that it's required, only trying to
make sure that you (and others) know that it's available, since our
cirrus.yml is relatively new.

> > > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> > > > > doesn't store the passed-in compression_spec.
> > > 
> > > I am afraid I have not been able to reproduce this error. I tried both
> > > debian and freebsd after I addressed the compilation warnings. Which
> > > error did you get? Is it still present in the attached?
> > 
> > 
> > It's not that there's an error - it's that compression isn't working.
> > 
> > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc 
> > -c
> > 659956
> > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc 
> > -c
> > 637192
> > 
> > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc 
> > -c
> > 1954890
> > $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc 
> > -c
> > 1954890
> > 
> 
> Thank you. Now I understand what you mean. Trying the same on top of v18-0003
> on Ubuntu 22.04 yields:

You're right; this seems to be fixed in v18.  Thanks.

It looks like I'd forgotten to run "meson test tmp_install", so had
retested v17...

-- 
Justin




Re: Avoid generating SSL certs for LDAP tests

2022-12-20 Thread Andrew Dunstan


On 2022-12-19 Mo 11:04, Andrew Dunstan wrote:
> On 2022-12-19 Mo 10:25, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> We don't generate SSL certificates for running the SSL tests, but
>>> instead use pregenerated certificates that are part of our source code.
>>> This patch applies the same policy to the LDAP tests, and in fact simply
>>> reuses certificates from the SSL test suite by copying them. It won't
>>> save much but it should save a handful of cycles at run time.
>> +1, but should there be a comment somewhere under test/ssl pointing
>> out this external use of the certs?
>
> OK, I'll find a place to mention that.


Done.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-12-20 Thread Melih Mutlu
Hi Amit,

Amit Kapila , on Fri, 16 Dec 2022 at 05:46, wrote:

> Right, but when the size is 100MB, it seems to be taking a bit more
> time. Do we want to evaluate with different sizes to see how it looks?
> Other than that all the numbers are good.
>

I did a similar testing with again 100MB and also 1GB this time.

        | 100 MB        | 1 GB
--------+---------------+----------------
 master | 14761.425 ms  | 160932.982 ms
--------+---------------+----------------
 patch  | 14398.408 ms  | 160593.078 ms

This time, the patch seems slightly faster than master.
I'm not sure we can say the patch slows things down (or speeds them up) as
the size of the tables increases.
The difference may be arbitrary or caused by other factors. What do you think?

I also wondered what happens when "max_sync_workers_per_subscription" is
set to 1.
That means tablesync will be done sequentially in both cases, but the patch
will use only one worker and one replication slot during the whole
tablesync process.
Here are the numbers for that case:

        | 100 MB        | 1 GB
--------+---------------+----------------
 master | 27751.463 ms  | 312424.999 ms
--------+---------------+----------------
 patch  | 27342.760 ms  | 310021.767 ms

Best,
-- 
Melih Mutlu
Microsoft


Re: code cleanups

2022-12-20 Thread Ranier Vilela
 Justin Pryzby  writes:
> Some modest cleanups I've accumulated

Hi Justin.

0001:
Regarding the {0} initializer, the problem is still with old compilers, which
don't initialize exactly like memset does; only more modern compilers fill in
any padding "holes" that may exist.
Once old compilers are no longer supported, this will stop being a problem.
A fast and safe solution in the meantime: memset.

+1 to switch from loop to memset, same as many places in the code base.

- /* initialize nulls and values */
- for (i = 0; i < Natts_pg_constraint; i++)
- {
- nulls[i] = false;
- values[i] = (Datum) NULL;
- }
+ memset(nulls, false, sizeof(nulls));
+ memset(values, 0, sizeof(values));

What made me curious about this excerpt is that the Datum values are set to
NULL, yet the nulls flags are not. Shouldn't it be:
+ memset(nulls, true, sizeof(nulls));
+ memset(values, 0, sizeof(values));

src/backend/tcop/pquery.c:
/* single format specified, use for all columns */
-int16 format1 = formats[0];
-
-for (i = 0; i < natts; i++)
-portal->formats[i] = format1;
+ memset(portal->formats, formats[0], natts * sizeof(*portal->formats));

0002:
contrib/sslinfo/sslinfo.c

memset is faster than a series of individual stores.

src/backend/replication/logical/origin.c
+1
one store is better than three,
but it should be:
- memset(nulls, 1, sizeof(nulls));
+memset(nulls, false, sizeof(nulls));

The correct style is false, not 0.

src/backend/utils/adt/misc.c:
-1
It got worse.
It's only one store, which could be avoided by the "continue" statement.

src/backend/utils/misc/pg_controldata.c:
+1
+memset(nulls, false, sizeof(nulls));

or
nulls[0] = false;
nulls[1] = false;
nulls[2] = false;
nulls[3] = false;

Bad style; scattered individual stores are worse.


0003:
+1

But you should reduce the scope of vars:
RangeTblEntry *rte
Oid userid;

+ if (varno != relid)
+ {
+ RangeTblEntry *rte;
+ Oid userid;

0005:
+1

0006:
+1

regards,
Ranier Vilela


Re: Inconsistency in reporting checkpointer stats

2022-12-20 Thread Robert Haas
On Tue, Dec 20, 2022 at 8:03 AM Nitin Jadhav
 wrote:
> Thanks Robert for sharing your thoughts.
> My first thought was to just remove counting SLRU buffers, then after
> some more analysis, I found that the checkpointer is responsible for
> including both regular data buffers and SLRU buffers.

I know that, but what the checkpointer handles and what ought to be
included in the stats are two separate questions.

I think that the SLRU information is potentially useful, but mixing it
with the information about regular buffers just seems confusing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Inconsistency in reporting checkpointer stats

2022-12-20 Thread Nitin Jadhav
> Presumably we could make this consistent either by counting SLRU
> writes in both places, or by counting them in neither place. This
> proposal would count them in both places. But why is that the right
> thing to do?
>
> I'm somewhat inclined to think that we should use "buffers" to mean
> regular data buffers, and if SLRU buffers also need to be counted, we
> ought to make that a separate counter. Or just leave it out
> altogether.
>
> This is arguable, though, for sure

Thanks Robert for sharing your thoughts.
My first thought was to just remove counting SLRU buffers, then after
some more analysis, I found that the checkpointer is responsible for
including both regular data buffers and SLRU buffers. Please refer to
dee663f7843902535a15ae366cede8b4089f1144 commit for more information.
The part of the commit message is included here [1] for quick
reference. Hence I concluded to keep the information and added an
increment to count SLRU buffers. I am not in favour of making this
a separate counter, as it can be treated as fairly low-level
information and it just adds to the stats. Please share your
thoughts.

[1]:
Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer
that it applies to all the SLRU mini-buffer-pools as well as the main
buffer pool.  Rearrange things so that data collected in CheckpointStats
includes SLRU activity.

Thanks & Regards,
Nitin Jadhav

On Tue, Dec 20, 2022 at 2:38 AM Robert Haas  wrote:
>
> On Wed, Dec 14, 2022 at 2:32 AM Nitin Jadhav
>  wrote:
> > In order to fix this, the
> > PendingCheckpointerStats.buf_written_checkpoints should be incremented
> > in SlruInternalWritePage() similar to
> > CheckpointStats.ckpt_bufs_written. I have attached the patch for the
> > same. Please share your thoughts.
>
> Presumably we could make this consistent either by counting SLRU
> writes in both places, or by counting them in neither place. This
> proposal would count them in both places. But why is that the right
> thing to do?
>
> I'm somewhat inclined to think that we should use "buffers" to mean
> regular data buffers, and if SLRU buffers also need to be counted, we
> ought to make that a separate counter. Or just leave it out
> altogether.
>
> This is arguable, though, for sure
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com




Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures

2022-12-20 Thread Bharath Rupireddy
On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander  wrote:
>
> On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier  wrote:
>>
>> On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote:
>> > Yeah, my mind was considering as well yesterday the addition of a note
>> > in the docs about something among these lines, so fine by me.
>>
>> And applied that, after tweaking a few tiny things on a last lookup
>> with a catversion bump.  Note that the example has been moved at the
>> bottom of the table for these functions, which is more consistent with
>> the surroundings.
>>
>
> Hi!
>
> Caught this thread late. To me, pg_dissect_walfile_name() is a really strange 
> name for a function. Grepping our code I see the term dissect is used
> somewhere inside the regex code and exactly zero instances elsewhere. Which
> is why I definitely didn't recognize the term...
>
> Wouldn't something like pg_split_walfile_name() be a lot more consistent with 
> the rest of our names?

Hm. FWIW, here's the patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Rename-pg_dissect_walfile_name-to-pg_split_walfil.patch
Description: Binary data


Re: meson files copyright

2022-12-20 Thread Andrew Dunstan


On 2022-12-20 Tu 00:26, Noah Misch wrote:
> On Mon, Dec 19, 2022 at 09:09:25PM +0100, Peter Eisentraut wrote:
>> On 19.12.22 19:33, Robert Haas wrote:
>>> On Mon, Dec 19, 2022 at 1:03 PM Tom Lane  wrote:
 Vik Fearing  writes:
> Perhaps a bit off-topic, but what is the point of the file identifiers?
 IMO, it helps to tell things apart when you've got a bunch of editor
 windows open on some mighty samey-looking meson.build files.
>>> On the other hand, maintaining those identification lines in all of
>>> our files has a pretty high distributed cost. I never use them to
>>> figure out what file I'm editing because my editor can tell me that.
>>> But I do have to keep fixing those lines as I create new files. It's
>>> not the most annoying thing ever, but I wouldn't mind a bit if I
>>> didn't have to do it any more.
>> I agree it's not very useful and a bit annoying.
> Agreed.  To me, it's just one more thing to get wrong.


OK, I think there are enough objections that we can put that aside for
now, I will just go and add the copyright notices.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





meson and tmp_install

2022-12-20 Thread Andrew Dunstan
Yesterday when testing a patch I got annoyed when my test failed. I
tested it like this:

meson test ldap_password_func/001_mutated_bindpasswd

It turned out that I needed to do this:

meson test tmp_install ldap_password_func/001_mutated_bindpasswd

The Makefile equivalent ensures that you have a tmp_install without
having to request it explicitly. It would be nice if we could make meson
do the same thing, and also honor NO_TEMP_INSTALL if set.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Allow batched insert during cross-partition updates

2022-12-20 Thread Amit Langote
On Tue, Dec 20, 2022 at 7:18 PM Etsuro Fujita  wrote:
> On Wed, Dec 14, 2022 at 10:29 PM Amit Langote  wrote:
> > On Wed, Dec 14, 2022 at 6:45 PM Etsuro Fujita  
> > wrote:
> > > One thing I noticed is this bit:
> > >
> > >  -- Clean up
> > > -DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0,
> > > batch_table_p1 CASCADE;
> > > +DROP TABLE batch_table, batch_table_p0, batch_table_p1,
> > > batch_cp_upd_test, cmdlog CASCADE;
> > >
> > > This would be nitpicking, but this as-proposed will not remove remote
> > > tables created for foreign-table partitions of the partitioned table
> > > ‘batch_cp_upd_test’.  So I modified this a bit further to remove them
> > > as well.  Also, I split this into two, for readability.  Another thing
> > > is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
> > > I fixed it as well.  Other than that, the patch looks good to me.
> > > Attached is an updated patch.  If there are no objections, I will
> > > commit the patch.
> >
> > LGTM.
>
> Cool!  Pushed.

Thank you, Fujita-san.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Add LZ4 compression in pg_dump

2022-12-20 Thread gkokolatos






--- Original Message ---
On Monday, December 19th, 2022 at 6:27 PM, Justin Pryzby  
wrote:


> 
> 
> On Mon, Dec 19, 2022 at 05:03:21PM +, gkokola...@pm.me wrote:
> 
> > > > 001 still doesn't compile on freebsd, and 002 doesn't compile on
> > > > windows. Have you checked test results from cirrusci on your private
> > > > github account ?
> > 
> > There are still known gaps in 0002 and 0003, for example documentation,
> > and I have not been focusing too much on those. You are right, it is helpful
> > and kind to try to reduce the noise. The attached should have hopefully
> > tackled the ci errors.
> 
> 
> Yep. Are you using cirrusci under your github account ?

Thank you. To be very honest, I am not using github exclusively to post patches.
Sometimes I do, sometimes I do not. Is github a requirement?

To answer your question, some of my github accounts are integrated with 
cirrusci,
others are not.

The current cfbot build is green for what is worth.
https://cirrus-ci.com/build/5934319840002048

> 
> > > FYI, I have re-added an entry to the CF app to get some automated
> > > coverage:
> > > https://commitfest.postgresql.org/41/3571/
> > 
> > Much obliged. Should I change the state to "ready for review" when post a
> > new version or should I leave that to the senior personnel?
> 
> 
> It's better to update it to reflect what you think its current status
> is. If you think it's ready for review.

Thank you.

> 
> > > > 002 breaks "pg_dump -Fc -Z2" because (I think) AllocateCompressor()
> > > > doesn't store the passed-in compression_spec.
> > 
> > I am afraid I have not been able to reproduce this error. I tried both
> > debian and freebsd after I addressed the compilation warnings. Which
> > error did you get? Is it still present in the attached?
> 
> 
> It's not that there's an error - it's that compression isn't working.
> 
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fp regression |wc -c
> 659956
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fp regression |wc -c
> 637192
> 
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z1 -Fc regression |wc -c
> 1954890
> $ ./tmp_install/usr/local/pgsql/bin/pg_dump -h /tmp -Z2 -Fc regression |wc -c
> 1954890
> 

Thank you. Now I understand what you mean. Trying the same on top of v18-0003
on Ubuntu 22.04 yields:

$ for compression in none gzip:1 gzip:6 gzip:9; do \
  pg_dump --format=custom --compress="$compression" -f 
regression."$compression".dump -d regression; \
  wc -c regression."$compression".dump; \
  done;
14963753 regression.none.dump
3600183 regression.gzip:1.dump
3223755 regression.gzip:6.dump
3196903 regression.gzip:9.dump

and on FreeBSD 13.1

$ for compression in none gzip:1 gzip:6 gzip:9; do \
  pg_dump --format=custom --compress="$compression" -f 
regression."$compression".dump -d regression; \
  wc -c regression."$compression".dump; \
  done;
14828822 regression.none.dump
3584304 regression.gzip:1.dump
3208548 regression.gzip:6.dump
3182044 regression.gzip:9.dump

Although there are some variations between the installations, within the same
installation the size of the dump file is shrinking as expected.

Investigating a bit further, you are correct in identifying an issue in v17.
Up until v16, the compressor function looked like:

+InitCompressorGzip(CompressorState *cs, int compressionLevel)
+{
+   GzipCompressorState *gzipcs;
+
+   cs->readData = ReadDataFromArchiveGzip;
+   cs->writeData = WriteDataToArchiveGzip;
+   cs->end = EndCompressorGzip;
+
+   gzipcs = (GzipCompressorState *) 
pg_malloc0(sizeof(GzipCompressorState));
+   gzipcs->compressionLevel = compressionLevel;

V17 considered that more options could become available in the future
and changed the signature of the relevant Init functions to:

+InitCompressorGzip(CompressorState *cs, const pg_compress_specification 
compression_spec)
+{
+   GzipCompressorState *gzipcs;
+
+   cs->readData = ReadDataFromArchiveGzip;
+   cs->writeData = WriteDataToArchiveGzip;
+   cs->end = EndCompressorGzip;
+
+   gzipcs = (GzipCompressorState *) 
pg_malloc0(sizeof(GzipCompressorState));
+

V18 reinstated the assignment in similar fashion to InitCompressorNone and 
InitCompressorLz4:

+void
+InitCompressorGzip(CompressorState *cs, const pg_compress_specification 
compression_spec)
+{
+   GzipCompressorState *gzipcs;
+
+   cs->readData = ReadDataFromArchiveGzip;
+   cs->writeData = WriteDataToArchiveGzip;
+   cs->end = EndCompressorGzip;
+
+   cs->compression_spec = compression_spec;
+
+   gzipcs = (GzipCompressorState *) 
pg_malloc0(sizeof(GzipCompressorState));

A test case can be added which performs a check similar to the loop above.
Create a custom dump with the least and most compression for each method.
Then verify that the output sizes differ as expected. This addition could
become 0001 in the current series.

Thoughts?

Cheers,
//Georgios


Re: Use get_call_result_type() more widely

2022-12-20 Thread Bharath Rupireddy
On Tue, Dec 20, 2022 at 1:41 PM Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Mon, Dec 19, 2022 at 07:41:27PM +0530, Bharath Rupireddy wrote:
> >> 0002 - gets rid of an unnecessary call to BlessTupleDesc()
> >> after get_call_result_type().
>
> > Hmm.  I am not sure whether this is right, actually..
>
> Hmm ... at least one of the paths through internal_get_result_type
> is intentionally blessing the result tupdesc:
>
> if (tupdesc->tdtypeid == RECORDOID &&
> tupdesc->tdtypmod < 0)
> assign_record_type_typmod(tupdesc);
>
> but it's not clear if they all do, and the comments certainly
> aren't promising it.

It looks to be safe to get rid of BlessTupleDesc() after
get_call_result_type() for the functions that have OUT parameters and
return 'record' type. This is because the
get_call_result_type()->internal_get_result_type()->build_function_result_tupdesc_t()
path returns a non-NULL tupdesc for such functions, and all the functions that
the 0002 patch touches have OUT parameters and a 'record' return type.
I've also verified with Assert(tupdesc->tdtypmod >= 0); -
https://github.com/BRupireddy/postgres/tree/test_for_tdypmod_init_v1.
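
For context, a minimal sketch of the call pattern being discussed (the
function name and the values filled in are made up for illustration):

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/htup_details.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(example_record_fn);

Datum
example_record_fn(PG_FUNCTION_ARGS)
{
	TupleDesc	tupdesc;
	Datum		values[2];
	bool		nulls[2];

	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
		elog(ERROR, "return type must be a row type");

	/*
	 * Many call sites keep this even though get_call_result_type() often
	 * hands back an already-blessed tupdesc; BlessTupleDesc() is cheap in
	 * that case, which is the argument for leaving it in place.
	 */
	tupdesc = BlessTupleDesc(tupdesc);

	memset(nulls, 0, sizeof(nulls));
	values[0] = Int32GetDatum(42);
	values[1] = CStringGetTextDatum("example");

	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}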

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Force streaming every change in logical decoding

2022-12-20 Thread Amit Kapila
On Tue, Dec 20, 2022 at 2:46 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> > We have discussed three different ways to provide GUC for these
> > features. (1) Have separate GUCs like force_server_stream_mode,
> > force_server_serialize_mode, force_client_serialize_mode (we can use
> > different names for these) for each of these; (2) Have two sets of
> > GUCs for server and client. We can have logical_decoding_mode with
> > values as 'stream' and 'serialize' for the server and then
> > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > like logical_replication_mode with values as 'server_stream',
> > 'server_serialize', 'client_serialize'.
>
> I also agree with adding new GUC parameters (and I have already done so partially
> in parallel apply[1]), and basically option 2 makes sense to me. But is it OK
> that we can choose "serialize" mode even if subscribers require streaming?
>
> Currently the reorder buffer transactions are serialized on the publisher only
> when there are no streamable transactions. So what happens if
> logical_decoding_mode = "serialize" but the streaming option is on? If we
> ignore that and serialize changes on the publisher anyway, it may not be
> suitable for testing the normal operation.
>

I think the change will be streamed as soon as the next change is
processed even if we serialize based on this option. See
ReorderBufferProcessPartialChange. However, I see your point that when
the streaming option is given, the value 'serialize' for this GUC may
not make much sense.

> Therefore, I came up with a variant of (2): logical_decoding_mode can be
> "normal" or "immediate".
>
> "normal" is the default value, which is the same as current HEAD. Changes are
> streamed or serialized when the buffered size exceeds logical_decoding_work_mem.
>
> When users set it to "immediate", the walsenders start to stream or serialize
> all changes; the choice depends on the subscription option.
>

The other possibility to achieve what you are saying is that we allow
a minimum value of logical_decoding_work_mem as 0 which would mean
stream or serialize each change depending on whether the streaming
option is enabled. I think we normally don't allow a minimum value
below a certain threshold for other *_work_mem parameters (like
maintenance_work_mem, work_mem), so we have followed the same here.
And, I think it makes sense from the user's perspective because below
a certain threshold it will just add overhead by either writing small
changes to the disk or by sending those over the network. However, it
can be quite useful for testing/debugging. So, not sure, if we should
restrict setting logical_decoding_work_mem below a certain threshold.
What do you think?

-- 
With Regards,
Amit Kapila.




Re: Allow batched insert during cross-partition updates

2022-12-20 Thread Etsuro Fujita
Hi Amit-san,

On Wed, Dec 14, 2022 at 10:29 PM Amit Langote  wrote:
> On Wed, Dec 14, 2022 at 6:45 PM Etsuro Fujita  wrote:
> > One thing I noticed is this bit:
> >
> >  -- Clean up
> > -DROP TABLE batch_table, batch_cp_upd_test, batch_table_p0,
> > batch_table_p1 CASCADE;
> > +DROP TABLE batch_table, batch_table_p0, batch_table_p1,
> > batch_cp_upd_test, cmdlog CASCADE;
> >
> > This would be nitpicking, but this as-proposed will not remove remote
> > tables created for foreign-table partitions of the partitioned table
> > ‘batch_cp_upd_test’.  So I modified this a bit further to remove them
> > as well.  Also, I split this into two, for readability.  Another thing
> > is a typo in a test-case comment: s/a single INSERTs/a single INSERT/.
> > I fixed it as well.  Other than that, the patch looks good to me.
> > Attached is an updated patch.  If there are no objections, I will
> > commit the patch.
>
> LGTM.

Cool!  Pushed.

Best regards,
Etsuro Fujita




RE: Perform streaming logical transactions by background workers and parallel apply

2022-12-20 Thread houzj.f...@fujitsu.com

On Monday, December 19, 2022 8:47 PM, Amit Kapila  wrote:
> 
> On Sat, Dec 17, 2022 at 7:34 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > Agreed. I have addressed all the comments and did some cosmetic changes.
> > Attach the new version patch set.
> >
> 
> Few comments:
> 
> 1.
> + if (fileset_state == FS_SERIALIZE_IN_PROGRESS) {
> + pa_lock_stream(MyParallelShared->xid, AccessShareLock);
> + pa_unlock_stream(MyParallelShared->xid, AccessShareLock); }
> +
> + /*
> + * We cannot read the file immediately after the leader has serialized
> + all
> + * changes to the file because there may still be messages in the
> + memory
> + * queue. We will apply all spooled messages the next time we call this
> + * function, which should ensure that there are no messages left in the
> + * memory queue.
> + */
> + else if (fileset_state == FS_SERIALIZE_DONE) {
> 
> Once we have waited in the FS_SERIALIZE_IN_PROGRESS, the file state can be
> FS_SERIALIZE_DONE immediately after that. So, won't it be better to have a
> separate if block for FS_SERIALIZE_DONE state? If you agree to do so then we
> can probably remove the comment: "* XXX It is possible that immediately after
> we have waited for a lock in ...".

Changed and slightly adjust the comments.

> 2.
> +void
> +pa_decr_and_wait_stream_block(void)
> +{
> + Assert(am_parallel_apply_worker());
> +
> + if (pg_atomic_sub_fetch_u32(&MyParallelShared->pending_stream_count,
> + 1) == 0)
> 
> I think here the count can go negative when we are in serialize mode because
> we don't increase it for serialize mode. I can't see any problem due to that 
> but
> OTOH, this doesn't seem to be intended because in the future if we decide to
> implement the functionality of switching back to non-serialize mode, this 
> could
> be a problem. Also, I guess we don't even need to try locking/unlocking the
> stream lock in that case.
> One idea to avoid this is to check if the pending count is zero; then, if the
> file_set is not available, raise an error (elog ERROR), otherwise simply
> return from here.

Added the check.
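
For readers following along, roughly the shape the function takes with that
guard added (MyParallelShared and pa_lock_stream()/pa_unlock_stream() come
from the quoted patch; the fileset-availability test is an assumed spelling):

void
pa_decr_and_wait_stream_block(void)
{
	Assert(am_parallel_apply_worker());

	/*
	 * In serialize mode the counter is never incremented, so don't let it
	 * wrap below zero.
	 */
	if (pg_atomic_read_u32(&MyParallelShared->pending_stream_count) == 0)
	{
		if (!pa_has_spooled_messages())	/* assumed helper name */
			elog(ERROR, "invalid pending streamed block count");
		return;
	}

	if (pg_atomic_sub_fetch_u32(&MyParallelShared->pending_stream_count, 1) == 0)
	{
		pa_lock_stream(MyParallelShared->xid, AccessShareLock);
		pa_unlock_stream(MyParallelShared->xid, AccessShareLock);
	}
}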

> 
> 3. In apply_handle_stream_stop(), we are setting backendstate as idle for 
> cases
> TRANS_LEADER_SEND_TO_PARALLEL and TRANS_PARALLEL_APPLY. For other
> cases, it is set by stream_stop_internal. I think it would be better to set 
> the state
> explicitly for all cases to make the code look consistent and remove it from
> stream_stop_internal(). The other reason to remove setting the state from
> stream_stop_internal() is that when that function is invoked from other places
> like apply_handle_stream_commit(), it seems to be setting the idle before
> actually we reach the idle state.

Changed. Besides, I noticed that the pgstat_report_activity() in pa_stream_abort
for subtransactions is unnecessary, since the state should be consistent with the
state set at the last stream_stop, so I have removed that as well.

> 
> 4. Apart from the above, I have made a few changes in the comments, see
> attached.

Thanks, I have merged the patch.

Best regards,
Hou zj


Re: Minimal logical decoding on standbys

2022-12-20 Thread Drouvot, Bertrand

Hi,

On 12/16/22 6:24 PM, Drouvot, Bertrand wrote:

Hi,

On 12/16/22 5:38 PM, Robert Haas wrote:

On Fri, Dec 16, 2022 at 10:08 AM Drouvot, Bertrand
 wrote:

After 1489b1ce728 the name mayConflictInLogicalDecoding seems odd. Seems
it should be a riff on snapshotConflictHorizon?


Gotcha, what about logicalSnapshotConflictThreat?


logicalConflictPossible? checkDecodingConflict?

I think we should try to keep this to three words if we can. There's
not likely to be enough value in a fourth word to make up for the
downside of being more verbose.




Yeah agree, I'd vote for logicalConflictPossible then.



Please find attached v33 using logicalConflictPossible as the new field name 
instead of mayConflictInLogicalDecoding.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 68890d7a0edcd997c0daab6e375a779656367797 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Tue, 20 Dec 2022 08:38:56 +
Subject: [PATCH v33 6/6] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. Which means that typically walsenders will get woken up at the
same time that the startup process will be - which means that by the
time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
that the startup process already replayed the record and updated
XLogCtl->lastReplayedEndRecPtr.

Introducing a new condition variable to fix this corner case.
---
 src/backend/access/transam/xlogrecovery.c | 28 
 src/backend/replication/walsender.c   | 31 +--
 src/backend/utils/activity/wait_event.c   |  3 +++
 src/include/access/xlogrecovery.h |  3 +++
 src/include/replication/walsender.h   |  1 +
 src/include/utils/wait_event.h|  1 +
 6 files changed, 59 insertions(+), 8 deletions(-)
  41.2% src/backend/access/transam/
  48.5% src/backend/replication/
   3.6% src/backend/utils/activity/
   3.4% src/include/access/

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index d5a81f9d83..ac8b169ab5 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -358,6 +358,9 @@ typedef struct XLogRecoveryCtlData
RecoveryPauseState recoveryPauseState;
ConditionVariable recoveryNotPausedCV;
 
+   /* Replay state (see getReplayedCV() for more explanation) */
+   ConditionVariable replayedCV;
+
slock_t info_lck;   /* locks shared variables shown 
above */
 } XLogRecoveryCtlData;
 
@@ -467,6 +470,7 @@ XLogRecoveryShmemInit(void)
SpinLockInit(&XLogRecoveryCtl->info_lck);
InitSharedLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
ConditionVariableInit(&XLogRecoveryCtl->recoveryNotPausedCV);
+   ConditionVariableInit(&XLogRecoveryCtl->replayedCV);
 }
 
 /*
@@ -1916,6 +1920,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord 
*record, TimeLineID *repl
XLogRecoveryCtl->lastReplayedTLI = *replayTLI;
SpinLockRelease(&XLogRecoveryCtl->info_lck);
 
+   /*
+* wake up walsender(s) used by logical decoding on standby.
+*/
+   ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);
+
/*
 * If rm_redo called XLogRequestWalReceiverReply, then we wake up the
 * receiver so that it notices the updated lastReplayedEndRecPtr and 
sends
@@ -4916,3 +4925,22 @@ assign_recovery_target_xid(const char *newval, void 
*extra)
else
recoveryTarget = RECOVERY_TARGET_UNSET;
 }
+
+/*
+ * Return the ConditionVariable indicating that a replay has been done.
+ *
+ * This is needed for logical decoding on standby. Indeed the "problem" is that
+ * WalSndWaitForWal() waits for the *replay* LSN to increase, but gets woken up
+ * by walreceiver when new WAL has been flushed. Which means that typically
+ * walsenders will get woken up at the same time that the startup process
+ * will be - which means that by the time the logical walsender checks
+ * GetXLogReplayRecPtr() it's unlikely that the startup process already 
replayed
+ * the record and updated XLogCtl->lastReplayedEndRecPtr.
+ *
+ * The ConditionVariable XLogRecoveryCtl->replayedCV solves this corner case.
+ */
+ConditionVariable *
+getReplayedCV(void)
+{
+   return &XLogRecoveryCtl->replayedCV;
+}
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index 9662e316c9..8c8dbe812f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1548,6 +1548,7 @@ WalSndWaitForWal(XLogRecPtr loc)
 {
int wakeEvents;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
+   ConditionVariable *replayedCV = getReplayedCV();
 
/*
 * Fast path to avoid acquiring the spinlock in case we already know we
@@ -1566,7 +1567,6 @@ WalSndWaitForWal(XLogRecPtr loc)
 
for 

Re: Add enable_presorted_aggregate GUC

2022-12-20 Thread David Rowley
On Fri, 16 Dec 2022 at 12:47, David Rowley  wrote:
> Normally we add some enable_* GUC to leave an escape hatch when we add
> some new feature like this. I likely should have done that when I
> added 1349d279, but I didn't and I want to now.
>
> I mainly just shifted this discussion out of [1] as we normally like
> to debate GUC names and I feel that the discussion over on the other
> thread is buried a little too deep to be visible to most people.
>
> Can anyone think of a better name?  Or does anyone see error with my ambition?

I've now pushed this.

I'm still happy to consider other names if anyone has any good ideas,
but so far it's all quiet here, so I've assumed that's a good thing.

David




Container Types

2022-12-20 Thread Vik Fearing
The standard has several constructs for creating new types from other 
types.  I don't mean anything like CREATE TYPE here, I mean things like 
this:


- ROW(a, b, c), ()
- ARRAY[a, b, c], ()
- PERIOD(a, b), ()
- MULTISET[a, b, c], ()
- MDARRAY[x(1:3)][a, b, c], ()

I am not sure what magic we use for the row value constructor.  We 
handle ARRAY by creating an array type for every non-array type that is 
created.  Periods are very similar to range types and we have to create 
new functions such as int4range(a,b) and int8range(a,b) instead of some 
kind of generic RANGE(a, b, '[)') and not worrying about what the type 
is as long as there is a btree opclass for it.


Obviously there would have to be an actual type in order to store it in 
a table, but what I am most interested in here is being able to create 
them on the fly.  I do not think it is feasible to create N new types 
for every type like we do for arrays on the off-chance you would want to 
put it in a PERIOD for example.


For those who know the code much better than I do, what would be a 
plausible way forward to support these containers?

--
Vik Fearing




RE: Force streaming every change in logical decoding

2022-12-20 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> We have discussed three different ways to provide GUC for these
> features. (1) Have separate GUCs like force_server_stream_mode,
> force_server_serialize_mode, force_client_serialize_mode (we can use
> different names for these) for each of these; (2) Have two sets of
> GUCs for server and client. We can have logical_decoding_mode with
> values as 'stream' and 'serialize' for the server and then
> logical_apply_serialize = true/false for the client. (3) Have one GUC
> like logical_replication_mode with values as 'server_stream',
> 'server_serialize', 'client_serialize'.

I also agree with adding new GUC parameters (and I have already done so partially
in parallel apply[1]), and basically option 2 makes sense to me. But is it OK
that we can choose "serialize" mode even if subscribers require streaming?

Currently the reorder buffer transactions are serialized on the publisher only
when there are no streamable transactions. So what happens if
logical_decoding_mode = "serialize" but the streaming option is on? If we
ignore that and serialize changes on the publisher anyway, it may not be
suitable for testing the normal operation.

Therefore, I came up with a variant of (2): logical_decoding_mode can be
"normal" or "immediate".

"normal" is the default value, which is the same as current HEAD. Changes are
streamed or serialized when the buffered size exceeds logical_decoding_work_mem.

When users set it to "immediate", the walsenders start to stream or serialize
all changes; the choice depends on the subscription option.


In short: +1 for (2) family.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866160DE81FA2D88B8F22DEF5159%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Force streaming every change in logical decoding

2022-12-20 Thread Amit Kapila
On Tue, Dec 20, 2022 at 10:52 AM Dilip Kumar  wrote:
>
> On Wed, Dec 14, 2022 at 5:29 PM Amit Kapila  wrote:
> >
> > On Wed, Dec 14, 2022 at 2:15 PM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > Please see the attached patch. I also fix Peter's comments[1]. The GUC 
> > > name and
> > > design are still under discussion, so I didn't modify them.
> > >
> >
> > Let me summarize the discussion on name and design till now. As per my
> > understanding, we have three requirements: (a) In publisher, stream
> > each change of transaction instead of waiting till
> > logical_decoding_work_mem or commit; this will help us to reduce the
> > test timings of current and future tests for replication of
> > in-progress transactions; (b) In publisher, serialize each change
> > instead of waiting till logical_decoding_work_mem; this can help
> > reduce the test time of tests related to serialization of changes in
> > logical decoding; (c) In subscriber, during parallel apply for
> > in-progress transactions (a new feature being discussed at [1]) allow
> > the system to switch to serialize mode (no more space in shared memory
> > queue between leader and parallel apply worker either due to a
> > parallel worker being busy or waiting on some lock) while sending
> > changes.
> >
> > Having a GUC that controls these actions/features will allow us to
> > write tests with shorter duration and better predictability as
> > otherwise, they require a lot of changes. Apart from tests, these also
> > help to easily debug the required code. So they fit the Developer
> > Options category of GUC [2].
> >
> > We have discussed three different ways to provide GUC for these
> > features. (1) Have separate GUCs like force_server_stream_mode,
> > force_server_serialize_mode, force_client_serialize_mode (we can use
> > different names for these) for each of these; (2) Have two sets of
> > GUCs for server and client. We can have logical_decoding_mode with
> > values as 'stream' and 'serialize' for the server and then
> > logical_apply_serialize = true/false for the client. (3) Have one GUC
> > like logical_replication_mode with values as 'server_stream',
> > 'server_serialize', 'client_serialize'.
> >
> > The names used here are tentative mainly to explain each of the
> > options, we can use different names once we decide among the above.
> >
> > Thoughts?
>
> I think option 2 makes sense because stream/serialize are two related
> options and both are dependent on the same parameter
> (logical_decoding_work_mem) so having a single knob is logical.  And
> another GUC for serializing logical apply.
>

Thanks for your input. Sawada-San, others, any preferences/suggestions?

-- 
With Regards,
Amit Kapila.




Re: Perform streaming logical transactions by background workers and parallel apply

2022-12-20 Thread Amit Kapila
On Mon, Dec 19, 2022 at 6:17 PM Amit Kapila  wrote:
>
> On Sat, Dec 17, 2022 at 7:34 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > Agreed. I have addressed all the comments and did some cosmetic changes.
> > Attach the new version patch set.
> >
>
> Few comments:
> 
>

Few more minor points:
1.
-static inline void
+void
 changes_filename(char *path, Oid subid, TransactionId xid)
 {

This function seems to be used only in worker.c. So, what is the need
to make it extern?

2. I have made a few changes in the comments. See attached. This is
atop my yesterday's top-up patch.

I think we should merge the 0001 and 0002 patches as they need to be
committed together.

-- 
With Regards,
Amit Kapila.


changes_amit_2.patch
Description: Binary data


Re: New strategies for freezing, advancing relfrozenxid early

2022-12-20 Thread Jeff Davis
On Sun, 2022-12-18 at 14:20 -0800, Peter Geoghegan wrote:
> Attached is v10, which fixes this issue, but using a different
> approach to the one I sketched here.

In 0001, it's a fairly straightforward rearrangement and looks like an
improvement to me. I have a few complaints, but they are about pre-
existing code that you moved around, and I like that you didn't
editorialize too much while just moving code around. +1 from me.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: slab allocator performance issues

2022-12-20 Thread David Rowley
On Tue, 20 Dec 2022 at 21:19, John Naylor  wrote:
>
>
> On Tue, Dec 20, 2022 at 10:36 AM David Rowley  wrote:
> >
> > I'm planning on pushing the attached v3 patch shortly. I've spent
> > several days reading over this and testing it in detail along with
> > adding additional features to the SlabCheck code to find more
> > inconsistencies.
>
> FWIW, I reran my test from last week and got similar results.

Thanks a lot for testing that stuff last week.  I got a bit engrossed
in the perf weirdness and forgot to reply.  I found the results made much
more sense after using palloc0 and touching the allocated memory just
before freeing.  I think this is a more realistic test.
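
For illustration, a rough sketch of that test shape (counts and sizes are
made up; the slab context must have been created with a matching chunk size):

static void
slab_alloc_touch_free(MemoryContext slabcxt, Size chunk_size, int nchunks)
{
	char	  **chunks = palloc(sizeof(char *) * nchunks);

	/* zeroing forces each chunk's memory to be touched at allocation time */
	for (int i = 0; i < nchunks; i++)
		chunks[i] = MemoryContextAllocZero(slabcxt, chunk_size);

	for (int i = 0; i < nchunks; i++)
	{
		chunks[i][0] = 1;		/* touch again just before freeing */
		pfree(chunks[i]);
	}

	pfree(chunks);
}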

I've now pushed the patch after making a small adjustment to the
version I sent earlier.

David




Re: pgsql: Doc: Explain about Column List feature.

2022-12-20 Thread Alvaro Herrera
On 2022-Dec-20, Amit Kapila wrote:

> +   
> +Combining Column Lists from Multiple Subscriptions
> 
> Shouldn't the title be "Combining Column Lists from Multiple
> Publications"? We can define column lists while defining publications
> so the proposed title doesn't seem to be conveying the right thing.

Doh, of course.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree.  (Don Knuth)




Re: pgsql: Doc: Explain about Column List feature.

2022-12-20 Thread Alvaro Herrera
On 2022-Dec-20, Peter Smith wrote:

> If you change this warning title then it becomes the odd one out -
> every other warning in all the pg docs just says "Warning".  IMO
> maintaining consistency throughout is best. e.g. I can imagine maybe
> someone searching for "Warning" in the docs, and now they are not
> going to find this one.

Hmm, how do you propose that people search for warnings, and fail to
notice one that is not titled "Warning"?  In my mind, it is much more
likely that they scan a page visually until they hit a red box ("eh
look, a red box that I can ignore because its title is not Warning" does
not sound very credible).  On the other hand, if you're going over the
source .sgml files, you're going to grep for the warning tag, and that's
going to be there.

(Maybe you'd say somebody would grep for "" and not find this
one because the > is not there anymore.  I grant you that that could
happen.  But then they're doing it wrong already.  I don't think we need
to cater to that.)


Now, I did notice that we don't have any other titled warning boxes,
because I had a quick look at all the other warnings we have.  I was
surprised to find out that we have very few, which I think is good,
because warnings are annoying.  I was also surprised that most of them
are right not to have a title, because they are very quick one-para
boxes.  But I did find two others that should probably have a title:

https://www.postgresql.org/docs/15/app-pgrewind.html
Maybe "Failures while rewinding"

https://www.postgresql.org/docs/15/applevel-consistency.html
Maybe "Serializable Transactions and Data Replication"
(and patch it to mention logical replication)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: slab allocator performance issues

2022-12-20 Thread John Naylor
On Tue, Dec 20, 2022 at 10:36 AM David Rowley  wrote:
>
> I'm planning on pushing the attached v3 patch shortly. I've spent
> several days reading over this and testing it in detail along with
> adding additional features to the SlabCheck code to find more
> inconsistencies.

FWIW, I reran my test from last week and got similar results.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Use get_call_result_type() more widely

2022-12-20 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Dec 19, 2022 at 07:41:27PM +0530, Bharath Rupireddy wrote:
>> 0002 - gets rid of an unnecessary call to BlessTupleDesc()
>> after get_call_result_type().

> Hmm.  I am not sure whether this is right, actually..

Hmm ... at least one of the paths through internal_get_result_type
is intentionally blessing the result tupdesc:

if (tupdesc->tdtypeid == RECORDOID &&
tupdesc->tdtypmod < 0)
assign_record_type_typmod(tupdesc);

but it's not clear if they all do, and the comments certainly
aren't promising it.

I'd be in favor of making this a documented API promise,
but it isn't that right now.

regards, tom lane