Re: Switching XLog source from archive to streaming when primary available

2023-01-20 Thread Bharath Rupireddy
On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart  wrote:
>
> On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart  
> > wrote:
> >> With your patch, we might replay one of these "old" files in pg_wal instead
> >> of the complete version of the file from the archives,
> >
> > That's true even today, without the patch, no? We're not changing the
> > existing behaviour of the state machine. Can you explain how it
> > happens with the patch?
>
> My point is that on HEAD, we will always prefer a complete archive file.
> With your patch, we might instead choose to replay an old file in pg_wal
> because we are artificially advancing the state machine.  IOW even if
> there's a complete archive available, we might not use it.  This is a
> behavior change, but I think it is okay.

Oh, yeah, I too agree that it's okay, because manually copying WAL
files directly into pg_wal (which eventually get replayed before
switching to streaming) isn't recommended anyway for production-level
servers. I think we covered it in the documentation that the standby
exhausts all the WAL present in pg_wal before switching. Isn't that enough?

+Specifies amount of time after which standby attempts to switch WAL
+source from WAL archive to streaming replication (get WAL from
+primary). However, exhaust all the WAL present in pg_wal before
+switching. If the standby fails to switch to stream mode, it falls
+back to archive mode.

> >> Would you mind testing this scenario?
> > ... standby should receive f6 via archive and replay it (check the
> > replay lsn and ...)
>
> I meant testing the scenario where there's an old file in pg_wal, a
> complete file in the archives, and your new GUC forces replay of the
> former.  This might be difficult to do in a TAP test.  Ultimately, I just
> want to validate the assumptions discussed above.

I think testing the scenario [1] is achievable. I could write a TAP
test for it -
https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1.
It's a bit flaky and needs a little more work to stabilize: (1)
writing a custom restore_command script that sleeps only after
fetching an existing WAL file from the archive, not for a history file
or a non-existent WAL file, and (2) finding a command-line way to
sleep on Windows. It seems doable, though. I can spend some more time
on it if the test is thought worth adding to core, perhaps discussing
it separately from this thread.

[1] RestoreArchivedFile():
/*
 * When doing archive recovery, we always prefer an archived log file even
 * if a file of the same name exists in XLOGDIR.  The reason is that the
 * file in XLOGDIR could be an old, un-filled or partly-filled version
 * that was copied and restored as part of backing up $PGDATA.
 *

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Change xl_hash_vacuum_one_page.ntuples from int to uint16

2023-01-20 Thread Drouvot, Bertrand

Hi,

On 1/20/23 9:01 PM, Nathan Bossart wrote:

On Tue, Jan 10, 2023 at 11:08:33AM +0100, Drouvot, Bertrand wrote:

While working on [1], I noticed that xl_hash_vacuum_one_page.ntuples is an int.

Unless I'm missing something, it seems to me that it would make more sense to
use a uint16 (as is done for gistxlogDelete.ntodelete, for example).


I think that is correct.  This value is determined by looping through
offsets, which are uint16 as well. 


Thanks for the review!


Should we also change the related
variables (e.g., ndeletable in _hash_vacuum_one_page()) to uint16?



Yeah, I thought about it too. What I saw is that there are other places that
would be good candidates for the same kind of change (see the int ntodelete
argument in gistXLogDelete() being assigned to gistxlogDelete.ntodelete
(uint16), for example).

So, what do you think about:

1) keeping this patch as it is (to "only" address the struct field and avoid a
possible future "useless" increase in padding size),
and
2) creating a new patch (once this one is committed) to align the types of
variables/arguments with the fields of the XLOG record structs where they
currently differ?
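
For illustration, here is a minimal standalone sketch of those two steps
(field layout, helper names and types are simplified and hypothetical; this
is not the actual PostgreSQL struct or the patch):

#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs referenced below. */
typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

/* Step 1: narrow the WAL record's tuple-count field to match the offsets. */
typedef struct xl_hash_vacuum_one_page_sketch
{
    TransactionId xid;
    uint16_t    ntuples;        /* was: int */
    /* target OffsetNumbers follow the struct data in the WAL record */
} xl_hash_vacuum_one_page_sketch;

/* Step 2 (follow-up patch): align the local counter that feeds the field. */
static uint16_t
count_deletable_sketch(const OffsetNumber *deletable, uint16_t ndeletable)
{
    xl_hash_vacuum_one_page_sketch rec;

    rec.xid = 0;
    rec.ntuples = ndeletable;   /* no implicit int -> uint16 narrowing */
    (void) deletable;
    return rec.ntuples;
}

int
main(void)
{
    OffsetNumber offsets[2] = {1, 2};

    return count_deletable_sketch(offsets, 2) == 2 ? 0 : 1;
}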

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-20 20:16:13 -0800, Andres Freund wrote:
> On 2023-01-18 14:05:35 +0100, David Geier wrote:
> > @Andres: will you take care of these changes and provide me with an updated
> > patch set so I can rebase the RDTSC changes?
> > Otherwise, I can also apply Tom suggestions to your patch set and send out
> > the complete patch set.
> 
> I'm planning to push most of my changes soon, had hoped to get to it a bit
> sooner, but ...

I pushed the int64-ification commits.


> If you have time to look at the pg_test_timing part, it'd be
> appreciated. That's a bit larger, and nobody has looked at it yet. So I'm a bit
> hesitant to push it.

I haven't yet pushed the pg_test_timing (nor its small prerequisite)
patch.

Thanks to Justin I've polished the pg_test_timing docs some.


I've attached those two patches. Feel free to include them in your series if
you want, then the CF entry (and thus cfbot) makes sense again...

Greetings,

Andres Freund
>From 2546f3000455a7086ea930986b294fd79024ea59 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 20 Jan 2023 15:31:54 -0800
Subject: [PATCH v9 1/2] instr_time: Add INSTR_TIME_SET_SECONDS(),
 INSTR_TIME_IS_LT()

INSTR_TIME_SET_SECONDS() is useful to calculate the end of a time-bound loop
without having to convert into time units (which is
costly). INSTR_TIME_IS_LT() can be used to check the loop condition.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/include/portability/instr_time.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index cc85138e21f..aab80effb00 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -15,6 +15,8 @@
  *
  * INSTR_TIME_IS_ZERO(t)			is t equal to zero?
  *
+ * INSTR_TIME_IS_LT(x, y)			x < y
+ *
  * INSTR_TIME_SET_ZERO(t)			set t to zero (memset is acceptable too)
  *
  * INSTR_TIME_SET_CURRENT(t)		set t to current time
@@ -22,6 +24,8 @@
  * INSTR_TIME_SET_CURRENT_LAZY(t)	set t to current time if t is zero,
  *	evaluates to whether t changed
  *
+ * INSTR_TIME_SET_SECONDS(t, s)		set t to s seconds
+ *
  * INSTR_TIME_ADD(x, y)x += y
  *
  * INSTR_TIME_SUBTRACT(x, y)		x -= y
@@ -122,6 +126,9 @@ pg_clock_gettime_ns(void)
 #define INSTR_TIME_SET_CURRENT(t) \
 	((t) = pg_clock_gettime_ns())
 
+#define INSTR_TIME_SET_SECONDS(t, s) \
+	((t).ticks = NS_PER_S * (s))
+
 #define INSTR_TIME_GET_NANOSEC(t) \
 	((int64) (t).ticks)
 
@@ -156,6 +163,9 @@ GetTimerFrequency(void)
 #define INSTR_TIME_SET_CURRENT(t) \
 	((t) = pg_query_performance_counter())
 
+#define INSTR_TIME_SET_SECONDS(t, s) \
+	((t).ticks = s * GetTimerFrequency())
+
 #define INSTR_TIME_GET_NANOSEC(t) \
 	((int64) ((t).ticks * ((double) NS_PER_S / GetTimerFrequency(
 
@@ -168,6 +178,8 @@ GetTimerFrequency(void)
 
 #define INSTR_TIME_IS_ZERO(t)	((t).ticks == 0)
 
+#define INSTR_TIME_IS_LT(x, y)	((x).ticks < (y).ticks)
+
 
 #define INSTR_TIME_SET_ZERO(t)	((t).ticks = 0)
 
-- 
2.38.0

>From a3149e3430c9594ca2ceb3039c954e729bd7c46e Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 16 Jan 2023 11:19:11 -0800
Subject: [PATCH v9 2/2] wip: report nanoseconds in pg_test_timing

This commit also updates pg_test_timing's documentation:

- compare EXPLAIN (ANALYZE, TIMING ON/OFF) instead of comparing performance of
  of statement with/without EXPLAIN ANALYZE
- explain the 2x overhead (due to two timestamp acquisitions per row)
- remove old section about old versions of linux - I couldn't update the
  numbers, and it's old enough nobody would care
---
 src/bin/pg_test_timing/pg_test_timing.c |  74 +--
 doc/src/sgml/ref/pgtesttiming.sgml  | 117 ++--
 2 files changed, 95 insertions(+), 96 deletions(-)

diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index c29d6f87629..e20718669a5 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -19,8 +19,8 @@ static void handle_args(int argc, char *argv[]);
 static uint64 test_timing(unsigned int duration);
 static void output(uint64 loop_count);
 
-/* record duration in powers of 2 microseconds */
-long long int histogram[32];
+/* record duration in powers of 2 nanoseconds */
+uint64		histogram[64];
 
 int
 main(int argc, char *argv[])
@@ -121,35 +121,48 @@ handle_args(int argc, char *argv[])
 static uint64
 test_timing(unsigned int duration)
 {
-	uint64		total_time;
-	int64		time_elapsed = 0;
 	uint64		loop_count = 0;
-	uint64		prev,
-cur;
+	instr_time	until_time,
+total_time;
 	instr_time	start_time,
-end_time,
-temp;
-
-	total_time = duration > 0 ? duration * INT64CONST(100) : 0;
+end_time;
+	instr_time	cur;
 
 	INSTR_TIME_SET_CURRENT(start_time);
-	cur = INSTR_TIME_GET_MICROSEC(start_time);
 
-	while (time_elapsed < total_time)
+	/*
+	 * To reduce loop overhead, check loop condition in instr_time domain.
+	 */
+	

Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
On 2023-01-20 22:50:37 -0600, Justin Pryzby wrote:
> On Fri, Jan 20, 2023 at 04:40:32PM -0800, Andres Freund wrote:
> > From 5a458d4584961dedd3f80a07d8faea66e57c5d94 Mon Sep 17 00:00:00 2001
> > From: Andres Freund 
> > Date: Mon, 16 Jan 2023 11:19:11 -0800
> > Subject: [PATCH v8 4/5] wip: report nanoseconds in pg_test_timing
> 
> >
> > -   The i7-860 system measured runs the count query in 9.8 ms while
> > -   the EXPLAIN ANALYZE version takes 16.6 ms, each
> > -   processing just over 100,000 rows.  That 6.8 ms difference means the 
> > timing
> > -   overhead per row is 68 ns, about twice what pg_test_timing estimated it
> > -   would be.  Even that relatively small amount of overhead is making the 
> > fully
> > -   timed count statement take almost 70% longer.  On more substantial 
> > queries,
> > -   the timing overhead would be less problematic.
> > +   The i9-9880H system measured shows an execution time of 4.116 ms for the
> > +   TIMING OFF query, and 6.965 ms for the
> > +   TIMING ON, each processing 100,000 rows.
> > +
> > +   That 2.849 ms difference means the timing overhead per row is 28 ns.  As
> > +   TIMING ON measures timestamps twice per row returned 
> > by
> > +   an executor node, the overhead is very close to what pg_test_timing
> > +   estimated it would be.
> > +
> > +   more than what pg_test_timing estimated it would be.  Even that 
> > relatively
> > +   small amount of overhead is making the fully timed count statement take
> > +   about 60% longer.  On more substantial queries, the timing overhead 
> > would
> > +   be less problematic.
> 
> I guess you intend to merge these two paragraphs ?

Oops. I was intending to drop the last paragraph.

Looking at the docs again I noticed that I needed to rephrase the 'acpi_pm'
section further, as I'd left the "a small multiple of what's measured directly
by this utility" language in there.

Do the changes otherwise make sense?

The "small multiple" stuff was just due to a) comparing "raw statement" with
explain analyze b) not accounting for two timestamps being taken per row.

I think it makes sense to remove the "jiffies" section - the output shown is
way outdated. And I don't think the jiffies time counter is something one
still sees in the wild, outside of bringing up a new CPU architecture or such.

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Justin Pryzby
On Fri, Jan 20, 2023 at 04:40:32PM -0800, Andres Freund wrote:
> From 5a458d4584961dedd3f80a07d8faea66e57c5d94 Mon Sep 17 00:00:00 2001
> From: Andres Freund 
> Date: Mon, 16 Jan 2023 11:19:11 -0800
> Subject: [PATCH v8 4/5] wip: report nanoseconds in pg_test_timing

>
> -   The i7-860 system measured runs the count query in 9.8 ms while
> -   the EXPLAIN ANALYZE version takes 16.6 ms, each
> -   processing just over 100,000 rows.  That 6.8 ms difference means the 
> timing
> -   overhead per row is 68 ns, about twice what pg_test_timing estimated it
> -   would be.  Even that relatively small amount of overhead is making the 
> fully
> -   timed count statement take almost 70% longer.  On more substantial 
> queries,
> -   the timing overhead would be less problematic.
> +   The i9-9880H system measured shows an execution time of 4.116 ms for the
> +   TIMING OFF query, and 6.965 ms for the
> +   TIMING ON, each processing 100,000 rows.
> +
> +   That 2.849 ms difference means the timing overhead per row is 28 ns.  As
> +   TIMING ON measures timestamps twice per row returned by
> +   an executor node, the overhead is very close to what pg_test_timing
> +   estimated it would be.
> +
> +   more than what pg_test_timing estimated it would be.  Even that relatively
> +   small amount of overhead is making the fully timed count statement take
> +   about 60% longer.  On more substantial queries, the timing overhead would
> +   be less problematic.

I guess you intend to merge these two paragraphs ?




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-18 14:02:48 +0100, David Geier wrote:
> On 1/16/23 18:37, Andres Freund wrote:
> > I'm doubtful this is worth the complexity it incurs. By the time we convert
> > out of the instr_time format, the times shouldn't be small enough that the
> > accuracy is affected much.
>
> I don't feel strongly about it, and you have a point that we most likely only
> convert once we've accumulated a fair amount of cycles.

I think we can avoid the issue another way. The inaccuracy comes from the
cycles_to_sec ending up very small, right? Right now your patch has (and
probably my old version similarly had):

cycles_to_sec = 1.0 / (tsc_freq * 1000);

I think it's better if we have one multiplier to convert cycles to nanoseconds
- that'll be a double comparatively close to 1. We can use that to implement
INSTR_TIME_GET_NANOSECONDS(). The conversion to microseconds then is just a
division by 1000 (which most compilers convert into a multiplication/shift
combo), and the conversions to milliseconds and seconds will be similar.

Because we'll never "wrongly" go into the "huge number" or "very small number"
ranges, that should provide sufficient precision? We'll of course still end up
with a very small number when converting a few nanoseconds to seconds, but
that's ok because it's the precision being asked for, instead of losing
precision in some intermediate representation.
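
For illustration, a minimal standalone sketch of that scheme (not the actual
patch; the helper names and the 3 GHz frequency are made up):

#include <stdint.h>
#include <stdio.h>

/* Single conversion factor, a double comparatively close to 1.0. */
static double ticks_to_ns_multiplier;   /* 1e9 / tsc_frequency_hz */

static void
init_ticks_conversion(double tsc_frequency_hz)
{
    ticks_to_ns_multiplier = 1e9 / tsc_frequency_hz;
}

static inline int64_t
ticks_get_nanoseconds(int64_t ticks)
{
    return (int64_t) (ticks * ticks_to_ns_multiplier);
}

static inline int64_t
ticks_get_microseconds(int64_t ticks)
{
    /* division by a constant; compilers turn this into a mul/shift combo */
    return ticks_get_nanoseconds(ticks) / 1000;
}

static inline double
ticks_get_seconds(int64_t ticks)
{
    /* a tiny result here is exactly the precision being asked for */
    return ticks_get_nanoseconds(ticks) / 1e9;
}

int
main(void)
{
    init_ticks_conversion(3.0e9);   /* pretend a 3 GHz TSC */
    printf("%lld ns, %lld us, %g s\n",
           (long long) ticks_get_nanoseconds(4500),
           (long long) ticks_get_microseconds(4500),
           ticks_get_seconds(4500));
    return 0;
}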


> > Looking around, most of the existing uses of INSTR_TIME_GET_MICROSEC()
> > actually accumulate themselves, and should instead keep things in the
> > instr_time format and convert later. We'd win more accuracy / speed that 
> > way.
> > 
> > I don't think the introduction of pg_time_usec_t was a great idea, but oh
> > well.
> Fully agreed. Why not replace pg_time_usec_t with instr_time in a separate
> patch?

pgbench used to use instr_time, but it was replaced by somebody thinking the
API is too cumbersome. Which I can't quite deny, even though I think the
specific change isn't great.

But yes, this should definitely be a separate patch.


Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-18 14:05:35 +0100, David Geier wrote:
> @Andres: will you take care of these changes and provide me with an updated
> patch set so I can rebase the RDTSC changes?
> Otherwise, I can also apply Tom suggestions to your patch set and send out
> the complete patch set.

I'm planning to push most of my changes soon, had hoped to get to it a bit
sooner, but ...

If you have time to look at the pg_test_timing part, it'd be
appreciated. That's a bit larger, and nobody has looked at it yet. So I'm a bit
hesitant to push it.

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-20 07:43:00 +0100, David Geier wrote:
> On 1/18/23 13:52, David Geier wrote:
> > On 1/16/23 21:39, Pavel Stehule wrote:
> > > 
> > > po 16. 1. 2023 v 21:34 odesílatel Tomas Vondra
> > >  napsal:
> > > 
> > >     Hi,
> > > 
> > >     there's minor bitrot in the Mkvcbuild.pm change, making cfbot
> > > unhappy.
> > > 
> > >     As for the patch, I don't have much comments. I'm wondering if
> > > it'd be
> > >     useful to indicate which timing source was actually used for EXPLAIN
> > >     ANALYZE, say something like:
> > > 
> > >      Planning time: 0.197 ms
> > >      Execution time: 0.225 ms
> > >      Timing source: clock_gettime (or tsc)
> > > 
> > > +1
> > 
> > I like the idea of exposing the timing source in the EXPLAIN ANALYZE
> > output.
> > It's a good tradeoff between inspectability and effort, given that RDTSC
> > should always be better to use.
> > If there are no objections I go this way.
> Thinking about this a little more made me realize that this will cause
> different pg_regress output depending on the platform. So if we go this
> route we would at least need an option for EXPLAIN ANALYZE to disable it. Or
> rather have it disabled by default and allow for enabling it. Thoughts?

The elapsed time is already inherently unstable, so we shouldn't have any test
output showing the time.

But I doubt showing it in every explain is a good idea - we use instr_time in
plenty of other places. Why show it in explain, but not in all those other
places?

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-19 11:47:49 +0100, David Geier wrote:
> > I also couldn't help and hacked a bit on the rdtsc pieces. I did figure out
> > how to do the cycles->nanosecond conversion with integer shift and multiply 
> > in
> > the common case, which does show a noticeable speedup. But that's for another
> > day.
> I also have code for that here. I decided against integrating it because we
> don't convert frequently enough to make it matter. Or am I missing
> something?

We do currently do the conversion quite frequently.  Admittedly I was
partially motivated by trying to get the per-loop overhead in pg_test_timing
down ;)

But I think it's a real issue. Places where we do, but shouldn't, convert:

- ExecReScan() - quite painful, we can end up with a lot of those
- InstrStopNode() - adds a good bit of overhead to simple
- PendingWalStats.wal_write_time - this is particularly bad because it happens
  within very contended code
- calls to pgstat_count_buffer_read_time(), pgstat_count_buffer_write_time() -
  they can be very frequent
- pgbench.c, as we already discussed
- pg_stat_statements.c
- ...

These all will get a bit slower when moving to a "variable" frequency.


What was your approach for avoiding the costly operation?  I ended up with an
integer multiplication + shift approximation for the floating point
multiplication (which in turn uses the inverse of the division by the
frequency). To allow for sufficient precision while also avoiding overflows, I
had to make that branch conditional, with a slow path for large numbers of
nanoseconds.
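
Roughly this shape, as a standalone sketch (illustrative only, with an
assumed 3 GHz TSC and made-up names; not the actual code from the patches):

#include <stdint.h>
#include <stdio.h>

#define TICKS_SHIFT 32

static uint64_t ticks_to_ns_mult;       /* (1e9 / tsc_freq_hz) * 2^TICKS_SHIFT */
static double   ticks_to_ns_double;     /* 1e9 / tsc_freq_hz */
static uint64_t fast_path_max_ticks;    /* largest tick count safe for fast path */

static void
init_ticks_conversion(double tsc_freq_hz)
{
    ticks_to_ns_double = 1e9 / tsc_freq_hz;
    ticks_to_ns_mult =
        (uint64_t) (ticks_to_ns_double * ((uint64_t) 1 << TICKS_SHIFT));
    fast_path_max_ticks = UINT64_MAX / ticks_to_ns_mult;
}

static inline uint64_t
ticks_to_ns(uint64_t ticks)
{
    /* common case: fixed-point multiply and shift, no floating point */
    if (ticks <= fast_path_max_ticks)
        return (ticks * ticks_to_ns_mult) >> TICKS_SHIFT;
    /* rare slow path: large tick counts would overflow the 64-bit product */
    return (uint64_t) (ticks * ticks_to_ns_double);
}

int
main(void)
{
    init_ticks_conversion(3.0e9);   /* assume a 3 GHz TSC */
    printf("%llu ns\n", (unsigned long long) ticks_to_ns(3000));    /* ~1000 ns */
    return 0;
}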


> > I fought a bit with myself about whether to send those patches in this 
> > thread,
> > because it'll take over the CF entry. But decided that it's ok, given that
> > David's patches should be rebased over these anyway?
> That's alright.
> Though, I would hope we attempt to bring your patch set as well as the RDTSC
> patch set in.

I think it'd be great - but I'm not sure we're there yet, reliability and
code-complexity wise.

I think it might be worth making the rdtsc aspect somewhat
measurable. E.g. allowing pg_test_timing to use both at the same time, and
having it compare the elapsed time from both sources of counters.
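
As a rough illustration of that idea (a standalone sketch assuming x86-64
with GCC/Clang and POSIX clock_gettime(); not a pg_test_timing patch), one
could sample both counters around the same busy loop and compare the elapsed
times:

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <x86intrin.h>      /* __rdtsc() */

static int64_t
clock_nanoseconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

int
main(void)
{
    /* assumed TSC frequency; a real tool would measure or read it */
    const double ticks_to_ns = 1e9 / 3.0e9;
    int64_t     start_ns = clock_nanoseconds();
    uint64_t    start_tsc = __rdtsc();
    volatile uint64_t sink = 0;

    for (uint64_t i = 0; i < 100 * 1000 * 1000; i++)
        sink += i;

    uint64_t    end_tsc = __rdtsc();
    int64_t     end_ns = clock_nanoseconds();

    printf("clock_gettime: %lld ns, rdtsc: %.0f ns (sink %llu)\n",
           (long long) (end_ns - start_ns),
           (end_tsc - start_tsc) * ticks_to_ns,
           (unsigned long long) sink);
    return 0;
}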

Greetings,

Andres Freund




Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-20 22:27:07 -0500, Tom Lane wrote:
> Andres Freund  writes:
> >> Perhaps an INSTR_TIME_ZERO() that could be assigned in variable definitions
> >> could give us the best of both worlds?
> 
> > I tried that in the attached 0005. I found that it reads better if I also 
> > add
> > INSTR_TIME_CURRENT(). If we decide to go for this, I'd roll it into 0001
> > instead, but I wanted to get agreement on it first.
> 
> -1 from here.  This forecloses the possibility that it's best to use more
> than one assignment to initialize the value, and the code doesn't read
> any better than it did before.

I think it does read a bit better, but it's a pretty small improvement. So
I'll leave this aspect be for now.

Thanks for checking.

Greetings,

Andres Freund




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-20 Thread Michael Paquier
On Fri, Jan 20, 2023 at 11:56:00AM +0100, Peter Eisentraut wrote:
> In your 0001 patch, most of the comment reformattings for location fields
> are no longer needed, so you should undo those.
> 
> The 0002 patch looks good.

Okay, I have gone through these two again and applied what I had.
0001 has been cleaned up to remove the extra comment moves for the
locations.  I have kept a few changes for some of the nodes, for
consistency with the other fields, in cases where most of the fields
at the end of the structures have to be marked with new node
attributes.  This made the style of the headers a bit more elegant,
IMV.

> I'll read the 0003 again more carefully.  I haven't studied the new 0004
> yet.

Thanks, again.  Rebased version attached.
--
Michael
From 977214a0c7b175efe0deae1847d7aa33ad66c120 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 21 Jan 2023 12:29:26 +0900
Subject: [PATCH v5 3/4] Support for automated query jumble with all Nodes

This applies query jumbling in a consistent way to all the Nodes,
including DDLs & friends.
---
 src/include/nodes/bitmapset.h |   2 +-
 src/include/nodes/nodes.h |   7 +
 src/include/nodes/parsenodes.h| 126 ++--
 src/include/nodes/primnodes.h | 269 
 src/backend/nodes/README  |   1 +
 src/backend/nodes/gen_node_support.pl | 105 +++-
 src/backend/nodes/meson.build |   2 +-
 src/backend/nodes/queryjumblefuncs.c  | 855 ++
 8 files changed, 509 insertions(+), 858 deletions(-)

diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 0dca6bc5fa..3d2225e1ae 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -50,7 +50,7 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
 
 typedef struct Bitmapset
 {
-	pg_node_attr(custom_copy_equal, special_read_write)
+	pg_node_attr(custom_copy_equal, special_read_write, no_query_jumble)
 
 	NodeTag		type;
 	int			nwords;			/* number of words in array */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 10752e8011..6ecd944a90 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -59,6 +59,8 @@ typedef enum NodeTag
  *
  * - no_copy_equal: Shorthand for both no_copy and no_equal.
  *
+ * - no_query_jumble: Does not support jumble() at all.
+ *
  * - no_read: Does not support nodeRead() at all.
  *
  * - nodetag_only: Does not support copyObject(), equal(), outNode(),
@@ -97,6 +99,11 @@ typedef enum NodeTag
  * - equal_ignore_if_zero: Ignore the field for equality if it is zero.
  *   (Otherwise, compare normally.)
  *
+ * - query_jumble_ignore: Ignore the field for the query jumbling.
+ *
+ * - query_jumble_location: Mark the field as a location to track.  This is
+ *   only allowed for integer fields that include "location" in their name.
+ *
  * - read_as(VALUE): In nodeRead(), replace the field's value with VALUE.
  *
  * - read_write_ignore: Ignore the field for read/write.  This is only allowed
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 89335d95e7..12da5c120e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -116,6 +116,11 @@ typedef uint64 AclMode;			/* a bitmask of privilege bits */
  *
  *	  Planning converts a Query tree into a Plan tree headed by a PlannedStmt
  *	  node --- the Query structure is not used by the executor.
+ *
+ *	  All the fields ignored for the query jumbling are not semantically
+ *	  significant (such as alias names), as is ignored anything that can
+ *	  be deduced from child nodes (else we'd just be double-hashing that
+ *	  piece of information).
  */
 typedef struct Query
 {
@@ -124,45 +129,47 @@ typedef struct Query
 	CmdType		commandType;	/* select|insert|update|delete|merge|utility */
 
 	/* where did I come from? */
-	QuerySource querySource;
+	QuerySource querySource pg_node_attr(query_jumble_ignore);
 
 	/*
 	 * query identifier (can be set by plugins); ignored for equal, as it
-	 * might not be set; also not stored
+	 * might not be set; also not stored.  This is the result of the query
+	 * jumble, hence ignored.
 	 */
-	uint64		queryId pg_node_attr(equal_ignore, read_write_ignore, read_as(0));
+	uint64		queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0));
 
 	/* do I set the command result tag? */
-	bool		canSetTag;
+	bool		canSetTag pg_node_attr(query_jumble_ignore);
 
 	Node	   *utilityStmt;	/* non-null if commandType == CMD_UTILITY */
 
 	/*
 	 * rtable index of target relation for INSERT/UPDATE/DELETE/MERGE; 0 for
-	 * SELECT.
+	 * SELECT.  This is ignored in the query jumble as unrelated to the
+	 * compilation of the query ID.
 	 */
-	int			resultRelation;
+	int			resultRelation pg_node_attr(query_jumble_ignore);
 
 	/* has aggregates in tlist or havingQual */
-	bool		hasAggs;
+	bool		hasAggs pg_node_attr(query_jumble_ignore);
 	/* has window functions in 

Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Tom Lane
Andres Freund  writes:
>> Perhaps an INSTR_TIME_ZERO() that could be assigned in variable definitions
>> could give us the best of both worlds?

> I tried that in the attached 0005. I found that it reads better if I also add
> INSTR_TIME_CURRENT(). If we decide to go for this, I'd roll it into 0001
> instead, but I wanted to get agreement on it first.

-1 from here.  This forecloses the possibility that it's best to use more
than one assignment to initialize the value, and the code doesn't read
any better than it did before.

regards, tom lane




Re: postgres_fdw uninterruptible during connection establishment / ProcSignalBarrier

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-14 14:45:06 +1300, Thomas Munro wrote:
> On Tue, Jan 10, 2023 at 3:05 PM Andres Freund  wrote:
> > On 2023-01-03 12:05:20 -0800, Andres Freund wrote:
> > > The attached patch adds libpq-be-fe-helpers.h and uses it in 
> > > libpqwalreceiver,
> > > dblink, postgres_fdw.
> >
> > > As I made libpq-be-fe-helpers.h handle reserving external fds,
> > > libpqwalreceiver now does so. I briefly looked through its users without
> > > seeing cases of leaking in case of errors - which would already have been 
> > > bad,
> > > since we'd already have leaked a libpq connection/socket.
> > >
> > >
> > > Given the lack of field complaints and the size of the required changes, I
> > > don't think we should backpatch this, even though it's pretty clearly 
> > > buggy
> > > as-is.
> >
> > Any comments on this? Otherwise I think I'll go with this approach.
> 
> +1.  Not totally convinced about the location but we are free to
> re-organise it any time, and the random CI failures are bad.

Cool.

Updated patch attached. I split it into multiple pieces.
1) A fix for [1], included here because I encountered it while testing
2) Introduction of libpq-be-fe-helpers.h
3) Convert dblink and postgres_fdw to the helper
4) Convert libpqwalreceiver.c to the helper

Even if we eventually decide to backpatch 3), we'd likely not backpatch 4), as
there's no bug (although perhaps the lack of FD handling could be called a
bug?).

There's also some light polishing, improving commit message, comments and
moving some internal helper functions to later in the file.


Greetings,

Andres Freund

[1] https://postgr.es/m/20230121011237.q52apbvlarfv6jm6%40awork3.anarazel.de
>From da9345151423180732d2928fe7fe6ff0b6a11d4d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 20 Jan 2023 18:27:05 -0800
Subject: [PATCH v3 1/4] Fix error handling in libpqrcv_connect()

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230121011237.q52apbvlarfv6...@awork3.anarazel.de
Backpatch:
---
 .../libpqwalreceiver/libpqwalreceiver.c   | 26 +++
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c40c6220db8..fefc8660259 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -175,10 +175,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	conn->streamConn = PQconnectStartParams(keys, vals,
 			 /* expand_dbname = */ true);
 	if (PQstatus(conn->streamConn) == CONNECTION_BAD)
-	{
-		*err = pchomp(PQerrorMessage(conn->streamConn));
-		return NULL;
-	}
+		goto bad_connection_errmsg;
 
 	/*
 	 * Poll connection until we have OK or FAILED status.
@@ -220,10 +217,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	} while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);
 
 	if (PQstatus(conn->streamConn) != CONNECTION_OK)
-	{
-		*err = pchomp(PQerrorMessage(conn->streamConn));
-		return NULL;
-	}
+		goto bad_connection_errmsg;
 
 	if (logical)
 	{
@@ -234,9 +228,9 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 		if (PQresultStatus(res) != PGRES_TUPLES_OK)
 		{
 			PQclear(res);
-			ereport(ERROR,
-	(errmsg("could not clear search path: %s",
-			pchomp(PQerrorMessage(conn->streamConn);
+			*err = psprintf(_("could not clear search path: %s"),
+			pchomp(PQerrorMessage(conn->streamConn)));
+			goto bad_connection;
 		}
 		PQclear(res);
 	}
@@ -244,6 +238,16 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	conn->logical = logical;
 
 	return conn;
+
+	/* error path, using libpq's error message */
+bad_connection_errmsg:
+	*err = pchomp(PQerrorMessage(conn->streamConn));
+
+	/* error path, error already set */
+bad_connection:
+	PQfinish(conn->streamConn);
+	pfree(conn);
+	return NULL;
 }
 
 /*
-- 
2.38.0

>From 61531d075fc385bd22f7ec4cdb38e87da8aaec6d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 20 Jan 2023 17:15:49 -0800
Subject: [PATCH v3 2/4] Add helper library for use of libpq inside the server
 environment

Currently dblink and postgres_fdw don't process interrupts during connection
establishment. Besides preventing query cancellations etc, this can lead to
undetected deadlocks, as global barriers are not processed.

Libpqwalreceiver in contrast, processes interrupts during connection
establishment. The required code is not trivial, so duplicating it into
additional places does not seem like a good option.

These aforementioned undetected deadlocks are the reason for the spate of CI
test failures in the FreeBSD 'test_running' step.

For now the helper library is just a header, as it needs to be linked into
each extension using libpq, and it seems too small to be worth adding a
dedicated static library for.

The conversion to the helper are 

Re: libpqrcv_connect() leaks PGconn

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-20 17:12:37 -0800, Andres Freund wrote:
> We have code like this in libpqrcv_connect():
> 
>   conn = palloc0(sizeof(WalReceiverConn));
>   conn->streamConn = PQconnectStartParams(keys, vals,
>   
>  /* expand_dbname = */ true);
>   if (PQstatus(conn->streamConn) == CONNECTION_BAD)
>   {
>   *err = pchomp(PQerrorMessage(conn->streamConn));
>   return NULL;
>   }
> 
> [try to establish connection]
> 
>   if (PQstatus(conn->streamConn) != CONNECTION_OK)
>   {
>   *err = pchomp(PQerrorMessage(conn->streamConn));
>   return NULL;
>   }
> 
> 
> Am I missing something, or are we leaking the libpq connection in case of
> errors?
> 
> It doesn't matter really for walreceiver, since it will exit anyway, but we
> also use libpqwalreceiver for logical replication, where it might?
> 
> 
> Seems pretty clear that we should do a PQfinish() before returning NULL? I
> lean towards thinking that this isn't worth backpatching given the current
> uses of libpq, but I could easily be convinced otherwise.
> 

It's a bit worse than I earlier thought: we use walrcv_connect() during CREATE
SUBSCRIPTION. One can easily exhaust file descriptors right now.  So I think
we need to fix this.

I also noticed the following in libpqrcv_connect, added in 11da97024abb:

if (logical)
{
PGresult   *res;

res = libpqrcv_PQexec(conn->streamConn,
  
ALWAYS_SECURE_SEARCH_PATH_SQL);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
PQclear(res);
ereport(ERROR,
(errmsg("could not clear search path: 
%s",

pchomp(PQerrorMessage(conn->streamConn);
}
PQclear(res);
}


Which doesn't seem quite right? The comment for the function says:

 * Returns NULL on error and fills the err with palloc'ed error message.

which this doesn't do. Of course we don't expect this to fail, but network
issues etc could still lead us to hit this case.  In this case we'll actually
have an open libpq connection around that we'll leak.


The attached patch fixes both issues.



It seems we don't have any tests for creating a subscription that fails during
connection establishment? That doesn't seem optimal - I guess there may have
been concern around portability of the error messages? I think we can control
for that in a TAP test, by failing to connect due to a non-existent database;
then the error is under our control, whereas e.g. an invalid hostname would
contain an error from gai_strerror().


Greetings,

Andres Freund
>From da9345151423180732d2928fe7fe6ff0b6a11d4d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 20 Jan 2023 18:27:05 -0800
Subject: [PATCH v3 1/4] Fix error handling in libpqrcv_connect()

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230121011237.q52apbvlarfv6...@awork3.anarazel.de
Backpatch:
---
 .../libpqwalreceiver/libpqwalreceiver.c   | 26 +++
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index c40c6220db8..fefc8660259 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -175,10 +175,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	conn->streamConn = PQconnectStartParams(keys, vals,
 			 /* expand_dbname = */ true);
 	if (PQstatus(conn->streamConn) == CONNECTION_BAD)
-	{
-		*err = pchomp(PQerrorMessage(conn->streamConn));
-		return NULL;
-	}
+		goto bad_connection_errmsg;
 
 	/*
 	 * Poll connection until we have OK or FAILED status.
@@ -220,10 +217,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	} while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);
 
 	if (PQstatus(conn->streamConn) != CONNECTION_OK)
-	{
-		*err = pchomp(PQerrorMessage(conn->streamConn));
-		return NULL;
-	}
+		goto bad_connection_errmsg;
 
 	if (logical)
 	{
@@ -234,9 +228,9 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 		if (PQresultStatus(res) != PGRES_TUPLES_OK)
 		{
 			PQclear(res);
-			ereport(ERROR,
-	(errmsg("could not clear search path: %s",
-			pchomp(PQerrorMessage(conn->streamConn);
+			*err = psprintf(_("could not clear search path: %s"),
+			pchomp(PQerrorMessage(conn->streamConn)));
+			goto bad_connection;
 		}
 		PQclear(res);
 	}
@@ -244,6 +238,16 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	conn->logical = logical;
 
 	return conn;
+
+	/* 

Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Matthias van de Meent
On Sat, 21 Jan 2023 at 02:28, Justin Pryzby  wrote:
>
> On Sat, Jan 21, 2023 at 01:51:28AM +0100, Matthias van de Meent wrote:
> > On Thu, 19 Jan 2023 at 06:47, Justin Pryzby  wrote:
> > >
> > > pg_stat_progress_copy was added in v14 (8a4f618e7, 9d2d45700).
> > >
> > > But if a command JOINs file_fdw tables, the progress report gets bungled
> > > up.  This will warn/assert during file_fdw tests.
> >
> > I don't know what to do with that other than disabling COPY progress
> > reporting for file_fdw, i.e. calls to BeginCopyFrom that don't supply
> > a pstate. This is probably the best option, because a table backed by
> > file_fdw would also interfere with COPY TO's progress reporting.
> >
> > Attached a patch that solves this specific issue in a
> > binary-compatible way. I'm not super happy about relying on behavior
> > of callers of BeginCopyFrom (assuming that users that run copy
> > concurrently will not provide a ParseState* to BeginCopyFrom), but it
> > is what it is.
>
> Thanks for looking.  Maybe another option is to avoid progress reporting
> in 2nd and later CopyFrom() if another COPY was already running in that
> backend.

Let me think about it. I think it would work, but I'm not sure that's
a great option - it adds backend state that we would need to add
cleanup handles for. But then again, COPY ... TO could use TRIGGER to
trigger actual COPY FROM statements, which would also break progress
reporting in a vanilla instance without extensions.

I'm not sure what the right thing to do is here.

> Would you do anything different in the master branch, with no
> compatibility constraints ?  I think the progress reporting would still
> be limited to one row per backend, not one per CopyFrom().

I think I would at least introduce another parameter to BeginCopyFrom
for progress reporting (instead of relying on pstate != NULL), like
how we have a bit in reindex_index's params->options that specifies
whether we want progress reporting (which is unset for parallel
workers iirc).

- Matthias




Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Justin Pryzby
On Sat, Jan 21, 2023 at 01:51:28AM +0100, Matthias van de Meent wrote:
> On Thu, 19 Jan 2023 at 06:47, Justin Pryzby  wrote:
> >
> > pg_stat_progress_copy was added in v14 (8a4f618e7, 9d2d45700).
> >
> > But if a command JOINs file_fdw tables, the progress report gets bungled
> > up.  This will warn/assert during file_fdw tests.
> 
> I don't know what to do with that other than disabling COPY progress
> reporting for file_fdw, i.e. calls to BeginCopyFrom that don't supply
> a pstate. This is probably the best option, because a table backed by
> file_fdw would also interfere with COPY TO's progress reporting.
> 
> Attached a patch that solves this specific issue in a
> binary-compatible way. I'm not super happy about relying on behavior
> of callers of BeginCopyFrom (assuming that users that run copy
> concurrently will not provide a ParseState* to BeginCopyFrom), but it
> is what it is.

Thanks for looking.  Maybe another option is to avoid progress reporting
in 2nd and later CopyFrom() if another COPY was already running in that
backend.

Would you do anything different in the master branch, with no
compatibility constraints ?  I think the progress reporting would still
be limited to one row per backend, not one per CopyFrom().

-- 
Justin




Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Matthias van de Meent
On Sat, 21 Jan 2023 at 02:04, Ted Yu  wrote:
>
>
>
> On Fri, Jan 20, 2023 at 4:51 PM Matthias van de Meent 
>  wrote:
>>
>> Attached a patch that solves this specific issue in a
>> binary-compatible way. I'm not super happy about relying on behavior
>> of callers of BeginCopyFrom (assuming that users that run copy
>> concurrently will not provide a ParseState* to BeginCopyFrom), but it
>> is what it is.
>
> Is it possible to check range_table / rteperminfos so that we don't introduce 
> the bool field ?

I think yes, but I'm not sure we can depend on rteperminfos to be set,
and the same for p_rtable. I also don't think it's a good idea for
code clarity: there is no good reason why the (un)availability of
either range_table or rteperminfos would allow progress reporting - it
would require additional extensive documentation around both the usage
sites and the field itself. Adding a well-named field provides a much
better experience in my opinion.

If someone were opposed to adding that field in backbranches I'm fine
with using one of these instead, assuming additional clear
documentation is added as well.

- Matthias




Re: Inconsistency in vacuum behavior

2023-01-20 Thread Nathan Bossart
On Mon, Jan 16, 2023 at 11:18:08AM +0300, Alexander Pyhalov wrote:
> Is it intended? Why don't we perform vacuum_is_permitted_for_relation()
> check for inheritors in expand_vacuum_rel()?

Since no lock is held on the partition, the calls to functions like
object_ownercheck() and pg_class_aclcheck() in
vacuum_is_permitted_for_relation() will produce cache lookup ERRORs if the
relation is concurrently dropped.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




libpqrcv_connect() leaks PGconn

2023-01-20 Thread Andres Freund
Hi,

We have code like this in libpqrcv_connect():

conn = palloc0(sizeof(WalReceiverConn));
conn->streamConn = PQconnectStartParams(keys, vals,

 /* expand_dbname = */ true);
if (PQstatus(conn->streamConn) == CONNECTION_BAD)
{
*err = pchomp(PQerrorMessage(conn->streamConn));
return NULL;
}

[try to establish connection]

if (PQstatus(conn->streamConn) != CONNECTION_OK)
{
*err = pchomp(PQerrorMessage(conn->streamConn));
return NULL;
}


Am I missing something, or are we leaking the libpq connection in case of
errors?

It doesn't matter really for walreceiver, since it will exit anyway, but we
also use libpqwalreceiver for logical replication, where it might?


Seems pretty clear that we should do a PQfinish() before returning NULL? I
lean towards thinking that this isn't worth backpatching given the current
uses of libpq, but I could easily be convinced otherwise.


Noticed while taking another look through [1].

Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de




Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 4:51 PM Matthias van de Meent <
boekewurm+postg...@gmail.com> wrote:

> On Thu, 19 Jan 2023 at 06:47, Justin Pryzby  wrote:
> >
> > pg_stat_progress_copy was added in v14 (8a4f618e7, 9d2d45700).
> >
> > But if a command JOINs file_fdw tables, the progress report gets bungled
> > up.  This will warn/assert during file_fdw tests.
>
> I don't know what to do with that other than disabling COPY progress
> reporting for file_fdw, i.e. calls to BeginCopyFrom that don't supply
> a pstate. This is probably the best option, because a table backed by
> file_fdw would also interfere with COPY TO's progress reporting.
>
> Attached a patch that solves this specific issue in a
> binary-compatible way. I'm not super happy about relying on behavior
> of callers of BeginCopyFrom (assuming that users that run copy
> concurrently will not provide a ParseState* to BeginCopyFrom), but it
> is what it is.
>
> Kind regards,
>
> Matthias van de Meent
>
Hi,
In `BeginCopyFrom`, I see the following :

if (pstate)
{
cstate->range_table = pstate->p_rtable;
cstate->rteperminfos = pstate->p_rteperminfos;

Is it possible to check range_table / rteperminfos so that we don't
introduce the bool field ?

Cheers


Re: bug: copy progress reporting of backends which run multiple COPYs

2023-01-20 Thread Matthias van de Meent
On Thu, 19 Jan 2023 at 06:47, Justin Pryzby  wrote:
>
> pg_stat_progress_copy was added in v14 (8a4f618e7, 9d2d45700).
>
> But if a command JOINs file_fdw tables, the progress report gets bungled
> up.  This will warn/assert during file_fdw tests.

I don't know what to do with that other than disabling COPY progress
reporting for file_fdw, i.e. calls to BeginCopyFrom that don't supply
a pstate. This is probably the best option, because a table backed by
file_fdw would also interfere with COPY TO's progress reporting.

Attached a patch that solves this specific issue in a
binary-compatible way. I'm not super happy about relying on behavior
of callers of BeginCopyFrom (assuming that users that run copy
concurrently will not provide a ParseState* to BeginCopyFrom), but it
is what it is.

Kind regards,

Matthias van de Meent


v1-0001-Only-report-progress-if-we-re-actually-in-a-parse.patch
Description: Binary data


Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-17 10:50:53 -0800, Andres Freund wrote:
> On 2023-01-17 12:26:57 -0500, Tom Lane wrote:
> > > 0001) Add INSTR_TIME_SET_ZERO() calls where otherwise 0002 causes gcc to
> > >   warn
> > >   Alternatively we can decide to deprecate INSTR_TIME_SET_ZERO() and
> > >   just allow to assign 0.
> >
> > I think it's probably wise to keep the macro.  If we ever rethink this
> > again, we'll be glad we kept it.  Similarly, IS_ZERO is a good idea
> > even if it would work with just compare-to-zero.
>
> Perhaps an INSTR_TIME_ZERO() that could be assigned in variable definitions
> could give us the best of both worlds?

I tried that in the attached 0005. I found that it reads better if I also add
INSTR_TIME_CURRENT(). If we decide to go for this, I'd roll it into 0001
instead, but I wanted to get agreement on it first.

Comments?


> > I'm almost tempted to suggest you define instr_time as a struct with a
> > uint64 field, just to help keep us honest about that.
>
> I can see that making sense. Unless somebody pipes up with opposition to that
> plan soon, I'll see how it goes.

Done in the attached. I think it looks good. Actually found a type confusion
buglet in 0004, so the type safety benefit is noticeable.

It does require a new INSTR_TIME_IS_LT() for the loop exit condition in 0004,
but that seems fine.


Besides cosmetic stuff I also added back the cast to double in Windows'
INSTR_TIME_GET_NANOSEC() - I think there's an overflow danger without it.

We should make this faster by pre-computing
  (double) NS_PER_S / GetTimerFrequency()
once, as that'd avoid doing the slow division on every conversion. But
that's an old issue and thus better tackled separately.

Greetings,

Andres Freund
>From dfea7f159b79aaf1fb6f4c8617614f6fe5c1176f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 16 Jan 2023 10:04:42 -0800
Subject: [PATCH v8 1/5] Zero initialize instr_time uses causing compiler
 warnings

These are all not necessary from a correctness POV. However, in a subsequent
patch instr_time will be simplified to an int64, at which point gcc would
otherwise start to warn about the changed places.

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230116023639.rn36vf6ajqmfc...@awork3.anarazel.de
Backpatch:
---
 src/backend/access/transam/xlog.c   | 4 
 src/backend/storage/buffer/bufmgr.c | 4 
 src/backend/storage/file/buffile.c  | 4 
 src/backend/storage/ipc/latch.c | 2 ++
 src/bin/psql/common.c   | 6 ++
 5 files changed, 20 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cc0d9a05d9f..fb4c860bdea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2191,6 +2191,8 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 /* Measure I/O timing to write WAL data */
 if (track_wal_io_timing)
 	INSTR_TIME_SET_CURRENT(start);
+else
+	INSTR_TIME_SET_ZERO(start);
 
 pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -8151,6 +8153,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	/* Measure I/O timing to sync the WAL file */
 	if (track_wal_io_timing)
 		INSTR_TIME_SET_CURRENT(start);
+	else
+		INSTR_TIME_SET_ZERO(start);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8075828e8a6..800a4248c95 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1017,6 +1017,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 
 			if (track_io_timing)
 INSTR_TIME_SET_CURRENT(io_start);
+			else
+INSTR_TIME_SET_ZERO(io_start);
 
 			smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
 
@@ -2902,6 +2904,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln)
 
 	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
+	else
+		INSTR_TIME_SET_ZERO(io_start);
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index c5464b6aa62..0a51624df3b 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -446,6 +446,8 @@ BufFileLoadBuffer(BufFile *file)
 
 	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
+	else
+		INSTR_TIME_SET_ZERO(io_start);
 
 	/*
 	 * Read whatever we can get, up to a full bufferload.
@@ -525,6 +527,8 @@ BufFileDumpBuffer(BufFile *file)
 
 		if (track_io_timing)
 			INSTR_TIME_SET_CURRENT(io_start);
+		else
+			INSTR_TIME_SET_ZERO(io_start);
 
 		bytestowrite = FileWrite(thisfile,
  file->buffer.data + wpos,
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index d79d71a8515..f4123e7de7e 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ 

Re: Rework of collation code, extensibility

2023-01-20 Thread Jeff Davis
On Fri, 2023-01-20 at 12:54 -0800, Jeff Davis wrote:
> Both of these are surprising, and I haven't investigated deeply yet.

It's just because autoconf defaults to -O2 and meson to -O3, at least
on my machine. It turns out that, at -O2, master and the refactoring
branch are even; but at -O3, both get faster, and the refactoring pulls
ahead by a few percentage points.

At least that's what's happening for en-US-x-icu on UTF-8 with my test
data set. I didn't see much of a difference in other situations, but I
didn't retest those other situations this time around.

We should still look into why disabling abbreviated keys improves
performance in some cases. Maybe we need a GUC for that?


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS






Re: Introduce a new view for checkpointer related stats

2023-01-20 Thread Bharath Rupireddy
On Fri, Dec 2, 2022 at 1:07 PM Drouvot, Bertrand
 wrote:
>
> Patch LGTM, marking it as Ready for Committer.

Had to rebase, attached v5 patch for further consideration.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From e705916c40649d13a66299159cc992d0572960f5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 20 Jan 2023 14:20:03 +
Subject: [PATCH v5] Introduce a new view for checkpointer related stats

pg_stat_bgwriter view currently reports checkpointer stats as well.
It is that way because historically checkpointer was part of
bgwriter until the commits 806a2ae and bf405ba, that went into
PG 9.2, separated them out.

It is time for us to separate checkpointer stats to its own view
called pg_stat_checkpointer.

Bump catalog version.
---
 doc/src/sgml/monitoring.sgml  | 110 +-
 src/backend/catalog/system_views.sql  |  18 +--
 .../utils/activity/pgstat_checkpointer.c  |   1 +
 src/backend/utils/adt/pgstatfuncs.c   |  19 +--
 src/include/catalog/pg_proc.dat   |  22 ++--
 src/include/pgstat.h  |   1 +
 src/test/recovery/t/029_stats_restart.pl  |   6 +-
 src/test/regress/expected/rules.out   |  17 +--
 src/test/regress/expected/stats.out   |  21 +++-
 src/test/regress/sql/stats.sql|  12 +-
 10 files changed, 156 insertions(+), 71 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e3a783abd0..54166d88fc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -494,6 +494,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_checkpointerpg_stat_checkpointer
+  One row only, showing statistics about the
+   checkpointer process's activity. See
+   
+   pg_stat_checkpointer for details.
+ 
+ 
+
  
   pg_stat_walpg_stat_wal
   One row only, showing statistics about WAL activity. See
@@ -3671,7 +3680,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
   
The pg_stat_bgwriter view will always have a
-   single row, containing global data for the cluster.
+   single row, containing data about the bgwriter of the cluster.
   
 
   
@@ -3691,97 +3700,138 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  
   
-   checkpoints_timed bigint
+   buffers_clean bigint
   
   
-   Number of scheduled checkpoints that have been performed
+   Number of buffers written by the background writer
   
  
 
  
   
-   checkpoints_req bigint
+   maxwritten_clean bigint
   
   
-   Number of requested checkpoints that have been performed
+   Number of times the background writer stopped a cleaning
+   scan because it had written too many buffers
   
  
 
  
   
-   checkpoint_write_time double precision
+   buffers_alloc bigint
   
   
-   Total amount of time that has been spent in the portion of
-   checkpoint processing where files are written to disk, in milliseconds
+   Number of buffers allocated
   
  
 
  
   
-   checkpoint_sync_time double precision
+   stats_reset timestamp with time zone
   
   
-   Total amount of time that has been spent in the portion of
-   checkpoint processing where files are synchronized to disk, in
-   milliseconds
+   Time at which these statistics were last reset
   
  
+
+   
+  
+
+ 
+
+ 
+  pg_stat_checkpointer
+
+  
+   pg_stat_checkpointer
+  
+
+  
+   The pg_stat_checkpointer view will always have a
+   single row, containing data about the checkpointer process of the cluster.
+  
 
+  
+   pg_stat_checkpointer View
+   
+
  
   
-   buffers_checkpoint bigint
+   Column Type
   
   
-   Number of buffers written during checkpoints
+   Description
   
  
+
 
+
  
   
-   buffers_clean bigint
+   timed_checkpoints bigint
   
   
-   Number of buffers written by the background writer
+   Number of scheduled checkpoints that have been performed
   
  
 
  
   
-   maxwritten_clean bigint
+   requested_checkpoints bigint
   
   
-   Number of times the background writer stopped a cleaning
-   scan because it had written too many buffers
+   Number of requested checkpoints that have been performed
   
  
 
  
   
-   buffers_backend bigint
+   checkpoint_write_time double precision
   
   
-   Number of buffers written directly by a backend
+   Total amount of time that has been spent in the portion of
+   checkpoint processing where files are written to disk, in milliseconds
   
  

Re: Introduce a new view for checkpointer related stats

2023-01-20 Thread Nathan Bossart
On Fri, Dec 02, 2022 at 08:36:38AM +0100, Drouvot, Bertrand wrote:
> Patch LGTM, marking it as Ready for Committer.

Unfortunately, this patch no longer applies.  Bharath, would you mind
posting a rebased version?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgindent vs variable declaration across multiple lines

2023-01-20 Thread Robert Haas
On Thu, Jan 19, 2023 at 8:31 PM Andres Freund  wrote:
> I know I can leave the variable initially uninitialized and then do a separate
> assignment, but that's not a great fix.

That's what I do.

If you pick names for all of your data types that are very very long
and wordy then you don't feel as bad about this, because you were
gonna need a line break anyway. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pgindent vs variable declaration across multiple lines

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-19 17:59:49 -0800, Andres Freund wrote:
> On 2023-01-19 20:43:44 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > There's a few places in the code that try to format a variable definition 
> > > like this
> >
> > > ReorderBufferChange *next_change =
> > > dlist_container(ReorderBufferChange, node, next);
> >
> > > but pgindent turns that into
> >
> > > ReorderBufferChange *next_change =
> > > dlist_container(ReorderBufferChange, node, next);
> >
> > Yeah, that's bugged me too.  I suspect that the triggering factor is
> > use of a typedef name within the assigned expression, but I've not
> > tried to run it to ground.
>
> It's not that - it happens even with just
> int frak =
> 1;
>
> since it doesn't happen for plain assignments, I think it's somehow related to
> code dealing with variable declarations.

Another fun one: pgindent turns

return (instr_time) {t.QuadPart};
into
return (struct instr_time)
{
t.QuadPart
};

Obviously it can be dealt with with a local variable, but ...

Greetings,

Andres Freund




feature request: IN clause optimized through append nodes with UNION ALL

2023-01-20 Thread Merlin Moncure
In the script below, the presence of an IN clause forces the internal
components of the UNION ALL clause to fully compute even though they are
fully optimizable.  = ANY doesn't have this issue, so I wonder if there is
any opportunity to convert the 'slow' variant (see below) to the 'fast'
variant. Thank you!

merlin


drop table a cascade;
drop table b cascade;
drop table c cascade;

create table a (a_id int  primary key);
create table b (b_id int primary key, a_id int references a);
create table c (c_id int primary key, b_id int references b);


insert into a select s from generate_series(1, 5) s;
insert into b select s, (s % 5 ) + 1 from generate_series(1, 10) s;
insert into c select s, (s % 10 ) + 1 from generate_series(1, 100)
s;

create index on b (a_id, b_id);
create index on c (b_id, c_id);

analyze a;
analyze b;
analyze c;


create temp table d (a_id int);
insert into d values (99);
insert into d values (999);
insert into d values ();
analyze d;

create or replace view v as
select * from a join b using(a_id) join c using(b_id)
union all select * from a join b using(a_id) join c using(b_id);

explain analyze select * from v where a_id in (select a_id from d);   --
this is slow
explain analyze select * from v where a_id = any(array(select a_id from
d)); -- this is fast


Re: pg_stats and range statistics

2023-01-20 Thread Tomas Vondra
Hi Egor,

While reviewing a patch improving join estimates for ranges [1] I
realized we don't show stats for ranges in pg_stats, and I recalled we
had this patch.

I rebased the v2, and I decided to take a stab at showing separate
histograms for lower/upper histogram bounds. I believe it makes it way
more readable, which is what pg_stats is about IMHO.

This simply adds two functions, accepting/producing anyarray - one for
lower bounds, one for upper bounds. I don't think it can be done with a
plain subquery (or at least I don't know how).
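
(To illustrate the idea with a concrete range type -- this is just a
conceptual sketch with made-up values, not the anyarray-based C functions
the patch actually adds:

-- split an array of ranges into independently sorted arrays of
-- lower and upper bounds
SELECT array_agg(lower(r) ORDER BY lower(r)) AS lower_bounds,
       array_agg(upper(r) ORDER BY upper(r)) AS upper_bounds
FROM unnest('{"[1,10)","[5,20)","[7,30)"}'::int4range[]) AS r;

The patch needs dedicated functions precisely because pg_statistic stores
the histogram as anyarray, where this kind of plain subquery is not
possible.)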

Finally, it renames the empty_range_frac to start with range_, per the
earlier discussion. I wonder if the new column names for lower/upper
bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
too long ...

regards

[1] https://commitfest.postgresql.org/41/3821/

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyFrom b339c0eab5616cec61e9d9e85398034861608d30 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 20 Jan 2023 20:50:41 +0100
Subject: [PATCH 1/3] Display length and bounds histograms in pg_stats

Values corresponding to STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM and
STATISTIC_KIND_BOUNDS_HISTOGRAM were not exposed to pg_stats when these
slot kinds were introduced in 918eee0c49.

This commit adds the missing fields to pg_stats.

TODO: catalog version bump
---
 doc/src/sgml/catalogs.sgml   | 32 
 src/backend/catalog/system_views.sql | 23 +++-
 src/include/catalog/pg_statistic.h   |  3 +++
 src/test/regress/expected/rules.out  | 26 +-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c1e4048054e..c8bd84c56eb 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9634,6 +9634,38 @@ SCRAM-SHA-256$iteration count:
User mapping specific options, as keyword=value strings
   
  
+
+ 
+  
+   range_length_histogram anyarray
+  
+  
+   A histogram of the lengths of non-empty and non-null range values of a
+   range type column. (Null for non-range types.)
+  
+ 
+
+ 
+  
+   empty_range_frac float4
+  
+  
+   Fraction of column entries whose values are empty ranges.
+   (Null for non-range types.)
+  
+ 
+
+ 
+  
+   range_bounds_histograms anyarray
+  
+  
+   Histograms of lower and upper bounds of non-empty, non-null ranges,
+   combined into a single array of range values. The lower and upper bounds
+   of each value correspond to the histograms of lower and upper bounds
+   respectively. (Null for non-range types.)
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b1..ccd6c7ffdb7 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -243,7 +243,28 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
 WHEN stakind3 = 5 THEN stanumbers3
 WHEN stakind4 = 5 THEN stanumbers4
 WHEN stakind5 = 5 THEN stanumbers5
-END AS elem_count_histogram
+END AS elem_count_histogram,
+CASE
+WHEN stakind1 = 6 THEN stavalues1
+WHEN stakind2 = 6 THEN stavalues2
+WHEN stakind3 = 6 THEN stavalues3
+WHEN stakind4 = 6 THEN stavalues4
+WHEN stakind5 = 6 THEN stavalues5
+END AS range_length_histogram,
+CASE
+WHEN stakind1 = 6 THEN stanumbers1[1]
+WHEN stakind2 = 6 THEN stanumbers2[1]
+WHEN stakind3 = 6 THEN stanumbers3[1]
+WHEN stakind4 = 6 THEN stanumbers4[1]
+WHEN stakind5 = 6 THEN stanumbers5[1]
+END AS empty_range_frac,
+CASE
+WHEN stakind1 = 7 THEN stavalues1
+WHEN stakind2 = 7 THEN stavalues2
+WHEN stakind3 = 7 THEN stavalues3
+WHEN stakind4 = 7 THEN stavalues4
+WHEN stakind5 = 7 THEN stavalues5
+END AS range_bounds_histograms
 FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
  JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
  LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 8770c5b4c60..10401dece0d 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -152,6 +152,9 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * data "kind" will appear in any particular slot.  Instead, search the
  * stakind fields to see if the desired data is available.  (The standard
  * function get_attstatsslot() may be used for this.)
+ *
+ * Note: The pg_stats view needs to be modified whenever a new slot kind is
+ * added to core.
  */
 
 /*
diff 

Re: almost-super-user problems that we haven't fixed yet

2023-01-20 Thread Robert Haas
On Fri, Jan 20, 2023 at 4:02 PM Nathan Bossart  wrote:
> On Fri, Jan 20, 2023 at 03:42:03PM -0500, Robert Haas wrote:
> > Thanks to you both. I have committed these patches.
>
> Thanks!  Does this need a catversion bump?

I was surprised by this question because I thought I'd included one.

But it turns out I didn't include that in the commit and it's still in
my working tree. *facepalm*

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-20 Thread Peter Geoghegan
On Fri, Jan 20, 2023 at 12:57 PM Robert Haas  wrote:
> It doesn't seem that way to me. What am I missing? In that case, the
> problem was a DROP TRIGGER command waiting behind autovacuum's lock
> and thus causing all new table locks to wait behind DROP TRIGGER's
> lock request. But it does not sound like that was a one-off event.

It's true that I cannot categorically state that it would have made
the crucial difference in this particular case. It comes down to two
factors:

1. How many attempts would any given amount of additional XID space
head room have bought them in practice? We can be all but certain that
the smallest possible number is 1, which is something.

2. Would that have been enough for relfrozenxid to be advanced in practice?

I think that it's likely that the answer to 2 is yes, since there was
no mention of bloat as a relevant factor at any point in the
postmortem. It was all about locking characteristics of antiwraparound
autovacuuming in particular, and its interaction with their
application. I think that they were perfectly okay with the autovacuum
cancellation behavior most of the time. In fact, I don't think that
there was any bloat in the table at all -- it was a really huge table
(likely an events table), and those tend to be append-only.

Even if I'm wrong about this specific case (we'll never know for
sure), the patch as written would be virtually guaranteed to make the
crucial differences in cases that I have seen up close. For example, a
case with TRUNCATE.

> It sounds like they used DROP TRIGGER pretty regularly. So I think this
> sounds like exactly the kind of case I was talking about, where
> autovacuums keep getting cancelled until we decide to stop cancelling
> them.

I don't know how you can reach that conclusion. The chances of a
non-aggressive VACUUM advancing relfrozenxid right now are virtually
zero, at least for a big table like this one. It seems quite likely
that plenty of non-aggressive autovacuums completed, or would have had
the insert-driven autovacuum feature been available.

The whole article was about how this DROP TRIGGER pattern worked just
fine most of the time, because most of the time autovacuum was just
autocancelled. They say this at one point:

"The normal autovacuum mechanism is skipped when locks are held in
order to minimize service disruption. However, because transaction
wraparound is such a severe problem, if the system gets too close to
wraparound, an autovacuum is launched that does not back off under
lock contention."

At another point:

"When the outage was resolved, we still had a number of questions: is
a wraparound autovacuum always so disruptive? Given that it was
blocking all table operations, why does it throttle itself?"

ISTM that it was a combination of aggressive vacuuming taking far
longer than usual (especially likely because this was pre freeze map),
and the no-auto-cancel behavior. Aggressive/antiwraparound VACUUMs are
naturally much more likely to coincide with periodic DDL, just because
they take so much longer. That is a dangerous combination.

-- 
Peter Geoghegan




Re: almost-super-user problems that we haven't fixed yet

2023-01-20 Thread Nathan Bossart
On Fri, Jan 20, 2023 at 03:42:03PM -0500, Robert Haas wrote:
> Thanks to you both. I have committed these patches.

Thanks!  Does this need a catversion bump?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-20 Thread Robert Haas
On Fri, Jan 20, 2023 at 3:43 PM Peter Geoghegan  wrote:
> The only reason why I'm using table age at all is because that's how
> it works already, rightly or wrongly. If nothing else, it's pretty
> clear that there is no theoretical or practical reason why it has to
> be exactly the same table age as the one for launching autovacuums to
> advance relfrozenxid/relminmxid. In v5 of the patch, the default is to
> use 1.8x of the threshold that initially makes autovacuum.c want to
> launch autovacuums to deal with table age. That's based on a
> suggestion from Andres, but I'd be almost as happy with a default as
> low as 1.1x or even 1.05x. That's going to make very little difference
> to those users that really rely on the no-auto-cancellation behavior,
> while at the same time making things a lot safer for scenarios like
> the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any
> means, but meaningfully safer).

It doesn't seem that way to me. What am I missing? In that case, the
problem was a DROP TRIGGER command waiting behind autovacuum's lock
and thus causing all new table locks to wait behind DROP TRIGGER's
lock request. But it does not sound like that was a one-off event. It
sounds like they used DROP TRIGGER pretty regularly. So I think this
sounds like exactly the kind of case I was talking about, where
autovacuums keep getting cancelled until we decide to stop cancelling
them.

If so, then they were going to have a problem whenever that happened.
Delaying the point at which we stop cancelling them would not help at
all, as your patch would do. What about stopping cancelling them
sooner, as with the proposal to switch to that behavior after a
certain number of auto-cancels? That doesn't prevent the problem
either. If it's aggressive enough, it has some chance of making the
problem visible in a well-run test environment, which could
conceivably prevent you from hitting it in production, but certainly
there are no guarantees.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Karl O. Pinc
On Fri, 20 Jan 2023 14:22:25 -0600
"Karl O. Pinc"  wrote:

> v8-0001-List-trusted-and-obsolete-extensions.patch
> 
> Instead of putting [trusted] and [obsolete] in the titles
> of the modules, like v7 does, add a list of them into the text.

The list is inline.  It might be worthwhile experimenting
with a tabular list, like that produced by:

  

But only for the list of trusted extensions.  There's not
enough obsolete extensions to do anything but inline.  (IMO)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Rework of collation code, extensibility

2023-01-20 Thread Jeff Davis
On Tue, 2023-01-17 at 14:18 -0800, Peter Geoghegan wrote:
> The second goal is a perfectly good enough goal on its own, and one
> that I am totally supportive of. Making the code clearer is icing on
> the cake.

Attached v8, which is just a rebase.

To reiterate: commitfest entry
https://commitfest.postgresql.org/41/3956/ is dependent on these
patches and is a big part of the motivation for refactoring.

> 
> I don't know. Quite possibly not. It would be nice to have some data
> on that, though.

I tested with hash aggregation, which might be more dependent on
pg_locale_deterministic() than sorting. I didn't see any significant
difference between master and the refactoring branch, so I don't see a
need to make that function "inline".

I also re-tested sorting and found some interesting results for en-US-
x-icu on a UTF-8 database (which is I suspect one of the most common
configurations for ICU):

  * the refactoring branch is now more than 5% faster, whether using
abbreviated keys or not
  * disabling abbreviated keys makes sorting 8-10% faster on both
master and the refactoring branch

Both of these are surprising, and I haven't investigated deeply yet.
Maybe something about LTO, some intervening patch, or I just made some
mistakes somewhere (I did this fairly quickly). But as of now, it
doesn't look like the refactoring patch hurts anything.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From d1e2e1757b043c876695b8fa8c304b5126efb3aa Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Mon, 5 Dec 2022 10:43:52 -0800
Subject: [PATCH v8 2/2] Refactor pg_locale_t routines.

  * add pg_locale_internal.h to hide pg_locale_struct
  * move info.lt into info.libc.lt to match icu
  * introduce init_default_locale()
  * introduce pg_locale_deterministic() accessor
  * make default_locale a static global in pg_locale.c
  * refactor pg_newlocale_from_collation()
---
 src/backend/access/hash/hashfunc.c |  82 +++---
 src/backend/commands/collationcmds.c   |   1 +
 src/backend/regex/regc_pg_locale.c |  45 ++--
 src/backend/utils/adt/formatting.c |  25 +-
 src/backend/utils/adt/like.c   |   3 +-
 src/backend/utils/adt/like_support.c   |   3 +-
 src/backend/utils/adt/pg_locale.c  | 342 +++--
 src/backend/utils/adt/varchar.c|  62 ++---
 src/backend/utils/adt/varlena.c|  14 +-
 src/backend/utils/init/postinit.c  |  29 ++-
 src/include/utils/pg_locale.h  |  55 +---
 src/include/utils/pg_locale_internal.h |  68 +
 12 files changed, 402 insertions(+), 327 deletions(-)
 create mode 100644 src/include/utils/pg_locale_internal.h

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index c0ed995919..7cbd39f466 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -282,36 +282,28 @@ hashtext(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
-	if (!mylocale || mylocale->deterministic)
+	if (pg_locale_deterministic(mylocale))
 	{
 		result = hash_any((unsigned char *) VARDATA_ANY(key),
 		  VARSIZE_ANY_EXHDR(key));
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			Size		bsize, rsize;
-			char	   *buf;
-			const char *keydata = VARDATA_ANY(key);
-			size_t		keylen = VARSIZE_ANY_EXHDR(key);
-
-			bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
-			buf = palloc(bsize);
-
-			rsize = pg_strnxfrm(buf, bsize, keydata, keylen, mylocale);
-			if (rsize != bsize)
-elog(ERROR, "pg_strnxfrm() returned unexpected result");
-
-			result = hash_any((uint8_t *) buf, bsize);
-
-			pfree(buf);
-		}
-		else
-#endif
-			/* shouldn't happen */
-			elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
+		Size		bsize, rsize;
+		char	   *buf;
+		const char *keydata = VARDATA_ANY(key);
+		size_t		keylen = VARSIZE_ANY_EXHDR(key);
+
+		bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
+		buf = palloc(bsize);
+
+		rsize = pg_strnxfrm(buf, bsize, keydata, keylen, mylocale);
+		if (rsize != bsize)
+			elog(ERROR, "pg_strnxfrm() returned unexpected result");
+
+		result = hash_any((uint8_t *) buf, bsize);
+
+		pfree(buf);
 	}
 
 	/* Avoid leaking memory for toasted inputs */
@@ -337,7 +329,7 @@ hashtextextended(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
-	if (!mylocale || mylocale->deterministic)
+	if (pg_locale_deterministic(mylocale))
 	{
 		result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
    VARSIZE_ANY_EXHDR(key),
@@ -345,30 +337,22 @@ hashtextextended(PG_FUNCTION_ARGS)
 	}
 	else
 	{
-#ifdef USE_ICU
-		if (mylocale->provider == COLLPROVIDER_ICU)
-		{
-			Size		bsize, rsize;
-			char	   *buf;
-			const char *keydata = VARDATA_ANY(key);
-			size_t		keylen = VARSIZE_ANY_EXHDR(key);
-
-			bsize = pg_strnxfrm(NULL, 0, keydata, keylen, mylocale);
-			buf = palloc(bsize);
-
-			rsize = pg_strnxfrm(buf, bsize, 

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-20 Thread Peter Geoghegan
On Fri, Jan 20, 2023 at 5:47 AM Robert Haas  wrote:
> Yeah, this is a major reason why I'm very leery about changes in this
> area. A lot of autovacuum behavior is emergent, in the sense that it
> wasn't directly intended by whoever wrote the code. It's just a
> consequence of other decisions that probably seemed very reasonable at
> the time they were made but turned out to have surprising and
> unpleasant consequences.

I certainly agree with your general description of the ways things
are. To a large degree we're living in a world where DBAs have already
compensated for some of the autovacuum shortcomings discussed on this
thread. For example, by setting autovacuum_vacuum_scale_factor (and
even autovacuum_vacuum_insert_scale_factor) to very low values, to
compensate for the issues with random sampling of dead tuples by
analyze, and to compensate for the way that VACUUM doesn't reason
correctly about how the number of dead tuples changes as VACUUM runs.
They might not have thought of it that way -- it could have happened
as a byproduct of tuning a production system through trial and error
-- but it still counts as compensating for a defect in autovacuum
scheduling IMV.

It's actually quite likely that even a strict improvement to (say)
autovacuum scheduling will cause some number of regressions, since now
what were effectively mitigations become unnecessary. This is somewhat
similar to the dynamic with optimizer improvements, where (say) a
selectivity estimate function that's better by every available metric
can still easily cause regressions that really cannot be blamed on the
improvement itself. I personally believe that it's a price worth
paying when it comes to the issues with autovacuum statistics,
particularly the dead tuple count issues. Since much of the behavior
that we sometimes see is just absurdly bad. We have both water tight
theoretical arguments and practical experiences pointing in that
direction.

> In this particular case, I think that there is a large risk that
> postponing auto-cancellation will make things significantly worse,
> possibly drastically worse, for a certain class of users -
> specifically, those whose vacuums often get auto-cancelled.

I agree that that's a real concern for the autocancellation side of
things. That seems quite different to the dead tuples accounting
issues, in that nobody would claim that the existing behavior is
flagrantly wrong (just that it sometimes causes problems instead of
preventing them).

> That's why I really liked your idea of decoupling auto-cancellation
> from XID age. Such an approach can still avoid disabling
> auto-cancellation just because autovacuum_freeze_max_age has been hit,
> but it can also disable it much earlier when it detects that doing so
> is necessary to make progress.

To be clear, I didn't think that that's what Andres was proposing, and
my recent v5 doesn't actually do that. Even in v5, it's still
fundamentally impossible for autovacuums that are triggered by the
tuples inserted or dead tuples thresholds to not be autocancellable.

ISTM that it doesn't make sense to condition the autocancellation
behavior on table XID age in the context of dead tuple VACUUMs. It
could either be way too early or way too late at that point. I was
rather hoping to not have to build the infrastructure required for
fully decoupling the autocancellation behavior from the triggering
condition (dead tuples vs table XID age) in the scope of this
thread/patch, though I can see the appeal of that.

The only reason why I'm using table age at all is because that's how
it works already, rightly or wrongly. If nothing else, it's pretty
clear that there is no theoretical or practical reason why it has to
be exactly the same table age as the one for launching autovacuums to
advance relfrozenxid/relminmxid. In v5 of the patch, the default is to
use 1.8x of the threshold that initially makes autovacuum.c want to
launch autovacuums to deal with table age. That's based on a
suggestion from Andres, but I'd be almost as happy with a default as
low as 1.1x or even 1.05x. That's going to make very little difference
to those users that really rely on the no-auto-cancellation behavior,
while at the same time making things a lot safer for scenarios like
the Joyent/Manta "DROP TRIGGER" outage (not perfectly safe, by any
means, but meaningfully safer).
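
(Rough worked example, not from the thread: assuming the stock
autovacuum_freeze_max_age default of 200 million and no per-table
override, the table age at which auto-cancellation would be disabled
under the patch's defaults comes out to roughly

    1.8x  -> ~360 million XIDs
    1.1x  -> ~220 million XIDs
    1.05x -> ~210 million XIDs

versus 200 million on HEAD, where hitting autovacuum_freeze_max_age and
becoming non-cancellable happen at the same point.)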

-- 
Peter Geoghegan




Re: almost-super-user problems that we haven't fixed yet

2023-01-20 Thread Robert Haas
On Fri, Jan 20, 2023 at 1:10 PM Nathan Bossart  wrote:
> > Thanks, this is fixed now with the latest patches.
>
> Thank you for reviewing.

Thanks to you both. I have committed these patches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Schema variables - new implementation for Postgres 15 (typo)

2023-01-20 Thread Dmitry Dolgov
I've accumulated another collection of various questions and comments. As a
side note, I'm getting a good feeling about this patch; the parts I've read so
far look good to me.

* I've suddenly realized one could use pseudo types for variables, and
  it not always works. E.g.:

=# create variable pseudo_array anyarray;
=# select pseudo_array;
 pseudo_array
--
 NULL

=# let pseudo_array = ARRAY[1, 2, 3];
ERROR:  42804: target session variable is of type anyarray but expression 
is of type integer[]
LOCATION:  svariableStartupReceiver, svariableReceiver.c:79

=# create variable pseudo_unknown unknown;
=# select pseudo_unknown;
ERROR:  XX000: failed to find conversion function from unknown to text
LOCATION:  coerce_type, parse_coerce.c:542

  Is it supposed to be like this, or something is missing?

* I think it was already mentioned in the thread, there seems to be not a
  single usage of CHECK_FOR_INTERRUPTS in session_variable.c . But some number
  of loops over the sessionvars are implemented, are they always going to be
  small enough to not make any troubles?

* sync_sessionvars_all explains why is it necessary to copy xact_recheck_varids:

 When we check the variables, the system cache can be 
invalidated,
 and xact_recheck_varids can be enhanced.

  I'm not quite following what the "enhancement" part is about. Could
  xact_recheck_varids somehow be updated concurrently with the loop?

* A small typo

diff --git a/src/backend/commands/session_variable.c 
b/src/backend/commands/session_variable.c
--- a/src/backend/commands/session_variable.c
+++ b/src/backend/commands/session_variable.c
@@ -485,7 +485,7 @@ sync_sessionvars_all(bool filter_lxid)

/*
 * When we check the variables, the system cache can be 
invalidated,
-* and xac_recheck_varids can be enhanced. We want to iterate
+* and xact_recheck_varids can be enhanced. We want to iterate

NOTE: The commentaries above were made based on the previous patch version, but
it looks like those aspects were not changed.




Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Karl O. Pinc
On Fri, 20 Jan 2023 20:12:38 +0100
Alvaro Herrera  wrote:

> Ah, I wanted to attach the two remaining patches and forgot. 

Attached are 2 alternatives:
(They touch separate files so the ordering is meaningless.)


v8-0001-List-trusted-and-obsolete-extensions.patch

Instead of putting [trusted] and [obsolete] in the titles
of the modules, like v7 does, add a list of them into the text.


v8-0002-Page-break-before-sect1-in-contrib-appendix-when-pdf.patch

This frobs the PDF style sheet so that when sect1 is used
in the appendix for the contrib directory, there is a page
break before every sect1.  This puts each module/extension
onto a separate page, but only for the contrib appendix.

Aside from hardcoding the "contrib" id, which I suppose isn't
too bad since it's publicly exposed as a HTML anchor (or URL 
component?) and unlikely to change, this also means that the 
contrib documentation can't use  instead of .

Sometimes I think I only know enough XSLT to get into trouble.
While v8 is "right", I can't say if it is a good idea/good practice.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 8816e06337..87833915e9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -84,6 +84,31 @@ CREATE EXTENSION extension_name;
   provide access to outside-the-database functionality.
  
 
+ These are the trusted extensions:
+   
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+
+ 
+
  
   Many extensions allow you to install their objects in a schema of your
   choice.  To do that, add SCHEMA
@@ -100,6 +125,15 @@ CREATE EXTENSION extension_name;
   component for details.
  
 
+ 
+   These modules and extensions are obsolete:
+
+   
+ 
+ 
+   
+ 
+
  
  
  
diff --git a/doc/src/sgml/stylesheet-fo.xsl b/doc/src/sgml/stylesheet-fo.xsl
index 0c4dff92c4..68a46f9e24 100644
--- a/doc/src/sgml/stylesheet-fo.xsl
+++ b/doc/src/sgml/stylesheet-fo.xsl
@@ -132,4 +132,12 @@
   
 
 
+
+
+
+  
+  
+
+
 


Re: Change xl_hash_vacuum_one_page.ntuples from int to uint16

2023-01-20 Thread Nathan Bossart
On Tue, Jan 10, 2023 at 11:08:33AM +0100, Drouvot, Bertrand wrote:
> While working on [1], I noticed that xl_hash_vacuum_one_page.ntuples is an 
> int.
> 
> Unless I'm missing something, It seems to me that it would make more sense to 
> use an uint16 (like this is done for
> gistxlogDelete.ntodelete for example).

I think that is correct.  This value is determined by looping through
offsets, which are uint16 as well.  Should we also change the related
variables (e.g., ndeletable in _hash_vacuum_one_page()) to uint16?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Question regarding "Make archiver process an auxiliary process. commit"

2023-01-20 Thread Nathan Bossart
On Wed, Jan 04, 2023 at 11:35:33AM +0530, Sravan Kumar wrote:
> I have added the thread to the commitfest: 
> https://commitfest.postgresql.org/42/
> Did you get a chance to review the patch? Please let me know if you
> need anything from my end.

This seems like a worthwhile simplification to me.  Ultimately, your patch
shouldn't result in any sort of significant behavior change, and I don't see
any reason to further complicate the timeout calculation.  The copy loop
will run any time the archiver's latch is set, and it'll wait up to 60
seconds otherwise.  As discussed upthread, it might be possible to remove
the timeout completely, but that probably deserves its own thread.

I noticed that time.h is no longer needed by the archiver, so I removed
that and fixed an indentation nitpick in the attached v2.  I'm going to set
the commitfest entry to ready-for-committer shortly after sending this
message.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a06609e839f039b7e7806456eaf4ee113cfabc3c Mon Sep 17 00:00:00 2001
From: Sravan Velagandula 
Date: Tue, 6 Dec 2022 06:21:38 -0500
Subject: [PATCH v2 1/1] simplify wait loop in the archiver

---
 src/backend/postmaster/pgarch.c | 29 +
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 8ecdb9ca23..6e28067596 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -25,7 +25,6 @@
  */
 #include "postgres.h"
 
-#include 
 #include 
 #include 
 
@@ -297,7 +296,6 @@ pgarch_waken_stop(SIGNAL_ARGS)
 static void
 pgarch_MainLoop(void)
 {
-	pg_time_t	last_copy_time = 0;
 	bool		time_to_stop;
 
 	/*
@@ -335,30 +333,21 @@ pgarch_MainLoop(void)
 
 		/* Do what we're here for */
 		pgarch_ArchiverCopyLoop();
-		last_copy_time = time(NULL);
 
 		/*
 		 * Sleep until a signal is received, or until a poll is forced by
-		 * PGARCH_AUTOWAKE_INTERVAL having passed since last_copy_time, or
-		 * until postmaster dies.
+		 * PGARCH_AUTOWAKE_INTERVAL, or until postmaster dies.
 		 */
 		if (!time_to_stop)		/* Don't wait during last iteration */
 		{
-			pg_time_t	curtime = (pg_time_t) time(NULL);
-			int			timeout;
-
-			timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);
-			if (timeout > 0)
-			{
-int			rc;
-
-rc = WaitLatch(MyLatch,
-			   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
-			   timeout * 1000L,
-			   WAIT_EVENT_ARCHIVER_MAIN);
-if (rc & WL_POSTMASTER_DEATH)
-	time_to_stop = true;
-			}
+			int			rc;
+
+			rc = WaitLatch(MyLatch,
+		   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+		   PGARCH_AUTOWAKE_INTERVAL * 1000L,
+		   WAIT_EVENT_ARCHIVER_MAIN);
+			if (rc & WL_POSTMASTER_DEATH)
+time_to_stop = true;
 		}
 
 		/*
-- 
2.25.1



Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Karl O. Pinc
On Fri, 20 Jan 2023 20:12:03 +0100
Alvaro Herrera  wrote:

> On 2023-Jan-20, Karl O. Pinc wrote:
> 
> > On Fri, 20 Jan 2023 12:42:31 +0100
> > Alvaro Herrera  wrote:  
> 
> > > Hmm, I didn't know that.  I guess I can put it back.  My own
> > > instinct is to put the most important stuff first, not last, but
> > > if research says to do otherwise, fine, let's do that.  
> > 
> > A quick google on the subject tells me that I can't figure out a
> > good quick google.  I believe it's from the book at bottom.
> > Memorability goes "end", "beginning", "middle".  IIRC.  
> 
> Ah well.  I just put it back the way you had it.
> 
> > > I hope somebody with more docbook-fu can comment: maybe
> > > there's a way to fix it more generally somehow?  
> > 
> > What would the general solution be?  
> 
> I don't know, I was thinking that perhaps at the start of the appendix
> we could have some kind of marker that says "in this chapter, the
> <sect1>s all get a page break", then a marker to stop that at the end
> of the appendix.  Or a tweak to the stylesheet, "when inside an
> appendix, all <sect1>s get a pagebreak", in a way that doesn't affect
> the other chapters.
> 
> The  solution looks really ugly to me (in the source
> code I mean), but I suppose if we discover no other way to do it, we
> could do it like that.

I can do a forced page break for sect1-s in the pdf stylesheet just 
for the contrib appendix (appendix F) by looking for a parent 
with an id of "contrib".  That would work, but seems like a kludge.
(Otherwise, you look for a parent of "appendix" and force the page
break in all appendixes.)

I'll send a patch.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-20 Thread Jacob Champion
On Fri, Jan 20, 2023 at 11:09 AM Jim Jones  wrote:
> Well, I am not suggesting to change the current behavior of PostgreSQL in
> that matter. Quite the contrary, I find this feature very convenient,
> specially when you need to deal with many different clusters. What I am
> proposing is rather the possibility to disable it on demand :) I mean,
> in case I do not want libpq to try to authenticate using the certificates
> in `~/.postgresql`.

I think the sslcertmode=disable option that I introduced in [1] solves
this issue too; would it work for your case? That whole patchset is
meant to tackle the general case of the problem you've described.
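
(For illustration only -- the host and database names here are made up, and
sslcertmode only exists with that patchset applied -- the client-side
opt-out would look roughly like:

    psql "host=db.example.com dbname=postgres sslmode=require sslcertmode=disable"

i.e. ~/.postgresql/postgresql.crt can stay where it is, but libpq is told
not to offer it for this particular connection.)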

(Eventually I'd like to teach the server not to ask for a client
certificate if it's not going to use it.)

> I do realize that this patch is a big ask, since probably nobody except
> me "needs it" :D

I'd imagine other people have run into it too; it's just a matter of
how palatable the workarounds were to them. :)

--Jacob

[1] 
https://www.postgresql.org/message-id/flat/CAAWbhmi4V9zEAvfUSCDFx1pOr3ZWrV9fuxkv_2maRqvyc-m9PQ%40mail.gmail.com#199c1f49fbefa6be401db35f5cfa7742




Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Alvaro Herrera
Ah, I wanted to attach the two remaining patches and forgot.  Here they
are.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
>From ba972984da37b5a315bd7a4face064d24ca41436 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 20 Jan 2023 12:32:22 +0100
Subject: [PATCH v7 1/2] Install [trusted] and [obsolete] markers in section
 title

I'm not sure about this part.
---
 doc/src/sgml/btree-gin.sgml   | 2 +-
 doc/src/sgml/btree-gist.sgml  | 2 +-
 doc/src/sgml/citext.sgml  | 2 +-
 doc/src/sgml/cube.sgml| 2 +-
 doc/src/sgml/dict-int.sgml| 2 +-
 doc/src/sgml/fuzzystrmatch.sgml   | 2 +-
 doc/src/sgml/hstore.sgml  | 2 +-
 doc/src/sgml/intagg.sgml  | 2 +-
 doc/src/sgml/intarray.sgml| 2 +-
 doc/src/sgml/isn.sgml | 2 +-
 doc/src/sgml/lo.sgml  | 2 +-
 doc/src/sgml/ltree.sgml   | 2 +-
 doc/src/sgml/pgcrypto.sgml| 2 +-
 doc/src/sgml/pgtrgm.sgml  | 2 +-
 doc/src/sgml/seg.sgml | 2 +-
 doc/src/sgml/tablefunc.sgml   | 2 +-
 doc/src/sgml/tcn.sgml | 2 +-
 doc/src/sgml/tsm-system-rows.sgml | 2 +-
 doc/src/sgml/tsm-system-time.sgml | 2 +-
 doc/src/sgml/unaccent.sgml| 2 +-
 doc/src/sgml/uuid-ossp.sgml   | 2 +-
 doc/src/sgml/xml2.sgml| 2 +-
 22 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/btree-gin.sgml b/doc/src/sgml/btree-gin.sgml
index 46117209ce..db96f02ba9 100644
--- a/doc/src/sgml/btree-gin.sgml
+++ b/doc/src/sgml/btree-gin.sgml
@@ -1,7 +1,7 @@
 
 
 
- btree_gin  GIN operator classes with B-tree behavior
+ btree_gin  GIN operator classes with B-tree behavior [trusted]
 
  
   btree_gin
diff --git a/doc/src/sgml/btree-gist.sgml b/doc/src/sgml/btree-gist.sgml
index 31e7c78aae..be14779a92 100644
--- a/doc/src/sgml/btree-gist.sgml
+++ b/doc/src/sgml/btree-gist.sgml
@@ -1,7 +1,7 @@
 
 
 
- btree_gist  GiST operator classes with B-tree behavior
+ btree_gist  GiST operator classes with B-tree behavior [trusted]
 
  
   btree_gist
diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 8322885661..ff8a98c21b 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -1,7 +1,7 @@
 
 
 
- citext  a case-insensitive character string type
+ citext  a case-insensitive character string type [trusted]
 
  
   citext
diff --git a/doc/src/sgml/cube.sgml b/doc/src/sgml/cube.sgml
index 0fb7080748..1998e6d81a 100644
--- a/doc/src/sgml/cube.sgml
+++ b/doc/src/sgml/cube.sgml
@@ -1,7 +1,7 @@
 
 
 
- cube  a multi-dimensional cube data type
+ cube  a multi-dimensional cube data type [trusted]
 
  
   cube (extension)
diff --git a/doc/src/sgml/dict-int.sgml b/doc/src/sgml/dict-int.sgml
index 8dd07b9bc1..293ba83ce6 100644
--- a/doc/src/sgml/dict-int.sgml
+++ b/doc/src/sgml/dict-int.sgml
@@ -2,7 +2,7 @@
 
 
  dict_int 
-   example full-text search dictionary for integers
+   example full-text search dictionary for integers [trusted]
 
  
   dict_int
diff --git a/doc/src/sgml/fuzzystrmatch.sgml b/doc/src/sgml/fuzzystrmatch.sgml
index 1a5ebbb22f..bf1caab54d 100644
--- a/doc/src/sgml/fuzzystrmatch.sgml
+++ b/doc/src/sgml/fuzzystrmatch.sgml
@@ -1,7 +1,7 @@
 
 
 
- fuzzystrmatch  determine string similarities and distance
+ fuzzystrmatch  determine string similarities and distance [trusted]
 
  
   fuzzystrmatch
diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml
index 7d93e49e91..b75c63f348 100644
--- a/doc/src/sgml/hstore.sgml
+++ b/doc/src/sgml/hstore.sgml
@@ -1,7 +1,7 @@
 
 
 
- hstore  hstore key/value datatype
+ hstore  hstore key/value datatype [trusted]
 
  
   hstore
diff --git a/doc/src/sgml/intagg.sgml b/doc/src/sgml/intagg.sgml
index 44a766eb4b..ce7b929a34 100644
--- a/doc/src/sgml/intagg.sgml
+++ b/doc/src/sgml/intagg.sgml
@@ -1,7 +1,7 @@
 
 
 
- intagg  integer aggregator and enumerator
+ intagg  integer aggregator and enumerator [obsolete]
 
  
   intagg
diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml
index c72d49b01d..9194e94c28 100644
--- a/doc/src/sgml/intarray.sgml
+++ b/doc/src/sgml/intarray.sgml
@@ -1,7 +1,7 @@
 
 
 
- intarray  manipulate arrays of integers
+ intarray  manipulate arrays of integers [trusted]
 
  
   intarray
diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml
index ea2aabc87d..a609969af9 100644
--- a/doc/src/sgml/isn.sgml
+++ b/doc/src/sgml/isn.sgml
@@ -1,7 +1,7 @@
 
 
 
- isn  data types for international standard numbers (ISBN, EAN, UPC, etc.)
+ isn  data types for international standard numbers (ISBN, EAN, UPC, etc.) [trusted]
 
  
   isn
diff --git a/doc/src/sgml/lo.sgml b/doc/src/sgml/lo.sgml
index 6d9bcebd42..511e576cac 100644
--- a/doc/src/sgml/lo.sgml
+++ b/doc/src/sgml/lo.sgml
@@ -1,7 +1,7 @@
 
 
 
- lo  manage large objects
+ lo  manage large objects [trusted]
 
  
   lo
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index bb66e33944..25d98e3f7d 100644
--- a/doc/src/sgml/ltree.sgml
+++ 

Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Alvaro Herrera
On 2023-Jan-20, Karl O. Pinc wrote:

> On Fri, 20 Jan 2023 12:42:31 +0100
> Alvaro Herrera  wrote:

> > Hmm, I didn't know that.  I guess I can put it back.  My own instinct
> > is to put the most important stuff first, not last, but if research
> > says to do otherwise, fine, let's do that.
> 
> A quick google on the subject tells me that I can't figure out a good
> quick google.  I believe it's from the book at bottom.  Memorability
> goes "end", "beginning", "middle".  IIRC.

Ah well.  I just put it back the way you had it.

> > I hope somebody with more docbook-fu can comment: maybe
> > there's a way to fix it more generally somehow?
> 
> What would the general solution be?

I don't know, I was thinking that perhaps at the start of the appendix
we could have some kind of marker that says "in this chapter, the
<sect1>s all get a page break", then a marker to stop that at the end of
the appendix.  Or a tweak to the stylesheet, "when inside an appendix,
all <sect1>s get a pagebreak", in a way that doesn't affect the other
chapters.

The  solution looks really ugly to me (in the source
code I mean), but I suppose if we discover no other way to do it, we
could do it like that.

> There could be a forced page break at the beginning of _every_ sect1.
> That seems a bit much, but maybe not.  The only other thing I can
> think of that's "general" would be to force a page break for sect1-s
> that are in an appendix.  Is any of this wanted?  (Or technically
> "better"?)

I wouldn't want to change the behavior of all the <sect1>s in the
whole documentation.  Though if you want to try and garner support to do
that, I won't oppose it, particularly since it only matters for PDF.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
 really, I see PHP as like a strange amalgamation of C, Perl, Shell
 inflex: you know that "amalgam" means "mixture with mercury",
   more or less, right?
 i.e., "deadly poison"




Re: Authentication fails for md5 connections if ~/.postgresql/postgresql.{crt and key} exist

2023-01-20 Thread Jim Jones

Hello Israel,

Thanks a lot for the suggestion!

> I do not think it is worth it to change the current behavior of PostgreSQL
> in that sense.

Well, I am not suggesting to change the current behavior of PostgreSQL in
that matter. Quite the contrary, I find this feature very convenient,
specially when you need to deal with many different clusters. What I am
proposing is rather the possibility to disable it on demand :) I mean,
in case I do not want libpq to try to authenticate using the certificates
in `~/.postgresql`.

> PostgreSQL looks for the cert and key under `~/.postgresql` as a facility.
> These files do not exist by default, so if PostgreSQL finds something in
> there it assumes you want to use it.

Yes. I'm just trying to find an elegant way to disable this assumption 
on demand.


> I also think it is correct in the sense of choosing the certificate over
> a password based authentication when it finds a certificate as the cert
> based would provide you with stronger checks.

I couldn't agree more.

> It would require that you move the SSL cert and key from `~/.postgresql` to
> somewhere else and specify `sslcert` and `sslkey` in the expected service in the
> `~/.pg_service.conf` file.

That's exactly what I am trying to avoid. IOW, I want to avoid having to move
the cert files to another path and consequently having to configure 30
different entries in the pg_service.conf because of a single server that
does not support ssl authentication.
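
(For illustration, that workaround would mean one block like the following
per cluster in ~/.pg_service.conf -- the service name and paths here are
made up:

    [cluster01]
    host=cluster01.example.com
    dbname=postgres
    sslcert=/home/jim/certs/cluster01/postgresql.crt
    sslkey=/home/jim/certs/cluster01/postgresql.key

repeated for each of the ~30 services.)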

I do realize that this patch is a big ask, since probably nobody except 
me "needs it" :D


Thanks again for the message. Much appreciated!

Best,

Jim


smime.p7s
Description: S/MIME Cryptographic Signature


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
Hi,


On Thursday, January 19, 2023 7:55 PM vignesh C  wrote:
> On Thu, 19 Jan 2023 at 12:06, Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > Updated the comment and the function call.
> >
> > Kindly have a look at the updated patch v17.
> 
> Thanks for the updated patch, few comments:
> 1) min_apply_delay was accepting values like '600 m s h', I was not sure if we
> should allow this:
> alter subscription sub1 set (min_apply_delay = ' 600 m s h');
> 
> +   /*
> +* If no unit was specified, then explicitly
> add 'ms' otherwise
> +* the interval_in function would assume 'seconds'.
> +*/
> +   if (strspn(tmp, "-0123456789 ") == strlen(tmp))
> +   val = psprintf("%sms", tmp);
> +   else
> +   val = tmp;
> +
> +   interval =
> DatumGetIntervalP(DirectFunctionCall3(interval_in,
> +
> 
> CStringGetDatum(val),
> +
> 
> ObjectIdGetDatum(InvalidOid),
> +
>   Int32GetDatum(-1)));
> 
FYI, the input can be accepted by the interval type.
Now we have changed the parameter's type from interval to integer,
but a unit can still be appended, as with recovery_min_apply_delay.
Please check.
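
(For illustration, with the integer-based approach the accepted forms would
look roughly like the following; the subscription name is made up, and the
exact set of accepted units follows recovery_min_apply_delay.)

    ALTER SUBSCRIPTION mysub SET (min_apply_delay = 300);     -- 300 ms, no unit
    ALTER SUBSCRIPTION mysub SET (min_apply_delay = '3min');  -- with a unit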


> 3) There is one check at parse_subscription_options and another check in
> AlterSubscription, this looks like a redundant check in case of alter
> subscription, can we try to merge and keep in one place:
> /*
> * The combination of parallel streaming mode and min_apply_delay is not
> * allowed.
> */
> if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> opts->min_apply_delay > 0)
> {
> if (opts->streaming == LOGICALREP_STREAM_PARALLEL) ereport(ERROR,
> errcode(ERRCODE_SYNTAX_ERROR), errmsg("%s and %s are mutually
> exclusive options",
>"min_apply_delay > 0", "streaming = parallel")); }
> 
> if (IsSet(opts.specified_opts, SUBOPT_MIN_APPLY_DELAY)) {
> /*
> * The combination of parallel streaming mode and
> * min_apply_delay is not allowed.
> */
> if (opts.min_apply_delay > 0)
> if ((IsSet(opts.specified_opts, SUBOPT_STREAMING) && opts.streaming ==
> LOGICALREP_STREAM_PARALLEL) ||
> (!IsSet(opts.specified_opts, SUBOPT_STREAMING) && sub->stream ==
> LOGICALREP_STREAM_PARALLEL))
> ereport(ERROR,
> errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("cannot enable %s for subscription in %s mode",
>"min_apply_delay", "streaming = parallel"));
> 
> values[Anum_pg_subscription_subminapplydelay - 1] =
> Int64GetDatum(opts.min_apply_delay);
> replaces[Anum_pg_subscription_subminapplydelay - 1] = true; }
We can't. For CREATE SUBSCRIPTION, we need to do the check
in parse_subscription_options, while for ALTER SUBSCRIPTION
we need to refer to the current MySubscription value for those tests
in AlterSubscription.

 
> 4) typo "execeeds" should be "exceeds"
> 
> +  time on the subscriber. Any overhead of time spent in
> logical decoding
> +  and in transferring the transaction may reduce the actual wait 
> time.
> +  It is also possible that the overhead already execeeds the
> requested
> +  min_apply_delay value, in which case no
> additional
> +  wait is necessary. If the system clocks on publisher and subscriber
> +  are not synchronized, this may lead to apply changes earlier
> + than
Fixed.

Kindly have a look at the v18 patch in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373BED9E390C4839AF56685EDC59%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-20 Thread Nathan Bossart
On Thu, Jan 19, 2023 at 05:07:11PM -0800, Andres Freund wrote:
> I'm somewhat dubious about allowing to return inside PG_CATCH, even if it's
> safe today.

+1.  Unless there are known use-cases, IMHO it'd be better to restrict it
to prevent future compatibility breaks as PG_TRY evolves.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
Hi, 


On Thursday, January 19, 2023 10:17 PM Amit Kapila  
wrote:
> On Thu, Jan 19, 2023 at 12:06 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > Kindly have a look at the updated patch v17.
> >
> 
> Can we try to optimize the test time for this test? On my machine, it is the
> second highest time-consuming test in src/test/subscription. It seems you are
> waiting twice for apply_delay and both are for streaming cases by varying the
> number of changes. I think it should be just once and that too for the
> non-streaming case. I think it would be good to test streaming code path
> interaction but not sure if it is important enough to have two test cases for
> apply_delay.
The first insert test is for the non-streaming case, and we need both cases
for coverage. Regarding the test time, I applied some optimizations
such as turning off the initial table sync, shortening the wait time, and so on.


> 
> One minor comment that I observed while going through the patch.
> + /*
> + * The combination of parallel streaming mode and min_apply_delay is
> + not
> + * allowed.
> + */
> + if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> + opts->min_apply_delay > 0)
> 
> I think it would be good if you can specify the reason for not allowing this
> combination in the comments.
Added.


Please have a look at the latest v18 patch in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373BED9E390C4839AF56685EDC59%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



Re: Named Operators

2023-01-20 Thread Gurjeet Singh
On Fri, Jan 20, 2023 at 9:32 AM Ted Yu  wrote:
>
> Since `validIdentifier` doesn't modify the contents of `name` string, it 
> seems that there is no need to create `tmp` string in `validNamedOperator`.
> You can pass the start and end offsets into the string (name) as second and 
> third parameters to `validIdentifier`.

Thanks for reviewing the patch!

I was making a temporary copy of the string, since I had to modify it
before the validation, whereas the callee expects a `const char*`. I
agree that the same check can be done with more elegance, while
eliminating the temporary allocation. Please find the updated patch
attached.

Instead of passing the start and end of region I want to check, as
suggested, I'm now passing just the length of the string I want
validated. But I think that's for the better, since it now aligns with
the comment that validIdentifier() does not check if the passed string
is shorter than NAMEDATALEN.

Best regards,
Gurjeet
http://Gurje.et


named_operators_v4.patch
Description: Binary data


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
On Friday, January 20, 2023 12:47 PM shveta malik  
wrote:
> 1)
> Tried different variations of altering 'min_apply_delay'. All passed except 
> one
> below:
> 
> postgres=# alter subscription mysubnew set (min_apply_delay = '10.9min
> 1ms'); ALTER SUBSCRIPTION postgres=# alter subscription mysubnew set
> (min_apply_delay = '10.9min 2s 1ms'); ALTER SUBSCRIPTION --very similar to
> above but fails, postgres=# alter subscription mysubnew set
> (min_apply_delay = '10.9s 1ms');
> ERROR:  invalid input syntax for type interval: "10.9s 1ms"
FYI, this was because the interval type couldn't accept this format.
But now we have changed the input format from interval to integer, aligned
with recovery_min_apply_delay. Thus, we don't face this issue anymore.


Best Regards,
Takamichi Osumi



Re: PG_SETMASK() archeology

2023-01-20 Thread Nathan Bossart
On Fri, Jan 13, 2023 at 02:00:05PM +1300, Thomas Munro wrote:
> The reason we introduced PG_SETMASK() in the first place was to
> support one particular system that was very slow to adopt the POSIX
> signals stuff:  NeXTSTEP 3.x.
> 
> From some time in the dark age before our current repo begins until
> '97 we used sigprocmask() freely.  Then commit a5494a2d added a
> sigsetmask() fallback for NeXTSTEP (that's a pre-standard function
> inherited from '80s BSD).  In 1999 we added the PG_SETMASK() macro to
> avoid repeating #ifdef HAVE_SIGPROCMASK to select between them at each
> call site (commit 47937403676).  I have no personal knowledge of those
> systems; I wonder if it was already effectively quite defunct while we
> were adding the macro, but I dunno (NS 4.x never shipped?, but its
> living descendent OSX had already shipped that year).
> 
> Then we invented a bogus reason to need the macro for a couple more
> decades: our Windows simulated signal layer accidentally implemented
> the old BSD interface instead of the standard one, as complained about
> in commit a65e0864.

I found this very interesting.  Thanks for sharing.

> That's all ancient history now, and I think we might as well drop the
> macro to make our source a tiny bit less weird for new players, with a
> slightly richer interface.  Trivial patch attached.

+1, LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
On Friday, January 20, 2023 5:54 PM shveta malik  wrote:
> On Fri, Jan 20, 2023 at 1:08 PM Peter Smith  wrote:
> 
> > a) the message should say that this is the *remaining* time to left to wait.
> >
> > b) it might be convenient to know from the log what was the original
> > min_apply_delay value in the 1st place.
> >
> > For example, the logs might look something like this:
> >
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 159972 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 142828 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 129994 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 110001 ms ...
> >
> 
> +1
> This will also help when min_apply_delay is set to a new value in between the
> current wait. Lets say, I started with min_apply_delay=5 min, when the worker
> was half way through this, I changed min_apply_delay to 3 min or say 10min, I
> see the impact of that change i.e. new wait-time is adjusted, but log becomes
> confusing. So, please keep this scenario as well in mind while improving
> logging.
Yes, now the change of min_apply_delay value can be detected
since I followed the format provided above. So, this scenario is also covered.



Best Regards,
Takamichi Osumi



Re: On login trigger: take three

2023-01-20 Thread Mikhail Gribkov
  Hi Pavel,

On Mon, Jan 16, 2023 at 9:10 AM Pavel Stehule 
wrote:

>
>
> On Sun, Jan 15, 2023 at 7:32 AM Pavel Stehule
> wrote:
>
>> Hi
>>
>>
>>> On Thu, Jan 12, 2023 at 9:51 AM Pavel Stehule 
>>> wrote:
>>>
 Hi

 I checked this patch and it looks well. All tests passed. Together with
 https://commitfest.postgresql.org/41/4013/ it can be a good feature.

 I re-tested impact on performance and for the worst case looks like
 less than 1% (0.8%). I think it is acceptable. Tested pgbench scenario
 "SELECT 1"

 pgbench -f ~/test.sql -C -c 3 -j 5 -T 100 -P10 postgres

 733 tps (master), 727 tps (patched).

 I think raising an exception inside should be better tested - not it is
 only in 001_stream_rep.pl - generally more tests are welcome - there
 are no tested handling exceptions.

>>>
>> Thank you
>>
>> check-world passed without problems
>> build doc passed without problems
>> I think so tests are now enough
>>
>> I'll mark this patch as ready for committer
>>
>
> Unfortunately, I  forgot one important point. There are not any tests
> related to backup.
>
> I miss pg_dump related tests.
>
> I mark this patch as waiting on the author.
>
>

Thanks for noticing this.
I have added sections to pg_dump tests. Attached v37 patch contains these
additions along with the fresh rebase on master.

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick






v37-On_client_login_event_trigger.patch
Description: Binary data


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
Hi,


On Friday, January 20, 2023 6:13 PM shveta malik  wrote:
> On Fri, Jan 20, 2023 at 2:23 PM shveta malik  wrote:
> >
> > On Fri, Jan 20, 2023 at 1:08 PM Peter Smith 
> wrote:
> >
> > > a) the message should say that this is the *remaining* time to left to 
> > > wait.
> > >
> > > b) it might be convenient to know from the log what was the original
> > > min_apply_delay value in the 1st place.
> > >
> > > For example, the logs might look something like this:
> > >
> > > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > > 16 ms. Remaining wait time: 159972 ms
> > > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > > 16 ms. Remaining wait time: 142828 ms
> > > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > > 16 ms. Remaining wait time: 129994 ms
> > > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > > 16 ms. Remaining wait time: 110001 ms ...
> > >
> >
> > +1
> > This will also help when min_apply_delay is set to a new value in
> > between the current wait. Lets say, I started with min_apply_delay=5
> > min, when the worker was half way through this, I changed
> > min_apply_delay to 3 min or say 10min, I see the impact of that change
> > i.e. new wait-time is adjusted, but log becomes confusing. So, please
> > keep this scenario as well in mind while improving logging.
> >
> 
> 
> when we send-feedback during apply-delay after every
> wal_receiver_status_interval , the log comes as:
> 023-01-19 17:12:56.000 IST [404795] DEBUG:  sending feedback (force 1) to
> recv 0/1570840, write 0/1570840, flush 0/1570840
> 
> Shall we have some info here to indicate that it is sent while waiting for
> apply_delay to distinguish it from other such send-feedback logs?
> It will
> make apply_delay flow clear in logs.
This additional log information has been added in the latest v18.
Kindly have a look at it in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373BED9E390C4839AF56685EDC59%40TYCPR01MB8373.jpnprd01.prod.outlook.com



Best Regards,
Takamichi Osumi



RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread Takamichi Osumi (Fujitsu)
On Friday, January 20, 2023 3:56 PM Peter Smith  wrote:
> Hi Osumi-san, here are my review comments for the latest patch v17-0001.
Thanks for your review !


> ==
> Commit Message
> 
> 1.
> Prohibit the combination of this feature and parallel streaming mode.
> 
> SUGGESTION (using the same wording as in the code comments) The
> combination of parallel streaming mode and min_apply_delay is not allowed.
Okay. Fixed.


> ==
> doc/src/sgml/ref/create_subscription.sgml
> 
> 2.
> + 
> +  By default, the subscriber applies changes as soon as possible.
> This
> +  parameter allows the user to delay the application of changes by a
> +  specified amount of time. If the value is specified without units, 
> it
> +  is taken as milliseconds. The default is zero (no delay).
> + 
> 
> Looking at this again, it seemed a  bit strange to repeat "specified"
> twice in 2 sentences. Maybe change one of them.
> 
> I’ve also suggested using the word "interval" because I don’t think docs yet
> mentioned anywhere (except in the example) that using intervals is possible.
> 
> SUGGESTION (for the 2nd sentence)
> This parameter allows the user to delay the application of changes by a given
> time interval.
Adopted.


> ~~~
> 
> 3.
> + 
> +  Any delay occurs only on WAL records for transaction begins after
> all
> +  initial table synchronization has finished. The delay is calculated
> +  between the WAL timestamp as written on the publisher and the
> current
> +  time on the subscriber. Any overhead of time spent in
> logical decoding
> +  and in transferring the transaction may reduce the actual wait 
> time.
> +  It is also possible that the overhead already execeeds the
> requested
> +  min_apply_delay value, in which case no
> additional
> +  wait is necessary. If the system clocks on publisher and subscriber
> +  are not synchronized, this may lead to apply changes earlier than
> +  expected, but this is not a major issue because this parameter is
> +  typically much larger than the time deviations between servers.
> Note
> +  that if this parameter is set to a long delay, the replication will
> +  stop if the replication slot falls behind the current LSN
> by more than
> +   linkend="guc-max-slot-wal-keep-size">max_slot_wal_keep_size literal>.
> + 
> 
> 3a.
> Typo "execeeds" (I think Vignesh reported this already)
Fixed.


> ~
> 
> 3b.
> SUGGESTION (for the 2nd sentence)
> BEFORE
> The delay is calculated between the WAL timestamp...
> AFTER
> The delay is calculated as the difference between the WAL timestamp...
Fixed.


> ~~~
> 
> 4.
> + 
> +   
> +Delaying the replication can mean there is a much longer
> time between making
> +a change on the publisher, and that change being
> committed on the subscriber.
> +v
> +See .
> +   
> + 
> 
> IMO maybe there is a better way to express the 2nd sentence:
> 
> BEFORE
> This can have a big impact on synchronous replication.
> AFTER
> This can impact the performance of synchronous replication.
Fixed.


> ==
> src/backend/commands/subscriptioncmds.c
> 
> 5. parse_subscription_options
> 
> @@ -324,6 +328,43 @@ parse_subscription_options(ParseState *pstate, List
> *stmt_options,
>   opts->specified_opts |= SUBOPT_LSN;
>   opts->lsn = lsn;
>   }
> + else if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> + strcmp(defel->defname, "min_apply_delay") == 0) {
> + char*val,
> +*tmp;
> + Interval   *interval;
> + int64 ms;
> 
> IMO 'delay_ms' (or similar) would be a friendlier variable name than just 'ms'
The variable name has been changed to one that is clearer for this feature.


> ~~~
> 
> 6.
> @@ -404,6 +445,20 @@ parse_subscription_options(ParseState *pstate, List
> *stmt_options,
>   "slot_name = NONE", "create_slot = false")));
>   }
>   }
> +
> + /*
> + * The combination of parallel streaming mode and min_apply_delay is
> + not
> + * allowed.
> + */
> + if (IsSet(supported_opts, SUBOPT_MIN_APPLY_DELAY) &&
> + opts->min_apply_delay > 0)
> + {
> + if (opts->streaming == LOGICALREP_STREAM_PARALLEL)
> ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR), errmsg("%s and %s are mutually
> + exclusive options",
> +"min_apply_delay > 0", "streaming = parallel")); }
> 
> This could be expressed as a single condition using &&, maybe also with the
> brackets eliminated. (Unless you feel the current code is more readable)
The current style is intentional. We feel the code is more readable.


> ~~~
> 
> 7.
> 
> + if (opts.min_apply_delay > 0)
> + if ((IsSet(opts.specified_opts, SUBOPT_STREAMING) && opts.streaming
> == LOGICALREP_STREAM_PARALLEL) ||
> + (!IsSet(opts.specified_opts, SUBOPT_STREAMING) && sub->stream ==
> LOGICALREP_STREAM_PARALLEL))
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + 

Re: Replace PROC_QUEUE / SHM_QUEUE with ilist.h

2023-01-20 Thread Robert Haas
On Thu, Jan 19, 2023 at 9:58 PM Andres Freund  wrote:
> Finally pushed, with some fairly minor additional cleanup. No more SHM_QUEUE,
> yay!

Nice. I won't miss it one bit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: run pgindent on a regular basis / scripted manner

2023-01-20 Thread Tom Lane
Andres Freund  writes:
> On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
>> The core problem here is that requiring that would translate to
>> requiring every code contributor to have a working copy of pg_bsd_indent.

> Wouldn't just every committer suffice?

Not if we have cfbot complaining about it.

(Another problem here is that there's a sizable subset of committers
who clearly just don't care, and I'm not sure we can convince them to.)

>> Yeah, if we switched to some other tool maybe we could reduce the size
>> of that problem.  But as Bruce replied, we've not found another one that
>> (a) can be coaxed to make output comparable to what we're accustomed to
>> and (b) seems decently well maintained.

> One question around this is how much change we'd accept. clang-format for
> example is well maintained and can get somewhat close to our style - but
> there are things that can't easily be approximated.

If somebody wants to invest the effort in seeing how close we can get
and what the remaining delta would look like, we could have a discussion
about whether it's an acceptable change.  I don't think anyone has
tried with clang-format.

regards, tom lane




Re: real/float example for testlibpq3

2023-01-20 Thread Robert Haas
On Fri, Jan 20, 2023 at 12:58 PM Tom Lane  wrote:
> I don't mind if you write something like
>
>   A float4 value is a 4-byte IEEE single-precision floating point
>   number.  It is transmitted in network byte order, so you must
>   convert to local byte order.  (C users can do this portably
>   using the standard ntohl() function.)
>
> but I'm not sure an example is worth more than such a parenthetical
> comment.  Perhaps others disagree, though.

I don't disagree with that.

I do think that when you suggested documenting this rather than just
adding some examples, you moved the goalposts a long way. If we're
going to add this to the documentation, it probably ought to cover
every data type we ship. Overall, I think that would be a better
result than just adding a few examples for the most common data types
to testlibpq*.c, but it's also substantially more work. I do agree
with you that if we're going to document this, as opposed to provide
examples, then a narrative style is more appropriate than a whole
bunch of small sample programs; maintaining working code in the
documentation seems like an annoying amount of maintenance and is
probably not the most efficient way to communicate useful information.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: almost-super-user problems that we haven't fixed yet

2023-01-20 Thread Nathan Bossart
On Fri, Jan 20, 2023 at 07:04:58PM +0530, tushar wrote:
> On 1/19/23 6:28 PM, tushar wrote:
>> There is one typo: for the doc changes, it mentions
>> "pg_use_reserved_backends" but I think it is supposed to be
>> "pg_use_reserved_connections"
>> under Table 22.1. Predefined Roles.
> 
> Thanks, this is fixed now with the latest patches.

Thank you for reviewing.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: real/float example for testlibpq3

2023-01-20 Thread Tom Lane
Mark Wong  writes:
> On Thu, Nov 03, 2022 at 09:55:22AM -0400, Tom Lane wrote:
>> Perhaps a new chapter under "IV. Client Interfaces" is the right
>> place?
>> 
>> If we wanted to get aggressive, we could move most of the nitpicky details
>> about datatype text formatting (e.g., the array quoting rules) there too.
>> I'm not set on that, but it'd make datatype.sgml smaller which could
>> hardly be a bad thing.

> I suppose figuring out exactly where to put it and how to mark it up, 
> etc., in a repeatable fashion is part of the job here.

> How does this look?

> I've simply moved things around into a new "Binary Format" section with
> the few parts that I've started for some quick feedback about whether
> this is looking like the right landing place.

I took a quick look at this. The patch cannot be applied as given
because it seems to be editing binary-format.sgml from some prior
state, but that file doesn't exist at all yet.  However, there's
enough visible in the patch to comment on.

Personally I'm not excited about supplying fragments of C code
like this.  Those seem quite useless to non-C users, and I'm not
really sure that they'd be very helpful even for C users, because
you have to make a whole lot of assumptions about what the user
wants to do with the value.  I think what *would* be helpful is
just plain prose documentation of the on-the-wire binary format.

I don't mind if you write something like

  A float4 value is a 4-byte IEEE single-precision floating point
  number.  It is transmitted in network byte order, so you must
  convert to local byte order.  (C users can do this portably
  using the standard ntohl() function.)

but I'm not sure an example is worth more than such a parenthetical
comment.  Perhaps others disagree, though.
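(For the record, the whole conversion is only a few lines of client code --
a sketch, assuming a binary-format result column and IEEE-754 floats on the
host:)

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <libpq-fe.h>

/* Decode a binary-format float4 column value from a PGresult. */
static float
get_float4(const PGresult *res, int row, int col)
{
    uint32_t    netval;
    uint32_t    hostval;
    float       out;

    /* 4 bytes, network byte order, IEEE single precision */
    memcpy(&netval, PQgetvalue(res, row, col), sizeof(netval));
    hostval = ntohl(netval);
    memcpy(&out, &hostval, sizeof(out));
    return out;
}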

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2023-01-20 Thread Andres Freund
Hi,

On 2023-01-20 12:09:05 -0500, Tom Lane wrote:
> Jelte Fennema  writes:
> > To me a master branch that pgindent never complains about sounds
> > amazing! And I personally think rejection of unindented pushes and
> > cfbot complaining about unindented patches would be a very good thing,
> > because that seems to be the only solution that could achieve that.
> 
> The core problem here is that requiring that would translate to
> requiring every code contributor to have a working copy of pg_bsd_indent.

Wouldn't just every committer suffice?


> Maybe that's not a huge lift, but it'd be YA obstacle to new contributors,
> and we don't need any more of those.
> 
> Yeah, if we switched to some other tool maybe we could reduce the size
> of that problem.  But as Bruce replied, we've not found another one that
> (a) can be coaxed to make output comparable to what we're accustomed to
> and (b) seems decently well maintained.

One question around this is how much change we'd accept. clang-format for
example is well maintained and can get somewhat close to our style - but
there are things that can't easily be approximated.

Greetings,

Andres Freund




Re: RFC: logical publication via inheritance root?

2023-01-20 Thread Jacob Champion
On 1/10/23 11:36, Jacob Champion wrote:
> 1) I'm playing around with a marker in pg_inherits, where the inhseqno
> is set to a sentinel value (0) for an inheritance relationship that
> has been marked for logical publication. The intent is that the
> pg_inherits helpers will prevent further inheritance relationships
> when they see that marker, and reusing inhseqno means we can make use
> of the existing index to do the lookups. An example:
> 
> =# CREATE TABLE root (a int);
> =# CREATE TABLE root_p1 () INHERITS (root);
> =# SELECT pg_set_logical_root('root_p1', 'root');
> 
> and then any data written to root_p1 gets replicated via root instead,
> if publish_via_partition_root = true. If root_p1 is set up with extra
> columns, they'll be omitted from replication.

First draft attached. (Due to some indentation changes, it's easiest to
read with --ignore-all-space.)

The overall strategy is
- introduce pg_set_logical_root, which sets the sentinel in pg_inherits,
- swap out any checks for partition parents with checks for logical
parents in the publishing code, and
- introduce the ability for a subscriber to perform an initial table
sync from multiple tables on the publisher.

> 2) While this strategy works well for ongoing replication, it's not
> enough to get the initial synchronization correct. The subscriber
> still does a COPY of the root table directly, missing out on all the
> logical descendant data. The publisher will have to tell the
> subscriber about the relationship somehow, and older subscriber
> versions won't understand how to use that (similar to how old
> subscribers can't correctly handle row filters).

I partially solved this by having the subscriber pull the logical
hierarchy from the publisher to figure out which tables to COPY. This
works when publish_via_partition_root=true, but it doesn't correctly
return to the previous behavior when the setting is false. I need to
check the publication setting from the subscriber, too, but that opens
up the question of what to do if two different publications conflict.

And while I go down that rabbit hole, I wanted to see if anyone thinks
this whole thing is unacceptable. :D

Thanks,
--Jacob

From 379bf99ea022203d428a4027da753a00a3989c04 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Mon, 26 Sep 2022 13:23:51 -0700
Subject: [PATCH] WIP: introduce pg_set_logical_root for use with pubviaroot

Allows regular inherited tables to be published via their root table,
just like partitions. This works by hijacking pg_inherit's inhseqno
column, and replacing a (single) existing entry for the child with the
value zero, indicating that it should be treated as a logical partition
by the publication machinery.

(For this to work correctly at the moment, the publication must set
publish_via_partition_root=true. See bugs below.)

Initial sync works by pulling in all logical descendants on the
subscriber, then COPYing them one-by-one into the root. The publisher
reuses the existing pubviaroot logic, adding the new logical roots to
code that previously looked only for partition roots.

Known bugs/TODOs:
- When pubviaroot is false, initial sync doesn't work correctly (it
  assumes pubviaroot is on and COPYs all descendants into the root).
- The pg_inherits machinery doesn't prohibit changes to inheritance
  after an entry has been marked as a logical root.
- I haven't given any thought to interactions with row filters, or to
  column lists, or to multiple publications with conflicting pubviaroot
  settings.
- pg_set_logical_root() doesn't check for table ownership yet. Anyone
  can muck with pg_inherits through it.
- I'm not sure that I'm taking all the necessary locks yet, and those I
  do take may be taken in the wrong order.
---
 src/backend/catalog/pg_inherits.c   | 154 ++
 src/backend/catalog/pg_publication.c|  30 +--
 src/backend/commands/publicationcmds.c  |  10 +
 src/backend/replication/logical/tablesync.c | 221 +---
 src/backend/replication/pgoutput/pgoutput.c |  54 ++---
 src/include/catalog/pg_inherits.h   |   2 +
 src/include/catalog/pg_proc.dat |   5 +
 src/test/regress/expected/publication.out   |  32 +++
 src/test/regress/sql/publication.sql|  25 +++
 src/test/subscription/t/013_partition.pl| 186 
 10 files changed, 608 insertions(+), 111 deletions(-)

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index da969bd2f9..c290a9936a 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -24,10 +24,13 @@
 #include "access/table.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_inherits.h"
+#include "catalog/partition.h"
 #include "parser/parse_type.h"
 #include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
+#include "utils/fmgrprotos.h"
+#include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
 #include 

Re: Named Operators

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 9:17 AM Gurjeet Singh  wrote:

> On Sat, Jan 14, 2023 at 6:14 AM Gurjeet Singh  wrote:
> >
> > I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
> > or the pairing token (e.g. {foo}) looks better aesthetically, so I am
> > okay with any of the following variations of the scheme, as well:
> >
> > \#foo\#  (tested; works)
> > \#foo#   (not tested; reduces ident length by 1)
> >
> > We can choose a different character, instead of #. Perhaps \{foo} !
>
> Please find attached the patch that uses \{foo} styled Named
> Operators. This is in line with Tom's reluctant hint at possibly using
> curly braces as delimiter characters. Since the curly braces are used
> by the SQL Specification for row pattern recognition, this patch
> proposes escaping the first of the curly braces.
>
> We can get rid of the leading backslash, if (a) we're confident that
> SQL committee will not use curly braces anywhere else, and (b) if
> we're confident that if/when Postgres supports Row Pattern Recognition
> feature, we'll be able to treat curly braces inside the PATTERN clause
> specially. Since both of those conditions are unlikely, I think we
> must settle for the escaped-first-curly-brace style for naming our
> operators.
>
> Keeping with the previous posts, here's a sample SQL script showing
> what the proposed syntax will look like in action. Personally, I
> prefer the \#foo style, since the \# prefix stands out among the text,
> better than \{..} does, and because # character is a better signal of
> an operator than {.
>
> create operator \{add_point}
> (function = box_add, leftarg = box, rightarg = point);
> create table test(a box);
> insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
> select a as original, a \{add_point} '(1,1)' as modified from test;
> drop operator \{add_point}(box, point);
>
> Best regards,
> Gurjeet
> http://Gurje.et


Hi,
Since `validIdentifier` doesn't modify the contents of the `name` string, it
seems that there is no need to create the `tmp` string in `validNamedOperator`.
You could pass the start and end offsets into the string (`name`) as second and
third parameters to `validIdentifier`.
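
Roughly like this (just a sketch to illustrate the idea; the range-based
signature and names are my assumption, not what the patch currently has):

/*
 * Range-based variant so validNamedOperator() need not palloc a copy.
 * Character classes mirror the scanner's ident_start / ident_cont rules.
 */
static bool
validIdentifierRange(const char *name, size_t start, size_t end)
{
    unsigned char c;
    size_t      i;

    if (start >= end)
        return false;

    /* first character must match ident_start */
    c = (unsigned char) name[start];
    if (!(c == '_' ||
          (c >= 'A' && c <= 'Z') ||
          (c >= 'a' && c <= 'z') ||
          c >= 0200))
        return false;

    /* remaining characters must match ident_cont */
    for (i = start + 1; i < end; i++)
    {
        c = (unsigned char) name[i];
        if (!(c == '_' || c == '$' ||
              (c >= '0' && c <= '9') ||
              (c >= 'A' && c <= 'Z') ||
              (c >= 'a' && c <= 'z') ||
              c >= 0200))
            return false;
    }

    return true;
}

bool
validNamedOperator(const char *name)
{
    size_t      len = strlen(name);

    if (len < 4 || len >= NAMEDATALEN)
        return false;

    if (name[0] != '\\' || name[1] != '{' || name[len - 1] != '}')
        return false;

    /* validate the bytes between "\{" and the trailing "}" in place */
    return validIdentifierRange(name, 2, len - 1);
}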

Cheers


Re: [PATCH] Teach planner to further optimize sort in distinct

2023-01-20 Thread Ankit Kumar Pandey




On 20/01/23 06:07, David Rowley wrote:



Looking at the patch, you've not added any additional tests.  If the
existing tests are all passing then that just tells me that either the
code is not functioning as intended or we have no tests that look at
the EXPLAIN output which can make use of this optimization. If the
former is true, then the patch needs to be fixed. If it's the latter
then you need to write new tests.


I definitely need to add tests because this scenario is missing.




I don't know all the repercussions. If you look at add_path() then
you'll see we do a pathkey comparison when the costs are not fuzzily
different from an existing path so that we try to keep a path with the
best pathkeys.  If we start keeping paths around with other weird
pathkeys that are not along the lines of the query_pathkeys requires,
then add_path might start throwing away paths that are actually good
for something.  It seems probable that could cause some regressions.


Okay, in that case I think it is a better idea to store the original pathkeys
(apart from what gets assigned by useful_pathkeys). I need to dig deeper into
this.



Does this patch actually work?  I tried:
I don't see that you're
adjusting IndexPath's pathkeys anywhere. 


I had removed the changes for indexPath (it was in v1) because I hadn't 
investigated
repercussions. But I failed to mention this.


The nested loop in
pathkeys_count_contained_in_unordered() seems to be inside out. The
reordered_pathkeys must have the common pathkeys in the order they
appear in keys2. In your patch, they'll be ordered according to the
keys1 list, which is wrong. Testing would tell you this, so all the
more reason to make the patch work and write some queries to ensure it
does actually work, then some tests to ensure that remains in a
working state.
Feel free to take the proper time to write a working patch which
contains new tests to ensure it's functioning as intended. It's
disheartening to review patches that don't seem to work. If it wasn't
meant to work, then you didn't make that clear.
Please don't rush out the next patch. Take your time and study the code 
and see if you can build up your own picture for what the repercussions 
might be of IndexPaths having additional pathkeys when they were previously empty.  
If you're uncertain of aspects of the patch you've written feel free to leave
XXX type comments to indicate this. That way the reviewer will know
you might need more guidance on that and you'll not forget yourself
when you come back and look again after a few weeks.


I deeply regret this. I will be mindful of my patches and ensure that they are
complete by themselves.
Thanks for your pointers as well, I can see errors in my approach which I will 
address.
 


I'll likely not be
able to do any further reviews until the March commitfest, so it might
be better to only post if you're stuck.  


Yes sure, I will work on patches and limit posts to discussion only (if 
blocked).

Thanks,
Ankit





Re: Named Operators

2023-01-20 Thread Gurjeet Singh
On Sat, Jan 14, 2023 at 6:14 AM Gurjeet Singh  wrote:
>
> I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
> or the pairing token (e.g. {foo}) looks better aesthetically, so I am
> okay with any of the following variations of the scheme, as well:
>
> \#foo\#  (tested; works)
> \#foo#   (not tested; reduces ident length by 1)
>
> We can choose a different character, instead of #. Perhaps \{foo} !

Please find attached the patch that uses \{foo} styled Named
Operators. This is in line with Tom's reluctant hint at possibly using
curly braces as delimiter characters. Since the curly braces are used
by the SQL Specification for row pattern recognition, this patch
proposes escaping the first of the curly braces.

We can get rid of the leading backslash, if (a) we're confident that
SQL committee will not use curly braces anywhere else, and (b) if
we're confident that if/when Postgres supports Row Pattern Recognition
feature, we'll be able to treat curly braces inside the PATTERN clause
specially. Since both of those conditions are unlikely, I think we
must settle for the escaped-first-curly-brace style for naming our
operators.

Keeping with the previous posts, here's a sample SQL script showing
what the proposed syntax will look like in action. Personally, I
prefer the \#foo style, since the \# prefix stands out among the text,
better than \{..} does, and because # character is a better signal of
an operator than {.

create operator \{add_point}
(function = box_add, leftarg = box, rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a \{add_point} '(1,1)' as modified from test;
drop operator \{add_point}(box, point);

Best regards,
Gurjeet
http://Gurje.et
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
 #include "catalog/pg_type.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/scansup.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
 	if (len == 0 || len >= NAMEDATALEN)
 		return false;
 
+	/* Is this a Named Operator? */
+	if (validNamedOperator(name))
+		return true;
+
 	/* Can't contain any invalid characters */
 	/* Test string here should match op_chars in scan.l */
 	if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..8587b82c8d 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,16 @@ self			[,()\[\].;\:\+\-\*\/\%\^\<\>\=]
 op_chars		[\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
 operator		{op_chars}+
 
+/*
+ * Named Operators, e.g. \{foo}
+ *
+ * {namedopfailed*} are error rules to avoid scanner backup when
+ * {namedop} fails to match its trailing tokens.
+ */
+namedop			\\\{{identifier}\}
+namedopfailed1	\\\{{identifier}
+namedopfailed2	\\\{
+
 /*
  * Numbers
  *
@@ -768,6 +778,23 @@ other			.
 }
 <>	{ yyerror("unterminated dollar-quoted string"); }
 
+{namedop}		{
+	SET_YYLLOC();
+	if (yyleng >= NAMEDATALEN)
+		yyerror("operator name too long");
+	/* XXX Should we support double-quoted, case sensitive names? */
+	yylval->str = downcase_identifier(yytext, yyleng, false, false);
+	return Op;
+}
+
+{namedopfailed1}	{
+	yyerror("unexpected token");
+}
+
+{namedopfailed2}	{
+	yyerror("unexpected token");
+}
+
 {xdstart}		{
 	SET_YYLLOC();
 	BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..05c46ae09e 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,70 @@ scanner_isspace(char ch)
 		return true;
 	return false;
 }
+
+/*
+ * validNamedOperator() -- return true if name adheres to the scanner rule
+ * {namedop}
+ */
+bool
+validNamedOperator(const char *name)
+{
+	size_t	len = strlen(name);
+	bool	valid_identifier;
+	char   *tmp;
+
+	if (len < 4 || len >= NAMEDATALEN)
+	   return false;
+
+	if (name[0] != '\\' || name[1] != '{' || name[len-1] != '}')
+		return false;
+
+	tmp = pstrdup(name);
+
+	// Disregard the delimiters
+	tmp[len-1] = '\0';
+	valid_identifier = validIdentifier(tmp + 2);
+	pfree(tmp);
+
+	return valid_identifier;
+}
+
+/*
+ * validIdentifier() -- return true if name adheres to the scanner rule
+ * {identifier}
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name)
+{
+	uint8	c;
+	size_t	i, len = strlen(name);
+
+	// Reject if first character is not part of ident_start
+	c = name[0];
+	if ( !(c == '_'
+		|| (c >='A' && c <= 'Z')
+		|| (c >='a' && c <= 'z')
+		|| (c >= 0200 && c <= 0377)))
+	{
+		return false;
+	}
+
+	// Reject if other characters are not part of 

Re: Experiments with Postgres and SSL

2023-01-20 Thread Vladimir Sitnikov
>You could just hard code that servers newer than a
> specific version would have this support

Suppose PostgreSQL 21 implements "fast TLS"
Suppose pgjdbc 43 supports "fast TLS"
Suppose PgBouncer 1.17.0 does not support "fast TLS" yet

If pgjdbc connects to the DB via a balancer, then the server would
respond with "server_version=21".
The balancer would forward "server_version", so the driver would
assume "fast TLS is supported".

In practice, fast TLS can't be used in that configuration, since the
connection will fail when the driver attempts to request
"fast TLS" from PgBouncer.

> Or it could be done with a "protocol option"

Would you please clarify what you mean by "protocol option"?

>I guess a lot depends on the way the driver works and the way the
> application is structured

There are cases when applications pre-create connections on startup,
so the faster connections are created the better.
The same case happens when the admin issues "reset connection pool",
so it discards old connections and creates new ones.
People rarely know all the knobs, so I would like to have a "fast by
default" design (e.g. server sending a notification "you may use fast
mode the next time")
rather than "keep old behaviour and require everybody to add fast=true
to their configuration" (e.g. users having to configure
"try_fast_tls_first=true")

Vladimir




Re: run pgindent on a regular basis / scripted manner

2023-01-20 Thread Tom Lane
Jelte Fennema  writes:
> To me a master branch that pgindent never complains about sounds
> amazing! And I personally think rejection of unindented pushes and
> cfbot complaining about unindented patches would be a very good thing,
> because that seems to be the only solution that could achieve that.

The core problem here is that requiring that would translate to
requiring every code contributor to have a working copy of pg_bsd_indent.
Maybe that's not a huge lift, but it'd be YA obstacle to new contributors,
and we don't need any more of those.

Yeah, if we switched to some other tool maybe we could reduce the size
of that problem.  But as Bruce replied, we've not found another one that
(a) can be coaxed to make output comparable to what we're accustomed to
and (b) seems decently well maintained.

regards, tom lane




Re: SLRUs in the main buffer pool, redux

2023-01-20 Thread Shawn Debnath
On Mon, Jul 25, 2022 at 11:54:36AM +0300, Heikki Linnakangas wrote:

> Oh I just saw that you had a comment about that in the patch and had hacked
> around it. Anyway, calling ResourceOwnerEnlargeBuffers() might be a
> solution. Or switch to a separate "CriticalResourceOwner" that's guaranteed
> to have enough pre-allocated space, before entering the critical section.

Wanted to bump this thread. Rishu on my team posted a patch in the other
SLRU thread [1] with the latest updates and fixes, and it looks like the
performance numbers do not show any regression. This change is currently in the
January commitfest [2] as well. Any feedback would be appreciated!

[1]
https://www.postgresql.org/message-id/A09EAE0D-0D3F-4A34-ADE9-8AC1DCBE7D57%40amazon.com
[2] https://commitfest.postgresql.org/41/3514/

Shawn 
Amazon Web Services (AWS)




Re: run pgindent on a regular basis / scripted manner

2023-01-20 Thread Bruce Momjian
On Fri, Jan 20, 2023 at 10:43:50AM +0100, Jelte Fennema wrote:
> Side-question: What's the reason why pgindent is used instead of some
> more "modern" code formatter that doesn't require keeping
> typedefs.list up to date for good looking output? (e.g. uncrustify or
> clang-format) Because that would also allow for easy editor
> integration.

Good question.  Our last big pgindent discussion was in 2017, where I
said:


https://www.postgresql.org/message-id/flat/20170612213525.GA4074%40momjian.us#a96eac96c147ebcc1de86fe2356a160d

Understood.  You would think that with the number of open-source
programs written in C that there would be more interest in C formatting
tools.  Is the Postgres community the only ones with specific
requirements, or is it just that we settled on an older tool and can't
easily change?  I have reviewed the C formatting options a few times
over the years and every time the other options were worse than what we
had.

We also discussed it in 2011, and this email was key for me:


https://www.postgresql.org/message-id/flat/201106220218.p5M2InB08144%40momjian.us#096cbcf02cb58c7d6c49bc79d2c79317

I am excited Andrew has done this.  It has been on my TODO list for a
while --- I was hoping someday we could switch to GNU indent but gave up
after the GNU indent report from Greg Stark that exactly matched my
experience years ago:


http://archives.postgresql.org/pgsql-hackers/2011-04/msg01436.php

Basically, GNU indent has new bugs, but bugs that are harder to work
around than the BSD indent bugs.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Non-superuser subscription owners

2023-01-20 Thread Robert Haas
On Fri, Jan 20, 2023 at 8:25 AM Robert Haas  wrote:
> I still think you're talking about a different problem here. I'm
> talking about the problem of knowing whether local files are going to
> be accessed by the connection string.

So here's a dumb patch for this. At least in my mind, the connection
string sanitization/validation is the major design problem here, and
I'm not at all sure that what I did in the attached patch is right.
But let's talk about that. This approach is inspired by Jeff's
comments about local file access upthread, but as Andres pointed out,
that's a completely different set of things than we worry about in
other places. I'm not quite sure what the right model is here.

This patch incidentally allows ALTER SUBSCRIPTION .. SKIP for any
subscription owner, removing the existing check that limits that
operation to superusers and replacing it with nothing. I can't really
see why this needs to be any more restricted than that, and
regrettably neither the check in the existing code nor the commit that
added it have any comments explaining the logic behind that check. If,
for example, skipping a subscription could lead to a server crash,
that would be a reason to restrict the feature to superusers (or
revert it). If it's just a case of the operation being maybe not the
right thing to do, that's not a sufficient reason to restrict it to
superusers. This change is really independent of the rest of the patch
and, if we want to do this, I will separate it into its own patch. But
since this is just for discussion, I didn't worry about that right
now.

Aside from the above, I don't yet see a problem here that I would
consider to be serious enough that we couldn't proceed. I'll try to
avoid too much repetition of what's already been said on this topic,
but I do want to add that I think that creating subscriptions is
properly viewed as a *slightly* scary operation, not a *very* scary
operation. It lets you do two things that you couldn't otherwise. One
is get background processes running that take up process slots and
consume resources -- but note that your ability to consume resources
with however many normal database connections you can make is
virtually unlimited. The other thing it lets you do is poke around the
network, maybe figure out whether some ports are open or closed, and
try to replicate data from any accessible servers you can find, which
could include ports or servers that you can't access directly. I think
that the superuser will be in a good position to evaluate whether that
is a risk in a certain environment or not, and I think many superusers
will conclude that it isn't a big risk. I think that the main
motivation for NOT handing out pg_create_subscription will turn out to
be administrative rather than security-related i.e. they'll want to be
something that falls under their authority rather than someone else's.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Add-new-predefined-role-pg_create_subscriptions.patch
Description: Binary data


Re: Experiments with Postgres and SSL

2023-01-20 Thread Greg Stark
On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov
 wrote:
>
> Do you think the server can de-support the old code path soon?

I don't have any intention to de-support anything. I really only
picture it being an option in environments where the client and server
are all part of a stack controlled by a single group. User tools and
general purpose tools are better served by our current more flexible
setup.

> Just wondering: do you consider back-porting the feature to all supported DB 
> versions?

I can't see that, no.

> > In practice though, by the time drivers support this it'll probably be
> > far enough in the future
>
> I think drivers release more often than the database, and we can get driver 
> support even before the database releases.
> I'm from pgjdbc Java driver team, and I think it is unfair to suggest that 
> "driver support is only far enough in the future".

Interesting. I didn't realize this would be so attractive to regular
driver authors. I did think of the Happy Eyeballs technique too but I
agree I wouldn't want to go that way either :)

I guess the server doesn't really have to do anything specific to do
what you want. You could just hard code that servers newer than a
specific version would have this support. Or it could be done with a
"protocol option" -- which wouldn't actually change any behaviour but
would be rejected if the server doesn't support "fast ssl" giving you
the feedback you expect without having much extra legacy complexity.

I guess a lot depends on the way the driver works and the way the
application is structured. Applications that make a single connection
or don't have shared state across connections wouldn't think this way.
And interfaces like libpq would normally just leave it up to the
application to make choices like this. But I guess JVM based
applications are more likely to have long-lived systems that make many
connections and also more likely to make it the driver's
responsibility to manage such things.



--
greg




Re: document the need to analyze partitioned tables

2023-01-20 Thread Bruce Momjian
On Fri, Jan 20, 2023 at 10:33:57AM +0100, Laurenz Albe wrote:
> On Thu, 2023-01-19 at 15:56 -0500, Bruce Momjian wrote:
> > On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> > > On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > > > Is it possible to document when partition table statistics helps?
> > > 
> > > I think it would be difficult to come up with an exhaustive list.
> > 
> > I was afraid of that.  I asked only because most people assume
> > autovacuum handles _all_ statistics needs, but this case is not handled.
> > Do people even have any statistics maintenance process anymore, and if
> > not, how would they know they need to run a manual ANALYZE?
> 
> Probably not.  I think this would warrant an entry in the TODO list:
> "make autovacuum collect startistics for partitioned tables".

We have it already:

Have autoanalyze of parent tables occur when child tables are modified

> Even if we cannot give better advice than "run ANALYZE manually if
> the execution plan looks fishy", the patch is still an improvement,
> isn't it?

Yes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.




Re: Add LZ4 compression in pg_dump

2023-01-20 Thread gkokolatos






--- Original Message ---
On Friday, January 20th, 2023 at 12:34 AM, Tomas Vondra 
 wrote:


> 
> 
> On 1/19/23 18:55, Tomas Vondra wrote:
> 
> > Hi,
> > 
> > On 1/19/23 17:42, gkokola...@pm.me wrote:
> > 
> > > ...
> > > 
> > > Agreed. It was initially submitted as one patch. Then it was requested to 
> > > be
> > > split up in two parts, one to expand the use of the existing API and one 
> > > to
> > > replace with the new interface. Unfortunately the expansion of usage of 
> > > the
> > > existing API requires some tweaking, but that is not a very good reason 
> > > for
> > > the current patch set. I should have done a better job there.
> > > 
> > > Please find v22 attach which combines back 0001 and 0002. It is missing 
> > > the
> > > documentation that was discussed above as I wanted to give a quick 
> > > feedback.
> > > Let me know if you think that the combined version is the one to move 
> > > forward
> > > with.
> > 
> > Thanks, I'll take a look.
> 
> 
> After taking a look and thinking about it a bit more, I think we should
> keep the two parts separate. I think Michael (or whoever proposed) the
> split was right, it makes the patches easier to grok.
> 

Excellent. I will attempt a better split this time round.

> 
> While reading the thread, I also noticed this:
> 
> > By the way, I think that this 0002 should drop all the default clauses
> > in the switches for the compression method so as we'd catch any
> > missing code paths with compiler warnings if a new compression method
> > is added in the future.
> 
> 
> Now I realize why there were "not yet implemented" errors for lz4/zstd
> in all the switches, and why after removing them you had to add a
> default branch.
> 
> We DON'T want a default branch, because the idea is that after adding a
> new compression algorithm, we get warnings about switches not handling
> it correctly.
> 
> So I guess we should walk back this change too :-( It's probably easier
> to go back to v20 from January 16, and redo the couple remaining things
> I commented on.
> 

Sure.
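
For reference, the pattern being described boils down to something like this
(a standalone illustration with a made-up enum, not pg_dump code; compile
with -Wall to see -Wswitch in action):

#include <stdio.h>

typedef enum
{
    COMPR_NONE,
    COMPR_GZIP,
    COMPR_LZ4
    /* adding COMPR_ZSTD here would make -Wswitch flag compr_name() */
} compr_algorithm;

static const char *
compr_name(compr_algorithm a)
{
    switch (a)                  /* deliberately no default branch */
    {
        case COMPR_NONE:
            return "none";
        case COMPR_GZIP:
            return "gzip";
        case COMPR_LZ4:
            return "lz4";
    }
    return "unknown";           /* unreachable; silences -Wreturn-type */
}

int
main(void)
{
    printf("%s\n", compr_name(COMPR_LZ4));
    return 0;
}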

> 
> FWIW I think this is a hint that adding LZ4/ZSTD options, in 5e73a6048,
> but without implementation, was not a great idea. It mostly defeats the
> idea of getting the compiler warnings - all the places already handle
> PG_COMPRESSION_LZ4/PG_COMPRESSION_ZSTD by throwing a pg_fatal. So you'd
> have to grep for the options, inspect all the places or something like
> that anyway. The warnings would only work for entirely new methods.
> 
> However, I now also realize the compressor API in 0002 replaces all of
> this with calls to a generic API callback, so trying to improve this was
> pretty silly from me.

I can try to do a better job at splitting things up.

> 
> Please, fix the couple remaining details in v20, add the docs for the
> callbacks, and I'll try to polish it and get it committed.

Excellent. Allow me an attempt to polish and expect a new version soon.

Cheers,
//Georgios

> 
> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-20 Thread Imseih (AWS), Sami
>Number of indexes that will be vacuumed or cleaned up. This counter only
>advances when the phase is vacuuming indexes or cleaning up indexes.

I agree, this reads better.

---
-/* Report that we are now vacuuming indexes */
-pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
+/*
+ * Report that we are now vacuuming indexes
+ * and the number of indexes to vacuum.
+ */
+progress_start_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_INDEX;
+progress_start_val[1] = vacrel->nindexes;
+pgstat_progress_update_multi_param(2, progress_start_index,
progress_start_val);

>According to our code style guideline[1], we limit line lengths so
>that the code is readable in an 80-column window. Some comments
 >   updated in this patch seem too short.

I will correct this.

>I think it's better to define "void *arg".

Agree

>---
>+/*
>+ * A Leader process that receives this 
> message
>+ * must be ready to update progress.
>+ */
>+Assert(pcxt->parallel_progress_callback);
>+
> Assert(pcxt->parallel_progress_callback_arg);
>+
>+/* Report progress */
>+
>pcxt->parallel_progress_callback(pcxt->parallel_progress_callback_arg);

>I think the parallel query infra should not require
>parallel_progress_callback_arg to always be set. I think it can be
>NULL.

This assertion is inside the new 'P' message type handling.
If a leader is consuming this message, they must have a
progress callback set. Right now we only set the callback
in the parallel vacuum case only, so not all leaders will be prepared
to handle this case. 

Would you agree this is needed for safety?

case 'P':   /* Parallel progress reporting */
{
/*
 * A Leader process that receives this message
 * must be ready to update progress.
 */
Assert(pcxt->parallel_progress_callback);
Assert(pcxt->parallel_progress_callback_arg);

---
>+void
>+parallel_vacuum_update_progress(void *arg)
>+{
>+ParallelVacuumState *pvs = (ParallelVacuumState *)arg;
>+
>+Assert(!IsParallelWorker());
>+
>+if (pvs)
>+
> pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
>+
>   pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
>+}

>Since parallel vacuum always sets the arg, I think we don't need to check 
> it.

The arg is only set during parallel vacuum; during non-parallel vacuum
it's NULL. The check can be removed, but I realize now that we do need
an Assert(pvs). Do you agree?
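
To be concrete, with that change the function would look roughly like this
(a sketch derived from the snippet quoted above, not the final patch):

void
parallel_vacuum_update_progress(void *arg)
{
    ParallelVacuumState *pvs = (ParallelVacuumState *) arg;

    /* Only the leader process updates progress */
    Assert(!IsParallelWorker());

    /* arg is always set when this callback is used for parallel vacuum */
    Assert(pvs);

    pgstat_progress_update_param(PROGRESS_VACUUM_INDEX_COMPLETED,
                                 pg_atomic_add_fetch_u32(&(pvs->shared->nindexes_completed), 1));
}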

--

Regards,

Sami Imseih
Amazon Web Services (AWS)




Re: Implement missing join selectivity estimation for range types

2023-01-20 Thread Tomas Vondra
On 1/18/23 20:23, Mahmoud Sakr wrote:
> Hi Tomas,
> Thanks for picking up the patch and for the interesting discussions that
> you bring !
> 
>> Interesting. Are there any particular differences compared to how we
>> estimate for example range clauses on regular columns?
>
> The theory is the same for scalar types. Yet, the statistics that are 
> currently
> collected for scalar types include synopses other than the histogram, such
> as MCV, which should be incorporated in the estimation. The theory for using
> the additional statistics is ready in the paper, but we didn't yet implement 
> it.
> We thought of sharing the ready part, till the time allows us to implement the
> rest, or other developers continue it.
> 

I see. We don't have MCV stats for range types, so the algorithms don't
include that. But we have that for scalars, so the code would need to be
modified to consider that too.

However, I wonder how much could that improve the estimates for range
queries on scalar types. I mean, we already get pretty good estimates
for those, so I guess we wouldn't get much.

>> Right. I think 0.5% is roughly expected for the default statistics
>> target, which creates 100 histogram bins, each representing ~1% of the
>> values. Which on average means ~0.5% error.
> Since this patch deals with join selectivity, we are then crossing 100*100 
> bins.
> The ~0.5% error estimation comes from our experiments, rather than a
> mathematical analysis.
> 

Ah, understood. Even for joins there's probably a fairly close
relationship between the bin size and estimation error, but it's
certainly more complex.

BTW the experiments are those described in section 6 of the paper,
correct? I wonder how uniform (or skewed) the data was - in terms of
range length, etc. Or how it works for other operators, not just for
"<<" as in the paper.

>> independence means we take (P1*P2). But maybe we should be very
>> pessimistic and use e.g. Min(P1,P2)? Or maybe something in between?
>>
>> Another option is to use the length histogram, right? I mean, we know
>> what the average length is, and it should be possible to use that to
>> calculate how "far" ranges in a histogram can overlap.
> The independence assumption exists if we use the lower and upper
> histograms. It equally exists if we use the lower and length histograms.
> In both cases, the link between the two histograms is lost during their
> construction.
> Your discussion brings an interesting trade-off of optimistic vs. pessimistic
> estimations. A typical way to deal with such a trade-off is to average the
> two, as for example in model validation in machine learning. Do you think we
> should implement something like
> average( (P1*P2), Min(P1,P2) )?
> 

I don't know.

AFAICS the independence assumption is used not only because it's very
cheap/simple to implement, but also because it actually is a reasonable
assumption for a fair number of data sets (particularly in OLTP).

You're right it's an optimistic estimate, but for many data sets  it's
actually quite reasonable.

I'm not sure that applies to range boundaries - the upper/lower bounds
seem pretty strongly correlated. So maybe using a more pessimistic
formula would be appropriate.

I was thinking the length histogram might allow an alternative
approach, because it says what fraction of ranges has what length. So
for a "fixed" lower boundary, we may check each of those fractions. Of
course, this assumes consistent range length distribution (so if ranges
at one end are much longer, that won't work).

>> OK. But ideally we'd cross-check elements of the two multiranges, no?
> 
>> So if the column(s) contain a couple very common (multi)ranges that make
>> it into an MCV, we'll ignore those? That's a bit unfortunate, because
>> those MCV elements are potentially the main contributors to selectivity.
> Both ideas would require collecting more detailed statistics, for
> example similar
> to arrays. In this patch, we restricted ourselves to the existing statistics.
> 

Ah, I didn't realize we don't actually build MCV for range types. In
that case the current behavior makes perfect sense.

> 
>> Also, calc_hist_selectivity_contains in multirangetypes_selfuncs.c needs
>> a proper comment, not just "this is a copy from rangetypes".
> Right, the comment should elaborate more that the collected statistics are
> currently the same as for rangetypes but may potentially deviate.
> 
>> However, it seems the two functions are exactly the same. Would the
>> functions diverge in the future? If not, maybe there should be just a
>> single shared function?
> Indeed, it is possible that the two functions will deviate if the statistics
> of multirange types are refined.
> 

Right, but are there any such plans? Also, what's the likelihood we'll
add new statistics to only one of the places (e.g. for multiranges but
not plain ranges)?

I'd keep a single function until we actually need two. That's also
easier for maintenance - with two it's easy to fix 

Re: Add SHELL_EXIT_CODE to psql

2023-01-20 Thread Robert Haas
On Wed, Jan 4, 2023 at 2:09 AM Corey Huinker  wrote:
> 2. There are now two psql variables, SHELL_EXIT_CODE, which has the return 
> code, and SHELL_ERROR, which is a true/false flag based on whether the exit 
> code was nonzero. These variables are the shell result analogues of SQLSTATE 
> and ERROR.

Seems redundant.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-20 Thread Robert Haas
On Thu, Jan 19, 2023 at 5:51 PM Andres Freund  wrote:
> I don't agree. But mainly my issue is that the devil you know (how this has
> worked for a while) is preferrable to introducing an unknown quantity (your
> patch that hasn't yet seen real world exposure).

Yeah, this is a major reason why I'm very leery about changes in this
area. A lot of autovacuum behavior is emergent, in the sense that it
wasn't directly intended by whoever wrote the code. It's just a
consequence of other decisions that probably seemed very reasonable at
the time they were made but turned out to have surprising and
unpleasant consequences.

In this particular case, I think that there is a large risk that
postponing auto-cancellation will make things significantly worse,
possibly drastically worse, for a certain class of users -
specifically, those whose vacuums often get auto-cancelled. I think
that it's actually pretty common for people to have workloads where
something pretty close to all of the autovacuums get auto-cancelled on
certain tables, and those people are always hard up against
autovacuum_freeze_max_age because they *have* to hit that in order to
get any vacuuming done on the affected tables. If the default
threshold for auto-cancellation goes up, those people will be
vacuuming even less often than they are now.

That's why I really liked your idea of decoupling auto-cancellation
from XID age. Such an approach can still avoid disabling
auto-cancellation just because autovacuum_freeze_max_age has been hit,
but it can also disable it much earlier when it detects that doing so
is necessary to make progress.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: daitch_mokotoff module

2023-01-20 Thread Dag Lem
Alvaro Herrera  writes:

> On 2023-Jan-05, Dag Lem wrote:
>
>> Is there anything else I should do here, to avoid the status being
>> incorrectly stuck at "Waiting for Author" again.
>
> Just mark it Needs Review for now.  I'll be back from vacation on Jan
> 11th and can have a look then (or somebody else can, perhaps.)

Paul Ramsey had a few comments in the mean time, and based on this I
have produced (yet another) patch, with improved documentation.

However it's still not marked as "Ready for Committer" - can you please
take a look again?

Best regards

Dag Lem




Re: almost-super-user problems that we haven't fixed yet

2023-01-20 Thread tushar

On 1/19/23 6:28 PM, tushar wrote:


There is one typo: for the doc changes, it mentions
"pg_use_reserved_backends" but I think it is supposed to be
"pg_use_reserved_connections"

under Table 22.1. Predefined Roles.


Thanks, this is fixed now with the latest patches.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Non-superuser subscription owners

2023-01-20 Thread Robert Haas
On Thu, Jan 19, 2023 at 8:46 PM Andres Freund  wrote:
> > I wouldn't be OK with writing our own connection string parser for
> > this purpose, but using PQconninfoParse seems OK. We still have to
> > embed knowledge of which connection string parameters can trigger
> > local file access, but that doesn't seem like a massive problem to me.
>
> > If we already had (or have) that logic someplace else, it would
> > probably make sense to reuse it
>
> We have. See at least postgres_fdw's check_conn_params(), dblink's
> dblink_connstr_check() and dblink_security_check().

That's not the same thing. It doesn't know anything about other
parameters that might try to consult a local file, like sslcert,
sslkey, sslrootcert, sslca, sslcrl, sslcrldir, and maybe service.
Maybe you want to argue we don't need that, but that's what the
earlier discussion was about.
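
For illustration, a check along these lines would be enough to flag
file-referencing parameters (just a sketch; the keyword list is an example
and certainly not exhaustive):

#include <stdbool.h>
#include <string.h>
#include <libpq-fe.h>

/* Connection keywords that can cause libpq to read local files. */
static const char *const file_keywords[] = {
    "passfile", "sslcert", "sslkey", "sslrootcert", "sslcrl",
    "sslcrldir", "service", NULL
};

/*
 * Returns true if the connection string sets any keyword that may trigger
 * local file access.  On parse failure, returns false and reports the
 * problem via *errmsg (to be freed with PQfreemem).
 */
static bool
connstr_touches_local_files(const char *connstr, char **errmsg)
{
    PQconninfoOption *opts;
    PQconninfoOption *opt;
    bool        found = false;
    int         i;

    opts = PQconninfoParse(connstr, errmsg);
    if (opts == NULL)
        return false;

    for (opt = opts; opt->keyword != NULL; opt++)
    {
        if (opt->val == NULL || opt->val[0] == '\0')
            continue;
        for (i = 0; file_keywords[i] != NULL; i++)
        {
            if (strcmp(opt->keyword, file_keywords[i]) == 0)
                found = true;
        }
    }

    PQconninfoFree(opts);
    return found;
}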

> As part of the fix for 
> https://postgr.es/m/20220925232237.p6uskba2dw6fnwj2%40awork3.anarazel.de
> I am planning to introduce a bunch of server side helpers for dealing with
> libpq (for establishing a connection while accepting interrupts). We could try
> to centralize knowledge for those checks there.

Maybe. We could also add something into libpq, as Jeff proposed, e.g.
a new connection parameter
the_other_connection_parameters_might_try_to_trojan_the_local_host=1
blocks all that stuff from doing anything.

> The approach of checking, after connection establishment (see
> dblink_security_check()), that we did in fact use the specified password,
> scares me somewhat. See also below.

Yes, I find that extremely dubious. It blocks things that you might
want to do for legitimate reasons, including things that might be more
secure than using a password. And there's no guarantee that it
accomplishes the intended objective either. The stated motivation for
that restriction was, I believe, that we don't want the outbound
connection to rely on the privileges available from the context in
which PostgreSQL itself is running -- but for all we know the remote
side has an IP filter that only allows the PostgreSQL host and no
others. Moreover, it relies on us knowing what the behavior of the
remote server is, even though we have no way of knowing that that
server shares our security interests.

Worse still, I have always felt that the security vulnerability that
led to these controls being installed is pretty much fabricated: it's
an imaginary problem. Today I went back and found the original CVE at
https://nvd.nist.gov/vuln/detail/CVE-2007-3278 and it seems that at
least one other person agrees. The Red Hat vendor statement on that
page says: "Red Hat does not consider this do be a security issue.
dblink is disabled in default configuration of PostgreSQL packages as
shipped with Red Hat Enterprise Linux versions 2.1, 3, 4 and 5, and it
is a configuration decision whether to grant local users arbitrary
access." I think whoever wrote that has an excellent point. I'm unable
to discern any legitimate security purpose for this restriction. What
I think it mostly does is (a) inconvenience users or (b) force them to
rely on a less-secure authentication method than they would otherwise
have chosen.

> > The basic idea that by looking at which connection string properties are set
> > we can tell what kinds of things the connection string is going to do seems
> > sound to me.
>
> I don't think you *can* check it purely based on existing connection string
> properties, unfortunately. Think of e.g. a pg_hba.conf line of "local all user
> peer" (quite reasonable config) or "host all all 127.0.0.1/32 trust" (less 
> so).
>
> Hence the hack with dblink_security_check().
>
> I think there might be a discussion somewhere about adding an option to force
> libpq to not use certain auth methods, e.g. plaintext password/md5. It's
> possible this could be integrated.

I still think you're talking about a different problem here. I'm
talking about the problem of knowing whether local files are going to
be accessed by the connection string.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Operation log for major operations

2023-01-20 Thread Ted Yu
On Fri, Jan 20, 2023 at 1:19 AM Dmitry Koval  wrote:

> Thanks, Ted Yu!
>
>  > Please update year for the license in pg_controllog.c
>
> License updated for files pg_controllog.c, controllog_utils.c,
> pg_controllog.h, controllog_utils.h.
>
>  > +check_file_exists(const char *datadir, const char *path)
>  > There is existing helper function such as:
>  > src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
>  > Is it possible to reuse that code ?
>
> There are a lot of functions that check for file existence:
>
> 1) src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char
> *name);
> 2) src/backend/jit/jit.c:static bool file_exists(const char *name);
> 3) src/test/regress/pg_regress.c:bool file_exists(const char *file);
> 4) src/bin/pg_upgrade/exec.c:bool pid_lock_file_exists(const char
> *datadir);
> 5) src/backend/commands/extension.c:bool extension_file_exists(const
> char *extensionName);
>
> But there is no unified function: different components use their own
> function with their own specifics.
> Probably we cannot reuse the code of dfmgr.c:file_exists() because this
> function skips "errno == EACCES" (this error is critical for us).
> I copied the code of jit.c:file_exists() for
> src/bin/pg_rewind/file_ops.c:check_file_exists() (with adaptations for
> the utility).
>
> --
> With best regards,
> Dmitry Koval
>
> Postgres Professional: http://postgrespro.com

Hi,
Maybe another discussion thread can be created for the consolidation of
XX_file_exists functions.
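
If someone does pick that up, the common shape is presumably something
like the sketch below (illustrative only, not a concrete proposal); the
interesting part is that EACCES and other non-ENOENT errors are
surfaced to the caller instead of being treated as "does not exist":

#include <sys/stat.h>
#include <errno.h>

/*
 * Returns 1 if the file exists, 0 if it does not, and -1 on any other
 * error (including EACCES), leaving errno set so that each caller can
 * report the failure in its own way.
 */
static int
common_file_exists(const char *path)
{
    struct stat st;

    if (stat(path, &st) == 0)
        return 1;
    if (errno == ENOENT || errno == ENOTDIR)
        return 0;
    return -1;
}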

Cheers


Re: ***Conflict with recovery error***

2023-01-20 Thread Laurenz Albe
On Fri, 2023-01-20 at 09:59 +, Abhishek Prakash wrote:
> We had set max_standby_streaming_delay = -1, but faced storage issues;
> nearly 3.5 TB of storage was consumed.

Then either don't run queries that take that long or run fewer data
modifications on the primary.

Or invest in a few more TB disk storage.

Yours,
Laurenz Albe




RE: ***Conflict with recovery error***

2023-01-20 Thread Abhishek Prakash
Hi Laurenz,

Thanks for your reply.
We had set max_standby_streaming_delay = -1, but faced storage issues;
nearly 3.5 TB of storage was consumed.

Regards,
Abhishek P

-Original Message-
From: Laurenz Albe  
Sent: Friday, January 20, 2023 3:26 PM
To: Abhishek Prakash ; 
pgsql-gene...@lists.postgresql.org; pgsql-hackers@lists.postgresql.org; 
usergro...@postgresql.org
Subject: Re: ***Conflict with recovery error***


On Fri, 2023-01-20 at 08:56 +, Abhishek Prakash wrote:
> We are facing the below issue with a read replica. We did workarounds
> by setting hot_standby_feedback, max_standby_streaming_delay and
> max_standby_archive_delay, which indeed caused adverse effects on the
> primary DB and storage. Our DB is nearly 6 TB and runs as AWS Postgres
> RDS.
>
> The below error occurs even on tables where vacuum is disabled and no
> DML operations are permitted. Is there any chance of row versions being
> changed even if vacuum is disabled?
> Please advise.
>
> 2023-01-13 07:20:12 
> UTC:10.64.103.75(61096):ubpreplica@ubprdb01:[17707]:ERROR:  canceling 
> statement due to conflict with recovery
> 2023-01-13 07:20:12 
> UTC:10.64.103.75(61096):ubpreplica@ubprdb01:[17707]:DETAIL:  User query might 
> have needed to see row versions that must be removed.

It could be HOT chain pruning or an anti-wraparound autovacuum (which runs even 
if autovacuum is disabled).
Disabling autovacuum is not a smart idea to begin with.

Your best bet is to set "max_standby_streaming_delay = -1".

More reading:
https://www.cybertec-postgresql.com/en/streaming-replication-conflicts-in-postgresql/

Yours,
Laurenz Albe




Re: ***Conflict with recovery error***

2023-01-20 Thread Laurenz Albe
On Fri, 2023-01-20 at 08:56 +, Abhishek Prakash wrote:
> We are facing the below issue with a read replica. We did workarounds
> by setting hot_standby_feedback, max_standby_streaming_delay and
> max_standby_archive_delay, which indeed caused adverse effects on the
> primary DB and storage. Our DB is nearly 6 TB and runs as AWS Postgres
> RDS.
>
> The below error occurs even on tables where vacuum is disabled and no
> DML operations are permitted. Is there any chance of row versions being
> changed even if vacuum is disabled?
> Please advise.
>  
> 2023-01-13 07:20:12 
> UTC:10.64.103.75(61096):ubpreplica@ubprdb01:[17707]:ERROR:  canceling 
> statement due to conflict with recovery
> 2023-01-13 07:20:12 
> UTC:10.64.103.75(61096):ubpreplica@ubprdb01:[17707]:DETAIL:  User query might 
> have needed to see row versions that must be removed.

It could be HOT chain pruning or an anti-wraparound autovacuum (which runs
even if autovacuum is disabled).
Disabling autovacuum is not a smart idea to begin with.

Your best bet is to set "max_standby_streaming_delay = -1".

More reading:
https://www.cybertec-postgresql.com/en/streaming-replication-conflicts-in-postgresql/

Yours,
Laurenz Albe




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-01-20 Thread Marco Slot
On Mon, Jan 9, 2023 at 11:37 PM Tom Lane  wrote:
> Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
> FULL situation then we can make use of any unique index over a subset of
> the transmitted columns, and if there's more than one candidate index
> it's unlikely to matter which one we pick.  Given your comment I guess
> we have to also compare the non-indexed columns, so we can't completely
> convert the FULL case to the straight index case.  But still it doesn't
> seem to me to be appropriate to use the planner to find a suitable index.

The main purpose of REPLICA IDENTITY FULL seems to be to enable logical
replication for tables that may have duplicates and therefore cannot have a
unique index that can be used as a replica identity.

For those tables the user currently needs to choose between update/delete
erroring (bad) or doing a sequential scan on the apply side per
updated/deleted tuple (often worse). This issue currently prevents a lot of
automation around logical replication, because users need to decide whether
and when they are willing to accept partial downtime. The current REPLICA
IDENTITY FULL implementation can work in some cases, but applying the
effects of an update that affected a million rows through a million
sequential scans will certainly not end well.

This patch solves the problem by allowing the apply side to pick a
non-unique index to find any matching tuple instead of always using a
sequential scan, but that either requires some planning/costing logic to
avoid picking a lousy index, or allowing the user to manually preselect the
index to use, which is less convenient.

An alternative might be to construct prepared statements and use the
regular planner. If applied uniformly that would also be nice from the
extensibility point-of-view, since there is currently no way for an
extension to augment the apply side. However, I assume the current approach
of using low-level functions in the common case was chosen for performance
reasons.

I suppose the options are:
1. use regular planner uniformly
2. use regular planner only when there's no replica identity (or
configurable?)
3. only use low-level functions
4. keep using sequential scans for every single updated row
5. introduce a hidden logical row identifier in the heap that is guaranteed
unique within a table and can be used as a replica identity when no unique
index exists

Any thoughts?

cheers,
Marco


Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Karl O. Pinc
On Fri, 20 Jan 2023 12:42:31 +0100
Alvaro Herrera  wrote:

> On 2023-Jan-19, Karl O. Pinc wrote:
> 
> > On Thu, 19 Jan 2023 11:03:53 -0600
> > "Karl O. Pinc"  wrote:
> >   
> > > Attached are 2 patches, a regular and a delta from your v4 review:
> > > 
> > > contrib_v5-delta.patch.txt
> > > contrib_v5.patch.txt  
> > 
> > I left your appendix title unchanged: "Additional Supplied 
> > Extensions and Modules".  
> > 
> > I had put "Extensions" after
> > "Modules", because, apparently, things that come last in the
> > sentence are most remembered by the reader. My impression is that
> > more people are looking for extensions than modules.  
> 
> Hmm, I didn't know that.  I guess I can put it back.  My own instinct
> is to put the most important stuff first, not last, but if research
> says to do otherwise, fine, let's do that.

A quick google on the subject tells me that I can't figure out a good
quick google.  I believe it's from the book at bottom.  Memorability
goes "end", "beginning", "middle".  IIRC.

> I went over all the titles again.  There were a couple of mistakes
> and inconsistencies, which I've fixed to the best of my knowledge.
> I'm happy with 0001 now and will push shortly unless there are
> complaints.
> 
> I'm still unsure of the [trusted]/[obsolete] marker, so I split that
> out to commit 0002.  I would like to see more support for that before
> pushing that one.
> 
> I also put the page-split bits into another patch, because it seems a bit
> too clumsy. 

All the above sounds good to me.

> I hope somebody with more docbook-fu can comment: maybe
> there's a way to fix it more generally somehow?

What would the general solution be?  There could be a forced page
break at the beginning of _every_ sect1.  For PDFs.  That seems
a bit much, but maybe not.  The only other thing I can think of
that's "general" would be to force a page break for sect1-s
that are in an appendix.  Is any of this wanted?  (Or technically
"better"?)

Thanks for the help.



Writing for Readers
By George R. Bramer, Dorothy Sedley · 1981

About this edition
ISBN: 9780675080453, 0675080452
Page count: 532
Published: 1981
Format: Hardcover
Publisher: C.E. Merrill Publishing Company
Original from: Pennsylvania State University
Digitized: July 15, 2009
Language: English
Author: George R. Bramer, Dorothy Sedley

It's part of a wave of reaction against Strunk & White,
where they started basing writing on research into reading.
(If it's the right book.)

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Doc: Rework contrib appendix -- informative titles, tweaked sentences

2023-01-20 Thread Alvaro Herrera
On 2023-Jan-19, Karl O. Pinc wrote:

> On Thu, 19 Jan 2023 11:03:53 -0600
> "Karl O. Pinc"  wrote:
> 
> > Attached are 2 patches, a regular and a delta from your v4 review:
> > 
> > contrib_v5-delta.patch.txt
> > contrib_v5.patch.txt
> 
> I left your appendix title unchanged: "Additional Supplied 
> Extensions and Modules".  
> 
> I had put "Extensions" after
> "Modules", because, apparently, things that come last in the
> sentence are most remembered by the reader. My impression is that
> more people are looking for extensions than modules.

Hmm, I didn't know that.  I guess I can put it back.  My own instinct is
to put the most important stuff first, not last, but if research says to
do otherwise, fine, let's do that.

I went over all the titles again.  There were a couple of mistakes
and inconsistencies, which I've fixed to the best of my knowledge.
I'm happy with 0001 now and will push shortly unless there are
complaints.

I'm still unsure of the [trusted]/[obsolete] marker, so I split that out
to commit 0002.  I would like to see more support for that before
pushing that one.

I also put the page-split bits into another patch, because it seems a bit
too clumsy.  I hope somebody with more docbook-fu can comment: maybe
there's a way to fix it more generally somehow?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
 https://twitter.com/gunnarmorling/status/1596080409259003906
>From 15cf3a5607ec73900c5e3bb343ab687f1e6990c1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 19 Jan 2023 13:32:35 +0100
Subject: [PATCH v6 1/3] Describe each contrib module in its SGML section title

The original titles only had the module name, which is not very useful
when scanning the list.  By adding a very brief description to each
title, the table of contents becomes friendlier.

Author: Karl O. Pinc 
Reviewed-by: Brar Piening 
Discussion: https://postgr.es/m/20230102180015.37299...@slate.karlpinc.com
---
 doc/src/sgml/adminpack.sgml   |  2 +-
 doc/src/sgml/amcheck.sgml |  2 +-
 doc/src/sgml/auth-delay.sgml  |  2 +-
 doc/src/sgml/auto-explain.sgml|  2 +-
 doc/src/sgml/basebackup-to-shell.sgml |  2 +-
 doc/src/sgml/basic-archive.sgml   |  2 +-
 doc/src/sgml/bloom.sgml   |  2 +-
 doc/src/sgml/btree-gin.sgml   |  4 +--
 doc/src/sgml/btree-gist.sgml  |  2 +-
 doc/src/sgml/citext.sgml  |  2 +-
 doc/src/sgml/contrib-spi.sgml |  2 +-
 doc/src/sgml/contrib.sgml | 49 +++
 doc/src/sgml/cube.sgml|  2 +-
 doc/src/sgml/dblink.sgml  |  2 +-
 doc/src/sgml/dict-int.sgml|  3 +-
 doc/src/sgml/dict-xsyn.sgml   |  2 +-
 doc/src/sgml/earthdistance.sgml   |  2 +-
 doc/src/sgml/file-fdw.sgml|  2 +-
 doc/src/sgml/fuzzystrmatch.sgml   |  3 +-
 doc/src/sgml/hstore.sgml  |  2 +-
 doc/src/sgml/intagg.sgml  |  2 +-
 doc/src/sgml/intarray.sgml|  2 +-
 doc/src/sgml/isn.sgml |  2 +-
 doc/src/sgml/lo.sgml  |  2 +-
 doc/src/sgml/ltree.sgml   |  2 +-
 doc/src/sgml/oldsnapshot.sgml |  2 +-
 doc/src/sgml/pageinspect.sgml |  2 +-
 doc/src/sgml/passwordcheck.sgml   |  2 +-
 doc/src/sgml/pgbuffercache.sgml   |  2 +-
 doc/src/sgml/pgcrypto.sgml|  2 +-
 doc/src/sgml/pgfreespacemap.sgml  |  2 +-
 doc/src/sgml/pgprewarm.sgml   |  2 +-
 doc/src/sgml/pgrowlocks.sgml  |  2 +-
 doc/src/sgml/pgstatstatements.sgml|  3 +-
 doc/src/sgml/pgstattuple.sgml |  2 +-
 doc/src/sgml/pgsurgery.sgml   |  2 +-
 doc/src/sgml/pgtrgm.sgml  |  3 +-
 doc/src/sgml/pgvisibility.sgml|  2 +-
 doc/src/sgml/pgwalinspect.sgml|  2 +-
 doc/src/sgml/postgres-fdw.sgml|  3 +-
 doc/src/sgml/seg.sgml |  2 +-
 doc/src/sgml/sepgsql.sgml |  3 +-
 doc/src/sgml/sslinfo.sgml |  2 +-
 doc/src/sgml/tablefunc.sgml   |  2 +-
 doc/src/sgml/tcn.sgml |  2 +-
 doc/src/sgml/test-decoding.sgml   |  2 +-
 doc/src/sgml/tsm-system-rows.sgml |  3 +-
 doc/src/sgml/tsm-system-time.sgml |  3 +-
 doc/src/sgml/unaccent.sgml|  2 +-
 doc/src/sgml/uuid-ossp.sgml   |  2 +-
 doc/src/sgml/xml2.sgml|  2 +-
 51 files changed, 86 insertions(+), 73 deletions(-)

diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml
index 184e96d7a0..04f3b52379 100644
--- a/doc/src/sgml/adminpack.sgml
+++ b/doc/src/sgml/adminpack.sgml
@@ -1,7 +1,7 @@
 
 
 
- adminpack
+ adminpack  pgAdmin support toolpack
 
  
   adminpack
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 923cbde9dd..2b9c1a9205 100644
--- 

Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

2023-01-20 Thread Tomas Vondra



On 1/20/23 07:43, David Geier wrote:
> On 1/18/23 13:52, David Geier wrote:
>> On 1/16/23 21:39, Pavel Stehule wrote:
>>>
>>> po 16. 1. 2023 v 21:34 odesílatel Tomas Vondra
>>>  napsal:
>>>
>>>     Hi,
>>>
>>>     there's minor bitrot in the Mkvcbuild.pm change, making cfbot
>>> unhappy.
>>>
>>>     As for the patch, I don't have much comments. I'm wondering if
>>> it'd be
>>>     useful to indicate which timing source was actually used for EXPLAIN
>>>     ANALYZE, say something like:
>>>
>>>      Planning time: 0.197 ms
>>>      Execution time: 0.225 ms
>>>      Timing source: clock_gettime (or tsc)
>>>
>>> +1
>>
>> I like the idea of exposing the timing source in the EXPLAIN ANALYZE
>> output.
>> It's a good tradeoff between inspectability and effort, given that
>> RDTSC should always be better to use.
>> If there are no objections I go this way.
> Thinking about this a little more made me realize that this will cause
> different pg_regress output depending on the platform. So if we go this
> route we would at least need an option for EXPLAIN ANALYZE to disable
> it. Or rather have it disabled by default and allow for enabling it.
> Thoughts?
> 

What about only showing it for VERBOSE mode? I don't think there are
very many tests doing EXPLAIN (ANALYZE, VERBOSE) - a quick grep found
one such place in partition_prune.sql.
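
Roughly like this -- a sketch only; the property label and the exact
place where it hooks in are assumptions on my side, not the patch's
actual code:

#include "postgres.h"
#include "commands/explain.h"

/*
 * Print the timing source only when VERBOSE was requested, so that
 * ordinary regression output stays platform-independent.
 */
static void
show_timing_source(ExplainState *es, const char *source_name)
{
    if (es->verbose)
        ExplainPropertyText("Timing Source", source_name, es);
}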

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Generating code for query jumbling through gen_node_support.pl

2023-01-20 Thread Peter Eisentraut

On 20.01.23 05:35, Michael Paquier wrote:

On Thu, Jan 19, 2023 at 09:42:03AM +0100, Peter Eisentraut wrote:

I see that in the 0003 patch, most location fields now have an explicit
markup with query_jumble_ignore.  I thought we had previously resolved to
consider location fields to be automatically ignored unless explicitly
included (like for the Const node).  This appears to invert that?  Am I
missing something?

As a result, I have rebased the patch set to use the two-attribute
approach: query_jumble_ignore and query_jumble_location.


Structurally, this looks okay to me now.

In your 0001 patch, most of the comment reformattings for location 
fields are no longer needed, so you should undo those.


The 0002 patch looks good.

Those two could be committed with those adjustments, I think.

I'll read the 0003 again more carefully.  I haven't studied the new 0004 
yet.
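
(For anyone skimming the thread: with the two-attribute approach a node
definition ends up annotated roughly like the sketch below.  The node
and field names are made up for illustration; only the attribute names
are the ones discussed above.)

#include "postgres.h"
#include "nodes/primnodes.h"

typedef struct FooExpr
{
    Expr        xpr;
    List       *args;          /* jumbled as usual */
    char       *comment pg_node_attr(query_jumble_ignore); /* skipped */
    int         location pg_node_attr(query_jumble_location);  /* recorded for normalization */
} FooExpr;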






Re: Exit walsender before confirming remote flush in logical replication

2023-01-20 Thread Amit Kapila
On Tue, Jan 17, 2023 at 2:41 PM Amit Kapila  wrote:
>
> Let me try to summarize the discussion till now. The problem we are
> trying to solve here is to allow a shutdown to complete when walsender
> is not able to send the entire WAL. Currently, in such cases, the
> shutdown fails. As per our current understanding, this can happen when
> (a) walreceiver/walapply process is stuck (not able to receive more
> WAL) due to locks or some other reason; (b) a long time delay has been
> configured to apply the WAL (we don't yet have such a feature for
> logical replication but the discussion for same is in progress).
>
> Both reasons mostly apply to logical replication because there is no
> separate walreceiver process whose job is to just flush the WAL. In
> logical replication, the process that receives the WAL also applies
> it. So, while applying it can stuck for a long time waiting for some
> heavy-weight lock to be released by some other long-running
> transaction by the backend.
>

While checking the commits and email discussions in this area, I came
across the email [1] from Michael where something similar seems to
have been discussed, basically whether the early shutdown of the
walsender can prevent a switchover between publisher and subscriber;
but that part was never clearly answered in that email chain. It might
be worth reading the entire discussion [2]. That discussion finally
led to the following commit:

commit c6c333436491a292d56044ed6e167e2bdee015a2
Author: Andres Freund 
Date:   Mon Jun 5 18:53:41 2017 -0700

Prevent possibility of panics during shutdown checkpoint.

When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so.  At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits).  So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.

To fix this, divide the walsender shutdown into two phases.  First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders.  If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown.
...
...

Here, as mentioned in the commit, we try to ensure that, before the
checkpoint writes its shutdown WAL record, "if logical
replication is in progress all existing WAL records are processed
followed by a shutdown.". I think even before this commit we tried to
send the entire WAL before shutdown, but I am not completely sure.
There was no discussion of what happens if the logical
walreceiver/walapply process is waiting on some heavy-weight lock and
the network socket buffer is full, due to which the walsender is not
able to send its WAL. Is it okay for the shutdown to fail in such a
case, as it does now, or shall we somehow detect that and shut down the
walsender, or shall we just allow the logical walsender to always exit
immediately as soon as the shutdown signal arrives?

Note: I have added some of the people involved in the previous
thread's [2] discussion in the hope that they can share their
thoughts.

[1] - 
https://www.postgresql.org/message-id/CAB7nPqR3icaA%3DqMv_FuU8YVYH3KUrNMnq_OmCfkzxCHC4fox8w%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zVmYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-20 Thread Katsuragi Yuta

On 2023-01-11 19:04, Hayato Kuroda (Fujitsu) wrote:

Dear hackers,

I was not sure why, but the cfbot could not accept the previous
version.

I made the patch again from HEAD (5f6401) without any changes,
so I did not bump the version number.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi,

Thanks for the patch!
I read the patch v24. These are my comments. Please check.

## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
+
+ 
PQCanConncheckPQCanConncheck

+ 
+  
+   Returns the status of the socket.

Is this description right? I think this description is for
PQConncheck. Something like "Checks whether PQConncheck is
available on this platform." seems better.


+/* Check whether the postgres server is still alive or not */
+extern int PQConncheck(PGconn *conn);
+extern int PQCanConncheck(void);

Should the names of these functions be in the form of PQconnCheck,
not PQConncheck (cf. the fe-misc.c section in libpq-fe.h)?
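
For what it's worth, a usage sketch of the proposed API might look like
the following (these functions do not exist in released libpq, and the
return-value convention below is my assumption, not necessarily what
the patch implements):

#include <stdio.h>
#include <libpq-fe.h>

static void
report_connection_health(PGconn *conn)
{
    if (!PQCanConncheck())
    {
        fprintf(stderr, "connection check is not supported on this platform\n");
        return;
    }

    /* assumed: a negative result means the peer is unreachable or gone */
    if (PQConncheck(conn) < 0)
        fprintf(stderr, "the remote server seems to have gone away\n");
}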


## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
+PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
+PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
+PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);

This patch adds new functions to postgres_fdw for PostgreSQL 16.
So, I think it is necessary to update the version of postgres_fdw (v1.1 
to v1.2).



+postgres_fdw_verify_connection_states_all() returns 
boolean

+
+ 
+  This function checks the status of remote connections that are 
established
+  by postgres_fdw from the local session to 
the foreign

+  servers. This check is performed by polling the socket and allows

It seems better to add a description that states this function
checks all the connections. Like the description of
postgres_fdw_disconnect_all(). For example, "This function
checks the status of 'all the' remote connections..."?

regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: DSA failed to allocate memory

2023-01-20 Thread Thomas Munro
On Fri, Jan 20, 2023 at 11:44 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > Thanks for the report, and for working on the fix.  Can you please
> > create a commitfest entry (if you haven't already)?  I plan to look at
> > this soon, after the code freeze.
>
> Hi Thomas, are you still intending to look at this DSA bug fix?
> It's been sitting idle for months.

Yeah.  I think the analysis looks good, but I'll do some testing next
week with the aim of getting it committed.  Looks like it now needs
Meson changes, but I'll look after that as my penance.




Re: Skipping schema changes in publication

2023-01-20 Thread vignesh C
On Wed, 16 Nov 2022 at 15:35, vignesh C  wrote:
>
> On Wed, 16 Nov 2022 at 09:34, Ian Lawrence Barwick  wrote:
> >
> > 2022年11月7日(月) 22:39 vignesh C :
> > >
> > > On Fri, 4 Nov 2022 at 08:19, Ian Lawrence Barwick  
> > > wrote:
> > > >
> > > > Hi
> > > >
> > > > cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
> > > > currently underway, this would be an excellent time to update the patch.
> > > >
> > > > [1] http://cfbot.cputube.org/patch_40_3646.log
> > >
> > > Here is an updated patch which is rebased on top of HEAD.
> >
> > Thanks for the updated patch.
> >
> > While reviewing the patch backlog, we have determined that this patch adds
> > one or more TAP tests but has not added the test to the "meson.build" file.
>
> Thanks, I have updated the meson.build to include the TAP test. The
> attached patch has the changes for the same.

The patch was not applying on top of HEAD; attached is a rebased version.

Regards,
Vignesh
From 5889eb0b42b6f595873e4a1d3823cc49b5070623 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 19 Jan 2023 18:13:54 +0530
Subject: [PATCH v10 1/2] Add RESET clause to Alter Publication which will
 reset the publication with default values.

This patch adds a new RESET clause to ALTER PUBLICATION which will reset
the publication to the default state which includes resetting the publication
parameters, setting ALL TABLES flag to false and dropping the relations and
schemas that are associated with the publication.
Usage:
ALTER PUBLICATION pub1 RESET;
---
 doc/src/sgml/ref/alter_publication.sgml   |  35 ++--
 src/backend/commands/publicationcmds.c| 105 --
 src/backend/parser/gram.y |   9 ++
 src/bin/psql/tab-complete.c   |   2 +-
 src/include/nodes/parsenodes.h|   3 +-
 src/test/regress/expected/publication.out | 101 +
 src/test/regress/sql/publication.sql  |  50 +++
 7 files changed, 291 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index cd20868bca..ab3cebd71c 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -27,6 +27,7 @@ ALTER PUBLICATION name DROP name SET ( publication_parameter [= value] [, ... ] )
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
+ALTER PUBLICATION name RESET
 
 where publication_object is one of:
 
@@ -67,18 +68,32 @@ ALTER PUBLICATION name RENAME TO 
 
   
-   The remaining variants change the owner and the name of the publication.
+   The OWNER clause will change the owner of the
+   publication.
+  
+
+   
+The RENAME clause will change the name of the
+publication.
+   
+
+   
+The RESET clause will reset the publication to the
+default state which includes resetting the publication parameters, setting
+ALL TABLES flag to false and
+dropping all relations and schemas that are associated with the
+publication.
   
 
   
You must own the publication to use ALTER PUBLICATION.
Adding a table to a publication additionally requires owning that table.
-   The ADD TABLES IN SCHEMA and
-   SET TABLES IN SCHEMA to a publication requires the
-   invoking user to be a superuser.
-   To alter the owner, you must be able to SET ROLE to the
-   new owning role, and that role must have CREATE
-   privilege on the database.
+   The ADD TABLES IN SCHEMA,
+   SET TABLES IN SCHEMA to a publication and
+   RESET of publication requires the invoking user to be a
+   superuser. To alter the owner, you must be able to
+   SET ROLE to the new owning role, and that role must have
+   CREATE privilege on the database.
Also, the new owner of a FOR ALL TABLES or
FOR TABLES IN SCHEMA
publication must be a superuser. However, a superuser can
@@ -212,6 +227,12 @@ ALTER PUBLICATION sales_publication ADD TABLES IN SCHEMA marketing, sales;
production_publication:
 
 ALTER PUBLICATION production_publication ADD TABLE users, departments, TABLES IN SCHEMA production;
+
+
+  
+   Reset the publication production_publication:
+
+ALTER PUBLICATION production_publication RESET;
 
  
 
diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c
index f4ba572697..e9ee77ecd3 100644
--- a/src/backend/commands/publicationcmds.c
+++ b/src/backend/commands/publicationcmds.c
@@ -55,6 +55,14 @@
 #include "utils/varlena.h"
 
 
+/* CREATE PUBLICATION default values for flags and publication parameters */
+#define PUB_DEFAULT_ACTION_INSERT true
+#define PUB_DEFAULT_ACTION_UPDATE true
+#define PUB_DEFAULT_ACTION_DELETE true
+#define PUB_DEFAULT_ACTION_TRUNCATE true
+#define PUB_DEFAULT_VIA_ROOT false
+#define PUB_DEFAULT_ALL_TABLES false
+
 /*
  * Information used to validate the columns in the row filter expression. See
  * contain_invalid_rfcolumn_walker for details.
@@ -93,11 +101,11 @@ 

Re: run pgindent on a regular basis / scripted manner

2023-01-20 Thread Jelte Fennema
As a not very frequent submitter, on all the patches that I submit I
keep running into this problem. I have two open patchsets for
libpq [1][2], both of which currently include the same initial "run
pgindent" patch in addition to the actual patch, just so I can
actually run it on my own patch, because a9e9a9f caused formatting to
be totally off in the libpq directory. And there are lots of other
changes that pgindent wants to make, which are visible on the job that
Magnus has set up.

To me a master branch that pgindent never complains about sounds
amazing! And I personally think rejection of unindented pushes and
cfbot complaining about unindented patches would be a very good thing,
because that seems to be the only solution that could achieve that.

Having cfbot complain also doesn't sound like a crazy burden for
submitters. Many open-source projects have CI complaining if code
formatting does not pass automatic formatting tools. As long as there
is good documentation on how to install and run pgindent I don't think
it should be a big problem. A link to those docs could even be
included in the failing CI job's error message. A pre-commit hook
that submitters/committers could install would be super useful too,
since right now I sometimes forget to run pgindent, especially as
there's no editor integration (that I know of) for pgindent.

Side-question: What's the reason why pgindent is used instead of some
more "modern" code formatter that doesn't require keeping
typedefs.list up to date for good looking output? (e.g. uncrustify or
clang-format) Because that would also allow for easy editor
integration.

[1]: https://commitfest.postgresql.org/41/3511/
[2]: https://commitfest.postgresql.org/41/3679/




Re: document the need to analyze partitioned tables

2023-01-20 Thread Laurenz Albe
On Thu, 2023-01-19 at 15:56 -0500, Bruce Momjian wrote:
> On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> > On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > > Is it possible to document when partition table statistics helps?
> > 
> > I think it would be difficult to come up with an exhaustive list.
> 
> I was afraid of that.  I asked only because most people assume
> autovacuum handles _all_ statistics needs, but this case is not handled.
> Do people even have any statistics maintenance process anymore, and if
> not, how would they know they need to run a manual ANALYZE?

Probably not.  I think this would warrant an entry in the TODO list:
"make autovacuum collect startistics for partitioned tables".

Even if we cannot give better advice than "run ANALYZE manually if
the execution plan looks fishy", the patch is still an improvement,
isn't it?

I have already seen several questions by people who read the current
documentation and were worried that autovacuum wouldn't clean up their
partitioned tables.

Yours,
Laurenz Albe




Re: Add SHELL_EXIT_CODE to psql

2023-01-20 Thread Maxim Orlov
On Fri, 13 Jan 2023 at 07:50, Corey Huinker  wrote:

>
> I named it wait_result_to_exit_code(), but I welcome suggestions of a
> better name.
>

Thanks! But the CF bot is still not happy. I think we should address
the issues from here: https://cirrus-ci.com/task/5391002618298368
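
(For anyone skimming the thread: a helper by that name presumably just
unpacks the wait(2) status, along the lines of the sketch below --
written from the POSIX macros, not copied from the patch, and ignoring
the Windows side.  The exact mapping is an assumption.)

#include <sys/wait.h>

/*
 * Map a raw wait() status to a shell-style exit code: 0..255 for a
 * normal exit, 128 + signal number if killed by a signal, -1 otherwise.
 */
static int
wait_result_to_exit_code(int exitstatus)
{
    if (WIFEXITED(exitstatus))
        return WEXITSTATUS(exitstatus);
    if (WIFSIGNALED(exitstatus))
        return 128 + WTERMSIG(exitstatus);
    return -1;
}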


-- 
Best regards,
Maxim Orlov.


Re: Operation log for major operations

2023-01-20 Thread Dmitry Koval

Thanks, Ted Yu!

> Please update year for the license in pg_controllog.c

License updated for files pg_controllog.c, controllog_utils.c, 
pg_controllog.h, controllog_utils.h.


> +check_file_exists(const char *datadir, const char *path)
> There is existing helper function such as:
> src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);
> Is it possible to reuse that code ?

There are a lot of functions that check for file existence:

1) src/backend/utils/fmgr/dfmgr.c:static bool file_exists(const char *name);
2) src/backend/jit/jit.c:static bool file_exists(const char *name);
3) src/test/regress/pg_regress.c:bool file_exists(const char *file);
4) src/bin/pg_upgrade/exec.c:bool pid_lock_file_exists(const char *datadir);
5) src/backend/commands/extension.c:bool extension_file_exists(const 
char *extensionName);


But there is no unified function: different components use their own
function with their own specifics.
Probably we cannot reuse the code of dfmgr.c:file_exists() because this
function skips "errno == EACCES" (this error is critical for us).
I copied the code of jit.c:file_exists() for
src/bin/pg_rewind/file_ops.c:check_file_exists() (with adaptations for
the utility).


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.comFrom aabff7fa8e51e760c558efa7b53fb675f996c7ff Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Mon, 14 Nov 2022 21:39:14 +0300
Subject: [PATCH v5] Operation log

---
 src/backend/access/transam/xlog.c |  10 +
 src/backend/backup/basebackup.c   |   1 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/meson.build|   1 +
 src/backend/utils/misc/pg_controllog.c| 142 
 src/bin/pg_checksums/pg_checksums.c   |   1 +
 src/bin/pg_resetwal/pg_resetwal.c |   4 +
 src/bin/pg_rewind/file_ops.c  |  20 +
 src/bin/pg_rewind/file_ops.h  |   2 +
 src/bin/pg_rewind/pg_rewind.c |  59 ++
 src/bin/pg_upgrade/controldata.c  |  52 ++
 src/bin/pg_upgrade/exec.c |   9 +-
 src/bin/pg_upgrade/pg_upgrade.c   |   2 +
 src/bin/pg_upgrade/pg_upgrade.h   |   2 +
 src/common/Makefile   |   1 +
 src/common/controllog_utils.c | 667 ++
 src/common/meson.build|   1 +
 src/include/catalog/pg_controllog.h   | 142 
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/common/controllog_utils.h |  27 +
 src/test/modules/test_misc/meson.build|   1 +
 .../modules/test_misc/t/004_operation_log.pl  | 136 
 src/tools/msvc/Mkvcbuild.pm   |   4 +-
 24 files changed, 1290 insertions(+), 5 deletions(-)
 create mode 100644 src/backend/utils/misc/pg_controllog.c
 create mode 100644 src/common/controllog_utils.c
 create mode 100644 src/include/catalog/pg_controllog.h
 create mode 100644 src/include/common/controllog_utils.h
 create mode 100644 src/test/modules/test_misc/t/004_operation_log.pl

diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 8f47fb7570..dd3c4c7ac4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -68,6 +68,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/controllog_utils.h"
 #include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
@@ -4775,6 +4776,9 @@ BootStrapXLOG(void)
/* some additional ControlFile fields are set in WriteControlFile() */
WriteControlFile();
 
+   /* Put information into operation log. */
+   put_operation_log_element(DataDir, OLT_BOOTSTRAP);
+
/* Bootstrap the commit log, too */
BootStrapCLOG();
BootStrapCommitTs();
@@ -5743,8 +5747,14 @@ StartupXLOG(void)
 	SpinLockRelease(&XLogCtl->info_lck);
 
UpdateControlFile();
+
LWLockRelease(ControlFileLock);
 
+   /* Put information into operation log. */
+   if (promoted)
+   put_operation_log_element(DataDir, OLT_PROMOTED);
+   put_operation_log_element(DataDir, OLT_STARTUP);
+
/*
 * Shutdown the recovery environment.  This must occur after
 * RecoverPreparedTransactions() (see notes in lock_twophase_recover())
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 3fb9451643..0ca709b5b2 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -211,6 +211,7 @@ static const struct exclude_list_item excludeFiles[] =
  */
 static const struct exclude_list_item noChecksumFiles[] = {
{"pg_control", false},
+   {"pg_control_log", false},
{"pg_filenode.map", false},
{"pg_internal.init", true},
{"PG_VERSION", false},
diff 

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread shveta malik
On Fri, Jan 20, 2023 at 2:23 PM shveta malik  wrote:
>
> On Fri, Jan 20, 2023 at 1:08 PM Peter Smith  wrote:
>
> > a) the message should say that this is the *remaining* time to left to wait.
> >
> > b) it might be convenient to know from the log what was the original
> > min_apply_delay value in the 1st place.
> >
> > For example, the logs might look something like this:
> >
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 159972 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 142828 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 129994 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 16 ms. Remaining wait time: 110001 ms
> > ...
> >
>
> +1
> This will also help when min_apply_delay is set to a new value in the
> middle of the current wait. Let's say I started with min_apply_delay =
> 5 min; when the worker was halfway through this, I changed
> min_apply_delay to 3 min (or say 10 min). I see the impact of that
> change, i.e. the new wait time is adjusted, but the log becomes
> confusing. So please keep this scenario in mind as well while
> improving the logging.
>


When we send feedback during the apply delay, after every
wal_receiver_status_interval, the log comes out as:
2023-01-19 17:12:56.000 IST [404795] DEBUG:  sending feedback (force 1)
to recv 0/1570840, write 0/1570840, flush 0/1570840

Shall we add some info here to indicate that it is sent while waiting
for apply_delay, to distinguish it from other such send-feedback logs?
It will make the apply_delay flow clearer in the logs.

thanks
Shveta




Re: Add semi-join pushdown to postgres_fdw

2023-01-20 Thread Alexander Pyhalov

Hi.

Tomas Vondra wrote on 2023-01-19 20:49:
I took a quick look at the patch. It needs a rebase, although it
applies fine using patch.

A couple minor comments:

1) addl_conds seems a bit hard to understand, I'd use either the full
wording (additional_conds) or maybe extra_conds


Renamed to additional_conds.



2) some of the lines got quite long, and need a wrap

Split some of them. Not sure if it's enough.



3) unknown_subquery_rels name is a bit misleading - AFAIK it's the rels
that can't be referenced from upper rels (per what the .h says). So they
are known, but hidden. Is there a better name?


Renamed to hidden_subquery_rels. These are rels which can't be referred
to from upper join levels.




4) joinrel_target_ok() needs a better comment, explaining *when* the
reltarget is safe for pushdown. The conditions are on the same row, but
the project style is to break after '&&'.


Added a comment. It seems to be a rephrasing of the lower comment in
joinrel_target_ok().


--
Best regards,
Alexander Pyhalov,
Postgres ProfessionalFrom f37d26d9b622767f94e89034fa8e4fccc69e358d Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Mon, 7 Nov 2022 10:23:32 +0300
Subject: [PATCH v4] postgres_fdw: add support for deparsing semi joins

We deparse semi-joins as EXISTS subqueries, so deparsing a semi-join
generates an additional_conds condition, which is then added to the
uppermost JOIN's WHERE clause.
---
 contrib/postgres_fdw/deparse.c| 206 +---
 .../postgres_fdw/expected/postgres_fdw.out| 297 --
 contrib/postgres_fdw/postgres_fdw.c   |  89 +-
 contrib/postgres_fdw/postgres_fdw.h   |   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql | 119 ++-
 5 files changed, 632 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 473fa45bd43..1217d47050b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,14 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
   RelOptInfo *foreignrel, bool use_alias,
   Index ignore_rel, List **ignore_conds,
+  StringInfo additional_conds,
   List **params_list);
+static void appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   RelOptInfo *foreignrel, bool make_subquery,
-			   Index ignore_rel, List **ignore_conds, List **params_list);
+			   Index ignore_rel, List **ignore_conds,
+			   StringInfo additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,23 +1373,21 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	StringInfoData additional_conds;
 
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
 		   IS_JOIN_REL(scanrel) || IS_SIMPLE_REL(scanrel));
 
+	initStringInfo(&additional_conds);
 	/* Construct FROM clause */
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 		  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-		  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+		  (Index) 0, NULL, &additional_conds,
+		  context->params_list);
+	appendWhereClause(quals, &additional_conds, context);
+	pfree(additional_conds.data);
 }
 
 /*
@@ -1598,6 +1599,33 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }
 
+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, StringInfo additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+
+	if (exprs != NIL || additional_conds->len > 0)
+		appendStringInfoString(buf, " WHERE ");
+
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	if (additional_conds->len > 0)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfo(buf, "(%s)", additional_conds->data);
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1644,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
 
+		case JOIN_SEMI:
+			return "SEMI";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			
