Re: Fast COPY FROM based on batch insert

2022-08-22 Thread Andrey Lepikhov

On 22/8/2022 11:44, Etsuro Fujita wrote:

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush().  Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table.  My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful.  Maybe I'm too
worried about that, though.
I got your point. Indeed, perhaps such info doesn't really need to be
included in the core, at least for now.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Julien Rouhaud
Hi,

On Tue, Jan 18, 2022 at 10:01:01PM +0100, Pavel Stehule wrote:
>
> On Fri, Jan 14, 2022 at 3:44 AM, Julien Rouhaud  wrote:
>
> > On Thu, Jan 13, 2022 at 07:32:26PM +0100, Pavel Stehule wrote:
> > > On Thu, Jan 13, 2022 at 19:23, Dean Rasheed  wrote:
> > > > On Thu, 13 Jan 2022 at 17:42, Pavel Stehule 
> > > > wrote:
> > > > >
> > > > > I like the idea of prioritizing tables over variables with warnings
> > > > > when a collision is detected. It cannot break anything. And it allows
> > > > > using short identifiers when there is no collision.
> > > >
> > > > Yeah, that seems OK, as long as it's clearly documented. I don't think
> > > > a warning is necessary.
> >
> > What should be the behavior for a cached plan that uses a variable when a
> > conflicting relation is later created?  I think that it should be the same
> > as a
> > search_path change and the plan should be discarded.
> >
> > > The warning can be disabled by default, but I think it should be there.
> > > It is a signal that something in the database schema should be renamed.
> > > Maybe - session_variables_ambiguity_warning.
> >
> > I agree that having a way to know that a variable has been bypassed can be
> > useful.
> >
>
> done

I've been thinking a bit more about the shadowing, and one scenario we didn't
discuss is something like this naive example:

CREATE TABLE tt(a text, b text);

CREATE TYPE abc AS (a text, b text, c text);
CREATE VARIABLE tt AS abc;

INSERT INTO tt SELECT 'a', 'b';
LET tt = ('x', 'y', 'z');

SELECT tt.a, tt.b, tt.c FROM tt;

Which, with the default configuration, currently returns

 a | b | c
---+---+---
 a | b | z
(1 row)

I feel a bit uncomfortable that the system allows mixing variable attributes
and relation columns for the same relation name.  This is even worse here
since some of the variable attributes are shadowed.

It feels like a good way to write valid queries that clearly won't do what you
think they do, a bit like the correlated sub-query trap, so maybe we should
have a way to prevent it.

What do you think?




Re: logical decoding and replication of sequences

2022-08-22 Thread Thomas Munro
On Tue, Aug 23, 2022 at 4:21 PM Michael Paquier  wrote:
> On Mon, Aug 08, 2022 at 06:15:46PM +1200, Thomas Munro wrote:
> > The attached patch that simply moves the cache invalidation into
> > report_invalid_record(), so that it's reached by the above code and
> > everything else that reports an error, seems to fix the problem in
> > src/bin/pg_ctl/t/003_promote.pl with Noah's spanner-in-the-works patch
> > applied, and passes check-world without it.  I need to look at this
> > some more, though, and figure out if it's the right fix.
>
> Thomas, where are you on this open item?  A potential PANIC at
> promotion is bad.  One possible exit path would be to switch the
> default of recovery_prefetch, though that's a kind of last-resort
> option seen from here.

I will get a fix committed this week -- I need to study
Horiguchi-san's analysis...




Re: Considering additional sort specialisation functions for PG16

2022-08-22 Thread David Rowley
On Tue, 23 Aug 2022 at 15:22, John Naylor  wrote:
> Did you happen to see
>
> https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com

I missed that.  It looks like a much more promising idea than what I
came up with. I've not looked at your code yet, but I'm interested and
will aim to look soon.

David




Re: logical decoding and replication of sequences

2022-08-22 Thread Michael Paquier
On Mon, Aug 08, 2022 at 06:15:46PM +1200, Thomas Munro wrote:
> The attached patch that simply moves the cache invalidation into
> report_invalid_record(), so that it's reached by the above code and
> everything else that reports an error, seems to fix the problem in
> src/bin/pg_ctl/t/003_promote.pl with Noah's spanner-in-the-works patch
> applied, and passes check-world without it.  I need to look at this
> some more, though, and figure out if it's the right fix.

Thomas, where are you on this open item?  A potential PANIC at
promotion is bad.  One possible exit path would be to switch the
default of recovery_prefetch, though that's a kind of last-resort
option seen from here.
--
Michael




Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-22 Thread Peter Smith
On Mon, Aug 22, 2022 at 10:49 PM kuroda.hay...@fujitsu.com
 wrote:
>

> 04. launcher.c
>
> 04.a
>
> +   worker->main_worker_pid = is_subworker ? MyProcPid : 0;
>
> You can use InvalidPid instead of 0.
> (I thought pid should be represented by the datatype pid_t, but in some
> code it is defined as int...)
>
> 04.b
>
> +   worker->main_worker_pid = 0;
>
> You can use InvalidPid instead of 0, same as above.
>
> 05. origin.c
>
>  void
> -replorigin_session_setup(RepOriginId node)
> +replorigin_session_setup(RepOriginId node, int acquired_by)
>
> IIUC the same slot can be used only when the main apply worker has already
> acquired the slot and the subworker for the same subscription tries to
> acquire it, but that cannot be understood from the comments.
> How about adding comments, or an assertion that acquired_by is the same as
> session_replication_state->acquired_by?
> Moreover, acquired_by should be compared with InvalidPid, based on the
> above comments.
>

In general I agree, and I also suggested using pid_t and InvalidPid
(at least for all the new code).

In practice, please be aware that InvalidPid is -1 (not 0), so
replacing any existing code (e.g. in replorigin_session_setup) that
was already checking for 0 has to be done with lots of care.
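
To illustrate the trap with a contrived standalone sketch (only the
InvalidPid definition mirrors miscadmin.h; the helper names are invented):

#include <stdbool.h>
#include <sys/types.h>

#define InvalidPid ((pid_t) -1)     /* mirrors miscadmin.h's (-1) */

/* existing convention: 0 means "slot not acquired" */
static bool
slot_is_free_old(pid_t acquired_by)
{
    return acquired_by == 0;
}

static bool
slot_is_free_new(pid_t acquired_by)
{
    /* a blind s/0/InvalidPid/ flips the answer for acquired_by == 0 */
    return acquired_by == InvalidPid;
}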

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: ICU for global collation

2022-08-22 Thread Michael Paquier
On Mon, Aug 22, 2022 at 04:10:59PM +0200, Peter Eisentraut wrote:
> I think this piece of code was left over from some earlier attempts to
> specify the libc locale and the icu locale with one option, which never
> really worked well.  The CREATE DATABASE man page does not mention that
> LOCALE provides the default for ICU_LOCALE.  Hence, I think we should delete
> this.

As of 36f729e, is there anything left to address on this thread or
should this open item be closed?
--
Michael




Re: SQL/JSON features for v15

2022-08-22 Thread Michael Paquier
On Mon, Aug 22, 2022 at 07:57:29PM -0700, Andres Freund wrote:
> To me it feels like there's probably too much work here to cram it in at this
> point. If several other committers shared the load of working on this it'd
> perhaps be doable, but I've not seen many volunteers.

While 0002 is dead simple, I am worried about the complexity created
by 0001, 0003 (particularly tightening subtransactions with a
CASE_EEOP) and 0004 at this late stage of the release process:
https://www.postgresql.org/message-id/7d83684b-7932-9f29-400b-0beedfafc...@postgrespro.ru

This is not a good sign after three betas for a feature as complex as
this one.
--
Michael




Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-22 Thread Peter Smith
On Mon, Aug 22, 2022 at 7:01 PM Amit Kapila  wrote:
>
> On Mon, Aug 22, 2022 at 4:42 AM Peter Smith  wrote:
> >
> > On Fri, Aug 19, 2022 at 7:55 PM Amit Kapila  wrote:
> > >
> > > On Fri, Aug 19, 2022 at 3:05 PM Peter Smith  wrote:
> > > >
> > > > On Fri, Aug 19, 2022 at 7:10 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Fri, Aug 19, 2022 at 2:36 PM Peter Smith  
> > > > > wrote:
> > > > > >
> > > > > > Here are my review comments for the v23-0005 patch:
> > > > > >
> > > > > > ==
> > > > > >
> > > > > > Commit Message says:
> > > > > > main_worker_pid is the process ID of the main apply worker, if this
> > > > > > process is an apply background worker. NULL if this process is a
> > > > > > main apply worker or a synchronization worker.
> > > > > > The new column can make it easier to distinguish the main apply
> > > > > > worker from an apply background worker.
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Having a column called ‘main_worker_pid’ which is defined to be NULL
> > > > > > if the process *is* the main apply worker does not make any sense to
> > > > > > me.
> > > > > >
> > > > >
> > > > > I haven't read this part of a patch but it seems to me we have
> > > > > something similar for parallel query workers. Refer 'leader_pid'
> > > > > column in pg_stat_activity.
> > > > >
> > > >
> > > > IIUC (from the patch 0005 commit message) the intention is to be able
> > > > to easily distinguish the worker types.
> > > >
> > >
> > > I think it is only to distinguish between leader apply worker and
> > > background apply workers. The tablesync worker can be distinguished
> > > based on relid field.
> > >
> >
> > Right. But that's the reason for my question in the first place - why
> > implement the patch so that the user still has to jump through hoops
> > just to know the worker type information?
> >
>
> I think it is not only to judge worker type but also to know the pid
> of each of the workers during parallel apply. Isn't it better to have
> both main apply worker pid and parallel apply worker pid as we have
> for the parallel query system?
>

OK, thanks for pointing me to that other view. Now that I see the
existing pg_stat_activity already has 'pid' and 'leader_pid' [1], it
suddenly seems more reasonable to do similar for this
pg_stat_subscription.

This background information needs to be conveyed better in the patch
0005 commit message. The current commit message said nothing about
trying to be consistent with the existing stats views; it only says
this field was added to distinguish more easily between the types of
apply workers.

--
[1] https://www.postgresql.org/docs/devel/monitoring-stats.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: sockaddr_un.sun_len vs. reality

2022-08-22 Thread Tom Lane
Thomas Munro  writes:
> I was nerd-sniped by the historical context of this single line of
> code.  I'd already wondered many times (not just in PostgreSQL)
> whether and when that became a cargo-cult practice, replicated from
> other software and older books like Stevens.  I failed to find any
> sign of an OS that needs it today, or likely even needed it this
> millennium.  Now I'd like to propose removing it.

Seems worth a try.

regards, tom lane




Re: Inconsistencies around defining FRONTEND

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 08:48:34 -0700, Andres Freund wrote:
> On 2022-08-20 12:45:50 -0700, Andres Freund wrote:
> > The -DFRONTENDs for the various ecpg libraries don't seem necessary
> > anymore. That looks to be a leftover from 7143b3e8213, before that ecpg had
> > copies of various pgport libraries.
> >
> > Same with libpq, also looks to be obsoleted by 7143b3e8213.
> >
> > I don't think we need FRONTEND in initdb - looks like that stopped being
> > required with af1a949109d.
> 
> I think the patches for this are fairly obvious, and survived CI without an
> issue [1], so the src/tools/msvc bits work too. So I'm planning to push them
> fairly soon.

Done.

- Andres




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote:
> On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
>> Not at all! However, the 32-bit-element changes are irrelevant for
>> json, and make review more difficult. I would suggest keeping those in
>> the other thread starting with whatever refactoring is needed. I can
>> always rebase over that.
> 
> Yeah, I'll remove those to keep this thread focused.

Here's a new version of the patch with the 32-bit changes and calls to
lfind() removed.
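
For anyone skimming the patch below: on targets without SSE2, Vector8 is
just a uint64, so the per-byte searches fall back to SWAR-style bit
tricks.  A minimal standalone sketch of that style of emulation (my
paraphrase, not the patch's exact code):

#include <stdbool.h>
#include <stdint.h>

/* true if any byte of v is zero (classic SWAR zero-byte test) */
static inline bool
uint64_has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* true if any byte of v equals c: XOR turns matching bytes into zeros */
static inline bool
uint64_has_byte(uint64_t v, uint8_t c)
{
    return uint64_has_zero_byte(v ^ (0x0101010101010101ULL * c));
}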

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From dd7b3c3b0567d77a7f171d9cb4b6a8d3ee30ceec Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 20 Aug 2022 21:14:01 -0700
Subject: [PATCH v8 1/1] json_lex_string() SIMD

---
 src/common/jsonapi.c  |  11 +-
 src/include/port/pg_lfind.h   |  65 +++-
 src/include/port/simd.h   | 143 ++
 .../test_lfind/expected/test_lfind.out|  18 ++-
 .../modules/test_lfind/sql/test_lfind.sql |   4 +-
 .../modules/test_lfind/test_lfind--1.0.sql|  10 +-
 src/test/modules/test_lfind/test_lfind.c  |  91 ++-
 7 files changed, 332 insertions(+), 10 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..87e1d0b192 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 return JSON_UNICODE_LOW_SURROGATE;
@@ -853,7 +854,13 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8_le(0x1F, (uint8 *) p, sizeof(Vector8)))
+p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 if (*p == '\\' || *p == '"')
 	break;
diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index fb125977b2..86342d71d8 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -1,7 +1,8 @@
 /*-
  *
  * pg_lfind.h
- *	  Optimized linear search routines.
+ *	  Optimized linear search routines using SIMD intrinsics where
+ *	  available.
  *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
@@ -15,6 +16,68 @@
 
 #include "port/simd.h"
 
+/*
+ * pg_lfind8
+ *
+ * Return true if there is an element in 'base' that equals 'key', otherwise
+ * return false.
+ */
+static inline bool
+pg_lfind8(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector8) - 1);
+	Vector8		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector8))
+	{
+		vector8_load(&chunk, &base[i]);
+		if (vector8_eq(chunk, key))
+			return true;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (key == base[i])
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * pg_lfind8_le
+ *
+ * Return true if there is an element in 'base' that is less than or equal to
+ * 'key', otherwise return false.
+ */
+static inline bool
+pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector8) - 1);
+	Vector8		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector8))
+	{
+		vector8_load(&chunk, &base[i]);
+		if (vector8_le(chunk, key))
+			return true;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (base[i] <= key)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * pg_lfind32
  *
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index a571e79f57..23f6269169 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -25,6 +25,149 @@
 #if (defined(__x86_64__) || defined(_M_AMD64))
 #include 
 #define USE_SSE2
+typedef __m128i Vector8;
+
+/*
+ * If no SIMD instructions are available, we emulate specialized vector
+ * operations using uint64.
+ */
+#else
+typedef uint64 Vector8;
+#endif
+
+
+static inline void vector8_load(Vector8 *v, const uint8 *s);
+static inline Vector8 vector8_broadcast(const uint8 c);
+static inline bool vector8_has_zero(const Vector8 v);
+static inline bool vector8_eq(const Vector8 v, const uint8 c);
+static inline bool vector8_le(const Vector8 v, const uint8 c);
+
+
+/*
+ * Functions for loading a chunk of memory into a vector.
+ */
+
+static inline void
+vector8_load(Vector8 *v, const uint8 *s)
+{
+#ifdef USE_SSE2
+	*v = _mm_loadu_si128((const __m128i *) s);
+#else

Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 13:15:18 -0400, Melanie Plageman wrote:
> v28 attached.

Pushed 0001, 0002. Thanks!

- Andres




Re: Considering additional sort specialisation functions for PG16

2022-08-22 Thread John Naylor
On Tue, Aug 23, 2022 at 9:18 AM David Rowley  wrote:
>
> I have the following dimensions in mind for consideration:
>
> 1. Specialisations to handle sorting of non-null datums (eliminates
> checking for nulls in the comparison function)
> 2. Specialisations to handle single column sorts (eliminates
> tiebreaker function call or any checks for existence of tiebreaker)
> 3. ASC sort (No need for if (ssup->ssup_reverse) 
> INVERT_COMPARE_RESULT(compare))
>
> If we did all of the above then we'd end up with 3 * 2 * 2 * 2 = 24
> specialization functions.  That seems a bit excessive. So here I'd
> like to discuss which ones we should add, if any.
>
> I've attached a very basic implementation of #1 which adds 3 new
> functions for sorting non-null datums.

Did you happen to see

https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com

where I experimented with removing all null handling? What I had in
mind was pre-partitioning nulls and non-nulls when populating the
SortTuple array, then calling qsort twice, once with the non-null
partition with comparators that assume non-null, and (only if there
are additional sort keys) once on the null partition. And the
pre-partitioning would take care of nulls first/last upfront. I
haven't looked into the feasibility of this yet, but the good thing
about the concept is that it removes null handling in the comparators
without additional sort specializations.
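
To make the shape of that concrete, here is a toy standalone sketch (not
tuplesort code -- SortItem and the nulls-first policy are stand-ins):

#include <stdbool.h>
#include <stdlib.h>

typedef struct SortItem { int key; bool isnull; } SortItem;

static int
cmp_notnull(const void *a, const void *b)
{
    int ka = ((const SortItem *) a)->key;
    int kb = ((const SortItem *) b)->key;

    return (ka > kb) - (ka < kb);
}

/* Move nulls to the front while "loading", then sort only the tail. */
static void
sort_nulls_first(SortItem *items, size_t n)
{
    size_t nnulls = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (items[i].isnull)
        {
            SortItem tmp = items[i];

            items[i] = items[nnulls];
            items[nnulls++] = tmp;
        }
    }

    /* comparator can assume non-null; a second qsort of items[0..nnulls)
     * would only be needed if there were additional sort keys */
    qsort(items + nnulls, n - nnulls, sizeof(SortItem), cmp_notnull);
}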

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: shared-memory based stats collector - v70

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 11:32:14 +0900, Kyotaro Horiguchi wrote:
> At Wed, 10 Aug 2022 14:02:34 +0200, "Drouvot, Bertrand"  
> wrote in 
> > what about?
> > 
> > +   /*
> > +    * Acquire the LWLock directly instead of using
> > pg_stat_lock_entry_shared()
> > +    * which requires a reference.
> > +    */
> > 
> > 
> > I think that's more consistent with other comments mentioning LWLock
> > acquisition.
> 
> Sure. Thanks! I did that in the attached.

Pushed, thanks!

Greetings,

Andres Freund




Re: making relfilenodes 56 bits

2022-08-22 Thread Dilip Kumar
On Tue, Aug 23, 2022 at 1:46 AM Robert Haas  wrote:
>
> On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila  wrote:
> > To solve that problem, how about rewriting the system table in the new
> > cluster which has a conflicting relfilenode? I think we can probably
> > do this conflict checking before processing the tables from the old
> > cluster.
>
> Thanks for chiming in.
>
> Right now, there are two parts to the relfilenumber preservation
> system, and this scheme doesn't quite fit into either of them. First,
> the dump includes commands to set pg_class.relfilenode in the new
> cluster to the same value that it had in the old cluster. The dump
> can't include any SQL commands that depend on what's happening in the
> new cluster because pg_dump(all) only connects to a single cluster,
> which in this case is the old cluster. Second, pg_upgrade itself
> copies the files from the old cluster to the new cluster. This doesn't
> involve a database connection at all. So there's no part of the
> current relfilenode preservation mechanism that can look at the old
> AND the new database and decide on some SQL to execute against the new
> database.
>
> I thought for a while that we could use the information that's already
> gathered by get_rel_infos() to do what you're suggesting here, but it
> doesn't quite work, because that function excludes system tables, and
> we can't afford to do that here. We'd either need to modify that query
> to include system tables - at least for the new cluster - or run a
> separate one to gather information about system tables in the new
> cluster. Then, we could put all the pg_class.relfilenode values we
> found in the new cluster into a hash table, loop over the list of rels
> this function found in the old cluster, and for each one, probe into
> the hash table. If we find a match, that's a system table that needs
> to be moved out of the way before calling create_new_objects(), or
> maybe inside that function but before it runs pg_restore.
>
> That doesn't seem too crazy, I think. It's a little bit of new
> mechanism, but it doesn't sound horrific. It's got the advantage of
> being significantly cheaper than my proposal of moving everything out
> of the way unconditionally, and at the same time it retains one of the
> key advantages of that proposal - IMV, anyway - which is that we don't
> need separate relfilenode ranges for user and system objects any more.
> So I guess on balance I kind of like it, but maybe I'm missing
> something.

Okay, so this seems exactly the same as your previous proposal, except that
instead of unconditionally rewriting all the system tables we will rewrite
only those that conflict with a user table or pg_largeobject from the
previous cluster.  Although it might add some implementation complexity on
the pg_upgrade side, it seems cheaper than rewriting everything.
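
For what it's worth, the probe step could be as small as this toy
standalone sketch (bsearch standing in for pg_upgrade's hash table; all
names are invented):

#include <stdlib.h>

typedef unsigned int Oid;

static int
oid_cmp(const void *a, const void *b)
{
    Oid oa = *(const Oid *) a;
    Oid ob = *(const Oid *) b;

    return (oa > ob) - (oa < ob);
}

/*
 * Count old-cluster relfilenodes that collide with new-cluster system
 * tables; each hit is a system table to move aside before pg_restore.
 */
static int
count_conflicts(Oid *new_sys, size_t n_new, const Oid *old_rels, size_t n_old)
{
    int conflicts = 0;

    qsort(new_sys, n_new, sizeof(Oid), oid_cmp);
    for (size_t i = 0; i < n_old; i++)
    {
        if (bsearch(&old_rels[i], new_sys, n_new, sizeof(Oid), oid_cmp))
            conflicts++;
    }
    return conflicts;
}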

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: SQL/JSON features for v15

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 21:52:01 -0400, Jonathan S. Katz wrote:
> Andres, Andrew, Amit, Robert -- as you have either worked on this or
> expressed opinions -- any thoughts on this current patch set?

To me it feels like there's probably too much work here to cram it in at this
point. If several other committers shared the load of working on this it'd
perhaps be doable, but I've not seen many volunteers.

Greetings,

Andres Freund




Re: Improve description of XLOG_RUNNING_XACTS

2022-08-22 Thread Kyotaro Horiguchi
At Mon, 15 Aug 2022 11:16:56 +0900, Masahiko Sawada  
wrote in 
> Sorry for the late reply.

No worries. Anyway, I was on a long (as a Japanese :) vacation.

> On Thu, Jul 28, 2022 at 4:29 PM Kyotaro Horiguchi
>  wrote:
> > record is sound". Is it any trouble with assuming the both *can*
> > happen at once?  If something's broken, it will be reflected in the
> > output.
> 
> Fair point. We may not need to interpret the contents.

Yeah.

> On Thu, Jul 28, 2022 at 3:24 PM Kyotaro Horiguchi
>  wrote:
> >
> > Another point is if the xid/subxid lists get long, I see it annoying
> > that the "overflowed" messages goes far away to the end of the long
> > line. Couldn't we rearrange the item order of the line as the follows?
> >
> > nextXid %u latestCompletedXid %u oldestRunningXid %u;[ subxid overflowed;][ 
> > %d xacts: %u %u ...;][ subxacts: %u %u ..]
> >
> 
> I'm concerned that we have two information of subxact apart. Given
> that showing both individual subxacts and "overflow" is a bug, I think

A bug, or any other kind of breakage of the file. So if "overflowed" is
set, we don't need to show a subxid there, but I thought there's no need
to change the behavior in that case since it scarcely happens.

> we can output like:
> 
> if (xlrec->subxcnt > 0)
> {
> appendStringInfo(buf, "; %d subxacts:", xlrec->subxcnt);
> for (i = 0; i < xlrec->subxcnt; i++)
> appendStringInfo(buf, " %u", xlrec->xids[xlrec->xcnt + i]);
> }
> 
> if (xlrec->subxid_overflow)
> appendStringInfoString(buf, "; subxid overflowed");

Yea, it seems like what I proposed upthread. I'm fine with that since
it is an abnormal situation.

> Or we can output the "subxid overflowed" first.

(I prefer this, as it doesn't change the output in the normal case
but the anomaly will be easily seen if it happens.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: standby promotion can create unreadable WAL

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 02:36:36PM -0400, Robert Haas wrote:
> (Incidentally, there's also a bug in pg_waldump here: it's reporting
> the wrong LSN as the source of the error. 0/4FFF700 is not the record
> that's busted, as shown by the fact that it was successfully decoded
> and shown in the output. The relevant code in pg_waldump should be
> using EndRecPtr instead of ReadRecPtr to report the error. If it did,
> it would complain about 0/4FFFEA8, which is where the problem really
> is. This is of the same vintage as the bug fixed by
> d9fbb8862959912c5266364059c0abeda0c93bbf, though in that case the
> issue was reporting all errors using the start LSN of the first of
> several records read no matter where the error actually happened,
> whereas in this case the error is using the start LSN of the previous
> record instead of the current one.)

There was some previous discussion on this [0] [1].

[0] https://postgr.es/m/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com
[1] 
https://postgr.es/m/20220127.100738.1985658263632578184.horikyota.ntt%40gmail.com
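
For reference, the reporting code in question looks roughly like the
below in pg_waldump's main loop (paraphrased from memory, not verbatim),
and the suggested fix is the one-field swap:

	if (errormsg)
		pg_fatal("error in WAL record at %X/%X: %s",
				 LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),	/* was ReadRecPtr */
				 errormsg);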

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_basebackup: add test about zstd compress option

2022-08-22 Thread Michael Paquier
On Tue, Aug 23, 2022 at 10:58:22AM +0900, Dong Wook Lee wrote:
> I noticed the test code does not exercise the zstd option, so I added it.
> I hope my patch helps to ensure the safety of the test.

It seems to me that checking that the contents generated are valid is
equally necessary.  We do that for zlib with gzip --test, and you
could use ${ZSTD} in the context of this test.

What about lz4?
--
Michael




Re: SQL/JSON features for v15

2022-08-22 Thread Amit Langote
On Tue, Aug 23, 2022 at 10:52 AM Jonathan S. Katz  wrote:
> Andres, Andrew, Amit, Robert -- as you have either worked on this or
> expressed opinions -- any thoughts on this current patch set?

FWIW, I've started looking at these patches.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: fix typo - empty statement ;;

2022-08-22 Thread John Naylor
On Tue, Aug 23, 2022 at 7:14 AM Peter Smith  wrote:
>
> I noticed an accidental ;;
>
> PSA patch to remove the same.

Pushed.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-22 Thread Michael Paquier
On Mon, Aug 22, 2022 at 08:10:10AM -0700, Jacob Champion wrote:
> otherwise I'll sit tight.

So am I.  I have done an extra round of checks around the
serialization/deserialization logic where I put some elog()'s to look 
at the output passed down with some workers and a couple of auth
methods, and after an indentation and some comment polishing I finish
with the attached.

There was one thing that annoyed me with the patch, though: the lack of
initialization of MyClientConnectionInfo at backend startup, as we may
finish by not calling set_authn() to fill in some of its data, so I have
placed an extra memset(0) in InitProcessGlobals() (note that Port does a
calloc() much earlier than that, but I think that we don't really want to
do more in such code paths, especially for the parallelized client
information).

I have written a commit message, while on it.  Does that look fine to
you?
--
Michael
From 73c28e17f1c77af134dc117500226c201099499b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 23 Aug 2022 11:23:55 +0900
Subject: [PATCH v4] Allow parallel workers to retrieve some data from Port

This commit moves authn_id into a new global structure called
ClientConnectionInfo (mapping to a MyClientConnectionInfo for each
backend) which is intended to hold all the client information that
should be shared between the backend and any of its parallel workers,
access for extensions and triggers being the primary use case.  There is
no need to push all the data of Port to the workers, and authn_id is
quite a generic concept so using a separate structure provides the best
balance (the name of the structure has been suggested by Robert Haas).

While on it, and per discussion as this would be useful for a potential
SYSTEM_USER that can be accessed through parallel workers, a second
field is added for the authentication method, copied directly from
Port.

ClientConnectionInfo is serialized and restored using a new parallel
key and a structure tracks the length of the authn_id, making the
addition of more fields straightforward.

Author: Jacob Champion
Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane,
Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.ca...@vmware.com
---
 src/include/libpq/libpq-be.h  | 45 +
 src/include/miscadmin.h   |  4 ++
 src/backend/access/transam/parallel.c | 19 +-
 src/backend/libpq/auth.c  | 23 ---
 src/backend/postmaster/postmaster.c   |  6 +-
 src/backend/utils/init/miscinit.c | 93 +++
 src/tools/pgindent/typedefs.list  |  2 +
 7 files changed, 166 insertions(+), 26 deletions(-)

diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 32d3a4b085..d3c8c83e51 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -88,6 +88,37 @@ typedef struct
 } pg_gssinfo;
 #endif
 
+/*
+ * ClientConnectionInfo includes the fields describing the client connection
+ * that are copied over to parallel workers as nothing from Port does that.
+ * The same rules apply for allocations here as for Port (everything must be
+ * malloc'd or palloc'd in TopMemoryContext).
+ *
+ * If you add a struct member here, remember to also handle serialization in
+ * SerializeClientConnectionInfo() and co.
+ */
+typedef struct ClientConnectionInfo
+{
+	/*
+	 * Authenticated identity.  The meaning of this identifier is dependent on
+	 * auth_method; it is the identity (if any) that the user presented during
+	 * the authentication cycle, before they were assigned a database role.
+	 * (It is effectively the "SYSTEM-USERNAME" of a pg_ident usermap --
+	 * though the exact string in use may be different, depending on pg_hba
+	 * options.)
+	 *
+	 * authn_id is NULL if the user has not actually been authenticated, for
+	 * example if the "trust" auth method is in use.
+	 */
+	const char *authn_id;
+
+	/*
+	 * The HBA method that determined the above authn_id. This only has
+	 * meaning if authn_id is not NULL; otherwise it's undefined.
+	 */
+	UserAuth	auth_method;
+} ClientConnectionInfo;
+
 /*
  * This is used by the postmaster in its communication with frontends.  It
  * contains all state information needed during this communication before the
@@ -148,19 +179,6 @@ typedef struct Port
 	 */
 	HbaLine*hba;
 
-	/*
-	 * Authenticated identity.  The meaning of this identifier is dependent on
-	 * hba->auth_method; it is the identity (if any) that the user presented
-	 * during the authentication cycle, before they were assigned a database
-	 * role.  (It is effectively the "SYSTEM-USERNAME" of a pg_ident usermap
-	 * -- though the exact string in use may be different, depending on pg_hba
-	 * options.)
-	 *
-	 * authn_id is NULL if the user has not actually been authenticated, for
-	 * example if the "trust" auth method is in use.
-	 */
-	const char *authn_id;
-
 	/*
 	 * TCP keepalive and user timeout settings.
 	 

Re: Column Filtering in Logical Replication

2022-08-22 Thread Peter Smith
On Mon, Aug 22, 2022 at 7:11 PM Erik Rijkers  wrote:
>
> Op 22-08-2022 om 10:27 schreef Peter Smith:
> >
> > PSA new set of v2* patches.
>
> Hi,
>
> In the second file a small typo, I think:
>
> "enclosed by parenthesis"  should be
> "enclosed by parentheses"
>

Thanks for your feedback.

Fixed in the v3* patches [1].

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtHgQbFs9DDeOoqqQLZmMBD8FQPK2WOXJpR1nyDQy8AGA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Column Filtering in Logical Replication

2022-08-22 Thread Peter Smith
On Mon, Aug 22, 2022 at 9:25 PM vignesh C  wrote:
>
...

> Few comments:
> 1) I felt no expressions are allowed in the case of column filters; only
> column names can be specified. The second part of the sentence confuses
> what is allowed and what is not allowed. Wouldn't it be better to remove
> the second sentence and mention that only column names can be specified?
> +   
> +Column list can contain only simple column references. Complex
> +expressions, function calls etc. are not allowed.
> +   
>

This wording was lifted verbatim from the commit message [1]. But I
see your point that it just seems to be overcomplicating a simple
rule. Modified as suggested.

> 2) tablename should be table name.
> +   
> +A column list is specified per table following the tablename, and
> enclosed by
> +parenthesis. See  for details.
> +   
>
> We have used table name in the same page in other instances like:
> a) The row filter is defined per table. Use a WHERE clause after the
> table name for each published table that requires data to be filtered
> out. The WHERE clause must be enclosed by parentheses.
> b) The tables are matched between the publisher and the subscriber
> using the fully qualified table name.
>

Fixed as suggested.

> 3) One small whitespace issue:
> git am v2-0001-Column-List-replica-identity-rules.patch
> Applying: Column List replica identity rules.
> .git/rebase-apply/patch:30: trailing whitespace.
>if the publication publishes only INSERT operations.
> warning: 1 line adds whitespace errors.
>

Fixed.

~~~

PSA the v3* patch set.

--
[1] 
https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-Column-List-replica-identity-rules.patch
Description: Binary data


v3-0002-Column-Lists-new-pgdocs-section.patch
Description: Binary data


Considering additional sort specialisation functions for PG16

2022-08-22 Thread David Rowley
Hi,

(Background: 697492434 added 3 new sort functions to remove the
indirect function calls for the comparator function.  This sped up
sorting for various of our built-in data types.)

There was a bit of unfinished discussion around exactly how far to
take these specialisations for PG15.  We could certainly add more.

There are various other things we could do to further speed up sorting
for these datatypes.  One example is, we could add 3 more variations
of these functions that can be called when there are no NULL datums to
sort.  That effectively multiplies the number of specialisations by 2,
or adds another dimension.

I have the following dimensions in mind for consideration:

1. Specialisations to handle sorting of non-null datums (eliminates
checking for nulls in the comparison function)
2. Specialisations to handle single column sorts (eliminates
tiebreaker function call or any checks for existence of tiebreaker)
3. ASC sort (No need for if (ssup->ssup_reverse) INVERT_COMPARE_RESULT(compare))

If we did all of the above then we'd end up with 3 * 2 * 2 * 2 = 24
specialization functions.  That seems a bit excessive. So here I'd
like to discuss which ones we should add, if any.

I've attached a very basic implementation of #1 which adds 3 new
functions for sorting non-null datums.  This could be made a bit more
advanced.  For now, I just added a bool flag to track if we have any
NULL datum1s in memtuples[].  For bounded sorts, we may remove NULLs
from that array, and may end up with no nulls after having seen null.
So maybe a count would be better than a flag.
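
(A toy standalone sketch of the count-based idea -- the types are
invented; the point is just that a removal can decrement a count where a
flag could not be safely cleared:)

#include <stdbool.h>
#include <stddef.h>

typedef struct ToyTuple { int datum1; bool isnull1; } ToyTuple;

typedef struct ToyState
{
    ToyTuple   *memtuples;
    size_t      memtupcount;
    size_t      nullcount;      /* how many tuples have isnull1 set */
} ToyState;

static void
toy_puttuple(ToyState *state, ToyTuple tup)
{
    if (tup.isnull1)
        state->nullcount++;
    state->memtuples[state->memtupcount++] = tup;
}

static void
toy_droptuple(ToyState *state, size_t i)
{
    if (state->memtuples[i].isnull1)
        state->nullcount--;     /* a bool flag couldn't be un-set safely */
    state->memtuples[i] = state->memtuples[--state->memtupcount];
}

/* at sort time: pick the not-null specializations iff nullcount == 0 */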

A quick performance test with 1 million random INTs shows ~6%
performance improvement when there are no nulls.

Master
$ pgbench -n -f bench.sql -T 60 postgres
latency average = 159.837 ms
latency average = 161.193 ms
latency average = 159.512 ms

master + not_null_sort_specializations.patch
$ pgbench -n -f bench.sql -T 60 postgres
latency average = 150.791 ms
latency average = 149.843 ms
latency average = 150.319 ms

I didn't test for any regression when there are NULLs and we're unable
to use the new specializations. I'm hoping the null tracking will be
almost free, but I will need to check.

It's all quite subjective to know which specializations should be
added. I think #1 is likely to have the biggest wins when it can be
used as it removes the most branching in the comparator function,
however the biggest gains are not the only thing to consider. We also
need to consider how commonly these functions will be used. I don't
have any information about that.

David
diff --git a/src/backend/utils/sort/tuplesort.c 
b/src/backend/utils/sort/tuplesort.c
index d90a1aebdf..52927f93b8 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -218,6 +218,8 @@ struct Tuplesortstate
int memtupsize; /* allocated length of 
memtuples array */
boolgrowmemtuples;  /* memtuples' growth still underway? */
 
+	bool		memtuples_hasnull;	/* does any SortTuple in memtuples
+									 * have isnull1 set to true? */
/*
 * Memory for tuples is sometimes allocated using a simple slab 
allocator,
 * rather than with palloc().  Currently, we switch to slab allocation
@@ -496,7 +498,10 @@ static void tuplesort_updatemax(Tuplesortstate *state);
  * abbreviations of text or multi-key sorts.  There could be!  Is it worth it?
  */
 
-/* Used if first key's comparator is ssup_datum_unsigned_compare */
+/*
+ * Used if first key's comparator is ssup_datum_unsigned_compare and
+ * state->memtuples_hasnull is true
+ */
 static pg_attribute_always_inline int
 qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 {
@@ -518,8 +523,39 @@ qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, 
Tuplesortstate *state)
return state->base.comparetup(a, b, state);
 }
 
+/*
+ * Used if first key's comparator is ssup_datum_unsigned_compare and
+ * state->memtuples_hasnull is false
+ */
+static pg_attribute_always_inline int
+qsort_tuple_unsigned_compare_notnull(SortTuple *a, SortTuple *b,
+									 Tuplesortstate *state)
+{
+   int compare;
+
+   /* make sure the null tracking code didn't mess up */
+   Assert(!a->isnull1 && !b->isnull1);
+
+   compare = ApplyUnsignedSortComparatorNotNull(a->datum1, b->datum1,
+												 &state->base.sortKeys[0]);
+   if (compare != 0)
+   return compare;
+
+   /*
+* No need to waste effort calling the tiebreak function when there are 
no
+* other keys to sort on.
+*/
+   if (state->base.onlyKey != NULL)
+   return 0;
+
+   return state->base.comparetup(a, b, state);
+}
+
 #if 

Re: sockaddr_un.sun_len vs. reality

2022-08-22 Thread Thomas Munro
On Tue, Feb 15, 2022 at 4:21 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Mon, Feb 14, 2022 at 7:19 PM Tom Lane  wrote:
> >> I'm leaning to adding a compile-time clamp on the value,
> >> that is
> >>
> >>  unp->sun_len = Min(sizeof(struct sockaddr_un),
> >> (1 << (sizeof(unp->sun_len) * 8)) - 1);
>
> > Any system that has sun_len, and has expanded sun_path so that the
> > struct size doesn't fit in sun_len, must surely be ignoring sun_len
> > but retaining it for binary compatibility.  Otherwise there would be
> > no way to use the extra bytes of sun_path!  It's tempting to suggest
> > setting sun_len to zero in this case...
>
> Yeah, I thought about doing that or just skipping the assignment
> altogether.  However, we'd need just as much code, because the
> change would have to be conditional on more or less this same
> computation as to whether sizeof(struct sockaddr_un) fits into
> the field.

I was nerd-sniped by the historical context of this single line of
code.  I'd already wondered many times (not just in PostgreSQL)
whether and when that became a cargo-cult practice, replicated from
other software and older books like Stevens.  I failed to find any
sign of an OS that needs it today, or likely even needed it this
millennium.  Now I'd like to propose removing it.

I believe we have the complete set of surviving systems with sun_len.
These are the systems with sockets descended from 4.4BSD, plus AIX
when using its socket API in 4.4-compatible mode:

AIX: no sun_len if -DCOMPAT_43[1], so surely ignored by kernel!
NetBSD: manual says it's ignored[2]
OpenBSD: ditto[3]
FreeBSD: ditto[4]
DragonFlyBSD: clobbered[5] (like FreeBSD, just not documented)
macOS: ditto[6]

I know that between '88 and '97 some of these would fail with EINVAL
if sun_len was out of range.  The code is still there to do that in
some cases, but at various points in the 90s they started clobbering
it on entry to connect() and bind(), probably to ease porting pain
from Solaris and Linux.  I think it'd be pretty clear on the build
farm if it turns out I'm wrong here, because the zero-initialised
sun_len would be rejected as invalid on a hypothetical system that
didn't change that policy.  I've tested on all but AIX.

[1] 
https://www.postgresql.org/message-id/IKEPJJEJDCJPKMLEECEDGEIHCCAA.vvanwynsberghe%40ccncsi.net
[2] https://man.netbsd.org/unix.4
[3] https://man.openbsd.org/connect.2
[4] https://www.freebsd.org/cgi/man.cgi?connect
[5] 
https://github.com/DragonFlyBSD/DragonFlyBSD/blob/master/sys/kern/uipc_syscalls.c#L1516
[6] https://github.com/apple/darwin-xnu/blob/main/bsd/kern/uipc_syscalls.c#L2817
From 0a7c7e8c288300f15cea0477d5d692f652f5f44b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 23 Aug 2022 13:16:54 +1200
Subject: [PATCH] Don't bother to set sockaddr_un.sun_len.

There are no known systems that require you to fill in sun_len when
calling bind() or connect().  We might as well stop perpetuating this
obsolete practice.

Discussion: https://postgr.es/m/2781112.1644819528%40sss.pgh.pa.us
---
 src/common/ip.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/src/common/ip.c b/src/common/ip.c
index 9b611cdc8c..6343f49a70 100644
--- a/src/common/ip.c
+++ b/src/common/ip.c
@@ -218,20 +218,6 @@ getaddrinfo_unix(const char *path, const struct addrinfo *hintsp,
 		aip->ai_addrlen = offsetof(struct sockaddr_un, sun_path) + strlen(path);
 	}
 
-	/*
-	 * The standard recommendation for filling sun_len is to set it to the
-	 * struct size (independently of the actual path length).  However, that
-	 * draws an integer-overflow warning on AIX 7.1, where sun_len is just
-	 * uint8 yet the struct size exceeds 255 bytes.  It's likely that nothing
-	 * is paying attention to sun_len on that platform, but we have to do
-	 * something with it.  To suppress the warning, clamp the struct size to
-	 * what will fit in sun_len.
-	 */
-#ifdef HAVE_STRUCT_SOCKADDR_SA_LEN
-	unp->sun_len = Min(sizeof(struct sockaddr_un),
-	   ((size_t) 1 << (sizeof(unp->sun_len) * BITS_PER_BYTE)) - 1);
-#endif
-
 	return 0;
 }
 
-- 
2.35.1



Re: shadow variables - pg15 edition

2022-08-22 Thread Justin Pryzby
On Tue, Aug 23, 2022 at 01:38:40PM +1200, David Rowley wrote:
> On Tue, 23 Aug 2022 at 13:17, Justin Pryzby  wrote:
> >  Attached is a squished version.
> 
> I see there's some renaming ones snuck in there. e.g:
> ... in fact, there's lots of renaming, so I'll just stop looking.

Actually, they didn't sneak in - what I sent are the patches which are ready to
be reviewed, excluding the set of "this" and "tmp" and other renames which you
disliked.  In the branch (not the squished patch) the first ~15 patches were
mostly for C99 for loops - I presented them this way deliberately, so you could
review and comment on whatever you're able to bite off, or run with whatever
parts you think are ready.  I rewrote it now to be more bite sized by
truncating off the 2nd half of the patches.

> Can you just send a patch that only changes the cases where you can
> remove a variable declaration from an outer scope into a single inner
> scope, or multiple inner scope when the variable can be declared
> inside a for() loop?

> would be to move the found_whole_row declaration into multiple inner
> scopes.  That's a net increase in code lines, for which I think
> requires more careful thought if we want that or not.

IMO it doesn't make sense to declare multiple integers for something like this
when they're all ignored.  Nor for "save_errno" nor the third, similar case,
for the reason in the commit message.

-- 
Justin
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e88f7efa7e4..69f21abfb59 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -353,45 +353,44 @@ brinbeginscan(Relation r, int nkeys, int norderbys)
 int64
 bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 {
RelationidxRel = scan->indexRelation;
Buffer  buf = InvalidBuffer;
BrinDesc   *bdesc;
Oid heapOid;
RelationheapRel;
BrinOpaque *opaque;
BlockNumber nblocks;
BlockNumber heapBlk;
int totalpages = 0;
FmgrInfo   *consistentFn;
MemoryContext oldcxt;
MemoryContext perRangeCxt;
BrinMemTuple *dtup;
BrinTuple  *btup = NULL;
Sizebtupsz = 0;
ScanKey   **keys,
  **nullkeys;
int*nkeys,
   *nnullkeys;
-   int keyno;
char   *ptr;
Sizelen;
char   *tmp PG_USED_FOR_ASSERTS_ONLY;
 
opaque = (BrinOpaque *) scan->opaque;
bdesc = opaque->bo_bdesc;
pgstat_count_index_scan(idxRel);
 
/*
 * We need to know the size of the table so that we know how long to
 * iterate on the revmap.
 */
heapOid = IndexGetRelation(RelationGetRelid(idxRel), false);
heapRel = table_open(heapOid, AccessShareLock);
nblocks = RelationGetNumberOfBlocks(heapRel);
table_close(heapRel, AccessShareLock);
 
/*
 * Make room for the consistent support procedures of indexed columns.  
We
 * don't look them up here; we do that lazily the first time we see a 
scan
 * key reference each of them.  We rely on zeroing fn_oid to InvalidOid.
 */
@@ -435,45 +434,45 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
nkeys = (int *) ptr;
ptr += MAXALIGN(sizeof(int) * bdesc->bd_tupdesc->natts);
 
nnullkeys = (int *) ptr;
ptr += MAXALIGN(sizeof(int) * bdesc->bd_tupdesc->natts);
 
for (int i = 0; i < bdesc->bd_tupdesc->natts; i++)
{
keys[i] = (ScanKey *) ptr;
ptr += MAXALIGN(sizeof(ScanKey) * scan->numberOfKeys);
 
nullkeys[i] = (ScanKey *) ptr;
ptr += MAXALIGN(sizeof(ScanKey) * scan->numberOfKeys);
}
 
Assert(tmp + len == ptr);
 
/* zero the number of keys */
memset(nkeys, 0, sizeof(int) * bdesc->bd_tupdesc->natts);
memset(nnullkeys, 0, sizeof(int) * bdesc->bd_tupdesc->natts);
 
/* Preprocess the scan keys - split them into per-attribute arrays. */
-   for (keyno = 0; keyno < scan->numberOfKeys; keyno++)
+   for (int keyno = 0; keyno < scan->numberOfKeys; keyno++)
{
		ScanKey		key = &scan->keyData[keyno];
AttrNumber  keyattno = key->sk_attno;
 
/*
 * The collation of the scan key must match the collation used 
in the
 * index column (but only if the search is not IS NULL/ IS NOT 
NULL).
 * Otherwise we shouldn't be using this index ...
 */
Assert((key->sk_flags & SK_ISNULL) ||
   (key->sk_collation ==
TupleDescAttr(bdesc->bd_tupdesc,
  keyattno - 
1)->attcollation));
 

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-08-22 Thread shiy.f...@fujitsu.com
On Sat, Aug 20, 2022 7:02 PM Önder Kalacı  wrote:
> Hi,
> 
> I'm a little late to catch up with your comments, but here are my replies:

Thanks for your patch. Here are some comments.

1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
good name for the list? Indexes on only expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+   ListCell   *lc;
+   List *nonPartialIndexPathList = NIL;

2.
+typedef struct LogicalRepPartMapEntry
+{
+   Oid partoid;/* LogicalRepPartMap's 
key */
+   LogicalRepRelMapEntry relmapentry;
+   Oid usableIndexOid; /* which index to use? (Invalid 
when no index
+* used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to
mark which index to use? That way, we wouldn't need to add "usableIndexOid"
to LogicalRepPartMapEntry.

3.
It looks like we should change the comment for FindReplTupleInLocalRel() in
this patch.

/*
 * Try to find a tuple received from the publication side (in 'remoteslot') in
 * the corresponding local relation using either replica identity index,
 * primary key or if needed, sequential scan.
 *
 * Local tuple, if found, is returned in '*localslot'.
 */
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,

4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
 {
EState *estate = edata->estate;
Relationlocalrel = relinfo->ri_RelationDesc;
-	LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
+   LogicalRepRelMapEntry *targetRel = edata->targetRel;
+	LogicalRepRelation *remoterel = &targetRel->remoterel;
EPQStateepqstate;
TupleTableSlot *localslot;

Do we need this change? I didn't see any place that uses the variable
targetRel afterwards.

5.
+   if (!AttributeNumberIsValid(mainattno))
+   {
+   /*
+* There are two cases to consider. First, if the index 
is a primary or
+* unique key, we cannot have any indexes with 
expressions. So, at this
+* point we are sure that the index we deal is not 
these.
+*/
+   Assert(RelationGetReplicaIndex(rel) != 
RelationGetRelid(idxrel) &&
+  RelationGetPrimaryKeyIndex(rel) != 
RelationGetRelid(idxrel));
+
+   /*
+* For a non-primary/unique index with an expression, 
we are sure that
+* the expression cannot be used for replication index 
search. The
+* reason is that we create relevant index paths by 
providing column
+* equalities. And, the planner does not pick 
expression indexes via
+* column equality restrictions in the query.
+*/
+   continue;
+   }

Is it possible that it is a usable index with an expression? I think indexes
with an expression have been filtered out in
FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
with an expression, maybe we shouldn't use "continue" here.

6.
In the following case, I got a result which is different from HEAD, could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int); 
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL; 
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int); 
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y); 
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' 
PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 1 |
(1 row)

Regards,
Shi yu


pg_basebackup: add test about zstd compress option

2022-08-22 Thread Dong Wook Lee
Hi hackers,
I noticed the test code does not exercise the zstd option, so I added it.
I hope my patch helps to ensure the safety of the test.


---
Regards,
DongWook Lee.


v1_add_test_pg_basebackup.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Julien Rouhaud
On Mon, Aug 22, 2022 at 09:13:39PM +0200, Pavel Stehule wrote:
> On Mon, Aug 22, 2022 at 9:33 AM, Julien Rouhaud  wrote:
>
> >
> > - you define new AclMode READ and WRITE.  Those bits are precious and I
> >   don't think it's ok to consume 2 bits for session variables, especially
> >   since those are the last two bits available since the recent GUC access
> >   control patch (ACL_SET and ACL_ALTER_SYSTEM).  Maybe we could use the
> >   existing INSERT and UPDATE privileges instead, like it's done for
> >   sequences?
> >
> >
> I don't have a strong opinion about it.  AclMode is uint32, so I think
> there are still 15 bits reserved. I think UPDATE and SELECT rights could
> work, but maybe it is better to use separate WRITE and READ rights to
> signal more strongly that a variable is not a relation. On the other hand,
> large objects use ACL_UPDATE and ACL_SELECT too, and it works. So I am
> neutral on this question. Has somebody here some opinion on this point?
> If not, I'll modify the patch as Julien proposes.

Actually no, because AclMode is also used to store the grant option part.  The
comment before AclMode warns about it:

 * The present representation of AclItem limits us to 16 distinct rights,
 * even though AclMode is defined as uint32.  See utils/acl.h.
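
That split is visible in the AclItem macros (quoting approximately, so
double-check utils/acl.h): rights live in the low 16 bits of ai_privs and
their grant options in the high 16 bits:

#define ACLITEM_GET_PRIVS(item)		((item).ai_privs & 0xFFFF)
#define ACLITEM_GET_GOPTIONS(item)	(((item).ai_privs >> 16) & 0xFFFF)
#define ACL_GRANT_OPTION_FOR(privs)	(((AclMode) (privs) & 0xFFFF) << 16)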




Re: SQL/JSON features for v15

2022-08-22 Thread Jonathan S. Katz

On 8/19/22 10:11 AM, Jonathan S. Katz wrote:

Hi,

On 8/17/22 11:45 PM, Nikita Glukhov wrote:

Hi,

On 17.08.2022 04:45, Jonathan S. Katz wrote:


On 8/15/22 10:14 PM, Andres Freund wrote:

I pushed a few cleanups to 
https://github.com/anarazel/postgres/commits/json
while I was hacking on this (ignore that it's based on the meson 
tree, that's
just faster for me). Some of them might not be applicable anymore, 
but it

might still make sense for you to look at.


With RMT hat on, this appears to be making progress. A few questions 
/ comments for the group:


1. Nikita: Did you have a chance to review Andres's changes as well?


Yes, I have reviewed Andres's changes, they all are ok.


Thank you!


Then, on top of it, I started to do other fixes that help to avoid
subtransactions when they are not needed, and it ended in a new
refactoring of the coercion code.  Also I moved here from v6-0003 the fix
of ExecEvalJsonNeedSubtransaction(), which considers more cases.


Great.

Andres, Robert: Do these changes address your concerns about the use of 
substransactions and reduce the risk of xid wraparound?



On 16.08.2022 05:14, Andres Freund wrote:

But for JIT I still had to construct additional ExprState with a
function compiled from subexpression steps.



Why did you have to do this?


I simply did not dare to implement compilation of a recursively-callable
function with an additional parameter stepno.  In the v8 patch I did it
by adding a switch with all possible jump addresses of EEOP_SUBTRANS
steps at the beginning of the function.  And it really seems to work
faster, but needs more exploration.  See patch 0003, where both
variants are preserved using #ifdef.


The description of the v7 patches:

0001 Simplify JsonExpr execution
  Andres's changes + mine:
   - Added JsonCoercionType enum, fields like via_io replaced with it
   - Emit only context item steps in JSON_TABLE_OP case
   - Skip coercion of NULLs to non-domain types (is it correct?)

0002 Fix returning of json[b] domains in JSON_VALUE:
   simply rebase of v6 onto 0001

0003 Add EEOP_SUBTRANS executor step
   v6 + new recursive JIT

0004 Split JsonExpr execution into steps
   simply rebase of v6 + used LLMBuildSwitch() in EEOP_JSONEXPR


What do folks think of these patches?


Andres, Andrew, Amit, Robert -- as you have either worked on this or 
expressed opinions -- any thoughts on this current patch set?


Thanks,

Jonathan




pg_waldump: add test for coverage

2022-08-22 Thread Dong Wook Lee
Hi Hackers,
I wrote a test for coverage.
Unfortunately, it seems to take quite a while to run the test.
I want to improve these execution times, but I don't know exactly what to do.
Therefore, I want to hear feedback from many people.
---
Regards,
Dong Wook Lee


v1_add_test_pg_waldump.patch
Description: Binary data


xml2: add test for coverage

2022-08-22 Thread Dong Wook Lee
Hi, hackers
I made a small patch for xml2 to improve test coverage.
However, there was a problem using the functions below.

- xpath_number
- xpath_bool
- xpath_nodeset
- xpath_list

Do you have any advice on how to use these functions correctly?
It would also be good to add examples of using them to the documentation.

---
Regards,
DongWook Lee.


v1_add_test_xml2.patch
Description: Binary data


Re: shadow variables - pg15 edition

2022-08-22 Thread David Rowley
On Tue, 23 Aug 2022 at 13:17, Justin Pryzby  wrote:
>  Attached is a squished version.

I see there's some renaming ones snuck in there. e.g:

- Relation rel;
- HeapTuple tuple;
+ Relation pg_foreign_table;
+ HeapTuple foreigntuple;

This one does not seem to be in the category I mentioned:

@@ -3036,8 +3036,6 @@ XLogFileInitInternal(XLogSegNo logsegno,
TimeLineID logtli,
  pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
  if (pg_fsync(fd) != 0)
  {
- int save_errno = errno;
-

More renaming:

+++ b/src/backend/catalog/heap.c
@@ -1818,19 +1818,19 @@ heap_drop_with_catalog(Oid relid)
  */
  if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
  {
- Relation rel;
- HeapTuple tuple;
+ Relation pg_foreign_table;
+ HeapTuple foreigntuple;

More renaming:

+++ b/src/backend/commands/publicationcmds.c
@@ -106,7 +106,7 @@ parse_publication_options(ParseState *pstate,
  {
  char*publish;
  List*publish_list;
- ListCell   *lc;
+ ListCell   *lc2;

and again:

+++ b/src/backend/commands/tablecmds.c
@@ -10223,7 +10223,7 @@ CloneFkReferencing(List **wqueue, Relation
parentRel, Relation partRel)
  Oid constrOid;
  ObjectAddress address,
  referenced;
- ListCell   *cell;
+ ListCell   *lc;

I've not checked the context on this, but this does not appear to
meet the category of moving to an inner scope:

+++ b/src/backend/executor/execPartition.c
@@ -768,7 +768,6 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
EState *estate,
  {
  List*onconflset;
  List*onconflcols;
- bool found_whole_row;

Looks like you're just using the one from the wider scope. That's not
the category we're after for now.

You've also got some renaming going on in ExecInitAgg()

- phasedata->gset_lengths[i] = perhash->numCols = aggnode->numCols;
+ phasedata->gset_lengths[setno] = perhash->numCols = aggnode->numCols;

I wondered about this one too:

- i = -1;
- while ((i = bms_next_member(all_grouped_cols, i)) >= 0)
- aggstate->all_grouped_cols = lcons_int(i, aggstate->all_grouped_cols);
+ {
+ int i = -1;
+ while ((i = bms_next_member(all_grouped_cols, i)) >= 0)
+ aggstate->all_grouped_cols = lcons_int(i, aggstate->all_grouped_cols);
+ }

I had in mind that maybe we should switch those to be something more like:

for (int i = -1; (i = bms_next_member(all_grouped_cols, i)) >= 0;)

But I had second thoughts, as the "while" version has become the standard method.

(Really that code should be using bms_prev_member() and lappend_int()
so we don't have to memmove() the entire list each lcons_int() call.
(not for this patch though))

More renaming being done here:

- int i; /* Index into *ident_user */
+ int j; /* Index into *ident_user */

... in fact, there's lots of renaming, so I'll just stop looking.

Can you just send a patch that only changes the cases where you can
move a variable declaration from an outer scope into a single inner
scope, or into multiple inner scopes when the variable can be declared
inside a for() loop? The mcv_get_match_bitmap() change is an example
of this.  There's still a net reduction in lines of code, so I think
mcv_get_match_bitmap(), and any like it, are OK for this next step.
A counter example is ExecInitPartitionInfo(), where the way to do this
would be to move the found_whole_row declaration into multiple inner
scopes.  That's a net increase in code lines, which I think requires
more careful thought about whether we want it or not.

David




Letter case of "admin option"

2022-08-22 Thread Kyotaro Horiguchi
Today, I see some error messages have been added, two of which look
somewhat inconsistent.

commands/user.c
@707:
>errmsg("must have admin option on role \"%s\" to add members",
@1971:
>errmsg("grantor must have ADMIN OPTION on \"%s\"",

Grep'ing told me that the latter above is the only outlier among the 6
occurrences in total of "admin option"/"ADMIN OPTION".

Shouldn't we unify them?  I slightly prefer "ADMIN OPTION", but have no
problem with it being in lower case.  (Attached.)


In passing, I met the following code in the same file.

>   if (!have_createrole_privilege() &&
>   !is_admin_of_role(currentUserId, roleid))
>   ereport(ERROR,
>   
> (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
>errmsg("must have admin option on role 
> \"%s\"",
>   rolename)));

The message seems a bit short in that it only mentions the admin option
while omitting the CREATEROLE privilege.  "must have CREATEROLE
privilege or admin option on role %s" might be better.  Or we could say
just "insufficient privilege" or "permission denied" in the main error
message, then provide "CREATEROLE privilege or admin option on role %s
is required" in a DETAIL or HINT message.  The message was added by
c33d575899 along with the have_createrole_privilege() call, so it is
unclear to me whether this is intentional or not.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 511ca8d8fd..4a2e5556f1 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -1968,7 +1968,7 @@ check_role_grantor(Oid currentUserId, Oid roleid, Oid grantorId, bool is_grant)
 			select_best_admin(grantorId, roleid) != grantorId)
 			ereport(ERROR,
 	(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-	 errmsg("grantor must have ADMIN OPTION on \"%s\"",
+	 errmsg("grantor must have admin option on \"%s\"",
  			GetUserNameFromId(roleid, false))));
 	}
 	else


Re: Change pfree to accept NULL argument

2022-08-22 Thread David Rowley
On Tue, 23 Aug 2022 at 06:30, Tom Lane  wrote:
>
> Peter Eisentraut  writes:
> > Per discussion in [0], here is a patch set that allows pfree() to accept
> > a NULL argument, like free() does.
>
> So the question is, is this actually a good thing to do?

I think making pfree() accept NULL is a bad idea.  In the vast majority
of cases the pointer will never be NULL, so we're effectively just
burdening those with the additional overhead of checking for NULL.

We know from [1] that adding branching in the memory management code
can be costly.

I'm measuring about a 2.6% slowdown from the 0002 patch using a
function that I wrote [2] to hammer palloc/pfree.

master
postgres=# select pg_allocate_memory_test(64, 1024*1024,
10::bigint*1024*1024*1024,'aset');
Time: 2007.527 ms (00:02.008)
Time: 1991.574 ms (00:01.992)
Time: 2008.945 ms (00:02.009)
Time: 2011.410 ms (00:02.011)
Time: 2019.317 ms (00:02.019)
Time: 2060.832 ms (00:02.061)
Time: 2003.066 ms (00:02.003)
Time: 2025.039 ms (00:02.025)
Time: 2039.744 ms (00:02.040)
Time: 2090.384 ms (00:02.090)

master + pfree modifed to check for NULLs
postgres=# select pg_allocate_memory_test(64, 1024*1024,
10::bigint*1024*1024*1024,'aset');
Time: 2057.625 ms (00:02.058)
Time: 2074.699 ms (00:02.075)
Time: 2075.629 ms (00:02.076)
Time: 2104.581 ms (00:02.105)
Time: 2072.620 ms (00:02.073)
Time: 2066.916 ms (00:02.067)
Time: 2071.962 ms (00:02.072)
Time: 2097.520 ms (00:02.098)
Time: 2087.421 ms (00:02.087)
Time: 2078.695 ms (00:02.079)

(~2.62% slowdown)

If the aim here is to remove a bunch of ugly if (ptr) pfree(ptr);
code, then why don't we just have a[n inline] function or a macro for
that and only use it when we need to?
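Something like this would do (untested sketch; the helper name is made
up, not an existing API, and it assumes the usual palloc.h environment):

/* free ptr only when it's non-NULL, keeping the branch out of the
 * common pfree() path */
static inline void
pfree_if(void *ptr)
{
	if (ptr != NULL)
		pfree(ptr);
}

Call sites that today spell out "if (ptr) pfree(ptr);" could then just
use pfree_if(ptr), and everyone else keeps the branch-free pfree().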

David

[1] 
https://www.postgresql.org/message-id/CAApHDvr6qFw3jLBL9d4zUpo3A2Cb6hoZsUnWD0vF1OGsd67v=w...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/attachment/136801/pg_allocate_memory_test.patch.txt




Re: shadow variables - pg15 edition

2022-08-22 Thread Justin Pryzby
On Sat, Aug 20, 2022 at 09:17:41PM +1200, David Rowley wrote:
> On Fri, 19 Aug 2022 at 16:28, Justin Pryzby  wrote:
> > Let me know what I can do when it's time for round two.
> 
> I pushed the modified 0001-0008 patches earlier today and also the one
> I wrote to fixup the 36 warnings about "expected" being shadowed.

Thank you

> I looked through a bunch of your remaining patches and was a bit
> unexcited to see many more renaming such as:

Yes - after Michael said that was the sane procedure, I had rearranged the
patch series to present eariler those patches first which renamed variables ..

> However, one category of these changes that I do like are the ones
> where we can move the variable into an inner scope.

There are a lot of these, which ISTM is a good thing.
This fixes about half of the remaining warnings.

https://github.com/justinpryzby/postgres/tree/avoid-shadow-vars
You could review without applying the patches, on the webpage or (probably
better) by adding as a git remote.  Attached is a squished version.

-- 
Justin
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e88f7efa7e4..69f21abfb59 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -372,7 +372,6 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 			  **nullkeys;
 	int		   *nkeys,
 			   *nnullkeys;
-	int			keyno;
 	char	   *ptr;
 	Size		len;
 	char	   *tmp PG_USED_FOR_ASSERTS_ONLY;
@@ -454,7 +453,7 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
memset(nnullkeys, 0, sizeof(int) * bdesc->bd_tupdesc->natts);
 
/* Preprocess the scan keys - split them into per-attribute arrays. */
-   for (keyno = 0; keyno < scan->numberOfKeys; keyno++)
+   for (int keyno = 0; keyno < scan->numberOfKeys; keyno++)
{
 		ScanKey		key = &scan->keyData[keyno];
AttrNumber  keyattno = key->sk_attno;
diff --git a/src/backend/access/brin/brin_minmax_multi.c 
b/src/backend/access/brin/brin_minmax_multi.c
index 10d4f17bc6f..524c1846b83 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -582,7 +582,6 @@ brin_range_serialize(Ranges *range)
 	int			typlen;
 	bool		typbyval;
 
-	int			i;
 	char	   *ptr;
 
/* simple sanity checks */
@@ -621,18 +620,14 @@ brin_range_serialize(Ranges *range)
 */
if (typlen == -1)   /* varlena */
{
-   int i;
-
-   for (i = 0; i < nvalues; i++)
+   for (int i = 0; i < nvalues; i++)
{
len += VARSIZE_ANY(range->values[i]);
}
}
else if (typlen == -2)  /* cstring */
{
-   int i;
-
-   for (i = 0; i < nvalues; i++)
+   for (int i = 0; i < nvalues; i++)
{
/* don't forget to include the null terminator ;-) */
len += strlen(DatumGetCString(range->values[i])) + 1;
@@ -662,7 +657,7 @@ brin_range_serialize(Ranges *range)
 */
ptr = serialized->data; /* start of the serialized data */
 
-   for (i = 0; i < nvalues; i++)
+   for (int i = 0; i < nvalues; i++)
{
if (typbyval)   /* simple by-value data types */
{
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 5866c6aaaf7..30069f139c7 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -234,7 +234,6 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
 	Page		page = BufferGetPage(buffer);
 	bool		is_leaf = (GistPageIsLeaf(page)) ? true : false;
 	XLogRecPtr	recptr;
-	int			i;
 	bool		is_split;
 
/*
@@ -420,7 +419,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
{
char   *data = (char *) (ptr->list);
 
-   for (i = 0; i < ptr->block.num; i++)
+   for (int i = 0; i < ptr->block.num; i++)
{
IndexTuple  thistup = (IndexTuple) data;
 
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 87b243e0d4b..46e3bb55ebb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3036,8 +3036,6 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID 
logtli,
pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
if (pg_fsync(fd) != 0)
{
-   int save_errno = errno;
-

Re: pg_receivewal and SIGTERM

2022-08-22 Thread Michael Paquier
On Mon, Aug 22, 2022 at 04:05:16PM +0200, Christoph Berg wrote:
> Do you mean it can, or can not be backpatched? (I'd argue for
> backpatching since the behaviour is slightly broken at the moment.)

I mean that it is fine to backpatch that, in my opinion.
--
Michael




fix typo - empty statement ;;

2022-08-22 Thread Peter Smith
I noticed an accidental ;;

PSA patch to remove the same.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Fix-typo-empty-statement.patch
Description: Binary data


Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
> Not at all! However, the 32-bit-element changes are irrelevant for
> json, and make review more difficult. I would suggest keeping those in
> the other thread starting with whatever refactoring is needed. I can
> always rebase over that.

Yeah, I'll remove those to keep this thread focused.

> - I like the idea of simplifying the assertions, but I can't get
> behind using platform lfind to do it, since it has a different API,
> requires new functions we don't need, and possibly has portability
> issues. A simple for-loop is better for assertions.

My main goal with this was improving readability, which is likely possible
without lfind().  I'll see what I can do.
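For example, the assertion could stay about this simple (a sketch only;
the variable names here are placeholders, not the actual patch):

#ifdef USE_ASSERT_CHECKING
	/* verify that the vectorized path did not skip over a match */
	for (uint32 i = 0; i < tail_idx; i++)
		Assert(base[i] != key);
#endif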

> - A runtime elog is not appropriate for a compile time check -- use
> #error instead.

Will do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 11:50:35AM +0700, John Naylor wrote:
> On Sat, Aug 20, 2022 at 5:28 AM Nathan Bossart  
> wrote:
>> Thanks for the pointer.  GCC, Clang, and the Arm compiler all seem to
>> define __ARM_NEON, so here is a patch that uses that instead.
> 
> Is this also ever defined on 32-bit? If so, is it safe, meaning the
> compiler will not emit these instructions without additional flags?
> I'm wondering if  __aarch64__ would be clearer on that, and if we get
> windows-on-arm support as has been proposed, could also add _M_ARM64.

I haven't been able to enable __ARM_NEON on 32-bit, but if it is somehow
possible, we should probably add an __aarch64__ check since functions like
vmaxvq_u32() do not appear to be available on 32-bit.  I have been able to
compile for __aarch64__ without __ARM_NEON, so it might still be a good
idea to check for __ARM_NEON.  So, to be safe, perhaps we should use
something like the following:

#if (defined(__aarch64__) || defined(__aarch64)) && defined(__ARM_NEON)

> I also see #if defined(__aarch64__) || defined(__aarch64) in our
> codebase already, but I'm not sure what recognizes the latter.

I'm not sure what uses the latter, either.
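Put together, the guard might end up looking like this (sketch;
USE_NEON is a made-up symbol, not something we have today):

#if (defined(__aarch64__) || defined(__aarch64)) && defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON
#endif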

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: CFM Manager

2022-08-22 Thread Ibrar Ahmed
On Tue, Aug 23, 2022 at 1:46 AM Hamid Akhtar  wrote:

>
>
> On Tue, 23 Aug 2022 at 1:26 AM, Ibrar Ahmed 
> wrote:
>
>>
>>
>> On Mon, Aug 22, 2022 at 9:47 PM Jacob Champion 
>> wrote:
>>
>>> On Mon, Aug 22, 2022 at 9:40 AM Tom Lane  wrote:
>>> > You attribute more organization to this than actually exists ;-)
>>>
>>> Ha, fair enough!
>>>
>>> > If Ibrar wants the job I think it's his.
>>>
>>> Excellent. Ibrar, I'll be updating the CFM checklist [1] over the next
>>> couple of weeks. I'll try to have sections of it touched up by the
>>> time you're due to use them. Let me know if there's anything in
>>> particular that is confusing or needs more explanation.
>>>
>>> Thanks,
>>> --Jacob
>>>
>>> [1] https://wiki.postgresql.org/wiki/CommitFest_Checklist
>>>
>>>
>>> Thanks, I will start working.
>>
>
> I’d like to assist.
>
> Thanks, Hamid

This will help to complete the tasks. I have started looking at that;
I will let you know how we both manage to share the load.

--
Ibrar Ahmed


Re: CFM Manager

2022-08-22 Thread Hamid Akhtar
On Tue, 23 Aug 2022 at 1:26 AM, Ibrar Ahmed  wrote:

>
>
> On Mon, Aug 22, 2022 at 9:47 PM Jacob Champion 
> wrote:
>
>> On Mon, Aug 22, 2022 at 9:40 AM Tom Lane  wrote:
>> > You attribute more organization to this than actually exists ;-)
>>
>> Ha, fair enough!
>>
>> > If Ibrar wants the job I think it's his.
>>
>> Excellent. Ibrar, I'll be updating the CFM checklist [1] over the next
>> couple of weeks. I'll try to have sections of it touched up by the
>> time you're due to use them. Let me know if there's anything in
>> particular that is confusing or needs more explanation.
>>
>> Thanks,
>> --Jacob
>>
>> [1] https://wiki.postgresql.org/wiki/CommitFest_Checklist
>>
>>
>> Thanks, I will start working.
>

I’d like to assist.


>
> --
>
> Ibrar Ahmed.
> Senior Software Engineer, PostgreSQL Consultant.
>


Re: CFM Manager

2022-08-22 Thread Ibrar Ahmed
On Mon, Aug 22, 2022 at 9:47 PM Jacob Champion 
wrote:

> On Mon, Aug 22, 2022 at 9:40 AM Tom Lane  wrote:
> > You attribute more organization to this than actually exists ;-)
>
> Ha, fair enough!
>
> > If Ibrar wants the job I think it's his.
>
> Excellent. Ibrar, I'll be updating the CFM checklist [1] over the next
> couple of weeks. I'll try to have sections of it touched up by the
> time you're due to use them. Let me know if there's anything in
> particular that is confusing or needs more explanation.
>
> Thanks,
> --Jacob
>
> [1] https://wiki.postgresql.org/wiki/CommitFest_Checklist
>
>
> Thanks, I will start working.

-- 

Ibrar Ahmed.
Senior Software Engineer, PostgreSQL Consultant.


Re: making relfilenodes 56 bits

2022-08-22 Thread Robert Haas
On Mon, Aug 22, 2022 at 3:55 AM Amit Kapila  wrote:
> To solve that problem, how about rewriting the system table in the new
> cluster which has a conflicting relfilenode? I think we can probably
> do this conflict checking before processing the tables from the old
> cluster.

Thanks for chiming in.

Right now, there are two parts to the relfilenumber preservation
system, and this scheme doesn't quite fit into either of them. First,
the dump includes commands to set pg_class.relfilenode in the new
cluster to the same value that it had in the old cluster. The dump
can't include any SQL commands that depend on what's happening in the
new cluster because pg_dump(all) only connects to a single cluster,
which in this case is the old cluster. Second, pg_upgrade itself
copies the files from the old cluster to the new cluster. This doesn't
involve a database connection at all. So there's no part of the
current relfilenode preservation mechanism that can look at the old
AND the new database and decide on some SQL to execute against the new
database.

I thought for a while that we could use the information that's already
gathered by get_rel_infos() to do what you're suggesting here, but it
doesn't quite work, because that function excludes system tables, and
we can't afford to do that here. We'd either need to modify that query
to include system tables - at least for the new cluster - or run a
separate one to gather information about system tables in the new
cluster. Then, we could put all the pg_class.relfilenode values we
found in the new cluster into a hash table, loop over the list of rels
this function found in the old cluster, and for each one, probe into
the hash table. If we find a match, that's a system table that needs
to be moved out of the way before calling create_new_objects(), or
maybe inside that function but before it runs pg_restore.
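As a rough sketch of that probing step (illustrative C only -- the
helper and type names are hypothetical, and I'm using a sorted array
with bsearch() here instead of a hash table just for brevity):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t RelFileNumber64;	/* stand-in for the 56-bit value */

static int
cmp_filenode(const void *a, const void *b)
{
	RelFileNumber64 x = *(const RelFileNumber64 *) a;
	RelFileNumber64 y = *(const RelFileNumber64 *) b;

	return (x > y) - (x < y);
}

/*
 * Does a relfilenode from the old cluster collide with one already in
 * use in the new cluster?  new_nodes is gathered from the new
 * cluster's pg_class (including system tables) and sorted up front.
 */
static bool
filenode_conflicts(RelFileNumber64 fn,
				   const RelFileNumber64 *new_nodes, size_t n_new)
{
	return bsearch(&fn, new_nodes, n_new, sizeof(RelFileNumber64),
				   cmp_filenode) != NULL;
}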

That doesn't seem too crazy, I think. It's a little bit of new
mechanism, but it doesn't sound horrific. It's got the advantage of
being significantly cheaper than my proposal of moving everything out
of the way unconditionally, and at the same time it retains one of the
key advantages of that proposal - IMV, anyway - which is that we don't
need separate relfilenode ranges for user and system objects any more.
So I guess on balance I kind of like it, but maybe I'm missing
something.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] ALTER TABLE ... SET STORAGE default

2022-08-22 Thread Nikita Malakhov
Hi!

Anyway, adding a paragraph with the default storage mode for each
standard data type seems like a good idea, and I'll prepare a patch for
it.
Thank you!

On Mon, Aug 22, 2022 at 4:28 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > This seems a little bit confusing and thus very unfriendly for the user,
> because the actual meaning
> > of the same 'DEFAULT' option will be different for each data type, and
> to check storage mode user
> > has to query full table (or column) description.
> > I'd rather add a paragraph in documentation describing each data type
> default storage mode.
>
> I agree that "SET STORAGE default" syntax leaves much to be desired.
>
> Personally I would prefer "RESET STORAGE" and "RESET COMPRESSION". But
> since we already have "SET COMPRESSION default" this going to be
> either two commands that do the same thing, or a broken backward
> compatibility. Simply removing "SET COMPRESSION default" will make the
> syntax consistent too, but again, this would be a broken backward
> compatibility. I would argue that a sub-optimal but consistent syntax
> that does the job is better than inconsistent syntax and figuring out
> the default storage strategy manually.
>
> But let's see what is others people opinion.
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
https://postgrespro.ru/


re: Change pfree to accept NULL argument

2022-08-22 Thread Ranier Vilela
>Per discussion in [0], here is a patch set that allows pfree() to accept
>a NULL argument, like free() does.

>Also, a patch that removes the now-unnecessary null pointer checks
>before calling pfree(). And a few patches that do the same for some
>other functions that I found around. (The one with FreeDir() is perhaps
>a bit arguable, since FreeDir() wraps closedir() which does *not* accept
>NULL arguments. Also, neither FreeFile() nor the underlying fclose()
>accept NULL.)

Hi Peter,

+1

However, after a quick review, I noticed that some cases of PQfreemem were missed.
I took the liberty of making a v1, attached.

regards,

Ranier Vilela


v1-0004-Remove-unneeded-null-pointer-checks-before-PQfreemem.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Pavel Stehule
On Mon, 22 Aug 2022 at 9:33, Julien Rouhaud wrote:

> Hi Pavel,
>
> On Sun, Aug 21, 2022 at 09:54:03AM +0200, Pavel Stehule wrote:
> >
> > should be fixed now
>
> I started reviewing the patchset, beginning with 0001 (at least the parts
> that
> don't substantially change later) and have a few comments.
>
> - you define new AclMode READ and WRITE.  Those bits are precious and I
> don't
>   think it's ok to consume 2 bits for session variables, especially since
> those
>   are the last two bits available since the recent GUC access control patch
>   (ACL_SET and ACL_ALTER_SYSTEM).  Maybe we could existing INSERT and
> UPDATE
>   privileges instead, like it's done for sequences?
>
>
I don't have a strong opinion about it.  AclMode is uint32, so I think
there are still 15 bits reserved. I think UPDATE and SELECT rights can
work, but maybe it is better to use separate WRITE and READ rights, to
signal more strongly that a variable is not a relation. On the other
hand, large objects use ACL_UPDATE and ACL_SELECT too, and it works. So
I am neutral on this question. Does somebody here have an opinion on
this point? If not, I'll modify the patch as Julien proposes.

Regards

Pavel


Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Pavel Stehule
On Mon, 22 Aug 2022 at 16:05, Justin Pryzby wrote:

> > +-- test on query with workers
> > +CREATE TABLE svar_test(a int);
> > +INSERT INTO svar_test SELECT * FROM generate_series(1,100);
>
> When I looked at this, I noticed this huge table.
>
> I don't think you should create such a large table just for this.
>
> To exercise parallel workers with a smaller table, decrease
> min_parallel_table_scan_size and others as done in other regression tests.
>
>
I fixed it.

Thank you for tip

Pavel

> --
> Justin
>


Re: Change pfree to accept NULL argument

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 14:30:22 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > Per discussion in [0], here is a patch set that allows pfree() to accept
> > a NULL argument, like free() does.
>
> So the question is, is this actually a good thing to do?
>
> If we were starting in a green field, I'd be fine with defining
> pfree(NULL) as okay.  But we're not, so there are a couple of big
> objections:
>
> * Code developed to this standard will be unsafe to back-patch
>
> * The sheer number of places touched will create back-patching
> hazards.
>
> I'm not very convinced that the benefits of making pfree() more
> like free() are worth those costs.

It's probably also not entirely cost-free due to the added branches in
places where we are certain that the pointer is non-null. That could
partially be ameliorated by moving the NULL pointer check into the
callers.

If we don't want to go this route it might be worth adding a
pg_attribute_nonnull() or such to pfree().
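E.g. something like this (sketch; pg_attribute_nonnull() doesn't exist
today):

/* let the compiler flag pfree(NULL) call sites instead */
#if defined(__GNUC__) || defined(__clang__)
#define pg_attribute_nonnull(...) __attribute__((nonnull(__VA_ARGS__)))
#else
#define pg_attribute_nonnull(...)
#endif

extern void pfree(void *pointer) pg_attribute_nonnull(1);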


> (FWIW, no objection to your 0001.  0004 and 0005 seem okay too;
> they don't touch enough places to create much back-patching risk.)

I like 0001, not sure I find 0004, 0005 an improvement.


Semi-related note: I've sometimes wished for a pfreep(void **p) that'd do
something like

	if (*p)
	{
		pfree(*p);
		*p = NULL;
	}

so there are no dangling pointers after the pfree(), which often enough
is important (e.g. because the code could be reached again if there's an
error) and is also helpful when debugging. The explicit form does bulk
up the code sufficiently to be annoying.
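As a macro (so it works for any pointer lvalue) it might look like this
(sketch; the name is made up):

#define PFREE_AND_NULL(p) \
	do { \
		if (p) \
		{ \
			pfree(p); \
			(p) = NULL; \
		} \
	} while (0)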

Greetings,

Andres Freund




standby promotion can create unreadable WAL

2022-08-22 Thread Robert Haas
My colleague Dilip Kumar and I have discovered what I believe to be a
bug in the recently-added "overwrite contrecord" stuff. I'm not sure
whether or not this bug has any serious consequences. I think that
there may be a scenario where it does, but I'm not sure about that.

Suppose you have a primary and a standby, and the standby is promoted
after reading a partial WAL record. The attached script, which was
written by Dilip and slightly modified by me, creates this scenario by
setting up an archiving-only standby, writing a record that crosses a
segment boundary, and then promoting the standby. If you then try to
run pg_waldump on the WAL on timeline 2, it goes boom:

[rhaas pg_wal]$ pg_waldump 000000020000000000000004
000000020000000000000005 2>&1 | tail -n4
rmgr: Heap        len (rec/tot):   1959/  1959, tx:        728, lsn:
0/04FFE7B0, prev 0/04FFDFF0, desc: INSERT off 4 flags 0x00, blkref #0:
rel 1663/5/16384 blk 2132
rmgr: Heap        len (rec/tot):   1959/  1959, tx:        728, lsn:
0/04FFEF58, prev 0/04FFE7B0, desc: INSERT+INIT off 1 flags 0x00,
blkref #0: rel 1663/5/16384 blk 2133
rmgr: Heap        len (rec/tot):   1959/  1959, tx:        728, lsn:
0/04FFF700, prev 0/04FFEF58, desc: INSERT off 2 flags 0x00, blkref #0:
rel 1663/5/16384 blk 2133
pg_waldump: error: error in WAL record at 0/4FFF700: invalid record
length at 0/4FFFEA8: wanted 24, got 0

What's happening here is that the last WAL segment from timeline 1,
which is 000000010000000000000004, gets copied over to the new
timeline up to the point where the last complete record on that
timeline ends, namely, 0/4FFFEA8. I think that the first record on the
new timeline should be written starting at that LSN, but that's not
what happens. Instead, the rest of that WAL segment remains zeroed,
and the first WAL record on the new timeline is written at the
beginning of the next segment:

[rhaas pg_wal]$ pg_waldump 000000020000000000000005 2>&1 | head -n4
rmgr: XLOG        len (rec/tot):     42/    42, tx:          0, lsn:
0/05000028, prev 0/04FFF700, desc: OVERWRITE_CONTRECORD lsn 0/4FFFEA8;
time 2022-08-22 13:49:22.874435 EDT
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn:
0/05000058, prev 0/05000028, desc: CHECKPOINT_SHUTDOWN redo 0/5000058;
tli 2; prev tli 1; fpw true; xid 0:729; oid 24576; multi 1; offset 0;
oldest xid 719 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
timestamp xid: 0/0; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     30/    30, tx:          0, lsn:
0/050000D0, prev 0/05000058, desc: NEXTOID 32768
rmgr: Storage     len (rec/tot):     42/    42, tx:          0, lsn:
0/050000F0, prev 0/050000D0, desc: CREATE base/5/24576

Nothing that uses xlogreader is going to be able to bridge the gap
between file #4 and file #5. In this case it doesn't matter very much,
because we immediately write a checkpoint record into file #5, so if
we crash we won't try to replay file #4 anyway. However, if anything
did try to look at file #4 it would get confused. Maybe that can
happen if this is a streaming standby, where we only write an
end-of-recovery record upon promotion, rather than a checkpoint, or
maybe if there are cascading standbys someone could try to actually
use the 000000020000000000000004 file for something. I'm not sure. But
unless I'm missing something, that file is bogus, and our only hope of
not having problems is that perhaps no one will ever look at it.

I think that the cause of this problem is this code right here:

/*
 * Actually, if WAL ended in an incomplete record, skip the parts that
 * made it through and start writing after the portion that persisted.
 * (It's critical to first write an OVERWRITE_CONTRECORD message, which
 * we'll do as soon as we're open for writing new WAL.)
 */
if (!XLogRecPtrIsInvalid(missingContrecPtr))
{
Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
EndOfLog = missingContrecPtr;
}

It seems to me that this if-statement should also test that the TLI
has not changed, i.e. if (newTLI == endOfRecoveryInfo->lastRecTLI &&
!XLogRecPtrIsInvalid(missingContrecPtr)). If the TLI hasn't changed,
then everything the comment says is correct and I think that what the
code does is also correct. However, if the TLI *has* changed, then I
think we must not advance EndOfLog here, because the WAL that was
copied from the old timeline to the new timeline ends at the point in
the file corresponding to the value of EndOfLog just before executing
this code. When this code then moves EndOfLog forward to the beginning
of the next segment, it leaves the unused portion of the previous
segment as all zeroes, which creates the problem described above.
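In other words, something along these lines (a sketch of the proposed
change, not a tested patch):

	/*
	 * Only skip past the aborted contrecord if we are continuing on the
	 * same timeline; after a promotion, the segment copied from the old
	 * timeline already ends at the current EndOfLog.
	 */
	if (newTLI == endOfRecoveryInfo->lastRecTLI &&
		!XLogRecPtrIsInvalid(missingContrecPtr))
	{
		Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
		EndOfLog = missingContrecPtr;
	}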

(Incidentally, there's also a bug in pg_waldump here: it's reporting
the wrong LSN as the source of the error. 0/4FFF700 is not the record
that's busted, as shown by the fact that it was successfully decoded
and shown in the output. The relevant code in pg_waldump should be
using the LSN at which decoding actually failed, 0/4FFFEA8 here, rather
than the last record successfully read.)

Re: Change pfree to accept NULL argument

2022-08-22 Thread Tom Lane
Peter Eisentraut  writes:
> Per discussion in [0], here is a patch set that allows pfree() to accept 
> a NULL argument, like free() does.

So the question is, is this actually a good thing to do?

If we were starting in a green field, I'd be fine with defining
pfree(NULL) as okay.  But we're not, so there are a couple of big
objections:

* Code developed to this standard will be unsafe to back-patch

* The sheer number of places touched will create back-patching
hazards.

I'm not very convinced that the benefits of making pfree() more
like free() are worth those costs.

We could ameliorate the first objection if we wanted to back-patch
0002, I guess.

(FWIW, no objection to your 0001.  0004 and 0005 seem okay too;
they don't touch enough places to create much back-patching risk.)

regards, tom lane




Change pfree to accept NULL argument

2022-08-22 Thread Peter Eisentraut
Per discussion in [0], here is a patch set that allows pfree() to accept 
a NULL argument, like free() does.


Also, a patch that removes the now-unnecessary null pointer checks 
before calling pfree().  And a few patches that do the same for some 
other functions that I found around.  (The one with FreeDir() is perhaps 
a bit arguable, since FreeDir() wraps closedir() which does *not* accept 
NULL arguments.  Also, neither FreeFile() nor the underlying fclose() 
accept NULL.)



[0]: https://www.postgresql.org/message-id/1074830.1655442...@sss.pgh.pa.us

From e80484a9d3a6897984a2bc1f80df55d590e2e85b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 22 Aug 2022 16:51:52 +0200
Subject: [PATCH 1/5] Remove unnecessary casts in free() and pfree()

---
 contrib/sepgsql/label.c| 8 
 src/backend/utils/fmgr/dfmgr.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index e4c98b7eae..6e7c0d7cff 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -662,28 +662,28 @@ quote_object_name(const char *src1, const char *src2,
temp = quote_identifier(src1);
 		appendStringInfoString(&result, temp);
if (src1 != temp)
-   pfree((void *) temp);
+   pfree(temp);
}
if (src2)
{
temp = quote_identifier(src2);
 		appendStringInfo(&result, ".%s", temp);
if (src2 != temp)
-   pfree((void *) temp);
+   pfree(temp);
}
if (src3)
{
temp = quote_identifier(src3);
 		appendStringInfo(&result, ".%s", temp);
if (src3 != temp)
-   pfree((void *) temp);
+   pfree(temp);
}
if (src4)
{
temp = quote_identifier(src4);
 		appendStringInfo(&result, ".%s", temp);
if (src4 != temp)
-   pfree((void *) temp);
+   pfree(temp);
}
return result.data;
 }
diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c
index 08fd7e1264..6ae6fc1556 100644
--- a/src/backend/utils/fmgr/dfmgr.c
+++ b/src/backend/utils/fmgr/dfmgr.c
@@ -240,7 +240,7 @@ internal_load_library(const char *libname)
if (file_scanner->handle == NULL)
{
load_error = dlerror();
-   free((char *) file_scanner);
+   free(file_scanner);
/* errcode_for_file_access might not be appropriate 
here? */
ereport(ERROR,
(errcode_for_file_access(),
@@ -263,7 +263,7 @@ internal_load_library(const char *libname)
 
/* try to close library */
dlclose(file_scanner->handle);
-   free((char *) file_scanner);
+   free(file_scanner);
 
/* issue suitable complaint */
 			incompatible_module_error(libname,
 									  &module_magic_data);
@@ -273,7 +273,7 @@ internal_load_library(const char *libname)
{
/* try to close library */
dlclose(file_scanner->handle);
-   free((char *) file_scanner);
+   free(file_scanner);
/* complain */
ereport(ERROR,
(errmsg("incompatible library \"%s\": 
missing magic block",
-- 
2.37.1

From f5eea6d2f1e8a8ed81cf177cef320e041bbff90b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 22 Aug 2022 16:51:52 +0200
Subject: [PATCH 2/5] Change pfree to accept NULL argument

---
 src/backend/utils/mmgr/README | 6 +-
 src/backend/utils/mmgr/mcxt.c | 7 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index 221b4bd343..e552c6ada6 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -66,7 +66,11 @@ pointer, but a valid chunk of which no bytes may be used.  
However, the
 chunk might later be repalloc'd larger; it can also be pfree'd without
 error.  Similarly, repalloc allows realloc'ing to zero size.
 
-* pfree and repalloc do not accept a NULL pointer.  This is intentional.
+* repalloc does not accept a NULL pointer.  This is intentional.  (As
+mentioned above, repalloc does not depend on the current memory
+context.  But then it needs to know which memory context to do the
+allocation in.  So the first allocation has to be done outside
+repalloc.)
 
 
 The Current Memory Context
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index e12be1b9bd..d85e961c1e 100644
--- 

Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 18:39:07 +0200, Drouvot, Bertrand wrote:
> While working on the relation stats split into table and index stats [1], I
> noticed that currently pg_stat_have_stats() returns true for dropped indexes
> (or for index creation transaction rolled back).

Good catch.

I guess Horiguchi-san and/or I wrongly assumed it'd be taken care of by the
pgstat_create_relation() in heap_create_with_catalog(), but index_create()
doesn't use that.


> Please find attached a patch proposal to fix it.

Perhaps a better fix would be to move the pgstat_create_relation() from
heap_create_with_catalog() into heap_create()? Although I guess it's a bit
pointless to deduplicate given that you're going to split it up again...


> It does contain additional calls to pgstat_create_relation() and
> pgstat_drop_relation() as well as additional TAP tests.

Would be good to add a test for CREATE INDEX / DROP INDEX / REINDEX
CONCURRENTLY as well.

Might be worth adding a test to stats.sql or stats.spec in the main regression
tests. Perhaps that's best where the aforementioned things should be tested?


> @@ -2349,6 +2354,7 @@ index_drop(Oid indexId, bool concurrent, bool 
> concurrent_lock_mode)
>   CatalogTupleDelete(indexRelation, &tuple->t_self);
>  
>   ReleaseSysCache(tuple);
> +
>   table_close(indexRelation, RowExclusiveLock);
>  
>   /*

Assume this was just an accident?


Greetings,

Andres Freund




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2022-08-22 Thread Melanie Plageman
v28 attached.

I've added the new structs I added to typedefs.list.

I've split the commit which adds all of the logic to track
IO operation statistics into two commits -- one which includes all of
the code to count IOOps for IOContexts locally in a backend and a second
which includes all of the code to accumulate and manage these with the
cumulative stats system.

A few notes about the commit which adds local IO Operation stats:

- There is a comment above pgstat_io_op_stats_collected() which mentions
the cumulative stats system even though this commit doesn't engage the
cumulative stats system. I wasn't sure if it was more or less
confusing to have two different versions of this comment.

- should pgstat_count_io_op() take BackendType as a parameter instead of
using MyBackendType internally?

- pgstat_count_io_op() Assert()s that the passed-in IOOp and IOContext
are valid for this BackendType, but it doesn't check that all of the
pending stats which should be zero are zero. I thought this was okay
because if I did add that zero-check, it would be added to
pgstat_count_io_op() as well, and we already Assert() there that we can
count the op. Thus, it doesn't seem like checking that the stats are
zero would add any additional regression protection.

- I've kept pgstat_io_context_desc() and pgstat_io_op_desc() in the
commit which adds those types (the local stats commit), however they
are not used in that commit. I wasn't sure if I should keep them in
that commit or move them to the first commit using them (the commit
adding the new view).

Notes on the commit which accumulates IO Operation stats in shared
memory:

- I've extended the usage of the Assert()s that IO Operation stats that
should be zero are. Previously we only checked the stats validity when
querying the view. Now we check it when flushing pending stats and
when reading the stats file into shared memory.

Note that the three locations with these validity checks (when
flushing pending stats, when reading stats file into shared memory,
and when querying the view) have similar looking code to loop through
and validate the stats. However, the actual action they perform if the
stats are valid is different for each site (adding counters together,
doing a read, setting nulls in a tuple column to true). Also, some of
these instances have other code interspersed in the loops which would
require additional looping if separated from this logic. So it was
difficult to see a way of combining these into a single helper
function.

- I've left pgstat_fetch_backend_io_context_ops() in the shared stats
commit, however it is not used until the commit which adds the view in
pg_stat_get_io(). I wasn't sure which way seemed better.

- Melanie
From 19f89b5ba164272758f17a35973eae69475f7400 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 22 Aug 2022 11:08:23 -0400
Subject: [PATCH v28 3/5] Track IO operation statistics locally

Introduce "IOOp", an IO operation done by a backend, and "IOContext",
the IO location source or target or IO type done by a backend. For
example, the checkpointer may write a shared buffer out. This would be
counted as an IOOp "write" on an IOContext IOCONTEXT_SHARED by
BackendType "checkpointer".

Each IOOp (alloc, extend, fsync, read, write) is counted per IOContext
(local, shared, or strategy) through a call to pgstat_count_io_op().

The primary concern of these statistics is IO operations on data blocks
during the course of normal database operations. IO done by, for
example, the archiver or syslogger is not counted in these statistics.

IOCONTEXT_LOCAL and IOCONTEXT_SHARED IOContexts concern operations on
local and shared buffers.

The IOCONTEXT_STRATEGY IOContext concerns buffers
alloc'd/extended/fsync'd/read/written as part of a BufferAccessStrategy.

IOOP_ALLOC is counted for IOCONTEXT_SHARED and IOCONTEXT_LOCAL whenever
a buffer is acquired through [Local]BufferAlloc(). IOOP_ALLOC for
IOCONTEXT_STRATEGY is counted whenever a buffer already in the strategy
ring is reused. And IOOP_WRITE for IOCONTEXT_STRATEGY is counted
whenever the reused dirty buffer is written out.

Stats on IOOps for all IOContexts for a backend are counted in a
backend's local memory. This commit does not expose any functions for
aggregating or viewing these stats.

Author: Melanie Plageman 
Reviewed-by: Justin Pryzby , Kyotaro Horiguchi 
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 src/backend/postmaster/checkpointer.c  |  12 ++
 src/backend/storage/buffer/bufmgr.c|  64 +--
 src/backend/storage/buffer/freelist.c  |  23 ++-
 src/backend/storage/buffer/localbuf.c  |   5 +
 src/backend/storage/sync/sync.c|   9 +
 src/backend/utils/activity/Makefile|   1 +
 src/backend/utils/activity/pgstat_io_ops.c | 191 +
 src/include/pgstat.h   |  54 ++
 src/include/storage/buf_internals.h|   2 +-
 

Re: CFM Manager

2022-08-22 Thread Jacob Champion
On Mon, Aug 22, 2022 at 9:40 AM Tom Lane  wrote:
> You attribute more organization to this than actually exists ;-)

Ha, fair enough!

> If Ibrar wants the job I think it's his.

Excellent. Ibrar, I'll be updating the CFM checklist [1] over the next
couple of weeks. I'll try to have sections of it touched up by the
time you're due to use them. Let me know if there's anything in
particular that is confusing or needs more explanation.

Thanks,
--Jacob

[1] https://wiki.postgresql.org/wiki/CommitFest_Checklist




Re: CFM Manager

2022-08-22 Thread Tom Lane
Jacob Champion  writes:
> On Thu, Aug 11, 2022 at 3:14 AM Ibrar Ahmed  wrote:
>> Is anybody else volunteer for that, if not I am ready to take that 
>> resposibility.

> Hi Ibrar,

> I don't think I've seen anyone else volunteer. I'd wait for a
> committer to confirm that you've got the job, though.

You attribute more organization to this than actually exists ;-)

If Ibrar wants the job I think it's his.

regards, tom lane




Re: CFM Manager

2022-08-22 Thread Jacob Champion
On Thu, Aug 11, 2022 at 3:14 AM Ibrar Ahmed  wrote:
> Is anybody else volunteer for that, if not I am ready to take that 
> resposibility.

Hi Ibrar,

I don't think I've seen anyone else volunteer. I'd wait for a
committer to confirm that you've got the job, though.

All: we're rapidly approaching the next CF, so if someone from the
crowd could chime in, it would probably help Ibrar prepare.

Thanks,
--Jacob




Re: Include the dependent extension information in describe command.

2022-08-22 Thread vignesh C
On Tue, Aug 16, 2022 at 9:04 PM Bruce Momjian  wrote:
>
> On Mon, Aug 15, 2022 at 10:09:29PM +0530, vignesh C wrote:
> > I have updated the patch to display "Objects depending on extension"
> > as describe extension footer. The changes for the same are available
> > in the v2 version patch attached. Thoughts?
>
> I wonder if we would be better off with a backslash command that showed
> the dependencies of any object.

Yes, a backslash command which could show the dependencies
of the specified object would be helpful.
Can we do something like below:
a) Index idx1 depend on table t1
create table t1(c1 int);
create index idx1 on t1(c1);
postgres=# \dD idx1
Name
-
idx1
Depends on:
table t1

b) Index idx1 depend on table t1 and extension ext1
alter index idx1 depends on extension ext1
postgres=# \dD idx1
Name
-
idx1
Depends on:
table t1
extension ext1

c) materialized view mv1 depends on table t1
create materialized view mv1 as select * from t1;
postgres=# \dD mv1
Name
-
mv1
Depends on:
table t1

If you are ok with this approach, I can implement a patch on similar
lines. Thoughts?

Regards,
Vignesh




Re: Inconsistencies around defining FRONTEND

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-20 12:45:50 -0700, Andres Freund wrote:
> The -DFRONTENDs for the various ecpg libraries don't seem necessary
> anymore. That looks to be a leftover from 7143b3e8213, before that ecpg had
> copies of various pgport libraries.
>
> Same with libpq, also looks to be obsoleted by 7143b3e8213.
>
> I don't think we need FRONTEND in initdb - looks like that stopped being
> required with af1a949109d.

I think the patches for this are fairly obvious, and survived CI without an
issue [1], so the src/tools/msvc bits work too. So I'm planning to push them
fairly soon.


But the remaining "issues" don't have an obvious solutions and not addressed
by these patches:

> Unfortunately, the remaining uses of FRONTEND are required. That's:
> - pg_controldata, via #define
> - pg_resetwal, via #define
> - pg_rewind, via -DFRONTEND, due to xlogreader.c
> - pg_waldump, via #define and -DFRONTEND, due to xlogreader.c, xlogstats.c, 
> rmgrdesc/*desc.c
>
> I'm kind of wondering if we should add xlogreader.c, xlogstats.c, *desc to
> fe_utils, instead of building them in various places.  That'd leave us only
> with #define FRONTENDs for cases that do need to include postgres.h
> themselves, which seems a lot more palatable.


Greetings,

Andres Freund


[1] https://cirrus-ci.com/build/4648937721167872




Re: pg_auth_members.grantor is bunk

2022-08-22 Thread Robert Haas
On Thu, Aug 18, 2022 at 1:26 PM Robert Haas  wrote:
> On Wed, Aug 10, 2022 at 4:28 PM Robert Haas  wrote:
> > Well, CI isn't happy with this, and for good reason:
>
> CI is happier with this version, so I've committed 0001. If no major
> problems emerge, I'll proceed with 0002 as well.

Done.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Strip -mmacosx-version-min options from plperl build

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 16:31:53 +0200, Peter Eisentraut wrote:
> On 20.08.22 22:44, Andres Freund wrote:
> > Maybe a daft question: Why do want any of the -l flags other than -lperl? 
> > With
> > the patch configure spits out the following on my debian system:
> > 
> > checking for CFLAGS to compile embedded Perl... -DDEBIAN
> > checking for flags to link embedded Perl...   
> > -L/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -lperl -ldl -lm -lpthread -lc 
> > -lcrypt
> > 
> > those libraries were likely relevant to build libperl, but don't look 
> > relevant
> > for linking to it dynamically. Statically would be a different story, but we
> > already insist on a shared build.
> 
> Looking inside the ExtUtils::Embed source code, I wonder if there are some
> installations that have things like -lperl538 or something like that that it
> wants to deal with.

There definitely are - I wasn't trying to suggest we'd add -lperl ourselves,
just that we'd try to only add -lperl* based on some Config variable.

We have plenty fragile windows specific logic that would be nice to get rid
of. But there's more exciting things to wrangle, I have to admit.

Greetings,

Andres Freund




Re: Strip -mmacosx-version-min options from plperl build

2022-08-22 Thread Andres Freund
Hi,

On 2022-08-22 16:32:36 +0200, Peter Eisentraut wrote:
> On 20.08.22 23:44, Andres Freund wrote:
> > On 2022-08-20 16:53:31 -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > Maybe a daft question: Why do want any of the -l flags other than 
> > > > -lperl? With
> > > > the patch configure spits out the following on my debian system:
> > > 
> > > > checking for CFLAGS to compile embedded Perl... -DDEBIAN
> > > > checking for flags to link embedded Perl...   
> > > > -L/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -lperl -ldl -lm -lpthread 
> > > > -lc -lcrypt
> > > 
> > > > those libraries were likely relevant to build libperl, but don't look 
> > > > relevant
> > > > for linking to it dynamically.
> > > 
> > > I'm certain that there are/were platforms that insist on those libraries
> > > being mentioned anyway.  Maybe they are all obsolete now?
> > 
> > I don't think any of the supported platforms require it for stuff used 
> > inside
> > the shared library (and we'd be in trouble if so, check e.g. libpq.pc). But 
> > of
> > course that's different if there's inline function / macros getting pulled 
> > in.
> > 
> > Which turns out to be an issue on AIX. All the -l flags added by perl can be
> > removed for xlc, but for gcc, -lpthreads (or -pthread) it is required.
> > 
> > Tried it on Solaris (32 bit, not sure if there's a 64bit perl available),
> > works.
> 
> Does that mean my proposed patch (v2) is adequate for these platforms, or
> does it need further analysis?

I think it's a clear improvement over the status quo. Unnecessary -l's are
pretty harmless compared to random other flags.

Greetings,

Andres Freund




[PATCH] Allow usage of archive .backup files as backup_label

2022-08-22 Thread Michael Banck
Hi,

The .backup files written to the archive (if archiving is on) are very
similar to the backup_label that's written/returned by
pg_stop_backup()/pg_backup_stop(); they just have a few extra lines
about the end of the backup process that are missing from backup_label.

The parser in xlogrecovery.c however barfs on them because it does not
expect the additional STOP WAL LOCATION on line 2.

The attached makes parsing this line optional, so that one can use those
.backup files in place of backup_label. This is e.g. useful if the
backup_label got lost or the output of pg_stop_backup() was not
captured.
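
For reference, such a .backup history file looks roughly like this (all
values here are invented):

START WAL LOCATION: 0/9000028 (file 000000010000000000000009)
STOP WAL LOCATION: 0/9000138 (file 000000010000000000000009)
CHECKPOINT LOCATION: 0/9000060
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-08-22 16:00:00 CEST
LABEL: mybackup
START TIMELINE: 1
STOP TIME: 2022-08-22 16:00:05 CEST
STOP TIMELINE: 1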


Michael

-- 
Team Lead PostgreSQL
Project Manager
Tel.: +49 2166 9901-171
Mail: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Our handling of personal data is subject to:
https://www.credativ.de/en/contact/privacy/
From 00b1b245f74a0496a4d60cfafff92735dbe73d22 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Mon, 22 Aug 2022 16:20:14 +0200
Subject: [PATCH] Allow usage of archive .backup files as backup_label.

This lets the backup_label parser not bail if STOP WAL LOCATION is encountered,
which is the only meaningful difference between an archive .backup file and a
backup_label file, thus allowing to just copy the corresponding .backup file
from the archive as backup_label, in case the backup_label file got lost or was
never recorded.
---
 src/backend/access/transam/xlogrecovery.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index a59a0e826b..a95946e391 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1149,6 +1149,7 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
   bool *backupEndRequired, bool *backupFromStandby)
 {
 	char		startxlogfilename[MAXFNAMELEN];
+	char		stopxlogfilename[MAXFNAMELEN];
 	TimeLineID	tli_from_walseg,
 tli_from_file;
 	FILE	   *lfp;
@@ -1183,7 +1184,10 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
 	/*
 	 * Read and parse the START WAL LOCATION and CHECKPOINT lines (this code
 	 * is pretty crude, but we are not expecting any variability in the file
-	 * format).
+	 * format). Also allow STOP WAL LOCATION to be in the file. This line does
+	 * not appear in backup_label, but it is written to the corresponding
+	 * .backup file and allows users to rename or copy that file to
+	 * backup_label without further editing.
 	 */
 	if (fscanf(lfp, "START WAL LOCATION: %X/%X (file %08X%16s)%c",
 			   &hi, &lo, &tli_from_walseg, startxlogfilename, &ch) != 5 || ch != '\n')
@@ -1192,6 +1196,11 @@ read_backup_label(XLogRecPtr *checkPointLoc, TimeLineID *backupLabelTLI,
  errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE)));
 	RedoStartLSN = ((uint64) hi) << 32 | lo;
 	RedoStartTLI = tli_from_walseg;
+	if (fscanf(lfp, "STOP WAL LOCATION: %X/%X (file %*08X%16s)%c",
+			   &hi, &lo, stopxlogfilename, &ch) == 4)
+		ereport(DEBUG1,
+				(errmsg_internal("stop wal location %X/%X in file \"%s\"",
+								 hi, lo, BACKUP_LABEL_FILE)));
 	if (fscanf(lfp, "CHECKPOINT LOCATION: %X/%X%c",
 			   &hi, &lo, &ch) != 3 || ch != '\n')
 		ereport(FATAL,
-- 
2.30.2



Re: Remove offsetof definition

2022-08-22 Thread Tom Lane
Peter Eisentraut  writes:
> More portability cruft cleanup:  Our own definition of offsetof() was 
> only relevant for ancient systems and can surely be removed.

+1, it's required by C99.

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-22 Thread Jacob Champion
On Mon, Aug 22, 2022 at 4:32 AM Michael Paquier  wrote:
> By the way, I have looked at the patch, tweaked a couple of things
> with comments and the style, but overval that's fine.  First, I have
> intended to apply this stuff today but I have lacked the time to do
> so.  I should be able to get this wrapped tomorrow, though.

Thank you both for the reviews! Let me know if it would help for me to
issue a new patchset, otherwise I'll sit tight.

--Jacob




Re: Strip -mmacosx-version-min options from plperl build

2022-08-22 Thread Peter Eisentraut

On 20.08.22 23:44, Andres Freund wrote:

On 2022-08-20 16:53:31 -0400, Tom Lane wrote:

Andres Freund  writes:

Maybe a daft question: Why do want any of the -l flags other than -lperl? With
the patch configure spits out the following on my debian system:



checking for CFLAGS to compile embedded Perl... -DDEBIAN
checking for flags to link embedded Perl...   
-L/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -lperl -ldl -lm -lpthread -lc -lcrypt



those libraries were likely relevant to build libperl, but don't look relevant
for linking to it dynamically.


I'm certain that there are/were platforms that insist on those libraries
being mentioned anyway.  Maybe they are all obsolete now?


I don't think any of the supported platforms require it for stuff used inside
the shared library (and we'd be in trouble if so, check e.g. libpq.pc). But of
course that's different if there's inline function / macros getting pulled in.

Which turns out to be an issue on AIX. All the -l flags added by perl can be
removed for xlc, but for gcc, -lpthreads (or -pthread) it is required.

Tried it on Solaris (32 bit, not sure if there's a 64bit perl available),
works.


Does that mean my proposed patch (v2) is adequate for these platforms, 
or does it need further analysis?





Re: Strip -mmacosx-version-min options from plperl build

2022-08-22 Thread Peter Eisentraut

On 20.08.22 22:44, Andres Freund wrote:

Maybe a daft question: Why do want any of the -l flags other than -lperl? With
the patch configure spits out the following on my debian system:

checking for CFLAGS to compile embedded Perl... -DDEBIAN
checking for flags to link embedded Perl...   
-L/usr/lib/x86_64-linux-gnu/perl/5.34/CORE -lperl -ldl -lm -lpthread -lc -lcrypt

those libraries were likely relevant to build libperl, but don't look relevant
for linking to it dynamically. Statically would be a different story, but we
already insist on a shared build.


Looking inside the ExtUtils::Embed source code, I wonder if there are 
some installations that have things like -lperl538 or something like 
that that it wants to deal with.





Remove offsetof definition

2022-08-22 Thread Peter Eisentraut


More portability cruft cleanup:  Our own definition of offsetof() was 
only relevant for ancient systems and can surely be removed.

From 6c09404772ad7163e2f69b60f9adc61a151173f5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 22 Aug 2022 16:14:45 +0200
Subject: [PATCH] Remove offsetof definition

This was only needed to deal with some ancient and no longer supported
systems.
---
 src/include/c.h | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/src/include/c.h b/src/include/c.h
index dfc366b026..a381f9a6c4 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -30,7 +30,7 @@
  * 2)  bool, true, false
  * 3)  standard system types
  * 4)  IsValid macros for system types
- * 5)  offsetof, lengthof, alignment
+ * 5)  lengthof, alignment
  * 6)  assertions
  * 7)  widely useful macros
  * 8)  random stuff
@@ -703,20 +703,9 @@ typedef NameData *Name;
 
 
 /* ----------------------------------------------------------------
- *				Section 5:	offsetof, lengthof, alignment
+ *				Section 5:	lengthof, alignment
  * ----------------------------------------------------------------
  */
-/*
- * offsetof
- * Offset of a structure/union field within that structure/union.
- *
- * XXX This is supposed to be part of stddef.h, but isn't on
- * some systems (like SunOS 4).
- */
-#ifndef offsetof
-#define offsetof(type, field)  ((long) &((type *)0)->field)
-#endif /* offsetof */
-
 /*
  * lengthof
  * Number of elements in an array.
-- 
2.37.1



Re: ICU for global collation

2022-08-22 Thread Marina Polyakova

On 2022-08-22 17:10, Peter Eisentraut wrote:

On 15.08.22 14:06, Marina Polyakova wrote:
1.1) It looks like there's a bug in the function get_db_infos 
(src/bin/pg_upgrade/info.c), where the version of the old cluster is 
always checked:


if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
 snprintf(query + strlen(query), sizeof(query) - strlen(query),
  "'c' AS datlocprovider, NULL AS daticulocale, ");
else
 snprintf(query + strlen(query), sizeof(query) - strlen(query),
  "datlocprovider, daticulocale, ");

With the simple patch

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 
df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87 
100644

--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)

  snprintf(query, sizeof(query),
   "SELECT d.oid, d.datname, d.encoding, d.datcollate, 
d.datctype, ");

-    if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
+    if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
  snprintf(query + strlen(query), sizeof(query) - 
strlen(query),

   "'c' AS datlocprovider, NULL AS daticulocale, ");
  else


fixed

1.2) It looks like the mentioned assertion in dbcommands.c conflicts 
with the following lines earlier:


if (dbiculocale == NULL)
 dbiculocale = src_iculocale;


fixed

I'm wondering if this is not a fully-supported feature (because 
createdb creates an SQL command with LC_COLLATE and LC_CTYPE options 
instead of LOCALE option) or is it a bug in CREATE DATABASE?.. From 
src/backend/commands/dbcommands.c:


if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
{
 if (dlocale && dlocale->arg)
     dbiculocale = defGetString(dlocale);
}


I think this piece of code was left over from some earlier attempts to
specify the libc locale and the icu locale with one option, which
never really worked well.  The CREATE DATABASE man page does not
mention that LOCALE provides the default for ICU_LOCALE.  Hence, I
think we should delete this.


Thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: ICU for global collation

2022-08-22 Thread Peter Eisentraut

On 15.08.22 14:06, Marina Polyakova wrote:
1.1) It looks like there's a bug in the function get_db_infos 
(src/bin/pg_upgrade/info.c), where the version of the old cluster is 
always checked:


if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
 snprintf(query + strlen(query), sizeof(query) - strlen(query),
  "'c' AS datlocprovider, NULL AS daticulocale, ");
else
 snprintf(query + strlen(query), sizeof(query) - strlen(query),
  "datlocprovider, daticulocale, ");

With the simple patch

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 
df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87 
100644

--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)

  snprintf(query, sizeof(query),
   "SELECT d.oid, d.datname, d.encoding, d.datcollate, 
d.datctype, ");

-    if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
+    if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
  snprintf(query + strlen(query), sizeof(query) - strlen(query),
   "'c' AS datlocprovider, NULL AS daticulocale, ");
  else


fixed

1.2) It looks like the mentioned assertion in dbcommands.c conflicts with 
the following lines earlier:


if (dbiculocale == NULL)
 dbiculocale = src_iculocale;


fixed

I'm wondering if this is not a fully-supported feature (because createdb 
creates an SQL command with LC_COLLATE and LC_CTYPE options instead of 
LOCALE option) or is it a bug in CREATE DATABASE?.. From 
src/backend/commands/dbcommands.c:


if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
{
 if (dlocale && dlocale->arg)
     dbiculocale = defGetString(dlocale);
}


I think this piece of code was left over from some earlier attempts to 
specify the libc locale and the icu locale with one option, which never 
really worked well.  The CREATE DATABASE man page does not mention that 
LOCALE provides the default for ICU_LOCALE.  Hence, I think we should 
delete this.
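
For reference, with the current grammar the ICU locale has to be given
explicitly, since LOCALE only sets LC_COLLATE/LC_CTYPE.  A minimal sketch
(assuming a v15 server built with ICU; the database name is made up):

CREATE DATABASE mydb
    TEMPLATE template0
    LOCALE_PROVIDER icu
    ICU_LOCALE 'en-US'
    LOCALE 'en_US.utf8';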





Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Justin Pryzby
> +-- test on query with workers
> +CREATE TABLE svar_test(a int);
> +INSERT INTO svar_test SELECT * FROM generate_series(1,100);

When I looked at this, I noticed this huge table.

I don't think you should create such a large table just for this.

To exercise parallel workers with a smaller table, decrease
min_parallel_table_scan_size and others as done in other regression tests.
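
For example, a minimal sketch of that approach (the settings and values
below are illustrative, not taken from the patch):

SET min_parallel_table_scan_size = 0;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

CREATE TABLE svar_test(a int);
INSERT INTO svar_test SELECT * FROM generate_series(1,100);
EXPLAIN (COSTS OFF) SELECT count(*) FROM svar_test;  -- expects a parallel plan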

-- 
Justin




Re: pg_receivewal and SIGTERM

2022-08-22 Thread Christoph Berg
Re: Michael Paquier
> While looking at the last patch proposed, it strikes me that
> time_to_stop should be sig_atomic_t in pg_receivewal.c, as the safe
> type of variable to set in a signal handler.  We could change that,
> while on it..

Done in the attached patch.

> Backpatching this stuff is not an issue here.

Do you mean it can, or can not be backpatched? (I'd argue for
backpatching since the behaviour is slightly broken at the moment.)

Christoph
>From 3d17d6f77566047fc63e05c2f33106a09b2c9793 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Mon, 15 Aug 2022 14:29:43 +0200
Subject: [PATCH] pg_receivewal, pg_recvlogical: Exit cleanly on SIGTERM

In pg_receivewal, compressed output is only flushed on clean exits. The
reason to support SIGTERM here as well is that pg_receivewal might well
be running as a daemon, and systemd's default KillSignal is SIGTERM.

Since pg_recvlogical is also supposed to run as a daemon, teach it about
SIGTERM as well.

While at it, change pg_receivewal's time_to_stop to sig_atomic_t like in
pg_recvlogical.
---
 doc/src/sgml/ref/pg_receivewal.sgml| 8 +---
 doc/src/sgml/ref/pg_recvlogical.sgml   | 7 +++
 src/bin/pg_basebackup/pg_receivewal.c  | 9 +
 src/bin/pg_basebackup/pg_recvlogical.c | 7 ---
 4 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 4fe9e1a874..5f83ba1893 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -118,8 +118,9 @@ PostgreSQL documentation
 
   
In the absence of fatal errors, pg_receivewal
-   will run until terminated by the SIGINT signal
-   (ControlC).
+   will run until terminated by the SIGINT
+   (ControlC)
+   or SIGTERM signal.
   
  
 
@@ -457,7 +458,8 @@ PostgreSQL documentation
 
   
pg_receivewal will exit with status 0 when
-   terminated by the SIGINT signal.  (That is the
+   terminated by the SIGINT or
+   SIGTERM signal.  (That is the
normal way to end it.  Hence it is not an error.)  For fatal errors or
other signals, the exit status will be nonzero.
   
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index 1a88225409..012ee4468b 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -46,6 +46,13 @@ PostgreSQL documentation
 a slot without consuming it, use
pg_logical_slot_peek_changes.
   
+
+  
+   In the absence of fatal errors, pg_recvlogical
+   will run until terminated by the SIGINT
+   (ControlC)
+   or SIGTERM signal.
+  
  
 
  
diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index f064cff4ab..4b37a98fd6 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -45,7 +45,7 @@ static int	verbose = 0;
 static int	compresslevel = 0;
 static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;	/* 10 sec = default */
-static volatile bool time_to_stop = false;
+static volatile sig_atomic_t time_to_stop = false;
 static bool do_create_slot = false;
 static bool slot_exists_ok = false;
 static bool do_drop_slot = false;
@@ -673,13 +673,13 @@ StreamLog(void)
 }
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 #ifndef WIN32
 
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_stop = true;
 }
@@ -932,7 +932,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 #endif
 
 	/*
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 2a4c8b130a..363102cf85 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -650,11 +650,11 @@ error:
 #ifndef WIN32
 
 /*
- * When sigint is called, just tell the system to exit at the next possible
+ * When SIGINT/SIGTERM are caught, just tell the system to exit at the next possible
  * moment.
  */
 static void
-sigint_handler(int signum)
+sigexit_handler(int signum)
 {
 	time_to_abort = true;
 }
@@ -922,7 +922,8 @@ main(int argc, char **argv)
 	 * if one is needed, in GetConnection.)
 	 */
 #ifndef WIN32
-	pqsignal(SIGINT, sigint_handler);
+	pqsignal(SIGINT, sigexit_handler);
+	pqsignal(SIGTERM, sigexit_handler);
 	pqsignal(SIGHUP, sighup_handler);
 #endif
 
-- 
2.35.1



Re: Supporting TAP tests with MSVC and Windows

2022-08-22 Thread Andrew Dunstan


On 2022-08-21 Su 20:40, Noah Misch wrote:
> This (commit 13d856e of 2015-07-29) added the following:
>
> --- a/src/test/perl/TestLib.pm
> +++ b/src/test/perl/TestLib.pm
> @@ -242,7 +288,17 @@ sub command_exit_is
>   print("# Running: " . join(" ", @{$cmd}) ."\n");
>   my $h = start $cmd;
>   $h->finish();
> - is($h->result(0), $expected, $test_name);
> +
> + # On Windows, the exit status of the process is returned directly as the
> + # process's exit code, while on Unix, it's returned in the high bits
> + # of the exit code (see WEXITSTATUS macro in the standard 
> + # header file). IPC::Run's result function always returns exit code >> 
> 8,
> + # assuming the Unix convention, which will always return 0 on Windows as
> + # long as the process was not terminated by an exception. To work around
> + # that, use $h->full_result on Windows instead.
> + my $result = ($Config{osname} eq "MSWin32") ?
> + ($h->full_results)[0] : $h->result(0);
> + is($result, $expected, $test_name);
>  }
>
> That behavior came up again in the context of a newer IPC::Run test case.  I'm
> considering changing the IPC::Run behavior such that the above would have been
> unnecessary.  However, if I do, the above code would want to adapt to handle
> pre-change and post-change IPC::Run versions.  If you have an opinion on
> whether or how IPC::Run should change, I welcome comments on
> https://github.com/toddr/IPC-Run/issues/161.
>
>


Assuming it changes, we'll have to have a version test here. I don't
think we can have a flag day where we suddenly require IPC::Run's
bleeding edge on Windows. So changing it is a good thing, but it won't
help us much.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] ALTER TABLE ... SET STORAGE default

2022-08-22 Thread Aleksander Alekseev
Hi Nikita,

> This seems a little bit confusing and thus very unfriendly for the user,
> because the actual meaning of the same 'DEFAULT' option will be different
> for each data type, and to check the storage mode the user has to query
> the full table (or column) description.
> I'd rather add a paragraph in the documentation describing each data
> type's default storage mode.

I agree that "SET STORAGE default" syntax leaves much to be desired.

Personally I would prefer "RESET STORAGE" and "RESET COMPRESSION". But
since we already have "SET COMPRESSION default" this going to be
either two commands that do the same thing, or a broken backward
compatibility. Simply removing "SET COMPRESSION default" will make the
syntax consistent too, but again, this would be a broken backward
compatibility. I would argue that a sub-optimal but consistent syntax
that does the job is better than inconsistent syntax and figuring out
the default storage strategy manually.

But let's see what other people think.
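
For clarity, a sketch of the syntax under discussion (table and column
names are hypothetical; SET STORAGE DEFAULT is the proposed addition):

-- change the storage mode explicitly
ALTER TABLE mytab ALTER COLUMN payload SET STORAGE EXTERNAL;

-- proposed: revert to the data type's default storage strategy
ALTER TABLE mytab ALTER COLUMN payload SET STORAGE DEFAULT;

-- existing, analogous syntax for compression
ALTER TABLE mytab ALTER COLUMN payload SET COMPRESSION DEFAULT;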

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] ALTER TABLE ... SET STORAGE default

2022-08-22 Thread Nikita Malakhov
Hi hackers!

This seems a little bit confusing and thus very unfriendly for the user,
because the actual meaning of the same 'DEFAULT' option will be different
for each data type, and to check the storage mode the user has to query
the full table (or column) description.
I'd rather add a paragraph in the documentation describing each data
type's default storage mode.

On Mon, Aug 22, 2022 at 3:34 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi hackers,
>
> I noticed that we support 'ALTER TABLE ... SET COMPRESSION default'
> syntax, but not 'SET STORAGE default' which seems to be a bit
> inconsistent. When the user changes the storage mode for a column
> there is no convenient way to revert the change.
>
> The proposed patch fixes this.
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
https://postgrespro.ru/


RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-22 Thread kuroda.hay...@fujitsu.com
Dear Wang,

Thank you for updating the patch! The following are comments about v23-0001
and v23-0005.

v23-0001

01. logical-replication.sgml

+  
+   When the streaming mode is parallel, the finish LSN of
+   failed transactions may not be logged. In that case, it may be necessary to
+   change the streaming mode to on and cause the same
+   conflicts again so the finish LSN of the failed transaction will be written
+   to the server log. For the usage of finish LSN, please refer to ALTER SUBSCRIPTION ...
+   SKIP.
+  

I was not sure about streaming='off' mode. Is there any reason that only the
ON mode is mentioned?
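
For context, the workflow described in the quoted paragraph would look
roughly like this (subscription name and LSN are hypothetical):

-- reproduce the conflict with per-transaction streaming to log the finish LSN
ALTER SUBSCRIPTION mysub SET (streaming = on);

-- then skip the failed transaction using the LSN from the server log
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14C0378');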

02. protocol.sgml

+  
+   Int64 (XLogRecPtr)
+   
+
+ The LSN of the abort. This field is available since protocol version
+ 4.
+
+   
+  
+
+  
+   Int64 (TimestampTz)
+   
+
+ Abort timestamp of the transaction. The value is in number
+ of microseconds since PostgreSQL epoch (2000-01-01). This field is
+ available since protocol version 4.
+
+   
+  
+

It seems these changes are in the variablelist for stream commit.
I think they belong to the stream abort message, so they should be moved.

03. decode.c

-   ReorderBufferForget(ctx->reorder, parsed->subxacts[i], 
buf->origptr);
+   ReorderBufferForget(ctx->reorder, parsed->subxacts[i], 
buf->origptr,
+   commit_time);
}
-   ReorderBufferForget(ctx->reorder, xid, buf->origptr);
+   ReorderBufferForget(ctx->reorder, xid, buf->origptr, 
commit_time);

'commit_time' is passed as the argument 'abort_time', which I think may be 
confusing.
How about adding a comment above, like:
"In case of streamed transactions, they are regarded as being aborted at 
commit_time"

04. launcher.c

04.a

+   worker->main_worker_pid = is_subworker ? MyProcPid : 0;

You can use InvalidPid instead of 0.
(I thought a pid should be represented by the datatype pid_t, but in some 
places it is defined as int...) 

04.b

+   worker->main_worker_pid = 0;

You can use InvalidPid instead of 0, same as above.

05. origin.c

 void
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, int acquired_by)

IIUC the same slot can be used only when the main apply worker has already
acquired the slot and the subworker for the same subscription tries to
acquire it, but that cannot be understood from the comments.
How about adding comments, or an assertion that acquired_by is the same as
session_replication_state->acquired_by?
Moreover, acquired_by should be compared with InvalidPid, based on the above
comments.

06. proto.c

 void
 logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
- TransactionId subxid)
+ ReorderBufferTXN 
*txn, XLogRecPtr abort_lsn,
+ bool write_abort_lsn

I think write_abort_lsn may not be needed,
because abort_lsn can be used to control whether the abort_XXX fields should
be filled or not.

07. worker.c

+/*
+ * The number of changes during one streaming block (only for apply background
+ * workers)
+ */
+static uint32 nchanges = 0;

This variable is used only by the main apply worker, so the comment seems
incorrect.
How about "...(only for the SUBSTREAM_PARALLEL case)"?

v23-0005

08. monitoring.sgml

I cannot decide which of the options proposed in [1] is better, but the
following descriptions are needed in either case.
(In [2] I had intended to propose something like option 2.)

08.a

You can add a description that the field 'relid' will be NULL even for apply 
background workers.

08.b

You can add a description that the fields 'received_lsn', 'last_msg_send_time', 
'last_msg_receipt_time', 'latest_end_lsn', and 'latest_end_time' will be NULL
for apply background workers.


[1]: 
https://www.postgresql.org/message-id/CAHut%2BPuPwdwZqXBJjtU%2BR9NULbOpxMG%3Di2hmqgg%2B7p0rmK0hrw%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/TYAPR01MB58660B4732E7F80B322174A3F5629%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



[PATCH] Tab completion for SET COMPRESSION

2022-08-22 Thread Aleksander Alekseev
Hi hackers,

The proposed patch adds the missing tab completion for 'ALTER TABLE
... SET COMPRESSION ...' syntax.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Tab-completion-for-SET-COMPRESSION.patch
Description: Binary data


Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-08-22 Thread torikoshia

On 2022-08-15 22:23, Damir Belyalov wrote:

I expected I could see  rows after COPY, but just saw 999 rows.

Also I implemented your case and it worked correctly.


Thanks for the new patch!

Here are some comments on it.


+   if (safecstate->saved_tuples < REPLAY_BUFFER_SIZE)
+   {
+   valid_row = NextCopyFrom(cstate, econtext, values, 
nulls);

+   if (valid_row)
+   {
+   /* Fill replay_buffer in oldcontext*/
+   MemoryContextSwitchTo(safecstate->oldcontext);
+   
safecstate->replay_buffer[safecstate->saved_tuples++] = 
heap_form_tuple(RelationGetDescr(cstate->rel), values, nulls);


+   /* Buffer was filled, commit subtransaction and prepare 
to replay */

+   ReleaseCurrentSubTransaction();


What is actually being committed by this ReleaseCurrentSubTransaction()?
It seems to me that only safecstate->replay_buffer is filled before 
this commit.
As a test, I rewrote this ReleaseCurrentSubTransaction() to 
RollbackAndReleaseCurrentSubTransaction() and COPYed over 1000 rows of 
data, but the same data were loaded.



+#defineREPLAY_BUFFER_SIZE 1000


I feel it might be better to have it as a parameter rather than fixed at 
1000.



+/*
+ * Analog of NextCopyFrom() but ignore rows with errors while copying.
+ */
+static bool
+safeNextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum 
*values, bool *nulls)


NextCopyFrom() is in copyfromparse.c while safeNextCopyFrom() is in 
copyfrom.c.
Since safeNextCopyFrom() is an analog of NextCopyFrom(), as commented,
wouldn't it be natural to put them in the same file?



188 +   safecstate->errors++;
189 +   if (safecstate->errors <= 100)
190 +   ereport(WARNING,
191 +   (errcode(errdata->sqlerrcode),
192 +   errmsg("%s", errdata->context)));


It would be better to state in the manual that errors exceeding 100 are 
not displayed.

Or, it might be acceptable to output all errors that exceed 100.


+typedef struct SafeCopyFromState
+{
+#defineREPLAY_BUFFER_SIZE 1000
+   HeapTuple   replay_buffer[REPLAY_BUFFER_SIZE]; /* accumulates 
tuples for replaying it after an error */
+   int saved_tuples;   /* # of tuples in 
replay_buffer */
+   int replayed_tuples;/* # of tuples was 
replayed from buffer */

+   int errors; /* total # of errors */
+   boolreplay_is_active;
+   boolbegin_subtransaction;
+   boolprocessed_remaining_tuples; /* for case of 
replaying last tuples */

+   boolskip_row;


It would be helpful to add comments about skip_row, etc.

```
$ git apply ../patch/0003-COPY_IGNORE_ERRORS.patch

../patch/0003-COPY_IGNORE_ERRORS.patch:86: indent with spaces.
 Datum *values, bool *nulls);
warning: 1 line adds whitespace errors.
```

There was a warning when applying the patch.


```
=# copy test from '/tmp/1.data' with (ignore_errors);
WARNING:  FIND 0 ERRORS
COPY 1003
```

When there were no errors, this WARNING seems unnecessary.
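
For reference, an end-to-end sketch of the option as proposed (the table
and data are made up; the exact output depends on the patch):

CREATE TABLE test (a int);
COPY test FROM stdin WITH (ignore_errors);
1
two
3
\.
-- expectation: a WARNING for the bad line, and COPY 2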


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




[PATCH] ALTER TABLE ... SET STORAGE default

2022-08-22 Thread Aleksander Alekseev
Hi hackers,

I noticed that we support 'ALTER TABLE ... SET COMPRESSION default'
syntax, but not 'SET STORAGE default' which seems to be a bit
inconsistent. When the user changes the storage mode for a column
there is no convenient way to revert the change.

The proposed patch fixes this.

-- 
Best regards,
Aleksander Alekseev


v1-0001-ALTER-TABLE-.-SET-STORAGE-default.patch
Description: Binary data


Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-22 Thread Michael Paquier
On Wed, Aug 17, 2022 at 09:53:45AM +0200, Drouvot, Bertrand wrote:
> Thanks for the new version!
> 
> +   /* Copy authn_id into the space after the struct. */
> +   if (serialized.authn_id_len >= 0)
> 
> Maybe remove the "." at the end of the comment? (to be consistent with the
> other comment just above)

When it comes to such things, I usually apply the rule of consistency
with the surroundings, which sounds right here.

> +   memcpy(&serialized, conninfo, sizeof(serialized));
> +   authn_id = conninfo + sizeof(serialized);
> 
> Move "authn_id = conninfo + sizeof(serialized)" in the "if
> (serialized.authn_id_len >= 0)" below?

Makes sense, so as to never have something pointing to an area we
should not look at.  This should just be used when we know that there
is going to be a string.

> + src/backend/utils/init/miscinit.c:RestoreClientConnectionInfo(char
> *conninfo)
> + src/include/miscadmin.h:extern void RestoreClientConnectionInfo(char
> *procinfo);
> 
> conninfo in both to be consistent?

Yep.  Looks like a copy-pasto, seen from here.

By the way, I have looked at the patch and tweaked a couple of things
with the comments and the style, but overall that's fine.  I had
intended to apply this stuff today but lacked the time to do so.  I
should be able to get this wrapped up tomorrow, though.
--
Michael


signature.asc
Description: PGP signature


Re: Column Filtering in Logical Replication

2022-08-22 Thread vignesh C
On Mon, Aug 22, 2022 at 1:58 PM Peter Smith  wrote:
>
> Thanks for the review of v1-0001.
>
> On Wed, Aug 17, 2022 at 3:04 AM vignesh C  wrote:
> ...
> > 1) Row filters mentions that "It has no effect on TRUNCATE commands.",
> > the same is not present in case of column filters. We should keep the
> > changes similarly for consistency.
> > --- a/doc/src/sgml/ref/create_publication.sgml
> > +++ b/doc/src/sgml/ref/create_publication.sgml
> > @@ -90,8 +90,7 @@ CREATE PUBLICATION  > class="parameter">name
> >   
> >When a column list is specified, only the named columns are 
> > replicated.
> >If no column list is specified, all columns of the table are 
> > replicated
> > -  through this publication, including any columns added later.  If a 
> > column
> > -  list is specified, it must include the replica identity columns.
> > +  through this publication, including any columns added later.
>
> Modified as suggested.
>
> >
> > 2) The document says that "if the table uses REPLICA IDENTITY FULL,
> > specifying a column list is not allowed.":
> > +   publishes only INSERT operations. Furthermore, if the
> > +   table uses REPLICA IDENTITY FULL, specifying a column
> > +   list is not allowed.
> > +  
> >
> > Did you mean specifying a column list during create publication for
> > REPLICA IDENTITY FULL table like below scenario:
> > postgres=# create table t2(c1 int, c2 int, c3 int);
> > CREATE TABLE
> > postgres=# alter table t2 replica identity full ;
> > ALTER TABLE
> > postgres=# create publication pub1 for table t2(c1,c2);
> > CREATE PUBLICATION
> >
> > If so, the document says specifying column list is not allowed, but
> > creating a publication with column list on replica identity full was
> > successful.
>
> That patch v1-0001 was using the same wording from the github commit
> message [1]. I agree it was a bit vague.
>
> In fact the replica identity validation is done at DML execution time
> so your example will fail as expected when you attempt to do an UPDATE
> operation.
>
> e.g.
> test_pub=# update t2 set c2=23 where c1=1;
> ERROR:  cannot update table "t2"
> DETAIL:  Column list used by the publication does not cover the
> replica identity.
>
> I modified the wording for this part of the docs.

A few comments:
1) I felt that no expressions are allowed in column filters; only
column names can be specified. The second sentence confuses what is
allowed with what is not. Wouldn't it be better to remove it and
state that only column names can be specified?
+   
+Column list can contain only simple column references. Complex
+expressions, function calls etc. are not allowed.
+   

2) tablename should be table name.
+   
+A column list is specified per table following the tablename, and
enclosed by
+parenthesis. See  for details.
+   

We have used table name in the same page in other instances like:
a) The row filter is defined per table. Use a WHERE clause after the
table name for each published table that requires data to be filtered
out. The WHERE clause must be enclosed by parentheses.
b) The tables are matched between the publisher and the subscriber
using the fully qualified table name.

3) One small whitespace issue:
git am v2-0001-Column-List-replica-identity-rules.patch
Applying: Column List replica identity rules.
.git/rebase-apply/patch:30: trailing whitespace.
   if the publication publishes only INSERT operations.
warning: 1 line adds whitespace errors.

Regards,
Vignesh




Re: Fix typo with logical connector (src/backend/commands/vacuumparallel.c)

2022-08-22 Thread Ranier Vilela
On Mon, Aug 22, 2022 at 01:42, Amit Kapila 
wrote:

> On Sat, Aug 20, 2022 at 10:04 AM Tom Lane  wrote:
> >
> > Amit Kapila  writes:
> > > Right, but as Tom pointed it is still better to change this. However,
> > > I am not sure if we should backpatch this to PG15 as this won't lead
> > > to any incorrect behavior.
> >
> > If that code only exists in HEAD and v15 then I'd backpatch.
> > It's a very low-risk change and it might avoid merge problems
> > for future backpatches.
> >
>
> Okay, done that way. Thanks!
>
Thank you.

regards,
Ranier Vilela


Re: Logical WAL sender unresponsive during decoding commit

2022-08-22 Thread Amit Kapila
On Tue, Aug 16, 2022 at 2:37 PM Masahiko Sawada  wrote:
>
> I've attached patches for all supported branches.
>

LGTM. I'll push this tomorrow unless there are comments/suggestions.

-- 
With Regards,
Amit Kapila.




Re: proposal: possibility to read dumped table's name from file

2022-08-22 Thread Pavel Stehule
Hi

I am sending a fresh rebase + enhanced tests for pg_dumpall and pg_restore

Regards

Pavel
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index c08276bc0a..b64bae6987 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb  mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb  db.sql
 
 
  
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index 5d54074e01..7671795060 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -122,6 +122,28 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for databases excluded
+from the dump. The patterns are interpreted according to the same rules
+as --exclude-database.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for excluding databases,
+and can also be specified more than once for multiple filter files.
+   
+
+   
+The file lists one database pattern per row, with the following format:
+
+exclude database  PATTERN
+
+   
+  
+ 
+
  
   -g
   --globals-only
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 47bd7dbda0..ffeb564c52 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -188,6 +188,31 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects excluded
+or included from the restore. The patterns are interpreted according to the
+same rules as --schema, --exclude-schema,
+--function, --index, --table
+or --trigger.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter files.

Re: Column Filtering in Logical Replication

2022-08-22 Thread Erik Rijkers

Op 22-08-2022 om 10:27 schreef Peter Smith:


PSA new set of v2* patches.


Hi,

In the second file a small typo, I think:

"enclosed by parenthesis"  should be
"enclosed by parentheses"

thanks,
Erik





Re: Perform streaming logical transactions by background workers and parallel apply

2022-08-22 Thread Amit Kapila
On Mon, Aug 22, 2022 at 4:42 AM Peter Smith  wrote:
>
> On Fri, Aug 19, 2022 at 7:55 PM Amit Kapila  wrote:
> >
> > On Fri, Aug 19, 2022 at 3:05 PM Peter Smith  wrote:
> > >
> > > On Fri, Aug 19, 2022 at 7:10 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Fri, Aug 19, 2022 at 2:36 PM Peter Smith  
> > > > wrote:
> > > > >
> > > > > Here are my review comments for the v23-0005 patch:
> > > > >
> > > > > ==
> > > > >
> > > > > Commit Message says:
> > > > > main_worker_pid is Process ID of the main apply worker, if this 
> > > > > process is a
> > > > > apply background worker. NULL if this process is a main apply worker 
> > > > > or a
> > > > > synchronization worker.
> > > > > The new column can make it easier to distinguish main apply worker 
> > > > > and apply
> > > > > background worker.
> > > > >
> > > > > --
> > > > >
> > > > > Having a column called ‘main_worker_pid’ which is defined to be NULL
> > > > > if the process *is* the main apply worker does not make any sense to
> > > > > me.
> > > > >
> > > >
> > > > I haven't read this part of a patch but it seems to me we have
> > > > something similar for parallel query workers. Refer 'leader_pid'
> > > > column in pg_stat_activity.
> > > >
> > >
> > > IIUC (from the patch 0005 commit message) the intention is to be able
> > > to easily distinguish the worker types.
> > >
> >
> > I think it is only to distinguish between leader apply worker and
> > background apply workers. The tablesync worker can be distinguished
> > based on relid field.
> >
>
> Right. But that's the reason for my question in the first place - why
> implement the patch so that the user still has to jump through hoops
> just to know the worker type information?
>

I think it is not only to determine the worker type but also to know the
pid of each of the workers during parallel apply. Isn't it better to have
both the main apply worker pid and the parallel apply worker pid, as we
have for the parallel query system?
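
For comparison, the parallel query system already exposes this pattern
through pg_stat_activity's leader_pid column:

SELECT pid, leader_pid, backend_type
FROM pg_stat_activity
WHERE leader_pid IS NOT NULL;  -- parallel workers point at their leader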

-- 
With Regards,
Amit Kapila.




Re: Asynchronous execution support for Custom Scan

2022-08-22 Thread Etsuro Fujita
Hi Onishi-san,

On Sun, Aug 14, 2022 at 12:59 PM Kazutaka Onishi  wrote:
> The v1 patch caused gcc warnings; I fixed them.

Thanks for working on this!

I'd like to review this (though I'm not sure I can find time for it
in the next commitfest), but I don't think we can review this without
any example.  Could you provide one?  I think a simple example is
better for ease of review.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-08-22 Thread Etsuro Fujita
On Mon, Aug 15, 2022 at 2:29 PM Andrey Lepikhov
 wrote:
> On 8/9/22 16:44, Etsuro Fujita wrote:
> >>> -1 foo
> >>> 1 bar
> >>> \.
> > ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
> > DETAIL:  Failing row contains (-1, foo).
> > CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
> > ($1, $2), ($3, $4)
> > COPY ft1, line 3
> >
> > In single-insert mode the error context information is correct, but in
> > batch-insert mode it isn’t (i.e., the line number isn’t correct).
> >
> > The error occurs on the remote side, so I'm not sure if there is a
> > simple fix.  What I came up with is to just suppress error context
> > information other than the relation name, like the attached.  What do
> > you think about that?

> I've spent a lot of effort on this problem too. Your solution has a
> rationale and looks fine.
> I only think we should add a bit of info to the error report to make
> it clear why we don't point to a specific line here. For example:
> 'COPY %s (buffered)'
> or
> 'COPY FOREIGN TABLE %s'
>
> or, if instead of the relname_only field we save a MultiInsertBuffer
> pointer, we might add min/max line numbers to the report:
> 'COPY %s, line between %llu and %llu'

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush().  Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table.  My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful.  Maybe I'm too
worried about that, though.

Best regards,
Etsuro Fujita




Re: making relfilenodes 56 bits

2022-08-22 Thread Amit Kapila
On Mon, Aug 22, 2022 at 1:25 PM Amit Kapila  wrote:
>
> On Sat, Jul 30, 2022 at 1:59 AM Robert Haas  wrote:
> >
> > On Wed, Jul 20, 2022 at 7:27 AM Dilip Kumar  wrote:
> > > There was also an issue where the user table from the old cluster's
> > > relfilenode could conflict with the system table of the new cluster.
> > > As a solution currently for system table object (while creating
> > > storage first time) we are keeping the low range of relfilenumber,
> > > basically we are using the same relfilenumber as OID so that during
> > > upgrade the normal user table from the old cluster will not conflict
> > > with the system tables in the new cluster.  But with this solution
> > > Robert told me (in off list chat) a problem that in future if we want
> > > to make relfilenumber completely unique within a cluster by
> > > implementing the CREATEDB differently then we can not do that as we
> > > have created fixed relfilenodes for the system tables.
> > >
> > > I am not sure what exactly we can do to avoid that because even if we
> > > do something  to avoid that in the new cluster the old cluster might
> > > be already using the non-unique relfilenode so after upgrading the new
> > > cluster will also get those non-unique relfilenode.
> >
> > I think this aspect of the patch could use some more discussion.
> >
> > To recap, the problem is that pg_upgrade mustn't discover that a
> > relfilenode that is being migrated from the old cluster is being used
> > for some other table in the new cluster. Since the new cluster should
> > only contain system tables that we assume have never been rewritten,
> > they'll all have relfilenodes equal to their OIDs, and thus less than
> > 16384. On the other hand all the user tables from the old cluster will
> > have relfilenodes greater than 16384, so we're fine. pg_largeobject,
> > which also gets migrated, is a special case. Since we don't change OID
> > assignments from version to version, it should have either the same
> > relfilenode value in the old and new clusters, if never rewritten, or
> > else the value in the old cluster will be greater than 16384, in which
> > case no conflict is possible.
> >
> > But if we just assign all relfilenode values from a central counter,
> > then we have got trouble. If the new version has more system catalog
> > tables than the old version, then some value that got used for a user
> > table in the old version might get used for a system table in the new
> > version, which is a problem. One idea for fixing this is to have two
> > RelFileNumber ranges: a system range (small values) and a user range.
> > System tables get values in the system range initially, and in the
> > user range when first rewritten. User tables always get values in the
> > user range. Everything works fine in this scenario except maybe for
> > pg_largeobject: what if it gets one value from the system range in the
> > old cluster, and a different value from the system range in the new
> > cluster, but some other system table in the new cluster gets the value
> > that pg_largeobject had in the old cluster? Then we've got trouble.
> >
>
> To solve that problem, how about rewriting the system table in the new
> cluster which has a conflicting relfilenode? I think we can probably
> do this conflict checking before processing the tables from the old
> cluster.
>

I think that while rewriting a system table during the upgrade, we need to
ensure that it gets a relfilenumber from the system range; otherwise, if
we allocate one from the user range, there will be a chance of conflict
with the user tables from the old cluster. Another way could be to set
the next-relfilenumber counter for the new cluster to the value from
the old cluster as mentioned by Robert in his previous email [1].

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmoYsNiF8JGZ%2BKp7Zgcct67Qk%2B%2BYAp%2B1ybOQ0qomUayn%2B7A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Fix typo kill_prio_tuple

2022-08-22 Thread Daniel Gustafsson
> On 22 Aug 2022, at 09:57, Zhang Mingli  wrote:

> Found a typo in mvcc.sql
> 
> typo kill_prio_tuple -> kill_prior_tuple

Correct, that should be kill_prior_tuple. I'll apply this in a bit.

--
Daniel Gustafsson   https://vmware.com/





Re: Logical replication support for generic wal record

2022-08-22 Thread Amit Kapila
On Mon, Aug 22, 2022 at 11:59 AM Natarajan R  wrote:
>
> Hi All,
>
> I am writing a postgres extension which writes only generic WAL records, but
> these WAL records are not recognized by the logical replication decoder. I
> have a basic understanding of how logical replication works (COPY command for
> initial sync, WAL replica for final sync); can you please tell us a way to
> support this?
>

Did you try with a custom WAL resource manager [1][2]?

[1] - https://www.postgresql.org/docs/devel/custom-rmgr.html
[2] - 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5c279a6d350205cc98f91fb8e1d3e4442a6b25d1

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-08-22 Thread Peter Smith
Thanks for the review of v1-0001.

On Wed, Aug 17, 2022 at 3:04 AM vignesh C  wrote:
...
> 1) Row filters mentions that "It has no effect on TRUNCATE commands.",
> the same is not present in case of column filters. We should keep the
> changes similarly for consistency.
> --- a/doc/src/sgml/ref/create_publication.sgml
> +++ b/doc/src/sgml/ref/create_publication.sgml
> @@ -90,8 +90,7 @@ CREATE PUBLICATION  class="parameter">name
>   
>When a column list is specified, only the named columns are replicated.
>If no column list is specified, all columns of the table are replicated
> -  through this publication, including any columns added later.  If a 
> column
> -  list is specified, it must include the replica identity columns.
> +  through this publication, including any columns added later.

Modified as suggested.

>
> 2) The document says that "if the table uses REPLICA IDENTITY FULL,
> specifying a column list is not allowed.":
> +   publishes only INSERT operations. Furthermore, if the
> +   table uses REPLICA IDENTITY FULL, specifying a column
> +   list is not allowed.
> +  
>
> Did you mean specifying a column list during create publication for
> REPLICA IDENTITY FULL table like below scenario:
> postgres=# create table t2(c1 int, c2 int, c3 int);
> CREATE TABLE
> postgres=# alter table t2 replica identity full ;
> ALTER TABLE
> postgres=# create publication pub1 for table t2(c1,c2);
> CREATE PUBLICATION
>
> If so, the document says specifying column list is not allowed, but
> creating a publication with column list on replica identity full was
> successful.

That patch v1-0001 was using the same wording from the github commit
message [1]. I agree it was a bit vague.

In fact the replica identity validation is done at DML execution time
so your example will fail as expected when you attempt to do an UPDATE
operation.

e.g.
test_pub=# update t2 set c2=23 where c1=1;
ERROR:  cannot update table "t2"
DETAIL:  Column list used by the publication does not cover the
replica identity.

I modified the wording for this part of the docs.

~~~

PSA new set of v2* patches.

--
[1] - 
https://github.com/postgres/postgres/commit/923def9a533a7d986acfb524139d8b9e5466d0a5

Kind Regards,
Peter Smith
Fujitsu Australia


v2-0001-Column-List-replica-identity-rules.patch
Description: Binary data


v2-0002-Column-Lists-new-pgdocs-section.patch
Description: Binary data


Fix typo kill_prio_tuple

2022-08-22 Thread Zhang Mingli
Hi,

Found a typo in mvcc.sql

typo kill_prio_tuple -> kill_prior_tuple

Regards,
Zhang Mingli


vn-0001-Fix-typo-kill_prio_tuple-to-kill_prior_tuple.patch
Description: Binary data


Re: making relfilenodes 56 bits

2022-08-22 Thread Amit Kapila
On Sat, Jul 30, 2022 at 1:59 AM Robert Haas  wrote:
>
> On Wed, Jul 20, 2022 at 7:27 AM Dilip Kumar  wrote:
> > There was also an issue where the user table from the old cluster's
> > relfilenode could conflict with the system table of the new cluster.
> > As a solution currently for system table object (while creating
> > storage first time) we are keeping the low range of relfilenumber,
> > basically we are using the same relfilenumber as OID so that during
> > upgrade the normal user table from the old cluster will not conflict
> > with the system tables in the new cluster.  But with this solution
> > Robert told me (in off list chat) a problem that in future if we want
> > to make relfilenumber completely unique within a cluster by
> > implementing the CREATEDB differently then we can not do that as we
> > have created fixed relfilenodes for the system tables.
> >
> > I am not sure what exactly we can do to avoid that because even if we
> > do something  to avoid that in the new cluster the old cluster might
> > be already using the non-unique relfilenode so after upgrading the new
> > cluster will also get those non-unique relfilenode.
>
> I think this aspect of the patch could use some more discussion.
>
> To recap, the problem is that pg_upgrade mustn't discover that a
> relfilenode that is being migrated from the old cluster is being used
> for some other table in the new cluster. Since the new cluster should
> only contain system tables that we assume have never been rewritten,
> they'll all have relfilenodes equal to their OIDs, and thus less than
> 16384. On the other hand all the user tables from the old cluster will
> have relfilenodes greater than 16384, so we're fine. pg_largeobject,
> which also gets migrated, is a special case. Since we don't change OID
> assignments from version to version, it should have either the same
> relfilenode value in the old and new clusters, if never rewritten, or
> else the value in the old cluster will be greater than 16384, in which
> case no conflict is possible.
>
> But if we just assign all relfilenode values from a central counter,
> then we have got trouble. If the new version has more system catalog
> tables than the old version, then some value that got used for a user
> table in the old version might get used for a system table in the new
> version, which is a problem. One idea for fixing this is to have two
> RelFileNumber ranges: a system range (small values) and a user range.
> System tables get values in the system range initially, and in the
> user range when first rewritten. User tables always get values in the
> user range. Everything works fine in this scenario except maybe for
> pg_largeobject: what if it gets one value from the system range in the
> old cluster, and a different value from the system range in the new
> cluster, but some other system table in the new cluster gets the value
> that pg_largeobject had in the old cluster? Then we've got trouble.
>

To solve that problem, how about rewriting the system table in the new
cluster which has a conflicting relfilenode? I think we can probably
do this conflict checking before processing the tables from the old
cluster.

-- 
With Regards,
Amit Kapila.




Re: Schema variables - new implementation for Postgres 15

2022-08-22 Thread Julien Rouhaud
Hi Pavel,

On Sun, Aug 21, 2022 at 09:54:03AM +0200, Pavel Stehule wrote:
> 
> should be fixed now

I started reviewing the patchset, beginning with 0001 (at least the parts that
don't substantially change later) and have a few comments.

- you define new AclMode READ and WRITE.  Those bits are precious and I don't
  think it's ok to consume 2 bits for session variables, especially since those
  are the last two bits available since the recent GUC access control patch
  (ACL_SET and ACL_ALTER_SYSTEM).  Maybe we could existing INSERT and UPDATE
  privileges instead, like it's done for sequences?

- make check and make check-world don't pass with this patch alone.  Given that
  the split is mostly done to ease review and probably not intended to be
  committed this way, we probably shouldn't spend effort cleaning up the split
  apart from making sure that each patch compiles cleanly on its own.  But in
  this case it brought my attention to the misc_sanity.sql test.  Looking at patch
  0010, I see:

diff --git a/src/test/regress/expected/misc_sanity.out 
b/src/test/regress/expected/misc_sanity.out
index a57fd142a9..ce9bad7211 100644
--- a/src/test/regress/expected/misc_sanity.out
+++ b/src/test/regress/expected/misc_sanity.out
@@ -60,7 +60,9 @@ ORDER BY 1, 2;
  pg_index| indpred   | pg_node_tree
  pg_largeobject  | data  | bytea
  pg_largeobject_metadata | lomacl| aclitem[]
-(11 rows)
+ pg_variable | varacl| aclitem[]
+ pg_variable | vardefexpr| pg_node_tree
+(13 rows)

This is the test for relations with varlena columns but no TOAST table.  I
don't think it's correct to add those exceptions; there should be a TOAST
table declared for pg_variable too, as noted in the comment above that query.

- nitpicking: s/catalogue/catalog/

Some other comments on other patches while testing things around:

- For sessionvariable.c (in 0002), I see that there are still all the comments
  and code about checking type validity based on a generation number and other
  heuristics.  I still fail to understand why this is needed at all as the
  stored datum should remain compatible as long as we prevent the few
  incompatible DDL that are also prevented when there's a relation dependency.
  As an example, I tried to quickly disable all that code with the following:

diff --git a/src/backend/commands/sessionvariable.c 
b/src/backend/commands/sessionvariable.c
index 9b4f9482a4..7c8808dc46 100644
--- a/src/backend/commands/sessionvariable.c
+++ b/src/backend/commands/sessionvariable.c
@@ -794,6 +794,8 @@ svartype_verify_composite_fast(SVariableType svt)
 static int64
 get_svariable_valid_type_gennum(SVariableType svt)
 {
+   return 1;
+
HeapTuple   tuple;
boolfast_check = true;

@@ -905,6 +907,8 @@ get_svariabletype(Oid typid)
 static bool
 session_variable_use_valid_type(SVariable svar)
 {
+   return true;
+
Assert(svar);
Assert(svar->svartype);

And session_variable.sql regression test still works just fine.  Am I missing
something?

While at it, the initial comment should probably say "free local memory" rather
than "purge memory".

- docs are missing for GRANT/REVOKE ... ON ALL VARIABLES

- plpgsql.sgml:
+   
+Session variables and constants

I don't think it's ok to use "constant" as an alias for immutable session
variables, as an immutable session variable can actually be changed.

Similarly, in catalogs.sgml:

+   varisimmutable boolean
+  
+  
+   True if the variable is immutable (cannot be modified). The default 
value is false.
+  
+ 

I think there should be a note and a link to the corresponding part in
create_variable.sgml to explain what exactly an immutable variable is, as the
implemented behavior (for nullable immutable variables) is somewhat unexpected.

- other nitpicking: pg_variable and struct Variable seem a bit inconsistent.
  For instance one uses vartype and vartypmod and the other typid and typmod,
  while both use varname and varnamespace.  I think we should avoid the
  discrepancy here.

Also, there's a sessionvariable.c and a session_variable.h.  Let's use
session_variable.[ch], as it seems more readable?

- typedef patch: missing SVariableTypeData; some commits need a pgindent, e.g.:

+typedef SVariableTypeData * SVariableType;

+typedef SVariableData * SVariable;

+static SessionVariableValue * RestoreSessionVariables(char **start_address,
+   int *num_session_variables);

+static Query *transformLetStmt(ParseState *pstate,
+  LetStmt * stmt);

(and multiple others)




Re: Logical replication support for generic wal record

2022-08-22 Thread Bharath Rupireddy
On Mon, Aug 22, 2022 at 11:59 AM Natarajan R  wrote:
>
> Hi All,
>
> I am writing a postgres extension which writes only generic WAL records, but
> these WAL records are not recognized by the logical replication decoder. I
> have a basic understanding of how logical replication works (COPY command for
> initial sync, WAL replica for final sync); can you please tell us a way to
> support this?

"Generic" resource manager doesn't have a decoding API, see [1], which
means that the generic WAL records will not get decoded.

Can you be more specific about the use-case? Why use only "Generic"
type WAL records? Why not use "LogicalMessage" type WAL records if you
want your WAL records to be decoded?

[1] 
https://github.com/postgres/postgres/blob/master/src/include/access/rmgrlist.h#L48
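
For reference, logical messages can be emitted from SQL with the built-in
pg_logical_emit_message(), and decoding plugins (e.g. test_decoding) see
them via the message callback; a minimal sketch (prefix and payload are
made up):

SELECT pg_logical_emit_message(true, 'my_extension', 'some payload');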

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/




Logical replication support for generic wal record

2022-08-22 Thread Natarajan R
Hi All,

I am writing a postgres extension which writes only generic WAL records, but
these WAL records are not recognized by the logical replication decoder. I
have a basic understanding of how logical replication works (COPY command for
initial sync, WAL replica for final sync); can you please tell us a way to
support this?


Thanks,
Natarajan.R