Different compression methods for FPI
Hi! There are a lot of different compression threads nearby, and that's great! Every few bytes going to IO still deserves to be compressed. Currently, we use pglz compression for WAL full page images. As shown in [0], this leads to high CPU usage in pglz when wal_compression is on. Swapping pglz for lz4 increases pgbench tps by 21% on my laptop (with wal_compression enabled). So I think it is worth proposing a patch to add wal_compression_method = {"pglz", "lz4", "zlib"}. Probably "zstd" can be added to the list as well. An even better option would be to teach WAL compression to compress everything, not only FPIs, but that is a big orthogonal chunk of work. Attached is a draft that takes CompressionId from the "custom compression methods" patch and adds zlib to it. I'm not sure where to add tests that check recovery with different methods; it seems to me that only TAP tests are suitable for this. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/323B1B01-DA42-419F-A99C-23E2C162D53B%40yandex-team.ru 0001-Use-different-compression-methods-for-FPIs.patch Description: Binary data
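For a rough feel of the size/CPU trade-off being discussed, here is a small stand-in experiment. pglz and lz4 are not in the Python standard library, so this uses zlib (one of the proposed methods) on a synthetic 8kB page image; the page contents are made up purely for illustration and say nothing about real heap pages.

```python
import zlib

# A synthetic, fairly repetitive 8kB "full page image" (8192 bytes),
# standing in for a heap page; real FPIs are far less uniform.
page = (b"tuple-header" + b"\x00" * 52) * 128
assert len(page) == 8192

for level in (1, 6, 9):
    compressed = zlib.compress(page, level)
    print(f"zlib level {level}: {len(page)} -> {len(compressed)} bytes")
```

Higher levels trade CPU for smaller output; a wal_compression_method GUC would expose the same trade-off across whole algorithms rather than levels of one algorithm.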
Re: SEARCH and CYCLE clauses
On 22.02.21 14:45, Vik Fearing wrote: On 2/22/21 1:28 PM, Peter Eisentraut wrote: On 22.02.21 11:05, Vik Fearing wrote: This looks good to me, except that you forgot to add the feature stamp. Attached is a small diff to apply on top of your patch to fix that. The feature code is from SQL:202x, whereas the table is relative to SQL:2016. We could add it, but probably with a comment. OK. done
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > well. > > Do have a look and let me know if there are any comments. > > Update with both patches. > Thanks, I have made some minor changes to the first patch and now it looks good to me. The changes are as below: 1. Removed the changes related to exposing this new parameter via the view, as mentioned in my previous email. 2. Changed the variable name to initial_consistent_point. 3. Ran pgindent, made minor changes in comments, and modified the commit message. Let me know what you think about these changes. Next, I'll review your second patch, which allows the 2PC option to be enabled only at slot creation time. -- With Regards, Amit Kapila. From b4d4504b64452ba6cc8602f66acac8209317da0a Mon Sep 17 00:00:00 2001 From: Ajin Cherian Date: Fri, 26 Feb 2021 02:58:49 -0500 Subject: [PATCH v5 1/2] Avoid repeated decoding of prepared transactions after the restart. In commit a271a1b50e, we allowed decoding at prepare time, and the prepare was decoded again if there was a restart after decoding it. It was done that way because we can't distinguish between the case where we have not decoded the prepare because it was prior to a consistent snapshot and the case where we have decoded it earlier but restarted. To distinguish between these two cases, we have introduced an initial_consistent_point at the slot level, which is the LSN at which we found a consistent point at the time of slot creation. This is also the point where we have exported a snapshot for the initial copy. So, prepared transactions prior to this point are sent along with the commit prepared. 
Author: Ajin Cherian, based on an idea by Andres Freund Reviewed-by: Amit Kapila and Vignesh C Discussion: https://postgr.es/m/d0f60d60-133d-bf8d-bd70-47784d8fabf3@enterprisedb.com --- contrib/test_decoding/expected/twophase.out | 38 +++--- contrib/test_decoding/expected/twophase_stream.out | 28 ++-- doc/src/sgml/logicaldecoding.sgml | 9 ++--- src/backend/replication/logical/decode.c | 2 ++ src/backend/replication/logical/logical.c | 3 +- src/backend/replication/logical/reorderbuffer.c | 10 +++--- src/backend/replication/logical/snapbuild.c | 24 +- src/include/replication/reorderbuffer.h | 1 + src/include/replication/slot.h | 7 src/include/replication/snapbuild.h | 4 ++- 10 files changed, 60 insertions(+), 66 deletions(-) diff --git a/contrib/test_decoding/expected/twophase.out b/contrib/test_decoding/expected/twophase.out index f9f6bed..c51870f 100644 --- a/contrib/test_decoding/expected/twophase.out +++ b/contrib/test_decoding/expected/twophase.out @@ -33,14 +33,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two COMMIT PREPARED 'test_prepared#1'; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); -data - - BEGIN - table public.test_prepared1: INSERT: id[integer]:1 - table public.test_prepared1: INSERT: id[integer]:2 - PREPARE TRANSACTION 'test_prepared#1' + data +--- COMMIT PREPARED 'test_prepared#1' -(5 rows) +(1 row) -- Test that rollback of a prepared xact is decoded. 
BEGIN; @@ -103,13 +99,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two COMMIT PREPARED 'test_prepared#3'; SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); - data -- - BEGIN - table public.test_prepared1: INSERT: id[integer]:4 data[text]:'frakbar' - PREPARE TRANSACTION 'test_prepared#3' + data +--- COMMIT PREPARED 'test_prepared#3' -(4 rows) +(1 row) -- make sure stuff still works INSERT INTO test_prepared1 VALUES (6); @@ -159,14 +152,10 @@ RESET statement_timeout; COMMIT PREPARED 'test_prepared_lock'; -- consume the commit SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'two-phase-commit', '1', 'include-xids', '0', 'skip-empty-xacts', '1'); - data - BEGIN - table public.test_prepared1: INSERT: id[integer]:8 data[text]:'othercol' - table public.test_prepared1: INSERT: id[integer]:9 data[text]:'othercol2' -
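The rule in the commit message above can be sketched as follows. This is a hypothetical illustration; the function name and the exact comparison direction are my assumptions, not the patch's code.

```python
def action_for_prepare(prepare_lsn, initial_consistent_point):
    """Sketch of the commit-message rule: a transaction prepared before the
    slot's initial consistent point was never decoded at PREPARE time, so
    its changes are sent together with COMMIT PREPARED; anything prepared
    after that point is decoded once, at PREPARE time, and not re-decoded
    after a restart."""
    if prepare_lsn <= initial_consistent_point:
        return "send with COMMIT PREPARED"
    return "decode at PREPARE time"

# A prepare from before the slot reached consistency is deferred:
print(action_for_prepare(prepare_lsn=100, initial_consistent_point=200))
```

This matches the regression-test diffs above: prepares consumed before the restart are no longer re-emitted, only the COMMIT PREPARED record is.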
Re: repeated decoding of prepared transactions
On Sat, 27 Feb, 2021, 1:59 pm Amit Kapila, wrote: > > I have recommended above to change this name to initial_consistency_at > because there are times when we don't export snapshot and we still set > this like when creating slots with CRS_NOEXPORT_SNAPSHOT or when > creating via SQL APIs. I am not sure why Ajin neither changed the > name nor responded to that comment. What is your opinion? > I am fine with the name initial_consistency_at. I am also fine with not showing this in the pg_replication_slot view and keeping this internal. Regards, Ajin Cherian Fujitsu Australia > >
Re: repeated decoding of prepared transactions
On Thu, Feb 25, 2021 at 5:04 PM vignesh C wrote: > > On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian wrote: > > > > On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian wrote: > > > > > I plan to split this into two patches next. But do review and let me > > > know if you have any comments. > > > > Attaching an updated patch-set with the changes for > > snapshot_was_exported_at_lsn separated out from the changes for the > > APIs pg_create_logical_replication_slot() and > > pg_logical_slot_get_changes(). Along with a rebase that takes in a few > > more commits since my last patch. > > One observation while verifying the patch I noticed that most of > ReplicationSlotPersistentData structure members are displayed in > pg_replication_slots, but I did not see snapshot_was_exported_at_lsn > being displayed. Is this intentional? If not intentional we can > include snapshot_was_exported_at_lsn in pg_replication_slots. > On thinking about this point, I feel we don't need this new parameter in the view because I am not able to see how it is of any use to the user. Over time, corresponding to that LSN there won't be any WAL record or maybe WAL would be overwritten. I think this is primarily for our internal use so let's not expose it. I intend to remove it from the patch unless you have some reason for exposing this to the user. -- With Regards, Amit Kapila.
Re: Remove latch.c workaround for Linux < 2.6.27
Thomas Munro writes: > Commit 82ebbeb0 added a workaround for (already in 2017) ancient Linux > kernels with no EPOLL_CLOEXEC. I don't see any such systems in the > build farm today (and if there is one hiding in there somewhere, it's > well past time to upgrade). I'd like to rip that code out, because > I'm about to commit some new code that uses another 2.6.17+ > XXX_CLOEXEC flag, and it'd be silly to have to write new workaround > code for that too, and a contradiction to have fallback code in one > place but not another. Any objections? I believe we've dropped support for RHEL5, so no objection here. regards, tom lane
Re: [HACKERS] logical decoding of two-phase transactions
On Sat, Feb 27, 2021 at 7:31 AM Peter Smith wrote: > > On Fri, Feb 26, 2021 at 9:58 PM Amit Kapila wrote: > > 6. > > + * XXX - Is there a potential timing problem here - e.g. if signal arrives > > + * while executing this then maybe we will set table_states_valid without > > + * refetching them? > > + */ > > +static void > > +FetchTableStates(bool *started_tx) > > .. > > > > Can you explain which race condition you are worried about here which > > is not possible earlier but can happen after this patch? > > > > Yes, my question (in that XXX comment) was not about anything new for > the current patch, because this FetchTableStates function has exactly > the same logic as the HEAD code. > > I was only wondering if there is any possibility that one of the > function calls (inside the if block) can end up calling > CHECK_FOR_INTERRUPTS. If that could happen, then perhaps the > table_states_valid flag could be assigned false (by the > invalidate_syncing_table_states signal handler) only to be > immediately/wrongly overwritten as table_states_valid = true in this > FetchTableStates code. > This is not related to CHECK_FOR_INTERRUPTS. The invalidate_syncing_table_states() callback can be called only when we process invalidation messages, which we do while locking the relation in GetSubscriptionRelations (table_open->relation_open->LockRelationOid). After that, it won't be done in that part of the code. So, I think we don't need this comment. -- With Regards, Amit Kapila.
Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Here's a new version. The condition variable patch 0001 fixes a bug: CleanupProcSignalState() also needs to broadcast. The hunk that allows the interrupt handlers to use CVs while you're already waiting on a CV is now in a separate patch 0002. I'm thinking of going ahead and committing those two. The 0003 patch to achieve $SUBJECT needs more discussion. From 51bb33c8755efa11d24595ed0a27fa0b387e8153 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 15:40:11 +1300 Subject: [PATCH v4 1/3] Use condition variables for ProcSignalBarriers. Instead of a poll/sleep loop, use a condition variable for precise wake-up whenever a backend's pss_barrierGeneration advances. Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com --- src/backend/storage/ipc/procsignal.c | 34 1 file changed, 14 insertions(+), 20 deletions(-) diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c index c43cdd685b..aa0362216b 100644 --- a/src/backend/storage/ipc/procsignal.c +++ b/src/backend/storage/ipc/procsignal.c @@ -23,6 +23,7 @@ #include "miscadmin.h" #include "pgstat.h" #include "replication/walsender.h" +#include "storage/condition_variable.h" #include "storage/ipc.h" #include "storage/latch.h" #include "storage/proc.h" @@ -59,10 +60,11 @@ */ typedef struct { - pid_t pss_pid; - sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS]; + volatile pid_t pss_pid; + volatile sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS]; pg_atomic_uint64 pss_barrierGeneration; pg_atomic_uint32 pss_barrierCheckMask; + ConditionVariable pss_barrierCV; } ProcSignalSlot; /* @@ -93,7 +95,7 @@ typedef struct ((flags) &= ~(((uint32) 1) << (uint32) (type))) static ProcSignalHeader *ProcSignal = NULL; -static volatile ProcSignalSlot *MyProcSignalSlot = NULL; +static ProcSignalSlot *MyProcSignalSlot = NULL; static bool CheckProcSignal(ProcSignalReason reason); static void CleanupProcSignalState(int status, Datum arg); @@ -142,6 +144,7 @@ 
ProcSignalShmemInit(void) MemSet(slot->pss_signalFlags, 0, sizeof(slot->pss_signalFlags)); pg_atomic_init_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX); pg_atomic_init_u32(&slot->pss_barrierCheckMask, 0); + ConditionVariableInit(&slot->pss_barrierCV); } } } @@ -156,7 +159,7 @@ ProcSignalShmemInit(void) void ProcSignalInit(int pss_idx) { - volatile ProcSignalSlot *slot; + ProcSignalSlot *slot; uint64 barrier_generation; Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots); @@ -208,7 +211,7 @@ static void CleanupProcSignalState(int status, Datum arg) { int pss_idx = DatumGetInt32(arg); - volatile ProcSignalSlot *slot; + ProcSignalSlot *slot; slot = &ProcSignal->psh_slot[pss_idx - 1]; Assert(slot == MyProcSignalSlot); @@ -237,6 +240,7 @@ CleanupProcSignalState(int status, Datum arg) * no barrier waits block on it. */ pg_atomic_write_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX); + ConditionVariableBroadcast(&slot->pss_barrierCV); slot->pss_pid = 0; } @@ -391,13 +395,11 @@ EmitProcSignalBarrier(ProcSignalBarrierType type) void WaitForProcSignalBarrier(uint64 generation) { - long timeout = 125L; - Assert(generation <= pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration)); for (int i = NumProcSignalSlots - 1; i >= 0; i--) { - volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; + ProcSignalSlot *slot = &ProcSignal->psh_slot[i]; uint64 oldval; /* @@ -409,20 +411,11 @@ WaitForProcSignalBarrier(uint64 generation) oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); while (oldval < generation) { - int events; - - CHECK_FOR_INTERRUPTS(); - - events = - WaitLatch(MyLatch, - WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, - timeout, WAIT_EVENT_PROC_SIGNAL_BARRIER); - ResetLatch(MyLatch); - + ConditionVariableSleep(&slot->pss_barrierCV, + WAIT_EVENT_PROC_SIGNAL_BARRIER); oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration); - if (events & WL_TIMEOUT) - timeout = Min(timeout * 2, 1000L); } + ConditionVariableCancelSleep(); } /* @@ -589,6 +582,7 @@ ProcessProcSignalBarrier(void) * next called. 
*/ pg_atomic_write_u64(&slot->pss_barrierGeneration, shared_gen); + ConditionVariableBroadcast(&slot->pss_barrierCV); } /* -- 2.30.0 From 9fbb46cd8909d8058d32421e0e41c223d710d7d4 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 15:36:22 +1300 Subject: [PATCH v4 2/3] Allow condition variables to be used in interrupt code. Adjust the condition variable sleep loop to work correctly when code reached by its internal CHECK_FOR_INTERRUPTS() call interacts with another condition variable. There are no such cases currently, but a proposed patch would do this. Discussion: https://postgr.es/m/ca+hukgldemy2gbm80kz20gte6hnvwoere8kwcjk6-u56ost...@mail.gmail.com --- src/backend/storage/lmgr/condition_variable.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git
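The behavioural change in patch 0001 -- replacing a poll/sleep loop with a precise wake-up -- can be sketched with Python's threading primitives. This is a toy analogue, not the ProcSignal code.

```python
import threading

class BarrierSlot:
    """Toy analogue of a ProcSignalSlot's barrier generation.  Waiters
    block on a condition variable until the generation reaches their
    target, instead of re-polling on a 125ms..1s backoff timeout as the
    removed code did."""
    def __init__(self):
        self.generation = 0
        self.cv = threading.Condition()

    def advance(self, gen):
        with self.cv:
            self.generation = gen
            self.cv.notify_all()  # like ConditionVariableBroadcast()

    def wait_for(self, target):
        with self.cv:
            # Like the ConditionVariableSleep() loop: re-check the
            # condition after every wake-up, since wake-ups can be spurious.
            while self.generation < target:
                self.cv.wait()

slot = BarrierSlot()
worker = threading.Thread(target=slot.advance, args=(1,))
worker.start()
slot.wait_for(1)   # returns as soon as the generation advances
worker.join()
```

The same re-check-in-a-loop discipline is why CleanupProcSignalState() must broadcast too: any writer that changes the predicate has to wake the sleepers.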
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 7:26 PM vignesh C wrote: > > On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > > well. > > > Do have a look and let me know if there are any comments. > > > > Update with both patches. > > Thanks for fixing and providing an updated patch. Patch applies, make > check and make check-world passes. I could see the issue working fine. > > Few minor comments: > + snapshot_was_exported_at > pg_lsn > + > + > + The address (LSN) at which the logical > + slot found a consistent point at the time of slot creation. > + NULL for physical slots. > + > + > > > I had seen earlier also we had some discussion on naming > snapshot_was_exported_at. Can we change snapshot_was_exported_at to > snapshot_exported_lsn, I felt if we can include the lsn in the name, > the user will be able to interpret easily and also it will be similar > to other columns in pg_replication_slots view. > I have recommended above to change this name to initial_consistency_at because there are times when we don't export snapshot and we still set this like when creating slots with CRS_NOEXPORT_SNAPSHOT or when creating via SQL APIs. I am not sure why Ajin neither changed the name nor responded to that comment. What is your opinion? -- With Regards, Amit Kapila.
Re: [HACKERS] Custom compression methods
On Fri, Feb 26, 2021 at 8:10 PM Dilip Kumar wrote: > > On Sun, Feb 21, 2021 at 5:33 PM Dilip Kumar wrote: > > > > Based on an offlist discussion with Robert, I have done further analysis > of the composite type data. So the idea is that I have analyzed all > the callers of > HeapTupleGetDatum and HeapTupleHeaderGetDatum and divided them into two > categories: 1) Callers which are forming the tuple from values that can > not have compressed/external data. > 2) Callers which can have external/compressed data I just realized that there is one more function, "heap_copy_tuple_as_datum", which is flattening the tuple based on the HeapTupleHasExternal check, so I think I will have to analyze the callers of this function as well and do a similar analysis, although there are just a few callers for this. And, I think the fix in ExecEvalConvertRowtype is wrong; we will have to do something for the compressed type here as well. I am not sure what the best way to fix it is, because we are directly getting the input tuple, so we can not put an optimization of detoasting before forming the tuple. We might detoast in execute_attr_map_tuple, when the source and target row types are different, because we are anyway deforming and processing each field in that function. But the problem is that execute_attr_map_tuple is used at multiple places; for that, we can make another version of this function which actually detoasts along with the conversion and use that in ExecEvalConvertRowtype. But if there is no tuple conversion needed then we directly use heap_copy_tuple_as_datum, and in that case there is no deforming at all, so maybe in this case we can not do anything; but I think ExecEvalConvertRowtype should not be a very common path. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Feb 26, 2021 at 9:58 PM Amit Kapila wrote: > 6. > + * XXX - Is there a potential timing problem here - e.g. if signal arrives > + * while executing this then maybe we will set table_states_valid without > + * refetching them? > + */ > +static void > +FetchTableStates(bool *started_tx) > .. > > Can you explain which race condition you are worried about here which > is not possible earlier but can happen after this patch? > Yes, my question (in that XXX comment) was not about anything new for the current patch, because this FetchTableStates function has exactly the same logic as the HEAD code. I was only wondering if there is any possibility that one of the function calls (inside the if block) can end up calling CHECK_FOR_INTERRUPTS. If that could happen, then perhaps the table_states_valid flag could be assigned false (by the invalidate_syncing_table_states signal handler) only to be immediately/wrongly overwritten as table_states_valid = true in this FetchTableStates code. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Reducing WaitEventSet syscall churn
On Tue, Jan 5, 2021 at 6:10 PM Thomas Munro wrote: > For point 2, the question I am raising is: why should users get a > special FATAL message in some places and not others, for PM death? > However, if people are attached to that behaviour, we could still > achieve goal 1 without goal 2 by handling PM death explicitly in > walsender.c and I'd be happy to post an alternative patch set like > that. Here's the alternative patch set, with no change to existing error message behaviour. I'm going to commit this version and close this CF item soon if there are no objections. From 8f563edefdc2f276b3d8abeb019af1ff172bf7d9 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 13:34:37 +1300 Subject: [PATCH v8 1/2] Introduce symbolic names for FeBeWaitSet positions. Previously we used 0 and 1 to refer to the socket and latch in far flung parts of the tree, without any explanation. Also use PGINVALID_SOCKET rather than -1 in the place where we weren't already. Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com --- src/backend/libpq/be-secure.c | 4 ++-- src/backend/libpq/pqcomm.c | 18 +++--- src/backend/utils/init/miscinit.c | 6 -- src/include/libpq/libpq.h | 3 +++ 4 files changed, 24 insertions(+), 7 deletions(-) diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c index d1545a2ad6..bb603ad209 100644 --- a/src/backend/libpq/be-secure.c +++ b/src/backend/libpq/be-secure.c @@ -180,7 +180,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1, WAIT_EVENT_CLIENT_READ); @@ -292,7 +292,7 @@ retry: Assert(waitfor); - ModifyWaitEvent(FeBeWaitSet, 0, waitfor, NULL); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetSocketPos, waitfor, NULL); WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1, WAIT_EVENT_CLIENT_WRITE); diff 
--git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c index 1e6b6db540..27a298f110 100644 --- a/src/backend/libpq/pqcomm.c +++ b/src/backend/libpq/pqcomm.c @@ -191,6 +191,9 @@ WaitEventSet *FeBeWaitSet; void pq_init(void) { + int socket_pos PG_USED_FOR_ASSERTS_ONLY; + int latch_pos PG_USED_FOR_ASSERTS_ONLY; + /* initialize state variables */ PqSendBufferSize = PQ_SEND_BUFFER_SIZE; PqSendBuffer = MemoryContextAlloc(TopMemoryContext, PqSendBufferSize); @@ -219,10 +222,19 @@ pq_init(void) #endif FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3); - AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock, + socket_pos = AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, + MyProcPort->sock, NULL, NULL); + latch_pos = AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, PGINVALID_SOCKET, + MyLatch, NULL); + AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL); - AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL); - AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL); + + /* + * The event positions match the order we added them, but let's sanity + * check them to be sure. + */ + Assert(socket_pos == FeBeWaitSetSocketPos); + Assert(latch_pos == FeBeWaitSetLatchPos); } /* diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c index 734c66d4e8..8de70eb46d 100644 --- a/src/backend/utils/init/miscinit.c +++ b/src/backend/utils/init/miscinit.c @@ -202,7 +202,8 @@ SwitchToSharedLatch(void) MyLatch = &MyProc->procLatch; if (FeBeWaitSet) - ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, + MyLatch); /* * Set the shared latch as the local one might have been set. 
This @@ -221,7 +222,8 @@ SwitchBackToLocalLatch(void) MyLatch = &LocalLatchData; if (FeBeWaitSet) - ModifyWaitEvent(FeBeWaitSet, 1, WL_LATCH_SET, MyLatch); + ModifyWaitEvent(FeBeWaitSet, FeBeWaitSetLatchPos, WL_LATCH_SET, + MyLatch); SetLatch(MyLatch); } diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h index b41b10620a..e4e5c21565 100644 --- a/src/include/libpq/libpq.h +++ b/src/include/libpq/libpq.h @@ -55,6 +55,9 @@ extern const PGDLLIMPORT PQcommMethods *PqCommMethods; */ extern WaitEventSet *FeBeWaitSet; +#define FeBeWaitSetSocketPos 0 +#define FeBeWaitSetLatchPos 1 + extern int StreamServerPort(int family, const char *hostName, unsigned short portNumber, const char *unixSocketDir, pgsocket ListenSocket[], int MaxListen); -- 2.30.0 From 22b2c474b476ca91b579408f73ebdc014bb76083 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Mon, 24 Feb 2020 23:48:29 +1300 Subject: [PATCH v8 2/2] Use FeBeWaitSet for walsender.c. This avoids the need to set up and tear down a new WaitEventSet every time we
Re: row filtering for logical replication
On Mon, Feb 22, 2021, at 9:28 AM, Euler Taveira wrote: > On Mon, Feb 22, 2021, at 7:45 AM, Önder Kalacı wrote: >> Thanks for working on this. I did some review and testing, please see my >> comments below. > I appreciate your review. I'm working on a new patch set and expect to post > it soon. I'm attaching a new patch set. This new version improves documentation and commit messages and incorporates a few debug messages. I did a couple of tests and didn't find issues. Here are some numbers from my i7 using a simple expression (aid > 0) on table pgbench_accounts. $ awk '/row filter time/ {print $9}' postgresql.log | /tmp/stat.pl 99 mean: 33.00 us stddev: 17.65 us median: 28.83 us min-max:[3.48 .. 6404.84] us percentile(99): 49.66 us mode: 41.71 us I don't expect 0005 and 0006 to be included. I attached them to help with some tests. -- Euler Taveira EDB https://www.enterprisedb.com/ From 596b278d4be77f67467aa1d61ce26e765b078197 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 18 Jan 2021 11:53:34 -0300 Subject: [PATCH 1/6] Rename a WHERE node A WHERE clause will be used for row filtering in logical replication. We already have a similar node: 'WHERE (condition here)'. Let's rename the node to a generic name and use it for row filtering too. 
--- src/backend/parser/gram.y | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index dd72a9fc3c..ecfd98ba5b 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -485,7 +485,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type def_arg columnElem where_clause where_or_current_clause a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound columnref in_expr having_clause func_table xmltable array_expr -ExclusionWhereClause operator_def_arg +OptWhereClause operator_def_arg %type rowsfrom_item rowsfrom_list opt_col_def_list %type opt_ordinality %type ExclusionConstraintList ExclusionConstraintElem @@ -3806,7 +3806,7 @@ ConstraintElem: $$ = (Node *)n; } | EXCLUDE access_method_clause '(' ExclusionConstraintList ')' -opt_c_include opt_definition OptConsTableSpace ExclusionWhereClause +opt_c_include opt_definition OptConsTableSpace OptWhereClause ConstraintAttributeSpec { Constraint *n = makeNode(Constraint); @@ -3908,7 +3908,7 @@ ExclusionConstraintElem: index_elem WITH any_operator } ; -ExclusionWhereClause: +OptWhereClause: WHERE '(' a_expr ')' { $$ = $3; } | /*EMPTY*/{ $$ = NULL; } ; -- 2.20.1 From 9b1d46d5c146d591022c12c7acec8d61d4e0bfcc Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Mon, 18 Jan 2021 12:07:51 -0300 Subject: [PATCH 2/6] Row filter for logical replication This feature adds a row filter for publication tables. When you define or modify a publication, you can optionally filter out rows that do not satisfy a WHERE condition. It allows you to partially replicate a database or set of tables. The row filter is per table, which means that you can define different row filters for different tables. A new row filter can be added simply by specifying the WHERE clause after the table name. The WHERE expression must be enclosed by parentheses. 
The WHERE clause should probably contain only columns that are part of the primary key or that are covered by REPLICA IDENTITY. Otherwise, UPDATEs and DELETEs won't be replicated. DELETE uses the old row version (which is limited to the primary key or REPLICA IDENTITY) to evaluate the row filter. INSERT and UPDATE use the new row version to evaluate the row filter, hence, you can use any column. If the row filter evaluates to NULL, it returns false. For simplicity, functions are not allowed; this could possibly be addressed in a future patch. If you choose to do the initial table synchronization, only data that satisfies the row filters is sent. If the subscription has several publications in which a table has been published with different WHERE clauses, rows must satisfy all expressions to be copied. If your publication contains a partitioned table, the parameter publish_via_partition_root determines whether it uses the partition's row filter (if the parameter is false -- the default) or the partitioned table's row filter. --- doc/src/sgml/catalogs.sgml | 8 + doc/src/sgml/ref/alter_publication.sgml | 11 +- doc/src/sgml/ref/create_publication.sgml| 38 +++- doc/src/sgml/ref/create_subscription.sgml | 8 +- src/backend/catalog/pg_publication.c| 52 - src/backend/commands/publicationcmds.c | 96 + src/backend/parser/gram.y | 28 ++- src/backend/parser/parse_agg.c | 10 + src/backend/parser/parse_expr.c | 13 ++ src/backend/parser/parse_func.c | 3 + src/backend/replication/logical/tablesync.c | 93 -
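The evaluation semantics described in the commit message (NULL counts as false; with several publications, every filter must pass) can be sketched like this. It illustrates the documented behaviour only, not the patch's executor code.

```python
def row_passes_filters(row, filters):
    """Return True only if every publication's WHERE expression is
    satisfied.  A filter returning None models a WHERE clause that
    evaluates to NULL, which is treated as false."""
    for row_filter in filters:
        result = row_filter(row)
        if result is None or not result:
            return False
    return True

# Two publications: one checks aid > 0, the other always evaluates to NULL.
filters = [lambda r: r["aid"] > 0, lambda r: None]
print(row_passes_filters({"aid": 1}, filters))  # False: NULL -> false
```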
Re: Interest in GSoC 2021 Projects
Hi, On Fri, Feb 26, 2021 at 03:56:17PM +0800, Yixin Shi wrote: > Hi, > > > > I am Yixin Shi, a junior student majoring in Computer Science at University > of Michigan. I notice that the project ideas for GSoC 2021 have been posted > on your webpage and I am quite interested in two of them. I really wish to > take part in the project *Make plsample a more complete procedural language > handler example (2021)* considering both my interest and ability. The > potential mentor for this Project is *Mark Wong*. I am also interested in > the project *Improve PostgreSQL Regression Test Coverage (2021)* and the > potential mentor is *Andreas Scherbaum* and *Stephen Frost*. > > > > I would like to learn more about these two projects but failed to contact > the mentors. How can I contact them? Also, I really hope to join the > project. Are there any suggestions on application? The idea for plsample is to continue providing working code and documentation examples so that plsample can be used as a template for creating additional language handlers. Depending on the individual's comfort level, some effort may need to be put into studying the current handlers for ideas on how to implement the lacking functionality in plsample. Does that help? Regards, Mark
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Fri, Feb 26, 2021 at 5:22 AM Amit Kapila wrote: > > https://www.postgresql.org/docs/devel/logical-replication-config.html > Ah, yep. I added a clause to the end of the sentence to clarify why we're using max_replication_slots here: - The subscriber also requires the max_replication_slots to be set. + The subscriber also requires that max_replication_slots be set to + configure how many replication origins can be tracked. > > Okay, that makes sense. However, I have sent a patch today (see [1]) > where I have slightly updated the subscriber-side configuration > paragraph. From PG-14 onwards, table synchronization workers also use > origins on subscribers, so you might want to adjust. > > ... > > I guess we can leave adding GUC to some other day as that might > require a bit broader acceptance and we are already near to the start > of last CF. I think we can still consider it if we few more people > share the same opinion as yours. > Great. I'll wait to update the GUC patch until your patch and/or my doc-only patch get merged. Should I add it to the March CF? Separate question: are documentation updates like these ever backported to older versions that are still supported? And if so, would the changes be reflected immediately, or would they require a minor point release? When I was on an older release I found that I'd jump back and forth between the version I was using and the latest version to see if anything had changed. - Paul max_replication_slots_subscriber_doc_v01.diff Description: Binary data
Remove latch.c workaround for Linux < 2.6.27
Hello, Commit 82ebbeb0 added a workaround for (already in 2017) ancient Linux kernels with no EPOLL_CLOEXEC. I don't see any such systems in the build farm today (and if there is one hiding in there somewhere, it's well past time to upgrade). I'd like to rip that code out, because I'm about to commit some new code that uses another 2.6.17+ XXX_CLOEXEC flag, and it'd be silly to have to write new workaround code for that too, and a contradiction to have fallback code in one place but not another. Any objections? From 9c6ac992f60a904f8073e62b1b8085ff9df7ae0b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 27 Feb 2021 11:22:16 +1300 Subject: [PATCH] Remove latch.c workaround for Linux < 2.6.27. Ancient Linux had no EPOLL_CLOEXEC flag, so commit 82ebbeb0 added a separate code path with an fcntl() call. Kernels of that vintage are long gone. Now seems like a good time to drop this extra code, because otherwise we'd have to add similar workaround code to new patches using XXX_CLOEXEC flags to avoid contradicting ourselves. 
---
 src/backend/storage/ipc/latch.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f2d005eea0..79b9627831 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -666,31 +666,12 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 		/* treat this as though epoll_create1 itself returned EMFILE */
 		elog(ERROR, "epoll_create1 failed: %m");
 	}
-#ifdef EPOLL_CLOEXEC
 	set->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
 	if (set->epoll_fd < 0)
 	{
 		ReleaseExternalFD();
 		elog(ERROR, "epoll_create1 failed: %m");
 	}
-#else
-	/* cope with ancient glibc lacking epoll_create1 (e.g., RHEL5) */
-	set->epoll_fd = epoll_create(nevents);
-	if (set->epoll_fd < 0)
-	{
-		ReleaseExternalFD();
-		elog(ERROR, "epoll_create failed: %m");
-	}
-	if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
-	{
-		int			save_errno = errno;
-
-		close(set->epoll_fd);
-		ReleaseExternalFD();
-		errno = save_errno;
-		elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
-	}
-#endif							/* EPOLL_CLOEXEC */
 #elif defined(WAIT_USE_KQUEUE)
 	if (!AcquireExternalFD())
 	{
@@ -736,7 +717,7 @@ CreateWaitEventSet(MemoryContext context, int nevents)
  *
  * Note: preferably, this shouldn't have to free any resources that could be
  * inherited across an exec().  If it did, we'd likely leak those resources in
- * many scenarios.  For the epoll case, we ensure that by setting FD_CLOEXEC
+ * many scenarios.  For the epoll case, we ensure that by setting EPOLL_CLOEXEC
  * when the FD is created.  For the Windows case, we assume that the handles
  * involved are non-inheritable.
  */
-- 
2.30.0
Re: SSL SNI
> Do you mean the IPv6 detection code is not correct? What is the problem?

This bit will recognize IPv4 addresses but not IPv6 addresses:

+	/*
+	 * Set Server Name Indication (SNI), but not if it's a literal IP address.
+	 * (RFC 6066)
+	 */
+	if (!(strspn(conn->pghost, "0123456789.") == strlen(conn->pghost) ||
+		  strchr(conn->pghost, ':')))
+	{
Re: Allow matching whole DN from a client certificate
On 2/26/21 2:55 PM, Jacob Champion wrote: > On Sat, 2021-01-30 at 16:18 -0500, Andrew Dunstan wrote: >> Making incremental additions to the certificate set easier wouldn't be a >> bad thing. >> >> I wonder if we should really be setting 1 as the serial number, though. >> Might it not be better to use, say, `date +%Y%m%d01` rather like we do >> with catalog version numbers? > I have been experimenting a bit with both of these suggestions; hope to > have something in time for commitfest on Monday. Writing new tests for > NSS has run into the same problems you've mentioned. > > FYI, I've pulled the port->peer_dn functionality you've presented here > into my authenticated identity patchset at [1]. > > --Jacob > > [1] > https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com Cool. I think the thing that's principally outstanding w.r.t. this patch is what format we should use to extract the DN. Should we use RFC2253, which reverses the field order, as has been suggested upthread and is in the latest patch? I'm slightly worried that it might be a POLA violation. But I don't have terribly strong feelings about it. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2021-Jan-10, Justin Pryzby wrote: > On Fri, Jan 08, 2021 at 04:14:33PM -0300, Alvaro Herrera wrote: > > > > I ended up with apparently broken constraint when running multiple > > > > loops around > > > > a concurrent detach / attach: > > > > > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > > CONCURRENTLY"; do :; done& > > > > while psql -h /tmp postgres -c "ALTER TABLE p ATTACH PARTITION p1 FOR > > > > VALUES FROM (1)TO(2)" -c "ALTER TABLE p DETACH PARTITION p1 > > > > CONCURRENTLY"; do :; done& > > > > > > > > "p1_check" CHECK (true) > > > > "p1_i_check" CHECK (i IS NOT NULL AND i >= 1 AND i < 2) > > > > > > Not good. > > > > Haven't had time to investigate this problem yet. > > I guess it's because you commited the txn and released lock in the middle of > the command. Hmm, but if we take this approach, then we're still vulnerable to the problem that somebody can do DETACH CONCURRENTLY and cancel the wait (or crash the server), then mess up the state before doing DETACH FINALIZE: when they cancel the wait, the lock will be released. I think the right fix is to disallow any action on a partition which is pending detach other than DETACH FINALIZE. (Didn't do that here.) Here's a rebase to current sources; there are no changes from v5. -- Álvaro Herrera Valdivia, Chile "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés) >From d042b99ad3368ba0b658ac9260450ed8e39accea Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 13 Jul 2020 20:15:30 -0400 Subject: [PATCH v6 1/2] Let ALTER TABLE exec routines deal with the relation This means that ATExecCmd relies on AlteredRelationInfo->rel instead of keeping the relation as a local variable; this is useful if the subcommand needs to modify the relation internally. For example, a subcommand that internally commits its transaction needs this. 
---
 src/backend/commands/tablecmds.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b2457a6924..b990063d38 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -156,6 +156,8 @@ typedef struct AlteredTableInfo
 	Oid			relid;			/* Relation to work on */
 	char		relkind;		/* Its relkind */
 	TupleDesc	oldDesc;		/* Pre-modification tuple descriptor */
+	/* Transiently set during Phase 2, normally set to NULL */
+	Relation	rel;
 	/* Information saved by Phase 1 for Phase 2: */
 	List	   *subcmds[AT_NUM_PASSES]; /* Lists of AlterTableCmd */
 	/* Information saved by Phases 1/2 for Phase 3: */
@@ -353,7 +355,7 @@ static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 					  AlterTableUtilityContext *context);
 static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 							  AlterTableUtilityContext *context);
-static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+static void ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 					  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 					  AlterTableUtilityContext *context);
 static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
@@ -4513,7 +4515,6 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 	{
 		AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
 		List	   *subcmds = tab->subcmds[pass];
-		Relation	rel;
 		ListCell   *lcmd;
 
 		if (subcmds == NIL)
@@ -4522,10 +4523,10 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		/*
 		 * Appropriate lock was obtained by phase 1, needn't get it again
 		 */
-		rel = relation_open(tab->relid, NoLock);
+		tab->rel = relation_open(tab->relid, NoLock);
 
 		foreach(lcmd, subcmds)
-			ATExecCmd(wqueue, tab, rel,
+			ATExecCmd(wqueue, tab,
 					  castNode(AlterTableCmd, lfirst(lcmd)),
 					  lockmode, pass, context);
 
@@ -4537,7 +4538,11 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
 		if (pass == AT_PASS_ALTER_TYPE)
 			ATPostAlterTypeCleanup(wqueue, tab, lockmode);
 
-		relation_close(rel, NoLock);
+		if (tab->rel)
+		{
+			relation_close(tab->rel, NoLock);
+			tab->rel = NULL;
+		}
 	}
 }
 
@@ -4563,11 +4568,12 @@ ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
  * ATExecCmd: dispatch a subcommand to appropriate execution routine
  */
 static void
-ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
+ATExecCmd(List **wqueue, AlteredTableInfo *tab,
 		  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
 		  AlterTableUtilityContext *context)
 {
 	ObjectAddress address = InvalidObjectAddress;
+	Relation	rel = tab->rel;
 
 	switch (cmd->subtype)
 	{
@@ -5670,6 +5676,7 @@ ATGetQueueEntry(List **wqueue, Relation rel)
 	 */
 	tab = (AlteredTableInfo *) palloc0(sizeof(AlteredTableInfo));
 	tab->relid = relid;
+	tab->rel = NULL;	/* set later */
 	tab->relkind = rel->rd_rel->relkind;
 	tab->oldDesc
authtype parameter in libpq
When looking at disallowing SSL compression I found the parameter "authtype" which was deprecated in commit d5bbe2aca55bc8 on January 26 1998. While I do think there is a case to be made for the backwards compatibility having run its course on this one, shouldn't we at least remove the environment variable and default compiled fallback for it to save us a getenv call when filling in the option defaults? -- Daniel Gustafsson https://vmware.com/ 0001-Remove-defaults-from-authtype-parameter.patch Description: Binary data
Re: Disallow SSL compression?
> On 26 Feb 2021, at 03:34, Michael Paquier wrote: > There is just pain waiting ahead when breaking connection strings > that used to work previously. A "while" could take a long time > though, see the case of "tty" that's still around (cb7fb3c). I see your tty removal from 2003, and raise you one "authtype" which was axed on January 26 1998 in commit d5bbe2aca55bc833e38c768d7f82, but which is still around. More on that in a separate thread though. -- Daniel Gustafsson https://vmware.com/
Re: Allow matching whole DN from a client certificate
On Sat, 2021-01-30 at 16:18 -0500, Andrew Dunstan wrote: > Making incremental additions to the certificate set easier wouldn't be a > bad thing. > > I wonder if we should really be setting 1 as the serial number, though. > Might it not be better to use, say, `date +%Y%m%d01` rather like we do > with catalog version numbers? I have been experimenting a bit with both of these suggestions; hope to have something in time for commitfest on Monday. Writing new tests for NSS has run into the same problems you've mentioned. FYI, I've pulled the port->peer_dn functionality you've presented here into my authenticated identity patchset at [1]. --Jacob [1] https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com
Re: More test/kerberos tweaks
On Fri, 2021-02-05 at 21:54 +, Jacob Champion wrote: > That just leaves the first patch, then. I've moved the first patch into the commitfest entry for [1], which depends on it. --Jacob [1] https://www.postgresql.org/message-id/flat/c55788dd1773c521c862e8e0dddb367df51222be.camel%40vmware.com
Re: Proposal: Save user's original authenticated identity for logging
On Thu, 2021-02-11 at 20:32 +, Jacob Champion wrote: > v2 just updates the patchset to remove the Windows TODO and fill in the > patch notes; no functional changes. The question about escaping log > contents remains. v3 rebases onto latest master, for SSL test conflicts. Note: - Since the 0001 patch from [1] is necessary for the new Kerberos tests in 0003, I won't make a separate commitfest entry for it. - 0002 would be subsumed by [2] if it's committed. --Jacob [1] https://www.postgresql.org/message-id/flat/fe7a46f8d46ebb074ba1572d4b5e4af72dc95420.camel%40vmware.com [2] https://www.postgresql.org/message-id/flat/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net#642757cec955d8e923025898402f9452 From ed225e1d1671dcd4da94a2244171a206b88563cc Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Wed, 3 Feb 2021 11:04:42 -0800 Subject: [PATCH v3 1/3] test/kerberos: only search forward in logs The log content tests searched through the entire log file, from the beginning, for every match. If two tests shared the same expected log content, it was possible for the second test to get a false positive by matching against the first test's output. (This could be seen by modifying one of the expected-failure tests to expect the same output as a previous happy-path test.) Store the offset of the previous match, and search forward from there during the next match, resetting the offset every time the log file changes. This could still result in false positives if one test spits out two or more matching log lines, but it should be an improvement over what's there now. 
---
 src/test/kerberos/t/001_auth.pl | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 079321bbfc..c721d24dbd 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -182,6 +182,9 @@ $node->safe_psql('postgres', 'CREATE USER test1;');
 
 note "running tests";
 
+my $current_log_name = '';
+my $current_log_position;
+
 # Test connection success or failure, and if success, that query returns true.
 sub test_access
 {
@@ -221,18 +224,37 @@ sub test_access
 	$lfname =~ s/^stderr //;
 	chomp $lfname;
 
+	if ($lfname ne $current_log_name)
+	{
+		$current_log_name = $lfname;
+
+		# Search from the beginning of this new file.
+		$current_log_position = 0;
+		note "current_log_position = $current_log_position";
+	}
+
 	# might need to retry if logging collector process is slow...
 	my $max_attempts = 180 * 10;
 	my $first_logfile;
 	for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
 	{
 		$first_logfile = slurp_file($node->data_dir . '/' . $lfname);
-		last if $first_logfile =~ m/\Q$expect_log_msg\E/;
+
+		# Don't include previously matched text in the search.
+		$first_logfile = substr $first_logfile, $current_log_position;
+		if ($first_logfile =~ m/\Q$expect_log_msg\E/g)
+		{
+			$current_log_position += pos($first_logfile);
+			last;
+		}
+
 		usleep(100_000);
 	}
 
 	like($first_logfile, qr/\Q$expect_log_msg\E/,
 		 'found expected log file content');
+
+	note "current_log_position = $current_log_position";
 }
 return;
-- 
2.25.1

From def87ea906d37764d7d6a6e0c6d473ffacf2c801 Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Mon, 8 Feb 2021 10:53:20 -0800
Subject: [PATCH v3 2/3] ssl: store client's DN in port->peer_dn

Original patch by Andrew Dunstan:

    https://www.postgresql.org/message-id/fd96ae76-a8e3-ef8e-a642-a592f5b76771%40dunslane.net

but I've taken out the clientname=DN functionality; all that will be
needed for the next patch is the DN string.
---
 src/backend/libpq/be-secure-openssl.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 src/include/libpq/libpq-be.h          |  1 +
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 4c4f025eb1..d0184a2ce2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -543,22 +543,25 @@ aloop:
 	/* Get client certificate, if available. */
 	port->peer = SSL_get_peer_certificate(port->ssl);
 
-	/* and extract the Common Name from it. */
+	/* and extract the Common Name / Distinguished Name from it. */
 	port->peer_cn = NULL;
+	port->peer_dn = NULL;
 	port->peer_cert_valid = false;
 	if (port->peer != NULL)
 	{
 		int			len;
+		X509_NAME  *x509name = X509_get_subject_name(port->peer);
+		char	   *peer_cn;
+		char	   *peer_dn;
+		BIO		   *bio = NULL;
+		BUF_MEM    *bio_buf = NULL;
 
-		len = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-										NID_commonName, NULL, 0);
+		len = X509_NAME_get_text_by_NID(x509name, NID_commonName, NULL, 0);
 		if (len != -1)
 		{
-			char	   *peer_cn;
-
 			peer_cn = MemoryContextAlloc(TopMemoryContext, len + 1);
-			r = X509_NAME_get_text_by_NID(X509_get_subject_name(port->peer),
-										  NID_commonName, peer_cn, len + 1);
+			r =
Re: [PATCH] pgbench: Remove ecnt, a member variable of CState
On 2021-Feb-26, Michael Paquier wrote: > On Fri, Feb 26, 2021 at 05:39:31PM +0900, miyake_kouta wrote: > > Also, the current pgbench's client abandons processing after hitting error, > > so this variable is no need, I think. > > Agreed. Its last use was in 12788ae, as far as I can see. So let's > just cleanup that. +1 -- Álvaro Herrera Valdivia, Chile
Re: Disallow SSL compression?
> On 26 Feb 2021, at 11:02, Magnus Hagander wrote: > On Mon, Feb 22, 2021 at 12:28 PM Daniel Gustafsson wrote: >> >>> On 22 Feb 2021, at 11:52, Magnus Hagander wrote: >>> Agreed. It will also help with not having to implement it in new SSL >>> implementations I'm sure :) >> >> Not really, no other TLS library I would consider using actually has >> compression (except maybe wolfSSL?). GnuTLS and NSS both removed it, and >> Secure Transport and Schannel never had it AFAIK. > > Ah, well, you'd still have to implement some empty placeholder :) Correct. The attached removes sslcompression to see what it would look like. The server actively disallows it and the parameter is removed, but the sslcompression column in the stat view is retained. An alternative could be to retain the parameter but not act on it in order to not break scripts etc, but that just postpones the pain until we inevitably do remove it. Thoughts? Any reason to keep supporting SSL compression or is it time for v14 to remove it? Are there still users leveraging this for protocol compression without security making it worthwhile to keep? >>> >>> When the last round of discussion happened, I had multiple customers >>> who did exactly that. None of them do that anymore, due to the pain of >>> making it work... >> >> Unsurprisingly. >> >>> I think for libpq we want to keep the option for a while but making it >>> a no-op, to not unnecessarily break systems where people just upgrade >>> libpq, though. And document it as such having no effect and "will >>> eventually be removed". >> >> Agreed, that's better. > > In fact, pg_basebackup with -R will generate a connection string that > includes sslcompression=0 when used today (unless you jump through the > hoops to make it work), so not accepting that now would definitely > break a lot of things needlessly. Yup, and as mentioned elsewhere in the thread the standard way of doing it is to leave the param behind and just document it as not in use. 
Attached is a v2 which retains the sslcompression parameter for backwards compatibility. -- Daniel Gustafsson https://vmware.com/ v2-0001-Disallow-SSL-compression.patch Description: Binary data
Systems Integration and Raising of the Abstraction Level
We have highly evolved systems such as SQL, HTTP, HTML, file formats or high-level programming languages such as Java or PHP that allow us to program many things with little code. Even so, a lot of effort is invested in the integration of these systems. To try to reduce this problem, libraries and frameworks are created that help in some ways, but the integration is never complete. It is well known that most of the time, when you try to create something to integrate several incompatible systems, what you get in the end is yet another incompatible system :). Still, I think the integration between the systems mentioned above is very important, can mean a great step in the evolution of computing, and is worth a try.

To explore what this integration could look like, I have created a framework that I have called NextTypes, whose main objective is the integration of data types. It seems very illogical to me that something as basic as a 16-bit integer receives a different name in each of the systems ("smallint", "short", "number") and can also be signed or unsigned. At any moment, due to a mistake by the programmer, the number in the programming language may not fit in the database column. Besides, these names say little about their characteristics; it would be clearer, for example, to use "int16". Whatever names are chosen, the most important thing is to use the same names in all systems for types with the same characteristics. Also, there is no standard system for defining composite data types out of primitive types and other composite types. From an HTML form to a SQL table, or from one application to another, multiple transformations of the data are required.

Lack of integration also lowers the level of abstraction, making it necessary to do lots of low-level work to make systems fit together. NextTypes at this moment is nothing more than another system incompatible with the others. It simply integrates them quite a bit and raises the level of abstraction. 
But what I would like is that the things that compose NextTypes were included in the systems it integrates.

Finally, I would like to list some examples of improvements in database managers, SQL, HTTP, HTML, browsers and programming languages that would help the integration and the elevation of the level of abstraction. Some of these enhancements are already included in NextTypes and other frameworks.

SQL
---

- Custom metadata in tables and columns.
- Date of creation and modification of the rows.
- Date of creation and modification of the definition of the tables.
- Use of table and column names in prepared statements. Example:

    select # from # where # = ?;

- Use of arrays in prepared statements. Example:

    select # from article where id in (?);

    # = author,title
    ? = 10,24,45

- Standardization of ranges of valid values and resolution for date, time and datetime types in database managers and the HTML time element.

PostgreSQL
----------

- Facilitate access to the definition of full text search indexes with a function to parse the "pg_index.indexprs" column.

Other database managers
-----------------------

- Allow transactional DDL, deferrable constraints and composite types.

JDBC
----

- High level methods that allow queries with the execution of a single method. Example:

    Tuple[] tuples = query("select author,title from article where id in (?)", ids);

- Integration with java.time data types.

HTTP - Servers
--------------

- Processing of arrays of elements composed of several parameters:

    fields:0:type = string
    fields:0:name = title
    fields:0:parameters = 250
    fields:0:not_null = true
    fields:1:type = string
    fields:1:name = author
    fields:1:parameters = 250
    fields:1:not_null = true

  Another possibility is to generate in the browser arrays of JSON objects from the forms. 
"fields": [ { "type": "string", "name": "title", "parameters": 250, "not_null": true }, { "type": "string", "name": "author", "parameters": 250, "not_null": true } ] XML/HTML - BROWSER -- - Input elements for different types of numbers with min and max values: 16 bits integer, 32 bits integer, 32 bits float and 64 bits float. - Input elements for images, audios and videos with preview. - Timezone input element. - Boolean input element with "true" and "false" values. - Null value in file inputs. - Clear button in file inputs like in date and time inputs. - Show size in file inputs. - Extension of the DOM API with high level and chainable methods. Example: paragraph.appendElement("a").setAttribute("href", "/article"); - Change of the "action" parameter of the forms to "target" to indicate the URL where to execute the action. The "action" parameter is moved to the different buttons on the form and allows executing a different action with each of the the buttons.
Re: Some regular-expression performance hacking
"Joel Jacobson" writes: > On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote: >> 0007-smarter-regex-allocation-2.patch > I've successfully tested this patch. Cool, thanks for testing! > That's a 29% speed-up compared to HEAD! Truly amazing. Hmm, I'm still only seeing about 10% or a little better. I wonder why the difference in your numbers. Either way, though, I'll take it, since the main point here is to cut memory consumption and not so much cycles. regards, tom lane
Re: CREATE INDEX CONCURRENTLY on partitioned index
On Mon, Feb 15, 2021 at 10:07:05PM +0300, Anastasia Lubennikova wrote:
> 5) Speaking of documentation, I think we need to add a paragraph about CIC
> on partitioned indexes which will explain that invalid indexes may appear
> and what user should do to fix them.

I'm not sure about that - it's already documented in general, for
nonpartitioned indexes.

-- 
Justin

From fb60da3c0fac8f1699a6caeea57476770c66576d Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 6 Jun 2020 17:42:23 -0500
Subject: [PATCH 1/5] Allow CREATE INDEX CONCURRENTLY on partitioned table

Note, this effectively reverts 050098b14, so take care to not reintroduce
the bug it fixed.

---
 doc/src/sgml/ref/create_index.sgml     |   9 --
 src/backend/commands/indexcmds.c       | 143 ++---
 src/test/regress/expected/indexing.out |  60 ++-
 src/test/regress/sql/indexing.sql      |  18 +++-
 4 files changed, 176 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 965dcf472c..7c75119d78 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -686,15 +686,6 @@ Indexes:
     cannot.
-
-    Concurrent builds for indexes on partitioned tables are currently not
-    supported.  However, you may concurrently build the index on each
-    partition individually and then finally create the partitioned index
-    non-concurrently in order to reduce the time where writes to the
-    partitioned table will be locked out.  In this case, building the
-    partitioned index is a metadata only operation.
-
-
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..9ab1a66971 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,6 +68,7 @@
 
 /* non-export function prototypes */
+static void reindex_invalid_child_indexes(Oid indexRelationId);
 static bool CompareOpclassOptions(Datum *opts1, Datum *opts2, int natts);
 static void CheckPredicate(Expr *predicate);
 static void ComputeIndexAttrs(IndexInfo *indexInfo,
@@ -680,17 +681,6 @@ DefineIndex(Oid relationId,
 	partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
 	if (partitioned)
 	{
-		/*
-		 * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
-		 * the error is thrown also for temporary tables.  Seems better to be
-		 * consistent, even though we could do it on temporary table because
-		 * we're not actually doing it concurrently.
-		 */
-		if (stmt->concurrent)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot create index on partitioned table \"%s\" concurrently",
-							RelationGetRelationName(rel))));
 		if (stmt->excludeOpNames)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1128,6 +1118,11 @@ DefineIndex(Oid relationId,
 		if (pd->nparts != 0)
 			flags |= INDEX_CREATE_INVALID;
 	}
+	else if (concurrent && OidIsValid(parentIndexId))
+	{
+		/* If concurrent, initially build index partitions as "invalid" */
+		flags |= INDEX_CREATE_INVALID;
+	}
 
 	if (stmt->deferrable)
 		constr_flags |= INDEX_CONSTR_CREATE_DEFERRABLE;
@@ -1183,6 +1178,14 @@ DefineIndex(Oid relationId,
 		partdesc = RelationGetPartitionDesc(rel);
 		if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
 		{
+			/*
+			 * Need to close the relation before recursing into children, so
+			 * copy needed data into a longlived context.
+			 */
+
+			MemoryContext ind_context = AllocSetContextCreate(PortalContext, "CREATE INDEX",
+															  ALLOCSET_DEFAULT_SIZES);
+			MemoryContext oldcontext = MemoryContextSwitchTo(ind_context);
 			int			nparts = partdesc->nparts;
 			Oid		   *part_oids = palloc(sizeof(Oid) * nparts);
 			bool		invalidate_parent = false;
@@ -1193,8 +1196,10 @@ DefineIndex(Oid relationId,
 					nparts);
 
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
+			parentDesc = CreateTupleDescCopy(RelationGetDescr(rel));
+			table_close(rel, NoLock);
+			MemoryContextSwitchTo(oldcontext);
 
-			parentDesc = RelationGetDescr(rel);
 			opfamOids = palloc(sizeof(Oid) * numberOfKeyAttributes);
 			for (i = 0; i < numberOfKeyAttributes; i++)
 				opfamOids[i] = get_opclass_family(classObjectId[i]);
@@ -1237,10 +1242,12 @@ DefineIndex(Oid relationId,
 					continue;
 				}
 
+				oldcontext = MemoryContextSwitchTo(ind_context);
 				childidxs = RelationGetIndexList(childrel);
 				attmap =
 					build_attrmap_by_name(RelationGetDescr(childrel),
 										  parentDesc);
+				MemoryContextSwitchTo(oldcontext);
 
 				foreach(cell, childidxs)
 				{
@@ -1311,10 +1318,14 @@ DefineIndex(Oid relationId,
 				 */
 				if (!found)
 				{
-					IndexStmt  *childStmt = copyObject(stmt);
+					IndexStmt  *childStmt;
 					bool		found_whole_row;
 					ListCell   *lc;
 
+					oldcontext = MemoryContextSwitchTo(ind_context);
+					childStmt = copyObject(stmt);
+
Re: Interest in GSoC 2021 Projects
Greetings! * Yixin Shi (es...@umich.edu) wrote: > I am Yixin Shi, a junior student majoring in Computer Science at University > of Michigan. I notice that the project ideas for GSoC 2021 have been posted > on your webpage and I am quite interested in two of them. I really wish to > take part in the project *Make plsample a more complete procedural language > handler example (2021)* considering both my interest and ability. The > potential mentor for this Project is *Mark Wong*. I am also interested in > the project *Improve PostgreSQL Regression Test Coverage (2021)* and the > potential mentor is *Andreas Scherbaum* and *Stephen Frost*. > > I would like to learn more about these two projects but failed to contact > the mentors. How can I contact them? Also, I really hope to join the > project. Are there any suggestions on application? Glad to hear you're interested- I'm one of the mentors you mention. You've found me, but I do want to point out that I'm also listed with my email address at: https://wiki.postgresql.org/wiki/GSoC ; the top-level PostgreSQL GSoC page (I've also added a link to that from the GSoC 2021 project ideas page). Note that PostgreSQL hasn't yet been formally accepted into the GSoC 2021 program. Google has said they intend to announce the chosen organizations on March 9th. Of course, you're welcome to start working on your application even ahead of that date if you wish to do so. Google provides a lot of useful information for prospective students here: https://google.github.io/gsocguides/student/ including a lot of information about how to write a solid proposal. For the regression test coverage, the first step would be to review, as suggested in the project idea, the current state of coverage which you can see here: https://coverage.postgresql.org/ You'd also want to pull PG down and make sure that you have an environment where you can build PG and run the regression tests and build the coverage report yourself. 
With that done, you can consider what areas you'd like to tackle, perhaps starting out with some of the simpler cases such as extensions that maybe don't have any test coverage today, or the various tools which have zero or very little. I've also CC'd Mark so he can provide some comments regarding plsample. Thanks, Stephen signature.asc Description: PGP signature
Re: SSL SNI
Greetings, * Peter Eisentraut (peter.eisentr...@enterprisedb.com) wrote: > A customer asked about including Server Name Indication (SNI) into the SSL > connection from the client, so they can use an SSL-aware proxy to route > connections. There was a thread a few years ago where this was briefly > discussed but no patch appeared.[0] I whipped up a quick patch and it did > seem to do the job, so I figured I'd share it here. This doesn't actually result in the ability to do that SSL connection proxying though, does it? At the least, whatever the proxy is would have to be taught how to send back an 'S' to the client, and send an 'S' to the server chosen after the client sends along the TLS ClientHello w/ SNI set, and then pass the traffic between afterwards. Perhaps it's worth doing this to allow proxy developers to do that, but this isn't enough to make it actually work without the proxy actually knowing PG and being able to be configured to do the right thing for the PG protocol. I would think that, ideally, we'd have some proxy author who would be willing to actually implement this and test that it all works with this patch applied, and then make sure to explain that proxies will need to be adapted to be able to work. Simply including this and then putting in the release notes that we now provide SNI as part of the SSL connection would likely lead people to believe that it'll 'just work'. Perhaps to manage expectations we'd want to say something like: - libpq will now include Server Name Indication as part of the PostgreSQL SSL startup process; proxies will need to understand the PostgreSQL protocol in order to be able to leverage this to perform routing. Or something along those lines, I would think. Of course, such a proxy would need to also understand how to tell a client that, for example, GSSAPI encryption isn't available if a 'G' came first from the client, and what to do if a plaintext connection was requested. 
> The question I had was whether this should be an optional behavior, or > conversely a behavior that can be turned off, or whether it should just be > turned on all the time. Certainly seems like something that we should support turning off, at least. As mentioned elsewhere, knowing the system that's being connected to is certainly interesting to attackers. Thanks, Stephen signature.asc Description: PGP signature
Re: Some regular-expression performance hacking
Hi,

On Fri, Feb 26, 2021, at 01:16, Tom Lane wrote:
> 0007-smarter-regex-allocation-2.patch

I've successfully tested this patch. I had to re-create the performance_test table, since some cases that previously didn't give an error now give the error "invalid regular expression: invalid character range". This is expected and of course an improvement, but I just wanted to explain why the number of rows doesn't match the previous test runs.

CREATE TABLE performance_test AS
SELECT subjects.subject, patterns.pattern, patterns.flags, tests.is_match, tests.captured
FROM tests
JOIN subjects ON subjects.subject_id = tests.subject_id
JOIN patterns ON patterns.pattern_id = subjects.pattern_id
WHERE tests.error IS NULL
--
-- the below part is added to ignore cases
-- that now result in an error:
--
AND NOT EXISTS (
    SELECT 1 FROM deviations
    WHERE deviations.test_id = tests.test_id
    AND deviations.error IS NOT NULL
);
SELECT 3253889

Comparing 13.2 with HEAD, not a single test resulted in a different is_match value, i.e. for the test just using the ~ regex operator to only check whether it matches or not. Good.

SELECT COUNT(*)
FROM deviations
JOIN tests ON tests.test_id = deviations.test_id
WHERE tests.is_match <> deviations.is_match;

 count
-------
     0
(1 row)

The below query shows a frequency count per error message:

SELECT error, COUNT(*) FROM deviations GROUP BY 1 ORDER BY 2 DESC;

                        error                        | count
-----------------------------------------------------+--------
                                                     | 106173
 regexp_match() does not support the "global" option |   5799
 invalid regular expression: invalid character range |   1060
 invalid regular expression option: "y"              |    277
(4 rows)

As we can see, 106173 cases that previously gave an error now go through without one. This is thanks to now allowing escape sequences within bracket expressions. The other errors are expected and all good.

End of correctness analysis. Now let's look at performance! I reran the same query three times to get a feeling for the stddev.

\timing
SELECT is_match <> (subject ~ pattern),
       captured IS DISTINCT FROM regexp_match(subject, pattern, flags),
       COUNT(*)
FROM performance_test
GROUP BY 1,2 ORDER BY 1,2;

 ?column? | ?column? |  count
----------+----------+---------
 f        | f        | 3253889
(1 row)

HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc)

Time: 125938.747 ms (02:05.939)
Time: 125414.792 ms (02:05.415)
Time: 126185.496 ms (02:06.185)

HEAD (b3a9e9897ec702d56602b26a8cdc0950f23b29dc) + 0007-smarter-regex-allocation-2.patch

 ?column? | ?column? |  count
----------+----------+---------
 f        | f        | 3253889
(1 row)

Time: 89145.030 ms (01:29.145)
Time: 89083.210 ms (01:29.083)
Time: 89166.442 ms (01:29.166)

That's a 29% speed-up compared to HEAD! Truly amazing.

Let's have a look at the total speed-up compared to PostgreSQL 13. In my previous benchmarks testing against old versions, I used precompiled binaries, but this time I compiled REL_13_STABLE:

Time: 483390.132 ms (08:03.390)

That's an 82% speed-up in total! Amazing!

/Joel
Re: Postgresql network transmission overhead
"Michael J. Baars" writes: > In the logfile you can see that the effective user data being written is only > 913kb, while the actual being transmitted over the network is 7946kb when > writing > one row at a time. That is an overhead of 770%! So ... don't write one row at a time. You haven't shown any details, but I imagine that most of the overhead comes from per-query stuff like the RowDescription metadata. The intended usage pattern for bulk operations is that there's only one RowDescription message for a whole lot of data rows. There might be reasons you want to work a row at a time, but if your concern is to minimize network traffic, don't do that. regards, tom lane
RE: [HACKERS] logical decoding of two-phase transactions
Hi On Thursday, February 25, 2021 4:02 PM Peter Smith wrote: > Please find attached the latest patch set v43* > > - Added new patch 0007 "Fix apply worker prepare" as discussed here [2] > > [2] > https://www.postgresql.org/message-id/CAA4eK1L%3DdhuCRvyDvrXX5wZ > gc7s1hLRD29CKCK6oaHtVCPgiFA%40mail.gmail.com I tested the scenario described in [2], where we ended up skipping prepared transaction data and the replica became out of sync. And, as you said, the problem is addressed in v43. I used twophase_snapshot.spec as a reference for the flow (e.g. how to make a consistent snapshot between prepare and commit prepared), and this time, as an alternative to the SQL API (pg_create_logical_replication_slot), I issued CREATE SUBSCRIPTION; other than that, I mainly followed the flow in the spec file. I checked that the replica has the same data at the end of this test, which means the spoolfile mechanism works. Best Regards, Takamichi Osumi
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian wrote: > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > > > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > > well. > > Do have a look and let me know if there are any comments. > > Update with both patches. Thanks for fixing and providing an updated patch. Patch applies, make check and make check-world passes. I could see the issue working fine. Few minor comments: + snapshot_was_exported_at pg_lsn + + + The address (LSN) at which the logical + slot found a consistent point at the time of slot creation. + NULL for physical slots. + + I had seen earlier also we had some discussion on naming snapshot_was_exported_at. Can we change snapshot_was_exported_at to snapshot_exported_lsn, I felt if we can include the lsn in the name, the user will be able to interpret easily and also it will be similar to other columns in pg_replication_slots view. L.restart_lsn, L.confirmed_flush_lsn, + L.snapshot_was_exported_at, L.wal_status, L.safe_wal_size Looks like there is some indentation issue here. Regards, Vignesh
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Fri, Feb 26, 2021 at 1:53 AM Paul Martinez wrote: > > On Thu, Feb 25, 2021 at 5:31 AM Amit Kapila wrote: > > > > For docs only patch, I have few suggestions: > > 1. On page [1], it is not very clear that we are suggesting to set > > max_replication_slots for origins whereas your new doc patch has > > clarified it, can we update the other page as well. > > Sorry, what other page are you referring to? > https://www.postgresql.org/docs/devel/logical-replication-config.html > > > 2. > > Setting it a lower value than the current > > + number of tracked replication origins (reflected in > > + > linkend="view-pg-replication-origin-status">pg_replication_origin_status, > > + not > linkend="catalog-pg-replication-origin">pg_replication_origin) > > + will prevent the server from starting. > > + > > > > Why can't we just mention pg_replication_origin above? > > > > So this is slightly confusing: > > pg_replication_origin just contains mappings from origin names to oids. > It is regular catalog table and has no limit on its size. Users can also > manually insert rows into this table. > > https://www.postgresql.org/docs/13/catalog-pg-replication-origin.html > > The view showing the in-memory information is actually > pg_replication_origin_status. The number of entries here is what is > actually constrained by the GUC parameter. > Okay, that makes sense. However, I have sent a patch today (see [1]) where I have slightly updated the subscriber-side configuration paragraph. From PG-14 onwards, table synchronization workers also use origins on subscribers, so you might want to adjust. > > > This also brings up a point regarding the naming of the added GUC. > max_replication_origins is cleanest, but has this confusion regarding > pg_replication_origin vs. pg_replication_origin_status. > max_replication_origin_statuses is weird (and long). > max_tracked_replication_origins is a possibility? > or maybe max_replication_origin_states. 
I guess we can leave adding the GUC for another day, as that might require broader acceptance and we are already near the start of the last CF. I think we can still consider it if a few more people share your opinion. [1] - https://www.postgresql.org/message-id/CAA4eK1KkbppndxxRKbaT2sXrLkdPwy44F4pjEZ0EDrVjD9MPjQ%40mail.gmail.com -- With Regards, Amit Kapila.
Re: [PATCH] pgbench: Remove ecnt, a member variable of CState
On Fri, Feb 26, 2021 at 05:39:31PM +0900, miyake_kouta wrote: > Also, the current pgbench's client abandons processing after hitting error, > so this variable is no need, I think. Agreed. Its last use was in 12788ae, as far as I can see. So let's just clean that up. -- Michael signature.asc Description: PGP signature
Re: [PATCH] pgbench: Bug fix for the -d option
On Fri, Feb 26, 2021 at 05:16:17PM +0900, Michael Paquier wrote: > Yes, the existing code could mess up with the selection logic if > PGPORT and PGUSER are both specified in an environment, masking the > value of PGUSER, so let's fix that. This is as old as 412893b. By the way, I can't help but wonder why pgbench handles the user name so differently, fetching PGUSER first and then looking at the options, while most of the other binaries use get_user_name(). It seems to me that we could simplify the handling around "login" without really impacting the usefulness of the tool, no? -- Michael signature.asc Description: PGP signature
Re: Add --tablespace option to reindexdb
On Fri, Feb 26, 2021 at 11:00:13AM +0100, Daniel Gustafsson wrote: > Some other small comments: > > + Assert(PQserverVersion(conn) >= 14); > Are these assertions really buying us much when we already check the server > version in reindex_one_database()? I found these helpful when working on vacuumdb and refactoring the code, though I'd agree this code may not justify going down to that. > + printf(_(" --tablespace=TABLESPACE reindex on specified > tablespace\n")); > While not introduced by this patch, I realized that we have a mix of > conventions for how to indent long options which don't have a short option. > Under "Connection options" all options are left-justified but under "Options" > all long-options are aligned with space indentation for missing shorts. Some > tools do it like this, where others like createuser left justifies under > Options as well. Maybe not the most pressing issue, but consistency is always > good in user interfaces so maybe it's worth addressing (in a separate patch)? Yeah, consistency is good, though I am not sure which one would be consistent here actually as there is no defined rule. For this one, I think that I would keep what I have to be consistent with the existing inconsistency (?). In short, I would just add --tablespace the same way there is --concurrently, without touching the connection option part. -- Michael signature.asc Description: PGP signature
Re: non-HOT update not looking at FSM for large tuple update
On Wed, Feb 24, 2021 at 6:29 PM Floris Van Nee wrote: > > > > That makes sense, although the exact number seems precisely tailored to your use case. 2% gives 164 bytes of slack and doesn't seem too large. Updated patch attached. > > Yeah, I tried picking it as conservative as I could, but 2% is obviously great too. :-) I can't think of any large drawbacks either of having a slightly larger value. > Thanks for posting the patch! I've added this to the commitfest as a bug fix and added you as an author. -- John Naylor EDB: http://www.enterprisedb.com
Re: Optimising latch signals
Here's a new version with two small changes from Andres: 1. Reorder InitPostmasterChild() slightly to avoid hanging on EXEC_BACKEND builds. 2. Revert v2's use of raise(x) instead of kill(MyProcPid, x); glibc manages to generate 5 syscalls for raise(). I'm planning to commit this soon if there are no objections. From acacbf06aa54b4b139553b08d7f8511b0aaa0331 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Sat, 8 Aug 2020 15:08:09 +1200 Subject: [PATCH v5 1/5] Optimize latches to send fewer signals. Don't send signals to processes that aren't currently sleeping. Author: Andres Freund Discussion: https://postgr.es/m/ca+hukgjjxpdpzbe0a3hyuywbvazuc89yx3jk9rfzgfv_khu...@mail.gmail.com --- src/backend/storage/ipc/latch.c | 24 src/include/storage/latch.h | 1 + 2 files changed, 25 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index f2d005eea0..04aed96207 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -274,6 +274,7 @@ void InitLatch(Latch *latch) { latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = MyProcPid; latch->is_shared = false; @@ -321,6 +322,7 @@ InitSharedLatch(Latch *latch) #endif latch->is_set = false; + latch->maybe_sleeping = false; latch->owner_pid = 0; latch->is_shared = true; } @@ -523,6 +525,10 @@ SetLatch(Latch *latch) latch->is_set = true; + pg_memory_barrier(); + if (!latch->maybe_sleeping) + return; + #ifndef WIN32 /* @@ -589,6 +595,7 @@ ResetLatch(Latch *latch) { /* Only the owner should reset the latch */ Assert(latch->owner_pid == MyProcPid); + Assert(latch->maybe_sleeping == false); latch->is_set = false; @@ -1289,6 +1296,14 @@ WaitEventSetWait(WaitEventSet *set, long timeout, * ordering, so that we cannot miss seeing is_set if a notification * has already been queued. 
*/ + if (set->latch && !set->latch->is_set) + { + /* about to sleep on a latch */ + set->latch->maybe_sleeping = true; + pg_memory_barrier(); + /* and recheck */ + } + if (set->latch && set->latch->is_set) { occurred_events->fd = PGINVALID_SOCKET; @@ -1299,6 +1314,9 @@ WaitEventSetWait(WaitEventSet *set, long timeout, occurred_events++; returned_events++; + /* could have been set above */ + set->latch->maybe_sleeping = false; + break; } @@ -1310,6 +1328,12 @@ WaitEventSetWait(WaitEventSet *set, long timeout, rc = WaitEventSetWaitBlock(set, cur_timeout, occurred_events, nevents); + if (set->latch) + { + Assert(set->latch->maybe_sleeping); + set->latch->maybe_sleeping = false; + } + if (rc == -1) break;/* timeout occurred */ else diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 1468f30a8e..393591be03 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -110,6 +110,7 @@ typedef struct Latch { sig_atomic_t is_set; + sig_atomic_t maybe_sleeping; bool is_shared; int owner_pid; #ifdef WIN32 -- 2.30.0 From af31f00b4df0b65f177696891180345da8dabaee Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 10:18:29 +1300 Subject: [PATCH v5 2/5] Use SIGURG rather than SIGUSR1 for latches. Traditionally, SIGUSR1 has been overloaded for ad-hoc signals, procsignal.c signals and latch.c wakeups. Move that last use over to a new dedicated signal. SIGURG is normally used to report out-of-band socket data, but PostgreSQL doesn't use that. The signal handler is now installed in all postmaster children by InitializeLatchSupport(). Those wishing to disconnect from it should call ShutdownLatchSupport() which will set it back to SIG_IGN. Future patches will use this separation to avoid the need for a signal handler on some operating systems. 
Discussion: https://postgr.es/m/ca+hukgjjxpdpzbe0a3hyuywbvazuc89yx3jk9rfzgfv_khu...@mail.gmail.com --- src/backend/postmaster/bgworker.c| 19 ++- src/backend/postmaster/postmaster.c | 4 +++ src/backend/storage/ipc/latch.c | 50 ++-- src/backend/storage/ipc/procsignal.c | 2 -- src/include/storage/latch.h | 11 +- 5 files changed, 40 insertions(+), 46 deletions(-) diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c index 6fdea3fc2d..bbbc09b0b5 100644 --- a/src/backend/postmaster/bgworker.c +++ b/src/backend/postmaster/bgworker.c @@ -713,22 +713,6 @@ bgworker_die(SIGNAL_ARGS) MyBgworkerEntry->bgw_type))); } -/* - * Standard SIGUSR1 handler for unconnected workers - * - * Here, we want to make sure an unconnected worker will at least heed - * latch activity. - */ -static void -bgworker_sigusr1_handler(SIGNAL_ARGS) -{ - int save_errno = errno; - - latch_sigusr1_handler(); - - errno = save_errno; -} - /* * Start a new background worker * @@ -759,6 +743,7 @@ StartBackgroundWorker(void) */ if ((worker->bgw_flags & BGWORKER_SHMEM_ACCESS) == 0)
Re: [HACKERS] logical decoding of two-phase transactions
On Fri, Feb 26, 2021 at 9:56 AM Amit Kapila wrote: > > On Thu, Feb 25, 2021 at 12:32 PM Peter Smith wrote: > > > > 5. You need to write/sync the spool file at prepare time because after > restart between prepare and commit prepared the changes can be lost > and won't be resent by the publisher assuming there are commits of > other transactions between prepare and commit prepared. For the same > reason, I am not sure if we can just rely on the in-memory hash table > for it (prepare_spoolfile_exists). Sure, if it exists and there is no > restart then it would be cheap to check in the hash table but I don't > think it is guaranteed. > As we can't rely on the hash table, I think we can get rid of it and always check if the corresponding file exists. Few more comments on v43-0007-Fix-apply-worker-empty-prepare 1. + * So the "table_states_not_ready" list might end up having a READY + * state it it even though The above sentence doesn't sound correct to me. 2. @@ -759,6 +798,79 @@ apply_handle_begin_prepare(StringInfo s) { .. + */ + if (!am_tablesync_worker()) + { I think here we should have an Assert for tablesync worker because it should never receive prepare. 3. + while (BusyTablesyncs()) + { + elog(DEBUG1, "apply_handle_begin_prepare - waiting for all sync workers to be DONE/READY"); + + process_syncing_tables(begin_data.end_lsn); .. + if (begin_data.end_lsn < BiggestTablesyncLSN() In both the above places, you need to use begin_data.final_lsn because the prepare is yet not replayed so we can't use its end_lsn for syncup. 4. +/* + * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state? + */ +bool +BusyTablesyncs() The function name is not clear enough. Can we change it to something like AnyTableSyncInProgress? 5. +/* + * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state? + */ +bool +BusyTablesyncs() { .. 
+ /* + * XXX - When the process_syncing_tables_for_sync changes the state + * from SYNCDONE to READY, that change is actually written directly In the above comment, do you mean to process_syncing_tables_for_apply because that is where we change state to READY? And, I don't think we need to mark this comment as XXX. 6. + * XXX - Is there a potential timing problem here - e.g. if signal arrives + * while executing this then maybe we will set table_states_valid without + * refetching them? + */ +static void +FetchTableStates(bool *started_tx) .. Can you explain which race condition you are worried about here which is not possible earlier but can happen after this patch? 7. @@ -941,6 +1162,26 @@ apply_handle_stream_prepare(StringInfo s) elog(DEBUG1, "received prepare for streamed transaction %u", xid); /* + * Wait for all the sync workers to reach the SYNCDONE/READY state. + * + * This is same waiting logic as in appy_handle_begin_prepare function + * (see that function for more details about this). + */ + if (!am_tablesync_worker()) + { + while (BusyTablesyncs()) + { + process_syncing_tables(prepare_data.end_lsn); + + /* This latch is to prevent 100% CPU looping. */ + (void) WaitLatch(MyLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, + 1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE); + ResetLatch(MyLatch); + } + } I think we need similar handling in stream_prepare as in begin_prepare for writing to spool file because this has the same danger. But here we need to write it xid spool file in StreamXidHash. Another thing we need to ensure is to sync that file in stream prepare so that it can survive restarts. Then in apply_handle_commit_prepared, after checking for prepared spool file, we need to check the existence of xid spool file, and if the same exists then apply messages from that file. Again, like begin_prepare, in apply_handle_stream_prepare also we should have an Assert for table sync worker. 
I feel the combination of 2PC and streaming is a bit complicated to deal with. How about, for now, we don't allow users to enable streaming if the 2PC option is enabled for a subscription? This requires some change (error out if both the streaming and 2PC options are enabled) in both CREATE SUBSCRIPTION and ALTER SUBSCRIPTION, but that change should be fairly small. If we follow this, then in apply_dispatch (for case LOGICAL_REP_MSG_STREAM_PREPARE), we should report an ERROR "invalid logical replication message type". -- With Regards, Amit Kapila.
Re: repeated decoding of prepared transactions
On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian wrote: > I've updated snapshot_was_exported_at_ member to pg_replication_slots as > well. > Do have a look and let me know if there are any comments. Update with both patches. regards, Ajin Cherian Fujitsu Australia v4-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch Description: Binary data v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch Description: Binary data
Re: REINDEX backend filtering
On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud wrote: > > On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander wrote: > > > > I don't agree with the conclusion though. > > > > The most common case of that will be in the case of an upgrade. In > > that case I want to reindex all of those indexes as quickly as > > possible. So I'll want to parallelize it across multiple sessions > > (like reindexdb -j 4 or whatever). But doesn't putting the filter in > > the grammar prevent me from doing exactly that? Each of those 4 (or > > whatever) sessions would have to guess which would go where and then > > speculatively run the command on that, instead of being able to > > directly distribute the worload? > > It means that you'll have to distribute the work on a per-table basis > rather than a per-index basis. The time spent to find out that a > table doesn't have any impacted index should be negligible compared to > the cost of running a reindex. This obviously won't help that much if > you have a lot of table but only one being gigantic. Yeah -- or at least a couple of large and many small, which I find to be a very common scenario. Or the case of some tables having many affected indexes and some having few. You'd basically want to order the operation by table on something like "total size of the affected indexes on table x" -- which may very well put a smaller table with many indexes earlier in the queue. But you can't do that without having access to the filter > But even if we put the logic in the client, this still won't help as > reindexdb doesn't support multiple job with an index list: > > * Index-level REINDEX is not supported with multiple jobs as we > * cannot control the concurrent processing of multiple indexes > * depending on the same relation. > */ > if (concurrentCons > 1 && indexes.head != NULL) > { > pg_log_error("cannot use multiple jobs to reindex indexes"); > exit(1); > } That sounds like it would be a fixable problem though, in principle. 
It could/should probably still limit all indexes on the same table to be processed in the same connection for the locking reasons of course, but doing an order by the total size of the indexes like above, and ensuring that they are grouped that way, doesn't sound *that* hard. I doubt it's that important in the current usecase of manually listing the indexes, but it would be useful for something like this. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: REINDEX backend filtering
On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander wrote: > > I don't agree with the conclusion though. > > The most common case of that will be in the case of an upgrade. In > that case I want to reindex all of those indexes as quickly as > possible. So I'll want to parallelize it across multiple sessions > (like reindexdb -j 4 or whatever). But doesn't putting the filter in > the grammar prevent me from doing exactly that? Each of those 4 (or > whatever) sessions would have to guess which would go where and then > speculatively run the command on that, instead of being able to > directly distribute the worload? It means that you'll have to distribute the work on a per-table basis rather than a per-index basis. The time spent to find out that a table doesn't have any impacted index should be negligible compared to the cost of running a reindex. This obviously won't help that much if you have a lot of tables but only one of them gigantic. But even if we put the logic in the client, this still won't help, as reindexdb doesn't support multiple jobs with an index list: * Index-level REINDEX is not supported with multiple jobs as we * cannot control the concurrent processing of multiple indexes * depending on the same relation. */ if (concurrentCons > 1 && indexes.head != NULL) { pg_log_error("cannot use multiple jobs to reindex indexes"); exit(1); }
Re: Disallow SSL compression?
On Mon, Feb 22, 2021 at 12:28 PM Daniel Gustafsson wrote: > > > On 22 Feb 2021, at 11:52, Magnus Hagander wrote: > > > > On Thu, Feb 18, 2021 at 1:51 PM Daniel Gustafsson wrote: > >> > >> A few years ago we discussed whether to disable SSL compression [0] which > >> ended > >> up with it being off by default combined with a recommendation against it > >> in > >> the docs. > >> > >> OpenSSL themselves disabled SSL compression by default in 2016 in 1.1.0 > >> with > >> distros often having had it disabled for a long while before then. > >> Further, > >> TLSv1.3 removes compression entirely on the protocol level mandating that > >> only > >> NULL compression is allowed in the ClientHello. NSS, which is discussed in > >> another thread, removed SSL compression entirely in version 3.33 in 2017. > >> > >> It seems about time to revisit this since it's unlikely to work anywhere > >> but in > >> a very small subset of system setups (being disabled by default > >> everywhere) and > >> is thus likely to be very untested at best. There is also the security > >> aspect > >> which is less clear-cut for us compared to HTTP client/servers, but not > >> refuted > >> (the linked thread has a good discussion on this). > > > > Agreed. It will also help with not having to implement it in new SSL > > implementations I'm sure :) > > Not really, no other TLS library I would consider using actually has > compression (except maybe wolfSSL?). GnuTLS and NSS both removed it, and > Secure Transport and Schannel never had it AFAIK. Ah, well, you'd still have to implement some empty placeholder :) > >> The attached removes sslcompression to see what it would look like. The > >> server > >> actively disallows it and the parameter is removed, but the sslcompression > >> column in the stat view is retained. An alternative could be to retain the > >> parameter but not act on it in order to not break scripts etc, but that > >> just > >> postpones the pain until when we inevitably do remove it. 
> >> Thoughts? Any reason to keep supporting SSL compression or is it time for > >> v14 > >> to remove it? Are there still users leveraging this for protocol > >> compression > >> without security making it worthwhile to keep? > > > > When the last round of discussion happened, I had multiple customers > > who did exactly that. None of them do that anymore, due to the pain of > > making it work... > > Unsurprisingly. > > > I think for libpq we want to keep the option for a while but making it > > a no-op, to not unnecessarily break systems where people just upgrade > > libpq, though. And document it as such having no effect and "will > > eventually be removed". > > Agreed, that's better. In fact, pg_basebackup with -R will generate a connection string that includes sslcompression=0 when used today (unless you jump through the hoops to make it work), so not accepting that one would definitely break a lot of things needlessly. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Add --tablespace option to reindexdb
> On 26 Feb 2021, at 07:49, Michael Paquier wrote: > Since c5b2860, it is possible to specify a tablespace for a REINDEX, > but the equivalent option has not been added to reindexdb. Attached > is a patch to take care of that. > > This includes documentation and tests. Makes sense. > While on it, I have added tests for toast tables and indexes with a > tablespace move during a REINDEX. Those operations fail, but it is > not possible to get that into the main regression test suite because > the error messages include the relation names so that's unstable. > Well, it would be possible to do that for the non-concurrent case > using a TRY/CATCH block in a custom function but that would not work > with CONCURRENTLY. Anyway, I would rather group the whole set of > tests together, and using the --tablespace option introduced here > within a TAP test does the job. Agreed, doing it with a TAP test removes the complication. Some other small comments: + Assert(PQserverVersion(conn) >= 14); Are these assertions really buying us much when we already check the server version in reindex_one_database()? + printf(_(" --tablespace=TABLESPACE reindex on specified tablespace\n")); While not introduced by this patch, I realized that we have a mix of conventions for how to indent long options which don't have a short option. Under "Connection options" all options are left-justified but under "Options" all long-options are aligned with space indentation for missing shorts. Some tools do it like this, where others like createuser left justifies under Options as well. Maybe not the most pressing issue, but consistency is always good in user interfaces so maybe it's worth addressing (in a separate patch)? -- Daniel Gustafsson https://vmware.com/
Re: REINDEX backend filtering
On Thu, Jan 21, 2021 at 4:13 AM Julien Rouhaud wrote: > > On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier wrote: > > > > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > > > Is this really a common enough operation that we need it in the main > > > grammar? > > > > > > Having the functionality, definitely, but what if it was "just" a > > > function instead? So you'd do something like: > > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > > > \gexec > > > > > > Or even a function that returns the REINDEX commands directly (taking > > > a parameter to turn on/off concurrency for example). > > > > > > That also seems like it would be easier to make flexible, and just as > > > easy to plug into reindexdb? > > > > Having control in the grammar to choose which index to reindex for a > > table is very useful when it comes to parallel reindexing, because > > it is no-brainer in terms of knowing which index to distribute to one > > job or another. In short, with this grammar you can just issue a set > > of REINDEX TABLE commands that we know will not conflict with each > > other. You cannot get that level of control with REINDEX INDEX as it > > may be possible that more or more commands conflict if they work on an > > index of the same relation because it is required to take lock also on > > the parent table. Of course, we could decide to implement a > > redistribution logic in all frontend tools that need such things, like > > reindexdb, but that's not something I think we should let the client > > decide of. A backend-side filtering is IMO much simpler, less code, > > and more elegant. > > Maybe additional filtering capabilities is not something that will be > required frequently, but I'm pretty sure that reindexing only indexes > that might be corrupt is something that will be required often.. 
> So I agree, having all that logic in the backend makes everything easier > for users, having the choice of the tools they want to issue the query > while still having all features available. I agree with that scenario -- in that the most common case will be exactly that of reindexing only indexes that might be corrupt. I don't agree with the conclusion though. The most common case of that will be in the case of an upgrade. In that case I want to reindex all of those indexes as quickly as possible. So I'll want to parallelize it across multiple sessions (like reindexdb -j 4 or whatever). But doesn't putting the filter in the grammar prevent me from doing exactly that? Each of those 4 (or whatever) sessions would have to guess which would go where and then speculatively run the command on that, instead of being able to directly distribute the workload? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: REINDEX backend filtering
On Wed, Dec 16, 2020 at 1:27 AM Michael Paquier wrote: > > On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote: > > Is this really a common enough operation that we need it in the main > > grammar? > > > > Having the functionality, definitely, but what if it was "just" a > > function instead? So you'd do something like: > > SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here) > > \gexec > > > > Or even a function that returns the REINDEX commands directly (taking > > a parameter to turn on/off concurrency for example). > > > > That also seems like it would be easier to make flexible, and just as > > easy to plug into reindexdb? > > Having control in the grammar to choose which index to reindex for a > table is very useful when it comes to parallel reindexing, because > it is a no-brainer in terms of knowing which index to distribute to one > job or another. In short, with this grammar you can just issue a set > of REINDEX TABLE commands that we know will not conflict with each > other. You cannot get that level of control with REINDEX INDEX as it (oops, seems I forgot to reply to this one, sorry!) Uh, isn't it almost exactly the opposite? If you do it in the backend grammar you *cannot* parallelize it between indexes, because you can only run one at a time. Whereas if you do it in the frontend, you can. Either with something like reindexdb that could do it automatically, or with psql as a copy/paste job? > may be possible that one or more commands conflict if they work on an > index of the same relation because it is required to take lock also on > the parent table. Of course, we could decide to implement a > redistribution logic in all frontend tools that need such things, like > reindexdb, but that's not something I think we should let the client > decide of. A backend-side filtering is IMO much simpler, less code, > and more elegant. It's simpler in the simple case, I agree with that.
But ISTM it's also a lot less flexible for anything except the simplest case... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
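[Editorial sketch] The frontend-side approach argued for above can be illustrated with a catalog query fed through psql's \gexec. This is only a sketch of the pattern Magnus describes, not a committed feature; the table name is a placeholder, and a driver (or reindexdb-style tooling) would distribute the generated commands across sessions rather than run them in one:

```sql
-- Client-side filtering: build one REINDEX command per index of a table
-- from the catalogs, then let psql execute each generated command.
SELECT format('REINDEX INDEX CONCURRENTLY %I.%I', n.nspname, c.relname)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE i.indrelid = 'some_table'::regclass   -- placeholder filter
\gexec
```

The WHERE clause is where any filtering (e.g., "possibly corrupt" indexes) would go; each resulting command can be dispatched to a separate session for parallelism.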
Re: NOT VALID for Unique Indexes
On Mon, Jan 18, 2021 at 11:19 PM japin wrote: > > > On Fri, 15 Jan 2021 at 00:22, Simon Riggs > wrote: > > As you may be aware the NOT VALID qualifier currently only applies to > > CHECK and FK constraints, but not yet to unique indexes. I have had > > customer requests to change that. > > > > It's a reasonably common requirement to be able to change an index > > to/from a unique index, i.e. Unique -> NonUnique or NonUnique to > > Unique. Previously, it was easy enough to do that using a catalog > > update, but with security concerns and the fact that the optimizer > > uses the uniqueness to optimize queries means that there is a gap in > > our support. We obviously need to scan the index to see if it actually > > can be marked as unique. > > > > In terms of locking we need to exclude writes while we add uniqueness, > > so scanning the index to check it is unique would cause problems. So > > we need to do the same thing as we do with other constraint types: add > > the constraint NOT VALID in one transaction and then later validate it > > in a separate transaction (if ever). > > > > I present a WIP patch to show it's a small patch to change Uniqueness > > for an index, with docs and tests. > > > > ALTER INDEX SET [NOT] UNIQUE [NOT VALID] > > ALTER INDEX VALIDATE UNIQUE > > > > It doesn't do the index validation scan (yet), but I wanted to check > > acceptability, syntax and requirements before I do that. > > > > I can also add similar syntax for UNIQUE and PK constraints. > > > > Thoughts please? > > Great! I have some questions. > > 1. In the patch, you add a new attribute named "induniquevalid" in pg_index, >however, there is a "indisvalid" in pg_index, can we use "indisvalid"? indisvalid already has defined meaning related to creating indexes concurrently, so I was forced to create another column with a similar name. Thanks for reviewing the code in detail. > 2. The foreign key and CHECK constraints are valid by using >ALTER TABLE .. 
ADD table_constraint [ NOT VALID ] >ALTER TABLE .. VALIDATE CONSTRAINT constraint_name > >Should we implement unique index valid/not valid same as foreign key and >CHECK constraints? Yes, that is possible. (I wrote the NOT VALID patch for FKs, so was aware of that). The syntax I presented was for ALTER INDEX. Not all UNIQUE indexes are constraints, so it is important to add the option on ALTER INDEX. Adding the ALTER TABLE syntax can be done later. > 3. If we use the syntax to valid/not valid the unique, should we support >other constraints, such as foreign key and CHECK constraints? I'm sorry, I don't understand that question. FKs and CHECK constraints are already supported, as you point out above. I won't be able to finish this patch in time for this next CF, but thanks for your interest, I will complete it for PG15 later this year. -- Simon Riggs http://www.EnterpriseDB.com/
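[Editorial sketch] For reference, the two-transaction workflow under the syntax proposed in this thread would look as follows. The index name is hypothetical, and per the WIP patch the validation scan is not yet implemented:

```sql
-- Transaction 1: mark the index unique without scanning it.
ALTER INDEX my_index SET UNIQUE NOT VALID;

-- Transaction 2 (later, if ever): scan the index to verify uniqueness,
-- after which the optimizer may rely on it.
ALTER INDEX my_index VALIDATE UNIQUE;
```

This mirrors the existing ADD CONSTRAINT ... NOT VALID / VALIDATE CONSTRAINT pattern for FK and CHECK constraints.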
Re: NOT VALID for Unique Indexes
On Mon, Jan 18, 2021 at 12:34 AM David Fetter wrote: > > On Thu, Jan 14, 2021 at 04:22:17PM +, Simon Riggs wrote: > > As you may be aware the NOT VALID qualifier currently only applies to > > CHECK and FK constraints, but not yet to unique indexes. I have had > > customer requests to change that. > > This is a great feature! > > Not exactly on point with this, but in a pretty closely related > context, is there some way we could give people the ability to declare > at their peril that a constraint is valid without incurring the full > scan that VALIDATE currently does? This is currently doable by > fiddling directly with the catalog, which operation is broadly more > dangerous and ill-advised. That is what NOT VALID allows, but it can't be relied on for optimization. -- Simon Riggs http://www.EnterpriseDB.com/
RE: Parallel INSERT (INTO ... SELECT ...)
From: Tsunakawa, Takayuki/綱川 貴之
> the current patch showed nice performance improvement in some (many?) patterns.
>
> So, I think it can be committed in PG 14, when it has addressed the plan cache
> issue that Amit Langote-san posed.

Agreed. I summarized my test results for the current patch (V18) in the attached file (please use VSCode or Notepad++ to open it, the content is a bit long). As you can see, the performance is good in many patterns (please refer to my test script; the test NO in the text corresponds to the number in the sql file). If you have any question on my test, please feel free to ask.

Regards, Tang

max_parallel_workers_per_gather=2
---------------------------------
Test NO | test case                                                                    | query plan             | patched(ms) | master(ms) | %reg
-------------------------------------------------------------------------------------------------------------------------------------------------
1       | Test INSERT with underlying query.                                           | parallel INSERT+SELECT | 20894.069   | 27030.623  | -23%
2       | Test INSERT into a table with a foreign key.                                 | parallel SELECT        | 80921.960   | 84416.775  | -4%
3       | Test INSERT with ON CONFLICT ... DO UPDATE                                   | non-parallel           | 28111.186   | 29531.922  | -5%
4       | Test INSERT with parallel_leader_participation = off;                        | parallel INSERT+SELECT | 2799.354    | 5131.874   | -45%
5       | Test INSERT with max_parallel_workers=0;                                     | non-parallel           | 27006.385   | 26962.279  | 0%
6       | Test INSERT with parallelized aggregate                                      | parallel SELECT        | 95482.307   | 105090.301 | -9%
7       | Test INSERT with parallel bitmap heap scan                                   | parallel INSERT+SELECT | 606.664     | 844.471    | -28%
8       | Test INSERT with parallel append                                             | parallel INSERT+SELECT | 109.896     | 239.198    | -54%
9       | Test INSERT with parallel index scan                                         | parallel INSERT+SELECT | 552.481     | 1238.079   | -55%
10      | Test INSERT with parallel-safe index expression                              | parallel INSERT+SELECT | 1453.686    | 2908.489   | -50%
11      | Test INSERT with parallel-unsafe index expression                            | non-parallel           | 2828.559    | 2837.570   | 0%
12      | Test INSERT with underlying query - and RETURNING (no projection)            | parallel INSERT+SELECT | 417.493     | 685.430    | -39%
13      | Test INSERT with underlying ordered query - and RETURNING (no projection)    | parallel SELECT        | 971.095     | 1717.502   | -43%
14      | Test INSERT with underlying ordered query - and RETURNING (with projection)  | parallel SELECT        | 1002.287    |
Re: archive_command / pg_stat_archiver & documentation
Done here: https://commitfest.postgresql.org/32/3012/ On Thu, Feb 25, 2021 at 3:34 PM, Julien Rouhaud wrote: > On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau > wrote: > > > > On Wed, Feb 24, 2021 at 2:52 PM, Julien Rouhaud wrote: > >> > >> I thought that this behavior was documented, especially for the lack > >> of update of pg_stat_archiver. If it's not the case then we should > >> definitely fix that! > > > > I tried to do it in the attached patch. > > Building the doc worked fine on my computer. > > Great, thanks! Can you register it in the next commitfest to make > sure it won't be forgotten? >
Re: repeated decoding of prepared transactions
On Thu, Feb 25, 2021 at 10:36 PM Amit Kapila wrote: > Few comments on the first patch: > 1. We can't remove ReorderBufferSkipPrepare because we rely on that in > SnapBuildDistributeNewCatalogSnapshot. > 2. I have changed the name of the variable from > snapshot_was_exported_at_lsn to snapshot_was_exported_at but I am > still not very sure about this naming because there are times when we > don't export snapshot and we still set this like when creating slots > with CRS_NOEXPORT_SNAPSHOT or when creating via SQL APIs. The other > name that comes to mind is initial_consistency_at, what do you think? > 3. Changed comments at various places. > > Please find the above changes as a separate patch, if you like you can > include these in the main patch. > > Apart from the above, I think the comments related to docs in my > previous email [1] are still valid, can you please take care of those. > > [1] - > https://www.postgresql.org/message-id/CAA4eK1Kr34_TiREr57Wpd%3D3%3D03x%3D1n55DAjwJPGpHAEc4dWfUQ%40mail.gmail.com I've added Amit's changes-patch as well as addressed comments related to docs in the attached patch. >On Thu, Feb 25, 2021 at 10:34 PM vignesh C wrote: >One observation while verifying the patch I noticed that most of >ReplicationSlotPersistentData structure members are displayed in >pg_replication_slots, but I did not see snapshot_was_exported_at_lsn >being displayed. Is this intentional? If not intentional we can >include snapshot_was_exported_at_lsn in pg_replication_slots. I've updated snapshot_was_exported_at_ member to pg_replication_slots as well. Do have a look and let me know if there are any comments. regards, Ajin Cherian Fujitsu Australia v4-0001-Avoid-repeated-decoding-of-prepared-transactions.patch Description: Binary data
[PATCH] pgbench: Remove ecnt, a member variable of CState
Hi. I created a patch to remove ecnt, which is a member variable of CState. This variable is incremented in some places, but it's not used for any purpose. Also, the current pgbench client abandons processing after hitting an error, so this variable is not needed, I think. Regards -- Kota Miyake

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 627a244fb7..31a4df45f5 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -427,7 +427,6 @@ typedef struct
 	/* per client collected stats */
 	int64		cnt;			/* client transaction count, for -t */
-	int			ecnt;			/* error count */
 } CState;

 /*
@@ -2716,7 +2715,6 @@ sendCommand(CState *st, Command *command)
 	if (r == 0)
 	{
 		pg_log_debug("client %d could not send %s", st->id, command->argv[0]);
-		st->ecnt++;
 		return false;
 	}
 	else
@@ -2828,14 +2826,12 @@ readCommandResponse(CState *st, MetaCommand meta, char *varprefix)
 	if (qrynum == 0)
 	{
 		pg_log_error("client %d command %d: no results", st->id, st->command);
-		st->ecnt++;
 		return false;
 	}

 	return true;

 error:
-	st->ecnt++;
 	PQclear(res);
 	PQclear(next_res);
 do
RE: libpq debug log
From: Kyotaro Horiguchi > Using (inCursor - inStart) as logCursor doesn't work correctly if tracing > state desyncs. Once desync happens inStart can be moved at the timing that the > tracing code doesn't expect. This requires (as I mentioned upthread) > pqReadData to actively reset logCursor, though. > > logCursor should move when bytes are fed to the tracing functions even when > they are the last bytes of a message. Hmm, sharp and nice insight. I'd like Iwata-san and Kirk to consider this, including Horiguchi-san's previous mails. Regards Takayuki Tsunakawa
Re: libpq debug log
At Fri, 26 Feb 2021 17:30:32 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 26 Feb 2021 16:12:39 +0900 (JST), Kyotaro Horiguchi > wrote in > > This inhibits logCursor being updated. What is worse, I find that > > logCursor movement is quite dubious. > > Using (inCursor - inStart) as logCursor doesn't work correctly if > tracing state desyncs. Once desync happens inStart can be moved at > the timing that the tracing code doesn't expect. - This requires (as I - mentioned upthread) pqReadData to actively reset logCursor, though. + So pqReadData needs to actively reset logCursor. If logCursor is + actively reset, we no longer need to use (inCursor - inStart) as + logCursor and it is enough that logCursor follows inCursor. > > logCursor should move when bytes are fed to the tracing functions even > when they are the last bytes of a message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: libpq debug log
At Fri, 26 Feb 2021 16:12:39 +0900 (JST), Kyotaro Horiguchi wrote in > This inhibits logCursor being updated. What is worse, I find that > logCursor movement is quite dubious. Using (inCursor - inStart) as logCursor doesn't work correctly if tracing state desyncs. Once desync happens inStart can be moved at the timing that the tracing code doesn't expect. This requires (as I mentioned upthread) pqReadData to actively reset logCursor, though. logCursor should move when bytes are fed to the tracing functions even when they are the last bytes of a message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH] pgbench: Bug fix for the -d option
On Fri, Feb 26, 2021 at 01:18:20PM +0900, miyake_kouta wrote: > I'm suggesting this bug fix because I think it's a bug, but if there's any > other intent to this else if statement, could you let me know? Yes, the existing code could mess up the selection logic if PGPORT and PGUSER are both specified in an environment, masking the value of PGUSER, so let's fix that. This is as old as 412893b. -- Michael signature.asc Description: PGP signature
Interest in GSoC 2021 Projects
Hi, I am Yixin Shi, a junior student majoring in Computer Science at University of Michigan. I notice that the project ideas for GSoC 2021 have been posted on your webpage and I am quite interested in two of them. I really wish to take part in the project *Make plsample a more complete procedural language handler example (2021)* considering both my interest and ability. The potential mentor for this Project is *Mark Wong*. I am also interested in the project *Improve PostgreSQL Regression Test Coverage (2021)* and the potential mentor is *Andreas Scherbaum* and *Stephen Frost*. I would like to learn more about these two projects but failed to contact the mentors. How can I contact them? Also, I really hope to join the project. Are there any suggestions on application? Best wishes, Yixin Shi
Re: [PATCH] Note effect of max_replication_slots on subscriber side in documentation.
On Thu, Feb 25, 2021 at 5:31 AM Amit Kapila wrote: > > For docs only patch, I have few suggestions: > 1. On page [1], it is not very clear that we are suggesting to set > max_replication_slots for origins whereas your new doc patch has > clarified it, can we update the other page as well. Sorry, what other page are you referring to? > 2. > Setting it a lower value than the current > + number of tracked replication origins (reflected in > + linkend="view-pg-replication-origin-status">pg_replication_origin_status, > + not linkend="catalog-pg-replication-origin">pg_replication_origin) > + will prevent the server from starting. > + > > Why can't we just mention pg_replication_origin above? > So this is slightly confusing: pg_replication_origin just contains mappings from origin names to oids. It is a regular catalog table and has no limit on its size. Users can also manually insert rows into this table. https://www.postgresql.org/docs/13/catalog-pg-replication-origin.html The view showing the in-memory information is actually pg_replication_origin_status. The number of entries here is what is actually constrained by the GUC parameter. https://www.postgresql.org/docs/13/view-pg-replication-origin-status.html I clarified by pointing to pg_replication_origin_status because it could in theory be out of sync with pg_replication_origin. I'm actually not sure how entries there are managed. Perhaps if you were replicating from one database and then stopped and started replicating from another database you'd have two replication origins, but only one replication origin status? This also brings up a point regarding the naming of the added GUC. max_replication_origins is cleanest, but has this confusion regarding pg_replication_origin vs. pg_replication_origin_status. max_replication_origin_statuses is weird (and long). max_tracked_replication_origins is a possibility?
(One last bit of naming confusion; the internal code refers to them as ReplicationStates, rather than ReplicationOrigins or ReplicationOriginStatuses, or something like that.) - Paul
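[Editorial sketch] The catalog-vs-view distinction discussed above can be seen directly; both relations exist in current releases, and the row count of the view is what the proposed GUC would bound:

```sql
-- Catalog table: plain origin-name -> OID mappings, no fixed size limit,
-- and rows can even be inserted manually.
SELECT roident, roname FROM pg_replication_origin;

-- In-memory view: one row per tracked origin; the number of entries here
-- is what max_replication_slots (or the proposed GUC) constrains.
SELECT local_id, external_id, remote_lsn, local_lsn
FROM pg_replication_origin_status;
```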
Re: 64-bit XIDs in deleted nbtree pages
On Fri, Feb 26, 2021 at 9:58 AM Peter Geoghegan wrote: > > On Thu, Feb 25, 2021 at 5:42 AM Masahiko Sawada wrote: > > btvacuumcleanup() has been playing two roles: recycling deleted pages > > and collecting index statistics. > > Right. > > I pushed the VACUUM VERBOSE "index pages newly deleted" > instrumentation patch earlier - it really isn't complicated or > controversial, so I saw no reason to delay with that. Thanks! I think we can improve bloom indexes in a separate patch so that they use pages_newly_deleted. > > Attached is v7, which now only has the final patch -- the optimization > that makes it possible for VACUUM to recycle pages that were newly > deleted during the same VACUUM operation. Still no real changes. > Again, I just wanted to keep CFBot happy. I haven't thought about or > improved this final patch recently, and it clearly needs more work to > be ready to commit. I've looked at the patch. The patch is straightforward and I agree with the direction. Here are some comments on the v7 patch. --- + /* Allocate _bt_newly_deleted_pages_recycle related information */ + vstate.ndeletedspace = 512; Maybe add a #define for the value 512? + for (int i = 0; i < vstate->ndeleted; i++) + { + BlockNumber blkno = vstate->deleted[i].blkno; + FullTransactionId safexid = vstate->deleted[i].safexid; + + if (!GlobalVisCheckRemovableFullXid(heapRel, safexid)) + break; + + RecordFreeIndexPage(rel, blkno); + stats->pages_free++; + } Should we use 'continue' instead of 'break'? Or can we sort the vstate->deleted array by full XID and leave 'break'? --- Currently, the patch checks only newly deleted pages for recyclability at the end of btvacuumscan. What do you think about the idea of also checking pages that were deleted by previous vacuums (i.e., pages already marked P_ISDELETED() but not BTPageIsRecyclable())? There is still a little hope that such pages become recyclable by the time we reach the end of btvacuumscan.
We will end up checking such pages twice (during btvacuumscan() and at the end of btvacuumscan()), but if the cost of collecting and checking pages is not high, it could probably expand the chance of recycling pages. I'm going to reply to the discussion of vacuum_cleanup_index_scale_factor in a separate mail. Or maybe it's better to start a new thread for that so as to get opinions from other hackers. It's no longer related to the subject. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Is Recovery actually paused?
At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar wrote in > On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi > wrote: > > The latest version applies (almost) cleanly to the current master and > > works fine. > > I don't have further comment on this. > > > > I'll wait for a day before marking this RfC in case anyone have > > further comments. > > Okay. Hearing nothing, done that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
RE: Parallel INSERT (INTO ... SELECT ...)
From: Hou, Zhijie/侯 志杰 > After doing some more tests on it (performance degradation will not happen > when source table is out of order). > I think we can say the performance degradation is related to the order of the > data in source table. ... > So, the order of the data's influence seems a normal phenomenon, I think it seems > we do not need to do anything about it (currently). > It seems better to mark it as todo which we can improve this in the future. > > (Since the performance degradation in parallel bitmap is because of the lock > in > _bt_search, It will not always happen when the target table already have data, > less race condition will happened when parallel insert into a evenly > distributed > btree). I think so, too. The slowness of parallel insert operations due to index page contention, and index bloat, would occur depending on the order of the index key values of source records. I guess other DBMSs exhibit a similar phenomenon, but I couldn't find such a description in the manual, whitepapers, or several books on Oracle. One relevant excerpt is the following. This is about parallel index build. Oracle tries to minimize page contention and index bloat. This is completely my guess, but they may do similar things in parallel INSERT SELECT, because Oracle holds an exclusive lock on the target table. SQL Server also acquires an exclusive lock. Maybe we can provide an option to do so in the future. https://docs.oracle.com/en/database/oracle/oracle-database/21/vldbg/parallel-exec-tips.html#GUID-08A08783-C243-4872-AFFA-56B603F1F0F5 -- Optimizing Performance by Creating Indexes in Parallel ... Multiple processes can work simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index serially. Parallel index creation works in much the same way as a table scan with an ORDER BY clause.
The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on a key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel execution coordinator simply concatenates the pieces (which are ordered) to form the final index. ... When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL value of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some extents might be trimmed, and the resulting index might be smaller than the requested 60 MB. -- IIRC, the current patch showed nice performance improvement in some (many?) patterns. So, I think it can be committed in PG 14, when it has addressed the plan cache issue that Amit Langote-san posed. I remember the following issues/comments are pending, but they are not blockers: 1. Insert operation is run serially when the target table has a foreign key, sequence or identity column. This can be added later based on the current design without requiring rework. That is, the current patch leaves no debt. (Personally, foreign key and sequence support will also be wanted in PG 14. We may try them in the last CF once the current patch is likely to be committable.) 2. There's a plausible reason for the performance variation and index bloat with the bitmap scan case. Ideally, we want to come up with a solution that can be incorporated in PG 15. Or, it may be one conclusion that we can't prevent performance degradation in all cases.
That may be one hidden reason why Oracle and SQL Server don't enable parallel DML by default. We can advise the user in the manual that parallel DML is not always faster than serial operation, so they should test performance by enabling and disabling parallel DML. Also, maybe we should honestly state that indexes can get a bit bigger after parallel insert than after serial insert, and advise the user to do REINDEX CONCURRENTLY if necessary. 3. The total time of parallel execution can get longer because of unbalanced work distribution among parallel workers. This seems to be an existing problem, so we can pursue the improvement later, hopefully before the release of PG 14. Does anyone see any problem with committing the current patch (after polishing it)? Regards Takayuki Tsunakawa
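[Editorial sketch] The advice above — test with parallel DML enabled and disabled — can be sketched as a before/after comparison. Table names are hypothetical, and the INSERT itself only parallelizes with the patch under discussion applied:

```sql
-- Serial baseline: forbid parallel workers for this session.
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE INSERT INTO target SELECT * FROM source;

-- Allow parallelism; with the patch, the plan may show parallel
-- INSERT+SELECT (or parallel SELECT only), as in Tang's test matrix.
SET max_parallel_workers_per_gather = 2;
EXPLAIN ANALYZE INSERT INTO target SELECT * FROM source;
```

Comparing the two EXPLAIN ANALYZE timings (and index sizes afterwards) is the kind of check a user would run before relying on parallel DML for a given workload.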