Re: [HACKERS] Logical decoding on standby

2017-06-26 Thread Craig Ringer
On 21 June 2017 at 17:30, sanyam jain  wrote:
> Hi,
> After changing
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> to
> sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
> I was facing another issue.
> On promotion of a cascaded server, ThisTimeLineID in the standby server
> having the logical slot becomes 0.
> Then I added a function call to GetStandbyFlushRecPtr in
> StartLogicalReplication, which updates ThisTimeLineID.
>
> After the above two changes timeline following is working, but I'm not sure
> whether this is correct or not. Could someone please clarify?

That's a reasonable thing to do, and again, I thought I did it in a
later revision, but apparently not (?). I've been working on other
things and have lost track of progress here a bit.

I'll check more closely.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-06-26 Thread Craig Ringer
On 21 June 2017 at 13:28, sanyam jain  wrote:
> Hi,
>
> In this patch, in walsender.c, sendTimeLineIsHistoric is set to true when
> state->currTLI and ThisTimeLineID are equal.
>
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
>
>
> Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less than
> ThisTimeLineID?

Correct, that was a bug. I thought it got fixed upthread though.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-06-26 Thread sanyam jain
Hi,
>After changing
>sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
>to
>sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
>I was facing another issue.
>On promotion of a cascaded server, ThisTimeLineID in the standby server
>having the logical slot becomes 0.
>Then I added a function call to GetStandbyFlushRecPtr in
>StartLogicalReplication, which updates ThisTimeLineID.
>
>After the above two changes timeline following is working, but I'm not sure
>whether this is correct or not. Could someone please clarify?


Could anyone with experience please confirm whether the steps I have taken
are correct?

Thanks,
Sanyam Jain


Re: [HACKERS] Logical decoding on standby

2017-06-21 Thread sanyam jain
Hi,
After changing
sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
to
sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;

I was facing another issue.
On promotion of a cascaded server, ThisTimeLineID in the standby server
having the logical slot becomes 0.
Then I added a function call to GetStandbyFlushRecPtr in
StartLogicalReplication, which updates ThisTimeLineID.

After the above two changes timeline following is working, but I'm not sure
whether this is correct or not. Could someone please clarify?

Thanks,
Sanyam Jain


Re: [HACKERS] Logical decoding on standby

2017-06-20 Thread sanyam jain
Hi,

In this patch, in walsender.c, sendTimeLineIsHistoric is set to true when
state->currTLI and ThisTimeLineID are equal.

sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;


Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less than
ThisTimeLineID?


When I applied the timeline following patch alone, pg_recvlogical quits in
the startup phase; but when I made the above change, pg_recvlogical works,
although timeline following doesn't.


Thanks,

Sanyam Jain


From: pgsql-hackers-ow...@postgresql.org <pgsql-hackers-ow...@postgresql.org> 
on behalf of Robert Haas <robertmh...@gmail.com>
Sent: Wednesday, April 5, 2017 3:25:50 PM
To: Andres Freund
Cc: Craig Ringer; Simon Riggs; Thom Brown; Michael Paquier; Petr Jelinek; 
PostgreSQL Hackers
Subject: Re: [HACKERS] Logical decoding on standby

On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>> On 5 April 2017 at 04:19, Andres Freund <and...@anarazel.de> wrote:
>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> >> I'm much happier with this. I'm still fixing some issues in the tests
>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> >> reviewed in their proper context now.
>> >
>> > To me this very clearly is too late for v10, and now should be moved to
>> > the next CF.
>>
>> I tend to agree that it's late in the piece. It's still worth cleaning
>> it up into a state ready for early pg11 though.
>
> Totally agreed.

Based on this exchange, marked as "Moved to next CF".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding on standby

2017-04-06 Thread Craig Ringer
On 5 April 2017 at 23:25, Robert Haas  wrote:
> On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund  wrote:
>> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>>> On 5 April 2017 at 04:19, Andres Freund  wrote:
>>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>>> >> I'm much happier with this. I'm still fixing some issues in the tests
>>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>>> >> reviewed in their proper context now.
>>> >
>>> > To me this very clearly is too late for v10, and now should be moved to
>>> > the next CF.
>>>
>>> I tend to agree that it's late in the piece. It's still worth cleaning
>>> it up into a state ready for early pg11 though.
>>
>> Totally agreed.
>
> Based on this exchange, marked as "Moved to next CF".

Yeah. Can't say I like it, but I have to agree.

We can get this rolling in early pg11, and that way we can hopefully get
support for it into logical replication too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 10:32 AM, Andres Freund  wrote:
> On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
>> On 5 April 2017 at 04:19, Andres Freund  wrote:
>> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> >> I'm much happier with this. I'm still fixing some issues in the tests
>> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> >> reviewed in their proper context now.
>> >
>> > To me this very clearly is too late for v10, and now should be moved to
>> > the next CF.
>>
>> I tend to agree that it's late in the piece. It's still worth cleaning
>> it up into a state ready for early pg11 though.
>
> Totally agreed.

Based on this exchange, marked as "Moved to next CF".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding on standby

2017-04-05 Thread Andres Freund
On 2017-04-05 17:18:24 +0800, Craig Ringer wrote:
> On 5 April 2017 at 04:19, Andres Freund  wrote:
> > On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> >> I'm much happier with this. I'm still fixing some issues in the tests
> >> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> >> reviewed in their proper context now.
> >
> > To me this very clearly is too late for v10, and now should be moved to
> > the next CF.
> 
> I tend to agree that it's late in the piece. It's still worth cleaning
> it up into a state ready for early pg11 though.

Totally agreed.

- Andres




Re: [HACKERS] Logical decoding on standby

2017-04-05 Thread Craig Ringer
On 5 April 2017 at 04:19, Andres Freund  wrote:
> On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
>> I'm much happier with this. I'm still fixing some issues in the tests
>> for 03 and tidying them up, but 03 should allow 01 and 02 to be
>> reviewed in their proper context now.
>
> To me this very clearly is too late for v10, and now should be moved to
> the next CF.

I tend to agree that it's late in the piece. It's still worth cleaning
it up into a state ready for early pg11 though.

I've just fixed an issue where hot_standby_feedback on a physical slot
could cause oldestCatalogXmin to go backwards. When the slot's
catalog_xmin was 0 and is being set for the first time the standby's
supplied catalog_xmin is trusted. To fix it, in
PhysicalReplicationSlotNewXmin when setting catalog_xmin from 0, clamp
the value to the master's GetOldestSafeDecodingTransactionId().

Tests are cleaned up and fixed.

This series adds full support for logical decoding on a standby.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 24e2baea15c4f435789c7fda5ddc9feae8a7012f Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH 1/3] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in a new xl_catalog_xmin_advance record,
emitted before vacuum or periodically by the bgwriter. WAL is only written if
the lowest catalog_xmin needed by any replication slot has advanced.
---
 src/backend/access/heap/rewriteheap.c   |   3 +-
 src/backend/access/rmgrdesc/xactdesc.c  |   9 ++
 src/backend/access/rmgrdesc/xlogdesc.c  |   3 +-
 src/backend/access/transam/varsup.c |  15 
 src/backend/access/transam/xact.c   |  36 
 src/backend/access/transam/xlog.c   |  23 -
 src/backend/postmaster/bgwriter.c   |   9 ++
 src/backend/replication/logical/decode.c|  12 +++
 src/backend/replication/walreceiver.c   |   2 +-
 src/backend/replication/walsender.c |  46 +-
 src/backend/storage/ipc/procarray.c | 134 ++--
 src/bin/pg_controldata/pg_controldata.c |   2 +
 src/include/access/transam.h|   5 ++
 src/include/access/xact.h   |  12 ++-
 src/include/catalog/pg_control.h|   1 +
 src/include/storage/procarray.h |   5 +-
 src/test/recovery/t/006_logical_decoding.pl |  90 +--
 17 files changed, 383 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 735f8c5..96ea163 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "xtop %u: ", xlrec->xtop);
 		xact_desc_assignment(buf, xlrec);
 	}
+	else if (info == XLOG_XACT_CATALOG_XMIN_ADV)
+	{
+		xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record);
+
+		appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin);
+	}
 }
 
 const char *
@@ -324,6 +330,9 @@ xact_identify(uint8 info)
 		case XLOG_XACT_ASSIGNMENT:
 			id = "ASSIGNMENT";
 			break;
+		case XLOG_XACT_CATALOG_XMIN_ADV:
+			id = "CATALOG_XMIN";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 5f07eb1..a66cfc6 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
-		 "oldest running xid %u; %s",
+		 "oldest running xid %u; oldest catalog xmin %u; %s",
 (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo,
 		 checkpoint->ThisTimeLineID,
 		 checkpoint->PrevTimeLineID,
@@ 

Re: [HACKERS] Logical decoding on standby

2017-04-04 Thread Andres Freund
On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

To me this very clearly is too late for v10, and now should be moved to
the next CF.

- Andres




Re: [HACKERS] Logical decoding on standby

2017-04-04 Thread Craig Ringer
On 4 April 2017 at 22:32, Craig Ringer  wrote:
> Hi all
>
> Here's the final set of three patches on top of what's already committed.
>
> The first is catalog_xmin logging, which is unchanged from the prior post.
>
> The 2nd is support for conflict with recovery, with changes that
> should address Andres's concerns there.
>
> The 3rd actually enables decoding on standby. Unlike the prior
> version, no attempt is made to check the walsender configuration for
> slot use, etc. The ugly code to try to mitigate races is also removed.
> Instead, if wal_level is logical the catalog_xmin sent by
> hot_standby_feedback is now the same as the xmin if there's no local
> slot holding it down. So we're always sending a catalog_xmin in
> feedback and we should always expect to have a valid local
> oldestCatalogXmin once hot_standby_feedback kicks in. This makes the
> race in slot creation no worse than the existing race between
> hot_standby_feedback establishment and the first queries run on a
> downstream, albeit with more annoying consequences. Apps can still
> ensure a slot created on standby is guaranteed safe and conflict-free
> by having a slot on the master first.
>
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

Dammit. Attached.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 9b8b1236eb32819430062031ff76750ed8bc1661 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH 1/3] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in a new xl_catalog_xmin_advance record,
emitted before vacuum or periodically by the bgwriter. WAL is only written if
the lowest catalog_xmin needed by any replication slot has advanced.
---
 src/backend/access/heap/rewriteheap.c   |   3 +-
 src/backend/access/rmgrdesc/xactdesc.c  |   9 ++
 src/backend/access/rmgrdesc/xlogdesc.c  |   3 +-
 src/backend/access/transam/varsup.c |  15 
 src/backend/access/transam/xact.c   |  36 
 src/backend/access/transam/xlog.c   |  23 -
 src/backend/postmaster/bgwriter.c   |   9 ++
 src/backend/replication/logical/decode.c|  12 +++
 src/backend/replication/walreceiver.c   |   2 +-
 src/backend/replication/walsender.c |   8 ++
 src/backend/storage/ipc/procarray.c | 134 ++--
 src/bin/pg_controldata/pg_controldata.c |   2 +
 src/include/access/transam.h|   5 ++
 src/include/access/xact.h   |  12 ++-
 src/include/catalog/pg_control.h|   1 +
 src/include/storage/procarray.h |   5 +-
 src/test/recovery/t/006_logical_decoding.pl |  90 +--
 17 files changed, 348 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 735f8c5..96ea163 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "xtop %u: ", xlrec->xtop);
 		xact_desc_assignment(buf, xlrec);
 	}
+	else if (info == XLOG_XACT_CATALOG_XMIN_ADV)
+	{
+		xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record);
+
+		appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin);
+	}
 }
 
 const char *
@@ -324,6 +330,9 @@ xact_identify(uint8 info)
 		case XLOG_XACT_ASSIGNMENT:
 			id = "ASSIGNMENT";
 			break;
+		case XLOG_XACT_CATALOG_XMIN_ADV:
+			id = "CATALOG_XMIN";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 5f07eb1..a66cfc6 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 "tli %u; prev tli %u; fpw %s; xid 

Re: [HACKERS] Logical decoding on standby

2017-04-04 Thread Craig Ringer
Hi all

Here's the final set of three patches on top of what's already committed.

The first is catalog_xmin logging, which is unchanged from the prior post.

The 2nd is support for conflict with recovery, with changes that
should address Andres's concerns there.

The 3rd actually enables decoding on standby. Unlike the prior
version, no attempt is made to check the walsender configuration for
slot use, etc. The ugly code to try to mitigate races is also removed.
Instead, if wal_level is logical the catalog_xmin sent by
hot_standby_feedback is now the same as the xmin if there's no local
slot holding it down. So we're always sending a catalog_xmin in
feedback and we should always expect to have a valid local
oldestCatalogXmin once hot_standby_feedback kicks in. This makes the
race in slot creation no worse than the existing race between
hot_standby_feedback establishment and the first queries run on a
downstream, albeit with more annoying consequences. Apps can still
ensure a slot created on standby is guaranteed safe and conflict-free
by having a slot on the master first.

I'm much happier with this. I'm still fixing some issues in the tests
for 03 and tidying them up, but 03 should allow 01 and 02 to be
reviewed in their proper context now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-04-03 Thread Craig Ringer
On 3 April 2017 at 15:27, Craig Ringer  wrote:
> On 3 April 2017 at 13:46, Craig Ringer  wrote:
>
>> OK, updated catalog_xmin logging patch attached.
>
> Ahem, that should be v5.

... and here's v6, which returns to the separate
xl_xact_catalog_xmin_advance approach.

pgindented.

This is what I favour proceeding with.

Now updating/amending recovery conflict patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 353e987584e22d268b8ab1c10c46d7e8c74ef552 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in a new xl_catalog_xmin_advance record,
emitted before vacuum or periodically by the bgwriter. WAL is only written if
the lowest catalog_xmin needed by any replication slot has advanced.
---
 src/backend/access/heap/rewriteheap.c   |   3 +-
 src/backend/access/rmgrdesc/xactdesc.c  |   9 ++
 src/backend/access/rmgrdesc/xlogdesc.c  |   3 +-
 src/backend/access/transam/varsup.c |  15 
 src/backend/access/transam/xact.c   |  36 
 src/backend/access/transam/xlog.c   |  20 -
 src/backend/commands/vacuum.c   |   9 ++
 src/backend/postmaster/bgwriter.c   |  10 +++
 src/backend/replication/logical/decode.c|  12 +++
 src/backend/replication/walreceiver.c   |   2 +-
 src/backend/replication/walsender.c |   8 ++
 src/backend/storage/ipc/procarray.c | 132 ++--
 src/bin/pg_controldata/pg_controldata.c |   2 +
 src/include/access/transam.h|   5 ++
 src/include/access/xact.h   |  12 ++-
 src/include/catalog/pg_control.h|   1 +
 src/include/storage/procarray.h |   5 +-
 src/test/recovery/t/006_logical_decoding.pl |  90 +--
 18 files changed, 353 insertions(+), 21 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 735f8c5..96ea163 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -297,6 +297,12 @@ xact_desc(StringInfo buf, XLogReaderState *record)
 		appendStringInfo(buf, "xtop %u: ", xlrec->xtop);
 		xact_desc_assignment(buf, xlrec);
 	}
+	else if (info == XLOG_XACT_CATALOG_XMIN_ADV)
+	{
+		xl_xact_catalog_xmin_advance *xlrec = (xl_xact_catalog_xmin_advance *) XLogRecGetData(record);
+
+		appendStringInfo(buf, "catalog_xmin %u", xlrec->new_catalog_xmin);
+	}
 }
 
 const char *
@@ -324,6 +330,9 @@ xact_identify(uint8 info)
 		case XLOG_XACT_ASSIGNMENT:
 			id = "ASSIGNMENT";
 			break;
+		case XLOG_XACT_CATALOG_XMIN_ADV:
+			id = "CATALOG_XMIN";
+			break;
 	}
 
 	return id;
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 5f07eb1..a66cfc6 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
-		 "oldest running xid %u; %s",
+		 "oldest running xid %u; oldest catalog xmin %u; %s",
 (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo,
 		 checkpoint->ThisTimeLineID,
 		 checkpoint->PrevTimeLineID,
@@ -63,6 +63,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 checkpoint->oldestCommitTsXid,
 		 checkpoint->newestCommitTsXid,
 		 checkpoint->oldestActiveXid,
+		 checkpoint->oldestCatalogXmin,
  (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
 	}
 	else if (info == XLOG_NEXTOID)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 5efbfbd..ffabf1c 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -414,6 +414,21 

Re: [HACKERS] Logical decoding on standby

2017-04-03 Thread Craig Ringer
On 3 April 2017 at 13:46, Craig Ringer  wrote:

> OK, updated catalog_xmin logging patch attached.

Ahem, that should be v5.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 7f742f582e1f6f8f23c4e9d78cd0298180e5387c Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in the next xl_running_xacts records, so
vacuum of catalogs may be held back up to 10 seconds when a replication slot
with catalog_xmin is holding down the global catalog_xmin.
---
 src/backend/access/heap/rewriteheap.c   |  3 +-
 src/backend/access/rmgrdesc/standbydesc.c   |  5 +-
 src/backend/access/rmgrdesc/xlogdesc.c  |  3 +-
 src/backend/access/transam/varsup.c | 15 +
 src/backend/access/transam/xlog.c   | 26 +++-
 src/backend/postmaster/bgwriter.c   |  2 +-
 src/backend/replication/slot.c  |  2 +-
 src/backend/replication/walreceiver.c   |  2 +-
 src/backend/replication/walsender.c |  8 +++
 src/backend/storage/ipc/procarray.c | 61 ---
 src/backend/storage/ipc/standby.c   | 60 +--
 src/bin/pg_controldata/pg_controldata.c |  2 +
 src/include/access/transam.h|  5 ++
 src/include/catalog/pg_control.h|  1 +
 src/include/storage/procarray.h |  3 +-
 src/include/storage/standby.h   |  8 ++-
 src/include/storage/standbydefs.h   |  1 +
 src/test/recovery/t/006_logical_decoding.pl | 93 +++--
 18 files changed, 269 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, &logical_xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 278546a..4aaae59 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -21,10 +21,11 @@ standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec)
 {
 	int			i;
 
-	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u",
+	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u oldestCatalogXmin %u",
 	 xlrec->nextXid,
 	 xlrec->latestCompletedXid,
-	 xlrec->oldestRunningXid);
+	 xlrec->oldestRunningXid,
+	 xlrec->oldestCatalogXmin);
 	if (xlrec->xcnt > 0)
 	{
 		appendStringInfo(buf, "; %d xacts:", xlrec->xcnt);
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 5f07eb1..a66cfc6 100644
--- a/src/backend/access/rmgrdesc/xlogdesc.c
+++ b/src/backend/access/rmgrdesc/xlogdesc.c
@@ -47,7 +47,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 "tli %u; prev tli %u; fpw %s; xid %u:%u; oid %u; multi %u; offset %u; "
 		 "oldest xid %u in DB %u; oldest multi %u in DB %u; "
 		 "oldest/newest commit timestamp xid: %u/%u; "
-		 "oldest running xid %u; %s",
+		 "oldest running xid %u; oldest catalog xmin %u; %s",
 (uint32) (checkpoint->redo >> 32), (uint32) checkpoint->redo,
 		 checkpoint->ThisTimeLineID,
 		 checkpoint->PrevTimeLineID,
@@ -63,6 +63,7 @@ xlog_desc(StringInfo buf, XLogReaderState *record)
 		 checkpoint->oldestCommitTsXid,
 		 checkpoint->newestCommitTsXid,
 		 checkpoint->oldestActiveXid,
+		 checkpoint->oldestCatalogXmin,
  (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
 	}
 	else if (info == XLOG_NEXTOID)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 5efbfbd..ffabf1c 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -414,6 +414,21 @@ SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid)
 	}
 }
 
+/*
+ * Set the global oldest catalog_xmin used to determine when tuples
+ * may be removed from catalogs and user-catalogs accessible from logical
+ * decoding.
+ *
+ * Only to be called from the startup process or from LogCurrentRunningXacts()
+ * which ensures the 

Re: [HACKERS] Logical decoding on standby

2017-04-02 Thread Craig Ringer
On 31 March 2017 at 12:49, Craig Ringer  wrote:
> On 31 March 2017 at 01:16, Andres Freund  wrote:
>>> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
>>>   SetTransactionIdLimit(checkPoint.oldestXid, 
>>> checkPoint.oldestXidDB);
>>>
>>>   /*
>>> +  * There can be no concurrent writers to oldestCatalogXmin 
>>> during
>>> +  * recovery, so no need to take ProcArrayLock.
>>> +  */
>>> + ShmemVariableCache->oldestCatalogXmin = 
>>> checkPoint.oldestCatalogXmin;
>>
>> s/writers/writes/?
>
> I meant writers, i.e. nothing else can be writing to it. But writes works too.
>

Fixed.

>>> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>>>
>>>   ReplicationSlotsComputeRequiredXmin(true);
>>>
>>> + /*
>>> +  * If this is the first slot created on the master we won't have a
>>> +  * persistent record of the oldest safe xid for historic snapshots 
>>> yet.
>>> +  * Force one to be recorded so that when we go to replay from this 
>>> slot we
>>> +  * know it's safe.
>>> +  */
>>> + force_standby_snapshot =
>>> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);
>>
>> s/first slot/first logical slot/. Also, the reference to master doesn't
>> seem right.
>
> Unsure what you mean re reference to master not seeming right.
>
> If oldestCatalogXmin is 0 we'll ERROR when trying to start decoding
> from the new slot, so we need to make sure it gets advanced once we've
> decided on our starting catalog_xmin.

Moved to next patch, will address there.

>>>   LWLockRelease(ProcArrayLock);
>>>
>>> + /* Update ShmemVariableCache->oldestCatalogXmin */
>>> + if (force_standby_snapshot)
>>> + LogStandbySnapshot();
>>
>> The comment and code don't quite square to me - it's far from obvious
>> that LogStandbySnapshot does something like that. I'd even say it's a
>> bad idea to have it do that.
>
> So you prefer the prior approach with separate xl_catalog_xmin advance 
> records?
>
> I don't have much preference; I liked the small code reduction of
> Simon's proposed approach, but it landed up being a bit awkward in
> terms of ordering and locking. I don't think catalog_xmin tracking is
> really closely related to the standby snapshot stuff and it feels a
> bit like it's a tacked-on afterthought where it is now.

This code moved to next patch. But we do need to agree on the best approach.

If we're not going to force a standby snapshot here, then it's
probably better to use the separate xl_catalog_xmin design instead.

>>>   /*
>>>* tell the snapshot builder to only assemble snapshot once reaching 
>>> the
>>>* running_xact's record with the respective xmin.
>>> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>>>   start_lsn = slot->data.confirmed_flush;
>>>   }
>>>
>>> + EnsureActiveLogicalSlotValid();
>>
>> This seems like it should be in a separate patch, and seperately
>> reviewed. It's code that's currently unreachable (and thus untestable).
>
> It is reached and does run; those checks execute whenever a
> non-initial decoding context is created on master or replica.

Again, moved to next patch.

>>>   /*
>>> +  * Update our knowledge of the oldest xid we can safely create 
>>> historic
>>> +  * snapshots for.
>>> +  *
>>> +  * There can be no concurrent writers to oldestCatalogXmin during
>>> +  * recovery, so no need to take ProcArrayLock.
>>
>> By now I think is pretty flawed logic, because there can be concurrent
>> readers, that need to be safe against oldestCatalogXmin advancing
>> concurrently.
>
> You're right, we'll need a lock or suitable barriers here to ensure
> that slot conflict with recovery and startup of new decoding sessions
> doesn't see outdated values. This would be the peer of the
> pg_memory_barrier() above. Or could just take a lock; there's enough
> other locking activity in redo that it should be fine.

Now takes ProcArrayLock briefly.

oldestCatalogXmin is also used in GetOldestSafeDecodingTransactionId,
and there we want to prevent it from being advanced. But on further
thought, relying on oldestCatalogXmin there is actually unsafe; on the
master, we might've already logged our intent to advance it to some
greater value of procArray->replication_slot_catalog_xmin and will do
so as ProcArrayLock is released. On standby we're also better off
relying on procArray->replication_slot_catalog_xmin since that's what
we'll be sending in feedback.

Went back to using replication_slot_catalog_xmin here, with comment

 *
 * We don't use ShmemVariableCache->oldestCatalogXmin here because another
 * backend may have already logged its intention to advance it to a higher
 * value (still <= replication_slot_catalog_xmin) and just be waiting on
 * ProcArrayLock to actually apply the change. On a standby
 * 

Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Craig Ringer
On 31 March 2017 at 12:49, Craig Ringer  wrote:
> On 31 March 2017 at 01:16, Andres Freund  wrote:

>> The comment and code don't quite square to me - it's far from obvious
>> that LogStandbySnapshot does something like that. I'd even say it's a
>> bad idea to have it do that.
>
> So you prefer the prior approach with separate xl_catalog_xmin advance 
> records?

Alternately, we can record the creation timeline on slots, so we know
if there's been a promotion. It wouldn't make sense to do this if that
were the only use of timelines on slots. But I'm aware you'd rather
keep slots timeline-agnostic and I tend to agree.

Anyway, per your advice will split out the validation step.

(I'd like replication origins to be able to track time alongside lsn,
and to pass the timeline of each LSN to output plugin callbacks so we
can detect if a physical promotion results in us backtracking down a
fork in history, but that doesn't affect slots.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Craig Ringer
On 31 March 2017 at 01:16, Andres Freund  wrote:
>> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
>>   SetTransactionIdLimit(checkPoint.oldestXid, 
>> checkPoint.oldestXidDB);
>>
>>   /*
>> +  * There can be no concurrent writers to oldestCatalogXmin 
>> during
>> +  * recovery, so no need to take ProcArrayLock.
>> +  */
>> + ShmemVariableCache->oldestCatalogXmin = 
>> checkPoint.oldestCatalogXmin;
>
> s/writers/writes/?

I meant writers, i.e. nothing else can be writing to it. But writes works too.


>> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>>
>>   ReplicationSlotsComputeRequiredXmin(true);
>>
>> + /*
>> +  * If this is the first slot created on the master we won't have a
>> +  * persistent record of the oldest safe xid for historic snapshots yet.
>> +  * Force one to be recorded so that when we go to replay from this 
>> slot we
>> +  * know it's safe.
>> +  */
>> + force_standby_snapshot =
>> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);
>
> s/first slot/first logical slot/. Also, the reference to master doesn't
> seem right.

Unsure what you mean re reference to master not seeming right.

If oldestCatalogXmin is 0 we'll ERROR when trying to start decoding
from the new slot so we need to make sure it gets advanced once we've
decided on our starting catalog_xmin.

>>   LWLockRelease(ProcArrayLock);
>>
>> + /* Update ShmemVariableCache->oldestCatalogXmin */
>> + if (force_standby_snapshot)
>> + LogStandbySnapshot();
>
> The comment and code don't quite square to me - it's far from obvious
> that LogStandbySnapshot does something like that. I'd even say it's a
> bad idea to have it do that.

So you prefer the prior approach with separate xl_catalog_xmin advance records?

I don't have much preference; I liked the small code reduction of
Simon's proposed approach, but it landed up being a bit awkward in
terms of ordering and locking. I don't think catalog_xmin tracking is
really closely related to the standby snapshot stuff and it feels a
bit like it's a tacked-on afterthought where it is now.

>>   /*
>>* tell the snapshot builder to only assemble snapshot once reaching 
>> the
>>* running_xact's record with the respective xmin.
>> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>>   start_lsn = slot->data.confirmed_flush;
>>   }
>>
>> + EnsureActiveLogicalSlotValid();
>
> This seems like it should be in a separate patch, and seperately
> reviewed. It's code that's currently unreachable (and thus untestable).

It is reached and does run; those checks execute whenever a
non-initial decoding context is created on master or replica.

The failure case is reachable if a replica has hot_standby_feedback
off or it's not using a physical slot and loses its connection. If
promoted, any slot existing on that replica (from a file system level
copy when the replica was created) will fail. I agree it's contrived
since we can't create and maintain slots on replicas, which is the
main point of it.


>> +/*
>> + * Test to see if the active logical slot is usable.
>> + */
>> +static void
>> +EnsureActiveLogicalSlotValid(void)
>> +{
>> + TransactionId shmem_catalog_xmin;
>> +
>> + Assert(MyReplicationSlot != NULL);
>> +
>> + /*
>> +  * A logical slot can become unusable if we're doing logical decoding 
>> on a
>> +  * standby or using a slot created before we were promoted from standby
>> +  * to master.
>
> Neither of those is currently possible...

Right. Because it's foundations for decoding on standby.

>> If the master advanced its global catalog_xmin past the
>> +  * threshold we need it could've removed catalog tuple versions that
>> +  * we'll require to start decoding at our restart_lsn.
>> +  *
>> +  * We need a barrier so that if we decode in recovery on a standby we
>> +  * don't allow new decoding sessions to start after redo has advanced
>> +  * the threshold.
>> +  */
>> + if (RecoveryInProgress())
>> + pg_memory_barrier();
>
> I don't think this is a meaningful locking protocol.  It's a bad idea to
> use lock-free programming without need, especially when the concurrency
> protocol isn't well defined.

Yeah. The intended interaction is with recovery conflict on standby
which doesn't look likely to land in this release due to patch
split/cleanup etc. (Not a complaint).

Better to just take a brief shared ProcArrayLock.

>> diff --git a/src/backend/replication/walsender.c 
>> b/src/backend/replication/walsender.c
>> index cfc3fba..cdc5f95 100644
>> --- a/src/backend/replication/walsender.c
>> +++ b/src/backend/replication/walsender.c
>> @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
>>* be energy wasted - the worst lost information can do here is give us
>> 

Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Craig Ringer
On 31 March 2017 at 01:16, Andres Freund  wrote:
> On 2017-03-29 08:01:34 +0800, Craig Ringer wrote:
>> On 28 March 2017 at 23:22, Andres Freund  wrote:
>>
>> >> --- a/doc/src/sgml/protocol.sgml
>> >> +++ b/doc/src/sgml/protocol.sgml
>> >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
>> >>   
>> >>Drops a replication slot, freeing any reserved server-side 
>> >> resources. If
>> >>the slot is currently in use by an active connection, this command 
>> >> fails.
>> >> +  If the slot is a logical slot that was created in a database other 
>> >> than
>> >> +  the database the walsender is connected to, this command fails.
>> >>   
>> >>   
>> >>
>> >
>> > Shouldn't the docs in the drop database section about this?
>>
>> DROP DATABASE doesn't really discuss all the resources it drops, but
>> I'm happy to add mention of replication slots handling.
>
> I don't think that's really comparable, because the other things aren't
> global objects, which replication slots are.

Fine by me.

Patch fix-slot-drop-docs.patch, upthread, adds the passage

+
+  
+   Active logical
+   replication slots count as connections and will prevent a
+   database from being dropped. Inactive slots will be automatically
+   dropped when the database is dropped.
+  

to the notes section of the DROP DATABASE docs; that should do the
trick. I'm not convinced it's worth going into the exceedingly
unlikely race with concurrent slot drop, and we don't seem to in other
places in the docs, like the race you mentioned with connecting to a
db as it's being dropped.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
On 2017-03-30 19:40:08 +0100, Simon Riggs wrote:
> On 30 March 2017 at 18:16, Andres Freund  wrote:
> 
> >>  /*
> >>   * Each page of XLOG file has a header like this:
> >>   */
> >> -#define XLOG_PAGE_MAGIC 0xD097   /* can be used as WAL version 
> >> indicator */
> >> +#define XLOG_PAGE_MAGIC 0xD100   /* can be used as WAL version 
> >> indicator */
> >
> > We normally only advance this by one, it's not tied to the poistgres 
> > version.
> 
> That was my addition. I rounded it up cos this is release 10. No biggie.

We'll probably upgrade that more than once again this release...


> (Poistgres? Is that the Manhattan spelling?)

Tiredness spelling ;)


> We've been redesigning the mechanisms for 2 years now, so it seems
> unlikely that further redesign can be required.

I don't think that's true *at all* - the mechanism was previously
fundamentally different.

The whole topic has largely seen activity shortly before the code
freeze, both last time round and now.  I don't think it's surprising
that it thus doesn't end up being ready.


> If it is required,
> this patch is fairly low touch and change is possible later,
> incremental development etc. As regards overhead, this adds a small
> amount of time to a background process executed every 10 secs,
> generates no new WAL records.
> 
> So I don't see any reason not to commit this feature, after the minor
> corrections.

It doesn't have any benefit on its own, the locking model doesn't seem
fully there.  I don't see much reason to get this in before the release.


- Andres




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 18:16, Andres Freund  wrote:

>>  /*
>>   * Each page of XLOG file has a header like this:
>>   */
>> -#define XLOG_PAGE_MAGIC 0xD097   /* can be used as WAL version 
>> indicator */
>> +#define XLOG_PAGE_MAGIC 0xD100   /* can be used as WAL version 
>> indicator */
>
> We normally only advance this by one, it's not tied to the poistgres version.

That was my addition. I rounded it up cos this is release 10. No biggie.

(Poistgres? Is that the Manhattan spelling?)

> I'm sorry, but to me this patch isn't ready.  I'm also doubtful that it
> makes a whole lot of sense on its own, without having finished the
> design for decoding on a standby - it seems quite likely that we'll have
> to redesign the mechanisms in here a bit for that.  For 10 this seems to
> do nothing but add overhead?

I'm sure we can reword the comments.

We've been redesigning the mechanisms for 2 years now, so it seems
unlikely that further redesign can be required. If it is required,
this patch is fairly low touch and change is possible later,
incremental development etc. As regards overhead, this adds a small
amount of time to a background process executed every 10 secs,
generates no new WAL records.

So I don't see any reason not to commit this feature, after the minor
corrections.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
On 2017-03-29 08:01:34 +0800, Craig Ringer wrote:
> On 28 March 2017 at 23:22, Andres Freund  wrote:
> 
> >> --- a/doc/src/sgml/protocol.sgml
> >> +++ b/doc/src/sgml/protocol.sgml
> >> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
> >>   
> >>Drops a replication slot, freeing any reserved server-side 
> >> resources. If
> >>the slot is currently in use by an active connection, this command 
> >> fails.
> >> +  If the slot is a logical slot that was created in a database other 
> >> than
> >> +  the database the walsender is connected to, this command fails.
> >>   
> >>   
> >>
> >
> > Shouldn't the docs in the drop database section about this?
> 
> DROP DATABASE doesn't really discuss all the resources it drops, but
> I'm happy to add mention of replication slots handling.

I don't think that's really comparable, because the other things aren't
global objects, which replication slots are.

- Andres




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
>   SetTransactionIdLimit(checkPoint.oldestXid, 
> checkPoint.oldestXidDB);
>  
>   /*
> +  * There can be no concurrent writers to oldestCatalogXmin 
> during
> +  * recovery, so no need to take ProcArrayLock.
> +  */
> + ShmemVariableCache->oldestCatalogXmin = 
> checkPoint.oldestCatalogXmin;

s/writers/writes/?

> @@ -9731,6 +9748,15 @@ xlog_redo(XLogReaderState *record)
> 
> checkPoint.oldestXid))
>   SetTransactionIdLimit(checkPoint.oldestXid,
> 
> checkPoint.oldestXidDB);
> +
> + /*
> +  * There can be no concurrent writers to oldestCatalogXmin 
> during
> +  * recovery, so no need to take ProcArrayLock.
> +  */
> + if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin,
> + 
> checkPoint.oldestCatalogXmin)
> + ShmemVariableCache->oldestCatalogXmin = 
> checkPoint.oldestCatalogXmin;

dito.



> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>  
>   ReplicationSlotsComputeRequiredXmin(true);
>  
> + /*
> +  * If this is the first slot created on the master we won't have a
> +  * persistent record of the oldest safe xid for historic snapshots yet.
> +  * Force one to be recorded so that when we go to replay from this slot 
> we
> +  * know it's safe.
> +  */
> + force_standby_snapshot =
> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);

s/first slot/first logical slot/. Also, the reference to master doesn't
seem right.


>   LWLockRelease(ProcArrayLock);
>  
> + /* Update ShmemVariableCache->oldestCatalogXmin */
> + if (force_standby_snapshot)
> + LogStandbySnapshot();

The comment and code don't quite square to me - it's far from obvious
that LogStandbySnapshot does something like that. I'd even say it's a
bad idea to have it do that.


>   /*
>* tell the snapshot builder to only assemble snapshot once reaching the
>* running_xact's record with the respective xmin.
> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
>   start_lsn = slot->data.confirmed_flush;
>   }
>  
> + EnsureActiveLogicalSlotValid();

This seems like it should be in a separate patch, and seperately
reviewed. It's code that's currently unreachable (and thus untestable).


> +/*
> + * Test to see if the active logical slot is usable.
> + */
> +static void
> +EnsureActiveLogicalSlotValid(void)
> +{
> + TransactionId shmem_catalog_xmin;
> +
> + Assert(MyReplicationSlot != NULL);
> +
> + /*
> +  * A logical slot can become unusable if we're doing logical decoding 
> on a
> +  * standby or using a slot created before we were promoted from standby
> +  * to master.

Neither of those is currently possible...


> If the master advanced its global catalog_xmin past the
> +  * threshold we need it could've removed catalog tuple versions that
> +  * we'll require to start decoding at our restart_lsn.
> +  *
> +  * We need a barrier so that if we decode in recovery on a standby we
> +  * don't allow new decoding sessions to start after redo has advanced
> +  * the threshold.
> +  */
> + if (RecoveryInProgress())
> + pg_memory_barrier();

I don't think this is a meaningful locking protocol.  It's a bad idea to
use lock-free programming without need, especially when the concurrency
protocol isn't well defined.  With what other barrier does this pair
with?  What prevents the data being out of date by the time we actually
do the check?


> diff --git a/src/backend/replication/walsender.c 
> b/src/backend/replication/walsender.c
> index cfc3fba..cdc5f95 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
>* be energy wasted - the worst lost information can do here is give us
>* wrong information in a statistics view - we'll just potentially be 
> more
>* conservative in removing files.
> +  *
> +  * We don't have to do any effective_xmin / effective_catalog_xmin 
> testing
> +  * here either, like for LogicalConfirmReceivedLocation. If we received
> +  * the xmin and catalog_xmin from downstream replication slots we know 
> they
> +  * were already confirmed there,
>*/
>  }

This comment reads as if LogicalConfirmReceivedLocation had
justification for not touching / checking catalog_xmin. But it does.



>   /*
> +  * Update our knowledge of the oldest xid we can safely create historic
> +  * snapshots for.
> +  *
> +  

Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 15:27, Andres Freund  wrote:
> On 2017-03-30 15:26:02 +0100, Simon Riggs wrote:
>> On 30 March 2017 at 09:07, Craig Ringer  wrote:
>>
>> > Attached.
>>
>> * Cleaned up in 3 places
>> * Added code for faked up RunningTransactions in xlog.c
>> * Ensure catalog_xmin doesn't go backwards
>>
>> All else looks good. Comments before commit?
>
> Can you give me till after lunch?

Sure, np

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Andres Freund
On 2017-03-30 15:26:02 +0100, Simon Riggs wrote:
> On 30 March 2017 at 09:07, Craig Ringer  wrote:
> 
> > Attached.
> 
> * Cleaned up in 3 places
> * Added code for faked up RunningTransactions in xlog.c
> * Ensure catalog_xmin doesn't go backwards
> 
> All else looks good. Comments before commit?

Can you give me till after lunch?

- Andres




Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Simon Riggs
On 30 March 2017 at 09:07, Craig Ringer  wrote:

> Attached.

* Cleaned up in 3 places
* Added code for faked up RunningTransactions in xlog.c
* Ensure catalog_xmin doesn't go backwards

All else looks good. Comments before commit?

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


log-catalog-xmin-advances-v4.patch
Description: Binary data



Re: [HACKERS] Logical decoding on standby

2017-03-30 Thread Craig Ringer
On 30 March 2017 at 11:34, Craig Ringer  wrote:
> On 29 March 2017 at 23:13, Simon Riggs  wrote:
>> On 29 March 2017 at 10:17, Craig Ringer  wrote:
>>> On 29 March 2017 at 16:44, Craig Ringer  wrote:
>>>
 * Split oldestCatalogXmin tracking into separate patch
>>>
>>> Regarding this, Simon raised concerns about xlog volume here.
>>>
>>> It's pretty negligible.
>>>
>>> We only write a new record when a vacuum runs after catalog_xmin
>>> advances on the slot with the currently-lowest catalog_xmin (or, if
>>> vacuum doesn't run reasonably soon, when the bgworker next looks).
>>
>> I'd prefer to slow things down a little, not be so eager.
>>
>> If we hold back update of the catalog_xmin until when we run
>> GetRunningTransactionData() we wouldn't need to produce any WAL
>> records at all AND we wouldn't need to have VACUUM do
>> UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra
>> task.
>>
>> That would also make this patch about half the length it is.
>>
>> Let me know what you think.
>
> Good idea.
>
> We can always add a heuristic later to make xl_running_xacts get
> emitted more often at high transaction rates if it's necessary.
>
> Patch coming soon.

Attached.

A bit fiddlier than expected, but I like the result more.

In the process I identified an issue with both the prior patch and
this one where we don't check slot validity for slots that existed on
standby prior to promotion of standby to master. We were just assuming
that being the master was good enough, since it controls
replication_slot_catalog_xmin, but that's not true for pre-existing
slots.

Fixed by forcing an update of the persistent safe catalog xmin after the
first slot is created on the master, which is now done with an
immediate LogStandbySnapshot() after assigning the slot's
catalog_xmin.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 0df4f4ae04f8d37c623d3a533699966c3cc0479a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH v2] Log catalog_xmin advances before removing catalog tuples

Before advancing the effective catalog_xmin we use to remove old catalog
tuple versions, make sure it is written to WAL. This allows standbys
to know the oldest xid they can safely create a historic snapshot for.
They can then refuse to start decoding from a slot or raise a recovery
conflict.

The catalog_xmin advance is logged in the next xl_running_xacts record, so
vacuum of catalogs may be held back by up to 10 seconds when a replication
slot with a catalog_xmin is holding down the global catalog_xmin.
---
 src/backend/access/heap/rewriteheap.c   |  3 +-
 src/backend/access/rmgrdesc/standbydesc.c   |  5 ++-
 src/backend/access/transam/varsup.c |  1 -
 src/backend/access/transam/xlog.c   | 26 ++-
 src/backend/replication/logical/logical.c   | 54 +++
 src/backend/replication/walreceiver.c   |  2 +-
 src/backend/replication/walsender.c | 13 ++
 src/backend/storage/ipc/procarray.c | 68 +++--
 src/backend/storage/ipc/standby.c   | 25 +++
 src/bin/pg_controldata/pg_controldata.c |  2 +
 src/include/access/transam.h| 11 +
 src/include/catalog/pg_control.h|  1 +
 src/include/storage/procarray.h |  3 +-
 src/include/storage/standby.h   |  6 +++
 src/include/storage/standbydefs.h   |  1 +
 src/test/recovery/t/006_logical_decoding.pl | 15 ++-
 16 files changed, 214 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index d7f65a5..d1400ec 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
 	if (!state->rs_logical_rewrite)
 		return;
 
-	ProcArrayGetReplicationSlotXmin(NULL, _xmin);
+	/* Use oldestCatalogXmin here */
+	ProcArrayGetReplicationSlotXmin(NULL, _xmin, NULL);
 
 	/*
 	 * If there are no logical slots in progress we don't need to do anything,
diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 278546a..4aaae59 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -21,10 +21,11 @@ standby_desc_running_xacts(StringInfo buf, xl_running_xacts *xlrec)
 {
 	int			i;
 
-	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u",
+	appendStringInfo(buf, "nextXid %u latestCompletedXid %u oldestRunningXid %u oldestCatalogXmin %u",
 	 xlrec->nextXid,
 	 xlrec->latestCompletedXid,
-	 xlrec->oldestRunningXid);
+	 xlrec->oldestRunningXid,
+	 xlrec->oldestCatalogXmin);
 	if (xlrec->xcnt > 0)
 	{
 		

Re: [HACKERS] Logical decoding on standby

2017-03-29 Thread Craig Ringer
On 29 March 2017 at 23:13, Simon Riggs  wrote:
> On 29 March 2017 at 10:17, Craig Ringer  wrote:
>> On 29 March 2017 at 16:44, Craig Ringer  wrote:
>>
>>> * Split oldestCatalogXmin tracking into separate patch
>>
>> Regarding this, Simon raised concerns about xlog volume here.
>>
>> It's pretty negligible.
>>
>> We only write a new record when a vacuum runs after catalog_xmin
>> advances on the slot with the currently-lowest catalog_xmin (or, if
>> vacuum doesn't run reasonably soon, when the bgworker next looks).
>
> I'd prefer to slow things down a little, not be so eager.
>
> If we hold back update of the catalog_xmin until when we run
> GetRunningTransactionData() we wouldn't need to produce any WAL
> records at all AND we wouldn't need to have VACUUM do
> UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra
> task.
>
> That would also make this patch about half the length it is.
>
> Let me know what you think.

Good idea.

We can always add a heuristic later to make xl_running_xacts get
emitted more often at high transaction rates if it's necessary.

Patch coming soon.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-29 Thread Simon Riggs
On 29 March 2017 at 10:17, Craig Ringer  wrote:
> On 29 March 2017 at 16:44, Craig Ringer  wrote:
>
>> * Split oldestCatalogXmin tracking into separate patch
>
> Regarding this, Simon raised concerns about xlog volume here.
>
> It's pretty negligible.
>
> We only write a new record when a vacuum runs after catalog_xmin
> advances on the slot with the currently-lowest catalog_xmin (or, if
> vacuum doesn't run reasonably soon, when the bgworker next looks).

I'd prefer to slow things down a little, not be so eager.

If we hold back update of the catalog_xmin until when we run
GetRunningTransactionData() we wouldn't need to produce any WAL
records at all AND we wouldn't need to have VACUUM do
UpdateOldestCatalogXmin(). Bgwriter wouldn't need to perform an extra
task.

That would also make this patch about half the length it is.

Let me know what you think.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-29 Thread Craig Ringer
On 29 March 2017 at 16:44, Craig Ringer  wrote:

> * Split oldestCatalogXmin tracking into separate patch

Regarding this, Simon raised concerns about xlog volume here.

It's pretty negligible.

We only write a new record when a vacuum runs after catalog_xmin
advances on the slot with the currently-lowest catalog_xmin (or, if
vacuum doesn't run reasonably soon, when the bgworker next looks).

So at worst on a fairly slow moving system or one with a super high
vacuum rate we'll write one per commit. But in most cases we'll write
a lot fewer than that. When running t/006_logical_decoding.pl for
example:

$ ../../../src/bin/pg_waldump/pg_waldump
tmp_check/data_master_daPa/pgdata/pg_wal/00010001  |
grep CATALOG
rmgr: Transaction len (rec/tot):  4/30, tx:  0, lsn:
0/01648D50, prev 0/01648D18, desc: CATALOG_XMIN catalog_xmin 555
rmgr: Transaction len (rec/tot):  4/30, tx:  0, lsn:
0/0164C840, prev 0/0164C378, desc: CATALOG_XMIN catalog_xmin 0
pg_waldump: FATAL:  error in WAL record at 0/16BBF10: invalid record
length at 0/16BBF88: wanted 24, got 0


and of course, none at all unless you use logical decoding.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-29 Thread Craig Ringer
On 29 March 2017 at 08:11, Craig Ringer  wrote:
> On 29 March 2017 at 08:01, Craig Ringer  wrote:
>
>> I just notice that I failed to remove the docs changes regarding
>> dropping slots becoming db-specific, so I'll post a follow-up for that
>> in a sec.
>
> Attached.

... and here's the next in the patch series. Both this and the
immediately prior minor patch fix-drop-slot-docs.patch are pending
now.


Notable changes in this patch since review:

* Split oldestCatalogXmin tracking into separate patch

* Critically, fix use of procArray->replication_slot_catalog_xmin in
GetSnapshotData's setting of RecentGlobalXmin and RecentGlobalDataXmin
so it instead uses ShmemVariableCache->oldestCatalogXmin . This
could've led to tuples newer than oldestCatalogXmin being removed.

* Memory barrier in UpdateOldestCatalogXmin and SetOldestCatalogXmin.
It still does a pre-check before deciding if it needs to take
ProcArrayLock, recheck, and advance, since we don't want to
unnecessarily contest ProcArrayLock.

* Remove unnecessary volatile usage (retained in
UpdateOldestCatalogXmin due to barrier)

* Remove unnecessary test for XLogInsertAllowed() in XactLogCatalogXminUpdate

* EnsureActiveLogicalSlotValid(void)  - add (void)

* pgidented changes in this diff; have left unrelated changes alone




Re:

> what does
>
> +   TransactionId oldestCatalogXmin; /* oldest xid where complete catalog 
> state
> + * 
> is guaranteed to still exist */
>
> mean?  I complained about the overall justification in the commit
> already, but looking at this commit alone, the justification for this
> part of the change is quite hard to understand.

The patch now contains

TransactionId oldestCatalogXmin; /* oldest xid it is guaranteed to be safe
  * to create a historic snapshot for; see
  * also
  * procArray->replication_slot_catalog_xmin
  * */

which I think is an improvement.

I've also sought to explain the purpose of this change better with

/*
 * If necessary, copy the current catalog_xmin needed by replication slots to
 * the effective catalog_xmin used for dead tuple removal and write a WAL
 * record recording the change.
 *
 * This allows standbys to know the oldest xid for which it is safe to create
 * a historic snapshot for logical decoding. VACUUM or other cleanup may have
 * removed catalog tuple versions needed to correctly decode transactions older
 * than this threshold. Standbys can use this information to cancel conflicting
 * decoding sessions and invalidate slots that need discarded information.
 *
 * (We can't use the transaction IDs in WAL records emitted by VACUUM etc for
 * this, since they don't identify the relation as a catalog or not.  Nor can a
 * standby look up the relcache to get the Relation for the affected
 * relfilenode to check if it is a catalog. The standby would also have no way
 * to know the oldest safe position at startup if it wasn't in the control
 * file.)
 */
void
UpdateOldestCatalogXmin(void)
{
...

Does that help?




(Sidenote for later: ResolveRecoveryConflictWithLogicalDecoding will
need a read barrier too, when the next patch adds it.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 4b8e3aaa52539ef8cf3c79d1ed0319cc44800a32 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:36:49 +0800
Subject: [PATCH] Log catalog_xmin advances before removing catalog tuples

Write a WAL record before advancing the oldest catalog_xmin preserved by
VACUUM and other tuple removal.

Previously GetOldestXmin would use procArray->replication_slot_catalog_xmin as
the xid limit for vacuuming catalog tuples, so it was not possible for standbys
to determine whether all catalog tuples needed for a catalog snapshot for a
given xid would still exist.

Logging catalog_xmin advances allows standbys to determine if a logical slot on
the standby has become unsafe to use. It can then refuse to start logical
decoding on that slot or, if decoding is in progress, raise a conflict with
recovery.

Note that we only emit new WAL records if catalog_xmin changes, which happens
due to changes in slot state. So this won't generate WAL whenever oldestXmin
advances.
---
 src/backend/access/heap/rewriteheap.c   |   3 +-
 src/backend/access/rmgrdesc/xactdesc.c  |   9 +++
 src/backend/access/transam/varsup.c |  14 
 src/backend/access/transam/xact.c   |  35 
 src/backend/access/transam/xlog.c   |  12 ++-
 src/backend/commands/vacuum.c   |   9 +++
 src/backend/postmaster/bgwriter.c   |  10 +++
 src/backend/replication/logical/decode.c|  11 +++
 

Re: [HACKERS] Logical decoding on standby

2017-03-28 Thread Craig Ringer
On 29 March 2017 at 08:01, Craig Ringer  wrote:

> I just notice that I failed to remove the docs changes regarding
> dropping slots becoming db-specific, so I'll post a follow-up for that
> in a sec.

Attached.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5fe01aef643905ec1f6dcffd0f5d583809fc9a21 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 29 Mar 2017 08:03:06 +0800
Subject: [PATCH] Documentation amendments for slot drop on db drop

The "Cleanup slots during drop database" patch incorrectly documented that
dropping logical slots must now be done from the database the slot was created
on. This was the case in an earlier variant of the patch, but not the committed
version.

Also document that idle logical replication slots will be dropped by
DROP DATABASE.
---
 doc/src/sgml/func.sgml  | 3 +--
 doc/src/sgml/protocol.sgml  | 2 --
 doc/src/sgml/ref/drop_database.sgml | 7 +++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78508d7..ba6f8dd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18876,8 +18876,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

 Drops the physical or logical replication slot
 named slot_name. Same as replication protocol
-command DROP_REPLICATION_SLOT. For logical slots, this must
-be called when connected to the same database the slot was created on.
+command DROP_REPLICATION_SLOT.

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 5f97141..b3a5026 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2034,8 +2034,6 @@ The commands accepted in walsender mode are:
  
   Drops a replication slot, freeing any reserved server-side resources. If
   the slot is currently in use by an active connection, this command fails.
-  If the slot is a logical slot that was created in a database other than
-  the database the walsender is connected to, this command fails.
  
  
   
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 740aa31..3427139 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -81,6 +81,13 @@ DROP DATABASE [ IF EXISTS ] name
 instead,
which is a wrapper around this command.
   
+
+  
+   Active logical
+   replication slots count as connections and will prevent a
+   database from being dropped. Inactive slots will be automatically
+   dropped when the database is dropped.
+  
  
 
  
-- 
2.5.5




Re: [HACKERS] Logical decoding on standby

2017-03-28 Thread Craig Ringer
On 28 March 2017 at 23:22, Andres Freund  wrote:

>> --- a/doc/src/sgml/protocol.sgml
>> +++ b/doc/src/sgml/protocol.sgml
>> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
>>   
>>   Drops a replication slot, freeing any reserved server-side resources. If
>>   the slot is currently in use by an active connection, this command fails.
>> +  If the slot is a logical slot that was created in a database other than
>> +  the database the walsender is connected to, this command fails.
>>   
>>   
>>
>
> Shouldn't the docs in the drop database section mention this?

DROP DATABASE doesn't really discuss all the resources it drops, but
I'm happy to add mention of replication slots handling.

I just notice that I failed to remove the docs changes regarding
dropping slots becoming db-specific, so I'll post a follow-up for that
in a sec.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-28 Thread Andres Freund
Hi,

On 2017-03-27 16:03:48 +0800, Craig Ringer wrote:
> On 27 March 2017 at 14:08, Craig Ringer  wrote:
> 
> > So this patch makes ReplicationSlotAcquire check that the slot
> > database matches the current database and refuse to acquire the slot
> > if it does not.
> 
> New patch attached that drops above requirement, so slots can still be
> dropped from any DB.
> 
> This introduces a narrow race window where DROP DATABASE may ERROR if
> somebody connects to a different database and runs a
> pg_drop_replication_slot(...) for one of the slots being dropped by
> DROP DATABASE after we check for active slots but before we've dropped
> the slot. But it's hard to hit and it's pretty harmless; the worst
> possible result is dropping one or more of the slots before we ERROR
> out of the DROP. But you clearly didn't want them anyway, since you
> were dropping the DB and dropping some slots at the same time.
> 
> I think this one's ready to go.
> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

> From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
> From: Craig Ringer 
> Date: Wed, 22 Mar 2017 13:21:09 +0800
> Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB
> 
> Automatically drop all logical replication slots associated with a
> database when the database is dropped.
> ---
>  doc/src/sgml/func.sgml |  3 +-
>  doc/src/sgml/protocol.sgml |  2 +
>  src/backend/commands/dbcommands.c  | 32 +---
>  src/backend/replication/slot.c | 88 ++
>  src/include/replication/slot.h |  1 +
>  src/test/recovery/t/006_logical_decoding.pl| 40 +-
>  .../recovery/t/010_logical_decoding_timelines.pl   | 30 +++-
>  7 files changed, 182 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index ba6f8dd..78508d7 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
> 
>  Drops the physical or logical replication slot
>  named slot_name. Same as replication protocol
> -command DROP_REPLICATION_SLOT.
> +command DROP_REPLICATION_SLOT. For logical slots, this must
> +be called when connected to the same database the slot was created on.
> 
>
>  
> diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
> index b3a5026..5f97141 100644
> --- a/doc/src/sgml/protocol.sgml
> +++ b/doc/src/sgml/protocol.sgml
> @@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
>   
>   Drops a replication slot, freeing any reserved server-side resources. If
>   the slot is currently in use by an active connection, this command fails.
> +  If the slot is a logical slot that was created in a database other than
> +  the database the walsender is connected to, this command fails.
>   
>   
>

Shouldn't the docs in the drop database section mention this?


> +void
> +ReplicationSlotsDropDBSlots(Oid dboid)
> +{
> + int i;
> +
> + if (max_replication_slots <= 0)
> + return;
> +
> +restart:
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + for (i = 0; i < max_replication_slots; i++)
> + {
> + ReplicationSlot *s;
> + NameData slotname;
> + int active_pid;
> +
> + s = &ReplicationSlotCtl->replication_slots[i];
> +
> + /* cannot change while ReplicationSlotCtlLock is held */
> + if (!s->in_use)
> + continue;
> +
> + /* only logical slots are database specific, skip */
> + if (!SlotIsLogical(s))
> + continue;
> +
> + /* not our database, skip */
> + if (s->data.database != dboid)
> + continue;
> +
> + /* Claim the slot, as if ReplicationSlotAcquire()ing. */
> + SpinLockAcquire(&s->mutex);
> + strncpy(NameStr(slotname), NameStr(s->data.name), NAMEDATALEN);
> + NameStr(slotname)[NAMEDATALEN-1] = '\0';
> + active_pid = s->active_pid;
> + if (active_pid == 0)
> + {
> + MyReplicationSlot = s;
> + s->active_pid = MyProcPid;
> + }
> + SpinLockRelease(&s->mutex);
> +
> + /*
> +  * We might fail here if the slot was active. Even though we hold an
> +  * exclusive lock on the database object a logical slot for that DB can
> +  * still be active if it's being dropped by a backend connected to
> +  * another DB or is otherwise acquired.
> +  *

Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 16:20, Simon Riggs  wrote:
> On 27 March 2017 at 09:03, Craig Ringer  wrote:
>
>> I think this one's ready to go.
>
> Looks like something I could commit. Full review by me while offline
> today, aiming to commit tomorrow barring issues raised.

Great.

Meanwhile I'm going to be trying to work with Stas on 2PC logical
decoding, while firming up the next patches in this series to see if
we can progress a bit further.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Simon Riggs
On 27 March 2017 at 09:03, Craig Ringer  wrote:

> I think this one's ready to go.

Looks like something I could commit. Full review by me while offline
today, aiming to commit tomorrow barring issues raised.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Craig Ringer
On 27 March 2017 at 14:08, Craig Ringer  wrote:

> So this patch makes ReplicationSlotAcquire check that the slot
> database matches the current database and refuse to acquire the slot
> if it does not.

New patch attached that drops above requirement, so slots can still be
dropped from any DB.

This introduces a narrow race window where DROP DATABASE may ERROR if
somebody connects to a different database and runs a
pg_drop_replication_slot(...) for one of the slots being dropped by
DROP DATABASE after we check for active slots but before we've dropped
the slot. But it's hard to hit and it's pretty harmless; the worst
possible result is dropping one or more of the slots before we ERROR
out of the DROP. But you clearly didn't want them anyway, since you
were dropping the DB and dropping some slots at the same time.

I think this one's ready to go.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 99d5313d3a265bcc57ca6845230b9ec49d188710 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:21:09 +0800
Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB

Automatically drop all logical replication slots associated with a
database when the database is dropped.
---
 doc/src/sgml/func.sgml |  3 +-
 doc/src/sgml/protocol.sgml |  2 +
 src/backend/commands/dbcommands.c  | 32 +---
 src/backend/replication/slot.c | 88 ++
 src/include/replication/slot.h |  1 +
 src/test/recovery/t/006_logical_decoding.pl| 40 +-
 .../recovery/t/010_logical_decoding_timelines.pl   | 30 +++-
 7 files changed, 182 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ba6f8dd..78508d7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

 Drops the physical or logical replication slot
 named slot_name. Same as replication protocol
-command DROP_REPLICATION_SLOT.
+command DROP_REPLICATION_SLOT. For logical slots, this must
+be called when connected to the same database the slot was created on.

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b3a5026..5f97141 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
  
   Drops a replication slot, freeing any reserved server-side resources. If
   the slot is currently in use by an active connection, this command fails.
+  If the slot is a logical slot that was created in a database other than
+  the database the walsender is connected to, this command fails.
  
  
   
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 5a63b1a..c0ba2b4 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -845,19 +845,22 @@ dropdb(const char *dbname, bool missing_ok)
  errmsg("cannot drop the currently open database")));
 
 	/*
-	 * Check whether there are, possibly unconnected, logical slots that refer
-	 * to the to-be-dropped database. The database lock we are holding
-	 * prevents the creation of new slots using the database.
+	 * Check whether there are active logical slots that refer to the
+	 * to-be-dropped database. The database lock we are holding prevents the
+	 * creation of new slots using the database or existing slots becoming
+	 * active.
 	 */
-	if (ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active))
+	(void) ReplicationSlotsCountDBSlots(db_id, &nslots, &nslots_active);
+	if (nslots_active)
+	{
 		ereport(ERROR,
 (errcode(ERRCODE_OBJECT_IN_USE),
-			  errmsg("database \"%s\" is used by a logical replication slot",
+			  errmsg("database \"%s\" is used by an active logical replication slot",
 	 dbname),
- errdetail_plural("There is %d slot, %d of them active.",
-  "There are %d slots, %d of them active.",
-  nslots,
-  nslots, nslots_active)));
+ errdetail_plural("There is %d active slot",
+  "There are %d active slots",
+  nslots_active, nslots_active)));
+	}
 
 	/*
 	 * Check for other backends in the target database.  (Because we hold the
@@ -915,6 +918,11 @@ dropdb(const char *dbname, bool missing_ok)
 	dropDatabaseDependencies(db_id);
 
 	/*
+	 * Drop db-specific replication slots.
+	 */
+	ReplicationSlotsDropDBSlots(db_id);
+
+	/*
 	 * Drop pages for this database that are in the shared buffer cache. This
 	 * is important to ensure that no remaining backend tries to write out a
 	 * dirty buffer to the dead database later...
@@ -2124,11 +2132,17 @@ dbase_redo(XLogReaderState *record)
 			 * InitPostgres() cannot fully 

Re: [HACKERS] Logical decoding on standby

2017-03-27 Thread Craig Ringer
Hi

Here's the next patch in the split-up series, drop db-specific
(logical) replication slots on DROP DATABASE.

Current behaviour is to ERROR if logical slots exist on the DB,
whether in-use or not.

With this patch we can DROP a database if it has logical slots so long
as they are not active. I haven't added any sort of syntax for this,
it's just done unconditionally.

I don't see any sensible way to stop a slot becoming active after we
check for active slots and begin the actual database DROP, since
ReplicationSlotAcquire will happily acquire a db-specific slot for a
different DB and the only lock it takes is a shared lock on
ReplicationSlotControlLock, which we presumably don't want to hold
throughout DROP DATABASE.

So this patch makes ReplicationSlotAcquire check that the slot
database matches the current database and refuse to acquire the slot
if it does not. The only sensible reason to acquire a slot from a
different DB is to drop it, and then it's only a convenience at best.
Slot drop is the only user-visible behaviour change, since all other
activity on logical slots happened when the backend was already
connected to the slot's DB. Appropriate docs changes have been made
and tests added.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From c126a10e40aba0c39a43a97da591492d6240659c Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 13:21:09 +0800
Subject: [PATCH] Make DROP DATABASE drop logical slots for the DB

Automatically drop all logical replication slots associated with a
database when the database is dropped.

As a side-effect, pg_drop_replication_slot(...) may now only drop
logical slots when connected to the slot's database.
---
 contrib/test_decoding/sql/slot.sql |  8 ++
 doc/src/sgml/func.sgml |  3 +-
 doc/src/sgml/protocol.sgml |  2 +
 src/backend/commands/dbcommands.c  | 32 +--
 src/backend/replication/logical/logical.c  | 12 +--
 src/backend/replication/slot.c | 97 +-
 src/include/replication/slot.h |  1 +
 src/test/recovery/t/006_logical_decoding.pl| 34 +++-
 .../recovery/t/010_logical_decoding_timelines.pl   | 30 ++-
 9 files changed, 194 insertions(+), 25 deletions(-)

diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql
index 7ca83fe..22b22f3 100644
--- a/contrib/test_decoding/sql/slot.sql
+++ b/contrib/test_decoding/sql/slot.sql
@@ -48,3 +48,11 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_
 -- both should error as they should be dropped on error
 SELECT pg_drop_replication_slot('regression_slot1');
 SELECT pg_drop_replication_slot('regression_slot2');
+
+CREATE DATABASE testdb;
+\c testdb
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_otherdb', 'test_decoding');
+\c regression
+SELECT pg_drop_replication_slot('regression_slot_otherdb');
+\c testdb
+SELECT pg_drop_replication_slot('regression_slot_otherdb');
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ba6f8dd..78508d7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18876,7 +18876,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());

 Drops the physical or logical replication slot
 named slot_name. Same as replication protocol
-command DROP_REPLICATION_SLOT.
+command DROP_REPLICATION_SLOT. For logical slots, this must
+be called when connected to the same database the slot was created on.

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index b3a5026..5f97141 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2034,6 +2034,8 @@ The commands accepted in walsender mode are:
  
   Drops a replication slot, freeing any reserved server-side resources. If
   the slot is currently in use by an active connection, this command fails.
+  If the slot is a logical slot that was created in a database other than
+  the database the walsender is connected to, this command fails.
  
  
   
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 5a63b1a..7fe2c2b 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -845,19 +845,22 @@ dropdb(const char *dbname, bool missing_ok)
  errmsg("cannot drop the currently open database")));
 
 	/*
-	 * Check whether there are, possibly unconnected, logical slots that refer
-	 * to the to-be-dropped database. The database lock we are holding
-	 * prevents the creation of new slots using the database.
+	 * Check whether there are active logical slots that refer to the
+	 * to-be-dropped database. The database lock we are holding prevents the
+	 * creation of 

Re: [HACKERS] Logical decoding on standby

2017-03-26 Thread Craig Ringer
On 20 March 2017 at 17:33, Andres Freund  wrote:

> Have you checked how high the overhead of XLogReadDetermineTimeline is?
> A non-local function call, especially into a different translation-unit
> (no partial inlining), for every single page might end up being
> noticeable.  That's fine in the cases it actually adds functionality,
> but for a master streaming out data, that's not actually adding
> anything.

I haven't been able to measure any difference. But, since we require
the caller to ensure a reasonably up to date ThisTimeLineID, maybe
it's worth adding an inlineable function for the fast-path that tests
the "page cached" and "timeline is current and unchanged" conditions?


//xlogutils.h:
static inline void XLogReadDetermineTimeline(...)
{
   ... first test for page already read-in and valid ...
   ... second test for ThisTimeLineId ...
   XLogReadCheckTimeLineChange(...)
}

XLogReadCheckTimeLineChange(...)
{
   ... rest of function
}

(Yes, I know "inline" means little, but it's a hint for readers)

I'd rather avoid using a macro since it'd be pretty ugly, but it's
also an option if an inline func is undesirable.

#define XLOG_READ_DETERMINE_TIMELINE \
  do { \
... same as above ...
  } while (0);


Can be done after CF if needed anyway, it's just fiddling some code
around. Figured I'd mention though.

>> + /*
>> +  * To avoid largely duplicating ReplicationSlotDropAcquired() or
>> +  * complicating it with already_locked flags for ProcArrayLock,
>> +  * ReplicationSlotControlLock and ReplicationSlotAllocationLock, we
>> +  * just release our ReplicationSlotControlLock to drop the slot.
>> +  *
>> +  * There's no race here: we acquired this slot, and no slot "behind"
>> +  * our scan can be created or become active with our target dboid due
>> +  * to our exclusive lock on the DB.
>> +  */
>> + LWLockRelease(ReplicationSlotControlLock);
>> + ReplicationSlotDropAcquired();
>> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
>
> I don't see much problem with this, but I'd change the code so you
> simply do a goto restart; if you released the slot.  Then there's a lot
> less chance / complications around temporarily releasing
> ReplicationSlotControlLock

I don't quite get this. I suspect I'm just not seeing the implications
as clearly as you do.

Do you mean we should restart the whole scan of the slot array if we
drop any slot? That'll be O(n log m) but since we don't expect to be
working on a big array or a lot of slots it's unlikely to matter.

The patch coming soon will assume we'll restart the whole scan, anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-25 Thread Simon Riggs
On 24 March 2017 at 06:23, Craig Ringer  wrote:
> On 23 March 2017 at 17:44, Craig Ringer  wrote:
>
> Minor update to catalog_xmin walsender patch to fix failure to
> parenthesize definition of PROCARRAY_PROC_FLAGS_MASK .
>
> This one's ready to go. Working on drop slots on DB drop now.

Committed. Next!

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-24 Thread Craig Ringer
On 23 March 2017 at 17:44, Craig Ringer  wrote:

Minor update to catalog_xmin walsender patch to fix failure to
parenthesize definition of PROCARRAY_PROC_FLAGS_MASK .

This one's ready to go. Working on drop slots on DB drop now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b5e34ecaa8f43825fe41ae2e2bbf0a97258cb56a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/replication/walreceiver.c  |  45 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  12 ++-
 src/include/storage/proc.h |   5 +
 src/include/storage/procarray.h|  11 +++
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 7 files changed, 202 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 48ca414..b3a5026 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1916,10 +1916,11 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current xmin. This may be 0, if the standby is
-  sending notification that Hot Standby feedback will no longer
-  be sent on this connection. Later non-zero messages may
-  reinitiate the feedback mechanism.
+  The standby's current global xmin, excluding the catalog_xmin from any
+  replication slots. If both this value and the following
+  catalog_xmin are 0 this is treated as a notification that Hot Standby
+  feedback will no longer be sent on this connection. Later non-zero
+  messages may reinitiate the feedback mechanism.
   
   
   
@@ -1929,7 +1930,29 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current epoch.
+  The epoch of the global xmin xid on the standby.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The lowest catalog_xmin of any replication slots on the standby. Set to 0
+  if no catalog_xmin exists on the standby or if hot standby feedback is being
+  disabled.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The epoch of the catalog_xmin xid on the standby.
   
   
   
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 31c567b..0f22f17 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 {
 	TimestampTz now;
 	TransactionId nextXid;
-	uint32		nextEpoch;
-	TransactionId xmin;
+	uint32		xmin_epoch, catalog_xmin_epoch;
+	TransactionId xmin, catalog_xmin;
 	static TimestampTz sendTime = 0;
 	/* initially true so we always send at least one feedback message */
 	static bool master_has_standby_xmin = true;
@@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
+	{
+		TransactionId slot_xmin;
+
+		/*
+		 * Usually GetOldestXmin() would include the global replication slot
+		 * xmin and catalog_xmin in its calculations, but we don't want to hold
+		 * upstream back from vacuuming normal user table tuples just because
+		 * they're within the catalog_xmin horizon of logical replication slots
+		 * on this standby, so we ignore slot xmin and catalog_xmin GetOldestXmin
+		 * then deal with them ourselves.
+		 */
+		xmin = GetOldestXmin(NULL,
+			 PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN);
+
+		ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin);
+
+		if (TransactionIdIsValid(slot_xmin) &&
+			TransactionIdPrecedes(slot_xmin, xmin))
+			xmin = slot_xmin;
+	}
 	else
+	{
 		xmin = InvalidTransactionId;
+		catalog_xmin = 

Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 16:07, Craig Ringer  wrote:

> If preferred I can instead add
>
> proc.h:
>
> #define PROC_RESERVED 0x20
>
> procarray.h:
>
> #define PROCARRAY_REPLICATION_SLOTS 0x20
>
> and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

Attached done that way.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 2e887ee19c9c1bae442b9f0682169f9b0c61268a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/replication/walreceiver.c  |  45 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  12 ++-
 src/include/storage/proc.h |   5 +
 src/include/storage/procarray.h|  11 +++
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 7 files changed, 202 insertions(+), 52 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 244e381..d8786f0 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1911,10 +1911,11 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current xmin. This may be 0, if the standby is
-  sending notification that Hot Standby feedback will no longer
-  be sent on this connection. Later non-zero messages may
-  reinitiate the feedback mechanism.
+  The standby's current global xmin, excluding the catalog_xmin from any
+  replication slots. If both this value and the following
+  catalog_xmin are 0 this is treated as a notification that Hot Standby
+  feedback will no longer be sent on this connection. Later non-zero
+  messages may reinitiate the feedback mechanism.
   
   
   
@@ -1924,7 +1925,29 @@ The commands accepted in walsender mode are:
   
   
   
-  The standby's current epoch.
+  The epoch of the global xmin xid on the standby.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The lowest catalog_xmin of any replication slots on the standby. Set to 0
+  if no catalog_xmin exists on the standby or if hot standby feedback is being
+  disabled.
+  
+  
+  
+  
+  
+  Int32
+  
+  
+  
+  The epoch of the catalog_xmin xid on the standby.
   
   
   
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 31c567b..0f22f17 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1175,8 +1175,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 {
 	TimestampTz now;
 	TransactionId nextXid;
-	uint32		nextEpoch;
-	TransactionId xmin;
+	uint32		xmin_epoch, catalog_xmin_epoch;
+	TransactionId xmin, catalog_xmin;
 	static TimestampTz sendTime = 0;
 	/* initially true so we always send at least one feedback message */
 	static bool master_has_standby_xmin = true;
@@ -1221,29 +1221,56 @@ XLogWalRcvSendHSFeedback(bool immed)
 	 * everything else has been checked.
 	 */
 	if (hot_standby_feedback)
-		xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
+	{
+		TransactionId slot_xmin;
+
+		/*
+		 * Usually GetOldestXmin() would include the global replication slot
+		 * xmin and catalog_xmin in its calculations, but we don't want to hold
+		 * upstream back from vacuuming normal user table tuples just because
+		 * they're within the catalog_xmin horizon of logical replication slots
+		 * on this standby, so we ignore slot xmin and catalog_xmin in
+		 * GetOldestXmin, then deal with them ourselves.
+		 */
+		xmin = GetOldestXmin(NULL,
+			 PROCARRAY_FLAGS_DEFAULT|PROCARRAY_SLOTS_XMIN);
+
+		ProcArrayGetReplicationSlotXmin(&slot_xmin, &catalog_xmin);
+
+		if (TransactionIdIsValid(slot_xmin) &&
+			TransactionIdPrecedes(slot_xmin, xmin))
+			xmin = slot_xmin;
+	}
 	else
+	{
 		xmin = InvalidTransactionId;

Re: [HACKERS] Logical decoding on standby

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 00:13, Simon Riggs  wrote:
> On 22 March 2017 at 08:53, Craig Ringer  wrote:
>
>> I'm splitting up the rest of the decoding on standby patch set with
>> the goal of getting minimal functionality for creating and managing
>> slots on standbys in, so we can maintain slots on standbys and use
>> them when the standby is promoted to master.
>>
>> The first, to send catalog_xmin separately to the global xmin on
>> hot_standby_feedback and store it in the upstream physical slot's
>> catalog_xmin, is attached.
>>
>> These are extracted directly from the logical decoding on standby
>> patch, with comments by Petr and Andres made re the relevant code
>> addressed.
>
> I've reduced your two patches back to one with a smaller blast radius.
>
> I'll commit this tomorrow morning, barring objections.

This needs rebasing on top of

commit af4b1a0869bd3bb52e5f662e4491554b7f611489
Author: Simon Riggs 
Date:   Wed Mar 22 16:51:01 2017 +

Refactor GetOldestXmin() to use flags

Replace ignoreVacuum parameter with more flexible flags.

Author: Eiji Seki
Review: Haribabu Kommi


That patch landed up using PROCARRAY flags directly as flags to
GetOldestXmin, so it doesn't make much sense to add a flag like
PROCARRAY_REPLICATION_SLOTS. There won't be any corresponding PROC_
flag for PGXACT->vacuumFlags; replication slot xmin and catalog_xmin
are global state, not tracked in individual proc entries.

Rather than add some kind of "PROC_RESERVED" flag in proc.h that would
never be used and would only exist to reserve a bit for
PROCARRAY_REPLICATION_SLOTS, which we'd use as a flag to GetOldestXmin, I
added a new argument to GetOldestXmin like the prior patch did.

If preferred I can instead add

proc.h:

#define PROC_RESERVED 0x20

procarray.h:

#define PROCARRAY_REPLICATION_SLOTS 0x20

and then test for (flags & PROCARRAY_REPLICATION_SLOTS)

but that's kind of ugly to say the least; I'd rather just add another argument.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ffa43fae35857dbff0efe83ef199df165d887d97 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:29:13 +0800
Subject: [PATCH] Report catalog_xmin separately to xmin in hot standby
 feedback

The catalog_xmin of slots on a standby was reported as part of the standby's
xmin, causing the master's xmin to be held down. This could cause considerable
unnecessary bloat on the master.

Instead, report catalog_xmin as a separate field in hot_standby_feedback. If
the upstream walsender is using a physical replication slot, store the
catalog_xmin in the slot's catalog_xmin field. If the upstream doesn't use a
slot and has only a PGPROC entry behaviour doesn't change, as we store the
combined xmin and catalog_xmin in the PGPROC entry.

There's no backward compatibility concern here, as nothing except another
postgres instance of the same major version has any business sending hot
standby feedback and it's only used on the physical replication protocol.
---
 contrib/pg_visibility/pg_visibility.c  |   6 +-
 contrib/pgstattuple/pgstatapprox.c |   2 +-
 doc/src/sgml/protocol.sgml |  33 ++-
 src/backend/access/transam/xlog.c  |   4 +-
 src/backend/catalog/index.c|   3 +-
 src/backend/commands/analyze.c |   2 +-
 src/backend/commands/vacuum.c  |   5 +-
 src/backend/replication/walreceiver.c  |  44 +++--
 src/backend/replication/walsender.c| 110 +++--
 src/backend/storage/ipc/procarray.c|  11 ++-
 src/include/storage/procarray.h|   2 +-
 .../recovery/t/010_logical_decoding_timelines.pl   |  38 ++-
 12 files changed, 198 insertions(+), 62 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index ee3936e..2db5762 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+		OldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,9 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
  * a buffer lock. And this shouldn't happen often, so it's
  * worth being careful so as to avoid false positives.
  */
-RecomputedOldestXmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_VACUUM);
+RecomputedOldestXmin = GetOldestXmin(NULL,
+	 PROCARRAY_FLAGS_VACUUM,
+	 false);
 
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))

Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 23 March 2017 at 12:41, Andres Freund  wrote:
> On 2017-03-23 12:14:02 +0800, Craig Ringer wrote:
>> On 23 March 2017 at 09:39, Andres Freund  wrote:
>> > I still think decoding-on-standby is simply not the right approach as
>> > the basic/first HA approach for logical rep.  It's a nice later-on
>> > feature.  But that's an irrelevant aside.
>>
>> I don't really agree that it's irrelevant.
>
> I'm not sure we have enough time for either getting some parts of your
> patch in, or for figuring out long term goals. But we definitely don't
> have time for both.

Fair.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
On 2017-03-23 12:14:02 +0800, Craig Ringer wrote:
> On 23 March 2017 at 09:39, Andres Freund  wrote:
> > I still think decoding-on-standby is simply not the right approach as
> > the basic/first HA approach for logical rep.  It's a nice later-on
> > feature.  But that's an irrelevant aside.
> 
> I don't really agree that it's irrelevant.

I'm not sure we have enough time for either getting some parts of your
patch in, or for figuring out long term goals. But we definitely don't
have time for both.

- Andres




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 23 March 2017 at 00:13, Simon Riggs  wrote:
> On 22 March 2017 at 08:53, Craig Ringer  wrote:
>
>> I'm splitting up the rest of the decoding on standby patch set with
>> the goal of getting minimal functionality for creating and managing
>> slots on standbys in, so we can maintain slots on standbys and use
>> them when the standby is promoted to master.
>>
>> The first, to send catalog_xmin separately to the global xmin on
>> hot_standby_feedback and store it in the upstream physical slot's
>> catalog_xmin, is attached.
>>
>> These are extracted directly from the logical decoding on standby
>> patch, with comments by Petr and Andres made re the relevant code
>> addressed.
>
> I've reduced your two patches back to one with a smaller blast radius.
>
> I'll commit this tomorrow morning, barring objections.

Thanks. I was tempted to refactor GetOldestXmin to use flags myself,
but thought it might be at higher risk of objections. Since Eiji Seki
has shown that there are other uses for excluding particular things
from GetOldestXmin, and that's committed now, it's nice to have the
impact of this patch reduced.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 23 March 2017 at 09:39, Andres Freund  wrote:

> We can't just assume that snapbuild is going to work correctly when its
> prerequisite - a pinned xmin horizon - isn't working.

Makes sense.

>> What do _you_ see as the minimum acceptable way to achieve the ability
>> for a logical decoding client to follow failover of an upstream to a
>> physical standby? In the end, you're one of the main people whose view
>> carries weight in this area, and I don't want to develop yet another
>
> I think your approach here wasn't that bad? There's a lot of cleaning
> up/shoring up needed, and we probably need a smarter feedback system.  I
> don't think anybody here has objected to the fundamental approach?

That's useful, thanks.

I'm not arguing that the patch as it stands is ready, and appreciate
the input re the general design.


> I still think decoding-on-standby is simply not the right approach as
> the basic/first HA approach for logical rep.  It's a nice later-on
> feature.  But that's an irrelevant aside.

I don't really agree that it's irrelevant.

Right now Pg has no HA capability for logical decoding clients. We've
now added logical replication, but it has no way to provide for
upstream node failure and ensure a consistent switch-over, whether to
a logical or physical replica. Since real world servers fail or need
maintenance, this is kind of a problem for practical production use.

Because of transaction serialization for commit-time order replay,
logical replication experiences saw-tooth replication lag, where large
or long xacts such as batch jobs effectively stall all later xacts
until they are fully replicated. We cannot currently start replicating
a big xact until it commits on the upstream, so that lag can easily be
~2x the runtime on the upstream.

So while you can do sync rep on a logical standby, it tends to result
in big delays on COMMITs relative to physical rep, even if app are
careful to keep transactions small. When the app DR planning people
come and ask you what the max data loss window / max sync rep lag is,
you have to say "dunno? depends on what else was running on the
server at the time."

AFAICT, changing those things will require the ability to begin
streaming reorder buffers for big xacts before commit, which as the
logical decoding on 2PC thread shows is not exactly trivial. We'll
also need to be able to apply them concurrently with other xacts on
the other end. Those are both big and complex things IMO, and I'll be
surprised if we can do either in Pg11 given that AFAIK nobody has even
started work on either of them or has a detailed plan.

Presuming we get some kind of failover to logical replica upstreams
into Pg11, it'll have significant limitations relative to what we can
deliver to users by using physical replication. Especially when it
comes to bounded-time lag for HA, sync rep, etc. And I haven't seen a
design for it, though Petr and I have discussed some with regards to
pglogical.

That's why I think we need to do HA on the physical side first.
Because it's going to take a long time to get equivalent functionality
for logical rep based upstreams, and when it is we'll still have to
teach management tools and other non-logical-rep logical decoding
clients about the new way of doing things.  Whereas for physical HA
setups to support logical downstreams requires only relatively minor
changes and gets us all the physical HA features _now_.

That's why we pursued failover slots - as a simple, minimal solution
to allowing logical decoding clients to inter-operate with Pg in a
physical HA configuration. TBH, I still think we should just add them.
Sure, they don't help us achieve decoding on standby, but they're a
lot simpler and they help Pg's behaviour with slots match user
expectations for how the rest of Pg behaves, i.e. if it's on the
master it'll be on the replica too.  And as you've said, decoding on
standby is a nice-to-have, wheras I think some kind of HA support is
rather more important.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
On 2017-03-23 09:14:07 +0800, Craig Ringer wrote:
> On 23 March 2017 at 07:31, Andres Freund  wrote:
> > On 2017-03-23 06:55:53 +0800, Craig Ringer wrote:
> 
> >> I was thinking that by disallowing snapshot use and output plugin
> >> invocation we'd avoid the need to support cancellation on recovery
> >> conflicts, etc, simplifying things considerably.
> >
> > That seems like it'd end up being pretty hacky - the likelihood that
> > we'd run into snapbuild error cross-checks seems very high.
> 
> TBH I'm not following this. But I haven't touched snapbuild much yet,
> Petr's done much more with snapbuild than I have.

We can't just assume that snapbuild is going to work correctly when its
prerequisite - a pinned xmin horizon - isn't working.


> We're not going to have robust logical replication that's suitable for
> HA and failover use on high load systems until 2020 or so, with Pg 12.
> We'll need concurrent decoding and apply, which nobody's even started
> on AFAIK, we'll need sequence replication, and more.

These seem largely unrelated to the topic at hand (nor do I agree on all
of them).


> So I'd really, really like to get some kind of HA picture other than
> "none" in for logical decoding based systems. If it's imperfect, it's
> still something.

I still think decoding-on-standby is simply not the right approach as
the basic/first HA approach for logical rep.  It's a nice later-on
feature.  But that's an irrelevant aside.

I don't understand why you're making a "fundamental" argument here - I'm
not arguing against the goals of the patch at all. I want as much stuff
committed as we can in a good shape.


> What do _you_ see as the minimum acceptable way to achieve the ability
> for a logical decoding client to follow failover of an upstream to a
> physical standby? In the end, you're one of the main people whose view
> carries weight in this area, and I don't want to develop yet another

I think your approach here wasn't that bad? There's a lot of cleaning
up/shoring up needed, and we probably need a smarter feedback system.  I
don't think anybody here has objected to the fundamental approach?


Greetings,

Andres Freund




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 23 March 2017 at 07:31, Andres Freund  wrote:
> On 2017-03-23 06:55:53 +0800, Craig Ringer wrote:

>> I was thinking that by disallowing snapshot use and output plugin
>> invocation we'd avoid the need to support cancellation on recovery
>> conflicts, etc, simplifying things considerably.
>
> That seems like it'd end up being pretty hacky - the likelihood that
> we'd run into snapbuild error cross-checks seems very high.

TBH I'm not following this. But I haven't touched snapbuild much yet,
Petr's done much more with snapbuild than I have.

We're not going to have robust logical replication that's suitable for
HA and failover use on high load systems until 2020 or so, with Pg 12.
We'll need concurrent decoding and apply, which nobody's even started
on AFAIK, we'll need sequence replication, and more.

So I'd really, really like to get some kind of HA picture other than
"none" in for logical decoding based systems. If it's imperfect, it's
still something.

I wish we'd just proceeded with failover slots. They were blocked in
favour of decoding on standby, and HA is possible if we have decoding
on standby with some more work by the application. But now we'll have
neither. If we'd just done failover slots we could've had logical
replication able to follow failover in Pg 10.

What do _you_ see as the minimum acceptable way to achieve the ability
for a logical decoding client to follow failover of an upstream to a
physical standby? In the end, you're one of the main people whose view
carries weight in this area, and I don't want to develop yet another
approach only to have it knocked back once the work is done.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 23 March 2017 at 00:17, Andres Freund  wrote:
> On 2017-03-22 15:59:42 +, Simon Riggs wrote:
>> On 22 March 2017 at 13:06, Andres Freund  wrote:
>>
>> >> The parts I think are important for Pg10 are:
>> >
>> >> * Ability to create logical slots on replicas
>> >
>> > Doesn't this also imply recovery conflicts on DROP DATABASE?
>>
>> Not needed until the slot is in use, which is a later patch.
>
> Hm? We need to drop slots, if they can exist / be created, on a standby,
> and they're on a dropped database.  Otherwise we'll reserve resources,
> while anyone connecting to the slot will likely just receive errors
> because the database doesn't exist anymore.  It's also one of the
> patches that can quite easily be developed / reviewed, because there
> really isn't anything complicated about it.  Most of the code is already
> in Craig's patch, it just needs some adjustments.

Right, I'm not too concerned about doing that, and it's next on my
TODO as I clean up the split patch series.

>> >> * Ability to advance (via feedback or via SQL function) - no need to
>> >> actually decode and call output plugins at all
>> >
>> > That pretty much requires decoding, otherwise you really don't know how
>> > much WAL you have to retain.
>>
>> Knowing how much WAL to retain is key.
>>
>> Why would decoding tell you how much WAL to retain?
>
> Because decoding already has the necessary logic?  (You need to retain
> enough WAL to restart decoding for all in-progress transactions etc).

Indeed; after all, standby status updates from the decoding client
only contain the *flushed* LSN. The downstream doesn't know the
restartpoint LSN; it must be tracked by the upstream.

It's also necessary to maintain our catalog_xmin correctly on the
standby so we can send it via hot_standby_feedback to a physical
replication slot used on the master, ensuring the master doesn't
remove catalog tuples we may still need.

> I don't know what you're on about with that statement.  I've spent a
> good chunk of time looking at the 0003 patch, even though it's large
> and contains a lot of different things.  I suggested splitting things up.
> I even suggested what to move earlier after Craig agreed with that
> sentiment, in the mail you're replying to, because it seems
> independently doable.

I really appreciate the review, as I'm all too aware of how time
consuming it can be.

From my PoV, the difficulty I'm in is that this patch series has
languished for most of the Pg 10 release cycle with no real input from
stakeholders in the logical decoding area, so while the review is
important, the fact that it's only happening now means it pretty comprehensively
blocks the patch for Pg 10. I asked on list for input on structure
(if/how to split it up) literally months ago, for example.

I've been struggling to get some kind of support for logical decoding
on standbys for most of two release cycles, and there are people
climbing the walls wanting it. I'm trying to make sure it's done
right, but I can't do that alone, and it's hard to progress when I
don't know what will be expected until it's too late to do anything
about it.

I guess all we can do at this point is get the foundations in place
and carry on for Pg 11, where the presence of in-core logical
replication will offer a lever to actually push this in. In the mean
time I'll have to continue carrying the out-of-tree failover slots
patch for people who use logical decoding and want it to be reliable.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
On 2017-03-23 06:55:53 +0800, Craig Ringer wrote:
> On 22 March 2017 at 21:06, Andres Freund  wrote:
> > Hi,
> >
> > On 2017-03-21 09:05:26 +0800, Craig Ringer wrote:
> >> > 0002 should be doable as a whole this release, I have severe doubts that
> >> > 0003 as a whole has a chance for 10 - the code is in quite a raw shape,
> >> > there's a significant number of open ends.  I'd suggest breaking of bits
> >> > that are independently useful, and work on getting those committed.
> >>
> >> That would be my preference too.
> >
> >
> >> The parts I think are important for Pg10 are:
> >
> >> * Ability to create logical slots on replicas
> >
> > Doesn't this also imply recovery conflicts on DROP DATABASE?  Besides,
> > allowing to drop all slots using a database upon DROP DATABASE, is a
> > useful thing on its own.
> 
> Definitely beneficial, otherwise recovery will stop until you drop
> slots, which isn't ideal.

s/isn't ideal/not acceptable/ ;)


> >> * Ability to advance (via feedback or via SQL function) - no need to
> >> actually decode and call output plugins at all
> >
> > That pretty much requires decoding, otherwise you really don't know how
> > much WAL you have to retain.
> 
> Yes, and to update restart_lsn and catalog_xmin correctly.

> I was thinking that by disallowing snapshot use and output plugin
> invocation we'd avoid the need to support cancellation on recovery
> conflicts, etc, simplifying things considerably.

That seems like it'd end up being pretty hacky - the likelihood that
we'd run into snapbuild error cross-checks seems very high.

Greetings,

Andres Freund




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 22 March 2017 at 21:06, Andres Freund  wrote:
> Hi,
>
> On 2017-03-21 09:05:26 +0800, Craig Ringer wrote:
>> > 0002 should be doable as a whole this release, I have severe doubts that
>> > 0003 as a whole has a chance for 10 - the code is in quite a raw shape,
>> > there's a significant number of open ends.  I'd suggest breaking of bits
>> > that are independently useful, and work on getting those committed.
>>
>> That would be my preference too.
>
>
>> The parts I think are important for Pg10 are:
>
>> * Ability to create logical slots on replicas
>
> Doesn't this also imply recovery conflicts on DROP DATABASE?  Besides,
> allowing to drop all slots using a database upon DROP DATABASE, is a
> useful thing on its own.

Definitely beneficial, otherwise recovery will stop until you drop
slots, which isn't ideal.

>> * Ability to advance (via feedback or via SQL function) - no need to
>> actually decode and call output plugins at all
>
> That pretty much requires decoding, otherwise you really don't know how
> much WAL you have to retain.

Yes, and to update restart_lsn and catalog_xmin correctly.

I was thinking that by disallowing snapshot use and output plugin
invocation we'd avoid the need to support cancellation on recovery
conflicts, etc, simplifying things considerably.

>> * Ability to drop logical slots on replicas
>
> That shouldn't actually require any changes, no?

It doesn't, it works as-is. I have NFI why I wrote that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
On 2017-03-22 15:59:42 +, Simon Riggs wrote:
> On 22 March 2017 at 13:06, Andres Freund  wrote:
> 
> >> The parts I think are important for Pg10 are:
> >
> >> * Ability to create logical slots on replicas
> >
> > Doesn't this also imply recovery conflicts on DROP DATABASE?
> 
> Not needed until the slot is in use, which is a later patch.

Hm? We need to drop slots, if they can exist / be created, on a standby,
and they're on a dropped database.  Otherwise we'll reserve resources,
while anyone connecting to the slot will likely just receive errors
because the database doesn't exist anymore.  It's also one of the
patches that can quite easily be developed / reviewed, because there
really isn't anything complicated about it.  Most of the code is already
in Craig's patch, it just needs some adjustments.


> > Besides,
> > allowing to drop all slots using a database upon DROP DATABASE, is a
> > useful thing on its own.
> 
> Sure but that's a separate feature unrelated to this patch and we're
> running out of time.

Hm? The patch implemented it.


> >> * Ability to advance (via feedback or via SQL function) - no need to
> >> actually decode and call output plugins at all
> >
> > That pretty much requires decoding, otherwise you really don't know how
> > much WAL you have to retain.
> 
> Knowing how much WAL to retain is key.
> 
> Why would decoding tell you how much WAL to retain?

Because decoding already has the necessary logic?  (You need to retain
enough WAL to restart decoding for all in-progress transactions etc).


> We tried to implement this automatically from the master, which was
> rejected. So the only other way is manually. We need one or the other.

I think the current approach is roughly the right way - but that doesn't
make the patch ready.



Greetings,

Andres Freund




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Simon Riggs
On 22 March 2017 at 08:53, Craig Ringer  wrote:

> I'm splitting up the rest of the decoding on standby patch set with
> the goal of getting minimal functionality for creating and managing
> slots on standbys in, so we can maintain slots on standbys and use
> them when the standby is promoted to master.
>
> The first, to send catalog_xmin separately to the global xmin on
> hot_standby_feedback and store it in the upstream physical slot's
> catalog_xmin, is attached.
>
> These are extracted directly from the logical decoding on standby
> patch, with comments by Petr and Andres made re the relevant code
> addressed.

I've reduced your two patches back to one with a smaller blast radius.

I'll commit this tomorrow morning, barring objections.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Report-catalog_xmin-separately-to-xmin-in-hot-standb.v2.patch
Description: Binary data



Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Simon Riggs
On 22 March 2017 at 13:06, Andres Freund  wrote:

>> The parts I think are important for Pg10 are:
>
>> * Ability to create logical slots on replicas
>
> Doesn't this also imply recovery conflicts on DROP DATABASE?

Not needed until the slot is in use, which is a later patch.

> Besides,
> allowing to drop all slots using a database upon DROP DATABASE, is a
> useful thing on its own.

Sure but that's a separate feature unrelated to this patch and we're
running out of time.

>> * Ability to advance (via feedback or via SQL function) - no need to
>> actually decode and call output plugins at all
>
> That pretty much requires decoding, otherwise you really don't know how
> much WAL you have to retain.

Knowing how much WAL to retain is key.

Why would decoding tell you how much WAL to retain?

We tried to implement this automatically from the master, which was
rejected. So the only other way is manually. We need one or the other.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
On 2017-03-22 14:58:29 +, Simon Riggs wrote:
> On 22 March 2017 at 13:06, Andres Freund  wrote:
> 
> > But I have to admit, I've *severe* doubts about getting the whole
> > infrastructure for slot creation on replica into 10.  The work is far
> > from ready, and we're mere days away from freeze.
> 
> If Craig has to guess what would be acceptable, then it's not long enough.

I don't know what you're on about with that statement.  I've spent a
good chunk of time looking at the 0003 patch, even though it's large and
contains a lot of different things.  I suggested splitting things up. I
even suggested what to move earlier after Craig agreed with that
sentiment, in the mail you're replying to, because it seems
independently doable.


> It would be better if you could outline a specific approach so he can
> code it. Coding takes about a day for most things, since Craig knows
> the code and what we're trying to achieve.

I find that fairly unconvincing. What we have here is a patch that isn't
close to being ready, contains a lot of complicated pieces, a couple
days before freeze.  If we can pull useful pieces out: great.  But it's
too late for major new development.

Greetings,

Andres Freund




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Simon Riggs
On 22 March 2017 at 13:06, Andres Freund  wrote:

> But I have to admit, I've *severe* doubts about getting the whole
> infrastructure for slot creation on replica into 10.  The work is far
> from ready, and we're mere days away from freeze.

If Craig has to guess what would be acceptable, then it's not long enough.

It would be better if you could outline a specific approach so he can
code it. Coding takes about a day for most things, since Craig knows
the code and what we're trying to achieve.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Andres Freund
Hi,

On 2017-03-21 09:05:26 +0800, Craig Ringer wrote:
> > 0002 should be doable as a whole this release, I have severe doubts that
> > 0003 as a whole has a chance for 10 - the code is in quite a raw shape,
>> there's a significant number of open ends.  I'd suggest breaking off bits
> > that are independently useful, and work on getting those committed.
> 
> That would be my preference too.


> The parts I think are important for Pg10 are:

> * Ability to create logical slots on replicas

Doesn't this also imply recovery conflicts on DROP DATABASE?  Besides,
allowing to drop all slots using a database upon DROP DATABASE, is a
useful thing on its own.

But I have to admit, I've *severe* doubts about getting the whole
infrastructure for slot creation on replica into 10.  The work is far
from ready, and we're mere days away from freeze.


> * Ability to advance (via feedback or via SQL function) - no need to
> actually decode and call output plugins at all

That pretty much requires decoding, otherwise you really don't know how
much WAL you have to retain.


> * Ability to drop logical slots on replicas

That shouldn't actually require any changes, no?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-03-22 Thread Craig Ringer
On 22 March 2017 at 10:51, Craig Ringer  wrote:
> Hi all
>
> Updated timeline following patch attached.
>
> There's a change in read_local_xlog_page to ensure we maintain
> ThisTimeLineID properly, otherwise it's just comment changes.

OK, so we're looking OK with the TL following.

I'm splitting up the rest of the decoding on standby patch set with
the goal of getting minimal functionality for creating and managing
slots on standbys in, so we can maintain slots on standbys and use
them when the standby is promoted to master.

The first, to send catalog_xmin separately from the global xmin in
hot_standby_feedback and store it in the upstream physical slot's
catalog_xmin, is attached.

These are extracted directly from the logical decoding on standby
patch, with comments by Petr and Andres made re the relevant code
addressed.

I will next be working on a bare-minimum facility for creating and
advancing logical slots on a replica without support for buffering
changes, creating historic snapshots or invoking output plugin. The
slots will become usable after the replica is promoted. They'll track
their own restart_lsn, etc, and will keep track of xids so they can
manage their catalog_xmin, so there'll be no need for dodgy slot
syncing from the master, but they'll avoid most of the complex and
messy bits. The application will be expected to make sure a slot on
the master exists and is advanced before the corresponding slot on the
replica to protect required catalogs.

Then if there's agreement that it's the right way forward I can add
the catalog_xmin xlog'ing stuff as the next patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b719c0b556a6823c9b48c0f4042aaf77a8d5f69e Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 22 Mar 2017 12:11:17 +0800
Subject: [PATCH 1/2] Allow GetOldestXmin to ignore replication slot xmin

For walsender to report replication slots' catalog_xmin separately, it's
necessary to be able to ask GetOldestXmin to ignore replication slots.
---
 contrib/pg_visibility/pg_visibility.c |  4 ++--
 contrib/pgstattuple/pgstatapprox.c|  2 +-
 src/backend/access/transam/xlog.c |  4 ++--
 src/backend/catalog/index.c   |  2 +-
 src/backend/commands/analyze.c|  2 +-
 src/backend/commands/vacuum.c |  4 ++--
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/storage/ipc/procarray.c   | 36 +++
 src/include/storage/procarray.h   |  2 +-
 9 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index d0f7618..6261e68 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -557,7 +557,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 	if (all_visible)
 	{
 		/* Don't pass rel; that will fail in recovery. */
-		OldestXmin = GetOldestXmin(NULL, true);
+		OldestXmin = GetOldestXmin(NULL, true, false);
 	}
 
 	rel = relation_open(relid, AccessShareLock);
@@ -674,7 +674,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
  * a buffer lock. And this shouldn't happen often, so it's
  * worth being careful so as to avoid false positives.
  */
-RecomputedOldestXmin = GetOldestXmin(NULL, true);
+RecomputedOldestXmin = GetOldestXmin(NULL, true, false);
 
 if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
 	record_corrupt_item(items, _self);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 8db1e20..743cbee 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
 	TransactionId OldestXmin;
 	uint64		misc_count = 0;
 
-	OldestXmin = GetOldestXmin(rel, true);
+	OldestXmin = GetOldestXmin(rel, true, false);
 	bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	nblocks = RelationGetNumberOfBlocks(rel);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9480377..c2b4f2c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8895,7 +8895,7 @@ CreateCheckPoint(int flags)
 	 * StartupSUBTRANS hasn't been called yet.
 	 */
 	if (!RecoveryInProgress())
-		TruncateSUBTRANS(GetOldestXmin(NULL, false));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false, false));
 
 	/* Real work is done, but log and update stats before releasing lock. */
 	LogCheckpointEnd(false);
@@ -9258,7 +9258,7 @@ CreateRestartPoint(int flags)
 	 * this because StartupSUBTRANS hasn't been called yet.
 	 */
 	if (EnableHotStandby)
-		TruncateSUBTRANS(GetOldestXmin(NULL, false));
+		TruncateSUBTRANS(GetOldestXmin(NULL, false, false));
 
 	/* Real work is done, but log and update before releasing lock. */
 	LogCheckpointEnd(true);
diff --git 

Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Craig Ringer
Hi all

Updated timeline following patch attached.

There's a change in read_local_xlog_page to ensure we maintain
ThisTimeLineID properly, otherwise it's just comment changes.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d42ceaec47793f67c55523d1aeb72be61c4f2dea Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 1 Sep 2016 10:16:55 +0800
Subject: [PATCH] Teach xlogreader to follow timeline switches

The XLogReader was timeline-agnostic and assumed that all WAL segments
requested would be on ThisTimeLineID.

When decoding from a logical slot, it's necessary for xlog reading to
be able to read xlog from historical (i.e. not current) timelines.
Otherwise decoding fails after failover to a physical replica because
the oldest still-needed archives are in the historical timeline.

Supporting logical decoding timeline following is a pre-requisite for
logical decoding on physical standby servers. It also makes it
possible to promote a replica with logical slots to a master and
replay from those slots, allowing logical decoding applications to
follow physical failover.

Logical slots cannot actually be created or advanced on a replica so this is
mostly foundation work for subsequent changes to enable logical decoding on
standbys.

Tests are included to exercise the functionality using a cold disk-level copy
of the master that's started up as a replica with slots intact, but the
intended use of the functionality is with logical decoding on a standby.

Note that an earlier version of logical decoding timeline following
was committed to 9.6 as 24c5f1a103ce, 3a3b309041b0, 82c83b337202, and
f07d18b6e94d. It was then reverted by c1543a81a7a8 just after 9.6
feature freeze when issues were discovered too late to safely fix them
in the 9.6 release cycle.

The prior approach failed to consider that a record could be split
across pages that are on different segments, where the new segment
contains the start of a new timeline. In that case the old segment
might be missing or renamed with a .partial suffix.

This patch reworks the logic to be page-based and in the process
simplifies how the last timeline for a segment is looked up.
---
 src/backend/access/transam/xlogutils.c | 213 +++--
 src/backend/replication/logical/logicalfuncs.c |   8 +-
 src/backend/replication/walsender.c|  11 +-
 src/include/access/xlogreader.h|  16 ++
 src/include/access/xlogutils.h |   3 +
 src/test/recovery/Makefile |   2 +
 .../recovery/t/009_logical_decoding_timelines.pl   | 130 +
 7 files changed, 364 insertions(+), 19 deletions(-)
 create mode 100644 src/test/recovery/t/009_logical_decoding_timelines.pl

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b2b9fcb..28c07d3 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -19,6 +19,7 @@
 
 #include 
 
+#include "access/timeline.h"
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogutils.h"
@@ -662,6 +663,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	/* state maintained across calls */
 	static int	sendFile = -1;
 	static XLogSegNo sendSegNo = 0;
+	static TimeLineID sendTLI = 0;
 	static uint32 sendOff = 0;
 
 	p = buf;
@@ -677,7 +679,8 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 		startoff = recptr % XLogSegSize;
 
 		/* Do we need to switch to a different xlog segment? */
-		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo))
+		if (sendFile < 0 || !XLByteInSeg(recptr, sendSegNo) ||
+			sendTLI != tli)
 		{
 			char		path[MAXPGPATH];
 
@@ -704,6 +707,7 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 	path)));
 			}
 			sendOff = 0;
+			sendTLI = tli;
 		}
 
 		/* Need to seek in the file? */
@@ -754,6 +758,133 @@ XLogRead(char *buf, TimeLineID tli, XLogRecPtr startptr, Size count)
 }
 
 /*
+ * Determine which timeline to read an xlog page from and set the
+ * XLogReaderState's currTLI to that timeline ID.
+ *
+ * We care about timelines in xlogreader when we might be reading xlog
+ * generated prior to a promotion, either if we're currently a standby in
+ * recovery or if we're a promoted master reading xlogs generated by the old
+ * master before our promotion.
+ *
+ * wantPage must be set to the start address of the page to read and
+ * wantLength to the amount of the page that will be read, up to
+ * XLOG_BLCKSZ. If the amount to be read isn't known, pass XLOG_BLCKSZ.
+ *
+ * We switch to an xlog segment from the new timeline eagerly when on a
+ * historical timeline, as soon as we reach the start of the xlog segment
+ * containing the timeline switch.  The server copied the segment to the new
+ * timeline so all the data up to the switch point 

Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Craig Ringer
On 21 March 2017 at 09:05, Craig Ringer  wrote:

> Thanks, that's a helpful point. The commit in question is 978b2f65. I
> didn't notice that it introduced XLogReader use in twophase.c, though
> I should've realised given the discussion about fetching recent 2pc
> info from xlog. I don't see any potential for issues at first glance,
> but I'll go over it in more detail. The main concern is validity of
> ThisTimeLineID, but since it doesn't run in recovery I don't see much
> of a problem there. That also means it can afford to use the current
> timeline-oblivious read_local_xlog_page safely.
>
> TAP tests for 2pc were added by 3082098. I'll check to make sure they
> have appropriate coverage for this.

The TAP tests pass fine, and I can't see any likely issues either.

XLogReader for 2PC doesn't happen on standby, and RecoveryInProgress()
will update ThisTimeLineID on promotion.

>> Did you check whether ThisTimeLineID is actually always valid in the
>> processes logical decoding could run in?  IIRC it's not consistently
>> updated during recovery in any process but the startup process.
>
> I share your concerns that it may not be well enough maintained.
> Thank you for the reminder, that's been on my TODO and got lost when I
> had to task-hop to other priorities.

The main place we maintain ThisTimeLineID (outside StartupXLOG of
course) is in walsender's GetStandbyFlushRecPtr, which calls
GetWalRcvWriteRecPtr. That's not used in walsender's logical decoding
or in the SQL interface.

I've changed the order of operations in read_local_xlog_page to ensure
that RecoveryInProgress() updates ThisTimeLineID if we're promoted,
and made it update ThisTimeLineID from GetXLogReplayRecPtr otherwise.

pg_logical_slot_get_changes_guts was fine already.

Because xlog read callbacks must not attempt to read pages past the
flush limit (master) or replay limit (standby), it doesn't matter if
ThisTimeLineID is completely up to date, only that it's valid as-of
that LSN.

I did identify one problem. The startup process renames the last
segment in a timeline to .partial when it processes a timeline switch.
See xlog.c:7597. So if we have the following order of operations:

* Update ThisTimeLineID to 2 at latest redo ptr
* XLogReadDetermineTimeline chooses timeline 2 to read from
* startup process replays timeline switch to TL 3 and renames last
segment in old timeline to .partial
* XLogRead() tries to open segment with TL 2

we'll fail. I don't think it matters much though. We're not actually
supporting streaming decoding from standby this release by the looks,
and even if we did the effect would be limited to an ERROR and a
reconnect. It doesn't look like there's really any sort of lock or
other synchronisation we can rely on to prevent this, and we should
probably just live with it. If we have already opened the segment
we'll just keep reading from it without noticing the rename; if we
haven't and are switching to it just as it's renamed we'll ERROR when
we try to open it.

I had cascading and promotion tests in progress for decoding on
standby, but doubt there's much point finishing them off now that it's
not likely that decoding on standby can be added for this CF.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-21 Thread Simon Riggs
On 21 March 2017 at 02:21, Craig Ringer  wrote:
> On 20 March 2017 at 17:33, Andres Freund  wrote:
>
>>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding
>>
>> FWIW, the title doesn't really seem accurate to me.
>
> Yeah, it's not really at the logical decoding layer at all.
>
> "Teach xlogreader to follow timeline switches" ?

Happy with that. I think Craig has addressed Andres' issues with this
patch, so I will apply later today as planned using that name.

The larger Logical Decoding on Standby patch will not be applied yet, and not
without further changes, per review.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Craig Ringer
On 20 March 2017 at 17:33, Andres Freund  wrote:

>> Subject: [PATCH 2/3] Follow timeline switches in logical decoding
>
> FWIW, the title doesn't really seem accurate to me.

Yeah, it's not really at the logical decoding layer at all.

"Teach xlogreader to follow timeline switches" ?

>> Logical slots cannot actually be created on a replica without use of
>> the low-level C slot management APIs so this is mostly foundation work
>> for subsequent changes to enable logical decoding on standbys.
>
> Every time I read references to anything like this my blood starts to
> boil.  I kind of regret not having plastered RecoveryInProgress() errors
> all over this code.

In fairness, I've been trying for multiple releases to get a "right"
way in. I have no intention of using such hacks, and only ever did so
for testing xlogreader timeline following without full logical
decoding on standby being available.

>> From 8854d44e2227b9d076b0a25a9c8b9df9270b2433 Mon Sep 17 00:00:00 2001
>> From: Craig Ringer 
>> Date: Mon, 5 Sep 2016 15:30:53 +0800
>> Subject: [PATCH 3/3] Logical decoding on standby
>>
>> * Make walsender aware of ProcSignal and recovery conflicts, make walsender
>>   exit with recovery conflict on upstream drop database when it has an active
>>   logical slot on that database.
>> * Allow GetOldestXmin to omit catalog_xmin, be called already locked.
>
> "be called already locked"?

To be called with ProcArrayLock already held. But that's actually
outdated due to changes Petr requested earlier, thanks for noticing.

>> * Send catalog_xmin separately in hot_standby_feedback messages.
>> * Store catalog_xmin separately on a physical slot if received in 
>> hot_standby_feedback
>
> What does separate mean?

Currently, hot standby feedback sends effectively the
min(catalog_xmin, xmin) to the upstream, which in turn records that
either in the PGPROC entry or, if there's a slot in use, in the xmin
field on the slot.

So catalog_xmin on the standby gets promoted to xmin on the master's
physical slot. Lots of unnecessary bloat results.

This splits it up, so we can send catalog_xmin separately on the wire,
and store it on the physical replication slot as catalog_xmin, not
xmin.

>> * Separate the catalog_xmin used by vacuum from ProcArray's 
>> replication_slot_catalog_xmin,
>>   requiring that xlog be emitted before vacuum can remove no longer needed 
>> catalogs, store
>>   it in checkpoints, make vacuum and bgwriter advance it.
>
> I can't parse that sentence.

We now write an xlog record before allowing the catalog_xmin in
ProcArray replication_slot_catalog_xmin to advance and allow catalog
tuples to be removed. This is achieved by making vacuum use a separate
field in ShmemVariableCache, oldestCatalogXmin. When vacuum looks up
the new xmin from GetOldestXmin, it copies
ProcArray.replication_slot_catalog_xmin to
ShmemVariableCache.oldestCatalogXmin, writing an xlog record to ensure
we remember the new value and ensure standbys know about it.

This provides a guarantee to standbys that all catalog tuples >=
ShmemVariableCache.oldestCatalogXmin are protected from vacuum and
lets them discover when that threshold advances.

The reason we cannot use the xid field in existing vacuum xlog records
is that the downstream has no way to know if the xact affected
catalogs and therefore whether it should advance its idea of
catalog_xmin or not. It can't get a Relation for the affected
relfilenode because it can't use the relcache during redo. We'd have
to add a flag to every vacuum record indicating whether it affected
catalogs, which is not fun, and vacuum might not always even know. And
the standby would still need a way to keep track of the oldest valid
catalog_xmin across restart without the ability to write it to
checkpoints.

It's a lot simpler and cheaper to have the master do it.

>> * Add a new recovery conflict type for conflict with catalog_xmin. Abort
>>   in-progress logical decoding sessions with conflict with recovery where 
>> needed
>>   catalog_xmin is too old
>
> Are we retaining WAL for slots broken in that way?

Yes, until the slot is dropped.

If I added a persistent flag on the slot to indicate that the slot is
invalid, then we could ignore it for purposes of WAL retention. It
seemed unnecessary at this stage.

>> * Make extra efforts to reserve master's catalog_xmin during decoding startup
>>   on standby.
>
> What does that mean?

WaitForMasterCatalogXminReservation(...)

I don't like it. At all. I'd rather have hot standby feedback replies
so we can know for sure when the master has locked in our feedback.
It's my most disliked part of this patch.

>> * Remove checks preventing starting logical decoding on standby
>
> To me that's too many different things in one commit.  A bunch of them
> seem like it'd be good if they'd get independent buildfarm cycles too.

I agree with you. I had them all separate before and was told that
there were too many patches. 

Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Craig Ringer
On 20 March 2017 at 17:33, Andres Freund  wrote:
> Hi,
>
> Have you checked how high the overhead of XLogReadDetermineTimeline is?
> A non-local function call, especially into a different translation-unit
> (no partial inlining), for every single page might end up being
> noticeable.  That's fine in the cases it actually adds functionality,
> but for a master streaming out data, that's not actually adding
> anything.

I don't anticipate any significant effect given the large amount of
indirection via decoding, reorder buffer, snapshot builder, output
plugin, etc that we already do and how much memory allocation gets
done ... but it's worth checking. I could always move the fast path
into a macro or inline function if it does turn out to make a
detectable difference.

One of the things I want to get to is refactoring all the xlog page
reading stuff into a single place, shared between walsender and normal
backends, to get rid of this confusing mess we currently have. The
only necessary difference is how we wait for new WAL, the rest should
be as common as possible allowing for xlogreader's needs. I
particularly want to get rid of the two identically named static
XLogRead functions. But all that probably means making timeline.c
FRONTEND friendly and it's way too intrusive to contemplate at this
stage.

> Did you check whether your changes to read_local_xlog_page could cause
> issues with twophase.c? Because that now also uses it.

Thanks, that's a helpful point. The commit in question is 978b2f65. I
didn't notice that it introduced XLogReader use in twophase.c, though
I should've realised given the discussion about fetching recent 2pc
info from xlog. I don't see any potential for issues at first glance,
but I'll go over it in more detail. The main concern is validity of
ThisTimeLineID, but since it doesn't run in recovery I don't see much
of a problem there. That also means it can afford to use the current
timeline-oblivious read_local_xlog_page safely.

TAP tests for 2pc were added by 3082098. I'll check to make sure they
have appropriate coverage for this.

> Did you check whether ThisTimeLineID is actually always valid in the
> processes logical decoding could run in?  IIRC it's not consistently
> updated during recovery in any process but the startup process.

I share your concerns that it may not be well enough maintained.
Thank you for the reminder, that's been on my TODO and got lost when I
had to task-hop to other priorities.

I have some TAP tests to validate promotion that need finishing off.
My main concern is around live promotions, both promotion of standby
to master, and promotion of upstream master when streaming from a
cascaded replica.

[Will cover review of 0003 separately, next]

> 0002 should be doable as a whole this release, I have severe doubts that
> 0003 as a whole has a chance for 10 - the code is in quite a raw shape,
> there's a significant number of open ends.  I'd suggest breaking off bits
> that are independently useful, and work on getting those committed.

That would be my preference too.

I do not actually feel strongly about the need for logical decoding on
standby, and would in many ways prefer to defer it until we have
two-way hot standby feedback and the ability to have the master
confirm the actual catalog_xmin locked in to eliminate the current
race and ugly workaround for it. I'd rather have solid timeline
following in place now and bare-minimum failover capability.

I'm confident that the ability for xlogreader to follow timeline
switches will also be independently useful.

The parts I think are important for Pg10 are:

* Teach xlogreader to follow timeline switches
* Ability to create logical slots on replicas
* Ability to advance (via feedback or via SQL function) - no need to
actually decode and call output plugins at all.
* Ability to drop logical slots on replicas

That would be enough to provide minimal standby promotion without hideous hacks.

Unfortunately, slot creation on standby is probably the ugliest part
of the patch. It can be considerably simplified by imposing the rule
that the application must ensure catalog_xmin on the master is already
held down (with a replication slot) before creating a slot on the
standby, and it's the application's job to send feedback to the master
before any standbys it's keeping slots on. If the app fails to do so,
the slot on the downstream will become unusable and attempts to decode
changes from it will fail with conflict with recovery.

That'd get rid of a lot of the code including some of the ugliest
bits, since we'd no longer make any special effort with catalog_xmin
during slot creation. We're already pushing complexity onto apps for
this, after concluding that the transparent failover slots approach
wasn't the way forward, so I'm OK with that. Let the apps that want
logical decoding to support physical replica promotion pay most of the
cost.

I'd then like to revisit full decoding on standby later, 

Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Craig Ringer
On 19 March 2017 at 22:12, Petr Jelinek  wrote:

> I am slightly worried about impact of the readTimeLineHistory() call but
> I think it should be called so little that it should not matter.

Pretty much my thinking too.

> That brings us to the big patch 0003.
>
> I still don't like the "New in 10.0" comments in documentation, for one
> it's 10, not 10.0 and mainly we don't generally write stuff like this to
> documentation, that's what release notes are for.

OK. Personally I think it's worthwhile for protocol docs, which are
more dev-focused. But I agree it's not consistent with the rest of the
docs, so removed.

(Frankly I wish we did this consistently throughout the Pg docs, too,
and it'd be much more user-friendly if we did, but that's just not
going to happen.)

> There is large amounts of whitespace mess (empty lines with only
> whitespace, spaces at the end of the lines), nothing horrible, but
> should be cleaned up.

Fixed.

> One thing I don't understand much is the wal_level change and turning
> off catalogXmin tracking. I don't really see anywhere that the
> catalogXmin would be reset in control file for example. There is TODO in
> UpdateOldestCatalogXmin() that seems related but tbh I don't follow
> what's happening there - the comment talks about not doing anything, but the
> code inside the if block is only Asserts.

UpdateOldestCatalogXmin(...) with force=true forces a
XactLogCatalogXminUpdate(...) call to write the new
procArray->replication_slot_catalog_xmin .

We call it with force=true from XLogReportParameters(...) when
wal_level has been lowered; see XLogReportParameters. This will write
out a xl_xact_catalog_xmin_advance with
procArray->replication_slot_catalog_xmin's value then update
ShmemVariableCache->oldestCatalogXmin in shmem.
ShmemVariableCache->oldestCatalogXmin will get written out in the next
checkpoint, which gets incorporated in the control file.

There is a problem though - StartupReplicationSlots() and
RestoreSlotFromDisk() don't care if catalog_xmin is set on a slot but
wal_level is < logical and will happily restore a logical slot, or a
physical slot with a catalog_xmin. So we can't actually assume that
procArray->replication_slot_catalog_xmin will be 0 if we're not
writing new logical WAL. This isn't a big deal, it just means we can't
short-circuit UpdateOldestCatalogXmin() calls if
!XLogLogicalInfoActive(). It also means the XLogReportParameters()
stuff can be removed since we don't care about wal_level for tracking
oldestCatalogXmin.

Fixed in updated patch.

I'm now reading over Andres's review.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Andres Freund
Hi,

Have you checked how high the overhead of XLogReadDetermineTimeline is?
A non-local function call, especially into a different translation-unit
(no partial inlining), for every single page might end up being
noticeable.  That's fine in the cases it actually adds functionality,
but for a master streaming out data, that's not actually adding
anything.

Did you check whether your changes to read_local_xlog_page could cause
issues with twophase.c? Because that now also uses it.

Did you check whether ThisTimeLineID is actually always valid in the
processes logical decoding could run in?  IIRC it's not consistently
updated during recovery in any process but the startup process.



On 2017-03-19 21:12:23 +0800, Craig Ringer wrote:
> From 2fa891a555ea4fb200d75b8c906c6b932699b463 Mon Sep 17 00:00:00 2001
> From: Craig Ringer 
> Date: Thu, 1 Sep 2016 10:16:55 +0800
> Subject: [PATCH 2/3] Follow timeline switches in logical decoding

FWIW, the title doesn't really seem accurate to me.


> Logical slots cannot actually be created on a replica without use of
> the low-level C slot management APIs so this is mostly foundation work
> for subsequent changes to enable logical decoding on standbys.

Every time I read references to anything like this my blood starts to
boil.  I kind of regret not having plastered RecoveryInProgress() errors
all over this code.


> From 8854d44e2227b9d076b0a25a9c8b9df9270b2433 Mon Sep 17 00:00:00 2001
> From: Craig Ringer 
> Date: Mon, 5 Sep 2016 15:30:53 +0800
> Subject: [PATCH 3/3] Logical decoding on standby
>
> * Make walsender aware of ProcSignal and recovery conflicts, make walsender
>   exit with recovery conflict on upstream drop database when it has an active
>   logical slot on that database.
> * Allow GetOldestXmin to omit catalog_xmin, be called already locked.

"be called already locked"?


> * Send catalog_xmin separately in hot_standby_feedback messages.
> * Store catalog_xmin separately on a physical slot if received in 
> hot_standby_feedback

What does separate mean?


> * Separate the catalog_xmin used by vacuum from ProcArray's 
> replication_slot_catalog_xmin,
>   requiring that xlog be emitted before vacuum can remove no longer needed 
> catalogs, store
>   it in checkpoints, make vacuum and bgwriter advance it.

I can't parse that sentence.


> * Add a new recovery conflict type for conflict with catalog_xmin. Abort
>   in-progress logical decoding sessions with conflict with recovery where 
> needed
>   catalog_xmin is too old

Are we retaining WAL for slots broken in that way?


> * Make extra efforts to reserve master's catalog_xmin during decoding startup
>   on standby.

What does that mean?


> * Remove checks preventing starting logical decoding on standby

To me that's too many different things in one commit.  A bunch of them
seem like it'd be good if they'd get independent buildfarm cycles too.




> diff --git a/src/backend/access/heap/rewriteheap.c 
> b/src/backend/access/heap/rewriteheap.c
> index d7f65a5..36bbb98 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -812,7 +812,8 @@ logical_begin_heap_rewrite(RewriteState state)
>   if (!state->rs_logical_rewrite)
>   return;
>
> - ProcArrayGetReplicationSlotXmin(NULL, _xmin);
> + /* Use the catalog_xmin being retained by vacuum */
> + ProcArrayGetReplicationSlotXmin(NULL, _xmin, NULL);

What does that comment mean? Vacuum isn't the only thing that prunes old
records.


> +/*
> + * Set the global oldest catalog_xmin used to determine when tuples
> + * may be removed from catalogs and user-catalogs accessible from logical
> + * decoding.
> + *
> + * Only to be called from the startup process or by 
> UpdateOldestCatalogXmin(),
> + * which ensures the update is properly written to xlog first.
> + */
> +void
> +SetOldestCatalogXmin(TransactionId oldestCatalogXmin)
> +{
> + Assert(InRecovery || !IsUnderPostmaster || AmStartupProcess() || 
> LWLockHeldByMe(ProcArrayLock));

Uh, that's long-ish.  And doesn't agree with the comment above
(s/startup process/process performing recovery/?).

This is a long enough list that I'd consider just dropping the assert.


> + else if (info == XLOG_XACT_CATALOG_XMIN_ADV)
> + {
> + xl_xact_catalog_xmin_advance *xlrec = 
> (xl_xact_catalog_xmin_advance *) XLogRecGetData(record);
> +
> + /*
> +  * Unless logical decoding is possible on this node, we don't 
> care about
> +  * this record.
> +  */
> + if (!XLogLogicalInfoActive() || max_replication_slots == 0)
> + return;

Too many negatives for my taste, but whatever.


> + /*
> +  * Apply the new catalog_xmin limit immediately. New decoding 
> sessions
> +  * will refuse to start if their slot is past it, and old ones 
> will
> +  * notice when we signal 

Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Craig Ringer
On 20 March 2017 at 14:57, Simon Riggs  wrote:

> 2.1 Why does call to ReplicationSlotAcquire() move earlier in
> pg_logical_slot_get_changes_guts()?

That appears to be an oversight from an earlier version where it
looped over timelines in pg_logical_slot_get_changes_guts. Reverted.

> 2.2 sendTimeLineIsHistoric looks incorrect, and at least isn't really
> documented well.
> The setting
>   sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> should be
>   sendTimeLineIsHistoric = (state->currTLI != ThisTimeLineID);

Definitely wrong. Fixed.

> but that doesn't cause failure because in read_local_xlog_page() we
> say that we are reading from history when
> state->currTLI != ThisTimeLineID explicitly rather than use
> sendTimeLineIsHistoric

XLogRead(...), as called by logical_read_xlog_page, does test it. It's
part of the walsender-local log read callback. We don't hit
read_local_xlog_page at all when we're doing walsender based logical
decoding.

We have two parallel code paths for reading xlogs, one for walsender,
one for normal backends. The walsender one is glued together with a
bunch of globals that pass state "around" the xlogreader - we set it
up before calling into xlogreader, and then examine it when xlogreader
calls back into walsender.c with logical_read_xlog_page.

I really want to refactor that at some stage, getting rid of the use
of walsender globals for timeline state tracking and sharing more of
the xlog reading logic between walsender and normal backends. But
-ENOTIME, especially to do it as carefully as it must be done.

There are comments on read_local_xlog_page, logical_read_xlog_page
that mention this. Also XLogRead in
src/backend/access/transam/xlogutils.c  (which has the same name as
XLogRead in src/backend/replication/walsender.c). I have a draft for a
timeline following readme that would address some of this but don't
expect to be able to finish it off for this release cycle, and I'd
really rather clean it up instead.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-20 Thread Simon Riggs
On 19 March 2017 at 21:12, Craig Ringer  wrote:

> Rebased attached.

Patch1 looks good to go. I'll correct a spelling mistake in the tap
test when I commit that later today.

Patch2 has a couple of points

2.1 Why does call to ReplicationSlotAcquire() move earlier in
pg_logical_slot_get_changes_guts()?

2.2 sendTimeLineIsHistoric looks incorrect, and at least isn't really
documented well.
The setting
  sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
should be
  sendTimeLineIsHistoric = (state->currTLI != ThisTimeLineID);

but that doesn't cause failure because in read_local_xlog_page() we
say that we are reading from history when
state->currTLI != ThisTimeLineID explicitly rather than use
sendTimeLineIsHistoric

So it looks like we could do with a few extra comments
If you correct these I'll commit it tomorrow.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-19 Thread Petr Jelinek
Hi,

I don't know how well I can review the 0001 (the TAP infra patch) but it
looks okay to me.

I don't really have any complaints about 0002 either. I like that it's
more or less one self-contained function and there are no weird ifdefs
anymore like in 9.6 version (btw your commit message talks about 9.5 but
it was 9.6). I also like the clever test :)

I am slightly worried about the impact of the readTimeLineHistory() call, but
it should be called rarely enough that it should not matter.

That brings us to the big patch 0003.

I still don't like the "New in 10.0" comments in the documentation: for one,
it's 10, not 10.0, and mainly we don't generally write stuff like this in the
documentation; that's what release notes are for.

There is large amounts of whitespace mess (empty lines with only
whitespace, spaces at the end of the lines), nothing horrible, but
should be cleaned up.

One thing I don't understand much is the wal_level change and turning
off catalogXmin tracking. I don't really see anywhere that the
catalogXmin would be reset in the control file, for example. There is a TODO in
UpdateOldestCatalogXmin() that seems related, but tbh I don't follow
what's happening there - the comment talks about not doing anything, but the
code inside the if block is only Asserts.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-19 Thread Simon Riggs
On 13 March 2017 at 10:56, Craig Ringer  wrote:
> On 7 March 2017 at 21:08, Simon Riggs  wrote:
>
>> Patch 4 committed. Few others need rebase.
>
> Since this patch series

Patch 1 fails since feature has already been applied. If other reason,
let me know.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-12 Thread Craig Ringer
On 13 March 2017 at 10:56, Craig Ringer  wrote:
> On 7 March 2017 at 21:08, Simon Riggs  wrote:
>
>> Patch 4 committed. Few others need rebase.
>
> Since this patch series and initial data copy for logical replication
> both add a facility for suppressing initial snapshot export on a
> logical slot, I've dropped patch 0003 (make snapshot export on logical
> slot creation) in favour of Petr's similar patch.
>
> I will duplicate it in this patch series for ease of application. (The
> version here is slightly extended over Petr's so I'll re-post the
> modified version on the logical replication initial data copy thread
> too).
>
> The main thing I want to direct attention to for Simon, as committer,
> is the xlog'ing of VACUUM's xid threshold before we advance it and
> start removing tuples. This is necessary for the standby to know
> whether a given replication slot is safe to use and fail with conflict
> with recovery if it is not, or if it becomes unsafe due to master
> vacuum activity. Note that we can _not_ use the various vacuum records
> for this because we don't know which are catalogs and which aren't;
> we'd have to add a separate "is catalog" field to each vacuum xlog
> record, which is undesirable. The downstream can't look up whether
> it's a catalog or not because it doesn't have relcache/syscache access
> during decoding.
>
> This change might look a bit similar to the vac_truncate_clog change
> in the txid_status patch, but it isn't really related. The txid_status
> change is about knowing when we can safely look up xids in clog and
> preventing a race with clog truncation. This change is about knowing
> when we can know all catalog tuples for a given xid will still be in
> the heap, not vacuumed away. Both are about making sure standbys know
> more about the state of the system in a low-cost way, though.
>
> WaitForMasterCatalogXminReservation(...) in logical.c is also worth
> looking more closely at.

I should also note that because the TAP tests currently take a long
time, I recommend skipping the tests for this patch by default and
running them only when actually touching logical decoding.

I'm looking at ways to make them faster, but they're inevitably going
to take a while until we can get hot standby feedback replies in
place, including cascading support. Which I have as WIP, but won't
make this release.

Changing the test import to

 use Test::More skip_all => "disabled by default, too slow";

will be sufficient.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-03-07 Thread Simon Riggs
On 24 January 2017 at 06:37, Craig Ringer  wrote:
> Rebased series attached, on top of current master (which includes
> logical replication).
>
> I'm inclined to think I should split out a few of the changes from
> 0005, roughly along the lines of the bullet points in its commit
> message. Anyone feel strongly about how granular this should be?
>
> This patch series is a pre-requisite for supporting logical
> replication using a physical standby as a source, but does not itself
> enable logical replication from a physical standby.

Patch 4 committed. Few others need rebase.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-01-31 Thread Michael Paquier
On Tue, Jan 24, 2017 at 7:37 AM, Craig Ringer  wrote:
> Rebased series attached, on top of current master (which includes
> logical replication).
>
> I'm inclined to think I should split out a few of the changes from
> 0005, roughly along the lines of the bullet points in its commit
> message. Anyone feel strongly about how granular this should be?
>
> This patch series is a pre-requisite for supporting logical
> replication using a physical standby as a source, but does not itself
> enable logical replication from a physical standby.

There are patches but no reviews yet so moved to CF 2017-03.
-- 
Michael




Re: [HACKERS] Logical decoding on standby

2017-01-23 Thread Thom Brown
On 5 January 2017 at 01:21, Craig Ringer  wrote:
> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>
>> so here's a rebased series on top of master. No other changes.
>
> Now with actual patches.

Patch 5 no longer applies:

patching file src/include/pgstat.h
Hunk #1 FAILED at 745.
1 out of 1 hunk FAILED -- saving rejects to file src/include/pgstat.h.rej

--- src/include/pgstat.h
+++ src/include/pgstat.h
@@ -745,7 +745,8 @@ typedef enum
WAIT_EVENT_SYSLOGGER_MAIN,
WAIT_EVENT_WAL_RECEIVER_MAIN,
WAIT_EVENT_WAL_SENDER_MAIN,
-   WAIT_EVENT_WAL_WRITER_MAIN
+   WAIT_EVENT_WAL_WRITER_MAIN,
+   WAIT_EVENT_STANDBY_LOGICAL_SLOT_CREATE
 } WaitEventActivity;

 /* --

Could you rebase?

Thanks

Thom




Re: [HACKERS] Logical decoding on standby

2017-01-05 Thread Michael Paquier
On Fri, Jan 6, 2017 at 1:07 PM, Craig Ringer  wrote:
> On 5 January 2017 at 13:12, Michael Paquier  wrote:
>> On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer  wrote:
>>> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>>>
 so here's a rebased series on top of master. No other changes.
>>>
>>> Now with actual patches.
>>
>> Looking at the PostgresNode code in 0001...
>> +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos,
>> timeout_secs, ...)
>> +
>> This format is incorrect. I think that you also need to fix the
>> brackets for the do{} and the eval{} blocks.
>>
>> +push @cmd, '--endpos', $endpos if ($endpos);
>> endpos should be made a mandatory argument. If it is not defined that
>> would make the test calling this routine being stuck.
>
> Acknowledged and agreed. I'll fix both in the next revision.  I'm
> currently working on hot standby replies, but will return to this
> ASAP.

By the way, be sure to fix as well the =pod blocks for the new
routines. perldoc needs to use both =pod and =item.
-- 
Michael




Re: [HACKERS] Logical decoding on standby

2017-01-05 Thread Craig Ringer
On 5 January 2017 at 13:12, Michael Paquier  wrote:
> On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer  wrote:
>> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>>
>>> so here's a rebased series on top of master. No other changes.
>>
>> Now with actual patches.
>
> Looking at the PostgresNode code in 0001...
> +=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos,
> timeout_secs, ...)
> +
> This format is incorrect. I think that you also need to fix the
> brackets for the do{} and the eval{} blocks.
>
> +push @cmd, '--endpos', $endpos if ($endpos);
> endpos should be made a mandatory argument. If it is not defined that
> would make the test calling this routine being stuck.
> --
> Michael

Acknowledged and agreed. I'll fix both in the next revision.  I'm
currently working on hot standby replies, but will return to this
ASAP.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-01-04 Thread Michael Paquier
On Thu, Jan 5, 2017 at 10:21 AM, Craig Ringer  wrote:
> On 5 January 2017 at 09:19, Craig Ringer  wrote:
>
>> so here's a rebased series on top of master. No other changes.
>
> Now with actual patches.

Looking at the PostgresNode code in 0001...
+=pod $node->pg_recvlogical_upto(self, dbname, slot_name, endpos,
timeout_secs, ...)
+
This format is incorrect. I think that you also need to fix the
brackets for the do{} and the eval{} blocks.

+push @cmd, '--endpos', $endpos if ($endpos);
endpos should be made a mandatory argument. If it is not defined that
would make the test calling this routine being stuck.
-- 
Michael




Re: [HACKERS] Logical decoding on standby

2017-01-04 Thread Craig Ringer
On 4 January 2017 at 16:19, Craig Ringer  wrote:
> On 4 January 2017 at 12:15, Craig Ringer  wrote:
>
>> That's particularly relevant to you Simon as you expressed a wish to
>> commit the new streaming rep tests.

Simon committed 1, 2, 3 and 5:

* Extra PostgresNode methods
* pg_recvlogical --endpos
* Tests for pg_recvlogical
* Expand streaming replication tests to cover hot standby

so here's a rebased series on top of master. No other changes.

The first patch to add a pg_recvlogical wrapper to PostgresNode is
really only needed to test the rest of the patches, so it can be
committed together with them.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-01-04 Thread Craig Ringer
On 4 January 2017 at 12:15, Craig Ringer  wrote:

> That's particularly relevant to you Simon as you expressed a wish to
> commit the new streaming rep tests.

Patches 0001 and 0005 in this series also posted as
https://www.postgresql.org/message-id/CAMsr+YHxTMrY1woH_m4bEF3f5+kxX_T=sduyxf4d2-+e-56...@mail.gmail.com
, since they're really pre-requisites not part of decoding on standby
as such. I'll post a new series with them removed once committed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2017-01-03 Thread Craig Ringer
On 4 January 2017 at 12:08, Craig Ringer  wrote:
>
> 0001 incorporates the changes requested by Michael Paquier. Simon
> expressed his intention to commit this after updates, in the separate
> thread [...]

...

> 0005 (new streaming rep tests) is updated for the changes in 0001,
> otherwise unchanged. Simon said he wanted to commit this soon.
>
> 0006 (timeline following) is unchanged except for updates to be
> compatible with 0001.
>
> 0007 is the optional snapshot export requested by Petr.
>
> 0008 is unchanged.
>
> 0009 is unchanged except for updates vs 0001 and use of the
> WITHOUT_SNAPSHOT option added in 0007.

Oh, note that it's possible to commit 0001 then 0005, skipping over
2..4. I should probably have ordered them that way.

That's particularly relevant to you Simon as you expressed a wish to
commit the new streaming rep tests.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-12-23 Thread Andrew Dunstan



On 12/22/2016 01:21 AM, Craig Ringer wrote:

+   my @fields = ('plugin', 'slot_type', 'datoid', 'database',
'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+   my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
@fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
'$slot_name'");
+   $result = undef if $result eq '';
+   # hash slice, see http://stackoverflow.com/a/16755894/398670 .
Couldn't this portion be made more generic? Other queries could
benefit from that by having a routine that accepts as argument an
array of column names for example.

Yeah, probably. I'm not sure where it should live though - TestLib.pm?

Not sure if there's an idiomatic way to pass a string (in this case a
query) in Perl with a placeholder for interpolation of values (in this
case columns). In Python you'd pass it with pre-defined
%(placeholder)s markers.






For direct interpolation of an expression, there is this slightly 
baroque gadget:


   my $str = "here it is @{[ arbitrary expression here ]}";

For indirect interpolation using placeholders, there is

   my $str = sprintf("format string",...);

which works much like C except that the string is returned as a result 
instead of being the first argument.


And as we always say, TIMTOWTDI.


cheers

andrew (japh)




Re: [HACKERS] Logical decoding on standby

2016-12-23 Thread Craig Ringer
On 22 December 2016 at 14:21, Craig Ringer  wrote:

changes-in-0001-v2.diff shows the changes to PostgresNode.pm per
Michael's comments, and applies on top of 0001.

I also attach a patch to add a new CREATE_REPLICATION_SLOT option per
Petr's suggestion, so you can request a slot be created
WITHOUT_SNAPSHOT. This replaces the patch series's behaviour of
silently suppressing snapshot export when a slot was created on a
replica. It'll conflict (easily resolved) if applied on top of the
current series.

I have more to do before re-posting the full series, so waiting on
author at this point. The PostgresNode changes likely break later
tests, I'm just posting them so there's some progress here and so I
don't forget over the next few days' distraction.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 28e9f0b..64a4633 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,7 +93,6 @@ use RecursiveCopy;
 use Socket;
 use Test::More;
 use TestLib ();
-use pg_lsn qw(parse_lsn);
 use Scalar::Util qw(blessed);
 
 our @EXPORT = qw(
@@ -1325,38 +1324,62 @@ sub run_log
TestLib::run_log(@_);
 }
 
-=pod $node->lsn
+=pod $node->lsn(mode)
 
-Return pg_current_xlog_insert_location() or, on a replica,
-pg_last_xlog_replay_location().
+Look up xlog positions on the server:
+
+* insert position (master only, error on replica)
+* write position (master only, error on replica)
+* flush position
+* receive position (always undef on master)
+* replay position
+
+mode must be specified.
 
 =cut
 
 sub lsn
 {
-   my $self = shift;
-   return $self->safe_psql('postgres', 'select case when 
pg_is_in_recovery() then pg_last_xlog_replay_location() else 
pg_current_xlog_insert_location() end as lsn;');
+   my ($self, $mode) = @_;
+   my %modes = ('insert' => 'pg_current_xlog_insert_location()',
+'flush' => 'pg_current_xlog_flush_location()',
+'write' => 'pg_current_xlog_location()',
+'receive' => 'pg_last_xlog_receive_location()',
+'replay' => 'pg_last_xlog_replay_location()');
+
+   $mode = '' if !defined($mode);
+   die "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', 
keys %modes)
+   if !defined($modes{$mode});
+
+   my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}");
+   chomp($result);
+   if ($result eq '')
+   {
+   return undef;
+   }
+   else
+   {
+   return $result;
+   }
 }
 
 =pod $node->wait_for_catchup(standby_name, mode, target_lsn)
 
 Wait for the node with application_name standby_name (usually from node->name)
-until its replication equals or passes the upstream's xlog insert point at the
-time this function is called. By default the replay_location is waited for,
-but 'mode' may be specified to wait for any of sent|write|flush|replay.
+until its replication position in pg_stat_replication equals or passes the
+upstream's xlog insert point at the time this function is called. By default
+the replay_location is waited for, but 'mode' may be specified to wait for any
+of sent|write|flush|replay.
 
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
 
 Requires that the 'postgres' db exists and is accessible.
 
-If pos is passed, use that xlog position instead of the server's current
-xlog insert position.
+target_lsn may be any arbitrary lsn, but is typically 
$master_node->lsn('insert').
 
 This is not a test. It die()s on failure.
 
-Returns the LSN caught up to.
-
 =cut
 
 sub wait_for_catchup
@@ -1364,24 +1387,25 @@ sub wait_for_catchup
my ($self, $standby_name, $mode, $target_lsn) = @_;
$mode = defined($mode) ? $mode : 'replay';
my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' 
=> 1 );
-   die "valid modes are " . join(', ', keys(%valid_modes)) unless 
exists($valid_modes{$mode});
-   if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") ) {
+   die "unknown mode $mode for 'wait_for_catchup', valid modes are " . 
join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode});
+   # Allow passing of a PostgresNode instance as shorthand
+   if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") )
+   {
$standby_name = $standby_name->name;
}
-   if (!defined($target_lsn)) {
-   $target_lsn = $self->lsn;
-   }
-   $self->poll_query_until('postgres', qq[SELECT '$target_lsn' <= 
${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = 
'$standby_name';])
-   or die "timed out waiting for catchup";
-   return $target_lsn;
+   die 

Re: [HACKERS] Logical decoding on standby

2016-12-21 Thread Craig Ringer
On 22 December 2016 at 13:43, Michael Paquier  wrote:

> So, for 0001:
> --- a/src/test/perl/PostgresNode.pm
> +++ b/src/test/perl/PostgresNode.pm
> @@ -93,6 +93,7 @@ use RecursiveCopy;
>  use Socket;
>  use Test::More;
>  use TestLib ();
> +use pg_lsn qw(parse_lsn);
>  use Scalar::Util qw(blessed);
> This depends on 0002, so the order should be reversed.

Will do. That was silly.

I think I should probably also move the standby tests earlier, then
add a patch to update them when the results change.

> +sub lsn
> +{
> +   my $self = shift;
> +   return $self->safe_psql('postgres', 'select case when
> pg_is_in_recovery() then pg_last_xlog_replay_location() else
> pg_current_xlog_insert_location() end as lsn;');
> +}
> The design of the test should be in charge of choosing which value it
> wants to get, and the routine should just blindly do the work. More
> flexibility is more useful to design tests. So it would be nice to
> have one routine able to switch at will between 'flush', 'insert',
> 'write', 'receive' and 'replay' modes to get values from the
> corresponding xlog functions.

Fair enough. I can amend that.

> -   die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
> +   die "error running SQL: '$$stderr'\nwhile running
> '@psql_params' with sql '$sql'"
>   if $ret == 3;
> That's separate from this patch, and definitely useful.

Yeah. Slipped through. I don't think it really merits a separate patch
though tbh.

> +   if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
> +   die "valid modes are restart, confirmed_flush";
> +   }
> +   if (!defined($target_lsn)) {
> +   $target_lsn = $self->lsn;
> +   }
> That's not really the style followed by the perl scripts present in
> the code regarding the use of the brackets. Do we really need to care
> about the object type checks by the way?

Brackets: will look / fix.

Type checks (not in quoted snippet above): that's a convenience to let
you pass a PostgresNode instance or a string name. Maybe there's a
more idiomatic Perl-y way to write it. My Perl is pretty dire.

> Regarding wait_for_catchup, there are two ways to do things. Either
> query the standby like in the way 004_timeline_switch.pl does it or
> the way this routine does. The way of this routine looks more
> straight-forward IMO, and other tests should be refactored to use it.
> In short I would make the target LSN a mandatory argument, and have
> the caller send a standby's application_name instead of a PostgresNode
> object, the current way to enforce the value of $standby_name being
> really confusing.

Hm, ok. I'll take a look. Making LSN mandatory so you have to pass
$self->lsn is ok with me.

> +   my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1,
> 'replay' => 1 );
> What's actually the point of 'sent'?

Pretty useless, but we expose it in Pg, so we might as well in the tests.

> +   my @fields = ('plugin', 'slot_type', 'datoid', 'database',
> 'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
> +   my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
> @fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
> '$slot_name'");
> +   $result = undef if $result eq '';
> +   # hash slice, see http://stackoverflow.com/a/16755894/398670 .
> Couldn't this portion be made more generic? Other queries could
> benefit from that by having a routine that accepts as argument an
> array of column names for example.

Yeah, probably. I'm not sure where it should live though - TestLib.pm?

Not sure if there's an idiomatic way to pass a string (in this case a
query) in Perl with a placeholder for interpolation of values (in this
case columns). In Python you'd pass it with pre-defined
%(placeholder)s markers.

> Now looking at 0002
> One whitespace:
> $ git diff HEAD~1 --check
> src/test/perl/pg_lsn.pm:139: trailing whitespace.
> +=cut

Will fix.

> pg_lsn sounds like a fine name, now we are more used to camel case for
> module names. And routines are written as lower case separated by an
> underscore.

Unsure what the intent of this is.

> +++ b/src/test/perl/t/002_pg_lsn.pl
> @@ -0,0 +1,68 @@
> +use strict;
> +use warnings;
> +use Test::More tests => 42;
> +use Scalar::Util qw(blessed);
> Most of the tests added don't have a description. This makes things
> harder to debug when tracking an issue.
>
> It may be good to begin using this module within the other tests in
> this patch as well. Now do we actually need it? Most of the existing
> tests I recall rely on the backend's operators for the pg_lsn data
> type, so this is actually duplicating an existing facility. And all the
> values are just passed as-is.

I added it mainly for ordered tests of whether some expected lsn had
passed/increased. But maybe it makes sense to just call into the
server and let it evaluate such tests.

> +++ b/src/test/perl/t/001_load.pl
> @@ -0,0 +1,9 @@
> +use strict;
> +use warnings;
> +use Test::More tests 

Re: [HACKERS] Logical decoding on standby

2016-12-21 Thread Michael Paquier
On Tue, Dec 20, 2016 at 4:03 PM, Petr Jelinek
 wrote:
> That's about approach, but since there are prerequisite patches in the
> patchset that don't really depend on the approach I will comment about
> them as well.
>
> 0001 and 0002 add testing infrastructure and look fine to me, possibly
> committable.
>
> But in 0003 I don't understand following code:
>> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
>> + {
>> + fprintf(stderr,
>> + _("%s: cannot use --create-slot or --drop-slot 
>> together with --endpos\n"),
>> + progname);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> + progname);
>> + exit(1);
>> + }
>
> Why is --create-slot and --endpos not allowed together?
>
> 0004 again looks good but depends on 0003.
>
> 0005 is timeline following which is IMHO ready for committer, as is 0006
> and 0008 and I still maintain the opinion that these should go in soon.
>
> 0007 is unfinished as you said in your mail (missing option to specify
> behavior). And the last one 0009 is the implementation discussed above,
> which I think needs rework. IMHO 0007 and 0009 should be ultimately merged.
>
> I think parts of this could be committed separately and are ready for
> committer IMHO, but there is no way in CF application to mark only part
> of patch-set for committer to attract the attention.

Craig has pinged me about looking at 0001, 0002, 0004 and 0006 as
those involve the TAP infrastructure.

So, for 0001:
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -93,6 +93,7 @@ use RecursiveCopy;
 use Socket;
 use Test::More;
 use TestLib ();
+use pg_lsn qw(parse_lsn);
 use Scalar::Util qw(blessed);
This depends on 0002, so the order should be reversed.

+sub lsn
+{
+   my $self = shift;
+   return $self->safe_psql('postgres', 'select case when
pg_is_in_recovery() then pg_last_xlog_replay_location() else
pg_current_xlog_insert_location() end as lsn;');
+}
The design of the test should be in charge of choosing which value it
wants to get, and the routine should just blindly do the work. More
flexibility is more useful to design tests. So it would be nice to
have one routine able to switch at will between 'flush', 'insert',
'write', 'receive' and 'replay' modes to get values from the
corresponding xlog functions.
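The mode-switchable routine suggested here could be sketched like this (Python purely for illustration; `lsn_query` is a hypothetical name, and the function list uses the pre-10 xlog function names that appear elsewhere in this thread):

```python
# Map each requested mode to the xlog function that reports it.
LSN_FUNCTIONS = {
    "insert":  "pg_current_xlog_insert_location()",
    "write":   "pg_current_xlog_location()",
    "flush":   "pg_current_xlog_flush_location()",
    "receive": "pg_last_xlog_receive_location()",
    "replay":  "pg_last_xlog_replay_location()",
}

def lsn_query(mode):
    # Fail loudly on an unknown mode rather than silently picking one,
    # so the test design stays in charge of which value it gets.
    if mode not in LSN_FUNCTIONS:
        raise ValueError("unknown LSN mode: %s (valid: %s)"
                         % (mode, ", ".join(sorted(LSN_FUNCTIONS))))
    return "SELECT %s" % LSN_FUNCTIONS[mode]
```

The PostgresNode method would then just run the returned SQL via safe_psql instead of hard-coding the recovery/primary branch.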

-   die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+   die "error running SQL: '$$stderr'\nwhile running
'@psql_params' with sql '$sql'"
  if $ret == 3;
That's separate from this patch, and definitely useful.

+   if (!($mode eq 'restart' || $mode eq 'confirmed_flush')) {
+   die "valid modes are restart, confirmed_flush";
+   }
+   if (!defined($target_lsn)) {
+   $target_lsn = $self->lsn;
+   }
That's not really the style followed by the perl scripts present in
the code regarding the use of the brackets. Do we really need to care
about the object type checks by the way?

Regarding wait_for_catchup, there are two ways to do things. Either
query the standby like in the way 004_timeline_switch.pl does it or
the way this routine does. The way of this routine looks more
straight-forward IMO, and other tests should be refactored to use it.
In short I would make the target LSN a mandatory argument, and have
the caller send a standby's application_name instead of a PostgresNode
object, the current way to enforce the value of $standby_name being
really confusing.

+   my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1,
'replay' => 1 );
What's actually the point of 'sent'?

+   my @fields = ('plugin', 'slot_type', 'datoid', 'database',
'active', 'active_pid', 'xmin', 'catalog_xmin', 'restart_lsn');
+   my $result = $self->safe_psql('postgres', 'SELECT ' . join(', ',
@fields) . " FROM pg_catalog.pg_replication_slots WHERE slot_name =
'$slot_name'");
+   $result = undef if $result eq '';
+   # hash slice, see http://stackoverflow.com/a/16755894/398670 .
Couldn't this portion be made more generic? Other queries could
benefit from that by having a routine that accepts as argument an
array of column names for example.
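The generic helper being suggested amounts to zipping a column list with one delimited result row, which is what the Perl hash slice does; a minimal sketch (hypothetical names, Python for illustration, assuming psql's unaligned tuples-only output with a '|' separator):

```python
def row_to_dict(fields, raw_row, sep="|"):
    # Mirror the patch's behaviour: an empty result becomes None
    # (the '$result = undef' case), otherwise map field names to values.
    if raw_row == "":
        return None
    values = raw_row.split(sep)
    if len(values) != len(fields):
        raise ValueError("expected %d columns, got %d"
                         % (len(fields), len(values)))
    return dict(zip(fields, values))
```

Any query in the test suite could then share this one routine by passing its own column list.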

Now looking at 0002
One whitespace:
$ git diff HEAD~1 --check
src/test/perl/pg_lsn.pm:139: trailing whitespace.
+=cut

pg_lsn sounds like a fine name, though we are more used to camel case
for module names, with routines written in lower case separated by
underscores.

+++ b/src/test/perl/t/002_pg_lsn.pl
@@ -0,0 +1,68 @@
+use strict;
+use warnings;
+use Test::More tests => 42;
+use Scalar::Util qw(blessed);
Most of the tests added don't have a description. This makes things
harder to debug when tracking an issue.

It may be good to begin using this module within the other tests in
this patch as well. Now do we actually need it? Most of the existing
tests I recall rely on the backend's operators for the pg_lsn data
type, so this is actually 

Re: [HACKERS] Logical decoding on standby

2016-12-21 Thread Robert Haas
On Tue, Dec 20, 2016 at 10:06 PM, Craig Ringer  wrote:
> On 20 December 2016 at 15:03, Petr Jelinek  
> wrote:
>>> The biggest change in this patch, and the main intrusive part, is that
>>> procArray->replication_slot_catalog_xmin is no longer directly used by
>>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
>>> added, with a corresponding CheckPoint field.
>>> [snip]
>>
>> If this mechanism would not be needed most of the time, wouldn't it be
>> better to not have it and just have a way to ask physical slot about
>> what's the current reserved catalog_xmin (in most cases the standby
>> should actually know what it is since it's sending the hs_feedback, but
>> first slot created on such standby may need to check).
>
> Yes, and that was actually my originally preferred approach, though
> the one above does offer the advantage that if something goes wrong we
> can detect it and fail gracefully. Possibly not worth the complexity
> though.
>
> Your approach requires us to make very sure that hot_standby_feedback
> does not get turned off by user or become ineffective once we're
> replicating, though, since we won't have any way to detect when needed
> tuples are removed. We'd probably just bail out with relcache/syscache
> lookup errors, but I can't guarantee we wouldn't crash if we tried
> logical decoding on WAL where needed catalogs have been removed.

I dunno, Craig, I think your approach sounds more robust.  It's not
very nice to introduce a bunch of random prohibitions on what works
with what, and it doesn't sound like it's altogether watertight
anyway.  Incorporating an occasional, small record into the WAL stream
to mark the advancement of the reserved catalog_xmin seems like a
cleaner and safer solution.  We certainly do NOT want to find out
about corruption only because of random relcache/syscache lookup
failures, let alone crashes.
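The ordering Robert endorses, write the WAL record for the advancement before acting on it, can be modelled like this (a toy sketch, not the patch's actual C code; names only loosely mirror the shmem fields mentioned above, and xids are plain integers with no wraparound):

```python
class CatalogXminTracker:
    """Toy model: vacuum must WAL-log the new reserved catalog_xmin
    *before* treating it as the removal horizon, so a standby replaying
    the record always learns of catalog removals in advance."""

    def __init__(self):
        self.slot_catalog_xmin = 0    # what slots currently require
        self.oldest_catalog_xmin = 0  # what vacuum may actually remove up to
        self.wal = []                 # stand-in for the xact rmgr records

    def vacuum_advance(self):
        if self.slot_catalog_xmin > self.oldest_catalog_xmin:
            # Log first, then advance: the order is the whole point.
            self.wal.append(("catalog_xmin_advance", self.slot_catalog_xmin))
            self.oldest_catalog_xmin = self.slot_catalog_xmin
        return self.oldest_catalog_xmin
```

Because the record is emitted eagerly as soon as the slot horizon moves, this costs an occasional small WAL record rather than any extra catalog bloat.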

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2016-12-20 Thread Petr Jelinek
On 21/12/16 04:06, Craig Ringer wrote:
> On 20 December 2016 at 15:03, Petr Jelinek  
> wrote:
> 
>>> The biggest change in this patch, and the main intrusive part, is that
>>> procArray->replication_slot_catalog_xmin is no longer directly used by
>>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
>>> added, with a corresponding CheckPoint field.
>>> [snip]
>>
>> If this mechanism would not be needed most of the time, wouldn't it be
>> better to not have it and just have a way to ask physical slot about
>> what's the current reserved catalog_xmin (in most cases the standby
>> should actually know what it is since it's sending the hs_feedback, but
>> first slot created on such standby may need to check).
> 
> Yes, and that was actually my originally preferred approach, though
> the one above does offer the advantage that if something goes wrong we
> can detect it and fail gracefully. Possibly not worth the complexity
> though.
> 
> Your approach requires us to make very sure that hot_standby_feedback
> does not get turned off by user or become ineffective once we're
> replicating, though, since we won't have any way to detect when needed
> tuples are removed. We'd probably just bail out with relcache/syscache
> lookup errors, but I can't guarantee we wouldn't crash if we tried
> logical decoding on WAL where needed catalogs have been removed.
> 
> I initially ran into trouble doing that, but now think I have a way forward.
> 
>> WRT preventing
>> hs_feedback going off, we can refuse to start with hs_feedback off when
>> there are logical slots detected.
> 
> Yes. There are some ordering issues there though. We load slots quite
> late in startup and they don't get tracked in checkpoints. So we might
> launch the walreceiver before we load slots and notice their needed
> xmin/catalog_xmin. So we need to prevent sending of
> hot_standby_feedback until slots are loaded, or load slots earlier in
> startup. The former sounds less intrusive and safer - probably just
> add an "initialized" flag to ReplicationSlotCtlData and suppress
> hs_feedback until it becomes true.
> 
> We'd also have to suppress the validation callback action on the
> hot_standby_feedback GUC until we know replication slot state is
> initialised, and perform the check during slot startup instead. The
> hot_standby_feedback GUC validation check might get called before
> shmem is even set up so we have to guard against attempts to access a
> shmem segment that may not even exist yet.
> 
> The general idea is workable though. Refuse to start if logical slots
> exist and hot_standby_feedback is off or walreceiver isn't using a
> physical slot. Refuse to allow hot_standby_feedback to change
> 

These are all problems associated with replication slots being in shmem
if I understand correctly. I wonder, could we put just bool someplace
which is available early that says if there are any logical slots
defined? We don't actually need all the slot info, just to know if there
are some.

> 
>> You may ask what if user drops the slot and recreates or somehow
>> otherwise messes up catalog_xmin on master, well, considering that under
>> my proposal we'd first (on connect) check the slot for catalog_xmin we'd
>> know about it so we'd either mark the local slots broken/drop them or
>> plainly refuse to connect to the master same way as if it didn't have
>> required WAL anymore (not sure which behavior is better). Note that user
>> could mess up catalog_xmin even in your design the same way, so it's not
>> really a regression.
> 
> Agreed. Checking catalog_xmin of the slot when we connect is
> sufficient to guard against that, assuming we can trust that the
> catalog_xmin is actually in effect on the master. Consider cascading
> setups, where we set our catalog_xmin but it might not be "locked in"
> until the middle cascaded server relays it to the master.
> 
> I have a proposed solution to that which I'll outline in a separate
> patch+post; it ties in to some work on addressing the race between hot
> standby feedback taking effect and queries starting on the hot
> standby.  It boils down to "add a hot_standby_feedback reply protocol
> message".
> 
>> Plus
>> it might even be okay to only allow creating logical slots on standbys
>> connected directly to master in v1.
> 
> True. I didn't consider that.
> 
> We haven't had much luck in the past with such limitations, but
> personally I'd consider it a perfectly reasonable one.
> 

I think it's infinitely better with that limitation than the status quo.
Especially for the failover scenario (you usually won't fail over to a
replica down the cascade, as it's always further behind). Not to mention
that with every level of cascading you automatically get more lag, which
means more bloat, so it might not even be desirable to go that route
immediately in v1 when we don't have a way to control that bloat/maximum
slot lag.

-- 
  Petr Jelinek  

Re: [HACKERS] Logical decoding on standby

2016-12-20 Thread Petr Jelinek
On 21/12/16 04:31, Craig Ringer wrote:
> On 20 December 2016 at 15:03, Petr Jelinek  
> wrote:
> 
>> But in 0003 I don't understand following code:
>>> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
>>> + {
>>> + fprintf(stderr,
>>> + _("%s: cannot use --create-slot or 
>>> --drop-slot together with --endpos\n"),
>>> + progname);
>>> + fprintf(stderr, _("Try \"%s --help\" for more 
>>> information.\n"),
>>> + progname);
>>> + exit(1);
>>> + }
>>
>> Why is --create-slot and --endpos not allowed together?
> 
> Actually, the test is fine, the error is just misleading due to my
> misunderstanding.
> 
> The fix is simply to change the error message to
> 
> _("%s: --endpos may only be specified
> with --start\n"),
> 
> so I won't post a separate followup patch.
> 

Ah okay makes sense. The --create-slot + --endpos should definitely be
allowed combination, especially now that we can extend this to
optionally use temporary slot.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-12-20 Thread Craig Ringer
On 20 December 2016 at 15:03, Petr Jelinek  wrote:

> But in 0003 I don't understand following code:
>> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
>> + {
>> + fprintf(stderr,
>> + _("%s: cannot use --create-slot or --drop-slot 
>> together with --endpos\n"),
>> + progname);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> + progname);
>> + exit(1);
>> + }
>
> Why is --create-slot and --endpos not allowed together?

Actually, the test is fine, the error is just misleading due to my
misunderstanding.

The fix is simply to change the error message to

_("%s: --endpos may only be specified
with --start\n"),

so I won't post a separate followup patch.

Okano Naoki tried to bring this to my attention earlier, but I didn't
understand, as I hadn't yet realised you could use --create-slot with
--start; they aren't mutually exclusive.

(We already test earlier that --start and --drop-slot aren't specified
together, so no separate test for do_drop_slot is required.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-12-20 Thread Craig Ringer
On 20 December 2016 at 15:03, Petr Jelinek  wrote:

>> The biggest change in this patch, and the main intrusive part, is that
>> procArray->replication_slot_catalog_xmin is no longer directly used by
>> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
>> added, with a corresponding CheckPoint field.
>> [snip]
>
> If this mechanism would not be needed most of the time, wouldn't it be
> better to not have it and just have a way to ask physical slot about
> what's the current reserved catalog_xmin (in most cases the standby
> should actually know what it is since it's sending the hs_feedback, but
> first slot created on such standby may need to check).

Yes, and that was actually my originally preferred approach, though
the one above does offer the advantage that if something goes wrong we
can detect it and fail gracefully. Possibly not worth the complexity
though.

Your approach requires us to make very sure that hot_standby_feedback
does not get turned off by user or become ineffective once we're
replicating, though, since we won't have any way to detect when needed
tuples are removed. We'd probably just bail out with relcache/syscache
lookup errors, but I can't guarantee we wouldn't crash if we tried
logical decoding on WAL where needed catalogs have been removed.

I initially ran into trouble doing that, but now think I have a way forward.

> WRT preventing
> hs_feedback going off, we can refuse to start with hs_feedback off when
> there are logical slots detected.

Yes. There are some ordering issues there though. We load slots quite
late in startup and they don't get tracked in checkpoints. So we might
launch the walreceiver before we load slots and notice their needed
xmin/catalog_xmin. So we need to prevent sending of
hot_standby_feedback until slots are loaded, or load slots earlier in
startup. The former sounds less intrusive and safer - probably just
add an "initialized" flag to ReplicationSlotCtlData and suppress
hs_feedback until it becomes true.

We'd also have to suppress the validation callback action on the
hot_standby_feedback GUC until we know replication slot state is
initialised, and perform the check during slot startup instead. The
hot_standby_feedback GUC validation check might get called before
shmem is even set up so we have to guard against attempts to access a
shmem segment that may not even exist yet.

The general idea is workable though. Refuse to start if logical slots
exist and hot_standby_feedback is off or walreceiver isn't using a
physical slot. Refuse to allow hot_standby_feedback to change

> We can also refuse to connect to the
> master without physical slot if there are logical slots detected. I
> don't see problem with either of those.

Agreed. We must also be able to reliably enforce that the walreceiver
is using a replication slot to connect to the master and refuse to
start if it is not. The user could change recovery.conf and restart
the walreceiver while we're running, after we perform that check, so
walreceiver must also refuse to start if logical replication slots
exist but it has no primary slot name configured.

> You may ask what if user drops the slot and recreates or somehow
> otherwise messes up catalog_xmin on master, well, considering that under
> my proposal we'd first (on connect) check the slot for catalog_xmin we'd
> know about it so we'd either mark the local slots broken/drop them or
> plainly refuse to connect to the master same way as if it didn't have
> required WAL anymore (not sure which behavior is better). Note that user
> could mess up catalog_xmin even in your design the same way, so it's not
> really a regression.

Agreed. Checking catalog_xmin of the slot when we connect is
sufficient to guard against that, assuming we can trust that the
catalog_xmin is actually in effect on the master. Consider cascading
setups, where we set our catalog_xmin but it might not be "locked in"
until the middle cascaded server relays it to the master.

I have a proposed solution to that which I'll outline in a separate
patch+post; it ties in to some work on addressing the race between hot
standby feedback taking effect and queries starting on the hot
standby.  It boils down to "add a hot_standby_feedback reply protocol
message".

> Plus
> it might even be okay to only allow creating logical slots on standbys
> connected directly to master in v1.

True. I didn't consider that.

We haven't had much luck in the past with such limitations, but
personally I'd consider it a perfectly reasonable one.

> But in 0003 I don't understand following code:
>> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
>> + {
>> + fprintf(stderr,
>> + _("%s: cannot use --create-slot or --drop-slot 
>> together with --endpos\n"),
>> + progname);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> +   

Re: [HACKERS] Logical decoding on standby

2016-12-19 Thread Petr Jelinek
On 07/12/16 07:05, Craig Ringer wrote:
> On 21 November 2016 at 16:17, Craig Ringer  wrote:
>> Hi all
>>
>> I've prepared a working initial, somewhat raw implementation for
>> logical decoding on physical standbys.
> 
> Hi all
> 
> I've attached a significantly revised patch, which now incorporates
> safeguards to ensure that we prevent decoding if the master has not
> retained needed catalogs and cancel decoding sessions that are holding
> up apply because they need too-old catalogs
> 
> The biggest change in this patch, and the main intrusive part, is that
> procArray->replication_slot_catalog_xmin is no longer directly used by
> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
> added, with a corresponding CheckPoint field. Vacuum notices if
> procArray->replication_slot_catalog_xmin has advanced past
> ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
> record with the new value before it copies it to oldestCatalogXmin.
> This means that a standby can now reliably tell when catalogs are
> about to be removed or become candidates for removal, so it can pause
> redo until logical decoding sessions on the standby advance far enough
> that their catalog_xmin passes that point. It also means that if our
> hot_standby_feedback somehow fails to lock in the catalogs our slots
> need on a standby, we can cancel sessions with a conflict with
> recovery.
> 
> If wal_level is < logical this won't do anything, since
> replication_slot_catalog_xmin and oldestCatalogXmin will both always
> be 0.
> 
> Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
> new replication_slot_catalog_xmin, this won't impact catalog bloat.
> 
> Ideally this mechanism won't generally actually be needed, since
> hot_standby_feedback stops the master from removing needed catalogs,
> and we make an effort to ensure that the standby has
> hot_standby_feedback enabled and is using a replication slot. We
> cannot prevent the user from dropping and re-creating the physical
> slot on the upstream, though, and it doesn't look simple to stop them
> turning off hot_standby_feedback or turning off use of a physical slot
> after creating logical slots, either. So we try to stop users shooting
> themselves in the foot, but if they do it anyway we notice and cope
> gracefully. Logging catalog_xmin also helps slots created on standbys
> know where to start, and makes sure we can deal gracefully with a race
> between hs_feedback and slot creation on a standby.
> 

Hi,

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it and just have a way to ask physical slot about
what's the current reserved catalog_xmin (in most cases the standby
should actually know what it is since it's sending the hs_feedback, but
first slot created on such standby may need to check). WRT preventing
hs_feedback going off, we can refuse to start with hs_feedback off when
there are logical slots detected. We can also refuse to connect to the
master without physical slot if there are logical slots detected. I
don't see problem with either of those.

You may ask what if user drops the slot and recreates or somehow
otherwise messes up catalog_xmin on master, well, considering that under
my proposal we'd first (on connect) check the slot for catalog_xmin we'd
know about it so we'd either mark the local slots broken/drop them or
plainly refuse to connect to the master same way as if it didn't have
required WAL anymore (not sure which behavior is better). Note that user
could mess up catalog_xmin even in your design the same way, so it's not
really a regression.
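The connect-time check proposed above can be sketched as follows (a toy model with plain integer xids, ignoring wraparound; `check_master_slot` and the slot representation are hypothetical, not from any patch):

```python
def check_master_slot(master_catalog_xmin, local_slots):
    """On (re)connect, verify the physical slot on the master still
    reserves catalogs far back enough for every local logical slot.
    Returns the names of local slots that are now broken; the caller
    decides whether to mark/drop them or refuse to connect."""
    broken = []
    for slot in local_slots:
        if master_catalog_xmin is None or \
                slot["catalog_xmin"] < master_catalog_xmin:
            # The master may already have vacuumed catalogs this slot
            # needs (e.g. the physical slot was dropped and recreated).
            broken.append(slot["name"])
    return broken
```

A dropped-and-recreated master slot shows up here as `master_catalog_xmin` being `None` or newer than what the local slot requires, which is exactly the mess-up case described above.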

In general I don't think that it's necessary to WAL log anything for
this. It will not work without a slot (and therefore not via archiving)
anyway, so writing to WAL does not seem to buy us anything. There are some
interesting side effects of cascading (ie having A->B->C replication and
creating logical slot on C) but those should not be insurmountable. Plus
it might even be okay to only allow creating logical slots on standbys
connected directly to master in v1.


That's about approach, but since there are prerequisite patches in the
patchset that don't really depend on the approach I will comment about
them as well.

0001 and 0002 add testing infrastructure and look fine to me, possibly
committable.

But in 0003 I don't understand following code:
> + if (endpos != InvalidXLogRecPtr && !do_start_slot)
> + {
> + fprintf(stderr,
> + _("%s: cannot use --create-slot or --drop-slot 
> together with --endpos\n"),
> + progname);
> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> + progname);
> + exit(1);
> + }

Why is --create-slot and --endpos not allowed together?

0004 again looks good but depends on 0003.

0005 is timeline following which is IMHO ready for committer, as is 

Re: [HACKERS] Logical decoding on standby

2016-11-27 Thread Craig Ringer
>> --- a/contrib/pgstattuple/pgstatapprox.c
>> +++ b/contrib/pgstattuple/pgstatapprox.c
>> @@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
>>   TransactionId OldestXmin;
>>   uint64  misc_count = 0;
>>
>> - OldestXmin = GetOldestXmin(rel, true);
>> + OldestXmin = GetOldestXmin(rel, true, false);
>>   bstrategy = GetAccessStrategy(BAS_BULKREAD);
>>
>>   nblocks = RelationGetNumberOfBlocks(rel);
>
> This does not seem correct; you are passing false as a pointer parameter.

Thanks. That's an oversight from the GetOldestXmin interface change
per your prior feedback. C doesn't care since null is 0 and false is
0, and I missed it when transforming the patch.

> 0012:
>
> I think there should be parameter saying if snapshot should be exported
> or not and if user asks for it on standby it should fail.

Sounds reasonable. That also means clients can suppress standby export
on master, which as we recently learned can be desirable sometimes.

> 0014 makes 0011 even more pointless.

Yeah, as I said, it's a bit WIP still and needs some rebasing and rearrangement.

> This also replaces the previous timeline following and decoding
> threads/CF entries so maybe those should be closed in CF?

I wasn't sure what to do about that, since it's all a set of related
functionality. I think it's going to get more traction as a single
"logical decoding on standby" feature though, since the other parts are
hard to test and use in isolation. So yeah, probably, I'll do so
unless someone objects.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-11-27 Thread Petr Jelinek
Hi,

I did look at the code a bit. The first 6 patches seem reasonable.
I don't understand why some patches are separate tbh (like 7-10, or 11).

About the 0009:
> diff --git a/contrib/pg_visibility/pg_visibility.c 
> b/contrib/pg_visibility/pg_visibility.c
> index 9985e3e..4fa3ad4 100644
> --- a/contrib/pg_visibility/pg_visibility.c
> +++ b/contrib/pg_visibility/pg_visibility.c
> @@ -538,7 +538,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool 
> all_frozen)
>   if (all_visible)
>   {
>   /* Don't pass rel; that will fail in recovery. */
> - OldestXmin = GetOldestXmin(NULL, true);
> + OldestXmin = GetOldestXmin(NULL, true, false);
>   }
>  
>   rel = relation_open(relid, AccessShareLock);
> @@ -660,7 +660,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool 
> all_frozen)
>* a buffer lock. And this shouldn't happen 
> often, so it's
>* worth being careful so as to avoid false 
> positives.
>*/
> - RecomputedOldestXmin = GetOldestXmin(NULL, 
> true);
> + RecomputedOldestXmin = GetOldestXmin(NULL, 
> true, false);
>  
>   if (!TransactionIdPrecedes(OldestXmin, 
> RecomputedOldestXmin))
>   record_corrupt_item(items, 
> _self);
> diff --git a/contrib/pgstattuple/pgstatapprox.c 
> b/contrib/pgstattuple/pgstatapprox.c
> index f524fc4..5b33c97 100644
> --- a/contrib/pgstattuple/pgstatapprox.c
> +++ b/contrib/pgstattuple/pgstatapprox.c
> @@ -70,7 +70,7 @@ statapprox_heap(Relation rel, output_type *stat)
>   TransactionId OldestXmin;
>   uint64  misc_count = 0;
>  
> - OldestXmin = GetOldestXmin(rel, true);
> + OldestXmin = GetOldestXmin(rel, true, false);
>   bstrategy = GetAccessStrategy(BAS_BULKREAD);
>  
>   nblocks = RelationGetNumberOfBlocks(rel);

This does not seem correct; you are passing false as a pointer parameter.

0012:

I think there should be parameter saying if snapshot should be exported
or not and if user asks for it on standby it should fail.

0014 makes 0011 even more pointless.

Not going into deeper detail as this is still very WIP. I do agree with
the general design though.

This also replaces the previous timeline following and decoding
threads/CF entries so maybe those should be closed in CF?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-11-27 Thread Craig Ringer
On 26 Nov. 2016 23:40, "Robert Haas"  wrote:
>
> On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer 
wrote:
> >>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
> >>> are already held down by ProcArray's catalog_xmin. But that doesn't
> >>> mean we haven't removed newer tuples from specific relations and
> >>> logged that in xl_heap_clean, etc, including catalogs or user
> >>> catalogs, it only means the clog still exists for those XIDs.
> >>
> >> Really?
> >
> > (Note the double negative above).
> >
> > Yes, necessarily so. You can't look up xids older than the clog
> > truncation threshold at oldestXid, per our discussion on txid_status()
> > and traceable commit. But the tuples from that xact aren't guaranteed
> > to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
> > which calls GetOldestXmin(...); that in turn scans ProcArray to find
> > the oldest xid any running xact cares about. It might bump it down
> > further if there's a replication slot requirement or based on
> > vacuum_defer_cleanup_age, but it doesn't care in the slightest about
> > oldestXid.
> >
> > oldestXid cannot advance until vacuum has removed all tuples for that
> > xid and advanced the database's datfrozenxid. But a given oldestXid
> > says nothing about which tuples, catalog or otherwise, still exist and
> > are accessible.
>
> Right.  Sorry, my mistake.

Phew. Had me worried there.

Thanks for looking over it. Prototype looks promising so far.


Re: [HACKERS] Logical decoding on standby

2016-11-26 Thread Robert Haas
On Wed, Nov 23, 2016 at 8:37 AM, Craig Ringer  wrote:
>>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
>>> are already held down by ProcArray's catalog_xmin. But that doesn't
>>> mean we haven't removed newer tuples from specific relations and
>>> logged that in xl_heap_clean, etc, including catalogs or user
>>> catalogs, it only means the clog still exists for those XIDs.
>>
>> Really?
>
> (Note the double negative above).
>
> Yes, necessarily so. You can't look up xids older than the clog
> truncation threshold at oldestXid, per our discussion on txid_status()
> and traceable commit. But the tuples from that xact aren't guaranteed
> to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
> which calls GetOldestXmin(...); that in turn scans ProcArray to find
> the oldest xid any running xact cares about. It might bump it down
> further if there's a replication slot requirement or based on
> vacuum_defer_cleanup_age, but it doesn't care in the slightest about
> oldestXid.
>
> oldestXid cannot advance until vacuum has removed all tuples for that
> xid and advanced the database's datfrozenxid. But a given oldestXid
> says nothing about which tuples, catalog or otherwise, still exist and
> are accessible.

Right.  Sorry, my mistake.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding on standby

2016-11-23 Thread Craig Ringer
On 23 November 2016 at 03:55, Robert Haas  wrote:
> On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer  wrote:
>> On 22 November 2016 at 10:20, Craig Ringer  wrote:
>>> I'm currently looking at making detection of replay conflict with a
>>> slot work by separating the current catalog_xmin into two effective
>>> parts - the catalog_xmin currently needed by any known slots
>>> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
>>> actually valid catalog_xmin where we know we haven't removed anything
>>> yet.
>>
>> OK, more detailed plan.
>>
>> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
>> are already held down by ProcArray's catalog_xmin. But that doesn't
>> mean we haven't removed newer tuples from specific relations and
>> logged that in xl_heap_clean, etc, including catalogs or user
>> catalogs, it only means the clog still exists for those XIDs.
>
> Really?

(Note the double negative above).

Yes, necessarily so. You can't look up xids older than the clog
truncation threshold at oldestXid, per our discussion on txid_status()
and traceable commit. But the tuples from that xact aren't guaranteed
to exist in any given relation; vacuum uses vacuum_set_xid_limits(...)
which calls GetOldestXmin(...); that in turn scans ProcArray to find
the oldest xid any running xact cares about. It might bump it down
further if there's a replication slot requirement or based on
vacuum_defer_cleanup_age, but it doesn't care in the slightest about
oldestXmin.

oldestXmin cannot advance until vacuum has removed all tuples for that
xid and advanced the database's datfrozenxid. But a given oldestXmin
says nothing about which tuples, catalog or otherwise, still exist and
are accessible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical decoding on standby

2016-11-22 Thread Robert Haas
On Tue, Nov 22, 2016 at 1:49 AM, Craig Ringer  wrote:
> On 22 November 2016 at 10:20, Craig Ringer  wrote:
>> I'm currently looking at making detection of replay conflict with a
>> slot work by separating the current catalog_xmin into two effective
>> parts - the catalog_xmin currently needed by any known slots
>> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
>> actually valid catalog_xmin where we know we haven't removed anything
>> yet.
>
> OK, more detailed plan.
>
> The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
> are already held down by ProcArray's catalog_xmin. But that doesn't
> mean we haven't removed newer tuples from specific relations and
> logged that in xl_heap_clean, etc, including catalogs or user
> catalogs, it only means the clog still exists for those XIDs.

Really?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical decoding on standby

2016-11-21 Thread Craig Ringer
On 22 November 2016 at 10:20, Craig Ringer  wrote:

> I'm currently looking at making detection of replay conflict with a
> slot work by separating the current catalog_xmin into two effective
> parts - the catalog_xmin currently needed by any known slots
> (ProcArray->replication_slot_catalog_xmin, as now), and the oldest
> actually valid catalog_xmin where we know we haven't removed anything
> yet.

OK, more detailed plan.

The last checkpoint's oldestXid, and ShmemVariableCache's oldestXid,
are already held down by ProcArray's catalog_xmin. But that doesn't
mean we haven't removed newer tuples from specific relations and
logged that in xl_heap_clean, etc, including catalogs or user
catalogs, it only means the clog still exists for those XIDs. We don't
emit a WAL record when we advance oldestXid in
SetTransactionIdLimit(), and doing so is useless because vacuum will
have already removed needed tuples from needed catalogs before calling
SetTransactionIdLimit() from vac_truncate_clog(). We know that if
oldestXid is n, the true valid catalog_xmin where no needed tuples
have been removed must be >= n. But we need to know the lower bound of
valid catalog_xmin, which oldestXid doesn't give us.

So right now a standby has no way to reliably know if the catalog_xmin
requirement for a given replication slot can be satisfied. A standby
can't tell based on a xl_heap_cleanup_info record, xl_heap_clean
record, etc whether the affected table is a catalog or not, and
shouldn't generate conflicts for non-catalogs since otherwise it'll be
constantly clobbering walsenders.

A 2-phase advance of the global catalog_xmin would mean that
GetOldestXmin() would return a value from ShmemVariableCache, not the
oldest catalog xmin from ProcArray like it does now. (auto)vacuum
would then be responsible for:

* Reading the oldest catalog_xmin from procarray
* If it has advanced vs what's present in ShmemVariableCache, writing
a new xlog record type recording an advance of oldest catalog xmin
* advancing ShmemVariableCache's oldest catalog xmin

and would do so before it called GetOldestXmin via
vacuum_set_xid_limits() to determine what it can remove.
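To make the ordering concrete, here is a minimal C sketch of that vacuum-side
step. All names are invented for illustration and only loosely stand in for
ProcArray->replication_slot_catalog_xmin and the proposed ShmemVariableCache
field; the real implementation would take locks and emit a WAL record where
the flag is set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified model of the two-phase catalog_xmin state. */
typedef struct CatalogXminState
{
    TransactionId procarray_catalog_xmin;    /* what known slots need now */
    TransactionId shmem_oldest_catalog_xmin; /* oldest still-valid catalog_xmin */
    bool          wal_record_emitted;        /* stands in for XLogInsert() */
} CatalogXminState;

/* Vacuum-side step: before computing removal limits, WAL-log and publish
 * any advance of the oldest valid catalog_xmin, then return the value
 * that GetOldestXmin() would hand to vacuum_set_xid_limits(). */
static TransactionId
vacuum_advance_catalog_xmin(CatalogXminState *st)
{
    if (st->procarray_catalog_xmin > st->shmem_oldest_catalog_xmin)
    {
        st->wal_record_emitted = true;  /* log the advance first ... */
        st->shmem_oldest_catalog_xmin = st->procarray_catalog_xmin; /* ... then publish */
    }
    return st->shmem_oldest_catalog_xmin;
}
```

The point of the ordering is that a standby replaying the record always sees
the advance before any cleanup that depends on it.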

GetOldestXmin would return the ProcArray's copy of the oldest
catalog_xmin when in recovery, so we report it via hot_standby_feedback
to the upstream, it's recorded on our physical slot, and in turn
causes vacuum to advance the master's effective oldest catalog_xmin
for vacuum.

On the standby we'd replay the catalog_xmin advance record, advance
the standby's ShmemVariableCache's oldest catalog xmin, and check to
see if any replication slots, active or not, have a catalog_xmin <
than the new threshold. If none do, there's no conflict and we're
fine. If any do, we wait
max_standby_streaming_delay/max_standby_archive_delay as appropriate,
then generate recovery conflicts against all backends that have an
active replication slot based on the replication slot state in shmem.
Those backends - walsender or normal decoding backend - would promptly
die. New decoding sessions will check the ShmemVariableCache and
refuse to start if their catalog_xmin is < the threshold. Since we
advance it before generating recovery conflicts there's no race with
clients trying to reconnect after their backend is killed with a
conflict.
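As a toy C model of the replay-side check (names invented for the example;
real slot state lives in shmem and termination goes through the
recovery-conflict machinery, not a simple count):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Given the catalog_xmin of each slot, count how many would conflict with
 * a replayed advance of the oldest valid catalog_xmin to new_threshold.
 * Zero means the advance is harmless; nonzero means those slots' backends
 * get recovery conflicts after the max_standby_*_delay grace period. */
static int
count_conflicting_slots(const TransactionId *slot_catalog_xmin, size_t nslots,
                        TransactionId new_threshold)
{
    int conflicts = 0;
    for (size_t i = 0; i < nslots; i++)
        if (slot_catalog_xmin[i] < new_threshold)
            conflicts++;    /* slot needs catalog rows that may now be gone */
    return conflicts;
}
```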

If we wanted to get fancy we could set the latches of walsender
backends at risk of conflicting and they could check
ShmemVariableCache's oldest valid catalog xmin, so they could send
immediate keepalives with reply_requested set and hopefully get flush
confirmation from the client and advance their catalog_xmin before we
terminate them as conflicting with recovery. But that can IMO be done
later/separately.

Going to prototype this.



Alternate approach:
---

The oldest xid in heap_xlog_cleanup_info is relation-specific, but the
standby has no way to know if it's a catalog relation or not during
redo and know whether to kill slots and decoding sessions based on its
latestRemovedXid. Same for xl_heap_clean and the other records that
can cause snapshot conflicts (xl_heap_visible, xl_heap_freeze_page,
xl_btree_delete, xl_btree_reuse_page, spgxlogVacuumRedirect).

Instead of adding a 2-phase advance of the global catalog_xmin, we can
instead add a rider to each of these records that identifies whether
it's a catalog table or not. This would only be emitted when wal_level
>= logical, but it *would* increase WAL sizes a bit when logical
decoding is enabled even if it's not going to be used on a standby.
The rider would be a simple:

typedef struct xl_rel_catalog_info
{
    bool rel_accessible_from_logical_decoding;
} xl_rel_catalog_info;

or similar. During redo we call a new
ResolveRecoveryConflictWithLogicalSlot function from each of those
records' redo routines that does what I outlined above.

This approach adds more info to more xlog records, and the upstream has to
use RelationIsAccessibleInLogicalDecoding() to set up the records when
writing the xlogs.
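As a rough illustration of how a redo routine might consult the rider
(function and parameter names here are assumptions for the sketch, not
actual PostgreSQL API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Decide whether a replayed cleanup record should generate a recovery
 * conflict against logical slots. The rider tells redo whether the
 * affected relation is visible to logical decoding; only then can removing
 * tuples up to latestRemovedXid invalidate a slot whose catalog_xmin is
 * older than that horizon. */
static bool
standby_should_conflict(bool rel_accessible_from_logical_decoding,
                        TransactionId latestRemovedXid,
                        TransactionId oldest_slot_catalog_xmin)
{
    if (!rel_accessible_from_logical_decoding)
        return false;   /* non-catalog table: never clobber decoding sessions */
    return latestRemovedXid >= oldest_slot_catalog_xmin;
}
```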

Re: [HACKERS] Logical decoding on standby

2016-11-21 Thread Andres Freund
Hi,

On 2016-11-21 16:17:58 +0800, Craig Ringer wrote:
> I've prepared a working initial, somewhat raw implementation for
> logical decoding on physical standbys.

Please attach. Otherwise in a year or two it'll be impossible to look
this up.

Andres




Re: [HACKERS] Logical decoding on standby

2016-01-19 Thread Craig Ringer
On 19 January 2016 at 23:30, Дмитрий Сарафанников wrote:

> When you plan to add logical decoding on standby?
>
> it is useful to have a separate standby server for logical replication that
> will not break the master if you make a mistake in a plugin.

Indeed.  It might PANIC it and force a restart, if that's what you mean. I
guess in an extreme case you could clobber shmem silently, which would be
bad. (I wonder if we could make shared_buffers mmap'ed read-only for
walsender backends?). So it certainly seems like it'd be nice to have.
There's also the advantage that you could shift load off the master by
having physical master=>replica replication on a fast wide link, then do
logical decoding from there to send over the wire to WAN sites etc.

Unfortunately it's not particularly simple and nobody seems to have time to
implement it. As Álvaro pointed out, sometimes you have to do the work if
you want the change to happen. Or find someone with the existing skills and
convince them to want to do it, but most of those people are already very,
very busy.

As part of the failover slots work Simon noted that:

"To prevent needed rows from being removed we would need to enhance
hot_standby_feedback so it sends both xmin and catalog_xmin to the master."

... which means a protocol change in the walsender protocol. So you're
looking at that plus the other comment given above, that

"We need to be able to correctly and quickly identify the timeline an LSN
belongs to"

... which is new internal infrastructure and new state in the replica that
must be maintained. Probably persistently.

In other words ... this isn't especially simple to do. It requires at least
two non-trivial core changes. Nobody who has the skills to do it reasonably
quickly also has the time to do it at all or has other higher priority
things to do first. So it's not likely to happen soon unless you or someone
like you steps up and has a go at it.

If you do decide to attempt to implement it and you're willing to read a
fair bit of code and documentation to do so you'll find people here pretty
helpful with suggestions, ideas, and help if you get stuck. But only if you
put in the work yourself.

(On a side note, failover slots probably won't be usable from a standby
either. They have to write to WAL and you can't write to WAL from a
standby. So if slot support is ever added on a standby it'll probably be
ephemeral slots only.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Logical decoding on standby

2016-01-19 Thread Andres Freund
On 2016-01-20 15:11:06 +0800, Craig Ringer wrote:
> Unfortunately it's not particularly simple and nobody seems to have time to
> implement it.

FWIW, I don't think it's *that* hard.

> As Álvaro pointed out, sometimes you have to do the work if
> you want the change to happen. Or find someone with the existing skills and
> convince them to want to do it, but most of those people are already very,
> very busy.
> 
> As part of the failover slots work Simon noted that:
> 
> "To prevent needed rows from being removed we would need to enhance
> hot_standby_feedback so it sends both xmin and catalog_xmin to the master."
> 
> ... which means a protocol change in the walsender protocol. So you're
> looking at that plus the other comment given above, that

Not a huge problem though.


> "We need to be able to correctly and quickly identify the timeline an LSN
> belongs to"
> 
> ... which is new internal infrastructure and new state in the replica that
> must be maintained. Probably persistently.

I think it just needs a more efficient lookup structure over the
existing tliOfPointInHistory(), the data is all there.
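For illustration, a binary search over a sorted timeline history gives that
O(log n) lookup. This toy C sketch only loosely resembles the real
TimeLineHistoryEntry layout; the types and names are simplified for the
example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* One entry of a timeline history: this timeline ends (switches to its
 * child) at 'end'. The last, current timeline uses UINT64_MAX as an
 * "unbounded" sentinel. Entries are sorted by 'end'. */
typedef struct TliEntry
{
    TimeLineID tli;
    XLogRecPtr end;     /* first LSN that belongs to the *next* timeline */
} TliEntry;

/* Return the timeline a given LSN belongs to, by binary search for the
 * first entry whose switch point lies beyond ptr. */
static TimeLineID
tli_of_point(const TliEntry *hist, size_t n, XLogRecPtr ptr)
{
    size_t lo = 0, hi = n - 1;

    while (lo < hi)
    {
        size_t mid = (lo + hi) / 2;

        if (ptr >= hist[mid].end)
            lo = mid + 1;       /* switch point already passed */
        else
            hi = mid;
    }
    return hist[lo].tli;
}
```

An LSN exactly at a switch point belongs to the new timeline, matching the
"first LSN of the next timeline" convention above.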


> (On a side note, failover slots probably won't be usable from a standby
> either. They have to write to WAL and you can't write to WAL from a
> standby. So if slot support is ever added on a standby it'll probably be
> ephemeral slots only.)

ephemeral slots are a different thing. Anyway, at this point failover
slots aren't really proposed / have an agreed upon design yet, so it's a
bit hard to take them into account.

Greetings,

Andres Freund




Re: [HACKERS] Logical decoding on standby

2016-01-19 Thread Alvaro Herrera
Hi Dimitriy,

Дмитрий Сарафанников wrote:

> /* 
> * TODO: We got to change that someday soon...
[ more code ]
> if (RecoveryInProgress())
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("logical decoding cannot be used while in recovery")));
> When you plan to add logical decoding on standby?

Things change as people submit patches to make them change.  If you want
this changed, you could either write a patch yourself, or persuade
someone else to do it for you.

I don't think anyone is working on this particular TODO just yet -- as far as
I know, all the logical-decoding-enabled developers are pretty busy doing
other things.  The good news is that the things that need doing are
spelled right there in the comment :-)

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



