Re: [HACKERS] snapbuild woes

2017-08-06 Thread Andres Freund
On 2017-06-09 09:25:34 +0200, Antonin Houska wrote:
> Andres Freund  wrote:
> 
> > Looking at 0001:
> > - GetOldestSafeDecodingTransactionId() only guarantees to return an xid
> >   safe for decoding (note how procArray->replication_slot_catalog_xmin
> >   is checked), not one for the initial snapshot - so afaics this whole
> >   exercise doesn't guarantee much so far.
> 
> I happen to use CreateInitDecodingContext() in an extension, so I had to think
> what the new argumen exactly means (as for the incompatibility between PG
> 9.6.2 and 9.6.3, I suppose preprocessor directives can handle it).
> 
> One thing I'm failing to understand is: if TRUE is passed for
> need_full_snapshot, then slot->effective_xmin receives the result of
> 
> GetOldestSafeDecodingTransactionId(need_full_snapshot)
> 
> but this does include "catalog xmin".
> 
> If the value returned is determined by an existing slot which has valid
> effective_catalog_xmin and invalid effective_xmin (i.e. that slot only
> protects catalog tables from VACUUM but not the regular ones), then the new
> slot will think it (i.e. the new slot) protects even non-catalog tables, but
> that's no true.
> 
> Shouldn't the xmin_horizon be computed by this call instead?
> 
> GetOldestSafeDecodingTransactionId(!need_full_snapshot);
> 
> (If so, I think "considerCatalog" argument name would be clearer than
> "catalogOnly".)

Good catch. Pushing a fix. Afaict it's luckily inconsequential atm
because fo the way we wait for concurrent snapshots when creating a
slot. But it obviously nevertheless needs tobe fixed.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-06-09 Thread Antonin Houska
Andres Freund  wrote:

> Looking at 0001:
> - GetOldestSafeDecodingTransactionId() only guarantees to return an xid
>   safe for decoding (note how procArray->replication_slot_catalog_xmin
>   is checked), not one for the initial snapshot - so afaics this whole
>   exercise doesn't guarantee much so far.

I happen to use CreateInitDecodingContext() in an extension, so I had to think
what the new argumen exactly means (as for the incompatibility between PG
9.6.2 and 9.6.3, I suppose preprocessor directives can handle it).

One thing I'm failing to understand is: if TRUE is passed for
need_full_snapshot, then slot->effective_xmin receives the result of

GetOldestSafeDecodingTransactionId(need_full_snapshot)

but this does include "catalog xmin".

If the value returned is determined by an existing slot which has valid
effective_catalog_xmin and invalid effective_xmin (i.e. that slot only
protects catalog tables from VACUUM but not the regular ones), then the new
slot will think it (i.e. the new slot) protects even non-catalog tables, but
that's no true.

Shouldn't the xmin_horizon be computed by this call instead?

GetOldestSafeDecodingTransactionId(!need_full_snapshot);

(If so, I think "considerCatalog" argument name would be clearer than
"catalogOnly".)

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-14 Thread Andres Freund
On 2017-05-14 14:54:37 +0200, Petr Jelinek wrote:
> On 13/05/17 22:23, Andres Freund wrote:
> >> And wait for session 3 to finish slot creation, takes about 20 mins on
> >> my laptop without patches, minute and half with your patches for 0004
> >> and 0005 (or with your 0004 and my 0005) and about 2s with my original
> >> 0004 and 0005.
> > 
> > Is that with assertions enabled or not?  With assertions all the time
> > post patches seems to be spent in some Asserts in reorderbuffer.c,
> > without it takes less than a second for me here.
> >
> 
> Right you are, I always forget to switch that off before benchmarks.

Phew ;)

> > I'm appylying these now.
> 
> Cool. Just for posterity, this also fixes the issue number 3 as the
> switch to consistent state is done purely based on xl_running_xacts and
> not decoded commits/aborts.

Cool.  Although I'm still not convinced, as noted somewhere in this
thread, that it actually did much to start with ;)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-14 Thread Petr Jelinek
On 13/05/17 22:23, Andres Freund wrote:
> On 2017-05-12 10:57:55 +0200, Petr Jelinek wrote:
>> Hmm, well it helps but actually now that we don't track individual
>> running transactions anymore it got much less effective (my version of
>> 0005 does as well).
>>
>> The example workload I test with is:
>> session 1: open transaction, do a write, keep it open
>> session 2: pgbench  -M simple -N -c 10 -P 1 -T 5
>> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
>> session 2: pgbench  -M simple -N -c 10 -P 1 -T 20
>> session 1: commit
>>
>> And wait for session 3 to finish slot creation, takes about 20 mins on
>> my laptop without patches, minute and half with your patches for 0004
>> and 0005 (or with your 0004 and my 0005) and about 2s with my original
>> 0004 and 0005.
> 
> Is that with assertions enabled or not?  With assertions all the time
> post patches seems to be spent in some Asserts in reorderbuffer.c,
> without it takes less than a second for me here.
>

Right you are, I always forget to switch that off before benchmarks.

> I'm appylying these now.

Cool. Just for posterity, this also fixes the issue number 3 as the
switch to consistent state is done purely based on xl_running_xacts and
not decoded commits/aborts.

So all done here, thanks!

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-13 Thread Andres Freund
On 2017-05-12 10:57:55 +0200, Petr Jelinek wrote:
> Hmm, well it helps but actually now that we don't track individual
> running transactions anymore it got much less effective (my version of
> 0005 does as well).
> 
> The example workload I test with is:
> session 1: open transaction, do a write, keep it open
> session 2: pgbench  -M simple -N -c 10 -P 1 -T 5
> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
> session 2: pgbench  -M simple -N -c 10 -P 1 -T 20
> session 1: commit
> 
> And wait for session 3 to finish slot creation, takes about 20 mins on
> my laptop without patches, minute and half with your patches for 0004
> and 0005 (or with your 0004 and my 0005) and about 2s with my original
> 0004 and 0005.

Is that with assertions enabled or not?  With assertions all the time
post patches seems to be spent in some Asserts in reorderbuffer.c,
without it takes less than a second for me here.

I'm appylying these now.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-12 Thread Petr Jelinek
On 12/05/17 10:57, Petr Jelinek wrote:
> On 12/05/17 03:31, Andres Freund wrote:
>> On 2017-05-11 14:54:26 -0700, Andres Freund wrote:
>>> On 2017-05-11 14:51:55 -0700,  wrote:
 On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> I plan to commit the next pending patch after the back branch releases
> are cut - it's an invasive fix and the issue doesn't cause corruption
> "just" slow slot creation. So it seems better to wait for a few days,
> rather than hurry it into the release.

 Now that that's done, here's an updated version of that patch.  Note the
 new logic to trigger xl_running_xact's to be logged at the right spot.
 Works well in my testing.

 I plan to commit this fairly soon, unless somebody wants a bit more time
 to look into it.

 Unless somebody protests, I'd like to slightly revise how the on-disk
 snapshots are stored on master - given the issues this bug/commit showed
 with the current method - but I can see one could argue that that should
 rather be done next release.
>>>
>>> As usual I forgot my attachement.
>>
>> This patch also seems to offer a way to do your 0005 in, possibly, more
>> efficient manner.  We don't ever need to assume a transaction is
>> timetravelling anymore.  Could you check whether the attached, hasty,
>> patch resolves the performance problems you measured?  Also, do you have
>> a "reference" workload for that?
>>
> 
> Hmm, well it helps but actually now that we don't track individual
> running transactions anymore it got much less effective (my version of
> 0005 does as well).
> 
> The example workload I test with is:
> session 1: open transaction, do a write, keep it open
> session 2: pgbench  -M simple -N -c 10 -P 1 -T 5
> session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
> session 2: pgbench  -M simple -N -c 10 -P 1 -T 20
> session 1: commit
> 
> And wait for session 3 to finish slot creation, takes about 20 mins on
> my laptop without patches, minute and half with your patches for 0004
> and 0005 (or with your 0004 and my 0005) and about 2s with my original
> 0004 and 0005.
> 
> What makes it slow is the constant qsorting IIRC.
> 

Btw I still think that we should pursue the approach you proposed. I
think we can deal with the qsorting in some other ways (ordered insert
perhaps?) later. What you propose fixes the correctness, makes
performance less awful in the above workload and also makes the
snapbuild bit easier to read.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-12 Thread Petr Jelinek
On 12/05/17 03:31, Andres Freund wrote:
> On 2017-05-11 14:54:26 -0700, Andres Freund wrote:
>> On 2017-05-11 14:51:55 -0700,  wrote:
>>> On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
 I plan to commit the next pending patch after the back branch releases
 are cut - it's an invasive fix and the issue doesn't cause corruption
 "just" slow slot creation. So it seems better to wait for a few days,
 rather than hurry it into the release.
>>>
>>> Now that that's done, here's an updated version of that patch.  Note the
>>> new logic to trigger xl_running_xact's to be logged at the right spot.
>>> Works well in my testing.
>>>
>>> I plan to commit this fairly soon, unless somebody wants a bit more time
>>> to look into it.
>>>
>>> Unless somebody protests, I'd like to slightly revise how the on-disk
>>> snapshots are stored on master - given the issues this bug/commit showed
>>> with the current method - but I can see one could argue that that should
>>> rather be done next release.
>>
>> As usual I forgot my attachement.
> 
> This patch also seems to offer a way to do your 0005 in, possibly, more
> efficient manner.  We don't ever need to assume a transaction is
> timetravelling anymore.  Could you check whether the attached, hasty,
> patch resolves the performance problems you measured?  Also, do you have
> a "reference" workload for that?
> 

Hmm, well it helps but actually now that we don't track individual
running transactions anymore it got much less effective (my version of
0005 does as well).

The example workload I test with is:
session 1: open transaction, do a write, keep it open
session 2: pgbench  -M simple -N -c 10 -P 1 -T 5
session 3: run CREATE_REPLICATION_SLOT LOGICAL in walsender
session 2: pgbench  -M simple -N -c 10 -P 1 -T 20
session 1: commit

And wait for session 3 to finish slot creation, takes about 20 mins on
my laptop without patches, minute and half with your patches for 0004
and 0005 (or with your 0004 and my 0005) and about 2s with my original
0004 and 0005.

What makes it slow is the constant qsorting IIRC.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-11 14:54:26 -0700, Andres Freund wrote:
> On 2017-05-11 14:51:55 -0700,  wrote:
> > On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> > > I plan to commit the next pending patch after the back branch releases
> > > are cut - it's an invasive fix and the issue doesn't cause corruption
> > > "just" slow slot creation. So it seems better to wait for a few days,
> > > rather than hurry it into the release.
> > 
> > Now that that's done, here's an updated version of that patch.  Note the
> > new logic to trigger xl_running_xact's to be logged at the right spot.
> > Works well in my testing.
> > 
> > I plan to commit this fairly soon, unless somebody wants a bit more time
> > to look into it.
> > 
> > Unless somebody protests, I'd like to slightly revise how the on-disk
> > snapshots are stored on master - given the issues this bug/commit showed
> > with the current method - but I can see one could argue that that should
> > rather be done next release.
> 
> As usual I forgot my attachement.

This patch also seems to offer a way to do your 0005 in, possibly, more
efficient manner.  We don't ever need to assume a transaction is
timetravelling anymore.  Could you check whether the attached, hasty,
patch resolves the performance problems you measured?  Also, do you have
a "reference" workload for that?

Regards,

Andres
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0f2dcb84be..4ddd10fcf0 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -929,21 +929,31 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 {
 	int			nxact;
 
-	bool		forced_timetravel = false;
+	bool		needs_snapshot = false;
+	bool		needs_timetravel = false;
+
 	bool		sub_needs_timetravel = false;
-	bool		top_needs_timetravel = false;
 
 	TransactionId xmax = xid;
 
+	if (builder->state == SNAPBUILD_START)
+		return;
+
+
 	/*
-	 * If we couldn't observe every change of a transaction because it was
-	 * already running at the point we started to observe we have to assume it
-	 * made catalog changes.
-	 *
-	 * This has the positive benefit that we afterwards have enough
-	 * information to build an exportable snapshot that's usable by pg_dump et
-	 * al.
+	 * Transactions preceding BUILDING_SNAPSHOT will neither be decoded, nor
+	 * will it be part of a snapshot.  So we don't even need to record
+	 * anything.
 	 */
+	if (builder->state == SNAPBUILD_BUILDING_SNAPSHOT &&
+		TransactionIdPrecedes(xid, SnapBuildNextPhaseAt(builder)))
+	{
+		/* ensure that only commits after this are getting replayed */
+		if (builder->start_decoding_at <= lsn)
+			builder->start_decoding_at = lsn + 1;
+		return;
+	}
+
 	if (builder->state < SNAPBUILD_CONSISTENT)
 	{
 		/* ensure that only commits after this are getting replayed */
@@ -951,12 +961,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			builder->start_decoding_at = lsn + 1;
 
 		/*
-		 * We could avoid treating !SnapBuildTxnIsRunning transactions as
-		 * timetravel ones, but we want to be able to export a snapshot when
-		 * we reached consistency.
+		 * If we're building an exportable snapshot, force recording of the
+		 * xid, even if the transaction doesn't modify the catalog.
 		 */
-		forced_timetravel = true;
-		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+		if (builder->building_full_snapshot)
+		{
+			needs_timetravel = true;
+		}
 	}
 
 	for (nxact = 0; nxact < nsubxacts; nxact++)
@@ -964,23 +975,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		TransactionId subxid = subxacts[nxact];
 
 		/*
-		 * If we're forcing timetravel we also need visibility information
-		 * about subtransaction, so keep track of subtransaction's state.
-		 */
-		if (forced_timetravel)
-		{
-			SnapBuildAddCommittedTxn(builder, subxid);
-			if (NormalTransactionIdFollows(subxid, xmax))
-xmax = subxid;
-		}
-
-		/*
 		 * Add subtransaction to base snapshot if it DDL, we don't distinguish
 		 * to toplevel transactions there.
 		 */
-		else if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
+		if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
 		{
 			sub_needs_timetravel = true;
+			needs_snapshot = true;
 
 			elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
  xid, subxid);
@@ -990,21 +991,26 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			if (NormalTransactionIdFollows(subxid, xmax))
 xmax = subxid;
 		}
+		/*
+		 * If we're forcing timetravel we also need visibility information
+		 * about subtransaction, so keep track of subtransaction's state, even
+		 * if not catalog modifying.
+		 */
+		else if (needs_timetravel)
+		{
+			SnapBuildAddCommittedTxn(builder, subxid);
+			if (NormalTransactionIdFollows(subxid, xmax))
+xmax = subxid;
+		}
 	}
 
-	if 

Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> >>> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> >>> From: Petr Jelinek 
> >>> Date: Sun, 26 Feb 2017 01:07:33 +0100
> >>> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
> >>
> >> A bit more commentary would be good. What does that protect us against?
> >>
> > 
> > I think I explained that in the email. We might export snapshot with
> > xmin smaller than global xmin otherwise.
> > 
> 
> Updated commit message with explanation as well.


> From ae60b52ae0ca96bc14169cd507f101fbb5dfdf52 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek 
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
> 
> Logical decoding snapshot builder may encounter xl_running_xacts with
> older xmin than the xmin of the builder. This can happen because
> LogStandbySnapshot() sometimes sees already committed transactions as
> running (there is difference between "running" in terms for WAL and in
> terms of ProcArray). When this happens we must make sure that the xmin
> of snapshot builder won't go back otherwise the resulting snapshot would
> show some transaction as running even though they have already
> committed.
> ---
>  src/backend/replication/logical/snapbuild.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/replication/logical/snapbuild.c 
> b/src/backend/replication/logical/snapbuild.c
> index ada618d..3e34f75 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -1165,7 +1165,8 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, 
> XLogRecPtr lsn, xl_running_xact
>* looking, it's correct and actually more efficient this way since we 
> hit
>* fast paths in tqual.c.
>*/
> - builder->xmin = running->oldestRunningXid;
> + if (TransactionIdFollowsOrEquals(running->oldestRunningXid, 
> builder->xmin))
> + builder->xmin = running->oldestRunningXid;
>  
>   /* Remove transactions we don't need to keep track off anymore */
>   SnapBuildPurgeCommittedTxn(builder);
> -- 
> 2.7.4

I still don't understand.  The snapshot's xmin is solely managed via
xl_running_xacts, so I don't see how the WAL/procarray difference can
play a role here.  ->committed isn't pruned before xl_running_xacts
indicates it can be done, so I don't understand your explanation above.

I'd be ok with adding this as paranoia check, but I still don't
understand when it could practically be hit.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-11 14:51:55 -0700,  wrote:
> On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> > I plan to commit the next pending patch after the back branch releases
> > are cut - it's an invasive fix and the issue doesn't cause corruption
> > "just" slow slot creation. So it seems better to wait for a few days,
> > rather than hurry it into the release.
> 
> Now that that's done, here's an updated version of that patch.  Note the
> new logic to trigger xl_running_xact's to be logged at the right spot.
> Works well in my testing.
> 
> I plan to commit this fairly soon, unless somebody wants a bit more time
> to look into it.
> 
> Unless somebody protests, I'd like to slightly revise how the on-disk
> snapshots are stored on master - given the issues this bug/commit showed
> with the current method - but I can see one could argue that that should
> rather be done next release.

As usual I forgot my attachement.

- Andres
>From 3ecec368e3015f5082f120b1a428d1108203a356 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 4 May 2017 16:48:00 -0700
Subject: [PATCH] Fix race condition leading to hanging logical slot creation.

The snapshot assembly during the creation of logical slots relied
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xact record listing transactions as
ready, that have already WAL-logged an commit/abort record, as the
locking just prevents the ProcArray to be adjusted, and the commit
record has to be logged first.

That lead to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if a xl_running_xacts record
without any running transactions happened to be logged, far from
certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created when
   the all transaction perceived as running based on commit/abort
   records, now we have to wait for the next xl_running_xacts record.
To address that, trigger logging new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snabuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to play
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb...@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
---
 contrib/test_decoding/expected/ondisk_startup.out |  15 +-
 contrib/test_decoding/specs/ondisk_startup.spec   |   8 +-
 src/backend/replication/logical/decode.c  |   3 -
 src/backend/replication/logical/reorderbuffer.c   |   2 +-
 src/backend/replication/logical/snapbuild.c   | 416 ++
 src/include/replication/snapbuild.h   |  25 +-
 6 files changed, 220 insertions(+), 249 deletions(-)

diff --git a/contrib/test_decoding/expected/ondisk_startup.out b/contrib/test_decoding/expected/ondisk_startup.out
index 65115c830a..c7b1f45b46 100644
--- a/contrib/test_decoding/expected/ondisk_startup.out
+++ b/contrib/test_decoding/expected/ondisk_startup.out
@@ -1,21 +1,30 @@
 Parsed test spec with 3 sessions
 
-starting permutation: s2txid s1init s3txid s2alter s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
-step s2txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+starting permutation: s2b s2txid s1init s3b s3txid s2alter s2c s2b s2txid s3c s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); 
-step s3txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+step s3b: BEGIN;
+step s3txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s2alter: ALTER TABLE do_write ADD COLUMN addedbys2 int;
 step s2c: COMMIT;
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
+?column?   

Re: [HACKERS] snapbuild woes

2017-05-11 Thread Peter Geoghegan
On Thu, May 11, 2017 at 2:51 PM, Andres Freund  wrote:
> Now that that's done, here's an updated version of that patch.  Note the
> new logic to trigger xl_running_xact's to be logged at the right spot.
> Works well in my testing.

You forgot the patch. :-)


-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> I plan to commit the next pending patch after the back branch releases
> are cut - it's an invasive fix and the issue doesn't cause corruption
> "just" slow slot creation. So it seems better to wait for a few days,
> rather than hurry it into the release.

Now that that's done, here's an updated version of that patch.  Note the
new logic to trigger xl_running_xact's to be logged at the right spot.
Works well in my testing.

I plan to commit this fairly soon, unless somebody wants a bit more time
to look into it.

Unless somebody protests, I'd like to slightly revise how the on-disk
snapshots are stored on master - given the issues this bug/commit showed
with the current method - but I can see one could argue that that should
rather be done next release.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Petr Jelinek
On 11/05/17 16:33, Stas Kelvich wrote:
> 
>> On 10 May 2017, at 11:43, Petr Jelinek  wrote:
>>
>> On 09/05/17 22:11, Erik Rijkers wrote:
>>> On 2017-05-09 21:00, Petr Jelinek wrote:
 On 09/05/17 19:54, Erik Rijkers wrote:
> On 2017-05-09 11:50, Petr Jelinek wrote:
>

 Ah okay, so this is same issue that's reported by both Masahiko Sawada
 [1] and Jeff Janes [2].

 [1]
 https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com

 [2]
 https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com

>>>
>>> I don't understand why you come to that conclusion: both Masahiko Sawada
>>> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
>>> Isn't that a real difference?
>>>
>>
> 
> Just noted another one issue/feature with snapshot builder — when logical 
> decoding is in progress
> and vacuum full command is being issued quite a big amount of files appears 
> in pg_logical/mappings
> and reside there till the checkpoint. Also having consequent vacuum full’s 
> before checkpoint yields even
> more files.
> 
> Having two pgbench-filled instances with logical replication between them:
> 
> for i in {1..100}; do psql -c 'vacuum full' && ls -la pg_logical/mappings | 
> wc -l; done;
> VACUUM
>   73
> VACUUM
>  454
> VACUUM
> 1146
> VACUUM
> 2149
> VACUUM
> 3463
> VACUUM
> 5088
> <...>
> VACUUM
>44708
> <…> // checkpoint happens somewhere here
> VACUUM
>20921
> WARNING:  could not truncate file "base/16384/61773": Too many open files in 
> system
> ERROR:  could not fsync file 
> "pg_logical/mappings/map-4000-4df-0_A4EA29F8-5aa5-5ae6": Too many open files 
> in system
> 
> 
> I’m not sure whether this is boils down to some of the previous issues 
> mentioned here or not, so posting
> here as observation.
> 

This has nothing to do with snapshot builder so this is not the thread
for it. See the comment section near the bottom of
src/backend/access/heap/rewriteheap.c for explanation of why it does
what it does.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Stas Kelvich

> On 10 May 2017, at 11:43, Petr Jelinek  wrote:
> 
> On 09/05/17 22:11, Erik Rijkers wrote:
>> On 2017-05-09 21:00, Petr Jelinek wrote:
>>> On 09/05/17 19:54, Erik Rijkers wrote:
 On 2017-05-09 11:50, Petr Jelinek wrote:
 
>>> 
>>> Ah okay, so this is same issue that's reported by both Masahiko Sawada
>>> [1] and Jeff Janes [2].
>>> 
>>> [1]
>>> https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com
>>> 
>>> [2]
>>> https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com
>>> 
>> 
>> I don't understand why you come to that conclusion: both Masahiko Sawada
>> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
>> Isn't that a real difference?
>> 
> 

Just noted another one issue/feature with snapshot builder — when logical 
decoding is in progress
and vacuum full command is being issued quite a big amount of files appears in 
pg_logical/mappings
and reside there till the checkpoint. Also having consequent vacuum full’s 
before checkpoint yields even
more files.

Having two pgbench-filled instances with logical replication between them:

for i in {1..100}; do psql -c 'vacuum full' && ls -la pg_logical/mappings | wc 
-l; done;
VACUUM
  73
VACUUM
 454
VACUUM
1146
VACUUM
2149
VACUUM
3463
VACUUM
5088
<...>
VACUUM
   44708
<…> // checkpoint happens somewhere here
VACUUM
   20921
WARNING:  could not truncate file "base/16384/61773": Too many open files in 
system
ERROR:  could not fsync file 
"pg_logical/mappings/map-4000-4df-0_A4EA29F8-5aa5-5ae6": Too many open files in 
system


I’m not sure whether this is boils down to some of the previous issues 
mentioned here or not, so posting
here as observation.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-10 Thread Petr Jelinek
On 09/05/17 22:11, Erik Rijkers wrote:
> On 2017-05-09 21:00, Petr Jelinek wrote:
>> On 09/05/17 19:54, Erik Rijkers wrote:
>>> On 2017-05-09 11:50, Petr Jelinek wrote:
>>>
>>
>> Ah okay, so this is same issue that's reported by both Masahiko Sawada
>> [1] and Jeff Janes [2].
>>
>> [1]
>> https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com
>>
>> [2]
>> https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com
>>
> 
> I don't understand why you come to that conclusion: both Masahiko Sawada
> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
> Isn't that a real difference?
> 

Because I only see the sync workers running in the output you shown, the
bug described in those threads can as one of side effects cause the sync
workers to wait forever for the apply that was killed, crashed, etc, in
the right moment, which I guess is what happened during your failed test
(there should be some info in the log about apply exiting).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Erik Rijkers

On 2017-05-09 21:00, Petr Jelinek wrote:

On 09/05/17 19:54, Erik Rijkers wrote:

On 2017-05-09 11:50, Petr Jelinek wrote:



Ah okay, so this is same issue that's reported by both Masahiko Sawada
[1] and Jeff Janes [2].

[1]
https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com


I don't understand why you come to that conclusion: both Masahiko Sawada 
and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't.  
Isn't that a real difference?


( I do sometimes get that DROP-SUBSCRIPTION too, but much less often 
than the sync-failure. )






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Petr Jelinek
On 09/05/17 19:54, Erik Rijkers wrote:
> On 2017-05-09 11:50, Petr Jelinek wrote:
> 
>> I rebased the above mentioned patch to apply to the patches Andres sent,
>> if you could try to add it on top of what you have and check if it still
>> fails, that would be helpful.
> 
> It still fails.
> 
> With these patches
> 
> - 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
> - 2-WIP-Possibly-more-robust-snapbuild-approach.patch +
> - fix-statistics-reporting-in-logical-replication-work.patch +
> - Skip-unnecessary-snapshot-builds.patch
> 
> built again on top of 44c528810a1 ( so I had to add the
> 'fix-statistics-rep*' patch because without it I immediately got that
> Assertion failure again ).
> 
> As always most runs succeed (especially on this large 192GB 16-core
> server).
> 
> But attached is an output file of a number of runs of my
> pgbench_derail2.sh test.
> 
> Overal result:
> 
> -- out_20170509_1635.txt
>   3 -- pgbench -c 64 -j 8 -T 900 -P 180 -n   --  scale 25
>   2 -- All is well.
>   1 -- Not good, but breaking out of wait (21 times no change)
> 
> I broke it off after iteration 4, so 5 never ran, and
> iteration 1 failed due to a mistake in the harness (somethind stupid I
> did) - not interesting.
> 
> iteration 2 succeeds. (eventually has 'replica ok')
> 
> iteration 3 succeeds. (eventually has 'replica ok')
> 
> iteration 4 fails.
>   Just after 'alter subscription sub1 enable' I caught (as is usual)
> pg_stat_replication.state as 'catchup'. So far so good.
>   After the 15-minute pgbench run pg_stat_replication has only 2
> 'startup' lines (and none 'catchup' or 'streaming'):
> 
>  port | pg_stat_replication |  pid   | wal | replay_loc | diff |
> ?column? |  state  |   app   | sync_state
>  6972 | pg_stat_replication | 108349 | 19/8FBCC248 ||  |
>  | startup | derail2 | async
>  6972 | pg_stat_replication | 108351 | 19/8FBCC248 ||  |
>  | startup | derail2 | async
> 
> (that's from:
>select $port1 as port,'pg_stat_replication' as pg_stat_replication, pid
>  , pg_current_wal_location() wal, replay_location replay_loc,
> pg_current_wal_location() - replay_location as diff
>  , pg_current_wal_location() <= replay_location
>  , state, application_name as app, sync_state
>from pg_stat_replication
> )
> 
> This remains in this state for as long as my test-programs lets it
> (i.e., 20 x 30s, or something like that, and then the loop is exited);
> in the ouput file it says: 'Not good, but breaking out of wait'
> 
> Below is the accompanying ps (with the 2 'deranged senders' as Jeff
> Janes would surely call them):
> 
> 
> UID PID   PPID  C STIME TTY  STAT   TIME CMD
> rijkers  107147  1  0 17:11 pts/35   S+ 0:00
> /var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/bin/postgres
> -D /var/data1/pg_stuff/pg_installations
> rijkers  107149 107147  0 17:11 ?Ss 0:00  \_ postgres:
> logger process
> rijkers  107299 107147  0 17:11 ?Ss 0:01  \_ postgres:
> checkpointer process
> rijkers  107300 107147  0 17:11 ?Ss 0:00  \_ postgres:
> writer process
> rijkers  107301 107147  0 17:11 ?Ss 0:00  \_ postgres: wal
> writer process
> rijkers  107302 107147  0 17:11 ?Ss 0:00  \_ postgres:
> autovacuum launcher process
> rijkers  107303 107147  0 17:11 ?Ss 0:00  \_ postgres: stats
> collector process
> rijkers  107304 107147  0 17:11 ?Ss 0:00  \_ postgres:
> bgworker: logical replication launcher
> rijkers  108348 107147  0 17:12 ?Ss 0:01  \_ postgres:
> bgworker: logical replication worker for subscription 70310 sync 70293
> rijkers  108350 107147  0 17:12 ?Ss 0:00  \_ postgres:
> bgworker: logical replication worker for subscription 70310 sync 70298
> rijkers  107145  1  0 17:11 pts/35   S+ 0:02
> /var/data1/pg_stuff/pg_installations/pgsql.logical_replication/bin/postgres
> -D /var/data1/pg_stuff/pg_installations
> rijkers  107151 107145  0 17:11 ?Ss 0:00  \_ postgres:
> logger process
> rijkers  107160 107145  0 17:11 ?Ss 0:08  \_ postgres:
> checkpointer process
> rijkers  107161 107145  0 17:11 ?Ss 0:07  \_ postgres:
> writer process
> rijkers  107162 107145  0 17:11 ?Ss 0:02  \_ postgres: wal
> writer process
> rijkers  107163 107145  0 17:11 ?Ss 0:00  \_ postgres:
> autovacuum launcher process
> rijkers  107164 107145  0 17:11 ?Ss 0:02  \_ postgres: stats
> collector process
> rijkers  107165 107145  0 17:11 ?Ss 0:00  \_ postgres:
> bgworker: logical replication launcher
> rijkers  108349 107145  0 17:12 ?Ss 0:27  \_ postgres: wal
> sender process rijkers [local] idle
> rijkers  108351 107145  0 17:12 ?Ss 0:26  \_ postgres: wal
> sender process rijkers [local] idle
> 
> I have had no time to add (or view) any CPUinfo.
> 
> 

Ah okay, so this is same 

Re: [HACKERS] snapbuild woes

2017-05-09 Thread Erik Rijkers

On 2017-05-09 11:50, Petr Jelinek wrote:

I rebased the above mentioned patch to apply to the patches Andres 
sent,
if you could try to add it on top of what you have and check if it 
still

fails, that would be helpful.


It still fails.

With these patches

- 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
- 2-WIP-Possibly-more-robust-snapbuild-approach.patch +
- fix-statistics-reporting-in-logical-replication-work.patch +
- Skip-unnecessary-snapshot-builds.patch

built again on top of 44c528810a1 ( so I had to add the 
'fix-statistics-rep*' patch because without it I immediately got that 
Assertion failure again ).


As always most runs succeed (especially on this large 192GB 16-core 
server).


But attached is an output file of a number of runs of my 
pgbench_derail2.sh test.


Overal result:

-- out_20170509_1635.txt
  3 -- pgbench -c 64 -j 8 -T 900 -P 180 -n   --  scale 25
  2 -- All is well.
  1 -- Not good, but breaking out of wait (21 times no change)

I broke it off after iteration 4, so 5 never ran, and
iteration 1 failed due to a mistake in the harness (somethind stupid I 
did) - not interesting.


iteration 2 succeeds. (eventually has 'replica ok')

iteration 3 succeeds. (eventually has 'replica ok')

iteration 4 fails.
  Just after 'alter subscription sub1 enable' I caught (as is usual) 
pg_stat_replication.state as 'catchup'. So far so good.
  After the 15-minute pgbench run pg_stat_replication has only 2 
'startup' lines (and none 'catchup' or 'streaming'):


 port | pg_stat_replication |  pid   | wal | replay_loc | diff | 
?column? |  state  |   app   | sync_state
 6972 | pg_stat_replication | 108349 | 19/8FBCC248 ||  | 
 | startup | derail2 | async
 6972 | pg_stat_replication | 108351 | 19/8FBCC248 ||  | 
 | startup | derail2 | async


(that's from:
   select $port1 as port,'pg_stat_replication' as pg_stat_replication, 
pid
 , pg_current_wal_location() wal, replay_location replay_loc, 
pg_current_wal_location() - replay_location as diff

 , pg_current_wal_location() <= replay_location
 , state, application_name as app, sync_state
   from pg_stat_replication
)

This remains in this state for as long as my test-programs lets it 
(i.e., 20 x 30s, or something like that, and then the loop is exited); 
in the ouput file it says: 'Not good, but breaking out of wait'


Below is the accompanying ps (with the 2 'deranged senders' as Jeff 
Janes would surely call them):



UID PID   PPID  C STIME TTY  STAT   TIME CMD
rijkers  107147  1  0 17:11 pts/35   S+ 0:00 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/bin/postgres 
-D /var/data1/pg_stuff/pg_installations
rijkers  107149 107147  0 17:11 ?Ss 0:00  \_ postgres: 
logger process
rijkers  107299 107147  0 17:11 ?Ss 0:01  \_ postgres: 
checkpointer process
rijkers  107300 107147  0 17:11 ?Ss 0:00  \_ postgres: 
writer process
rijkers  107301 107147  0 17:11 ?Ss 0:00  \_ postgres: wal 
writer process
rijkers  107302 107147  0 17:11 ?Ss 0:00  \_ postgres: 
autovacuum launcher process
rijkers  107303 107147  0 17:11 ?Ss 0:00  \_ postgres: stats 
collector process
rijkers  107304 107147  0 17:11 ?Ss 0:00  \_ postgres: 
bgworker: logical replication launcher
rijkers  108348 107147  0 17:12 ?Ss 0:01  \_ postgres: 
bgworker: logical replication worker for subscription 70310 sync 70293
rijkers  108350 107147  0 17:12 ?Ss 0:00  \_ postgres: 
bgworker: logical replication worker for subscription 70310 sync 70298
rijkers  107145  1  0 17:11 pts/35   S+ 0:02 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication/bin/postgres 
-D /var/data1/pg_stuff/pg_installations
rijkers  107151 107145  0 17:11 ?Ss 0:00  \_ postgres: 
logger process
rijkers  107160 107145  0 17:11 ?Ss 0:08  \_ postgres: 
checkpointer process
rijkers  107161 107145  0 17:11 ?Ss 0:07  \_ postgres: 
writer process
rijkers  107162 107145  0 17:11 ?Ss 0:02  \_ postgres: wal 
writer process
rijkers  107163 107145  0 17:11 ?Ss 0:00  \_ postgres: 
autovacuum launcher process
rijkers  107164 107145  0 17:11 ?Ss 0:02  \_ postgres: stats 
collector process
rijkers  107165 107145  0 17:11 ?Ss 0:00  \_ postgres: 
bgworker: logical replication launcher
rijkers  108349 107145  0 17:12 ?Ss 0:27  \_ postgres: wal 
sender process rijkers [local] idle
rijkers  108351 107145  0 17:12 ?Ss 0:26  \_ postgres: wal 
sender process rijkers [local] idle


I have had no time to add (or view) any CPUinfo.


Erik Rijkers





out_20170509_1635.txt
Description: application/elc

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Erik Rijkers

On 2017-05-09 11:50, Petr Jelinek wrote:

On 09/05/17 10:59, Erik Rijkers wrote:

On 2017-05-09 10:50, Petr Jelinek wrote:

On 09/05/17 00:03, Erik Rijkers wrote:

On 2017-05-05 02:00, Andres Freund wrote:


Could you have a look?


[...]

I rebased the above mentioned patch to apply to the patches Andres 
sent,
if you could try to add it on top of what you have and check if it 
still

fails, that would be helpful.


I suppose you mean these; but they do not apply anymore:

20170505/0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch
20170505/0002-WIP-Possibly-more-robust-snapbuild-approach.patch

Andres, any change you could update them?

alternatively I could use the older version again..

thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Petr Jelinek
On 09/05/17 10:59, Erik Rijkers wrote:
> On 2017-05-09 10:50, Petr Jelinek wrote:
>> On 09/05/17 00:03, Erik Rijkers wrote:
>>> On 2017-05-05 02:00, Andres Freund wrote:

 Could you have a look?
>>>
>>> Running tests with these three patches:
>>>
 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
 0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
 fix-statistics-reporting-in-logical-replication-work.patch
>>> (on top of 44c528810)
>>>
>>> I test by 15-minute pgbench runs while there is a logical replication
>>> connection. Primary and replica are on the same machine.
>>>
>>> I have seen errors on 3 different machines (where error means: at least
>>> 1 of the 4 pgbench tables is not md5-equal). It seems better, faster
>>> machines yield less errors.
>>>
>>> Normally I see in pg_stat_replication (on master) one process in state
>>> 'streaming'.
>>>
>>>  pid  | wal | replay_loc  |   diff   |   state   |   app   |
>>> sync_state
>>> 16495 | 11/EDBC | 11/EA3FEEE8 | 58462488 | streaming | derail2 |
>>> async
>>>
>>> Often there are another two processes in pg_stat_replication that remain
>>> in state 'startup'.
>>>
>>> In the failing sessions the 'streaming'-state process is missing; in
>>> failing sessions there are only the two processes that are and remain in
>>> 'startup'.
>>
>> Hmm, startup is the state where slot creation is happening. I wonder if
>> it's just taking long time to create snapshot because of the 5th issue
>> which is not yet fixed (and the original patch will not apply on top of
>> this change). Alternatively there is a bug in this patch.
>>
>> Did you see high CPU usage during the test when there were those
>> "startup" state walsenders?
>>
> 
> I haven't noticed but I didn't pay attention to that particularly.
> 
> I'll try to get some CPU-info logged...
> 

I rebased the above mentioned patch to apply to the patches Andres sent,
if you could try to add it on top of what you have and check if it still
fails, that would be helpful.

Thanks!

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Skip-unnecessary-snapshot-builds.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Erik Rijkers

On 2017-05-09 10:50, Petr Jelinek wrote:

On 09/05/17 00:03, Erik Rijkers wrote:

On 2017-05-05 02:00, Andres Freund wrote:


Could you have a look?


Running tests with these three patches:


0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
fix-statistics-reporting-in-logical-replication-work.patch

(on top of 44c528810)

I test by 15-minute pgbench runs while there is a logical replication
connection. Primary and replica are on the same machine.

I have seen errors on 3 different machines (where error means: at 
least

1 of the 4 pgbench tables is not md5-equal). It seems better, faster
machines yield less errors.

Normally I see in pg_stat_replication (on master) one process in state
'streaming'.

 pid  | wal | replay_loc  |   diff   |   state   |   app   |
sync_state
16495 | 11/EDBC | 11/EA3FEEE8 | 58462488 | streaming | derail2 | 
async


Often there are another two processes in pg_stat_replication that 
remain

in state 'startup'.

In the failing sessions the 'streaming'-state process is missing; in
failing sessions there are only the two processes that are and remain 
in

'startup'.


Hmm, startup is the state where slot creation is happening. I wonder if
it's just taking long time to create snapshot because of the 5th issue
which is not yet fixed (and the original patch will not apply on top of
this change). Alternatively there is a bug in this patch.

Did you see high CPU usage during the test when there were those
"startup" state walsenders?



I haven't noticed but I didn't pay attention to that particularly.

I'll try to get some CPU-info logged...



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-09 Thread Petr Jelinek
On 09/05/17 00:03, Erik Rijkers wrote:
> On 2017-05-05 02:00, Andres Freund wrote:
>>
>> Could you have a look?
> 
> Running tests with these three patches:
> 
>> 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
>> 0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
>> fix-statistics-reporting-in-logical-replication-work.patch
> (on top of 44c528810)
> 
> I test by 15-minute pgbench runs while there is a logical replication
> connection. Primary and replica are on the same machine.
> 
> I have seen errors on 3 different machines (where error means: at least
> 1 of the 4 pgbench tables is not md5-equal). It seems better, faster
> machines yield less errors.
> 
> Normally I see in pg_stat_replication (on master) one process in state
> 'streaming'.
> 
>  pid  | wal | replay_loc  |   diff   |   state   |   app   |
> sync_state
> 16495 | 11/EDBC | 11/EA3FEEE8 | 58462488 | streaming | derail2 | async
> 
> Often there are another two processes in pg_stat_replication that remain
> in state 'startup'.
> 
> In the failing sessions the 'streaming'-state process is missing; in
> failing sessions there are only the two processes that are and remain in
> 'startup'.

Hmm, startup is the state where slot creation is happening. I wonder if
it's just taking long time to create snapshot because of the 5th issue
which is not yet fixed (and the original patch will not apply on top of
this change). Alternatively there is a bug in this patch.

Did you see high CPU usage during the test when there were those
"startup" state walsenders?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-08 Thread Erik Rijkers

On 2017-05-05 02:00, Andres Freund wrote:


Could you have a look?


Running tests with these three patches:


0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
fix-statistics-reporting-in-logical-replication-work.patch

(on top of 44c528810)

I test by 15-minute pgbench runs while there is a logical replication 
connection. Primary and replica are on the same machine.


I have seen errors on 3 different machines (where error means: at least 
1 of the 4 pgbench tables is not md5-equal). It seems better, faster 
machines yield less errors.


Normally I see in pg_stat_replication (on master) one process in state 
'streaming'.


 pid  | wal | replay_loc  |   diff   |   state   |   app   | 
sync_state
16495 | 11/EDBC | 11/EA3FEEE8 | 58462488 | streaming | derail2 | 
async


Often there are another two processes in pg_stat_replication that remain 
in state 'startup'.


In the failing sessions the 'streaming'-state process is missing; in 
failing sessions there are only the two processes that are and remain in 
'startup'.


FWIW, below  the output of a succesful and a failed run:


-- successful run:
creating tables...
1590400 of 250 tuples (63%) done (elapsed 5.34 s, remaining 3.05 s)
250 of 250 tuples (100%) done (elapsed 9.63 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972 application_name=derail2' 
publication pub1 with (disabled);

alter subscription sub1 enable;
-- pgbench -c 90 -j 8 -T 900 -P 180 -n   --  scale 25
progress: 180.0 s, 82.5 tps, lat 1086.845 ms stddev 3211.785
progress: 360.0 s, 25.4 tps, lat 3469.040 ms stddev 6297.440
progress: 540.0 s, 28.9 tps, lat 3131.438 ms stddev 4288.130
progress: 720.0 s, 27.5 tps, lat 3285.024 ms stddev 4113.841
progress: 900.0 s, 47.2 tps, lat 1896.698 ms stddev 2182.695
transaction type: 
scaling factor: 25
query mode: simple
number of clients: 90
number of threads: 8
duration: 900 s
number of transactions actually processed: 38175
latency average = 2128.606 ms
latency stddev = 3948.634 ms
tps = 42.151205 (including connections establishing)
tps = 42.151589 (excluding connections establishing)
-- waiting 0s... (always)
 port | pg_stat_replication | pid  | wal | replay_loc  | diff | 
?column? |   state   |   app   | sync_state
 6972 | pg_stat_replication | 2545 | 18/432B2180 | 18/432B2180 |0 | 
t| streaming | derail2 | async


2017.05.08 23:19:22
-- getting md5 (cb)
6972 a,b,t,h:  250 25250  38175  b2ba48b53 b3788a837 
d1afac950 d4abcc72e master
6973 a,b,t,h:  250 25250  38175  b2ba48b53 b3788a837 
d1afac950 d4abcc72e replica ok  bee2312c7

2017.05.08 23:20:48

 port | pg_stat_replication | pid  | wal | replay_loc  |   diff  
 | ?column? |   state   |   app   | sync_state
 6972 | pg_stat_replication | 2545 | 18/4AEEC8C0 | 18/453FBD20 | 
95357856 | f| streaming | derail2 | async




-- failure:
creating tables...
1777100 of 250 tuples (71%) done (elapsed 5.06 s, remaining 2.06 s)
250 of 250 tuples (100%) done (elapsed 7.41 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972 application_name=derail2' 
publication pub1 with (disabled);

alter subscription sub1 enable;
 port | pg_stat_replication |  pid  | wal | replay_loc | diff | 
?column? |  state  |   app   | sync_state
 6972 | pg_stat_replication | 11945 | 18/5E2913D0 ||  |  
| catchup | derail2 | async


-- pgbench -c 90 -j 8 -T 900 -P 180 -n   --  scale 25
progress: 180.0 s, 78.4 tps, lat 1138.348 ms stddev 2884.815
progress: 360.0 s, 69.2 tps, lat 1309.716 ms stddev 2594.231
progress: 540.0 s, 59.0 tps, lat 1519.146 ms stddev 2033.400
progress: 720.0 s, 62.9 tps, lat 1421.854 ms stddev 1775.066
progress: 900.0 s, 57.0 tps, lat 1575.693 ms stddev 1681.800
transaction type: 
scaling factor: 25
query mode: simple
number of clients: 90
number of threads: 8
duration: 900 s
number of transactions actually processed: 58846
latency average = 1378.259 ms
latency stddev = 2304.159 ms
tps = 65.224168 (including connections establishing)
tps = 65.224788 (excluding connections establishing)
-- waiting 0s... (always)
 port | pg_stat_replication |  pid  | wal | replay_loc | diff | 
?column? |  state  |   app   | sync_state
 6972 | pg_stat_replication | 11948 | 18/7469A038 ||  |  
| startup | derail2 | async
 6972 | pg_stat_replication | 12372 | 18/7469A038 ||  |  
| startup | derail2 | async




During my tests, I keep an eye on pg_stat_replication (refreshing every 
2s), and when I see those two 'startup'-lines in pg_stat_replication 
without any 'streaming'-line I know the test 

Re: [HACKERS] snapbuild woes

2017-05-08 Thread Andres Freund


On May 3, 2017 10:45:16 PM PDT, Noah Misch  wrote:
>On Thu, Apr 27, 2017 at 09:42:58PM -0700, Andres Freund wrote:
>> 
>> 
>> On April 27, 2017 9:34:44 PM PDT, Noah Misch 
>wrote:
>> >On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
>> >> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
>> >> > I've since the previous update reviewed Petr's patch, which he
>> >since has
>> >> > updated over the weekend.  I'll do another round tomorrow, and
>will
>> >see
>> >> > how it looks.  I think we might need some more tests for this to
>be
>> >> > committable, so it might not become committable tomorrow.  I
>hope
>> >we'll
>> >> > have something in tree by end of this week, if not I'll send an
>> >update.
>> >> 
>> >> I was less productive this week than I'd hoped, and creating a
>> >testsuite
>> >> was more work than I'd anticipated, so I'm slightly lagging
>behind. 
>> >I
>> >> hope to have a patchset tomorrow, aiming to commit something
>> >> Monday/Tuesday.
>> >
>> >This PostgreSQL 10 open item is past due for your status update. 
>> >Kindly send
>> >a status update within 24 hours, and include a date for your
>subsequent
>> >status
>> >update.  Refer to the policy on open item ownership:
>>
>>https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>> 
>> I committed part of the series today, plan to continue doing so over
>the next few days.  Changes require careful review & testing, this is
>easy to get wrong...
>
>This PostgreSQL 10 open item is past due for your status update. 
>Kindly send
>a status update within 24 hours, and include a date for your subsequent
>status
>update.
>
>Also, this open item has been alive for three weeks, well above
>guideline.  I
>understand it's a tricky bug, but I'm worried this isn't on track to
>end.
>What is missing to make it end?
>
>Refer to the policy on open item ownership:
>https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I plan to commit the next pending patch after the back branch releases are cut 
- it's an invasive fix and the issue doesn't cause corruption "just" slow slot 
creation. So it seems better to wait for a few days, rather than hurry it into 
the release.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 18:18, Andres Freund wrote:
> On 2017-05-05 13:53:16 +0200, Petr Jelinek wrote:
>> On 05/05/17 02:42, Andres Freund wrote:
>>> On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
 Attached is a prototype patch for that.
>>>
>>
>> I am not sure I understand the ABI comment for started_collection_at.
>> What's the ABI issue? The struct is private to snapbuild.c module. Or
>> you want to store it in the ondisk snapshot as well?
> 
> It's stored on-disk already :(

Hmm okay, well then I guess we'll have to store it somewhere inside
running, we should not normally care about that since we only load
CONSISTENT snapshots where running don't matter anymore. Alternatively
we could bump the SNAPBUILD_VERSION but that would make it impossible to
downgrade minor version which is bad (I guess we'll want to do that for
master, but not back-branches).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-05 Thread Andres Freund
On 2017-05-05 13:53:16 +0200, Petr Jelinek wrote:
> On 05/05/17 02:42, Andres Freund wrote:
> > On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
> >> Attached is a prototype patch for that.
> > 
> 
> I am not sure I understand the ABI comment for started_collection_at.
> What's the ABI issue? The struct is private to snapbuild.c module. Or
> you want to store it in the ondisk snapshot as well?

It's stored on-disk already :(


> Otherwise the logic seems to be right on first read, will ponder it a
> bit more

Cool!

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 02:42, Andres Freund wrote:
> On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
>> Attached is a prototype patch for that.
> 

I am not sure I understand the ABI comment for started_collection_at.
What's the ABI issue? The struct is private to snapbuild.c module. Or
you want to store it in the ondisk snapshot as well?

About better name for it what about something like oldest_full_xact?

Otherwise the logic seems to be right on first read, will ponder it a
bit more

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-05 Thread Petr Jelinek
On 05/05/17 02:00, Andres Freund wrote:
> Hi,
> 
> On 2017-05-02 08:55:53 +0200, Petr Jelinek wrote:
>> Aah, now I understand we talked about slightly different things, I
>> considered the running thing to be first step towards tracking aborted
>> txes everywhere.
>> I think
>> we'll have to revisit tracking of aborted transactions in PG11 then
>> though because of the 'snapshot too large' issue when exporting, at
>> least I don't see any other way to fix that.
> 
> FWIW, that seems unnecessary - we can just check for that using the
> clog.  Should be very simple to check for aborted xacts when exporting
> the snapshot (like 2 lines + comments).  That should address your
> concern, right?

Right, because there isn't practical difference between running and
aborted transaction for us so we don't mind if the abort has happened in
the future. Yeah the export code could do the check seems quite simple.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-04 Thread Andres Freund
On 2017-05-04 17:00:04 -0700, Andres Freund wrote:
> Attached is a prototype patch for that.

Oops.

Andres
>From b6eb46e376e40f3e2e9a55d16b1b37b27904564b Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 4 May 2017 16:40:52 -0700
Subject: [PATCH 1/2] WIP: Fix off-by-one around GetLastImportantRecPtr.

---
 src/backend/postmaster/bgwriter.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index dcb4cf249c..d409d977c0 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -325,10 +325,11 @@ BackgroundWriterMain(void)
 
 			/*
 			 * Only log if enough time has passed and interesting records have
-			 * been inserted since the last snapshot.
+			 * been inserted since the last snapshot (it's <= because
+			 * last_snapshot_lsn points at the end+1 of the record).
 			 */
 			if (now >= timeout &&
-last_snapshot_lsn < GetLastImportantRecPtr())
+last_snapshot_lsn <= GetLastImportantRecPtr())
 			{
 last_snapshot_lsn = LogStandbySnapshot();
 last_snapshot_ts = now;
-- 
2.12.0.264.gd6db3f2165.dirty

>From 7ed2aeb832029f5602566a665b3f4dbe8baedfcd Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 4 May 2017 16:48:00 -0700
Subject: [PATCH 2/2] WIP: Possibly more robust snapbuild approach.

---
 contrib/test_decoding/expected/ondisk_startup.out |  15 +-
 contrib/test_decoding/specs/ondisk_startup.spec   |   8 +-
 src/backend/replication/logical/decode.c  |   3 -
 src/backend/replication/logical/snapbuild.c   | 386 +++---
 src/include/replication/snapbuild.h   |  25 +-
 5 files changed, 215 insertions(+), 222 deletions(-)

diff --git a/contrib/test_decoding/expected/ondisk_startup.out b/contrib/test_decoding/expected/ondisk_startup.out
index 65115c830a..c7b1f45b46 100644
--- a/contrib/test_decoding/expected/ondisk_startup.out
+++ b/contrib/test_decoding/expected/ondisk_startup.out
@@ -1,21 +1,30 @@
 Parsed test spec with 3 sessions
 
-starting permutation: s2txid s1init s3txid s2alter s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
-step s2txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+starting permutation: s2b s2txid s1init s3b s3txid s2alter s2c s2b s2txid s3c s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); 
-step s3txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+step s3b: BEGIN;
+step s3txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s2alter: ALTER TABLE do_write ADD COLUMN addedbys2 int;
 step s2c: COMMIT;
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
+?column?   
+
+f  
+step s3c: COMMIT;
 step s1init: <... completed>
 ?column?   
 
 init   
+step s2c: COMMIT;
 step s1insert: INSERT INTO do_write DEFAULT VALUES;
 step s1checkpoint: CHECKPOINT;
 step s1start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false');
diff --git a/contrib/test_decoding/specs/ondisk_startup.spec b/contrib/test_decoding/specs/ondisk_startup.spec
index 8223705639..12c57a813d 100644
--- a/contrib/test_decoding/specs/ondisk_startup.spec
+++ b/contrib/test_decoding/specs/ondisk_startup.spec
@@ -24,7 +24,8 @@ step "s1alter" { ALTER TABLE do_write ADD COLUMN addedbys1 int; }
 session "s2"
 setup { SET synchronous_commit=on; }
 
-step "s2txid" { BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL; }
+step "s2b" { BEGIN; }
+step "s2txid" { SELECT txid_current() IS NULL; }
 step "s2alter" { ALTER TABLE do_write ADD COLUMN addedbys2 int; }
 step "s2c" { COMMIT; }
 
@@ -32,7 +33,8 @@ step "s2c" { COMMIT; }
 session "s3"
 setup { SET synchronous_commit=on; }
 
-step "s3txid" { BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL; }
+step "s3b" { BEGIN; }
+step "s3txid" { SELECT txid_current() IS NULL; }
 step "s3c" { COMMIT; }
 
 # Force usage of ondisk snapshot by starting and not finishing a
@@ -40,4 +42,4 @@ step "s3c" { COMMIT; }
 # reached. In combination with a checkpoint forcing a snapshot to be
 # written and a new restart point computed that'll lead to the usage
 # of the snapshot.
-permutation "s2txid" "s1init" "s3txid" "s2alter" "s2c" "s1insert" "s1checkpoint" "s1start" "s1insert" "s1alter" "s1insert" "s1start"
+permutation "s2b" "s2txid" "s1init" "s3b" "s3txid" "s2alter" "s2c" "s2b" "s2txid" "s3c" "s2c" "s1insert" "s1checkpoint" "s1start" "s1insert" "s1alter" "s1insert" "s1start"
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 5c13d26099..68825ef598 100644
--- 

Re: [HACKERS] snapbuild woes

2017-05-04 Thread Andres Freund
Hi,

On 2017-05-02 08:55:53 +0200, Petr Jelinek wrote:
> Aah, now I understand we talked about slightly different things, I
> considered the running thing to be first step towards tracking aborted
> txes everywhere.
> I think
> we'll have to revisit tracking of aborted transactions in PG11 then
> though because of the 'snapshot too large' issue when exporting, at
> least I don't see any other way to fix that.

FWIW, that seems unnecessary - we can just check for that using the
clog.  Should be very simple to check for aborted xacts when exporting
the snapshot (like 2 lines + comments).  That should address your
concern, right?


> If you think that adding the SNAPBUILD_BUILD_INITIAL_SNAPSHOT would be
> less invasive/smaller patch I am okay with doing that for PG10.

Attached is a prototype patch for that.

What I decided is that essentially tracking the running xacts is too
unrealiable due to the race, so I decided that just relying on
oldestRunningXid and nextXid - which are solely in the procArray and
thus racefree - is better.

It's not perfect yet, primarily because we'd need to take a bit more
care about being ABI compatible for older releases, and because we'd
probably have to trigger LogStandbySnapshot() a bit more frequently
(presumably while waiting for WAL).  The change means we'll have to wait
a bit longer for slot creation, but it's considerably simpler / more
robust.

Could you have a look?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-04 Thread Petr Jelinek
On 04/05/17 07:45, Noah Misch wrote:
> On Thu, Apr 27, 2017 at 09:42:58PM -0700, Andres Freund wrote:
>>
>>
>> On April 27, 2017 9:34:44 PM PDT, Noah Misch  wrote:
>>> On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
 On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I've since the previous update reviewed Petr's patch, which he
>>> since has
> updated over the weekend.  I'll do another round tomorrow, and will
>>> see
> how it looks.  I think we might need some more tests for this to be
> committable, so it might not become committable tomorrow.  I hope
>>> we'll
> have something in tree by end of this week, if not I'll send an
>>> update.

 I was less productive this week than I'd hoped, and creating a
>>> testsuite
 was more work than I'd anticipated, so I'm slightly lagging behind. 
>>> I
 hope to have a patchset tomorrow, aiming to commit something
 Monday/Tuesday.
>>>
>>> This PostgreSQL 10 open item is past due for your status update. 
>>> Kindly send
>>> a status update within 24 hours, and include a date for your subsequent
>>> status
>>> update.  Refer to the policy on open item ownership:
>>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>
>> I committed part of the series today, plan to continue doing so over the 
>> next few days.  Changes require careful review & testing, this is easy to 
>> get wrong...
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.
> 
> Also, this open item has been alive for three weeks, well above guideline.  I
> understand it's a tricky bug, but I'm worried this isn't on track to end.
> What is missing to make it end?

It's tricky 5 bugs, and they are quite sensitive on rare
timing/concurrency events.

First two are fixed, we can live with 5th being done later (as it's not
correctness fix, but very bad performance one).

We haven't figured way for fixing the 4th one that we all agree is good
solution (everything proposed so far still had bugs).

I am not quite sure what happened to the 3rd one.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-03 Thread Noah Misch
On Thu, Apr 27, 2017 at 09:42:58PM -0700, Andres Freund wrote:
> 
> 
> On April 27, 2017 9:34:44 PM PDT, Noah Misch  wrote:
> >On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
> >> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> >> > I've since the previous update reviewed Petr's patch, which he
> >since has
> >> > updated over the weekend.  I'll do another round tomorrow, and will
> >see
> >> > how it looks.  I think we might need some more tests for this to be
> >> > committable, so it might not become committable tomorrow.  I hope
> >we'll
> >> > have something in tree by end of this week, if not I'll send an
> >update.
> >> 
> >> I was less productive this week than I'd hoped, and creating a
> >testsuite
> >> was more work than I'd anticipated, so I'm slightly lagging behind. 
> >I
> >> hope to have a patchset tomorrow, aiming to commit something
> >> Monday/Tuesday.
> >
> >This PostgreSQL 10 open item is past due for your status update. 
> >Kindly send
> >a status update within 24 hours, and include a date for your subsequent
> >status
> >update.  Refer to the policy on open item ownership:
> >https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> I committed part of the series today, plan to continue doing so over the next 
> few days.  Changes require careful review & testing, this is easy to get 
> wrong...

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.

Also, this open item has been alive for three weeks, well above guideline.  I
understand it's a tricky bug, but I'm worried this isn't on track to end.
What is missing to make it end?

Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-02 Thread Petr Jelinek
On 01/05/17 21:14, Andres Freund wrote:
> On 2017-05-01 11:09:44 +0200, Petr Jelinek wrote:
>> On 01/05/17 10:03, Andres Freund wrote:
>>> On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
> 
 But, I still think we need to restart the tracking after new
 xl_running_xacts. Reason for that is afaics any of the catalog snapshots
 that we assigned to transactions at the end of SnapBuildCommitTxn might
 be corrupted otherwise as they were built before we knew one of the
 supposedly running txes was actually already committed and that
 transaction might have done catalog changes.
>>>
>>> I'm afraid you're right.  But I think this is even more complicated: The
>>> argument in your version that this can only happen once, seems to also
>>> be holey: Just imagine a pg_usleep(3000 * 100) right before
>>> ProcArrayEndTransaction() and enjoy the picture.
> 
>> Well yes, transaction can in theory have written commit/abort xlog
>> record and stayed in proc for more than single xl_running_xacts write.
>> But then the condition which we test that the new xl_running_xacts has
>> bigger xmin than the previously tracked one's xmax would not be
>> satisfied and we would not enter the relevant code path yet. So I think
>> we should not be able to get any xids we didn't see. But we have to
>> restart tracking from beginning (after first checking if we didn't
>> already see anything that the xl_running_xacts considers as running),
>> that's what my code did.
> 
> But to get that correct, we'd have to not only track ->committed, but
> also somehow maintain ->aborted, and not just for the transactions in
> the original set of running transactions.  That'd be fairly complicated
> and large.  The reason I was trying - and it's definitely not correct as
> I had proposed - to use the original running_xacts record is that that
> only required tracking as many transaction statuses as in the first
> xl_running_xacts.  Am I missing something?

Aah, now I understand we talked about slightly different things, I
considered the running thing to be first step towards tracking aborted
txes everywhere. I am not sure if it's complicated, it would be exactly
the same as committed tracking, except we'd do it only before we reach
SNAPBUILD_CONSISTENT. It would be definitely larger patch I agree, but I
can give it at try.

If you think that adding the SNAPBUILD_BUILD_INITIAL_SNAPSHOT would be
less invasive/smaller patch I am okay with doing that for PG10. I think
we'll have to revisit tracking of aborted transactions in PG11 then
though because of the 'snapshot too large' issue when exporting, at
least I don't see any other way to fix that.

> 
> The probabilistic tests catch the issues here fairly quickly, btw, if
> you run with synchronous_commit=on, while pgbench is running, because
> the WAL flushes make this more likely.  Runs this query:
> 
> SELECT account_count, teller_count, account_sum - teller_sum s
> FROM
> (
> SELECT count(*) account_count, SUM(abalance) account_sum
> FROM pgbench_accounts
> ) a,
> (
> SELECT count(*) teller_count, SUM(tbalance) teller_sum
> FROM pgbench_tellers
> ) t
> 
> which, for my scale, should always return:
> ┌─┬─┬───┐
> │a│  t  │ s │
> ├─┼─┼───┤
> │ 200 │ 200 │ 0 │
> └─┴─┴───┘
> but with my POC patch occasionally returns things like:
> ┌─┬─┬───┐
> │a│  t  │   s   │
> ├─┼─┼───┤
> │ 200 │ 212 │ 37358 │
> └─┴─┴───┘
> 
> which obviously shouldn't be the case.
> 

Very nice (the test, not the failures ;)) !

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Noah Misch
On Mon, May 01, 2017 at 12:32:07PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> >> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> >> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> >> with individual tests that run for ~ 1sec.
> 
> > I was more thinking of pgench -T$XX, rather than constant number of
> > iterations.  I currently can reproduce the issues within like 3-4
> > minutes, so 5s is probably not quite sufficient to get decent coverage.

You might hit the race faster by adding a dedicated stress test function to
regress.c.

> IMO the buildfarm is mainly for verifying portability, not for
> trying to prove that race-like conditions don't exist.

Perhaps so, but it has excelled at both tasks.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Peter Eisentraut
On 5/1/17 13:02, Tom Lane wrote:
> Andres Freund  writes:
>> On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
>>> But quite aside from the question of whether we can afford the cycles,
>>> it seems like the wrong approach.  IMO the buildfarm is mainly for
>>> verifying portability, not for trying to prove that race-like
>>> conditions don't exist.  In most situations we're going out of our way
>>> to ensure reproduceability of tests we add to the buildfarm sequence;
>>> but it seems like this is looking for irreproducible results.
> 
>> Yea, I wondered about that upthread as well.  But the tests are quite
>> useful nonetheless.  Wonder about adding them simply as a separate
>> target.
> 
> I have no objection to adding more tests as a non-default target.

Well, the problem with nondefault targets is that they are hard to find
if you don't know them, and then they will rot.

Sure, we need a way to distinguish different classes of tests, but lets
think about the bigger scheme, too.  Ideas welcome.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Andres Freund
On 2017-05-01 11:09:44 +0200, Petr Jelinek wrote:
> On 01/05/17 10:03, Andres Freund wrote:
> > On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:

> >> But, I still think we need to restart the tracking after new
> >> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
> >> that we assigned to transactions at the end of SnapBuildCommitTxn might
> >> be corrupted otherwise as they were built before we knew one of the
> >> supposedly running txes was actually already committed and that
> >> transaction might have done catalog changes.
> > 
> > I'm afraid you're right.  But I think this is even more complicated: The
> > argument in your version that this can only happen once, seems to also
> > be holey: Just imagine a pg_usleep(3000 * 100) right before
> > ProcArrayEndTransaction() and enjoy the picture.

> Well yes, transaction can in theory have written commit/abort xlog
> record and stayed in proc for more than single xl_running_xacts write.
> But then the condition which we test that the new xl_running_xacts has
> bigger xmin than the previously tracked one's xmax would not be
> satisfied and we would not enter the relevant code path yet. So I think
> we should not be able to get any xids we didn't see. But we have to
> restart tracking from beginning (after first checking if we didn't
> already see anything that the xl_running_xacts considers as running),
> that's what my code did.

But to get that correct, we'd have to not only track ->committed, but
also somehow maintain ->aborted, and not just for the transactions in
the original set of running transactions.  That'd be fairly complicated
and large.  The reason I was trying - and it's definitely not correct as
I had proposed - to use the original running_xacts record is that that
only required tracking as many transaction statuses as in the first
xl_running_xacts.  Am I missing something?



The probabilistic tests catch the issues here fairly quickly, btw, if
you run with synchronous_commit=on, while pgbench is running, because
the WAL flushes make this more likely.  Runs this query:

SELECT account_count, teller_count, account_sum - teller_sum s
FROM
(
SELECT count(*) account_count, SUM(abalance) account_sum
FROM pgbench_accounts
) a,
(
SELECT count(*) teller_count, SUM(tbalance) teller_sum
FROM pgbench_tellers
) t

which, for my scale, should always return:
┌─┬─┬───┐
│a│  t  │ s │
├─┼─┼───┤
│ 200 │ 200 │ 0 │
└─┴─┴───┘
but with my POC patch occasionally returns things like:
┌─┬─┬───┐
│a│  t  │   s   │
├─┼─┼───┤
│ 200 │ 212 │ 37358 │
└─┴─┴───┘

which obviously shouldn't be the case.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Tom Lane
Andres Freund  writes:
> On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
>> But quite aside from the question of whether we can afford the cycles,
>> it seems like the wrong approach.  IMO the buildfarm is mainly for
>> verifying portability, not for trying to prove that race-like
>> conditions don't exist.  In most situations we're going out of our way
>> to ensure reproduceability of tests we add to the buildfarm sequence;
>> but it seems like this is looking for irreproducible results.

> Yea, I wondered about that upthread as well.  But the tests are quite
> useful nonetheless.  Wonder about adding them simply as a separate
> target.

I have no objection to adding more tests as a non-default target.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Andres Freund
On 2017-05-01 12:32:07 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> >> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> >> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> >> with individual tests that run for ~ 1sec.
> 
> > I was more thinking of pgench -T$XX, rather than constant number of
> > iterations.  I currently can reproduce the issues within like 3-4
> > minutes, so 5s is probably not quite sufficient to get decent coverage.
> 
> Adding a five-minute pgbench run to the buildfarm sequence is definitely
> going to get you ridden out of town on a rail.

Right - that was referring to Noah's comment upthread:

On 2017-04-29 14:42:01 -0700, Noah Misch wrote:
> If the probabilistic test catches the bug even 5% of the time in typical
> configurations, the buildfarm will rapidly identify any regression.  I'd
> choose a 7s test that detects the bug 5% of the time over a 30s test that
> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> half the time.  In its case, only two buildfarm members were able to
> demonstrate the original bug, so 5% detection would have been too low.)

and I suspect that you'd not find these within 5s within sufficient
time, because the detection rate would be too low.



> But quite aside from the question of whether we can afford the cycles,
> it seems like the wrong approach.  IMO the buildfarm is mainly for
> verifying portability, not for trying to prove that race-like
> conditions don't exist.  In most situations we're going out of our way
> to ensure reproduceability of tests we add to the buildfarm sequence;
> but it seems like this is looking for irreproducible results.

Yea, I wondered about that upthread as well.  But the tests are quite
useful nonetheless.  Wonder about adding them simply as a separate
target.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Tom Lane
Andres Freund  writes:
> On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
>> 30sec is kind of a big lump from a buildfarm standpoint, especially if
>> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
>> with individual tests that run for ~ 1sec.

> I was more thinking of pgench -T$XX, rather than constant number of
> iterations.  I currently can reproduce the issues within like 3-4
> minutes, so 5s is probably not quite sufficient to get decent coverage.

Adding a five-minute pgbench run to the buildfarm sequence is definitely
going to get you ridden out of town on a rail.  But quite aside from the
question of whether we can afford the cycles, it seems like the wrong
approach.  IMO the buildfarm is mainly for verifying portability, not for
trying to prove that race-like conditions don't exist.  In most situations
we're going out of our way to ensure reproduceability of tests we add to
the buildfarm sequence; but it seems like this is looking for
irreproducible results.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Andres Freund
On 2017-05-01 08:46:47 -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
> >> ... I was wondering about adding
> >> a loop that simply runs for like 30s and then quits or such, but who
> >> knows.
> 
> > If the probabilistic test catches the bug even 5% of the time in typical
> > configurations, the buildfarm will rapidly identify any regression.  I'd
> > choose a 7s test that detects the bug 5% of the time over a 30s test that
> > detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> > for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> > half the time.  In its case, only two buildfarm members were able to
> > demonstrate the original bug, so 5% detection would have been too low.)
> 
> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> with individual tests that run for ~ 1sec.

I was more thinking of pgench -T$XX, rather than constant number of
iterations.  I currently can reproduce the issues within like 3-4
minutes, so 5s is probably not quite sufficient to get decent coverage.


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Andrew Dunstan


On 05/01/2017 08:46 AM, Tom Lane wrote:
> Noah Misch  writes:
>> On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
>>> ... I was wondering about adding
>>> a loop that simply runs for like 30s and then quits or such, but who
>>> knows.
>> If the probabilistic test catches the bug even 5% of the time in typical
>> configurations, the buildfarm will rapidly identify any regression.  I'd
>> choose a 7s test that detects the bug 5% of the time over a 30s test that
>> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
>> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
>> half the time.  In its case, only two buildfarm members were able to
>> demonstrate the original bug, so 5% detection would have been too low.)
> 30sec is kind of a big lump from a buildfarm standpoint, especially if
> you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
> with individual tests that run for ~ 1sec.
>
> (This is top-of-mind for me right now because I've been looking around
> for ways to speed up the regression tests.)
>
>   


Yes, me too. We're getting a bit lazy about that - see thread nearby
that will let us avoid unnecessary temp installs among other things.


cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Tom Lane
Noah Misch  writes:
> On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
>> ... I was wondering about adding
>> a loop that simply runs for like 30s and then quits or such, but who
>> knows.

> If the probabilistic test catches the bug even 5% of the time in typical
> configurations, the buildfarm will rapidly identify any regression.  I'd
> choose a 7s test that detects the bug 5% of the time over a 30s test that
> detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
> for a probabilistic bug, I sized that test to finish in 1s and catch its bug
> half the time.  In its case, only two buildfarm members were able to
> demonstrate the original bug, so 5% detection would have been too low.)

30sec is kind of a big lump from a buildfarm standpoint, especially if
you mean "it runs for 30s on my honkin' fast workstation".  I'm fine
with individual tests that run for ~ 1sec.

(This is top-of-mind for me right now because I've been looking around
for ways to speed up the regression tests.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Petr Jelinek
On 01/05/17 10:03, Andres Freund wrote:
> On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
>> I agree with adding running, I think that's good thing even for the per
>> transaction tracking and snapshot exports - we could use the newly added
>> field to get rid of the issue we have with 'snapshot too large' when
>> there were many aborted transactions while we waited for running ones to
>> finish.
> 
> I'm not sure of that - what I was proposing would only track this for
> the ->running substructure.  How'd that help?
> 

Well not as is, but it's a building block for it.

> 
>> But, I still think we need to restart the tracking after new
>> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
>> that we assigned to transactions at the end of SnapBuildCommitTxn might
>> be corrupted otherwise as they were built before we knew one of the
>> supposedly running txes was actually already committed and that
>> transaction might have done catalog changes.
> 
> I'm afraid you're right.  But I think this is even more complicated: The
> argument in your version that this can only happen once, seems to also
> be holey: Just imagine a pg_usleep(3000 * 100) right before
> ProcArrayEndTransaction() and enjoy the picture.
>

Well yes, transaction can in theory have written commit/abort xlog
record and stayed in proc for more than single xl_running_xacts write.
But then the condition which we test that the new xl_running_xacts has
bigger xmin than the previously tracked one's xmax would not be
satisfied and we would not enter the relevant code path yet. So I think
we should not be able to get any xids we didn't see. But we have to
restart tracking from beginning (after first checking if we didn't
already see anything that the xl_running_xacts considers as running),
that's what my code did.

> Wonder if we should just (re-)add a stage between SNAPBUILD_START and
> SNAPBUILD_FULL_SNAPSHOT.  Enter SNAPBUILD_BUILD_INITIAL_SNAPSHOT at the
> first xl_running_xacts, wait for all transactions to end with my
> approach, while populating SnapBuild->committed, only then start
> collecting changes for transactions (i.e. return true from
> SnapBuildProcessChange()), return true once all xacts have finished
> again.  That'd presumably be a bit easier to understand, more robust -
> and slower.
> 

That would also work, but per above, I don't understand why it's needed.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-01 Thread Andres Freund
On 2017-05-01 03:54:49 +0200, Petr Jelinek wrote:
> I agree with adding running, I think that's good thing even for the per
> transaction tracking and snapshot exports - we could use the newly added
> field to get rid of the issue we have with 'snapshot too large' when
> there were many aborted transactions while we waited for running ones to
> finish.

I'm not sure of that - what I was proposing would only track this for
the ->running substructure.  How'd that help?


> But, I still think we need to restart the tracking after new
> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
> that we assigned to transactions at the end of SnapBuildCommitTxn might
> be corrupted otherwise as they were built before we knew one of the
> supposedly running txes was actually already committed and that
> transaction might have done catalog changes.

I'm afraid you're right.  But I think this is even more complicated: The
argument in your version that this can only happen once, seems to also
be holey: Just imagine a pg_usleep(3000 * 100) right before
ProcArrayEndTransaction() and enjoy the picture.

Wonder if we should just (re-)add a stage between SNAPBUILD_START and
SNAPBUILD_FULL_SNAPSHOT.  Enter SNAPBUILD_BUILD_INITIAL_SNAPSHOT at the
first xl_running_xacts, wait for all transactions to end with my
approach, while populating SnapBuild->committed, only then start
collecting changes for transactions (i.e. return true from
SnapBuildProcessChange()), return true once all xacts have finished
again.  That'd presumably be a bit easier to understand, more robust -
and slower.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-30 Thread Petr Jelinek
On 01/05/17 04:29, Craig Ringer wrote:
> On 1 May 2017 at 09:54, Petr Jelinek  wrote:
> 
>> But, I still think we need to restart the tracking after new
>> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
>> that we assigned to transactions at the end of SnapBuildCommitTxn might
>> be corrupted otherwise as they were built before we knew one of the
>> supposedly running txes was actually already committed and that
>> transaction might have done catalog changes.
> 
> Due to the race where LogStandbySnapshot() collects running-xacts info
> while a concurrent xact commits, such that the xl_xact_commit appears
> before the xl_running_xacts, but the xl_running_xacts still has the
> commited xact listed as running, right? Because we update PGXACT only
> after we write the commit to WAL, so there's a window where an xact is
> committed in WAL but not shown as committed in shmem.
> 

Yes that's what the patch at hand tries to fix, but Andres approached it
from too simplistic standpoint as we don't only care about the exported
snapshot but whatever catalog snapshots we made for the transactions we
already track, unless I am missing something.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-30 Thread Craig Ringer
On 1 May 2017 at 09:54, Petr Jelinek  wrote:

> But, I still think we need to restart the tracking after new
> xl_running_xacts. Reason for that is afaics any of the catalog snapshots
> that we assigned to transactions at the end of SnapBuildCommitTxn might
> be corrupted otherwise as they were built before we knew one of the
> supposedly running txes was actually already committed and that
> transaction might have done catalog changes.

Due to the race where LogStandbySnapshot() collects running-xacts info
while a concurrent xact commits, such that the xl_xact_commit appears
before the xl_running_xacts, but the xl_running_xacts still has the
commited xact listed as running, right? Because we update PGXACT only
after we write the commit to WAL, so there's a window where an xact is
committed in WAL but not shown as committed in shmem.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-30 Thread Petr Jelinek
On 01/05/17 03:35, Andres Freund wrote:
> On 2017-04-30 17:59:21 -0700, Andres Freund wrote:
>> ISTM, we need a xip_status array in SnapBuild->running.  Then, whenever
>> a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
>> for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
>> the xid is not xl_running_xacts?  Does that make sense?
> 
> A hasty implementation, untested besides check-world, of that approach
> is attached.  I'm going out for dinner now, will subject it to mean
> things afterwards.
> 
> Needs more testing, comment and commit message policing.  But I think
> the idea is sound?
> 

I agree with adding running, I think that's good thing even for the per
transaction tracking and snapshot exports - we could use the newly added
field to get rid of the issue we have with 'snapshot too large' when
there were many aborted transactions while we waited for running ones to
finish. Because so far only tracked committed transactions, any aborted
one would show as running inside the exported snapshot so it can grow
over maximum number of backends and can't be exported anymore. So +1 for
that part.

But, I still think we need to restart the tracking after new
xl_running_xacts. Reason for that is afaics any of the catalog snapshots
that we assigned to transactions at the end of SnapBuildCommitTxn might
be corrupted otherwise as they were built before we knew one of the
supposedly running txes was actually already committed and that
transaction might have done catalog changes.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-30 Thread Andres Freund
On 2017-04-30 17:59:21 -0700, Andres Freund wrote:
> ISTM, we need a xip_status array in SnapBuild->running.  Then, whenever
> a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
> for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
> the xid is not xl_running_xacts?  Does that make sense?

A hasty implementation, untested besides check-world, of that approach
is attached.  I'm going out for dinner now, will subject it to mean
things afterwards.

Needs more testing, comment and commit message policing.  But I think
the idea is sound?

- Andres
>From 6160eda5177e7f538ff13eacc46acb2f2959c257 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 30 Apr 2017 18:24:06 -0700
Subject: [PATCH] Fix initial logical decoding snapshat race condition.

LogStandbySnapshot() has no interlock preventing
RecordTransactionCommit() from logging its commit record (abort
situation is similar), before the xl_running_xacts record is
generated.  That can lead to the situation that logical decoding
forever waits for a commit/abort record, that was logged in the past.

To fix, check for such "missed" transactions when xl_running_xacts are
observed, while in SNAPBUILD_FULL_SNAPSHOT.

This also reverts changes made to GetRunningTransactionData() and
LogStandbySnapshot() by b89e151 as the additional locking does not
solve the problem, and is not required anymore.

Author: Petr Jelined and Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb...@2ndquadrant.com
Backpatch: 9.4, where logical decoding was introduced
---
 src/backend/replication/logical/snapbuild.c | 98 +++--
 src/backend/storage/ipc/procarray.c |  5 +-
 src/backend/storage/ipc/standby.c   | 19 --
 3 files changed, 81 insertions(+), 41 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 068d214fa1..860562 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -203,6 +203,7 @@ struct SnapBuild
 		size_t		xcnt;		/* number of used xip entries */
 		size_t		xcnt_space; /* allocated size of xip */
 		TransactionId *xip;		/* running xacts array, xidComparator-sorted */
+		bool	   *xip_running; /* xid in ->xip still running? */
 	}			running;
 
 	/*
@@ -253,7 +254,7 @@ static bool ExportInProgress = false;
 static void SnapBuildEndTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid);
 
 /* ->running manipulation */
-static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid);
+static bool SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid, int *off);
 
 /* ->committed manipulation */
 static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
@@ -700,7 +701,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 	 * we got into the SNAPBUILD_FULL_SNAPSHOT state.
 	 */
 	if (builder->state < SNAPBUILD_CONSISTENT &&
-		SnapBuildTxnIsRunning(builder, xid))
+		SnapBuildTxnIsRunning(builder, xid, NULL))
 		return false;
 
 	/*
@@ -776,7 +777,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
  * only exist after we freshly started from an < CONSISTENT snapshot.
  */
 static bool
-SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
+SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid, int *off)
 {
 	Assert(builder->state < SNAPBUILD_CONSISTENT);
 	Assert(TransactionIdIsNormal(builder->running.xmin));
@@ -793,8 +794,17 @@ SnapBuildTxnIsRunning(SnapBuild *builder, TransactionId xid)
 		if (search != NULL)
 		{
 			Assert(*search == xid);
+
+			if (off)
+			{
+*off = search - builder->running.xip;
+Assert(builder->running.xip[*off] == xid);
+			}
+
 			return true;
 		}
+
+
 	}
 
 	return false;
@@ -928,6 +938,8 @@ SnapBuildPurgeCommittedTxn(SnapBuild *builder)
 static void
 SnapBuildEndTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid)
 {
+	int off;
+
 	if (builder->state == SNAPBUILD_CONSISTENT)
 		return;
 
@@ -938,10 +950,13 @@ SnapBuildEndTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid)
 	 * subxids and since they end at the same time it's sufficient to deal
 	 * with them here.
 	 */
-	if (SnapBuildTxnIsRunning(builder, xid))
+	if (SnapBuildTxnIsRunning(builder, xid, ))
 	{
 		Assert(builder->running.xcnt > 0);
 
+		Assert(builder->running.xip_running[off]);
+		builder->running.xip_running[off] = false;
+
 		if (!--builder->running.xcnt)
 		{
 			/*
@@ -1250,9 +1265,15 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 *
 	 * a) There were no running transactions when the xl_running_xacts record
 	 *	  was inserted, jump to CONSISTENT immediately. We might find such a
-	 *	  state we were waiting for b) or c).
+	 *	  state while waiting for b) or c).
 	 *
-	 * b) Wait for all toplevel transactions that were running to end. We
+	 * b) This (in a 

Re: [HACKERS] snapbuild woes

2017-04-30 Thread Andres Freund
Hi,

On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> + /*
> +  * c) we already seen the xl_running_xacts and tried to do the above.
> +  * However because of race condition in LogStandbySnapshot() there might
> +  * have been transaction reported as running but in reality has already
> +  * written commit record before the xl_running_xacts so decoding has
> +  * missed it. We now see xl_running_xacts that suggests all transactions
> +  * from the original one were closed but the consistent state wasn't
> +  * reached which means the race condition has indeed happened.
> +  *
> +  * Start tracking again as if this was the first xl_running_xacts we've
> +  * seen, with the advantage that because decoding was already running,
> +  * any transactions committed before the xl_running_xacts record will be
> +  * known to us so we won't hit with the same issue again.
> +  */

Unfortunately I don't think that's true, as coded.  You're using
information about committed transactions:

> + else if (TransactionIdFollows(running->oldestRunningXid,
> +   
> builder->running.xmax))
> + {
> + int off;
> +
> + SnapBuildStartXactTracking(builder, running);
> +
>   /*
> +  * Nark any transactions that are known to have committed 
> before the
> +  * xl_running_xacts as finished to avoid the race condition in
> +  * LogStandbySnapshot().
>*
> +  * We can use SnapBuildEndTxn directly as it only does the
> +  * transaction running check and handling without any additional
> +  * side effects.
>*/
> + for (off = 0; off < builder->committed.xcnt; off++)
> + SnapBuildEndTxn(builder, lsn, 
> builder->committed.xip[off]);

but a transaction might just have *aborted* before the new snapshot, no?
Since we don't keep track of those, I don't think this guarantees anything?

ISTM, we need a xip_status array in SnapBuild->running.  Then, whenever
a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
the xid is not xl_running_xacts?  Does that make sense?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-29 Thread Noah Misch
On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
> I've a bunch of tests, but I don't quite know whether we can expose all
> of them via classical tests.  There are several easy ones that I
> definitely want to add (import "empty" snapshot; import snapshot with
> running xacts; create snapshot, perform some ddl, import snapshot,
> perform some ddl, check things work reasonably crazy), but there's
> enough others that are just probabilistic.  I was wondering about adding
> a loop that simply runs for like 30s and then quits or such, but who
> knows.

If the probabilistic test catches the bug even 5% of the time in typical
configurations, the buildfarm will rapidly identify any regression.  I'd
choose a 7s test that detects the bug 5% of the time over a 30s test that
detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
for a probabilistic bug, I sized that test to finish in 1s and catch its bug
half the time.  In its case, only two buildfarm members were able to
demonstrate the original bug, so 5% detection would have been too low.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-27 Thread Andres Freund


On April 27, 2017 9:34:44 PM PDT, Noah Misch  wrote:
>On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
>> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
>> > I've since the previous update reviewed Petr's patch, which he
>since has
>> > updated over the weekend.  I'll do another round tomorrow, and will
>see
>> > how it looks.  I think we might need some more tests for this to be
>> > committable, so it might not become committable tomorrow.  I hope
>we'll
>> > have something in tree by end of this week, if not I'll send an
>update.
>> 
>> I was less productive this week than I'd hoped, and creating a
>testsuite
>> was more work than I'd anticipated, so I'm slightly lagging behind. 
>I
>> hope to have a patchset tomorrow, aiming to commit something
>> Monday/Tuesday.
>
>This PostgreSQL 10 open item is past due for your status update. 
>Kindly send
>a status update within 24 hours, and include a date for your subsequent
>status
>update.  Refer to the policy on open item ownership:
>https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I committed part of the series today, plan to continue doing so over the next 
few days.  Changes require careful review & testing, this is easy to get 
wrong...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-27 Thread Noah Misch
On Fri, Apr 21, 2017 at 10:36:21PM -0700, Andres Freund wrote:
> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> > I've since the previous update reviewed Petr's patch, which he since has
> > updated over the weekend.  I'll do another round tomorrow, and will see
> > how it looks.  I think we might need some more tests for this to be
> > committable, so it might not become committable tomorrow.  I hope we'll
> > have something in tree by end of this week, if not I'll send an update.
> 
> I was less productive this week than I'd hoped, and creating a testsuite
> was more work than I'd anticipated, so I'm slightly lagging behind.  I
> hope to have a patchset tomorrow, aiming to commit something
> Monday/Tuesday.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-24 Thread Petr Jelinek

On 25/04/17 00:59, Andres Freund wrote:
> Hi,
> 
> On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
>> Hi, here is updated patch (details inline).
> 
> I'm not yet all that happy, sorry:
> 
> Looking at 0001:
> - GetOldestSafeDecodingTransactionId() only guarantees to return an xid
>   safe for decoding (note how procArray->replication_slot_catalog_xmin
>   is checked), not one for the initial snapshot - so afaics this whole
>   exercise doesn't guarantee much so far.
> - A later commit introduces need_full_snapshot as a
>   CreateInitDecodingContext, but you don't use it, not here.  That seems
>   wrong.

Ah yeah looks like that optimization is useful even here.

> - I remain unhappy with the handling of the reset of effective_xmin in
>   FreeDecodingContext(). What if we ERROR/FATAL out before that happens?
> 

Oh your problem was that I did it in FreeDecodingContext() instead of
slot release, that I didn't get, yeah sure it's possibly better place.

> What do you think about something like the attached?  I've not yet
> tested it any way except running the regression tests.
> 

> - if (!already_locked)
> - LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

Don't really understand this change much, but otherwise the patch looks
good to me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-24 Thread Andres Freund
Hi,

On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> Hi, here is updated patch (details inline).

I'm not yet all that happy, sorry:

Looking at 0001:
- GetOldestSafeDecodingTransactionId() only guarantees to return an xid
  safe for decoding (note how procArray->replication_slot_catalog_xmin
  is checked), not one for the initial snapshot - so afaics this whole
  exercise doesn't guarantee much so far.
- A later commit introduces need_full_snapshot as a
  CreateInitDecodingContext, but you don't use it, not here.  That seems
  wrong.
- I remain unhappy with the handling of the reset of effective_xmin in
  FreeDecodingContext(). What if we ERROR/FATAL out before that happens?

What do you think about something like the attached?  I've not yet
tested it any way except running the regression tests.

- Andres
>From b20c8e1edb31d517ecb714467a7acbeec1b926dc Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 23 Apr 2017 20:41:29 -0700
Subject: [PATCH] Preserve required !catalog tuples while computing initial
 decoding snapshot.

The logical decoding machinery already preserved all the required
catalog tuples, which is sufficient in the course of normal logical
decoding, but did not guarantee that non-catalog tuples were preserved
during computation of the initial snapshot when creating a slot over
the replication protocol.

This could cause a corrupted initial snapshot being exported.  The
time window for issues is usually not terribly large, but on a busy
server it's perfectly possible to it hit it.  Ongoing decoding is not
affected by this bug.

To avoid increased overhead for the SQL API, only retain additional
tuples when a logical slot is being created over the replication
protocol.  To do so this commit changes the signature of
CreateInitDecodingContext(), but it seems unlikely that it's being
used in an extension, so that's probably ok.

In a drive-by fix, fix handling of
ReplicationSlotsComputeRequiredXmin's already_locked argument, which
should only apply to ProcArrayLock, not ReplicationSlotControlLock.

Reported-By: Erik Rijkers
Analyzed-By: Petr Jelinek
Author: Petr Jelinek, heavily editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6...@2ndquadrant.com
Backport: 9.4, where logical decoding was introduced.
---
 src/backend/replication/logical/logical.c   | 25 +
 src/backend/replication/logical/snapbuild.c | 12 
 src/backend/replication/slot.c  | 25 +
 src/backend/replication/slotfuncs.c |  4 ++--
 src/backend/replication/walsender.c |  1 +
 src/backend/storage/ipc/procarray.c | 14 +++---
 src/include/replication/logical.h   |  1 +
 src/include/storage/procarray.h |  2 +-
 8 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5529ac8fb4..032e91c371 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -210,6 +210,7 @@ StartupDecodingContext(List *output_plugin_options,
 LogicalDecodingContext *
 CreateInitDecodingContext(char *plugin,
 		  List *output_plugin_options,
+		  bool need_full_snapshot,
 		  XLogPageReadCB read_page,
 		  LogicalOutputPluginWriterPrepareWrite prepare_write,
 		  LogicalOutputPluginWriterWrite do_write)
@@ -267,23 +268,31 @@ CreateInitDecodingContext(char *plugin,
 	 * the slot machinery about the new limit. Once that's done the
 	 * ProcArrayLock can be released as the slot machinery now is
 	 * protecting against vacuum.
+	 *
+	 * Note that, temporarily, the data, not just the catalog, xmin has to be
+	 * reserved if a data snapshot is to be exported.  Otherwise the initial
+	 * data snapshot created here is not guaranteed to be valid. After that
+	 * the data xmin doesn't need to be managed anymore and the global xmin
+	 * should be recomputed. As we are fine with losing the pegged data xmin
+	 * after crash - no chance a snapshot would get exported anymore - we can
+	 * get away with just setting the slot's
+	 * effective_xmin. ReplicationSlotRelease will reset it again.
+	 *
 	 * 
 	 */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
-	slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
-	slot->data.catalog_xmin = slot->effective_catalog_xmin;
+	xmin_horizon = GetOldestSafeDecodingTransactionId(need_full_snapshot);
+
+	slot->effective_catalog_xmin = xmin_horizon;
+	slot->data.catalog_xmin = xmin_horizon;
+	if (need_full_snapshot)
+		slot->effective_xmin = xmin_horizon;
 
 	ReplicationSlotsComputeRequiredXmin(true);
 
 	LWLockRelease(ProcArrayLock);
 
-	/*
-	 * tell the snapshot builder to only assemble snapshot once reaching the
-	 * running_xact's record with the respective xmin.
-	 */
-	xmin_horizon = slot->data.catalog_xmin;
-
 	

Re: [HACKERS] snapbuild woes

2017-04-21 Thread Andres Freund
On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I've since the previous update reviewed Petr's patch, which he since has
> updated over the weekend.  I'll do another round tomorrow, and will see
> how it looks.  I think we might need some more tests for this to be
> committable, so it might not become committable tomorrow.  I hope we'll
> have something in tree by end of this week, if not I'll send an update.

I was less productive this week than I'd hoped, and creating a testsuite
was more work than I'd anticipated, so I'm slightly lagging behind.  I
hope to have a patchset tomorrow, aiming to commit something
Monday/Tuesday.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-21 Thread Andres Freund
On 2017-04-20 13:32:10 +0200, Petr Jelinek wrote:
> On 20/04/17 02:09, Andres Freund wrote:
> > On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> > I'm working on some infrastructure around this.  Not sure if it needs to
> > be committed, but it's certainly useful for evaluation.  Basically it's
> > a small UDF that:
> > 1) creates a slot via walsender protocol (to some dsn)
> > 2) imports that snapshot into yet another connection to that dsn
> > 3) runs some query over that new connection
> > 
> > That makes it reasonably easy to run e.g. pgbench and continually create
> > slots, and use the snapshot to run queries "verifying" that things look
> > good.  It's a bit shoestring-ed together, but everything else seems to
> > require more code. And it's just test.
> > 
> > Unless somebody has a better idea?

> I don't. I mean it would be nice to have isolation tester support
> walsender protocol, but I don't know anything about isolation tester
> internals so no idea how much work that is.

Now that the replication protocol supports normal queries, it's actually
not much of an issue on its own.  The problem is more that
isolationtester's client side language isn't powerfull enough - you
can't extract the snapshot name from one session and import it in
another. While that might be something we want to address, I certainly
don't want to tackle it for v10.

I'd started to develop a C toolkit as above, but after I got the basics
running I actually noticed it's pretty much unnecessary: You can just as
well do it with dblink and some plpgsql.

I can reliably reproduce several of the bugs in this thread in
relatively short amount of time before applying the patch, and so far
not after.  Thats great!


> I guess you plan to make that as one of the test/modules or something
> similar (the UDF)?

I've a bunch of tests, but I don't quite know whether we can expose all
of them via classical tests.  There are several easy ones that I
definitely want to add (import "empty" snapshot; import snapshot with
running xacts; create snapshot, perform some ddl, import snapshot,
perform some ddl, check things work reasonably crazy), but there's
enough others that are just probabilistic.  I was wondering about adding
a loop that simply runs for like 30s and then quits or such, but who
knows.

Testing around this made me wonder whether we need to make bgwriter.c's
LOG_SNAPSHOT_INTERVAL_MS configurable - for efficient testing reducing
it is quite valuable, and on busier machines it'll also almost always be
a win to log more frequently...  Opinions?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-20 Thread Petr Jelinek
On 20/04/17 02:09, Andres Freund wrote:
> On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
>> I think we might need some more tests for this to be committable, so
>> it might not become committable tomorrow.
> 
> I'm working on some infrastructure around this.  Not sure if it needs to
> be committed, but it's certainly useful for evaluation.  Basically it's
> a small UDF that:
> 1) creates a slot via walsender protocol (to some dsn)
> 2) imports that snapshot into yet another connection to that dsn
> 3) runs some query over that new connection
> 
> That makes it reasonably easy to run e.g. pgbench and continually create
> slots, and use the snapshot to run queries "verifying" that things look
> good.  It's a bit shoestring-ed together, but everything else seems to
> require more code. And it's just test.
> 
> Unless somebody has a better idea?
> 

I don't. I mean it would be nice to have isolation tester support
walsender protocol, but I don't know anything about isolation tester
internals so no idea how much work that is. On top of that some of the
issues are not even possible to provoke via isolation tester (or
anything similar that would give us control over timing) unless we
expose a lot of guts of xlog/xact as a UDFs as well. So I think simple
function that does what you said and pgbench are reasonable solutions. I
guess you plan to make that as one of the test/modules or something
similar (the UDF)?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-19 Thread Andres Freund
On 2017-04-17 21:16:57 -0700, Andres Freund wrote:
> I think we might need some more tests for this to be committable, so
> it might not become committable tomorrow.

I'm working on some infrastructure around this.  Not sure if it needs to
be committed, but it's certainly useful for evaluation.  Basically it's
a small UDF that:
1) creates a slot via walsender protocol (to some dsn)
2) imports that snapshot into yet another connection to that dsn
3) runs some query over that new connection

That makes it reasonably easy to run e.g. pgbench and continually create
slots, and use the snapshot to run queries "verifying" that things look
good.  It's a bit shoestring-ed together, but everything else seems to
require more code. And it's just test.

Unless somebody has a better idea?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-17 Thread Andres Freund
On 2017-04-16 22:04:04 -0400, Noah Misch wrote:
> On Thu, Apr 13, 2017 at 12:58:12AM -0400, Noah Misch wrote:
> > On Wed, Apr 12, 2017 at 10:21:51AM -0700, Andres Freund wrote:
> > > On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
> > > > On 4/12/17 02:31, Noah Misch wrote:
> > > > >>> But I hope you mean to commit these snapbuild patches before the 
> > > > >>> postgres 10
> > > > >>> release?  As far as I know, logical replication is still very 
> > > > >>> broken without
> > > > >>> them (or at least some of that set of 5 patches - I don't know 
> > > > >>> which ones
> > > > >>> are essential and which may not be).
> > > > >>
> > > > >> Yes, these should go into 10 *and* earlier releases, and I don't 
> > > > >> plan to
> > > > >> wait for 2017-07.
> > > > > 
> > > > > [Action required within three days.  This is a generic notification.]
> > > > 
> > > > I'm hoping for a word from Andres on this.
> > > 
> > > Feel free to reassign to me.
> > 
> > Thanks for volunteering; I'll do that shortly.  Please observe the policy on
> > open item ownership[1] and send a status update within three calendar days 
> > of
> > this message.  Include a date for your subsequent status update.
> > 
> > [1] 
> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.

I've since the previous update reviewed Petr's patch, which he since has
updated over the weekend.  I'll do another round tomorrow, and will see
how it looks.  I think we might need some more tests for this to be
committable, so it might not become committable tomorrow.  I hope we'll
have something in tree by end of this week, if not I'll send an update.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-16 Thread Noah Misch
On Thu, Apr 13, 2017 at 12:58:12AM -0400, Noah Misch wrote:
> On Wed, Apr 12, 2017 at 10:21:51AM -0700, Andres Freund wrote:
> > On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
> > > On 4/12/17 02:31, Noah Misch wrote:
> > > >>> But I hope you mean to commit these snapbuild patches before the 
> > > >>> postgres 10
> > > >>> release?  As far as I know, logical replication is still very broken 
> > > >>> without
> > > >>> them (or at least some of that set of 5 patches - I don't know which 
> > > >>> ones
> > > >>> are essential and which may not be).
> > > >>
> > > >> Yes, these should go into 10 *and* earlier releases, and I don't plan 
> > > >> to
> > > >> wait for 2017-07.
> > > > 
> > > > [Action required within three days.  This is a generic notification.]
> > > 
> > > I'm hoping for a word from Andres on this.
> > 
> > Feel free to reassign to me.
> 
> Thanks for volunteering; I'll do that shortly.  Please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-14 Thread Petr Jelinek
Hi, here is updated patch (details inline).

On 13/04/17 20:00, Petr Jelinek wrote:
> Thanks for looking at this!
> 
> On 13/04/17 02:29, Andres Freund wrote:
>> Hi,
>> On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:
>>
>>> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
>>> From: Petr Jelinek 
>>> Date: Fri, 24 Feb 2017 21:39:03 +0100
>>> Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export
>>>
>>> Otherwise the VACUUM or pruning might remove tuples still needed by the
>>> exported snapshot.
>>> ---
>>>  src/backend/replication/logical/logical.c | 21 -
>>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/backend/replication/logical/logical.c 
>>> b/src/backend/replication/logical/logical.c
>>> index 5529ac8..57c392c 100644
>>> --- a/src/backend/replication/logical/logical.c
>>> +++ b/src/backend/replication/logical/logical.c
>>> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
>>>  * the slot machinery about the new limit. Once that's done the
>>>  * ProcArrayLock can be released as the slot machinery now is
>>>  * protecting against vacuum.
>>> +*
>>> +* Note that we only store the global xmin temporarily in the in-memory
>>> +* state so that the initial snapshot can be exported. After initial
>>> +* snapshot is done global xmin should be reset and not tracked anymore
>>> +* so we are fine with losing the global xmin after crash.
>>>  * 
>>>  */
>>> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>>>  
>>> slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
>>> slot->data.catalog_xmin = slot->effective_catalog_xmin;
>>> +   slot->effective_xmin = slot->effective_catalog_xmin;
>>
>>
>>>  void
>>>  FreeDecodingContext(LogicalDecodingContext *ctx)
>>>  {
>>> +   ReplicationSlot *slot = MyReplicationSlot;
>>> +
>>> if (ctx->callbacks.shutdown_cb != NULL)
>>> shutdown_cb_wrapper(ctx);
>>>  
>>> +   /*
>>> +* Cleanup global xmin for the slot that we may have set in
>>> +* CreateInitDecodingContext().
>>
>> Hm.  Is that actually a meaningful point to do so? For one, it gets
>> called by pg_logical_slot_get_changes_guts(), but more importantly, the
>> snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
>> next command?  If we rely on the snapshot magic done by ExportSnapshot()
>> it'd be worthwhile to mention that...
>>
> 
> (Didn't see the patch for couple of months so don't remember all the
> detailed decisions anymore)
> 
> Yes we rely on backend's xmin to be set for exported snapshot. We only
> care about global xmin for exported snapshots really, I assumed that's
> clear enough from "so that the initial snapshot can be exported" but I
> guess there should be more clear comment about this where we actually
> clean this up.
> 

Okay, wrote new comment there, how is it now?

>>
>>> We do not take ProcArrayLock or similar
>>> +* since we only reset xmin here and there's not much harm done by a
>>> +* concurrent computation missing that.
>>> +*/
>>
>> Hum.  I was prepared to complain about this, but ISTM, that there's
>> absolutely no risk because the following
>> ReplicationSlotsComputeRequiredXmin(false); actually does all the
>> necessary locking?  But still, I don't see much point in the
>> optimization.
>>
> 
> Well, if we don't need it in LogicalConfirmReceivedLocation, I don't see
> why we need it here. Please enlighten me.
> 

I kept this as it was, after rereading, the
ReplicationSlotsComputeRequiredXmin(false) will do shared lock while if
we wanted to avoid mutex and do the xmin update under lock we'd need to
do exclusive lock so I think it's worth the optimization...

>>
>>
>>> This patch changes the code so that stored snapshots are only used for
>>> logical decoding restart but not for initial slot snapshot.
>>
>> Yea, that's a very good point...
>>
>>> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, 
>>> XLogRecPtr lsn, xl_running_xacts *runn
>>>  
>>> return false;
>>> }
>>> -   /* c) valid on disk state */
>>> -   else if (SnapBuildRestore(builder, lsn))
>>> +   /* c) valid on disk state and not exported snapshot */
>>> +   else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
>>> +SnapBuildRestore(builder, lsn))
>>
>> Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
>> interface, where it'd strictly speaking not be required, right?
>>
> 
> Good point. Maybe we should really tell snapshot builder if the snapshot
> is going to be exported or not explicitly (see the rant all the way down).
> 

I added the new signaling mechanism (the new boolean option indicating
if we are building full snapshot which is only set when the snapshot is
exported or used by the transaction).

>>
>>> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
>>> From: Petr 

Re: [HACKERS] snapbuild woes

2017-04-13 Thread Petr Jelinek
On 13/04/17 07:02, Andres Freund wrote:
> 
> On April 12, 2017 9:58:12 PM PDT, Noah Misch  wrote:
>> On Wed, Apr 12, 2017 at 10:21:51AM -0700, Andres Freund wrote:
>>> On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
 On 4/12/17 02:31, Noah Misch wrote:
>>> But I hope you mean to commit these snapbuild patches before
>> the postgres 10
>>> release?  As far as I know, logical replication is still very
>> broken without
>>> them (or at least some of that set of 5 patches - I don't know
>> which ones
>>> are essential and which may not be).
>>
>> Yes, these should go into 10 *and* earlier releases, and I don't
>> plan to
>> wait for 2017-07.
>
> [Action required within three days.  This is a generic
>> notification.]

 I'm hoping for a word from Andres on this.
>>>
>>> Feel free to reassign to me.
>>
>> Thanks for volunteering; I'll do that shortly.  Please observe the
>> policy on
>> open item ownership[1] and send a status update within three calendar
>> days of
>> this message.  Include a date for your subsequent status update.
>>
>> [1]
>> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 
> Will, volunteering might be the wrong word.  These ate all my fault, although 
> none look v10 specific.
> 

Yeah none of this is v10 specific, the importance to v10 is that it
affects the logical replication in core, not just extensions like in
9.4-9.6.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-13 Thread Petr Jelinek
Thanks for looking at this!

On 13/04/17 02:29, Andres Freund wrote:
> Hi,
> On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:
> 
>> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
>> From: Petr Jelinek 
>> Date: Fri, 24 Feb 2017 21:39:03 +0100
>> Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export
>>
>> Otherwise the VACUUM or pruning might remove tuples still needed by the
>> exported snapshot.
>> ---
>>  src/backend/replication/logical/logical.c | 21 -
>>  1 file changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/backend/replication/logical/logical.c 
>> b/src/backend/replication/logical/logical.c
>> index 5529ac8..57c392c 100644
>> --- a/src/backend/replication/logical/logical.c
>> +++ b/src/backend/replication/logical/logical.c
>> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
>>   * the slot machinery about the new limit. Once that's done the
>>   * ProcArrayLock can be released as the slot machinery now is
>>   * protecting against vacuum.
>> + *
>> + * Note that we only store the global xmin temporarily in the in-memory
>> + * state so that the initial snapshot can be exported. After initial
>> + * snapshot is done global xmin should be reset and not tracked anymore
>> + * so we are fine with losing the global xmin after crash.
>>   * 
>>   */
>>  LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>>  
>>  slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
>>  slot->data.catalog_xmin = slot->effective_catalog_xmin;
>> +slot->effective_xmin = slot->effective_catalog_xmin;
> 
> 
>>  void
>>  FreeDecodingContext(LogicalDecodingContext *ctx)
>>  {
>> +ReplicationSlot *slot = MyReplicationSlot;
>> +
>>  if (ctx->callbacks.shutdown_cb != NULL)
>>  shutdown_cb_wrapper(ctx);
>>  
>> +/*
>> + * Cleanup global xmin for the slot that we may have set in
>> + * CreateInitDecodingContext().
> 
> Hm.  Is that actually a meaningful point to do so? For one, it gets
> called by pg_logical_slot_get_changes_guts(), but more importantly, the
> snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
> next command?  If we rely on the snapshot magic done by ExportSnapshot()
> it'd be worthwhile to mention that...
> 

(Didn't see the patch for couple of months so don't remember all the
detailed decisions anymore)

Yes we rely on backend's xmin to be set for exported snapshot. We only
care about global xmin for exported snapshots really, I assumed that's
clear enough from "so that the initial snapshot can be exported" but I
guess there should be more clear comment about this where we actually
clean this up.

> 
>> We do not take ProcArrayLock or similar
>> + * since we only reset xmin here and there's not much harm done by a
>> + * concurrent computation missing that.
>> + */
> 
> Hum.  I was prepared to complain about this, but ISTM, that there's
> absolutely no risk because the following
> ReplicationSlotsComputeRequiredXmin(false); actually does all the
> necessary locking?  But still, I don't see much point in the
> optimization.
> 

Well, if we don't need it in LogicalConfirmReceivedLocation, I don't see
why we need it here. Please enlighten me.

> 
> 
>> This patch changes the code so that stored snapshots are only used for
>> logical decoding restart but not for initial slot snapshot.
> 
> Yea, that's a very good point...
> 
>> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
>> lsn, xl_running_xacts *runn
>>  
>>  return false;
>>  }
>> -/* c) valid on disk state */
>> -else if (SnapBuildRestore(builder, lsn))
>> +/* c) valid on disk state and not exported snapshot */
>> +else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
>> + SnapBuildRestore(builder, lsn))
> 
> Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
> interface, where it'd strictly speaking not be required, right?
> 

Good point. Maybe we should really tell snapshot builder if the snapshot
is going to be exported or not explicitly (see the rant all the way down).

> 
>> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
>> From: Petr Jelinek 
>> Date: Sun, 26 Feb 2017 01:07:33 +0100
>> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
> 
> A bit more commentary would be good. What does that protect us against?
> 

I think I explained that in the email. We might export snapshot with
xmin smaller than global xmin otherwise.

> 
> 
>> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
>> From: Petr Jelinek 
>> Date: Wed, 22 Feb 2017 00:57:33 +0100
>> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
>>
>> Due to race condition, the xl_running_xacts might contain no longer
>> 

Re: [HACKERS] snapbuild woes

2017-04-12 Thread Andres Freund


On April 12, 2017 9:58:12 PM PDT, Noah Misch  wrote:
>On Wed, Apr 12, 2017 at 10:21:51AM -0700, Andres Freund wrote:
>> On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
>> > On 4/12/17 02:31, Noah Misch wrote:
>> > >>> But I hope you mean to commit these snapbuild patches before
>the postgres 10
>> > >>> release?  As far as I know, logical replication is still very
>broken without
>> > >>> them (or at least some of that set of 5 patches - I don't know
>which ones
>> > >>> are essential and which may not be).
>> > >>
>> > >> Yes, these should go into 10 *and* earlier releases, and I don't
>plan to
>> > >> wait for 2017-07.
>> > > 
>> > > [Action required within three days.  This is a generic
>notification.]
>> > 
>> > I'm hoping for a word from Andres on this.
>> 
>> Feel free to reassign to me.
>
>Thanks for volunteering; I'll do that shortly.  Please observe the
>policy on
>open item ownership[1] and send a status update within three calendar
>days of
>this message.  Include a date for your subsequent status update.
>
>[1]
>https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Will, volunteering might be the wrong word.  These ate all my fault, although 
none look v10 specific.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-12 Thread Noah Misch
On Wed, Apr 12, 2017 at 10:21:51AM -0700, Andres Freund wrote:
> On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
> > On 4/12/17 02:31, Noah Misch wrote:
> > >>> But I hope you mean to commit these snapbuild patches before the 
> > >>> postgres 10
> > >>> release?  As far as I know, logical replication is still very broken 
> > >>> without
> > >>> them (or at least some of that set of 5 patches - I don't know which 
> > >>> ones
> > >>> are essential and which may not be).
> > >>
> > >> Yes, these should go into 10 *and* earlier releases, and I don't plan to
> > >> wait for 2017-07.
> > > 
> > > [Action required within three days.  This is a generic notification.]
> > 
> > I'm hoping for a word from Andres on this.
> 
> Feel free to reassign to me.

Thanks for volunteering; I'll do that shortly.  Please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-12 Thread Andres Freund
Hi,
On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:

> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek 
> Date: Fri, 24 Feb 2017 21:39:03 +0100
> Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export
> 
> Otherwise the VACUUM or pruning might remove tuples still needed by the
> exported snapshot.
> ---
>  src/backend/replication/logical/logical.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/replication/logical/logical.c 
> b/src/backend/replication/logical/logical.c
> index 5529ac8..57c392c 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
>* the slot machinery about the new limit. Once that's done the
>* ProcArrayLock can be released as the slot machinery now is
>* protecting against vacuum.
> +  *
> +  * Note that we only store the global xmin temporarily in the in-memory
> +  * state so that the initial snapshot can be exported. After initial
> +  * snapshot is done global xmin should be reset and not tracked anymore
> +  * so we are fine with losing the global xmin after crash.
>* 
>*/
>   LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>  
>   slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
>   slot->data.catalog_xmin = slot->effective_catalog_xmin;
> + slot->effective_xmin = slot->effective_catalog_xmin;


>  void
>  FreeDecodingContext(LogicalDecodingContext *ctx)
>  {
> + ReplicationSlot *slot = MyReplicationSlot;
> +
>   if (ctx->callbacks.shutdown_cb != NULL)
>   shutdown_cb_wrapper(ctx);
>  
> + /*
> +  * Cleanup global xmin for the slot that we may have set in
> +  * CreateInitDecodingContext().

Hm.  Is that actually a meaningful point to do so? For one, it gets
called by pg_logical_slot_get_changes_guts(), but more importantly, the
snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
next command?  If we rely on the snapshot magic done by ExportSnapshot()
it'd be worthwhile to mention that...


> We do not take ProcArrayLock or similar
> +  * since we only reset xmin here and there's not much harm done by a
> +  * concurrent computation missing that.
> +  */

Hum.  I was prepared to complain about this, but ISTM, that there's
absolutely no risk because the following
ReplicationSlotsComputeRequiredXmin(false); actually does all the
necessary locking?  But still, I don't see much point in the
optimization.



> This patch changes the code so that stored snapshots are only used for
> logical decoding restart but not for initial slot snapshot.

Yea, that's a very good point...

> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>  
>   return false;
>   }
> - /* c) valid on disk state */
> - else if (SnapBuildRestore(builder, lsn))
> + /* c) valid on disk state and not exported snapshot */
> + else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
> +  SnapBuildRestore(builder, lsn))

Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
interface, where it'd strictly speaking not be required, right?


> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek 
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards

A bit more commentary would be good. What does that protect us against?



> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek 
> Date: Wed, 22 Feb 2017 00:57:33 +0100
> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
> 
> Due to race condition, the xl_running_xacts might contain no longer
> running transactions. Previous coding tried to get around this by
> additional locking but that did not work correctly for committs. Instead
> try combining decoded commits and multiple xl_running_xacts to get the
> consistent snapshot.

Needs more explanation about approach.


> @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>*simply track the number of in-progress toplevel transactions 
> and
>*lower it whenever one commits or aborts. When that number
>*(builder->running.xcnt) reaches zero, we can go from 
> FULL_SNAPSHOT
> -  *to CONSISTENT.
> +  *to CONSISTENT. Sometimes we might get xl_running_xacts which 
> has
> +  *all tracked transactions as finished. We'll need to restart 
> tracking
> +  *in that case and use previously collected committed 
> transactions to
> +  *

Re: [HACKERS] snapbuild woes

2017-04-12 Thread Andres Freund
On 2017-04-12 11:03:57 -0400, Peter Eisentraut wrote:
> On 4/12/17 02:31, Noah Misch wrote:
> >>> But I hope you mean to commit these snapbuild patches before the postgres 
> >>> 10
> >>> release?  As far as I know, logical replication is still very broken 
> >>> without
> >>> them (or at least some of that set of 5 patches - I don't know which ones
> >>> are essential and which may not be).
> >>
> >> Yes, these should go into 10 *and* earlier releases, and I don't plan to
> >> wait for 2017-07.
> > 
> > [Action required within three days.  This is a generic notification.]
> 
> I'm hoping for a word from Andres on this.

Feel free to reassign to me.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-12 Thread Simon Riggs
On 3 March 2017 at 00:30, Petr Jelinek  wrote:

>> 0004 - Changes handling of the xl_running_xacts in initial snapshot
>> build to what I wrote above and removes the extra locking from
>> LogStandbySnapshot introduced by logical decoding.

This seems OK and unlikely to have wider impact.

The "race condition" we're speaking about is by design, not a bug.

I think the initial comment could be slightly better worded; if I
didn't already know what is being discussed, I wouldnt be too much
further forwards from those comments.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-12 Thread Peter Eisentraut
On 4/12/17 02:31, Noah Misch wrote:
>>> But I hope you mean to commit these snapbuild patches before the postgres 10
>>> release?  As far as I know, logical replication is still very broken without
>>> them (or at least some of that set of 5 patches - I don't know which ones
>>> are essential and which may not be).
>>
>> Yes, these should go into 10 *and* earlier releases, and I don't plan to
>> wait for 2017-07.
> 
> [Action required within three days.  This is a generic notification.]

I'm hoping for a word from Andres on this.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-12 Thread Noah Misch
On Sat, Apr 08, 2017 at 07:30:59AM -0700, Andres Freund wrote:
> On 2017-04-08 16:29:10 +0200, Erik Rijkers wrote:
> > On 2017-04-08 15:56, Andres Freund wrote:
> > > On 2017-04-08 09:51:39 -0400, David Steele wrote:
> > > > On 3/2/17 7:54 PM, Petr Jelinek wrote:
> > > > >
> > > > > Yes the copy patch needs rebase as well. But these ones are fine.
> > > > 
> > > > This bug has been moved to CF 2017-07.
> > > 
> > > FWIW, as these are bug-fixes that need to be backpatched, I do plan to
> > > work on them soon.
> > > 
> > 
> > CF 2017-07 pertains to postgres 11, is that right?
> > 
> > But I hope you mean to commit these snapbuild patches before the postgres 10
> > release?  As far as I know, logical replication is still very broken without
> > them (or at least some of that set of 5 patches - I don't know which ones
> > are essential and which may not be).
> 
> Yes, these should go into 10 *and* earlier releases, and I don't plan to
> wait for 2017-07.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread David Steele
On 4/8/17 10:29 AM, Erik Rijkers wrote:
> On 2017-04-08 15:56, Andres Freund wrote:
>> On 2017-04-08 09:51:39 -0400, David Steele wrote:
>>> On 3/2/17 7:54 PM, Petr Jelinek wrote:
>>> >
>>> > Yes the copy patch needs rebase as well. But these ones are fine.
>>>
>>> This bug has been moved to CF 2017-07.
>>
>> FWIW, as these are bug-fixes that need to be backpatched, I do plan to
>> work on them soon.
>>
> 
> CF 2017-07 pertains to postgres 11, is that right?

In general, yes, but bugs will always be fixed as needed.  It doesn't
matter what CF they are in.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread Andres Freund
On 2017-04-08 16:29:10 +0200, Erik Rijkers wrote:
> On 2017-04-08 15:56, Andres Freund wrote:
> > On 2017-04-08 09:51:39 -0400, David Steele wrote:
> > > On 3/2/17 7:54 PM, Petr Jelinek wrote:
> > > >
> > > > Yes the copy patch needs rebase as well. But these ones are fine.
> > > 
> > > This bug has been moved to CF 2017-07.
> > 
> > FWIW, as these are bug-fixes that need to be backpatched, I do plan to
> > work on them soon.
> > 
> 
> CF 2017-07 pertains to postgres 11, is that right?
> 
> But I hope you mean to commit these snapbuild patches before the postgres 10
> release?  As far as I know, logical replication is still very broken without
> them (or at least some of that set of 5 patches - I don't know which ones
> are essential and which may not be).

Yes, these should go into 10 *and* earlier releases, and I don't plan to
wait for 2017-07.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread Erik Rijkers

On 2017-04-08 15:56, Andres Freund wrote:

On 2017-04-08 09:51:39 -0400, David Steele wrote:

On 3/2/17 7:54 PM, Petr Jelinek wrote:
>
> Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.


FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.



CF 2017-07 pertains to postgres 11, is that right?

But I hope you mean to commit these snapbuild patches before the 
postgres 10 release?  As far as I know, logical replication is still 
very broken without them (or at least some of that set of 5 patches - I 
don't know which ones are essential and which may not be).


If it's at all useful I can repeat tests to show how often current 
master still fails (easily 50% or so failure-rate).


This would be the pgbench-over-logical-replication test that I did so 
often earlier on.


thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread Andres Freund
On 2017-04-08 09:51:39 -0400, David Steele wrote:
> On 3/2/17 7:54 PM, Petr Jelinek wrote:
> > 
> > Yes the copy patch needs rebase as well. But these ones are fine.
> 
> This bug has been moved to CF 2017-07.

FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-04-08 Thread David Steele
On 3/2/17 7:54 PM, Petr Jelinek wrote:
> 
> Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-03-02 Thread Petr Jelinek
On 03/03/17 01:53, Erik Rijkers wrote:
> On 2017-03-03 01:30, Petr Jelinek wrote:
> 
> With these patches:
> 
> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
> 0002-Fix-after-trigger-execution-in-logical-replication.patch
> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
> snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
> snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
> 
> snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
> snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> I get:
> 
> subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
> non-static declaration
>  static int oid_cmp(const void *p1, const void *p2);
> ^~~
> In file included from subscriptioncmds.c:42:0:
> ../../../src/include/utils/builtins.h:70:12: note: previous declaration
> of ‘oid_cmp’ was here
>  extern int oid_cmp(const void *p1, const void *p2);
> ^~~
> make[3]: *** [subscriptioncmds.o] Error 1
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [commands-recursive] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [all-backend-recurse] Error 2
> make: *** [all-src-recurse] Error 2
> 

Yes the copy patch needs rebase as well. But these ones are fine.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-03-02 Thread Erik Rijkers

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows 
non-static declaration

 static int oid_cmp(const void *p1, const void *p2);
^~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration 
of ‘oid_cmp’ was here

 extern int oid_cmp(const void *p1, const void *p2);
^~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-03-02 Thread Petr Jelinek
On 26/02/17 01:43, Petr Jelinek wrote:
> On 24/02/17 22:56, Petr Jelinek wrote:
>> On 22/02/17 03:05, Petr Jelinek wrote:
>>> On 13/12/16 00:38, Petr Jelinek wrote:
 On 12/12/16 23:33, Andres Freund wrote:
> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>> On 12/12/16 22:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
 Hi,
 First one is outright bug, which has to do with how we track running
 transactions. What snapbuild basically does while doing initial 
 snapshot
 is read the xl_running_xacts record, store the list of running txes and
 then wait until they all finish. The problem with this is that
 xl_running_xacts does not ensure that it only logs transactions that 
 are
 actually still running (to avoid locking PGPROC) so there might be xids
 in xl_running_xacts that already committed before it was logged.
>>>
>>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>>> observed must actually be a bit more complex :(
>>>
>>
>> Hmm, interesting, I did see the transaction commit in the WAL before the
>> xl_running_xacts that contained the xid as running. I only seen it on
>> production system though, didn't really manage to easily reproduce it
>> locally.
>
> I suspect the reason for that is that RecordTransactionCommit() doesn't
> conflict with ProcArrayLock in the first place - only
> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
> sense, just not the crash-recovery sense...
>

 That looks like reasonable explanation. BTW I realized my patch needs
 bit more work, currently it will break the actual snapshot as it behaves
 same as if the xl_running_xacts was empty which is not correct AFAICS.

>>>
>>> Hi,
>>>
>>> I got to work on this again. Unfortunately I haven't found solution that
>>> I would be very happy with. What I did is in case we read
>>> xl_running_xacts which has all transactions we track finished, we start
>>> tracking from that new xl_running_xacts again with the difference that
>>> we clean up the running transactions based on previously seen committed
>>> ones. That means that on busy server we may wait for multiple
>>> xl_running_xacts rather than just one, but at least we have chance to
>>> finish unlike with current coding which basically waits for empty
>>> xl_running_xacts. I also removed the additional locking for logical
>>> wal_level in LogStandbySnapshot() since it does not work.
>>
>> Not hearing any opposition to this idea so I decided to polish this and
>> also optimize it a bit.
>>
>> That being said, thanks to testing from Erik Rijkers I've identified one
>> more bug in how we do the initial snapshot. Apparently we don't reserve
>> the global xmin when we start building the initial exported snapshot for
>> a slot (we only reserver catalog_xmin which is fine for logical decoding
>> but not for the exported snapshot) so the VACUUM and heap pruning will
>> happily delete old versions of rows that are still needed by anybody
>> trying to use that exported snapshot.
>>
> 
> Aaand I found one more bug in snapbuild. Apparently we don't protect the
> snapshot builder xmin from going backwards which can yet again result in
> corrupted exported snapshot.
> 
> Summary of attached patches:
> 0001 - Fixes the above mentioned global xmin tracking issues. Needs to
> be backported all the way to 9.4
> 
> 0002 - Removes use of the logical decoding saved snapshots for initial
> exported snapshot since those snapshots only work for catalogs and not
> user data. Also needs to be backported all the way to 9.4.

I've been bit overzealous about this one (removed the use of the saved
snapshots completely). So here is a fix for that (and rebase on top of
current HEAD).

> 
> 0003 - Makes sure snapshot builder xmin is not moved backwards by
> xl_running_xacts (which can otherwise happen during initial snapshot
> building). Also should be backported to 9.4.
> 
> 0004 - Changes handling of the xl_running_xacts in initial snapshot
> build to what I wrote above and removes the extra locking from
> LogStandbySnapshot introduced by logical decoding.
> 
> 0005 - Improves performance of initial snapshot building by skipping
> catalog snapshot build for transactions that don't do catalog changes.
> 


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Feb 2017 21:39:03 +0100
Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export

Otherwise the VACUUM or pruning might remove 

Re: [HACKERS] snapbuild woes

2017-02-25 Thread Petr Jelinek
On 24/02/17 22:56, Petr Jelinek wrote:
> On 22/02/17 03:05, Petr Jelinek wrote:
>> On 13/12/16 00:38, Petr Jelinek wrote:
>>> On 12/12/16 23:33, Andres Freund wrote:
 On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
> On 12/12/16 22:42, Andres Freund wrote:
>> Hi,
>>
>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>>> Hi,
>>> First one is outright bug, which has to do with how we track running
>>> transactions. What snapbuild basically does while doing initial snapshot
>>> is read the xl_running_xacts record, store the list of running txes and
>>> then wait until they all finish. The problem with this is that
>>> xl_running_xacts does not ensure that it only logs transactions that are
>>> actually still running (to avoid locking PGPROC) so there might be xids
>>> in xl_running_xacts that already committed before it was logged.
>>
>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>> only releases the lock *after* the LogCurrentRunningXacts() iff
>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>> observed must actually be a bit more complex :(
>>
>
> Hmm, interesting, I did see the transaction commit in the WAL before the
> xl_running_xacts that contained the xid as running. I only seen it on
> production system though, didn't really manage to easily reproduce it
> locally.

 I suspect the reason for that is that RecordTransactionCommit() doesn't
 conflict with ProcArrayLock in the first place - only
 ProcArrayEndTransaction() does.  So they're still running in the PGPROC
 sense, just not the crash-recovery sense...

>>>
>>> That looks like reasonable explanation. BTW I realized my patch needs
>>> bit more work, currently it will break the actual snapshot as it behaves
>>> same as if the xl_running_xacts was empty which is not correct AFAICS.
>>>
>>
>> Hi,
>>
>> I got to work on this again. Unfortunately I haven't found solution that
>> I would be very happy with. What I did is in case we read
>> xl_running_xacts which has all transactions we track finished, we start
>> tracking from that new xl_running_xacts again with the difference that
>> we clean up the running transactions based on previously seen committed
>> ones. That means that on busy server we may wait for multiple
>> xl_running_xacts rather than just one, but at least we have chance to
>> finish unlike with current coding which basically waits for empty
>> xl_running_xacts. I also removed the additional locking for logical
>> wal_level in LogStandbySnapshot() since it does not work.
> 
> Not hearing any opposition to this idea so I decided to polish this and
> also optimize it a bit.
> 
> That being said, thanks to testing from Erik Rijkers I've identified one
> more bug in how we do the initial snapshot. Apparently we don't reserve
> the global xmin when we start building the initial exported snapshot for
> a slot (we only reserver catalog_xmin which is fine for logical decoding
> but not for the exported snapshot) so the VACUUM and heap pruning will
> happily delete old versions of rows that are still needed by anybody
> trying to use that exported snapshot.
> 

Aaand I found one more bug in snapbuild. Apparently we don't protect the
snapshot builder xmin from going backwards which can yet again result in
corrupted exported snapshot.

Summary of attached patches:
0001 - Fixes the above mentioned global xmin tracking issues. Needs to
be backported all the way to 9.4

0002 - Removes use of the logical decoding saved snapshots for initial
exported snapshot since those snapshots only work for catalogs and not
user data. Also needs to be backported all the way to 9.4.

0003 - Makes sure snapshot builder xmin is not moved backwards by
xl_running_xacts (which can otherwise happen during initial snapshot
building). Also should be backported to 9.4.

0004 - Changes handling of the xl_running_xacts in initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0005 - Improves performance of initial snapshot building by skipping
catalog snapshot build for transactions that don't do catalog changes.

I did some improvements to the other patches as well so they are not the
same as in previous post, hence the version bump.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From dfa0b07639059675704a35f2e63be4934f97c3a3 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Feb 2017 21:39:03 +0100
Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export

Otherwise the VACUUM or pruning might remove tuples still needed by the
exported snapshot.
---
 src/backend/replication/logical/logical.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git 

Re: [HACKERS] snapbuild woes

2017-02-24 Thread Petr Jelinek
On 22/02/17 03:05, Petr Jelinek wrote:
> On 13/12/16 00:38, Petr Jelinek wrote:
>> On 12/12/16 23:33, Andres Freund wrote:
>>> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
 On 12/12/16 22:42, Andres Freund wrote:
> Hi,
>
> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>> Hi,
>> First one is outright bug, which has to do with how we track running
>> transactions. What snapbuild basically does while doing initial snapshot
>> is read the xl_running_xacts record, store the list of running txes and
>> then wait until they all finish. The problem with this is that
>> xl_running_xacts does not ensure that it only logs transactions that are
>> actually still running (to avoid locking PGPROC) so there might be xids
>> in xl_running_xacts that already committed before it was logged.
>
> I don't think that's actually true?  Notice how LogStandbySnapshot()
> only releases the lock *after* the LogCurrentRunningXacts() iff
> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> observed must actually be a bit more complex :(
>

 Hmm, interesting, I did see the transaction commit in the WAL before the
 xl_running_xacts that contained the xid as running. I only seen it on
 production system though, didn't really manage to easily reproduce it
 locally.
>>>
>>> I suspect the reason for that is that RecordTransactionCommit() doesn't
>>> conflict with ProcArrayLock in the first place - only
>>> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
>>> sense, just not the crash-recovery sense...
>>>
>>
>> That looks like reasonable explanation. BTW I realized my patch needs
>> bit more work, currently it will break the actual snapshot as it behaves
>> same as if the xl_running_xacts was empty which is not correct AFAICS.
>>
> 
> Hi,
> 
> I got to work on this again. Unfortunately I haven't found solution that
> I would be very happy with. What I did is in case we read
> xl_running_xacts which has all transactions we track finished, we start
> tracking from that new xl_running_xacts again with the difference that
> we clean up the running transactions based on previously seen committed
> ones. That means that on busy server we may wait for multiple
> xl_running_xacts rather than just one, but at least we have chance to
> finish unlike with current coding which basically waits for empty
> xl_running_xacts. I also removed the additional locking for logical
> wal_level in LogStandbySnapshot() since it does not work.

Not hearing any opposition to this idea so I decided to polish this and
also optimize it a bit.

That being said, thanks to testing from Erik Rijkers I've identified one
more bug in how we do the initial snapshot. Apparently we don't reserve
the global xmin when we start building the initial exported snapshot for
a slot (we only reserver catalog_xmin which is fine for logical decoding
but not for the exported snapshot) so the VACUUM and heap pruning will
happily delete old versions of rows that are still needed by anybody
trying to use that exported snapshot.

Attached are updated versions of patches:

0001 - Fixes the above mentioned global xmin tracking issues. Needs to
be backported all the way to 9.4

0002 - Removes use of the logical decoding saved snapshots for initial
exported snapshot since those snapshots only work for catalogs and not
user data. Also needs to be backported all the way to 9.4.

0003 - Changes handling of the xl_running_xacts in initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0004 - Improves performance of initial snapshot building by skipping
catalog snapshot build for transactions that don't do catalog changes.

The 0001 and 0002 are bug fixes because without them the exported
snapshots are basically corrupted. The 0003 and 0004 are performance
improvements, but on busy servers the snapshot export might never happen
so it's for rather serious performance issues.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 67a44702ff146756b33e8d15e91a02f5d9e86792 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Feb 2017 21:39:03 +0100
Subject: [PATCH 1/4] Reserve global xmin for create slot snasphot export

Otherwise the VACUUM or pruning might remove tuples still needed by the
exported snapshot.
---
 src/backend/replication/logical/logical.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5529ac8..9062244 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
 	 * the slot machinery about the new limit. Once that's done the

Re: [HACKERS] snapbuild woes

2017-02-22 Thread Petr Jelinek


On 22/02/17 11:29, Erik Rijkers wrote:
> On 2017-02-22 03:05, Petr Jelinek wrote:
>>
>> So to summarize attached patches:
>> 0001 - Fixes performance issue where we build tons of snapshots that we
>> don't need which kills CPU.
>>
>> 0002 - Disables the use of ondisk historical snapshots for initial
>> consistent snapshot export as it may result in corrupt data. This
>> definitely needs backport.
>>
>> 0003 - Fixes bug where we might never reach snapshot on busy server due
>> to race condition in xl_running_xacts logging. The original use of extra
>> locking does not seem to be enough in practice. Once we have agreed fix
>> for this it's probably worth backpatching. There are still some comments
>> that need updating, this is more of a PoC.
>>
> 
> I am not not entirely sure what to expect.  Should a server with these 3
> patches do initial data copy or not?  The sgml seems to imply there is
> not inital data copy.  But my test does copy something.
> 

Not by itself (without the copy patch), those fixes are for snapshots.

> 
> With
>> 0001-Skip-unnecessary-snapshot-builds.patch
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> 
> the consistent (but wrong) end state is always that only one of the four
> pgbench tables, pgbench_history, is replicated (always correctly).
> 
> Below is the output from the test (I've edited the lines for email)
> (below, a,b,t,h stand for: pgbench_accounts, pgbench_branches,
> pgbench_tellers, pgbench_history)
> (master on port 6972, replica on port 6973.)
> 
> port
> 6972 a,b,t,h: 10  1 10347
> 6973 a,b,t,h:  0  0  0347
> 
> a,b,t,h: a68efc81a  2c27f7ba5  128590a57  1e4070879   master
> a,b,t,h: d41d8cd98  d41d8cd98  d41d8cd98  1e4070879   replica NOK
> 
> The md5-initstrings are from a md5 of the whole content of each table
> (an ordered select *)
> 
> I repeated this a few times: of course, the number of rows in
> pgbench_history varies a bit but otherwise it is always the same: 3
> empty replica tables, pgbench_history replicated correctly.
> 

That's actually correct behaviour without the initial copy patch, it
replicates changes, but since the 3 tables only get updates there is
nothing to replicate as there is no data downstream. However inserts
will of course work fine even without data downstream.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-02-22 Thread Erik Rijkers

On 2017-02-22 03:05, Petr Jelinek wrote:


So to summarize attached patches:
0001 - Fixes performance issue where we build tons of snapshots that we
don't need which kills CPU.

0002 - Disables the use of ondisk historical snapshots for initial
consistent snapshot export as it may result in corrupt data. This
definitely needs backport.

0003 - Fixes bug where we might never reach snapshot on busy server due
to race condition in xl_running_xacts logging. The original use of 
extra

locking does not seem to be enough in practice. Once we have agreed fix
for this it's probably worth backpatching. There are still some 
comments

that need updating, this is more of a PoC.



I am not not entirely sure what to expect.  Should a server with these 3 
patches do initial data copy or not?  The sgml seems to imply there is 
not inital data copy.  But my test does copy something.


Anyway, I have repeated the same old pgbench-test, assuming inital data 
copy should be working.


With

0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch


the consistent (but wrong) end state is always that only one of the four 
pgbench tables, pgbench_history, is replicated (always correctly).


Below is the output from the test (I've edited the lines for email)
(below, a,b,t,h stand for: pgbench_accounts, pgbench_branches, 
pgbench_tellers, pgbench_history)

(master on port 6972, replica on port 6973.)

port
6972 a,b,t,h: 10  1 10347
6973 a,b,t,h:  0  0  0347

a,b,t,h: a68efc81a  2c27f7ba5  128590a57  1e4070879   master
a,b,t,h: d41d8cd98  d41d8cd98  d41d8cd98  1e4070879   replica NOK

The md5-initstrings are from a md5 of the whole content of each table 
(an ordered select *)


I repeated this a few times: of course, the number of rows in 
pgbench_history varies a bit but otherwise it is always the same: 3 
empty replica tables, pgbench_history replicated correctly.


Something is not right.


thanks,


Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-02-21 Thread Petr Jelinek
On 13/12/16 00:38, Petr Jelinek wrote:
> On 12/12/16 23:33, Andres Freund wrote:
>> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>>> On 12/12/16 22:42, Andres Freund wrote:
 Hi,

 On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> Hi,
> First one is outright bug, which has to do with how we track running
> transactions. What snapbuild basically does while doing initial snapshot
> is read the xl_running_xacts record, store the list of running txes and
> then wait until they all finish. The problem with this is that
> xl_running_xacts does not ensure that it only logs transactions that are
> actually still running (to avoid locking PGPROC) so there might be xids
> in xl_running_xacts that already committed before it was logged.

 I don't think that's actually true?  Notice how LogStandbySnapshot()
 only releases the lock *after* the LogCurrentRunningXacts() iff
 wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
 observed must actually be a bit more complex :(

>>>
>>> Hmm, interesting, I did see the transaction commit in the WAL before the
>>> xl_running_xacts that contained the xid as running. I only seen it on
>>> production system though, didn't really manage to easily reproduce it
>>> locally.
>>
>> I suspect the reason for that is that RecordTransactionCommit() doesn't
>> conflict with ProcArrayLock in the first place - only
>> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
>> sense, just not the crash-recovery sense...
>>
> 
> That looks like reasonable explanation. BTW I realized my patch needs
> bit more work, currently it will break the actual snapshot as it behaves
> same as if the xl_running_xacts was empty which is not correct AFAICS.
> 

Hi,

I got to work on this again. Unfortunately I haven't found solution that
I would be very happy with. What I did is in case we read
xl_running_xacts which has all transactions we track finished, we start
tracking from that new xl_running_xacts again with the difference that
we clean up the running transactions based on previously seen committed
ones. That means that on busy server we may wait for multiple
xl_running_xacts rather than just one, but at least we have chance to
finish unlike with current coding which basically waits for empty
xl_running_xacts. I also removed the additional locking for logical
wal_level in LogStandbySnapshot() since it does not work.

I also identified another bug in snapbuild while looking at the code.
That is the logical decoding will try to use on disk serialized snapshot
for initial snapshot export when it can. The problem is that these
snapshots are quite special and are not really usable as snapshots for
data (ie, the logical decoding snapshots regularly have xmax smaller
than xmin). So then when client tries to use this exported snapshot it
gets completely wrong data as the snapshot is broken. I think this is
explanation for Erik Rijker's problems with the initial COPY patch for
logical replication. At least for me the issues go away when I disable
use of the ondisk snapshots.

I didn't really find better solution than that though (disabling the use
of ondisk snapshots for initial consistent snapshot).

So to summarize attached patches:
0001 - Fixes performance issue where we build tons of snapshots that we
don't need which kills CPU.

0002 - Disables the use of ondisk historical snapshots for initial
consistent snapshot export as it may result in corrupt data. This
definitely needs backport.

0003 - Fixes bug where we might never reach snapshot on busy server due
to race condition in xl_running_xacts logging. The original use of extra
locking does not seem to be enough in practice. Once we have agreed fix
for this it's probably worth backpatching. There are still some comments
that need updating, this is more of a PoC.

Thoughts or better ideas?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From e8d4dc52bc9dc7e5b17f4e374319fda229b19e61 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 21 Feb 2017 19:58:18 +0100
Subject: [PATCH 1/3] Skip unnecessary snapshot builds

When doing initial snapshot build during logical decoding
initialization, don't build snapshots for transactions where we know the
transaction didn't do any catalog changes. Otherwise we might end up
with thousands of useless snapshots on busy server which can be quite
expensive.
---
 src/backend/replication/logical/snapbuild.c | 82 +++--
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index ed9f69f..c2476a9 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -972,6 +972,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	

Re: [HACKERS] snapbuild woes

2016-12-12 Thread Petr Jelinek
On 12/12/16 23:33, Andres Freund wrote:
> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>> On 12/12/16 22:42, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
 Hi,
 First one is outright bug, which has to do with how we track running
 transactions. What snapbuild basically does while doing initial snapshot
 is read the xl_running_xacts record, store the list of running txes and
 then wait until they all finish. The problem with this is that
 xl_running_xacts does not ensure that it only logs transactions that are
 actually still running (to avoid locking PGPROC) so there might be xids
 in xl_running_xacts that already committed before it was logged.
>>>
>>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>>> observed must actually be a bit more complex :(
>>>
>>
>> Hmm, interesting, I did see the transaction commit in the WAL before the
>> xl_running_xacts that contained the xid as running. I only seen it on
>> production system though, didn't really manage to easily reproduce it
>> locally.
> 
> I suspect the reason for that is that RecordTransactionCommit() doesn't
> conflict with ProcArrayLock in the first place - only
> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
> sense, just not the crash-recovery sense...
> 

That looks like reasonable explanation. BTW I realized my patch needs
bit more work, currently it will break the actual snapshot as it behaves
same as if the xl_running_xacts was empty which is not correct AFAICS.

Also if we did the approach suggested by my patch (ie using this
xmin/xmax comparison) I guess we wouldn't need to hold the lock for
extra time in wal_level logical anymore.

That is of course unless you think it should be approached from the
other side of the stream and try log correct xl_running_xacts.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-12 Thread Andres Freund
On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
> On 12/12/16 22:42, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> >> Hi,
> >> First one is outright bug, which has to do with how we track running
> >> transactions. What snapbuild basically does while doing initial snapshot
> >> is read the xl_running_xacts record, store the list of running txes and
> >> then wait until they all finish. The problem with this is that
> >> xl_running_xacts does not ensure that it only logs transactions that are
> >> actually still running (to avoid locking PGPROC) so there might be xids
> >> in xl_running_xacts that already committed before it was logged.
> > 
> > I don't think that's actually true?  Notice how LogStandbySnapshot()
> > only releases the lock *after* the LogCurrentRunningXacts() iff
> > wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> > observed must actually be a bit more complex :(
> > 
> 
> Hmm, interesting, I did see the transaction commit in the WAL before the
> xl_running_xacts that contained the xid as running. I only seen it on
> production system though, didn't really manage to easily reproduce it
> locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does.  So they're still running in the PGPROC
sense, just not the crash-recovery sense...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-12 Thread Petr Jelinek
On 12/12/16 22:42, Andres Freund wrote:
> Hi,
> 
> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>> Hi,
>> First one is outright bug, which has to do with how we track running
>> transactions. What snapbuild basically does while doing initial snapshot
>> is read the xl_running_xacts record, store the list of running txes and
>> then wait until they all finish. The problem with this is that
>> xl_running_xacts does not ensure that it only logs transactions that are
>> actually still running (to avoid locking PGPROC) so there might be xids
>> in xl_running_xacts that already committed before it was logged.
> 
> I don't think that's actually true?  Notice how LogStandbySnapshot()
> only releases the lock *after* the LogCurrentRunningXacts() iff
> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
> observed must actually be a bit more complex :(
> 

Hmm, interesting, I did see the transaction commit in the WAL before the
xl_running_xacts that contained the xid as running. I only seen it on
production system though, didn't really manage to easily reproduce it
locally.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-12 Thread Andres Freund
Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
> Hi,
> First one is outright bug, which has to do with how we track running
> transactions. What snapbuild basically does while doing initial snapshot
> is read the xl_running_xacts record, store the list of running txes and
> then wait until they all finish. The problem with this is that
> xl_running_xacts does not ensure that it only logs transactions that are
> actually still running (to avoid locking PGPROC) so there might be xids
> in xl_running_xacts that already committed before it was logged.

I don't think that's actually true?  Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
observed must actually be a bit more complex :(

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-11 Thread Craig Ringer
On 12 December 2016 at 00:36, Kevin Grittner  wrote:
> On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
>  wrote:
>> On 11 Dec. 2016 06:50, "Petr Jelinek"  wrote:
>>> On 10/12/16 23:10, Petr Jelinek wrote:
>>>
 The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
 behavior so that we don't make snapshots for transactions that we seen
 wholly and know that they didn't make catalog changes even if we didn't
 reach consistency yet.
>>>
>>> Eh, attached wrong patch. This one is correct.
>>
>> Attached no patch second time?
>
> I see an attachment, and it shows in the archives.
>
> https://www.postgresql.org/message-id/aee1d499-e3ca-e091-56da-1ee6a47741c8%402ndquadrant.com

Sorry for the noise, apparently my phone's mail client was being dense.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-11 Thread Kevin Grittner
On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
 wrote:
> On 11 Dec. 2016 06:50, "Petr Jelinek"  wrote:
>> On 10/12/16 23:10, Petr Jelinek wrote:
>>
>>> The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
>>> behavior so that we don't make snapshots for transactions that we seen
>>> wholly and know that they didn't make catalog changes even if we didn't
>>> reach consistency yet.
>>
>> Eh, attached wrong patch. This one is correct.
>
> Attached no patch second time?

I see an attachment, and it shows in the archives.

https://www.postgresql.org/message-id/aee1d499-e3ca-e091-56da-1ee6a47741c8%402ndquadrant.com

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2016-12-10 Thread Craig Ringer
On 11 Dec. 2016 06:50, "Petr Jelinek"  wrote:

On 10/12/16 23:10, Petr Jelinek wrote:
>
> The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
> behavior so that we don't make snapshots for transactions that we seen
> wholly and know that they didn't make catalog changes even if we didn't
> reach consistency yet.
>

Eh, attached wrong patch. This one is correct.


Attached no patch second time?


Re: [HACKERS] snapbuild woes

2016-12-10 Thread Petr Jelinek
On 10/12/16 23:10, Petr Jelinek wrote:
> 
> The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
> behavior so that we don't make snapshots for transactions that we seen
> wholly and know that they didn't make catalog changes even if we didn't
> reach consistency yet.
>

Eh, attached wrong patch. This one is correct.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 2add068ed38c2887e6652396c280695a7e384fe7 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sat, 10 Dec 2016 22:22:13 +0100
Subject: [PATCH 2/2] Skip unnecessary snapshot builds

---
 src/backend/replication/logical/snapbuild.c | 82 +++--
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 5836d52..ea3f40f 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -955,6 +955,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	bool		forced_timetravel = false;
 	bool		sub_needs_timetravel = false;
 	bool		top_needs_timetravel = false;
+	bool		skip_forced_snapshot = false;
 
 	TransactionId xmax = xid;
 
@@ -976,10 +977,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/*
 		 * We could avoid treating !SnapBuildTxnIsRunning transactions as
 		 * timetravel ones, but we want to be able to export a snapshot when
-		 * we reached consistency.
+		 * we reached consistency so we need to keep track of them.
 		 */
 		forced_timetravel = true;
 		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+
+		/*
+		 * It is however desirable to skip building new snapshot for
+		 * !SnapBuildTxnIsRunning transactions as otherwise we might end up
+		 * building thousands of unused snapshots on busy servers which can
+		 * be very expensive.
+		 */
+		if (!SnapBuildTxnIsRunning(builder, xid))
+			skip_forced_snapshot = true;
 	}
 
 	for (nxact = 0; nxact < nsubxacts; nxact++)
@@ -992,21 +1002,10 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		SnapBuildEndTxn(builder, lsn, subxid);
 
 		/*
-		 * If we're forcing timetravel we also need visibility information
-		 * about subtransaction, so keep track of subtransaction's state.
-		 */
-		if (forced_timetravel)
-		{
-			SnapBuildAddCommittedTxn(builder, subxid);
-			if (NormalTransactionIdFollows(subxid, xmax))
-xmax = subxid;
-		}
-
-		/*
 		 * Add subtransaction to base snapshot if it DDL, we don't distinguish
 		 * to toplevel transactions there.
 		 */
-		else if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
+		if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
 		{
 			sub_needs_timetravel = true;
 
@@ -1018,6 +1017,16 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			if (NormalTransactionIdFollows(subxid, xmax))
 xmax = subxid;
 		}
+		/*
+		 * If we're forcing timetravel we also need visibility information
+		 * about subtransaction, so keep track of subtransaction's state.
+		 */
+		else if (forced_timetravel)
+		{
+			SnapBuildAddCommittedTxn(builder, subxid);
+			if (NormalTransactionIdFollows(subxid, xmax))
+xmax = subxid;
+		}
 	}
 
 	/*
@@ -1026,14 +1035,8 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	 */
 	SnapBuildEndTxn(builder, lsn, xid);
 
-	if (forced_timetravel)
-	{
-		elog(DEBUG2, "forced transaction %u to do timetravel.", xid);
-
-		SnapBuildAddCommittedTxn(builder, xid);
-	}
 	/* add toplevel transaction to base snapshot */
-	else if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
+	if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
 	{
 		elog(DEBUG2, "found top level transaction %u, with catalog changes!",
 			 xid);
@@ -1046,10 +1049,18 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/* mark toplevel txn as timetravel as well */
 		SnapBuildAddCommittedTxn(builder, xid);
 	}
+	else if (forced_timetravel)
+	{
+		elog(DEBUG2, "forced transaction %u to do timetravel.", xid);
+
+		SnapBuildAddCommittedTxn(builder, xid);
+	}
 
 	/* if there's any reason to build a historic snapshot, do so now */
 	if (forced_timetravel || top_needs_timetravel || sub_needs_timetravel)
 	{
+		bool build_snapshot;
+
 		/*
 		 * Adjust xmax of the snapshot builder, we only do that for committed,
 		 * catalog modifying, transactions, everything else isn't interesting
@@ -1070,14 +1081,29 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			return;
 
 		/*
-		 * Decrease the snapshot builder's refcount of the old snapshot, note
-		 * that it still will be used if it has been handed out to the
-		 * reorderbuffer earlier.
+		 * Build snapshot if needed. We need to build it if there isn't one
+		 * 

[HACKERS] snapbuild woes

2016-12-10 Thread Petr Jelinek
Hi,

I recently found couple of issues with the way initial logical decoding
snapshot is made.

First one is outright bug, which has to do with how we track running
transactions. What snapbuild basically does while doing initial snapshot
is read the xl_running_xacts record, store the list of running txes and
then wait until they all finish. The problem with this is that
xl_running_xacts does not ensure that it only logs transactions that are
actually still running (to avoid locking PGPROC) so there might be xids
in xl_running_xacts that already committed before it was logged. This in
turn means that the snapbuild will never make snapshot if the situation
occurs. The reason why this didn't bite us much yet is that the
snapbuild will consider all transactions finished if the empty
xl_running_xacts comes which usually happens after a while (but might
not on a consistently busy server).

The fix I came up with this is to extend the mechanism to also consider
all transactions finished when xl_running_xacts which has xmin bigger
then that previous xmax comes. This seems to work pretty well on busy
servers. This fix is attached as
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch. I
believe it should be backpatched all the way to 9.4.


The other issue is performance problem again on busy servers with
initial snapshot. We track transactions for catalog modifications so
that we can do proper catalog time travel for decoding of changes. But
for transactions that were running while we started trying to get
initial consistent snapshot, there is no good way to tell if they did
catalog changes or not so we consider them all as catalog changing. We
make separate historical snapshot for every such transaction. This by
itself is fine, the problem is that current implementation also
considers all the transactions that started after we started watching
for changes but before we reached consistent state to also do catalog
changes even though there we actually do know if they did any catalog
change or not. In practice it means we make snapshots that are not
really necessary and if there was long running transaction for which the
snapshot builder has to wait for then we can create thousands of unused
snapshots which affects performance in bad ways (I've seen the initial
snapshot taking hour because of this).

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't make snapshots for transactions that we seen
wholly and know that they didn't make catalog changes even if we didn't
reach consistency yet.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 67e902eef33da9e241c079eaed9ab12a88616296 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sat, 10 Dec 2016 21:56:44 +0100
Subject: [PATCH 1/2] Mark snapshot consistent when all running txes have
 finished.

---
 src/backend/replication/logical/snapbuild.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8b59fc5..5836d52 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1213,9 +1213,11 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 * Build catalog decoding snapshot incrementally using information about
 	 * the currently running transactions. There are several ways to do that:
 	 *
-	 * a) There were no running transactions when the xl_running_xacts record
-	 *	  was inserted, jump to CONSISTENT immediately. We might find such a
-	 *	  state we were waiting for b) and c).
+	 * a) Either there were no running transactions when the xl_running_xacts
+	 *record was inserted (we might find this while waiting for b) or c))
+	 *or the running transactions we've been tracking have all finished
+	 *by the time the xl_running_xacts was inserted. We can jump to
+	 *CONSISTENT immediately.
 	 *
 	 * b) Wait for all toplevel transactions that were running to end. We
 	 *	  simply track the number of in-progress toplevel transactions and
@@ -1251,13 +1253,18 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	}
 
 	/*
-	 * a) No transaction were running, we can jump to consistent.
+	 * a) Either no transaction were running or we've been tracking running
+	 *transactions and the new snapshot has them all finished, we can
+	 *jump to consistent.
 	 *
-	 * NB: We might have already started to incrementally assemble a snapshot,
-	 * so we need to be careful to deal with that.
+	 * NB: Since we might have already started to incrementally assemble a
+	 * snapshot, we need to be careful to deal with that.
 	 */
-	if (running->xcnt == 0)
+	if (running->xcnt == 0 ||
+		(builder->running.xcnt > 0 &&
+		 running->oldestRunningXid >