Re: [HACKERS] Logical replication existing data copy

2017-03-30 Thread Erik Rijkers


(At the moment using these patches for tests:)
 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
 0005-Skip-unnecessary-snapshot-builds.patch+
 and now (Tuesday 30) added:
 0001-Fix-remote-position-tracking-in-logical-replication.patch



I think what you have seen is because of this:
https://www.postgresql.org/message-id/flat/b235fa69-147a-5e09-f8f3-3f780a1ab...@2ndquadrant.com#b235fa69-147a-5e09-f8f3-3f780a1ab...@2ndquadrant.com



You were right: with that 6th patch (and wal_sender_timeout back at its 
default 60s) there are no errors either (I tested on all 3 
test-machines).


I must have missed that last patch when you posted it.  Anyway all seems 
fine now; I hope the above patches can all be committed soon.


thanks,

Erik Rijkers



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-29 Thread Petr Jelinek

Re: [HACKERS] Logical replication existing data copy

2017-03-29 Thread Erik Rijkers

On 2017-03-09 11:06, Erik Rijkers wrote:


I use three different machines (2 desktop, 1 server) to test logical
replication, and all three have now at least once failed to correctly
synchronise a pgbench session (amidst many successful runs, of course)





(At the moment using these patches for tests:)


0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
0005-Skip-unnecessary-snapshot-builds.patch+



The failed tests that I kept seeing (see the 
pgbench-over-logical-replication tests upthread) were never really 
'solved'.



But I have now finally figured out what caused these unexpected failed 
tests: it was wal_sender_timeout, or rather, its default of 60 s.


This caused 'terminating walsender process due to replication timeout' 
on the primary (not strictly an error), and the concomitant ERROR on 
the replica: 'could not receive data from WAL stream: server closed the 
connection unexpectedly'.


Here is a typical example (primary/replica logs time-intertwined; each 
line is tagged 'primary' or 'replica'):


[...]
2017-03-24 16:21:38.129 CET [15002]  primaryLOG:  using stale 
statistics instead of current ones because stats collector is not 
responding
2017-03-24 16:21:42.690 CET [27515]  primaryLOG:  using stale 
statistics instead of current ones because stats collector is not 
responding
2017-03-24 16:21:42.965 CET [14999]replica  LOG:  using stale 
statistics instead of current ones because stats collector is not 
responding
2017-03-24 16:21:49.816 CET [14930]  primaryLOG:  terminating 
walsender process due to
2017-03-24 16:21:49.817 CET [14926]replica  ERROR:  could not 
receive data from WAL stream: server closed the connection unexpectedly
2017-03-24 16:21:49.824 CET [27502]replica  LOG:  worker process: 
logical replication worker for subscription 24864 (PID 14926) exited 
with exit code 1
2017-03-24 16:21:49.824 CET [27521]replica  LOG:  starting logical 
replication worker for subscription "sub1"
2017-03-24 16:21:49.828 CET [15008]replica  LOG:  logical 
replication apply for subscription sub1 started
2017-03-24 16:21:49.832 CET [15009]  primaryLOG:  received 
replication command: IDENTIFY_SYSTEM
2017-03-24 16:21:49.832 CET [15009]  primaryLOG:  received 
replication command: START_REPLICATION SLOT "sub1" LOGICAL 3/FC976440 
(proto_version '1', publication_names '"pub1"')
2017-03-24 16:21:49.833 CET [15009]  primaryDETAIL:  streaming 
transactions committing after 3/FC889810, reading WAL from 3/FC820FC0
2017-03-24 16:21:49.833 CET [15009]  primaryLOG:  starting logical 
decoding for slot "sub1"
2017-03-24 16:21:50.471 CET [15009]  primaryDETAIL:  Logical 
decoding will begin using saved snapshot.
2017-03-24 16:21:50.471 CET [15009]  primaryLOG:  logical decoding 
found consistent point at 3/FC820FC0
2017-03-24 16:21:51.169 CET [15008]replica  DETAIL:  Key 
(hid)=(9014) already exists.
2017-03-24 16:21:51.169 CET [15008]replica  ERROR:  duplicate key 
value violates unique constraint "pgbench_history_pkey"
2017-03-24 16:21:51.170 CET [27502]replica  LOG:  worker process: 
logical replication worker for subscription 24864 (PID 15008) exited 
with exit code 1
2017-03-24 16:21:51.170 CET [27521]replica  LOG:  starting logical 
replication worker for subscription "sub1"

[...]

My primary and replica were always on a single machine (making it more 
likely that that timeout is reached?).  In my testing it seems that 
reaching the timeout on the primary (and 'closing the connection 
unexpectedly' on the replica) does not necessarily break the logical 
replication.  But almost all log-rep failures that I have seen were 
started by this sequence of events.


After setting  wal_sender_timeout  to 3 minutes there were no more 
failed tests.


Perhaps it warrants setting wal_sender_timeout a bit higher than the 
current default of 60 seconds?  After all, I also saw the 'replication 
timeout' / 'closed the connection' pair rather often during 
non-failing tests.  (These also disappeared, almost completely, with a 
higher setting of wal_sender_timeout.)
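
One way to raise the setting without restarting the primary -- a sketch, not necessarily how it was done in these tests:

```sql
-- On the primary: raise the walsender timeout from its 60 s default
-- to the 3-minute value used in the tests above.
ALTER SYSTEM SET wal_sender_timeout = '3min';
-- wal_sender_timeout is reload-safe; no server restart needed.
SELECT pg_reload_conf();
-- Verify the running value.
SHOW wal_sender_timeout;
```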


In any case it would be good to mention the setting (and its potentially 
deteriorating effect) somewhere nearer the logical replication treatment.


(I read about wal_sender_timeout and the keepalive ping; perhaps there's 
(still) something amiss there? Just a guess, I don't know.)


As I said, I saw no more failures with the higher 3-minute setting, with 
one exception: the one test that straddled the DST change (Saturday 24 
March, 02:00 h).  I am happy to discount that one failure but strictly 
speaking I suppose it should be able to take DST into its stride.



Thanks,

Erik Rijkers












Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/25/17 00:45, Mark Kirkwood wrote:
> Minor niggle:
> 
> bench=# DROP PUBLICATION pgbench;
> DROP STATISTICS   <===
> 
> I'm guessing that notification is wrong.

Fixed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Mark Kirkwood


On 25/03/17 07:52, Peter Eisentraut wrote:

On 3/24/17 10:09, Petr Jelinek wrote:

On 24/03/17 15:05, Peter Eisentraut wrote:

On 3/23/17 19:32, Petr Jelinek wrote:

Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

I think for this we can probably just change the missing_ok argument of
RangeVarGetRelid() to false.

Unless we want the custom error message, in which case we need to change
AlterSubscription_refresh(), because right now it errors out because
missing_ok = false.


You are right, stupid me.

Committed this version.



Minor niggle:

bench=# DROP PUBLICATION pgbench;
DROP STATISTICS   <===

I'm guessing that notification is wrong.

regards

Mark







Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Michael Banck
Am Freitag, den 24.03.2017, 14:57 -0400 schrieb Peter Eisentraut:
> On 3/24/17 05:22, Michael Banck wrote:
> > However, replication also seems to not work, I'm using the following
> > script right now:
> 
> The problem is that your publication does not include any tables.

Oops, of course. That part must've got lost in a copy-paste or when I
tried to dig further why the CREATE SUBSCRIPTION segfaulted, sorry for
the noise.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer






Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/24/17 05:22, Michael Banck wrote:
> However, replication also seems to not work, I'm using the following
> script right now:

The problem is that your publication does not include any tables.
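
A minimal sketch of how the publication from the script upthread could include the table (using the script's t1/pub1 names); a bare CREATE PUBLICATION carries no tables, so the subscriber has nothing to copy or stream:

```sql
-- Either include the table when creating the publication ...
CREATE PUBLICATION pub1 FOR TABLE t1;

-- ... or add it to an already-created, empty publication.
ALTER PUBLICATION pub1 ADD TABLE t1;
```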





Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/24/17 10:09, Petr Jelinek wrote:
> On 24/03/17 15:05, Peter Eisentraut wrote:
>> On 3/23/17 19:32, Petr Jelinek wrote:
>>> Yes, I also forgot to check if the table actually exists on subscriber
>>> when fetching them in CREATE SUBSCRIPTION (we have check during
>>> replication but not there).
>>
>> I think for this we can probably just change the missing_ok argument of
>> RangeVarGetRelid() to false.
>>
>> Unless we want the custom error message, in which case we need to change
>> AlterSubscription_refresh(), because right now it errors out because
>> missing_ok = false.
>>
> 
> You are right, stupid me.

Committed this version.





Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Petr Jelinek
On 24/03/17 15:05, Peter Eisentraut wrote:
> On 3/23/17 19:32, Petr Jelinek wrote:
>> Yes, I also forgot to check if the table actually exists on subscriber
>> when fetching them in CREATE SUBSCRIPTION (we have check during
>> replication but not there).
> 
> I think for this we can probably just change the missing_ok argument of
> RangeVarGetRelid() to false.
> 
> Unless we want the custom error message, in which case we need to change
> AlterSubscription_refresh(), because right now it errors out because
> missing_ok = false.
> 

You are right, stupid me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 81c7750095cfac35fad9a841309b2c5501e59a62 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Mar 2017 00:24:47 +0100
Subject: [PATCH 2/2] Check that published table exists on subscriber

---
 src/backend/commands/subscriptioncmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8f201b2..efe70b4 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -416,7 +416,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 RangeVar   *rv = (RangeVar *) lfirst(lc);
 Oid			relid;
 
-relid = RangeVarGetRelid(rv, AccessShareLock, true);
+relid = RangeVarGetRelid(rv, AccessShareLock, false);
 
 SetSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
-- 
2.7.4




Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Peter Eisentraut
On 3/23/17 19:32, Petr Jelinek wrote:
> Yes, I also forgot to check if the table actually exists on subscriber
> when fetching them in CREATE SUBSCRIPTION (we have check during
> replication but not there).

I think for this we can probably just change the missing_ok argument of
RangeVarGetRelid() to false.

Unless we want the custom error message, in which case we need to change
AlterSubscription_refresh(), because right now it errors out because
missing_ok = false.





Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Petr Jelinek
On 24/03/17 11:23, Erik Rijkers wrote:
> On 2017-03-24 10:45, Mark Kirkwood wrote:
>>
>> However one minor observation - as Michael Banck noted - the elapsed
>> time for slave to catch up after running:
>>
>> $ pgbench -c8 -T600 bench
>>
>> on the master was (subjectively) much longer than for physical
>> streaming replication. Is this expected?
>>
> 
> I think you probably want to do (on the slave) :
> 
>   alter role  set synchronous_commit = off;
> 
> otherwise it's indeed extremely slow.
> 

Yes that might be one reason, see
https://www.postgresql.org/message-id/08b7053b-5679-0733-3af7-00b8cde62...@2ndquadrant.com
(although it probably needs rebase now)

Also don't forget that the snapbuild performance fixes haven't been
committed yet, so it can take a long time to even start.





Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Erik Rijkers

On 2017-03-24 10:45, Mark Kirkwood wrote:


However one minor observation - as Michael Banck noted - the elapsed
time for slave to catch up after running:

$ pgbench -c8 -T600 bench

on the master was (subjectively) much longer than for physical
streaming replication. Is this expected?



I think you probably want to do (on the slave):

  alter role  set synchronous_commit = off;

otherwise it's indeed extremely slow.
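
Spelled out as a sketch -- the role name here is hypothetical, standing in for whichever role applies the replicated changes on the subscriber (the original leaves the name blank):

```sql
-- On the subscriber; "repl_owner" is a hypothetical role name for
-- the owner of the subscription, i.e. the role the apply runs as.
ALTER ROLE repl_owner SET synchronous_commit = off;
```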









Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Mark Kirkwood

On 24/03/17 12:32, Petr Jelinek wrote:


On 24/03/17 00:14, Mark Kirkwood wrote:

On 24/03/17 02:00, Peter Eisentraut wrote:

On 3/21/17 21:38, Peter Eisentraut wrote:

This patch is looking pretty good to me, modulo the failing pg_dump
tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

Committed all that.


Testing now this patch is in, I'm unable to create a subscription:

(master)

bench=# CREATE PUBLICATION pgbench
 FOR TABLE pgbench_accounts , pgbench_branches,
   pgbench_tellers, pgbench_history
 WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);

(slave)

bench=# CREATE SUBSCRIPTION pgbench
 CONNECTION 'port=5447 user=postgres dbname=bench'
 PUBLICATION pgbench
 WITH (COPY DATA);
ERROR:  duplicate key value violates unique constraint
"pg_subscription_rel_srrelid_srsubid_index"
DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.

This is a pair of freshly initdb'ed instances, the master has a size 100
pgbench schema.

I'm guessing this is a different bug from the segfault also reported


Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

Attached patches should fix both issues.




Yep, does seem to.

I note that (probably intentional) specifying 'COPY DATA' does not 
'CREATE TABLES' for you... ok, I probably didn't read... something 
somewhere... but anyway, after creating the tables it all seems to work. 
Nice.


However one minor observation - as Michael Banck noted - the elapsed 
time for slave to catch up after running:


$ pgbench -c8 -T600 bench

on the master was (subjectively) much longer than for physical streaming 
replication. Is this expected?


regards

Mark






Re: [HACKERS] Logical replication existing data copy

2017-03-24 Thread Michael Banck
Hi,

On Fri, Mar 24, 2017 at 12:32:28AM +0100, Petr Jelinek wrote:
> Yes, I also forgot to check if the table actually exists on subscriber
> when fetching them in CREATE SUBSCRIPTION (we have check during
> replication but not there).
> 
> Attached patches should fix both issues.

I no longer get a segfault, thanks.

However, replication also seems to not work, I'm using the following
script right now:

--8<--
#!/bin/sh

set -x

#rm -rf data_* pg*.log

initdb --pgdata=data_pg1 1> /dev/null 2>&1
sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' data_pg1/postgresql.conf
pg_ctl --pgdata=data_pg1 -l pg1.log start 1> /dev/null
psql -c "CREATE TABLE t1(id int);"
pg_basebackup --pgdata=data_pg2
sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
psql -c "INSERT INTO t1 VALUES(1);"
pg_ctl --pgdata=data_pg2 -l pg2.log start 1> /dev/null
psql -c "CREATE PUBLICATION pub1;"
psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);"
sleep 6
psql -c 'SELECT application_name, state FROM pg_stat_replication;'
psql --port=5433 -c "SELECT COUNT(*) FROM t1;"
--8<--

(I had to add the sleep 6 - 5 wasn't always enough - in order to get the
subscription from 'catchup' to 'streaming', which was a bit surprising to
me)

This is the output:

--8<--
+ initdb --pgdata=data_pg1
+ sed -i -e s/^#wal_level.=.replica/wal_level = logical/ data_pg1/postgresql.conf
+ pg_ctl --pgdata=data_pg1 -l pg1.log start
+ psql -c CREATE TABLE t1(id int);
CREATE TABLE
+ pg_basebackup --pgdata=data_pg2
+ sed -i -e s/^#port.=.5432/port = 5433/ data_pg2/postgresql.conf
+ psql -c INSERT INTO t1 VALUES(1);
INSERT 0 1
+ pg_ctl --pgdata=data_pg2 -l pg2.log start
+ psql -c CREATE PUBLICATION pub1;
CREATE PUBLICATION
+ psql --port=5433 -c CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);
NOTICE:  created replication slot "sub1" on publisher
NOTICE:  synchronized table states
CREATE SUBSCRIPTION
+ sleep 6
+ psql -c SELECT application_name, state FROM pg_stat_replication;
 application_name |   state   
--+---
 sub1 | streaming
(1 row)

+ psql --port=5433 -c SELECT COUNT(*) FROM t1;
 count 
---
 0
(1 row)
--8<--

I would have assumed the inserted value to be visible on the standby.
If I insert further values, they don't show up either.


Michael





Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
On 24/03/17 00:14, Mark Kirkwood wrote:
> On 24/03/17 02:00, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump
>>> tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>> Committed all that.
>>
> 
> Testing now this patch is in, I'm unable to create a subscription:
> 
> (master)
> 
> bench=# CREATE PUBLICATION pgbench
> FOR TABLE pgbench_accounts , pgbench_branches,
>   pgbench_tellers, pgbench_history
> WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);
> 
> (slave)
> 
> bench=# CREATE SUBSCRIPTION pgbench
> CONNECTION 'port=5447 user=postgres dbname=bench'
> PUBLICATION pgbench
> WITH (COPY DATA);
> ERROR:  duplicate key value violates unique constraint
> "pg_subscription_rel_srrelid_srsubid_index"
> DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.
> 
> This is a pair of freshly initdb'ed instances, the master has a size 100
> pgbench schema.
> 
> I'm guessing this is a different bug from the segfault also reported
> 

Yes, I also forgot to check if the table actually exists on subscriber
when fetching them in CREATE SUBSCRIPTION (we have check during
replication but not there).

Attached patches should fix both issues.

From cdb2590e8b42d0dec4c00d8ae7a6affa2bb41372 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Fri, 24 Mar 2017 00:24:47 +0100
Subject: [PATCH 2/2] Check that published table exists on subscriber

---
 src/backend/commands/subscriptioncmds.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 8f201b2..ad11610 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -417,6 +417,11 @@ CreateSubscription(CreateSubscriptionStmt *stmt, bool isTopLevel)
 Oid			relid;
 
 relid = RangeVarGetRelid(rv, AccessShareLock, true);
+if (relid == InvalidOid)
+	ereport(ERROR,
+			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			 errmsg("published table \"%s.%s\" not found on subscriber",
+	rv->schemaname, rv->relname)));
 
 SetSubscriptionRelState(subid, relid, table_state,
 		InvalidXLogRecPtr);
@@ -513,6 +518,12 @@ AlterSubscription_refresh(Subscription *sub, bool copy_data)
 		Oid			relid;
 
 		relid = RangeVarGetRelid(rv, AccessShareLock, false);
+		if (relid == InvalidOid)
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("published table \"%s.%s\" not found on subscriber",
+			rv->schemaname, rv->relname)));
+
 		pubrel_local_oids[off++] = relid;
 
 		if (!bsearch(, subrel_local_oids,
-- 
2.7.4

From e492ead4b8036f03b4624093d64280aa6ea23129 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 23 Mar 2017 23:39:25 +0100
Subject: [PATCH 1/2] Always return tupleslot and tupledesc from libpqrcv_exec

---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 4dd8eef..9d7bb25 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -803,10 +803,6 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	MemoryContext	rowcontext;
 	MemoryContext	oldcontext;
 
-	/* No point in doing anything here if there were no tuples returned. */
-	if (PQntuples(pgres) == 0)
-		return;
-
 	/* Make sure we got expected number of fields. */
 	if (nfields != nRetTypes)
 		ereport(ERROR,
@@ -824,6 +820,10 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 		   PQfname(pgres, coln), retTypes[coln], -1, 0);
 	attinmeta = TupleDescGetAttInMetadata(walres->tupledesc);
 
+	/* No point in doing more here if there were no tuples returned. */
+	if (PQntuples(pgres) == 0)
+		return;
+
 	/* Create temporary context for local allocations. */
 	rowcontext = AllocSetContextCreate(CurrentMemoryContext,
 	   "libpqrcv query result context",
-- 
2.7.4




Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Mark Kirkwood

On 24/03/17 02:00, Peter Eisentraut wrote:

On 3/21/17 21:38, Peter Eisentraut wrote:

This patch is looking pretty good to me, modulo the failing pg_dump tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

Committed all that.



Testing now this patch is in, I'm unable to create a subscription:

(master)

bench=# CREATE PUBLICATION pgbench
FOR TABLE pgbench_accounts , pgbench_branches,
  pgbench_tellers, pgbench_history
WITH (PUBLISH INSERT, PUBLISH UPDATE, PUBLISH DELETE);

(slave)

bench=# CREATE SUBSCRIPTION pgbench
CONNECTION 'port=5447 user=postgres dbname=bench'
PUBLICATION pgbench
WITH (COPY DATA);
ERROR:  duplicate key value violates unique constraint 
"pg_subscription_rel_srrelid_srsubid_index"

DETAIL:  Key (srrelid, srsubid)=(0, 16389) already exists.

This is a pair of freshly initdb'ed instances, the master has a size 100 
pgbench schema.


I'm guessing this is a different bug from the segfault also reported


regards


Mark






Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Petr Jelinek
Hi,

On 23/03/17 23:06, Michael Banck wrote:
> Hi,
> 
> On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
>> On 3/21/17 21:38, Peter Eisentraut wrote:
>>> This patch is looking pretty good to me, modulo the failing pg_dump tests.
>>>
>>> Attached is a fixup patch.  I have mainly updated some comments and
>>> variable naming for (my) clarity.  No functional changes.
>>
>> Committed all that.
> 
> Maybe I'm doing something wrong, but I'm getting a segfault trying to
> start logical replication since that patch went in.
> 

Thanks for report. Looks like we don't handle correctly empty result set
when fetching table list. Will post fix tomorrow.





Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Michael Banck
Hi,

On Thu, Mar 23, 2017 at 09:00:16AM -0400, Peter Eisentraut wrote:
> On 3/21/17 21:38, Peter Eisentraut wrote:
> > This patch is looking pretty good to me, modulo the failing pg_dump tests.
> > 
> > Attached is a fixup patch.  I have mainly updated some comments and
> > variable naming for (my) clarity.  No functional changes.
> 
> Committed all that.

Maybe I'm doing something wrong, but I'm getting a segfault trying to
start logical replication since that patch went in.

I've blown away my build tree and started over with this minimal
example, any obvious mistakes here?  Quoting the publisher/subscriber
names doesn't seem to change things:

$ initdb --pgdata=data_pg1
$ sed -i -e 's/^#wal_level.=.replica/wal_level = logical/' 
data_pg1/postgresql.conf
$ pg_ctl --pgdata=data_pg1 -l pg1.log start
$ pg_basebackup --pgdata=data_pg2
$ sed -i -e 's/^#port.=.5432/port = 5433/' data_pg2/postgresql.conf
$ pg_ctl --pgdata=data_pg2 -l pg2.log start
$ psql -c "CREATE PUBLICATION pub1;"
$ psql --port=5433 -c "CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' 
PUBLICATION pub1;"

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
269 PinTupleDesc(tupdesc);
(gdb) bt
#0  ExecSetSlotDescriptor (slot=slot@entry=0xfc4d38, tupdesc=tupdesc@entry=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:269
#1  0x005dc4fc in MakeSingleTupleTableSlot (tupdesc=0x0)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/executor/execTuples.c:203
#2  0x005a16ff in fetch_table_list (wrconn=wrconn@entry=0xc0030001,
publications=publications@entry=0xffb448)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:996
#3  0x005a1efa in CreateSubscription (stmt=0x101cd40, 
isTopLevel=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/commands/subscriptioncmds.c:396
#4  0x006ecf96 in ProcessUtilitySlow (pstate=0x0, 
pstate@entry=0xfc4818, pstmt=0x0,
pstmt@entry=0x101d080, queryString=0xf9d248 ,
queryString@entry=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", context=(unknown: 
16534992), context@entry=PROCESS_UTILITY_TOPLEVEL,
params=0x7ffd7e6263d0, params@entry=0x0,
completionTag=0x45 ,
completionTag@entry=0x7ffd7e626b00 "", dest=)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:1612
#5  0x006ec4e9 in standard_ProcessUtility (pstmt=0x101d080,
queryString=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);", 
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x101d160,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/utility.c:922
#6  0x006e9f5f in PortalRunUtility (portal=0xfbb9d8, pstmt=0x101d080,
isTopLevel=, setHoldSnapshot=, 
dest=,
completionTag=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1165
#7  0x006ea977 in PortalRunMulti (portal=portal@entry=0xfbb9d8,
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:1315
#8  0x006eb4cb in PortalRun (portal=portal@entry=0xfbb9d8,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x101d160, altdest=altdest@entry=0x101d160,
completionTag=completionTag@entry=0x7ffd7e626b00 "")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/pquery.c:788
#9  0x006e7868 in exec_simple_query (
query_string=0x101c0b8 "CREATE SUBSCRIPTION \"sub1\" CONNECTION 
'dbname=postgres' PUBLICATION pub1 WITH (COPY DATA);")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:1101
#10 0x006e94a9 in PostgresMain (argc=1, argv=0x101c0b8, dbname=0xfc8478 
"postgres",
username=0xfc8458 "postgres")
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/tcop/postgres.c:4069
#11 0x004770ad in BackendRun (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:4317
#12 BackendStartup (port=0xfc2970)
at 
/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/backend/postmaster/postmaster.c:3989
#13 ServerLoop ()
at 

Re: [HACKERS] Logical replication existing data copy

2017-03-23 Thread Peter Eisentraut
On 3/21/17 21:38, Peter Eisentraut wrote:
> This patch is looking pretty good to me, modulo the failing pg_dump tests.
> 
> Attached is a fixup patch.  I have mainly updated some comments and
> variable naming for (my) clarity.  No functional changes.

Committed all that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-22 Thread Petr Jelinek
On 22/03/17 13:36, Peter Eisentraut wrote:
> On 3/20/17 16:08, Peter Eisentraut wrote:
>> The current patch causes a failure in the pg_dump tests, because the
>> generated CREATE SUBSCRIPTION commands make connection attempts that
>> don't work.  We have the pg_dump option --no-create-subscription-slots
>> for this, but I suppose we should expand that to
>> --no-subscription-connect and use the new NOCONNECT option instead.
> 
> Small top-up patch to accomplish that.
> 

Works for me. I'd maybe explicitly mention in the docs for
--no-subscription-connect that it dumps subscriptions as CREATE
SUBSCRIPTION ... WITH (NOCONNECT).
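
(For illustration only, with hypothetical subscription/connection names: with
that pg_dump option, a dumped subscription would come out roughly as the
following statement, so restoring it never attempts a connection or slot
creation.)

```sql
-- hypothetical pg_dump output for a subscription, with connection suppressed
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub1
    WITH (NOCONNECT);
```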

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-22 Thread Peter Eisentraut
On 3/20/17 16:08, Peter Eisentraut wrote:
> The current patch causes a failure in the pg_dump tests, because the
> generated CREATE SUBSCRIPTION commands make connection attempts that
> don't work.  We have the pg_dump option --no-create-subscription-slots
> for this, but I suppose we should expand that to
> --no-subscription-connect and use the new NOCONNECT option instead.

Small top-up patch to accomplish that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-fixup-Logical-replication-support-for-initial-data-c.patch
Description: invalid/octet-stream



Re: [HACKERS] Logical replication existing data copy

2017-03-21 Thread Peter Eisentraut
This patch is looking pretty good to me, modulo the failing pg_dump tests.

Attached is a fixup patch.  I have mainly updated some comments and
variable naming for (my) clarity.  No functional changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 04f772f350773cce9890386c4a5924ee251ebbe6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 21 Mar 2017 15:38:39 -0400
Subject: [PATCH] fixup! Logical replication support for initial data copy

---
 src/backend/catalog/pg_subscription.c  |   6 +-
 src/backend/commands/subscriptioncmds.c|   2 +-
 .../libpqwalreceiver/libpqwalreceiver.c|   9 +-
 src/backend/replication/logical/tablesync.c| 190 +++--
 src/backend/replication/logical/worker.c   |  22 +--
 src/include/replication/logical.h  |   2 -
 src/include/replication/worker_internal.h  |   8 +-
 src/test/regress/expected/subscription.out |   2 -
 src/test/regress/sql/subscription.sql  |   2 -
 src/test/subscription/t/001_rep_changes.pl |   6 +-
 src/test/subscription/t/004_sync.pl|   2 +-
 11 files changed, 127 insertions(+), 124 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9b74892548..e420ec14d2 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -321,10 +321,8 @@ GetSubscriptionRelState(Oid subid, Oid relid, XLogRecPtr *sublsn,
 			return SUBREL_STATE_UNKNOWN;
 		}
 
-		ereport(ERROR,
-(errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("subscription table %u in subscription %u does not exist",
-		relid, subid)));
+		elog(ERROR, "subscription table %u in subscription %u does not exist",
+			 relid, subid);
 	}
 
 	/* Get the state. */
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index cba2d5c085..0784ca7951 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -992,7 +992,7 @@ fetch_table_list(WalReceiverConn *wrconn, List *publications)
 (errmsg("could not receive list of replicated tables from the publisher: %s",
 		res->err)));
 
-	/* Proccess tables. */
+	/* Process tables. */
 	slot = MakeSingleTupleTableSlot(res->tupledesc);
 	while (tuplestore_gettupleslot(res->tuplestore, true, false, slot))
 	{
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 3176182523..4dd8eef1f9 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -792,13 +792,12 @@ libpqrcv_create_slot(WalReceiverConn *conn, const char *slotname,
  * Convert tuple query result to tuplestore.
  */
 static void
-libpqrcv_proccessTuples(PGresult *pgres, WalRcvExecResult *walres,
+libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 		const int nRetTypes, const Oid *retTypes)
 {
 	int		tupn;
 	int		coln;
 	int		nfields = PQnfields(pgres);
-	char   *cstrs[MaxTupleAttributeNumber];
 	HeapTuple		tuple;
 	AttInMetadata  *attinmeta;
 	MemoryContext	rowcontext;
@@ -830,9 +829,11 @@ libpqrcv_proccessTuples(PGresult *pgres, WalRcvExecResult *walres,
 	   "libpqrcv query result context",
 	   ALLOCSET_DEFAULT_SIZES);
 
-	/* Proccess returned rows. */
+	/* Process returned rows. */
 	for (tupn = 0; tupn < PQntuples(pgres); tupn++)
 	{
+		char   *cstrs[MaxTupleAttributeNumber];
+
 		CHECK_FOR_INTERRUPTS();
 
 		/* Do the allocations in temporary context. */
@@ -885,7 +886,7 @@ libpqrcv_exec(WalReceiverConn *conn, const char *query,
 		case PGRES_SINGLE_TUPLE:
 		case PGRES_TUPLES_OK:
 			walres->status = WALRCV_OK_TUPLES;
-			libpqrcv_proccessTuples(pgres, walres, nRetTypes, retTypes);
+			libpqrcv_processTuples(pgres, walres, nRetTypes, retTypes);
 			break;
 
 		case PGRES_COPY_IN:
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index 6c67a5ea9f..3e16b0d576 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -36,7 +36,7 @@
  *		  - if the apply is in front of the sync in the wal stream the new
  *			state is set to CATCHUP and apply loops until the sync process
  *			catches up to the same LSN as apply
- *		  - if the sync if in front of the apply in the wal stream the new
+ *		  - if the sync is in front of the apply in the wal stream the new
  *			state is set to SYNCDONE
  *		  - if both apply and sync are at the same position in the wal stream
  *			the state of the table is set to READY
@@ -104,7 +104,6 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 
-static List *table_states = NIL;
 static bool table_states_valid = false;
 
 StringInfo	copybuf = 

Re: [HACKERS] Logical replication existing data copy

2017-03-21 Thread Peter Eisentraut
On 3/20/17 19:54, Petr Jelinek wrote:
> Here is fixed version, also rebased on top of all the changes to pg_dump
> tests. Subscriptions are dumped unless --no-subscriptions is specified.

The problem with that is that pg_dump will now fail for unprivileged
users.  That's a separate problem to solve.  I suggest not overloading
this patch with that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-20 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> The current patch causes a failure in the pg_dump tests, because the
> generated CREATE SUBSCRIPTION commands make connection attempts that
> don't work.  We have the pg_dump option --no-create-subscription-slots
> for this, but I suppose we should expand that to
> --no-subscription-connect and use the new NOCONNECT option instead.

I'll admit that I've not followed this very closely, so perhaps I'm
misunderstanding, but I thought our prior discussion led to the idea
that pg_dump would always dump out subscriptions with 'NOCONNECT', so
that a restore wouldn't automatically start trying to connect to another
system.

In which case, I don't quite understand why we need a
"--no-subscription-connect" option.

Thanks!

Stephen




Re: [HACKERS] Logical replication existing data copy

2017-03-20 Thread Peter Eisentraut
The current patch causes a failure in the pg_dump tests, because the
generated CREATE SUBSCRIPTION commands make connection attempts that
don't work.  We have the pg_dump option --no-create-subscription-slots
for this, but I suppose we should expand that to
--no-subscription-connect and use the new NOCONNECT option instead.

I'm a bit puzzled at the moment how it worked before, because the
pg_dump test suite does not use that option.  But some change like that
is probably useful anyway.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-14 Thread Peter Eisentraut
On 3/14/17 12:12, Petr Jelinek wrote:
>> Committing this in chunks makes sense anyway.
>>
>> I've attached a slightly modified version that makes pg_recvlogical skip slot
>> export. The second patch is unchanged, use the copy from the
>> immediately prior mail.
>>
> 
> Okay, I merged your changes in.
> 
> Here is that merge, plus the updated main patch that replaces all the
> SQL interfaces in libpqwalreceiver.c with a single exec one which uses
> tuplestore to send any tuple results back.
> 
> For correct operation in every situation, it still needs the snapbuild
> fixes and also the logical replication origin tracking fix.

Committed 0001.  Will look at 0002 next.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-12 Thread Craig Ringer
On 11 March 2017 at 00:33, Petr Jelinek  wrote:
> On 09/03/17 18:48, Peter Eisentraut wrote:
>> On 3/6/17 05:27, Petr Jelinek wrote:
>>> And lastly I changed the automagic around exporting, not exporting and
>>> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
>>> parameter for the CREATE_REPLICATION_SLOT command (and added simple
>>> framework for adding more of those if needed in the future).
>>
>> It might be useful to make this into a separate patch, for clarity.
>>
>
> Okay here it is with this part split out. The first patch also help with
> Craig's logical decoding on standby so it definitely makes sense to be
> split.

Greatly appreciated.

Committing this in chunks makes sense anyway.

I've attached a slightly modified version that makes pg_recvlogical skip slot
export. The second patch is unchanged, use the copy from the
immediately prior mail.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 02af255a84736ac2783705055d6e998e476359af Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Thu, 9 Mar 2017 14:20:28 +0100
Subject: [PATCH 1/4] Add option to control snapshot export to
 CREATE_REPLICATION_SLOT

We used to export snapshots unconditionally in CREATE_REPLICATION_SLOT
in the replication protocol, but several upcoming patches want more control
over what happens with the slot.

This means that when we allow creation of replication slots on standbys, which
cannot export snapshots because they cannot allocate new XIDs, we don't have to
silently omit the snapshot creation.

It also allows clients like pg_recvlogical, which neither need nor can use the
exported snapshot, to suppress its creation. Since snapshot exporting can fail
this improves reliability.
---
 doc/src/sgml/logicaldecoding.sgml  | 13 +++--
 doc/src/sgml/protocol.sgml | 16 +-
 src/backend/commands/subscriptioncmds.c|  6 ++-
 .../libpqwalreceiver/libpqwalreceiver.c| 15 --
 src/backend/replication/repl_gram.y| 43 
 src/backend/replication/repl_scanner.l |  2 +
 src/backend/replication/walsender.c| 58 --
 src/bin/pg_basebackup/streamutil.c |  5 ++
 src/include/nodes/replnodes.h  |  2 +-
 src/include/replication/walreceiver.h  |  6 +--
 10 files changed, 140 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 03c2c69..2b7d6e9 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -268,11 +268,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
 

 
-   
+   
 Exported Snapshots
 
- When a new replication slot is created using the streaming replication interface,
- a snapshot is exported
+ When a new replication
+ slot is created using the streaming replication interface, a snapshot
+ is exported
  (see ), which will show
  exactly the state of the database after which all changes will be
  included in the change stream. This can be used to create a new replica by
@@ -282,6 +283,12 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
  database's state at that point in time, which afterwards can be updated
  using the slot's contents without losing any changes.
 
+
+ Creation of a snapshot is not always possible - in particular, it will
+ fail when connected to a hot standby. Applications that do not require
+ snapshot export may suppress it with the NOEXPORT_SNAPSHOT
+ option.
+

   
 
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3d6e8ee..95603d3 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1487,7 +1487,7 @@ The commands accepted in walsender mode are:
   
 
   
-   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin }
+   CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [ RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT | NOEXPORT_SNAPSHOT ] }
  CREATE_REPLICATION_SLOT
 
 
@@ -1538,6 +1538,20 @@ The commands accepted in walsender mode are:
 

   
+  
+   EXPORT_SNAPSHOT
+   NOEXPORT_SNAPSHOT
+   
+
+ Decides what to do with the snapshot created during logical slot
+ initialization. EXPORT_SNAPSHOT, which is the
+ default, will export the snapshot for use in other sessions. This
+ option can't be used inside a transaction. The
+ NOEXPORT_SNAPSHOT will just use the snapshot for logical
+ decoding as normal but won't do anything else with it.
+
+   
+  
  
 
  
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c

Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Petr Jelinek
On 09/03/17 18:48, Peter Eisentraut wrote:
> On 3/6/17 05:27, Petr Jelinek wrote:
>> And lastly I changed the automagic around exporting, not exporting and
>> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
>> parameter for the CREATE_REPLICATION_SLOT command (and added simple
>> framework for adding more of those if needed in the future).
> 
> It might be useful to make this into a separate patch, for clarity.
> 

Yes, already working on that (at least part of it), since it's useful
for other patches in CF.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Petr Jelinek
On 09/03/17 18:46, Peter Eisentraut wrote:
> On 3/6/17 05:27, Petr Jelinek wrote:
>> updated and rebased version of the patch attached.
> 
> Some comments on this patch:
> 
> Can we have a better explanation of different snapshot options in
> CREATE_REPLICATION_SLOT.  We use all these variants in different
> places, but it's not fully documented why.  Think about interested
> users reading this.
> 

I am not quite sure what/where you want me to expand the docs; the
protocol doc explains what the options do. Are you missing the reasoning
for why we use them in the calling code?

> 
> errmsg("subscription table %u in subscription %u does not exist",
> 
> Use names instead of IDs if possible.
> 

Well, this message is more or less equivalent to a cache lookup failure;
the problem is that neither the relation nor the subscription is
guaranteed to exist if this fails. I could write code that emits two
different messages depending on whether they do, but I'm not quite sure
it's worth it there.

> 
> +   libpqrcv_table_list,
> +   libpqrcv_table_info,
> +   libpqrcv_table_copy,
> 
> I don't think these functions belong into the WAL receiver API, since
> they are specific to this particular logical replication
> implementation.  I would just make an API function libpqrc_exec_sql()
> that runs a query, and then put the table_*() functions as wrappers
> around that into tablesync.c.
> 

Yeah, I was wondering about that as well. The reason it's in
libpqwalreceiver is that if we create some libpqrc_exec_sql as you say,
we'll need code that converts the PQresult to something consumable by
the backend, which has no access to the libpq API, and that seemed like
quite a bit of boilerplate work. I really don't want to write another
libpq-like or SPI-like interface for this. Maybe it would be acceptable
to return the result as a tuplestore?

> 
> Not sure what the worker init stuff in ApplyLauncherShmemInit() is
> doing.  Could be commented more.
> 

Eh, I copy-pasted that comment from a different place; will fix.

> 
> max_sync_workers_per_subscription could be PGC_SIGHUP, I think.  And
> the minimum could be 0, to stop any syncing?
> 
> 
> pg_subscription_rel.h: I'm not sure under which circumstances we need
> to use BKI_ROWTYPE_OID.
> 
> 
> Does pg_subscription_rel need an OID column?  The index
> SubscriptionRelOidIndexId is not used anywhere.
>

Ah, leftover from the time it used dependencies for tracking.

> 
> You removed this command from the tests:
> 
> ALTER SUBSCRIPTION testsub SET PUBLICATION testpub, testpub1;
> 
> I suppose because it causes a connection.  We should have an option to
> prevent that (NOCONNECT/NOREFRESH?).

NOREFRESH makes more sense I guess.

> 
> Why was the test 'check replication origin was dropped on subscriber'
> removed?
> 

I don't know what you mean?

> 
> Attached also a small patch with some stylistic changes.
> 

Thanks.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> And lastly I changed the automagic around exporting, not exporting and
> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
> parameter for the CREATE_REPLICATION_SLOT command (and added simple
> framework for adding more of those if needed in the future).

It might be useful to make this into a separate patch, for clarity.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> updated and rebased version of the patch attached.

Some comments on this patch:

Can we have a better explanation of different snapshot options in
CREATE_REPLICATION_SLOT.  We use all these variants in different
places, but it's not fully documented why.  Think about interested
users reading this.


errmsg("subscription table %u in subscription %u does not exist",

Use names instead of IDs if possible.


+   libpqrcv_table_list,
+   libpqrcv_table_info,
+   libpqrcv_table_copy,

I don't think these functions belong into the WAL receiver API, since
they are specific to this particular logical replication
implementation.  I would just make an API function libpqrc_exec_sql()
that runs a query, and then put the table_*() functions as wrappers
around that into tablesync.c.


Not sure what the worker init stuff in ApplyLauncherShmemInit() is
doing.  Could be commented more.


There are a lot of places that do one of

MyLogicalRepWorker->relid == InvalidOid
OidIsValid(MyLogicalRepWorker->relid)

To check whether the current worker is a sync worker.  Wrap that into
a function.


Not a fan of adding CommandCounterIncrement() calls at the end of
functions.  In some cases, they are not necessary at all.  In cases
where they are, the CCI call should be at a higher level between two
function calls with a comment for why the call below needs to see the
changes from the call above.


The index name pg_subscription_rel_map_index/SubscriptionRelMapIndexId
doesn't seem to match existing style, since there is no "map" column.
How about pg_subscription_rel_rel_sub_index?  I see we have a
similarly named index for publications.


max_sync_workers_per_subscription could be PGC_SIGHUP, I think.  And
the minimum could be 0, to stop any syncing?


pg_subscription_rel.h: I'm not sure under which circumstances we need
to use BKI_ROWTYPE_OID.


Does pg_subscription_rel need an OID column?  The index
SubscriptionRelOidIndexId is not used anywhere.


You removed this command from the tests:

ALTER SUBSCRIPTION testsub SET PUBLICATION testpub, testpub1;

I suppose because it causes a connection.  We should have an option to
prevent that (NOCONNECT/NOREFRESH?).


Why was the test 'check replication origin was dropped on subscriber'
removed?


Attached also a small patch with some stylistic changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c6ba1eaa3c5a44e4a9f6d072cb95fcf7e68ba3d6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 08:19:06 -0500
Subject: [PATCH] fixup! Logical replication support for initial data copy

---
 doc/src/sgml/catalogs.sgml  |  9 +
 doc/src/sgml/logical-replication.sgml   | 18 +-
 doc/src/sgml/monitoring.sgml|  4 ++--
 doc/src/sgml/protocol.sgml  | 14 +++---
 doc/src/sgml/ref/alter_subscription.sgml|  8 
 doc/src/sgml/ref/create_subscription.sgml   | 13 ++---
 src/backend/catalog/pg_subscription.c   | 13 -
 src/backend/replication/logical/launcher.c  | 18 +++---
 src/backend/replication/logical/snapbuild.c |  6 +++---
 src/backend/replication/logical/worker.c|  4 ++--
 src/backend/replication/repl_gram.y |  1 +
 src/backend/replication/walsender.c |  2 +-
 src/backend/tcop/postgres.c |  8 +---
 13 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f587a08b6a..ab78585035 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -302,7 +302,7 @@ System Catalogs
 
  
   pg_subscription_rel
-  relation state mapping for subscriptions
+  relation state for subscriptions
  
 
  
@@ -6429,14 +6429,14 @@ pg_subscription_rel
 
   
The catalog pg_subscription_rel contains the
-   status for each replicated relation in each subscription.  This is a
+   state for each replicated relation in each subscription.  This is a
many-to-many mapping.
   
 
   
-   This catalog only contains tables known to subscription after running
+   This catalog only contains tables known to the subscription after running
either CREATE SUBSCRIPTION or
-   ALTER SUBSCRIPTION ... REFRESH commands.
+   ALTER SUBSCRIPTION ... REFRESH.
   
 
   
@@ -6472,6 +6472,7 @@ pg_subscription_rel Columns
   char
   
   
+   State code:
i = initialize,
d = data is being copied,
s = synchronized,
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index f75304cd91..4ec6bb49b7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -24,8 +24,8 @@ Logical Replication
  
 
  
-  Logical replication typically starts with a snapshot of the data on
-  the publisher 

Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Erik Rijkers

On 2017-03-09 11:06, Erik Rijkers wrote:

On 2017-03-08 10:36, Petr Jelinek wrote:

On 07/03/17 23:30, Erik Rijkers wrote:

On 2017-03-06 11:27, Petr Jelinek wrote:


0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
0005-Skip-unnecessary-snapshot-builds.patch+
0001-Logical-replication-support-for-initial-data-copy-v6.patch




The attached bz2 contains
- an output file from pgbench_derail2.sh (also attached, as it changes
somewhat all the time); the

- the pg_waldump output from both master (file with .1. in it) and
replica (.2.).

- the 2 logfiles.




I forgot to include the bash-output file.  Now attached.  This file 
should have been in the bz2 I sent a few minutes ago.



= iteration 1 --   1 of 10 =

-- scale 25   clients 64   duration 300   CLEAN_ONLY=

-- hostname: barzoi
-- timestamp: 20170309_1021
-- master_start_time 2017-03-08 12:04:02.127127+01 replica_start_time 
2017-03-08 12:04:02.12713+01
-- master  patch-md5 [59c92165d4a328d68450ef0e922c0a42]
-- replica patch-md5 [59c92165d4a328d68450ef0e922c0a42] (ok)
-- synchronous_commit, master  [on]  replica [off]
-- master_assert  [on]  replica_assert [on]
-- self md5 87554cfed7cda67ad292b6481e1b8b41 ./pgbench_derail2.sh
clean-at-start-call
creating tables...
1699900 of 250 tuples (67%) done (elapsed 5.19 s, remaining 2.44 s)
250 of 250 tuples (100%) done (elapsed 7.51 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);
alter subscription sub1 enable;
-- pgbench -c 64 -j 8 -T 300 -P 60 -n   --  scale 25
progress: 60.0 s, 134.4 tps, lat 472.280 ms stddev 622.992
progress: 120.0 s, 26.4 tps, lat 2083.748 ms stddev 4356.546
progress: 180.0 s, 21.2 tps, lat 2977.751 ms stddev 4767.332
progress: 240.0 s, 13.5 tps, lat 5230.657 ms stddev 7029.718
progress: 300.0 s, 42.4 tps, lat 1555.645 ms stddev 1733.152
transaction type: 
scaling factor: 25
query mode: simple
number of clients: 64
number of threads: 8
duration: 300 s
number of transactions actually processed: 14336
latency average = 1342.222 ms
latency stddev = 3043.759 ms
tps = 47.383887 (including connections establishing)
tps = 47.385513 (excluding connections establishing)
-- waiting 0s... (always)
2017.03.09 10:27:56
-- getting md5 (cb)
6972 a,b,t,h:  250 25250  14336  ee0f7bfd9 960d7d79c 3e8af1e9e cd2bd0395 master
6973 a,b,t,h:  250 25250  14336  ee0f7bfd9 960d7d79c 3e8af1e9e cd2bd0395 replica ok  578113f12
2017.03.09 10:29:18
-- All is well.
-- 0 seconds total.  scale 25  clients 64  -T 300
-- waiting 20s, then end-cleaning
clean-at-end-call
sub_count -ne 0 : deleting sub1 (plain)
ERROR:  could not drop the replication slot "sub1" on publisher
DETAIL:  The error was: ERROR:  replication slot "sub1" is active for PID 10569
sub_count -ne 0 : deleting sub1 (nodrop)
pub_count -ne 0 - deleting pub1
pub_repl_slot_count -ne 0 - deleting (sub1)
ERROR:  replication slot "sub1" is active for PID 10569

 pub_count  0
   pub_repl_slot_count  1
 sub_count  0
   sub_repl_slot_count  0

-- imperfect cleanup, pg_waldump to unclean.20170309_1021.txt.bz2, waiting 60 
s, then exit
-- testset.sh: waiting 10s...

= iteration 2 --   2 of 10 =

-- scale 25   clients 64   duration 300   CLEAN_ONLY=

-- hostname: barzoi
-- timestamp: 20170309_1021
-- master_start_time 2017-03-08 12:04:02.127127+01 replica_start_time 
2017-03-08 12:04:02.12713+01
-- master  patch-md5 [59c92165d4a328d68450ef0e922c0a42]
-- replica patch-md5 [59c92165d4a328d68450ef0e922c0a42] (ok)
-- synchronous_commit, master  [on]  replica [off]
-- master_assert  [on]  replica_assert [on]
-- self md5 87554cfed7cda67ad292b6481e1b8b41 ./pgbench_derail2.sh
clean-at-start-call
pub_repl_slot_count -ne 0 - deleting (sub1)
 pg_drop_replication_slot 
--
 
(1 row)

creating tables...
1596800 of 2500000 tuples (63%) done (elapsed 5.09 s, remaining 2.88 s)
2500000 of 2500000 tuples (100%) done (elapsed 7.88 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);
alter subscription sub1 enable;
-- pgbench -c 64 -j 8 -T 300 -P 60 -n   --  scale 25
progress: 60.0 s, 129.0 tps, lat 493.130 ms stddev 635.654
progress: 120.0 s, 34.0 tps, lat 1679.787 ms stddev 3808.253

Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Erik Rijkers

On 2017-03-09 11:06, Erik Rijkers wrote:



File name:
logrep.20170309_1021.1.1043.scale_25.clients_64.NOK.log

20170309_1021 is the start-time of the script
1  is master (2 is replica)
1043 is the time, 10:43, just before the pg_waldump call


Sorry, that might be confusing.  That 10:43 is the time when the script 
renames and copies the logfiles (not the waldump).



I meant to show the name of the waldump file:

waldump.20170309_1021_1039.1.5.000100270069.txt.bz2
where:

20170309_1021 is the start-time of the script
1  is master (2 is replica)
5 is wait-state cycles during which all 8 md5s remained the same
1039 is the time, 10:39, just before the pg_waldump call



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-08 Thread Petr Jelinek
On 07/03/17 23:30, Erik Rijkers wrote:
> On 2017-03-06 11:27, Petr Jelinek wrote:
> 
>> 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
>> 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
>> 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
>> 0005-Skip-unnecessary-snapshot-builds.patch+
>> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> I use three different machines (2 desktop, 1 server) to test logical
> replication, and all three have now at least once failed to correctly
> synchronise a pgbench session (amidst many successful runs, of course)
> 
> I attach an output-file from the test-program, with the 2 logfiles
> (master+replica) of the failed run.  The outputfile
> (out_20170307_1613.txt) contains the output of 5 runs of
> pgbench_derail2.sh.  The first run failed, the next 4 were ok.
> 
> But that's probably not very useful; perhaps pg_waldump is more useful?
> From what moment, or leading up to what moment, or period, is a
> pg_waldump(s) useful?  I can run it from the script, repeatedly, and
> only keep the dumped files when things go awry. Would that make sense?

Hi,

yes waldump would be useful, the last segment should be enough, but
possibly all segments mentioned in the log.

The other useful thing would be to turn on log_connections and
log_replication_commands.
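For anyone reproducing this: those two settings are ordinary postgresql.conf 
(or ALTER SYSTEM) parameters, to be enabled on both nodes; a minimal fragment 
might look like this (everything else left at its defaults):

```
# postgresql.conf fragment -- both publisher and subscriber
log_connections = on
log_replication_commands = on
```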

And finally if you could dump the contents of pg_subscription_rel,
pg_replication_origin_status on subscriber and pg_replication_slots on
publisher at the end of the failed run that would also help.
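The three dumps requested above are plain catalog/view queries; a sketch, 
assuming the port layout used in Erik's scripts (6972 = publisher, 
6973 = subscriber) as seen earlier in the thread:

```sql
-- on the subscriber (port 6973)
SELECT * FROM pg_subscription_rel;
SELECT * FROM pg_replication_origin_status;

-- on the publisher (port 6972)
SELECT * FROM pg_replication_slots;
```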

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-07 Thread Erik Rijkers

On 2017-03-06 11:27, Petr Jelinek wrote:


0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
0005-Skip-unnecessary-snapshot-builds.patch+
0001-Logical-replication-support-for-initial-data-copy-v6.patch


I use three different machines (2 desktop, 1 server) to test logical 
replication, and all three have now at least once failed to correctly 
synchronise a pgbench session (amidst many successful runs, of course)


I attach an output-file from the test-program, with the 2 logfiles 
(master+replica) of the failed run.  The outputfile 
(out_20170307_1613.txt) contains the output of 5 runs of 
pgbench_derail2.sh.  The first run failed, the next 4 were ok.


But that's probably not very useful; perhaps pg_waldump is more useful?  
From what moment, or leading up to what moment, or period, is a 
pg_waldump(s) useful?  I can run it from the script, repeatedly, and 
only keep the dumped files when things go awry. Would that make sense?


Any other ideas welcome.


thanks,

Erik Rijkers





20170307_1613.tar.bz2
Description: BZip2 compressed data



Re: [HACKERS] Logical replication existing data copy

2017-03-07 Thread Robert Haas
On Tue, Mar 7, 2017 at 11:39 AM, Peter Eisentraut
 wrote:
> On 3/4/17 01:46, Robert Haas wrote:
>>> So I think we should do it, but it needs to be configurable, my original
>>> patch added GUC for it, Peter wanted it to be configurable per
>>> subscription. I guess we could add it as another option to the list of
>>> WITH (...) options for CREATE and ALTER SUBSCRIPTION.
>>
>> I don't have a terribly well-considered opinion on this point just
>> yet, but my initial hunch is that Peter has the right idea.
>
> Basically, we need to have some way of setting this that makes sense in
> the global scheme of things.  We don't want a situation where "sometimes
> it does this, sometimes it does that".  I'm not set on any specific
> mechanism.

I think we are on the same page.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication existing data copy

2017-03-07 Thread Peter Eisentraut
On 3/4/17 01:46, Robert Haas wrote:
>> So I think we should do it, but it needs to be configurable, my original
>> patch added GUC for it, Peter wanted it to be configurable per
>> subscription. I guess we could add it as another option to the list of
>> WITH (...) options for CREATE and ALTER SUBSCRIPTION.
> 
> I don't have a terribly well-considered opinion on this point just
> yet, but my initial hunch is that Peter has the right idea.

Basically, we need to have some way of setting this that makes sense in
the global scheme of things.  We don't want a situation where "sometimes
it does this, sometimes it does that".  I'm not set on any specific
mechanism.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-06 Thread Petr Jelinek
On 06/03/17 23:40, Erik Rijkers wrote:
> On 2017-03-06 16:10, Erik Rijkers wrote:
>> On 2017-03-06 11:27, Petr Jelinek wrote:
>>> Hi,
>>>
>>> updated and rebased version of the patch attached.
>>>
>>
>> I compiled with /only/ this one latest patch:
>>0001-Logical-replication-support-for-initial-data-copy-v6.patch
>>
>> Is that correct, or are other patches still needed on top, or underneath?
>>
> 
> TWIMC, I'll answer my own question: the correct patchset seems to be
> these six:
> 
> 0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
> 0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
> 0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> 0005-Skip-unnecessary-snapshot-builds.patch
> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 

Yes the 0004 and 0005 improve performance of snapshot creation
considerably. They're also the reason we uncovered all the snapbuilder
bugs now. Before the performance optimization it was very unlikely for
some of the race conditions to happen as the snapshot would just not be
built until there was less concurrent traffic on the server.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-06 Thread Erik Rijkers

On 2017-03-06 16:10, Erik Rijkers wrote:

On 2017-03-06 11:27, Petr Jelinek wrote:

Hi,

updated and rebased version of the patch attached.



I compiled with /only/ this one latest patch:
   0001-Logical-replication-support-for-initial-data-copy-v6.patch

Is that correct, or are other patches still needed on top, or 
underneath?




TWIMC, I'll answer my own question: the correct patchset seems to be 
these six:


0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

These compile, make check, and install fine.  make check-world is also 
without errors.


Logical replication tests are now running again (no errors yet); they'll 
have to run for a few hours with varying parameters to gain some 
confidence but it's looking good for the moment.



Erik Rijkers









Re: [HACKERS] Logical replication existing data copy

2017-03-06 Thread Erik Rijkers

On 2017-03-06 11:27, Petr Jelinek wrote:

Hi,

updated and rebased version of the patch attached.



I compiled with /only/ this one latest patch:
   0001-Logical-replication-support-for-initial-data-copy-v6.patch

Is that correct, or are other patches still needed on top, or 
underneath?


Anyway, with that one patch, and even after
  alter role ... set synchronous_commit = off;
the process is very slow. (sufficiently slow that I haven't
had the patience to see it to completion yet)

What am I doing wrong?

thanks,

Erik Rijkers




Re: [HACKERS] Logical replication existing data copy

2017-03-03 Thread Robert Haas
On Sat, Mar 4, 2017 at 11:41 AM, Petr Jelinek
 wrote:
> On 04/03/17 06:46, Robert Haas wrote:
>> On Tue, Feb 28, 2017 at 12:08 PM, Erik Rijkers  wrote:
>>> Would you remind me why synchronous_commit = on was deemed a better default?
>> I'm wondering about that, too.  If you're trying to do logical
>> synchronous replication, then maybe there's some argument there,
>> although even in that case I am not sure it's actually necessary.  But
>> if you're doing asynchronous logical replication, it seems not to make
>> much sense.  I mean, walwriter is going to flush the WAL to disk
>> within a fraction of a second; why would we wait for that to happen
>> instead of getting on with replicating the next transaction meanwhile?
>>
>
> I do agree. And even when doing synchronous replication it should not be
> problem. We do have code already in place which takes care of reporting
> correct write/flush position in the situation where upstream does
> syncrep but downstream has synccommit off.
>
> The only anomaly I could think of in terms of having sync commit off on
> downstream is that some replicated data that were visible will disappear
> after a crash only to reappear again once replication catches up to
> where it has been before when doing asynchronous replication. That does
> not strike me like good enough reason to have by default something like
> order of magnitude worse performance.

I see.  That's a good reason to have an option, but I agree with your
conclusion.

> So I think we should do it, but it needs to be configurable, my original
> patch added GUC for it, Peter wanted it to be configurable per
> subscription. I guess we could add it as another option to the list of
> WITH (...) options for CREATE and ALTER SUBSCRIPTION.

I don't have a terribly well-considered opinion on this point just
yet, but my initial hunch is that Peter has the right idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication existing data copy

2017-03-03 Thread Petr Jelinek
On 04/03/17 06:46, Robert Haas wrote:
> On Tue, Feb 28, 2017 at 12:08 PM, Erik Rijkers  wrote:
>> Would you remind me why synchronous_commit = on was deemed a better default?
> 
> I'm wondering about that, too.  If you're trying to do logical
> synchronous replication, then maybe there's some argument there,
> although even in that case I am not sure it's actually necessary.  But
> if you're doing asynchronous logical replication, it seems not to make
> much sense.  I mean, walwriter is going to flush the WAL to disk
> within a fraction of a second; why would we wait for that to happen
> instead of getting on with replicating the next transaction meanwhile?
> 

I do agree. And even when doing synchronous replication it should not be
problem. We do have code already in place which takes care of reporting
correct write/flush position in the situation where upstream does
syncrep but downstream has synccommit off.

The only anomaly I could think of in terms of having sync commit off on
downstream is that some replicated data that were visible will disappear
after a crash only to reappear again once replication catches up to
where it has been before when doing asynchronous replication. That does
not strike me like good enough reason to have by default something like
order of magnitude worse performance.

So I think we should do it, but it needs to be configurable, my original
patch added GUC for it, Peter wanted it to be configurable per
subscription. I guess we could add it as another option to the list of
WITH (...) options for CREATE and ALTER SUBSCRIPTION.
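As a hedged sketch only (no such option exists in the patch set at this 
point; the option name and placement are assumptions), the per-subscription 
knob discussed here might end up looking like:

```sql
CREATE SUBSCRIPTION sub1
       CONNECTION 'port=6972'
       PUBLICATION pub1
       WITH (synchronous_commit = off);  -- hypothetical option

ALTER SUBSCRIPTION sub1 SET (synchronous_commit = on);  -- hypothetical option
```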

I am not going to write the patch immediately as I'd like to get the
initial data copy working correctly first (and I've sunk a lot of time
over last couple of weeks into investigation and fixing of various
existing bugs in the snapshot builder so it's taking longer than I
expected). But the patch for this is trivial anyway. Agreement is the
what we need first.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-03 Thread Robert Haas
On Tue, Feb 28, 2017 at 12:08 PM, Erik Rijkers  wrote:
> Would you remind me why synchronous_commit = on was deemed a better default?

I'm wondering about that, too.  If you're trying to do logical
synchronous replication, then maybe there's some argument there,
although even in that case I am not sure it's actually necessary.  But
if you're doing asynchronous logical replication, it seems not to make
much sense.  I mean, walwriter is going to flush the WAL to disk
within a fraction of a second; why would we wait for that to happen
instead of getting on with replicating the next transaction meanwhile?

(There may well be an aspect to this I'm missing, so please forgive me
if the above is off-base.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication existing data copy

2017-03-01 Thread Petr Jelinek
On 28/02/17 20:36, Erik Rijkers wrote:
> This is the most frequent error that happens while doing pgbench-runs
> over logical replication: I run it continuously all day, and every few
> hours an error occurs of the kind seen below: a table (pgbench_history,
> mostly) ends up 1 row short (673466 instead of 673467).  I have the
> script wait a long time before calling it an error (because in theory it
> could still 'finish', and end successfully (although that has not
> happened yet, once the system got into this state).
> 

Yeah it's unlikely if it's just one row. It suggests there might still
be some snapbuild issue, but I don't immediately see one and I run
couple thousand repeats of the test without getting any inconsistency.
Will continue digging.

> 
> I gathered some info in this (probably deadlocked) state in the hope
> there is something suspicious in there:
> 

Hmm that didn't really reveal much :(


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-02-28 Thread Erik Rijkers

On 2017-02-28 07:38, Erik Rijkers wrote:

On 2017-02-27 15:08, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch                  +
0002-Fix-after-trigger-execution-in-logical-replication.patch                +
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch              +
snapbuild-v4-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch  +
snapbuild-v4-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch +
snapbuild-v4-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch   +
snapbuild-v4-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch       +
snapbuild-v4-0005-Skip-unnecessary-snapshot-builds.patch                     +
0001-Logical-replication-support-for-initial-data-copy-v6.patch


This is the most frequent error that happens while doing pgbench-runs 
over logical replication: I run it continuously all day, and every few 
hours an error occurs of the kind seen below: a table (pgbench_history, 
mostly) ends up 1 row short (673466 instead of 673467).  I have the 
script wait a long time before calling it an error (because in theory it 
could still 'finish', and end successfully (although that has not 
happened yet, once the system got into this state).


-- pgbench -c 16 -j 8 -T 120 -P 24 -n -M simple  --  scale 25

[...]
6972 a,b,t,h:  250 25250 673467   e53236c09 643235708 f952814c3 559d618cd   master
6973 a,b,t,h:  250 25250 673466   e53236c09 643235708 f952814c3 4b09337e3   replica NOK   a22fb00a6

-- wait another 5 s   (total 20 s) (unchanged 1)
-- getting md5 (cb)
6972 a,b,t,h:  250 25250 673467   e53236c09 643235708 f952814c3 559d618cd   master
6973 a,b,t,h:  250 25250 673466   e53236c09 643235708 f952814c3 4b09337e3   replica NOK   a22fb00a6

-- wait another 5 s   (total 25 s) (unchanged 2)
-- getting md5 (cb)
6972 a,b,t,h:  250 25250 673467   e53236c09 643235708 f952814c3 559d618cd   master
6973 a,b,t,h:  250 25250 673466   e53236c09 643235708 f952814c3 4b09337e3   replica NOK   a22fb00a6

-- wait another 5 s   (total 30 s) (unchanged 3)
-- getting md5 (cb)
6972 a,b,t,h:  250 25250 673467   e53236c09 643235708 f952814c3 559d618cd   master
6973 a,b,t,h:  250 25250 673466   e53236c09 643235708 f952814c3 4b09337e3   replica NOK   a22fb00a6

-- wait another 5 s   (total 35 s) (unchanged 4)


I gathered some info in this (probably deadlocked) state in the hope 
there is something suspicious in there:



UID         PID   PPID  C STIME TTY      STAT   TIME CMD
rijkers   71203      1  0 20:06 pts/57   S      0:00 postgres -D /var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/data -p 6973 -c wal_level=replica [...]
rijkers   71214  71203  0 20:06 ?        Ss     0:00  \_ postgres: logger process
rijkers   71216  71203  0 20:06 ?        Ss     0:00  \_ postgres: checkpointer process
rijkers   71217  71203  0 20:06 ?        Ss     0:00  \_ postgres: writer process
rijkers   71218  71203  0 20:06 ?        Ss     0:00  \_ postgres: wal writer process
rijkers   71219  71203  0 20:06 ?        Ss     0:00  \_ postgres: autovacuum launcher process
rijkers   71220  71203  0 20:06 ?        Ss     0:00  \_ postgres: stats collector process
rijkers   71221  71203  0 20:06 ?        Ss     0:00  \_ postgres: bgworker: logical replication launcher
rijkers   71222  71203  0 20:06 ?        Ss     0:00  \_ postgres: bgworker: logical replication worker 30042

rijkers   71201      1  0 20:06 pts/57   S      0:00 postgres -D /var/data1/pg_stuff/pg_installations/pgsql.logical_replication/data -p 6972 -c wal_level=logical [...]
rijkers   71206  71201  0 20:06 ?        Ss     0:00  \_ postgres: logger process
rijkers   71208  71201  0 20:06 ?        Ss     0:00  \_ postgres: checkpointer process
rijkers   71209  71201  0 20:06 ?        Ss     0:00  \_ postgres: writer process
rijkers   71210  71201  0 20:06 ?        Ss     0:00  \_ postgres: wal writer process
rijkers   71211  71201  0 20:06 ?        Ss     0:00  \_ postgres: autovacuum launcher process
rijkers   71212  71201  0 20:06 ?        Ss     0:00  \_ postgres: stats collector process
rijkers   71213  71201  0 20:06 ?        Ss     0:00  \_ postgres: bgworker: logical replication launcher
rijkers   71223  71201  0 20:06 ?        Ss     0:00  \_ postgres: wal sender process rijkers [local] idle





-- replica:
 port | shared_buffers | work_mem | m_w_m | e_c_s
--++--+---+---
 6973 | 100MB  | 50MB | 2GB   | 64GB
(1 row)

select  current_setting('port') as port
, datname   as db
,  to_char(pg_database_size(datname), '9G999G999G999G999')
 || ' (' ||  pg_size_pretty(pg_database_size(datname)) || ')' as 
dbsize

, pid
, application_name  as app
, xact_start
, query_start
, regexp_replace( cast(now() - query_start as text), 
E'\.[[:digit:]]+\$', '')   

Re: [HACKERS] Logical replication existing data copy

2017-02-27 Thread Erik Rijkers

On 2017-02-27 15:08, Petr Jelinek wrote:


The performance was why in original patch I wanted the apply process to
default to synchronous_commit = off as without it the apply performance
(due to applying transactions individually and in sequences) is quite
lackluster.

It can be worked around using user that has synchronous_commit = off 
set

via ALTER ROLE as owner of the subscription.



Wow, that's a huge difference in speed.

I set
   ALTER ROLE aardvark SET synchronous_commit = off;

during the first iteration of a 10x pgbench-test (so the first was still 
done with it 'on'):

here the pertinent grep | uniq -c lines:

-- out_20170228_0004.txt
 10 -- pgbench -c 16 -j 8 -T 900 -P 180 -n   --  scale 25
 10 -- All is well.
  1 -- 1325 seconds total.
  9 -- 5 seconds total.

And that 5 seconds is a hardcoded wait; so it's probably even quicker.

This is a slowish machine, but that's a really spectacular difference.  
It's the difference between keeping up and getting lost.


Would you remind me why synchronous_commit = on was deemed a better 
default?  This thread isn't very clear about it (not the 'logical 
replication WIP' thread).



thanks,

Erik Rijkers




Re: [HACKERS] Logical replication existing data copy

2017-02-27 Thread Petr Jelinek
On 27/02/17 11:07, Erik Rijkers wrote:
> With these patches:
> 
> -- 0416d87c-09a5-182e-4901-236aec103...@2ndquadrant.com
>Subject: Re: Logical Replication WIP
>   48.
> https://www.postgresql.org/message-id/attachment/49886/0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
> 
>   49.
> https://www.postgresql.org/message-id/attachment/49887/0002-Fix-after-trigger-execution-in-logical-replication.patch
> 
>   50.
> https://www.postgresql.org/message-id/attachment/49888/0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
> 
> 
> -- 51f65289-54f8-2256-d107-937d662d6...@2ndquadrant.com
>Subject: Re: snapbuild woes
>   48.
> https://www.postgresql.org/message-id/attachment/49995/snapbuild-v4-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
> 
>   49.
> https://www.postgresql.org/message-id/attachment/49996/snapbuild-v4-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
> 
>   50.
> https://www.postgresql.org/message-id/attachment/49997/snapbuild-v4-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
> 
>   51.
> https://www.postgresql.org/message-id/attachment/49998/snapbuild-v4-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> 
>   52.
> https://www.postgresql.org/message-id/attachment/4/snapbuild-v4-0005-Skip-unnecessary-snapshot-builds.patch
> 
> 
> -- c0f90176-efff-0770-1e79-0249fb4b9...@2ndquadrant.com
>Subject: Re: Logical replication existing data copy
>   48.
> https://www.postgresql.org/message-id/attachment/49977/0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> 
> 
> logical replication now seems pretty stable, at least for the limited
> testcase that I am using.  I've done dozens of pgbench_derail2.sh runs
> without failure.  I am now changing the pgbench-test to larger scale
> (pgbench -is) and longer periods (-T) which makes running the test slow
> (both instances are running on a modest desktop with a single 7200 rpm
> disk).  It is quite a bit slower than I expected (a 5-minute pgbench
> scale 5, with 8 clients, takes, after it has finished on master, another
> 2-3 minutes to get synced on the replica).  I suppose it's just a
> hardware limitation.  I set max_sync_workers_per_subscription to 6 (from
> default 2) but it doesn't help much (at all).
> 
> To be continued...
> 

Thanks,

The performance was why in original patch I wanted the apply process to
default to synchronous_commit = off as without it the apply performance
(due to applying transactions individually and in sequences) is quite
lackluster.

It can be worked around using user that has synchronous_commit = off set
via ALTER ROLE as owner of the subscription.
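Spelled out, the workaround is a one-liner on the subscriber; the role name 
here is illustrative, standing in for whichever role owns the subscription:

```sql
-- run on the subscriber; 'sub_owner' is a placeholder role name
ALTER ROLE sub_owner SET synchronous_commit = off;
-- the apply worker runs as the subscription owner,
-- so its sessions pick this setting up on (re)start
```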

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-02-27 Thread Erik Rijkers

With these patches:

-- 0416d87c-09a5-182e-4901-236aec103...@2ndquadrant.com
   Subject: Re: Logical Replication WIP
  48. 
https://www.postgresql.org/message-id/attachment/49886/0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
  49. 
https://www.postgresql.org/message-id/attachment/49887/0002-Fix-after-trigger-execution-in-logical-replication.patch
  50. 
https://www.postgresql.org/message-id/attachment/49888/0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch


-- 51f65289-54f8-2256-d107-937d662d6...@2ndquadrant.com
   Subject: Re: snapbuild woes
  48. 
https://www.postgresql.org/message-id/attachment/49995/snapbuild-v4-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
  49. 
https://www.postgresql.org/message-id/attachment/49996/snapbuild-v4-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
  50. 
https://www.postgresql.org/message-id/attachment/49997/snapbuild-v4-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
  51. 
https://www.postgresql.org/message-id/attachment/49998/snapbuild-v4-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
  52. 
https://www.postgresql.org/message-id/attachment/4/snapbuild-v4-0005-Skip-unnecessary-snapshot-builds.patch


-- c0f90176-efff-0770-1e79-0249fb4b9...@2ndquadrant.com
   Subject: Re: Logical replication existing data copy
  48. 
https://www.postgresql.org/message-id/attachment/49977/0001-Logical-replication-support-for-initial-data-copy-v6.patch



logical replication now seems pretty stable, at least for the limited 
testcase that I am using.  I've done dozens of pgbench_derail2.sh runs 
without failure.  I am now changing the pgbench-test to larger scale 
(pgbench -is) and longer periods (-T) which makes running the test slow 
(both instances are running on a modest desktop with a single 7200 rpm 
disk).  It is quite a bit slower than I expected (a 5-minute pgbench 
scale 5, with 8 clients, takes, after it has finished on master, another 
2-3 minutes to get synced on the replica).  I suppose it's just a 
hardware limitation.  I set max_sync_workers_per_subscription to 6 (from 
default 2) but it doesn't help much (at all).


To be continued...


Thanks,

Erik Rijkers




Re: [HACKERS] Logical replication existing data copy

2017-02-26 Thread Erik Rijkers

On 2017-02-26 10:53, Erik Rijkers wrote:


Not yet perfect, but we're getting there...


Sorry, I made a mistake: I was running the newest patches on master but 
the older versions on the replica (more precisely: I didn't properly 
shut down the replica, so the older version remained up and running during 
subsequent testing).


So my last email mentioning the 'DROP SUBSCRIPTION' hang is hopefully 
wrong.


I'll get back when I've repeated these tests. This will take some hours 
(at least).


Sorry to cause you these palpitations, perhaps unnecessarily...


Erik Rijkers







Re: [HACKERS] Logical replication existing data copy

2017-02-26 Thread Erik Rijkers

On 2017-02-26 01:45, Petr Jelinek wrote:

Again, much better... :

-- out_20170226_0724.txt
 25 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0751.txt
 25 -- pgbench -c 4 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0819.txt
 25 -- pgbench -c 8 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0844.txt
 25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0912.txt
 25 -- pgbench -c 32 -j 8 -T 10 -P 5 -n
 25 -- All is well.
-- out_20170226_0944.txt
 25 -- scale  5 clients  1   INIT_WAIT  0CLEAN_ONLY=
 25 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
 25 -- All is well.

 but not perfect: with the next scale up (pgbench scale 25) I got:

-- out_20170226_1001.txt
  3 -- scale  25 clients  1   INIT_WAIT  0CLEAN_ONLY=
  3 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
  2 -- All is well.
  1 -- Not good, but breaking out of wait (waited more than 60s)

It looks like something got stuck at DROP SUBSCRIPTION again which, I 
think, derives from this line:

echo "drop subscription if exists sub1"  | psql -qXp $port2

I don't know exactly what is useful/useless to report; below is the 
state of some tables/views (note that this is from 31 minutes after the 
fact (see 'duration' in the first query)), and a backtrace :



$ ./view.sh
select current_setting('port') as port;
 port
--
 6973
(1 row)

select
  rpad(now()::text,19) as now
, pid   as pid
, application_name  as app
, state as state
, wait_eventas wt_evt
, wait_event_type   as wt_evt_type
, date_trunc('second', query_start::timestamp)  as query_start
, substring((now() - query_start)::text, 1, position('.' in (now() - 
query_start)::text)-1) as duration

, query
from pg_stat_activity
where query !~ 'pg_stat_activity'
;
 now |  pid  | app   
  | state  | wt_evt | wt_evt_type | query_start 
| duration |  query

-+---+-+++-+-+--+--
 2017-02-26 10:42:43 | 28232 | logical replication worker 31929  
  | active | relation   | Lock| 
|  |
 2017-02-26 10:42:43 | 28237 | logical replication worker 31929 sync 
31906 || LogicalSyncStateChange | IPC |  
   |  |
 2017-02-26 10:42:43 | 28242 | logical replication worker 31929 sync 
31909 || transactionid  | Lock|  
   |  |
 2017-02-26 10:42:43 | 32023 | psql  
  | active | BgWorkerShutdown   | IPC | 2017-02-26 10:10:52 
| 00:31:51 | drop subscription if exists sub1

(4 rows)

select * from pg_stat_replication;
 pid | usesysid | usename | application_name | client_addr | 
client_hostname | client_port | backend_start | backend_xmin | state | 
sent_location | write_location | flush_location | replay_location | 
sync_priority | sync_state

-+--+-+--+-+-+-+---+--+---+---+++-+---+
(0 rows)

select * from pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |  
last_msg_send_time   | last_msg_receipt_time | 
latest_end_lsn |latest_end_time

---+-+---+---+--+---+---++---
 31929 | sub1| 28242 | 31909 |  | 2017-02-26 
10:07:05.723093+01 | 2017-02-26 10:07:05.723093+01 || 
2017-02-26 10:07:05.723093+01
 31929 | sub1| 28237 | 31906 |  | 2017-02-26 
10:07:04.721229+01 | 2017-02-26 10:07:04.721229+01 || 
2017-02-26 10:07:04.721229+01
 31929 | sub1| 28232 |   | 1/73497468   |
   | 2017-02-26 10:07:47.781883+01 | 1/59A73EF8 | 2017-02-26 
10:07:04.720595+01

(3 rows)

select * from pg_subscription;
 subdbid | subname | subowner | subenabled | subconninfo | subslotname | 
subpublications

-+-+--++-+-+-
   16384 | sub1|   10 | t  | port=6972   | sub1| 
{pub1}

(1 row)

select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate |  srsublsn
-+-++
   31929 |   31912 | i  |
   31929 |   31917 | i  |
   31929 |   31909 | d  |
   31929 |   31906 | w  | 1/73498F90
(4 rows)

Dunno if a backtrace is is useful

$ gdb -pid 32023  (from the DROP SUBSCRIPTION 

Re: [HACKERS] Logical replication existing data copy

2017-02-25 Thread Petr Jelinek
On 25/02/17 09:40, Erik Rijkers wrote:
> On 2017-02-25 00:40, Petr Jelinek wrote:
> 
>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>> snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
>>
>> snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>>
>> snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>> snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch
>> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 
> Here are some results. There is improvement although it's not an
> unqualified success.
> 

Yep, I found yet another bug in snapbuild (see the relevant thread). I
hope with the updated patches from there (now 5 of them) it should
finally work correctly.

> I want to repeat what I said a few emails back: problems seem to
> disappear when a short wait state is introduced (directly after the
> 'alter subscription sub1 enable' line) to give the logrep machinery time
> to 'settle'. It makes one think of a timing error somewhere (now don't
> ask me where..).

I think that's because if you sleep, the COPY finishes in meantime and
so there is no concurrency with pgbench when the initial snapshot is
being taken so that's not solution for production.

> 
> (By the way, no hanged sessions so far, so that's good)
> 

Great.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-25 Thread Erik Rijkers

On 2017-02-25 00:40, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch


Here are some results. There is improvement although it's not an 
unqualified success.


Several repeat-runs of pgbench_derail2.sh, with different parameters for 
number-of-client yielded an output file each.


Those show that logrep is now pretty stable when there is only 1 client 
(pgbench -c 1).  But it starts making mistakes with 4, 8, 16 clients.  
I'll just show a grep of the output files; I think it is 
self-explicatory:


Output-files (lines counted with  grep | sort | uniq -c):

-- out_20170225_0129.txt
250 -- pgbench -c 1 -j 8 -T 10 -P 5 -n
250 -- All is well.

-- out_20170225_0654.txt
 25 -- pgbench -c 4 -j 8 -T 10 -P 5 -n
 24 -- All is well.
  1 -- Not good, but breaking out of wait (waited more than 60s)

-- out_20170225_0711.txt
 25 -- pgbench -c 8 -j 8 -T 10 -P 5 -n
 23 -- All is well.
  2 -- Not good, but breaking out of wait (waited more than 60s)

-- out_20170225_0803.txt
 25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
 11 -- All is well.
 14 -- Not good, but breaking out of wait (waited more than 60s)

So, that says:
1 clients: 250x success, zero fail (250 not a typo, ran this overnight)
4 clients: 24x success, 1 fail
8 clients: 23x success, 2 fail
16 clients: 11x success, 14 fail

I want to repeat what I said a few emails back: problems seem to 
disappear when a short wait state is introduced (directly after the 
'alter subscription sub1 enable' line) to give the logrep machinery time 
to 'settle'. It makes one think of a timing error somewhere (now don't 
ask me where..).


To show that, here is pgbench_derail2.sh output that waited 10 seconds 
(INIT_WAIT in the script) as such a 'settle' period works faultless 
(with 16 clients):


-- out_20170225_0852.txt
 25 -- pgbench -c 16 -j 8 -T 10 -P 5 -n
 25 -- All is well.

QED.

(By the way, no hanged sessions so far, so that's good)


thanks

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-24 Thread Petr Jelinek
On 25/02/17 00:39, Erik Rijkers wrote:
> On 2017-02-25 00:08, Petr Jelinek wrote:
>>
>> There is now a lot of fixes for existing code that this patch depends
>> on. Hopefully some of the fixes get committed soonish.
> 
> Indeed - could you look over the below list of 8 patches; is it correct
> and in the right (apply) order?
> 
> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
> 0002-Fix-after-trigger-execution-in-logical-replication.patch
> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
> snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
> snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
> 
> snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
> snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch
> 0001-Logical-replication-support-for-initial-data-copy-v6.patch
> 

Yes that should be correct.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-24 Thread Erik Rijkers

On 2017-02-25 00:08, Petr Jelinek wrote:


There is now a lot of fixes for existing code that this patch depends
on. Hopefully some of the fixes get committed soonish.


Indeed - could you look over the below list of 8 patches; is it correct 
and in the right (apply) order?


0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

(they do apply & compile like this...)







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-24 Thread Petr Jelinek
On 25/02/17 00:05, Erik Rijkers wrote:
> On 2017-02-24 22:58, Petr Jelinek wrote:
>> On 23/02/17 01:41, Petr Jelinek wrote:
>>> On 23/02/17 01:02, Erik Rijkers wrote:
 On 2017-02-22 18:13, Erik Rijkers wrote:
> On 2017-02-22 14:48, Erik Rijkers wrote:
>> On 2017-02-22 13:03, Petr Jelinek wrote:
>>
>>> 0001-Skip-unnecessary-snapshot-builds.patch
>>> 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
>>> 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
>>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
>>> 0002-Fix-after-trigger-execution-in-logical-replication.patch
>>> 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
>>> 0001-Logical-replication-support-for-initial-data-copy-v5.patch
>>
>> It works well now, or at least my particular test case seems now
>> solved.
>
> Cried victory too early, I'm afraid.
>

 I got into a 'hung' state while repeating  pgbench_derail2.sh.

 Below is some state.  I notice that master pg_stat_replication.syaye is
 'startup'.
 Maybe I should only start the test after that state has changed. Any
 of the
 other possible values (catchup, streaming) wuold be OK, I would think.

>>>
>>> I think that's known issue (see comment in tablesync.c about hanging
>>> forever). I think I may have fixed it locally.
>>>
>>> I will submit patch once I fixed the other snapshot issue (I managed to
>>> reproduce it as well, although very rarely so it's rather hard to test).
>>>
>>
>> Hi,
>>
>> Here it is. But check also the snapbuild related thread for updated
>> patches related to that (the issue you had with this not copying all
>> rows is yet another pre-existing Postgres bug).
>>
> 
> 
> The four earlier snapbuild patches apply cleanly, but
> then I get errors while  applying
> 0001-Logical-replication-support-for-initial-data-copy-v6.patch:
> 
> 
> patching file src/test/regress/expected/sanity_check.out
> (Stripping trailing CRs from patch.)
> patching file src/test/regress/expected/subscription.out
> Hunk #2 FAILED at 25.
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/test/regress/expected/subscription.out.rej
> (Stripping trailing CRs from patch.)
> patching file src/test/regress/sql/object_address.sql
> (Stripping trailing CRs from patch.)
> patching file src/test/regress/sql/subscription.sql
> (Stripping trailing CRs from patch.)
> patching file src/test/subscription/t/001_rep_changes.pl
> Hunk #9 succeeded at 175 with fuzz 2.
> Hunk #10 succeeded at 193 (offset -9 lines).
> (Stripping trailing CRs from patch.)
> patching file src/test/subscription/t/002_types.pl
> (Stripping trailing CRs from patch.)
> can't find file to patch at input line 4296
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --
> |diff --git a/src/test/subscription/t/003_constraints.pl
> b/src/test/subscription/t/003_constraints.pl
> |index 17d4565..9543b91 100644
> |--- a/src/test/subscription/t/003_constraints.pl
> |+++ b/src/test/subscription/t/003_constraints.pl
> --
> File to patch:
> 
> 
> Can you have a look?
> 

Hi,

I think you are missing the other fixes (this specific error says that
you didn't apply
0002-Fix-after-trigger-execution-in-logical-replication.patch).

There is now a lot of fixes for existing code that this patch depends
on. Hopefully some of the fixes get committed soonish.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-24 Thread Erik Rijkers

On 2017-02-24 22:58, Petr Jelinek wrote:

On 23/02/17 01:41, Petr Jelinek wrote:

On 23/02/17 01:02, Erik Rijkers wrote:

On 2017-02-22 18:13, Erik Rijkers wrote:

On 2017-02-22 14:48, Erik Rijkers wrote:

On 2017-02-22 13:03, Petr Jelinek wrote:


0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v5.patch


It works well now, or at least my particular test case seems now 
solved.


Cried victory too early, I'm afraid.



I got into a 'hung' state while repeating  pgbench_derail2.sh.

Below is some state.  I notice that master pg_stat_replication.syaye 
is

'startup'.
Maybe I should only start the test after that state has changed. Any 
of the
other possible values (catchup, streaming) wuold be OK, I would 
think.




I think that's known issue (see comment in tablesync.c about hanging
forever). I think I may have fixed it locally.

I will submit patch once I fixed the other snapshot issue (I managed 
to
reproduce it as well, although very rarely so it's rather hard to 
test).




Hi,

Here it is. But check also the snapbuild related thread for updated
patches related to that (the issue you had with this not copying all
rows is yet another pre-existing Postgres bug).




The four earlier snapbuild patches apply cleanly, but
then I get errors while  applying
0001-Logical-replication-support-for-initial-data-copy-v6.patch:


patching file src/test/regress/expected/sanity_check.out
(Stripping trailing CRs from patch.)
patching file src/test/regress/expected/subscription.out
Hunk #2 FAILED at 25.
1 out of 2 hunks FAILED -- saving rejects to file 
src/test/regress/expected/subscription.out.rej

(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/object_address.sql
(Stripping trailing CRs from patch.)
patching file src/test/regress/sql/subscription.sql
(Stripping trailing CRs from patch.)
patching file src/test/subscription/t/001_rep_changes.pl
Hunk #9 succeeded at 175 with fuzz 2.
Hunk #10 succeeded at 193 (offset -9 lines).
(Stripping trailing CRs from patch.)
patching file src/test/subscription/t/002_types.pl
(Stripping trailing CRs from patch.)
can't find file to patch at input line 4296
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git a/src/test/subscription/t/003_constraints.pl 
b/src/test/subscription/t/003_constraints.pl

|index 17d4565..9543b91 100644
|--- a/src/test/subscription/t/003_constraints.pl
|+++ b/src/test/subscription/t/003_constraints.pl
--
File to patch:


Can you have a look?

thanks,

Erik Rijkers



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-22 Thread Petr Jelinek
On 23/02/17 01:02, Erik Rijkers wrote:
> On 2017-02-22 18:13, Erik Rijkers wrote:
>> On 2017-02-22 14:48, Erik Rijkers wrote:
>>> On 2017-02-22 13:03, Petr Jelinek wrote:
>>>
 0001-Skip-unnecessary-snapshot-builds.patch
 0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
 0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
 0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
 0002-Fix-after-trigger-execution-in-logical-replication.patch
 0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
 0001-Logical-replication-support-for-initial-data-copy-v5.patch
>>>
>>> It works well now, or at least my particular test case seems now solved.
>>
>> Cried victory too early, I'm afraid.
>>
> 
> I got into a 'hung' state while repeating  pgbench_derail2.sh.
> 
> Below is some state.  I notice that master pg_stat_replication.syaye is
> 'startup'.
> Maybe I should only start the test after that state has changed. Any of the
> other possible values (catchup, streaming) wuold be OK, I would think.
> 

I think that's known issue (see comment in tablesync.c about hanging
forever). I think I may have fixed it locally.

I will submit patch once I fixed the other snapshot issue (I managed to
reproduce it as well, although very rarely so it's rather hard to test).

Thanks for all this testing btw, I really appreciate it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-22 Thread Erik Rijkers

On 2017-02-22 18:13, Erik Rijkers wrote:

On 2017-02-22 14:48, Erik Rijkers wrote:

On 2017-02-22 13:03, Petr Jelinek wrote:


0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v5.patch


It works well now, or at least my particular test case seems now 
solved.


Cried victory too early, I'm afraid.



I got into a 'hung' state while repeating  pgbench_derail2.sh.

Below is some state.  I notice that master pg_stat_replication.syaye is 
'startup'.
Maybe I should only start the test after that state has changed. Any of 
the

other possible values (catchup, streaming) wuold be OK, I would think.


$  ( dbactivity.sh ; echo "; table pg_subscription; table 
pg_subscription_rel;" ) | psql -qXp 6973
 now |  pid  | app   
  | state  | wt_evt | wt_evt_type | query_start 
| duration |  query

-+---+-+++-+-+--+--
 2017-02-23 00:37:57 | 31352 | logical replication worker 47435  
  | active | relation   | Lock| 
|  |
 2017-02-23 00:37:57 |   397 | psql  
  | active | BgWorkerShutdown   | IPC | 2017-02-23 00:22:14 
| 00:15:42 | drop subscription if exists sub1
 2017-02-23 00:37:57 | 31369 | logical replication worker 47435 sync 
47423 || LogicalSyncStateChange | IPC |  
   |  |
 2017-02-23 00:37:57 |   398 | logical replication worker 47435 sync 
47418 || transactionid  | Lock|  
   |  |

(4 rows)

 subdbid | subname | subowner | subenabled | subconninfo | subslotname | 
subpublications

-+-+--++-+-+-
   16384 | sub1|   10 | t  | port=6972   | sub1| 
{pub1}

(1 row)

 srsubid | srrelid | srsubstate |  srsublsn
-+-++
   47435 |   47423 | w  | 2/CB078260
   47435 |   47412 | r  |
   47435 |   47415 | r  |
   47435 |   47418 | c  | 2/CB06E158
(4 rows)


Replica (port 6973):

[bulldog aardvark] [local]:6973 (Thu) 00:52:47 [pid:5401] [testdb] # 
table pg_stat_subscription ;
 subid | subname |  pid  | relid | received_lsn |  
last_msg_send_time   | last_msg_receipt_time | 
latest_end_lsn |latest_end_time

---+-+---+---+--+---+---++---
 47435 | sub1| 31369 | 47423 |  | 2017-02-23 
00:20:45.758072+01 | 2017-02-23 00:20:45.758072+01 || 
2017-02-23 00:20:45.758072+01
 47435 | sub1|   398 | 47418 |  | 2017-02-23 
00:22:14.896471+01 | 2017-02-23 00:22:14.896471+01 || 
2017-02-23 00:22:14.896471+01
 47435 | sub1| 31352 |   | 2/CB06E158   |
   | 2017-02-23 00:20:47.034664+01 || 2017-02-23 
00:20:45.679245+01

(3 rows)


Master  (port 6972):

[bulldog aardvark] [local]:6972 (Thu) 00:48:27 [pid:5307] [testdb] # \x 
on \\ table pg_stat_replication ;

Expanded display is on.
-[ RECORD 1 ]+--
pid  | 399
usesysid | 10
usename  | aardvark
application_name | sub1_47435_sync_47418
client_addr  |
client_hostname  |
client_port  | -1
backend_start| 2017-02-23 00:22:14.902701+01
backend_xmin |
state| startup
sent_location|
write_location   |
flush_location   |
replay_location  |
sync_priority| 0
sync_state   | async
-[ RECORD 2 ]+--
pid  | 31371
usesysid | 10
usename  | aardvark
application_name | sub1_47435_sync_47423
client_addr  |
client_hostname  |
client_port  | -1
backend_start| 2017-02-23 00:20:45.762852+01
backend_xmin |
state| startup
sent_location|
write_location   |
flush_location   |
replay_location  |
sync_priority| 0
sync_state   | async



( above 'dbactivity.sh' is:

select
  rpad(now()::text,19) as now
, pid   as pid
, application_name  as app
, state as state
, wait_eventas wt_evt
, wait_event_type   as wt_evt_type
, date_trunc('second', query_start::timestamp)  as query_start
, substring((now() - query_start)::text, 1, position('.' in 

Re: [HACKERS] Logical replication existing data copy

2017-02-22 Thread Erik Rijkers

On 2017-02-22 13:03, Petr Jelinek wrote:


0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v5.patch


It works well now, or at least my particular test case seems now solved.

thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy - comments snapbuild.c

2017-02-19 Thread Erik Rijkers

On 2017-02-19 23:24, Erik Rijkers wrote:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve comment blocks in
  src/backend/replication/logical/snapbuild.c



[deep sigh...]  attached...--- src/backend/replication/logical/snapbuild.c.orig2	2017-02-19 17:25:57.237527107 +0100
+++ src/backend/replication/logical/snapbuild.c	2017-02-19 23:19:57.654946968 +0100
@@ -34,7 +34,7 @@
  * xid. That is we keep a list of transactions between snapshot->(xmin, xmax)
  * that we consider committed, everything else is considered aborted/in
  * progress. That also allows us not to care about subtransactions before they
- * have committed which means this modules, in contrast to HS, doesn't have to
+ * have committed which means this module, in contrast to HS, doesn't have to
  * care about suboverflowed subtransactions and similar.
  *
  * One complexity of doing this is that to e.g. handle mixed DDL/DML
@@ -82,7 +82,7 @@
  * Initially the machinery is in the START stage. When an xl_running_xacts
  * record is read that is sufficiently new (above the safe xmin horizon),
  * there's a state transition. If there were no running xacts when the
- * runnign_xacts record was generated, we'll directly go into CONSISTENT
+ * running_xacts record was generated, we'll directly go into CONSISTENT
  * state, otherwise we'll switch to the FULL_SNAPSHOT state. Having a full
  * snapshot means that all transactions that start henceforth can be decoded
  * in their entirety, but transactions that started previously can't. In
@@ -274,7 +274,7 @@
 /*
  * Allocate a new snapshot builder.
  *
- * xmin_horizon is the xid >=which we can be sure no catalog rows have been
+ * xmin_horizon is the xid >= which we can be sure no catalog rows have been
  * removed, start_lsn is the LSN >= we want to replay commits.
  */
 SnapBuild *
@@ -1642,7 +1642,7 @@
 	fsync_fname("pg_logical/snapshots", true);
 
 	/*
-	 * Now there's no way we can loose the dumped state anymore, remember this
+	 * Now there's no way we can lose the dumped state anymore, remember this
 	 * as a serialization point.
 	 */
 	builder->last_serialized_snapshot = lsn;
@@ -1858,7 +1858,7 @@
 	char		path[MAXPGPATH];
 
 	/*
-	 * We start of with a minimum of the last redo pointer. No new replication
+	 * We start off with a minimum of the last redo pointer. No new replication
 	 * slot will start before that, so that's a safe upper bound for removal.
 	 */
 	redo = GetRedoRecPtr();
@@ -1916,7 +1916,7 @@
 			/*
 			 * It's not particularly harmful, though strange, if we can't
 			 * remove the file here. Don't prevent the checkpoint from
-			 * completing, that'd be cure worse than the disease.
+			 * completing, that'd be a cure worse than the disease.
 			 */
 			if (unlink(path) < 0)
 			{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy - comments snapbuild.c

2017-02-19 Thread Erik Rijkers

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve comment blocks in
  src/backend/replication/logical/snapbuild.c



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy - comments origin.c

2017-02-19 Thread Erik Rijkers

On 2017-02-19 17:21, Erik Rijkers wrote:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve readability of comment blocks
in  src/backend/replication/logical/origin.c



now attached



thanks,

Erik Rijkers
--- src/backend/replication/logical/origin.c.orig	2017-02-19 16:45:28.558865304 +0100
+++ src/backend/replication/logical/origin.c	2017-02-19 17:11:09.034023021 +0100
@@ -11,31 +11,29 @@
  * NOTES
  *
  * This file provides the following:
- * * An infrastructure to name nodes in a replication setup
- * * A facility to efficiently store and persist replication progress in an
- *	 efficient and durable manner.
- *
- * Replication origin consist out of a descriptive, user defined, external
- * name and a short, thus space efficient, internal 2 byte one. This split
- * exists because replication origin have to be stored in WAL and shared
+ * * Infrastructure to name nodes in a replication setup
+ * * A facility to efficiently store and persist replication progress
+ *
+ * A replication origin has a descriptive, user defined, external
+ * name and a short, internal 2 byte one. This split
+ * exists because a replication origin has to be stored in WAL and shared
  * memory and long descriptors would be inefficient.  For now only use 2 bytes
  * for the internal id of a replication origin as it seems unlikely that there
- * soon will be more than 65k nodes in one replication setup; and using only
- * two bytes allow us to be more space efficient.
+ * soon will be more than 65k nodes in one replication setup.
  *
  * Replication progress is tracked in a shared memory table
- * (ReplicationStates) that's dumped to disk every checkpoint. Entries
+ * (ReplicationStates) that is dumped to disk every checkpoint. Entries
  * ('slots') in this table are identified by the internal id. That's the case
  * because it allows to increase replication progress during crash
  * recovery. To allow doing so we store the original LSN (from the originating
  * system) of a transaction in the commit record. That allows to recover the
- * precise replayed state after crash recovery; without requiring synchronous
+ * precise replayed state after crash recovery without requiring synchronous
  * commits. Allowing logical replication to use asynchronous commit is
  * generally good for performance, but especially important as it allows a
  * single threaded replay process to keep up with a source that has multiple
  * backends generating changes concurrently.  For efficiency and simplicity
- * reasons a backend can setup one replication origin that's from then used as
- * the source of changes produced by the backend, until reset again.
+ * reasons a backend can setup one replication origin that is used as
+ * the source of changes produced by the backend, until it is reset again.
  *
  * This infrastructure is intended to be used in cooperation with logical
  * decoding. When replaying from a remote system the configured origin is
@@ -45,11 +43,11 @@
  * There are several levels of locking at work:
  *
  * * To create and drop replication origins an exclusive lock on
- *	 pg_replication_slot is required for the duration. That allows us to
- *	 safely and conflict free assign new origins using a dirty snapshot.
+ *	 pg_replication_slot is required. That allows us to
+ *	 safely and conflict-free assign new origins using a dirty snapshot.
  *
- * * When creating an in-memory replication progress slot the ReplicationOirgin
- *	 LWLock has to be held exclusively; when iterating over the replication
+ * * When creating an in-memory replication progress slot the ReplicationOrigin
+ *	 LWLock has to be held exclusively. When iterating over the replication
  *	 progress a shared lock has to be held, the same when advancing the
  *	 replication progress of an individual backend that has not setup as the
  *	 session's replication origin.
@@ -57,7 +55,7 @@
  * * When manipulating or looking at the remote_lsn and local_lsn fields of a
  *	 replication progress slot that slot's lwlock has to be held. That's
  *	 primarily because we do not assume 8 byte writes (the LSN) is atomic on
- *	 all our platforms, but it also simplifies memory ordering concerns
+ *	 all our platforms, but it also simplifies memory ordering
  *	 between the remote and local lsn. We use a lwlock instead of a spinlock
  *	 so it's less harmful to hold the lock over a WAL write
  *	 (c.f. AdvanceReplicationProgress).
@@ -305,7 +303,7 @@
 		}
 	}
 
-	/* now release lock again,	*/
+	/* now release lock again. */
 	heap_close(rel, ExclusiveLock);
 
 	if (tuple == NULL)
@@ -382,7 +380,7 @@
 
 	CommandCounterIncrement();
 
-	/* now release lock again,	*/
+	/* now release lock again. */
 	

Re: [HACKERS] Logical replication existing data copy - comments origin.c

2017-02-19 Thread Erik Rijkers

0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Improve readability of comment blocks
in  src/backend/replication/logical/origin.c


thanks,

Erik Rijkers




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-02-18 Thread Petr Jelinek
On 18/02/17 14:17, Erik Rijkers wrote:
> On 2017-02-16 00:43, Petr Jelinek wrote:
>> On 13/02/17 14:51, Erik Rijkers wrote:
>>> On 2017-02-11 11:16, Erik Rijkers wrote:
 On 2017-02-08 23:25, Petr Jelinek wrote:

> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
> 0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
> 0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
> 0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
> 0001-Logical-replication-support-for-initial-data-copy-v4.patch

 This often works but it also fails far too often (in my hands).  I
>>
>> That being said, I am so far having problems reproducing this on my test
>> machine(s) so no idea what causes it yet.
>>
> 
> A few extra bits:
> 
> - I have repeated this now on three different machines (debian 7, 8,
> centos6; one a pretty big server); there is always failure within a few
> tries of that test program (i.e. pgbench_derail2.sh, with the above 5
> patches).
> 
> - I have also tried to go back to an older version of logrep: running
> with 2 instances with only the first four patches (i.e., leaving out the
> support-for-existing-data patch).  With only those 4, the logical
> replication is solid. (a quick 25x repetition of a (very similar) test
> program is 100% successful). So the problem is likely somehow in that
> last 5th patch.
> 
> - A 25x repetition of a test on a master + replica 5-patch server yields
> 13 ok, 12 NOK.
> 
> - Is the 'make check' FAILED test 'object_address' unrelated?  (Can you
> at least reproduce that failed test?)
> 

Yes, it has nothing to do with that; that test just needs to be updated to
use SKIP CONNECT instead of NOCREATE SLOT in this patch, since NOCREATE
SLOT is no longer enough to skip the connection attempt. And I have that
fixed locally, but that does not deserve a new patch version given the
main issue you reported.
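For concreteness, a sketch of the test change being described here — the old and new option spellings are taken from the failing object_address.out hunk and the create_subscription.sgml diff quoted elsewhere in this thread:

```sql
-- before: still attempts to connect to the publisher under this patch
CREATE SUBSCRIPTION addr_sub CONNECTION '' PUBLICATION bar
    WITH (DISABLED, NOCREATE SLOT);

-- after: skips the initial connection attempt entirely
CREATE SUBSCRIPTION addr_sub CONNECTION '' PUBLICATION bar
    WITH (SKIP CONNECT);
```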

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-02-18 Thread Petr Jelinek
On 18/02/17 14:24, Erik Rijkers wrote:
>>
>> Maybe add this to the 10 Open Items list?
>>   https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items
>>
>> It may garner a bit more attention.
>>
> 
> Ah sorry, it's there already.

Hmm that's interesting given that it's not even committed yet :)

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-02-18 Thread Erik Rijkers

On 2017-02-11 11:16, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch




Let me add the script ('instances.sh') that I use to start up the two 
logical replication instances for testing.


Together with the earlier posted 'pgbench_derail2.sh' it makes up the 
failing test.


pg_config of the master is:

$ pg_config
BINDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/bin
DOCDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/share/doc
HTMLDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/share/doc
INCLUDEDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/include
PKGINCLUDEDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/include
INCLUDEDIR-SERVER = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/include/server
LIBDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/lib
PKGLIBDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/lib
LOCALEDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/share/locale
MANDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/share/man
SHAREDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/share
SYSCONFDIR = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/etc
PGXS = /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication' '--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/bin' '--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/lib' '--with-pgport=6972' '--enable-depend' '--enable-cassert' '--enable-debug' '--with-openssl' '--with-perl' '--with-libxml' '--with-libxslt' '--with-zlib' '--enable-tap-tests' '--with-extra-version=_logical_replication_20170218_1221_e3a58c8835a2'
CC = gcc
CPPFLAGS = -DFRONTEND -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2
CFLAGS_SL = -fpic
LDFLAGS = -L../../src/common -Wl,--as-needed -Wl,-rpath,'/home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 10devel_logical_replication_20170218_1221_e3a58c8835a2



I hope it helps someone to reproduce the errors I get.  (If you don't, 
I'd like to hear that too)



thanks,

Erik Rijkers
#!/bin/sh
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1="
-c wal_level=logical
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=14
-c logging_collector=on
-c log_directory=$server_dir1
-c log_filename=logfile.${project1} "

options2="
-c wal_level=replica
-c max_replication_slots=10
-c max_worker_processes=12
-c max_logical_replication_workers=10
-c max_wal_senders=14
-c logging_collector=on
-c log_directory=$server_dir2
-c log_filename=logfile.${project2} "

export PATH=$PATH1; which postgres; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; which postgres; postgres -D $data_dir2 -p $port2 ${options2} &
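instances.sh only starts the two servers; the publication/subscription pair itself is created by pgbench_derail2.sh. As a rough sketch of that step for anyone following along (the names pub1/sub1 match the cleanup code in that script; the connection string is an assumption of this sketch):

```sql
-- on the master (port 6972):
CREATE PUBLICATION pub1 FOR ALL TABLES;

-- on the replica (port 6973):
CREATE SUBSCRIPTION sub1
    CONNECTION 'port=6972 dbname=testdb'
    PUBLICATION pub1;
```

With the patch set under test, CREATE SUBSCRIPTION defaults to COPY DATA, so the existing table contents are copied before streaming starts.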




Re: [HACKERS] Logical replication existing data copy

2017-02-18 Thread Erik Rijkers


Maybe add this to the 10 Open Items list?
  https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items

It may garner a bit more attention.



Ah sorry, it's there already.




Re: [HACKERS] Logical replication existing data copy

2017-02-18 Thread Erik Rijkers

On 2017-02-16 00:43, Petr Jelinek wrote:

On 13/02/17 14:51, Erik Rijkers wrote:

On 2017-02-11 11:16, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


This often works but it also fails far too often (in my hands).  I


That being said, I am so far having problems reproducing this on my 
test

machine(s) so no idea what causes it yet.



A few extra bits:

- I have repeated this now on three different machines (debian 7, 8, 
centos6; one a pretty big server); there is always failure within a few 
tries of that test program (i.e. pgbench_derail2.sh, with the above 5 
patches).


- I have also tried to go back to an older version of logrep: running 
with 2 instances with only the first four patches (i.e., leaving out the 
support-for-existing-data patch).  With only those 4, the logical 
replication is solid. (a quick 25x repetition of a (very similar) test 
program is 100% successful). So the problem is likely somehow in that 
last 5th patch.


- A 25x repetition of a test on a master + replica 5-patch server yields 
13 ok, 12 NOK.


- Is the 'make check' FAILED test 'object_address' unrelated?  (Can you 
at least reproduce that failed test?)


Maybe add this to the 10 Open Items list?
  https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items

It may garner a bit more attention.


thanks,

Erik Rijkers






Re: [HACKERS] Logical replication existing data copy

2017-02-16 Thread Erik Rijkers

On 2017-02-16 00:43, Petr Jelinek wrote:

On 13/02/17 14:51, Erik Rijkers wrote:

On 2017-02-11 11:16, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


This often works but it also fails far too often (in my hands).  I


Could you periodically dump contents of the pg_subscription_rel on
subscriber (ideally when dumping the md5 of the data) and attach that as
well?


I attach a bash script (and its output) that polls the 4 pgbench tables' 
md5s and the pg_subscription_rel table, each second, while I run 
pgbench_derail2.sh (for that see my earlier mail).


pgbench_derail2.sh writes a 'header' into the same output stream (search 
for '^===' ).


The .out file reflects a session where I started pgbench_derail2.sh 
twice (it removes the publication and subscription at startup).  So 
there are 2 headers in the attached  cb_20170216_10_04_47.out. The first 
run ended in a successful replication (=all 4 pgbench tables 
md5-identical).  The second run does not end correctly: it has (one of) 
the typical faulty end-states: pgbench_accounts, the copy, has a few 
fewer rows than the master table.


Other typical endstates are:
same number of rows but content not identical (for some, typically < 20 
rows).

mostly pgbench_accounts and pgbench_history are affected.

(I see now that I made some mistakes in generating the timestamps in the 
.out file but I suppose it doesn't matter too much)


I hope it helps; let me know if I can do any other test(s).
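The shape of that poller, reduced to a skeleton: dump_state here is a hypothetical stand-in for the real psql calls (the four per-table md5s, plus "table pg_subscription_rel" on the subscriber).

```shell
# Once-per-second monitor loop.  dump_state is a hypothetical stand-in
# for the real psql calls against the two instances (ports 6972/6973).
dump_state() {
  date +%Y%m%d_%H_%M_%S
  # psql -qtAX -p 6973 -c "table pg_subscription_rel"   # real call
}

for i in 1 2 3; do      # the real script loops until interrupted
  dump_state
  sleep 1
done
```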

20170216_10_04_49_1487  6972 a,b,t,h: 10  1 10776   24be8c7be  cf860f1f2  aed87334f  f2bfaa587   master
20170216_10_04_50_1487  6973 a,b,t,h:  6  1 10776   74cd7528c  cf860f1f2  aed87334f  f2bfaa587   replica NOK
  now  | srsubid | srrelid | srsubstate | srsublsn 
---+-+-++--
 2017-02-16 10:04:50.242616+01 |   25398 |   25375 | r  | 
 2017-02-16 10:04:50.242616+01 |   25398 |   25378 | r  | 
 2017-02-16 10:04:50.242616+01 |   25398 |   25381 | r  | 
 2017-02-16 10:04:50.242616+01 |   25398 |   25386 | r  | 
(4 rows)

20170216_10_04_51_1487  6972 a,b,t,h: 10  1 10776   24be8c7be  cf860f1f2  aed87334f  f2bfaa587   master
20170216_10_04_51_1487  6973 a,b,t,h:  6  1 10776   74cd7528c  cf860f1f2  aed87334f  f2bfaa587   replica NOK
  now  | srsubid | srrelid | srsubstate | srsublsn 
---+-+-++--
 2017-02-16 10:04:51.945931+01 |   25398 |   25375 | r  | 
 2017-02-16 10:04:51.945931+01 |   25398 |   25378 | r  | 
 2017-02-16 10:04:51.945931+01 |   25398 |   25381 | r  | 
 2017-02-16 10:04:51.945931+01 |   25398 |   25386 | r  | 
(4 rows)


-- 20170216 10:04:S
-- scale  1 clients  1   INIT_WAIT  0
-- 
/home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/bin.fast/postgres
20170216_10_04_53_1487  6972 a,b,t,h: 10  1 10776   24be8c7be  cf860f1f2  aed87334f  f2bfaa587   master
20170216_10_04_53_1487  6973 a,b,t,h:  6  1 10776   74cd7528c  cf860f1f2  aed87334f  f2bfaa587   replica NOK
  now  | srsubid | srrelid | srsubstate | srsublsn 
---+-+-++--
 2017-02-16 10:04:53.635163+01 |   25398 |   25375 | r  | 
 2017-02-16 10:04:53.635163+01 |   25398 |   25378 | r  | 
 2017-02-16 10:04:53.635163+01 |   25398 |   25381 | r  | 
 2017-02-16 10:04:53.635163+01 |   25398 |   25386 | r  | 
(4 rows)

20170216_10_04_54_1487  6972 a,b,t,h:  0  0  0  0   24be8c7be  d41d8cd98  d41d8cd98  d41d8cd98   master
20170216_10_04_55_1487  6973 a,b,t,h:  0  0  0  0   d41d8cd98  d41d8cd98  d41d8cd98  d41d8cd98   replica NOK
 now | srsubid | srrelid | srsubstate | srsublsn 
-+-+-++--
(0 rows)

20170216_10_04_56_1487  6972 a,b,t,h: 10  1 10  0   68d91d95b  6c4f8b9aa  92162c9b8  d41d8cd98   master
20170216_10_04_56_1487  6973 a,b,t,h:  0  0  0  0   d41d8cd98  d41d8cd98  d41d8cd98  d41d8cd98   replica NOK
 now | srsubid | srrelid | srsubstate | srsublsn 
-+-+-++--
(0 rows)

20170216_10_04_57_1487  6972 a,b,t,h: 10  1 10  1   68d91d95b  6c4f8b9aa  92162c9b8  d41d8cd98   master
20170216_10_04_58_1487  6973 a,b,t,h:  0  0  0  0   d41d8cd98  

Re: [HACKERS] Logical replication existing data copy

2017-02-15 Thread Petr Jelinek
On 13/02/17 14:51, Erik Rijkers wrote:
> On 2017-02-11 11:16, Erik Rijkers wrote:
>> On 2017-02-08 23:25, Petr Jelinek wrote:
>>
>>> 0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
>>> 0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
>>> 0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
>>> 0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
>>> 0001-Logical-replication-support-for-initial-data-copy-v4.patch
>>
>> This often works but it also fails far too often (in my hands).  I
>> test whether the tables are identical by comparing an md5 from an
>> ordered resultset, from both replica and master.  I estimate that 1 in
>> 5 tries fail; 'fail'  being a somewhat different table on replica
>> (compared to master), most often pgbench_accounts (typically there are
>> 10-30 differing rows).  No errors or warnings in either logfile.   I'm
>> not sure but I think testing on faster machines seem to be doing
>> somewhat better ('better' being less replication error).
>>
> 
> I have noticed that when I insert a few seconds wait-state after the
> create subscription (or actually: the 'enable'ing of the subscription)
> the problem does not occur.  Apparently, (I assume) the initial snapshot
> occurs somewhere when the subsequent pgbench-run has already started, so
> that the logical replication also starts somewhere 'into' that
> pgbench-run. Does that make sense?
> 
> I don't know what to make of it.  Now that I think that I understand
> what happens I hesitate to call it a bug. But I'd say it's still a
> useability problem that the subscription is only 'valid' after some
> time, even if it's only a few seconds.
> 

It is a bug: we go to great lengths to create a data snapshot that
corresponds to a specific LSN, so that we are able to decode exactly the
changes that happened since the data snapshot was taken. And
tablecopy.c does quite a lot to synchronize the table handover to the main
apply process, so that there is a correct continuation of the data stream as
well. So the end result is that concurrent changes are supposed to be okay,
and eventually replication should catch up and the contents should be the same.

That being said, I am so far having problems reproducing this on my test
machine(s) so no idea what causes it yet.

Could you periodically dump contents of the pg_subscription_rel on
subscriber (ideally when dumping the md5 of the data) and attach that as
well?
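For anyone reproducing this: the per-table sync state lives in pg_subscription_rel on the subscriber, so a query of the kind being asked for could look like the sketch below (the state-letter meanings are an assumption based on the patch-era sources):

```sql
-- run on the subscriber, alongside the md5 dump;
-- srsubstate: i = initialize, d = data copy,
--             s = synchronized, r = ready (streaming)
SELECT now(), srsubid, srrelid::regclass AS rel, srsubstate, srsublsn
FROM pg_subscription_rel
ORDER BY srrelid;
```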

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-02-13 Thread Erik Rijkers

On 2017-02-11 11:16, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


This often works but it also fails far too often (in my hands).  I
test whether the tables are identical by comparing an md5 from an
ordered resultset, from both replica and master.  I estimate that 1 in
5 tries fail; 'fail'  being a somewhat different table on replica
(compared to master), most often pgbench_accounts (typically there are
10-30 differing rows).  No errors or warnings in either logfile.   I'm
not sure but I think testing on faster machines seem to be doing
somewhat better ('better' being less replication error).



I have noticed that when I insert a few seconds wait-state after the 
create subscription (or actually: the 'enable'ing of the subscription) 
the problem does not occur.  Apparently, (I assume) the initial snapshot 
occurs somewhere when the subsequent pgbench-run has already started, so 
that the logical replication also starts somewhere 'into' that 
pgbench-run. Does that make sense?


I don't know what to make of it.  Now that I think that I understand 
what happens I hesitate to call it a bug. But I'd say it's still a 
useability problem that the subscription is only 'valid' after some 
time, even if it's only a few seconds.


(the other problem I mentioned (drop subscription hangs) still happens 
every now and then)





Re: [HACKERS] Logical replication existing data copy - sgml fixes

2017-02-13 Thread Erik Rijkers

On 2017-02-09 02:25, Erik Rijkers wrote:

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch




fixes in create_subscription.sgml

--- doc/src/sgml/ref/create_subscription.sgml.orig2	2017-02-11 11:58:10.788502999 +0100
+++ doc/src/sgml/ref/create_subscription.sgml	2017-02-11 12:17:50.069635493 +0100
@@ -55,7 +55,7 @@
 
   
Additional info about subscriptions and logical replication as a whole
-   can is available at  and
+   is available at  and
.
   
 
@@ -122,14 +122,14 @@
 
  
   Name of the replication slot to use. The default behavior is to use
-  subscription_name for slot name.
+  subscription_name as the slot name.
  
 

 

-COPY DATA
-NOCOPY DATA
+COPY DATA
+NOCOPY DATA
 
  
   Specifies if the existing data in the publication that are being
@@ -140,11 +140,11 @@

 

-SKIP CONNECT
+SKIP CONNECT
 
  
-  Instructs the CREATE SUBSCRIPTION to skip initial
-  connection to the provider. This will change default values of other
+  Instructs CREATE SUBSCRIPTION to skip initial
+  connection to the provider. This will change the default values of other
   options to DISABLED,
   NOCREATE SLOT and NOCOPY DATA.
  
@@ -181,8 +181,8 @@
 
   
Create a subscription to a remote server that replicates tables in
-   the publications mypubclication and
-   insert_only and starts replicating immediately on
+   the publications mypublication and
+   insert_only and start replicating immediately on
commit:
 
 CREATE SUBSCRIPTION mysub
@@ -193,7 +193,7 @@
 
   
Create a subscription to a remote server that replicates tables in
-   the insert_only publication and does not start replicating
+   the insert_only publication and do not start replicating
until enabled at a later time.
 
 CREATE SUBSCRIPTION mysub



Re: [HACKERS] Logical replication existing data copy

2017-02-11 Thread Erik Rijkers

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


Apart from the failing one make check test (test 'object_address') which 
I reported earlier, I find it is easy to 'confuse' the replication.


I attach a script that intends to test the default COPY DATA.   There 
are two instances, initially without any replication.  The script inits 
pgbench on the master, adds a serial column to pgbench_history, and 
dump-restores the 4 pgbench-tables to the future replica.  It then 
empties the 4 pgbench-tables on the 'replica'.  The idea is that when 
logrep is initiated, data will be replicated from master, with the end 
result being that there are 4 identical tables on master and replica.


This often works but it also fails far too often (in my hands).  I test 
whether the tables are identical by comparing an md5 from an ordered 
resultset, from both replica and master.  I estimate that 1 in 5 tries 
fail; 'fail'  being a somewhat different table on replica (compared to 
master), most often pgbench_accounts (typically there are 10-30 differing 
rows).  No errors or warnings in either logfile.   I'm not sure but I 
think testing on faster machines seem to be doing somewhat better 
('better' being less replication error).
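The md5-over-an-ordered-resultset check can be sketched without a server as below. The sample rows are made up; against a live pair each variable would instead be filled from psql -qtAX output (e.g. "select * from pgbench_accounts order by aid"):

```shell
# Identical hashes over identically ordered rows mean identical tables.
master=$(printf '1|100\n2|200\n3|300')
replica=$(printf '1|100\n2|200\n3|300')

md5_master=$(printf '%s\n' "$master" | md5sum | cut -b 1-9)
md5_replica=$(printf '%s\n' "$replica" | md5sum | cut -b 1-9)

if [ "$md5_master" = "$md5_replica" ]; then
  echo "ok"     # replica matches master
else
  echo "NOK"    # replica diverged
fi
```

As written this prints "ok"; changing a single replica row flips it to "NOK", mirroring the ok/NOK marker used by the attached script.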


Another, probably unrelated, problem occurs (but much more rarely) when 
executing 'DROP SUBSCRIPTION sub1' on the replica (see the beginning of 
the script).  Sometimes that command hangs, and refuses to accept 
shutdown of the server.  I don't know how to recover from this -- I just 
have to kill the replica server (master server still obeys normal 
shutdown) and restart the instances.


The script accepts 2 parameters, scale and clients (used in pgbench -s 
resp. -c)


I don't think I've managed to successfully run the script with more than 
1 client yet.


Can you have a look whether this is reproducible elsewhere?

thanks,

Erik Rijkers






#!/bin/sh

#  assumes both instances are running, on port 6972 and 6973

logfile1=$HOME/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
logfile2=$HOME/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2

scale=1
if [[ ! "$1" == "" ]]
then
   scale=$1
fi

clients=1
if [[ ! "$2" == "" ]]
then
   clients=$2
fi

unset PGSERVICEFILE PGSERVICE PGPORT PGDATA PGHOST
PGDATABASE=testdb

# (this script also uses a custom pgpassfile)

## just for info:
# env | grep PG
# psql -qtAXc "select current_setting('server_version')"

port1=6972
port2=6973

function cb()
{
  #  display the 4 pgbench tables' accumulated content as md5s
  #  a,b,t,h stand for:  pgbench_accounts, -branches, -tellers, -history
  md5_total[6972]='-1'
  md5_total[6973]='-2'
  for port in $port1 $port2
  do
md5_a=$(echo "select * from pgbench_accounts order by aid"|psql -qtAXp$port|md5sum|cut -b 1-9)
md5_b=$(echo "select * from pgbench_branches order by bid"|psql -qtAXp$port|md5sum|cut -b 1-9)
md5_t=$(echo "select * from pgbench_tellers  order by tid"|psql -qtAXp$port|md5sum|cut -b 1-9)
md5_h=$(echo "select * from pgbench_history  order by hid"|psql -qtAXp$port|md5sum|cut -b 1-9)
cnt_a=$(echo "select count(*) from pgbench_accounts"|psql -qtAXp $port)
cnt_b=$(echo "select count(*) from pgbench_branches"|psql -qtAXp $port)
cnt_t=$(echo "select count(*) from pgbench_tellers" |psql -qtAXp $port)
cnt_h=$(echo "select count(*) from pgbench_history" |psql -qtAXp $port)
md5_total[$port]=$( echo "${md5_a} ${md5_b} ${md5_t} ${md5_h}" | md5sum )
printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a  $cnt_b  $cnt_t  $cnt_h
echo -n "   $md5_a  $md5_b  $md5_t  $md5_h"
if   [[ $port -eq $port1 ]]; then echo "   master"
elif [[ $port -eq $port2 ]]; then echo -n "   replica"
else  echo " ERROR  "
fi
  done
  if [[ "${md5_total[6972]}" == "${md5_total[6973]}" ]]
  then
echo " ok"
  else
echo " NOK"
  fi
}

bail=0

pub_count=$( echo "select count(*) from pg_publication" | psql -qtAXp 6972 )
if  [[ $pub_count -ne 0 ]]
then
  echo "pub_count -ne 0 - deleting pub1 & bailing out"
  echo "drop publication if exists pub1" | psql -Xp 6972
  bail=1
fi
sub_count=$( echo "select count(*) from pg_subscription" | psql -qtAXp 6973 )
if  [[ $sub_count -ne 0 ]]
then
  echo "sub_count -ne 0 - deleting sub1 & bailing out"
  echo "drop subscription if exists sub1" | psql -Xp 6973
  rc=$?
  echo "(drop subscr. ) )  rc=$rc"
  bail=1
fi

pub_count=$( echo "select count(*) from pg_publication"  | psql -qtAXp 6972 )
sub_count=$( echo "select count(*) from pg_subscription" | psql -qtAXp 6973 )

#if [[ $bail -eq 1 ]]
#then
#if  [[ $pub_count -eq 0 ]] && [[ 

Re: [HACKERS] Logical replication existing data copy

2017-02-08 Thread Erik Rijkers

On 2017-02-08 23:25, Petr Jelinek wrote:


0001-Use-asynchronous-connect-API-in-libpqwalreceiver-v2.patch
0002-Always-initialize-stringinfo-buffers-in-walsender-v2.patch
0003-Fix-after-trigger-execution-in-logical-replication-v2.patch
0004-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION-v2.patch
0001-Logical-replication-support-for-initial-data-copy-v4.patch


test 'object_address' fails, see atachment.

That's all I found in a quick first trial.

thanks,

Erik Rijkers




*** /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/expected/object_address.out	2017-02-09 00:51:30.345519608 +0100
--- /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/results/object_address.out	2017-02-09 00:54:11.884715532 +0100
***
*** 38,43 
--- 38,45 
  	TO SQL WITH FUNCTION int4recv(internal));
  CREATE PUBLICATION addr_pub FOR TABLE addr_nsp.gentable;
  CREATE SUBSCRIPTION addr_sub CONNECTION '' PUBLICATION bar WITH (DISABLED, NOCREATE SLOT);
+ ERROR:  could not connect to the publisher: FATAL:  no pg_hba.conf entry for replication connection from host "[local]", user "aardvark", SSL off
+ 
  -- test some error cases
  SELECT pg_get_object_address('stone', '{}', '{}');
  ERROR:  unrecognized object type "stone"
***
*** 409,463 
  			pg_identify_object_as_address(classid, objid, subobjid) ioa(typ,nms,args),
  			pg_get_object_address(typ, nms, ioa.args) as addr2
  	ORDER BY addr1.classid, addr1.objid, addr1.subobjid;
!type|   schema   |   name|   identity   | ?column? 
! ---++---+--+--
!  default acl   ||   | for role regress_addr_user in schema public on tables| t
!  default acl   ||   | for role regress_addr_user on tables | t
!  type  | pg_catalog | _int4 | integer[]| t
!  type  | addr_nsp   | gencomptype   | addr_nsp.gencomptype | t
!  type  | addr_nsp   | genenum   | addr_nsp.genenum | t
!  type  | addr_nsp   | gendomain | addr_nsp.gendomain   | t
!  function  | pg_catalog |   | pg_catalog.pg_identify_object(pg_catalog.oid,pg_catalog.oid,integer) | t
!  aggregate | addr_nsp   |   | addr_nsp.genaggr(integer)| t
!  sequence  | addr_nsp   | gentable_a_seq| addr_nsp.gentable_a_seq  | t
!  table | addr_nsp   | gentable  | addr_nsp.gentable| t
!  table column  | addr_nsp   | gentable  | addr_nsp.gentable.b  | t
!  index | addr_nsp   | gentable_pkey | addr_nsp.gentable_pkey   | t
!  view  | addr_nsp   | genview   | addr_nsp.genview | t
!  materialized view | addr_nsp   | genmatview| addr_nsp.genmatview  | t
!  foreign table | addr_nsp   | genftable | addr_nsp.genftable   | t
!  foreign table column  | addr_nsp   | genftable | addr_nsp.genftable.a | t
!  role  || regress_addr_user | regress_addr_user| t
!  server|| addr_fserv| addr_fserv   | t
!  user mapping  ||   | regress_addr_user on server integer  | t
!  foreign-data wrapper  || addr_fdw  | addr_fdw | t
!  access method || btree | btree| t
!  operator of access method ||   | operator 1 (integer, integer) of pg_catalog.integer_ops USING btree  | t
!  function of access method ||   | function 2 (integer, integer) of pg_catalog.integer_ops USING btree  | t
!  default value |   

Re: [HACKERS] Logical replication existing data copy

2017-02-08 Thread Petr Jelinek
Hi,

here is updated patch.

Note that it's rebased on top of logical replication improvements
patches [1] (which still apply fine to my surprise).

It will probably need another rebase once patches from Masahiko Sawada
and Fujii Masao get in.

[1]
https://www.postgresql.org/message-id/42655eb4-6b2e-b35b-c184-509efd6eba10%402ndquadrant.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-01-31 Thread Michael Paquier
On Fri, Jan 20, 2017 at 3:03 AM, Petr Jelinek
 wrote:
> Okay, here is v3 with some small fixes and rebased on top of rename.
> Also it's rebased without the
> 0005-Add-separate-synchronous-commit-control-for-logical--v18.patch as I
> don't expect that one to go further for now.
>
> Thanks for testing!

This patch needs a rebase, moved to next CF with "waiting on author".
-- 
Michael




Re: [HACKERS] Logical replication existing data copy

2017-01-19 Thread Erik Rijkers

On 2017-01-18 20:45, Petr Jelinek wrote:


AFAICS you should always get error from that test after you enable the


Ah yes, you were right, of course; I had assumed the earlier mentioned
CREATE SUBSCRIPTION ... [ WITH (COPY DATA | NOCOPY DATA) ]
but that syntax wasn't implemented, I now understand.

Taking that into account, my older tests work OK again (using the
7 patches below).



Other small issue: using this patch-set:


0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch
0001-Logical-replication-support-for-initial-data-copy-v2.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch


( This is now the patch-set to test, is that correct? )

make check complains:

*** /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/expected/subscription.out	2017-01-19 09:26:41.354703032 +0100
--- /home/aardvark/pg_stuff/pg_sandbox/pgsql.logical_replication/src/test/regress/results/subscription.out	2017-01-19 09:29:56.104685043 +0100

***************
*** 53,62 ****
  COMMIT;
  ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
  \dRs
!                      List of subscriptions
!     Name     |           Owner           | Enabled |    Publication
! -------------+---------------------------+---------+--------------------
!  testsub_foo | regress_subscription_user | f       | {testpub,testpub1}
  (1 row)

  DROP SUBSCRIPTION testsub_foo NODROP SLOT;
--- 53,62 ----
  COMMIT;
  ALTER SUBSCRIPTION testsub RENAME TO testsub_foo;
  \dRs
!               List of subscriptions
!     Name     |           Owner           | Enabled | Publication
! -------------+---------------------------+---------+-------------
!  testsub_foo | regress_subscription_user | f       | {testpub}
  (1 row)

  DROP SUBSCRIPTION testsub_foo NODROP SLOT;

==


Thanks,


Erik Rijkers




Re: [HACKERS] Logical replication existing data copy

2017-01-18 Thread Petr Jelinek
On 18/01/17 17:35, Erik Rijkers wrote:
> On 2017-01-18 14:46, Petr Jelinek wrote:
> 
>> 0001-Logical-replication-support-for-initial-data-copy-v2.patch
> 
> Applies and builds fine on top of the previous 5 patches.
> 
> Two problems:
> 
> 1.  alter_subscription.sgml has an unpaired -tag, which breaks the doc-build:
> This is the offending patch-line:
> +  CREATE SUBSCRIPTION with COPY DATA
>

Hmm, I wonder how did that compile on my machine as it's indeed syntax
error.

> 
> 2. Running the below (a version of the earlier pgbench_derail.sh) I have
> found that
>create subscription sub1  ..  with (disabled);  and then  alter
> subscription sub1 enable;
> cannot be run immediately, consecutively.  The error is avoided when the
> two
> commands are separated (for instance, below in separate psql- calls).
> 
> I don't understand why this is but it is reliably so.
>

AFAICS you should always get error from that test after you enable the
subscription, no matter if you enable immediately or later. The default
behavior is to copy the data and your test already copies them via
pg_dump/pg_restore.
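
To make the two setups concrete, here is a sketch using the option spelling
mentioned earlier in this thread (COPY DATA / NOCOPY DATA); this is the
proposed syntax and may not match what the patch finally implements, and the
connection string is made up for illustration:

```sql
-- Subscriber was already seeded via pg_dump/pg_restore: skip the initial
-- copy so the sync workers don't COPY into already-populated tables.
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=localhost port=6972 dbname=testdb'
    PUBLICATION pub1
    WITH (NOCOPY DATA);

-- Empty subscriber: let table synchronization do the initial copy
-- (this is the default behavior).
CREATE SUBSCRIPTION sub2
    CONNECTION 'host=localhost port=6972 dbname=testdb'
    PUBLICATION pub1
    WITH (COPY DATA);
```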

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-01-18 Thread Erik Rijkers

On 2017-01-18 14:46, Petr Jelinek wrote:


0001-Logical-replication-support-for-initial-data-copy-v2.patch


Applies and builds fine on top of the previous 5 patches.

Two problems:

1.  alter_subscription.sgml has an unpaired -tag, which breaks the doc-build:

This is the offending patch-line:
+  CREATE SUBSCRIPTION with COPY DATA



2. Running the below (a version of the earlier pgbench_derail.sh) I have
found that
   create subscription sub1 .. with (disabled);  and then  alter subscription sub1 enable;
cannot be run immediately one after the other.  The error is avoided when
the two commands are separated (for instance, below, in separate psql calls).

I don't understand why this is, but it is reliably so.

The error(s):
2017-01-18 17:26:56.126 CET 24410 LOG:  starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.132 CET 26291 LOG:  logical replication apply for subscription sub1 started
2017-01-18 17:26:56.139 CET 26291 LOG:  starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.145 CET 26295 LOG:  logical replication sync for subscription sub1, table pgbench_accounts started
2017-01-18 17:26:56.534 CET 26295 ERROR:  duplicate key value violates unique constraint "pgbench_accounts_pkey"
2017-01-18 17:26:56.534 CET 26295 DETAIL:  Key (aid)=(1) already exists.
2017-01-18 17:26:56.534 CET 26295 CONTEXT:  COPY pgbench_accounts, line 1
2017-01-18 17:26:56.536 CET 21006 LOG:  worker process: logical replication worker 41015 sync 40991 (PID 26295) exited with exit code 1
2017-01-18 17:26:56.536 CET 26291 LOG:  starting logical replication worker for subscription "sub1"
2017-01-18 17:26:56.542 CET 26297 LOG:  logical replication sync for subscription sub1, table pgbench_branches started
2017-01-18 17:26:57.015 CET 26297 ERROR:  duplicate key value violates unique constraint "pgbench_branches_pkey"
2017-01-18 17:26:57.015 CET 26297 DETAIL:  Key (bid)=(1) already exists.
2017-01-18 17:26:57.015 CET 26297 CONTEXT:  COPY pgbench_branches, line 1
2017-01-18 17:26:57.017 CET 21006 LOG:  worker process: logical replication worker 41015 sync 40994 (PID 26297) exited with exit code 1
2017-01-18 17:26:57.017 CET 26291 LOG:  starting logical replication worker for subscription "sub1"
2017-01-18 17:26:57.023 CET 26299 LOG:  logical replication sync for subscription sub1, table pgbench_history started
2017-01-18 17:26:57.487 CET 26299 LOG:  logical replication synchronization worker finished processing
2017-01-18 17:26:57.488 CET 26291 LOG:  starting logical replication worker for subscription "sub1"
2017-01-18 17:26:57.491 CET 26301 LOG:  logical replication sync for subscription sub1, table pgbench_tellers started
2017-01-18 17:26:57.948 CET 26301 ERROR:  duplicate key value violates unique constraint "pgbench_tellers_pkey"
2017-01-18 17:26:57.948 CET 26301 DETAIL:  Key (tid)=(1) already exists.
2017-01-18 17:26:57.948 CET 26301 CONTEXT:  COPY pgbench_tellers, line 1
etc, etc.
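
The error class above can be reproduced in miniature without any replication
at all: the sync worker's COPY lands on rows the subscriber already holds, so
the primary key fires on the first row.  A standalone psql sketch (table name
invented for illustration):

```sql
-- Minimal reproduction of the failure mode seen in the log above:
-- COPY into an already-populated table trips the primary key.
CREATE TABLE pgbench_accounts_demo (aid int PRIMARY KEY);
INSERT INTO pgbench_accounts_demo VALUES (1);

COPY pgbench_accounts_demo (aid) FROM STDIN;
1
\.
-- ERROR:  duplicate key value violates unique constraint "pgbench_accounts_demo_pkey"
-- DETAIL:  Key (aid)=(1) already exists.
-- CONTEXT:  COPY pgbench_accounts_demo, line 1
```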









#!/bin/sh

# assumes both instances are running, initially without publication or subscription

unset PGSERVICEFILE PGSERVICE PGPORT PGDATA PGHOST
env | grep PG

PGDATABASE=testdb

# clear logs
echo > /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
echo > /home/aardvark/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2

port1=6972
port2=6973

function cb()
{
  #  display the 4 pgbench tables' accumulated content as md5s
  #  a,b,t,h stand for:  pgbench_accounts, -branches, -tellers, -history
  for port in $port1 $port2
  do
    md5_a=$(echo "select * from pgbench_accounts order by aid" | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_b=$(echo "select * from pgbench_branches order by bid" | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_t=$(echo "select * from pgbench_tellers  order by tid" | psql -qtAXp$port | md5sum | cut -b 1-9)
    md5_h=$(echo "select * from pgbench_history  order by hid" | psql -qtAXp$port | md5sum | cut -b 1-9)
    cnt_a=$(echo "select count(*) from pgbench_accounts" | psql -qtAXp $port)
    cnt_b=$(echo "select count(*) from pgbench_branches" | psql -qtAXp $port)
    cnt_t=$(echo "select count(*) from pgbench_tellers"  | psql -qtAXp $port)
    cnt_h=$(echo "select count(*) from pgbench_history"  | psql -qtAXp $port)
    printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a $cnt_b $cnt_t $cnt_h
    echo -n "   $md5_a  $md5_b  $md5_t  $md5_h"
    if   [[ $port -eq $port1 ]]; then echo "   master"
    elif [[ $port -eq $port2 ]]; then echo "   replica"
    else  echo " ERROR"
    fi
  done
}

echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port1 \
  && echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | 

Re: [HACKERS] Logical replication existing data copy

2017-01-11 Thread Peter Eisentraut
On 12/19/16 4:30 AM, Petr Jelinek wrote:
> This patch implements data synchronization for the logical replication.
> It works both for initial setup of subscription as well as for tables
> added later to replication when the subscription is already active.

First detailed read-through.  General structure makes sense.

Comments on some details:

No catalogs.sgml documentation for pg_subscription_rel.  When that is
added, I would emphasize that entries are only added when relations
are first encountered, not immediately when a table is added to a
publication.  So it is unlike pg_publication_rel in that respect.
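
For reference, the lazily-populated behavior can be observed directly in the
catalog.  A sketch of such a query — note the column names here match the
catalog as eventually committed (srsubid, srrelid, srsubstate) and may differ
from this patch version:

```sql
-- Which tables of subscription sub1 have sync state recorded so far?
-- Tables added to the publication but not yet encountered will be absent.
SELECT srrelid::regclass AS tbl, srsubstate
FROM   pg_subscription_rel
WHERE  srsubid = (SELECT oid FROM pg_subscription
                  WHERE subname = 'sub1');
```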

Rename max_subscription_sync_workers ->
max_sync_workers_per_subscription (similar to
max_parallel_workers_per_gather and others)

About the changes to COPY: It took me a while to get clarity on the
direction of things.  Is the read_cb reading the table, or the data?
You are copying data produced by a function into a table, so
produce_cb or data_source_cb could be clearer.

SetSubscriptionRelState(): This is doing an "upsert", so I don't think
a RowExclusiveLock is enough, or it needs to be able to retry on
concurrent changes.  I suppose there shouldn't be more than one
concurrent change per sub/rel pair, but that would need to be
explained there.

SetSubscriptionRelState(): The memset(values, 0, sizeof(values)) is
kind of in an odd place.  Possibly not actually needed.

GetSubscriptionRelState(): Prefer error messages that identify the
objects by name.  (Also subid should be %u.)

GetSubscriptionRelationsFilter(): The state and filter arguments are
kind of weird.  And there are only two callers, so all this complexity
is not even used.  Perhaps, if state == SUBREL_STATE_UNKNOWN, then
return everything, else return exactly the state specified.  The case
of everything-but-the-state-specified does not appear to be needed.

GetSubscriptionRelationsFilter(): RowExclusiveLock is probably too
much

This patch adds the fourth definition of oid_cmp() (also known as
oidComparator()) and the 12th definition of atooid().  Let's collect
those in a central place.  I'm sending a separate patch for that.  (No
need to change here until that is resolved, of course.)

AlterSubscription_refresh(): Put the if (wrconn == NULL) case first
and error out, and then put the rest of the code into the main
function block, saving one level of indentation.

AlterSubscription_refresh(): remote_table_oids isn't really the
remote OIDs of the tables but the local OIDs of the remote tables.
Consider clearer variable naming in that function.

interesting_relation(): very generic name, maybe
should_apply_changes_for_rel().  Also the comment mentions a "parallel
worker" -- maybe better a "separate worker process running in
parallel", since a parallel worker is something different.

LogicalRepApplyLoop() changed to non-static, but not used anywhere
else.

process_syncing_tables_*(): Function names and associated comments are
not very clear (process what?, handle what?).

In process_syncing_tables_apply(), it says that
logicalrep_worker_count() counts the apply worker as well, but what
happens if the apply worker has temporarily disappeared?  Since
logicalrep_worker_count() is only used in this one place, maybe have
it count just the sync workers and rename it slightly.

The patch changed some LOG messages to DEBUG; it's not clear what the
purpose of that is.

check_max_subscription_sync_workers(): Same problem as discussed
previously, checking a GUC setting against another GUC setting like
this doesn't work.  Just skip it and check at run time.

wait_for_sync_status_change(): Comment is wrong/misleading: It doesn't
wait until it matches any specific state, it just waits for any state
change.

LogicalRepSyncTableStart(): The code that assembles the slot name
needs to be aware of NAMEDATALEN.  (Maybe at least throw in a static
assert that NAMEDATALEN>=64.)


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication existing data copy

2016-12-19 Thread Petr Jelinek
On 19/12/16 18:25, Peter Eisentraut wrote:
> On 12/19/16 4:30 AM, Petr Jelinek wrote:
>> So existing table data can be copied once subscription is created, but
>> also new tables can be added and their data copied. This is where the
>> REFRESH PUBLICATION comes into play. Adding table to publication does
>> not make it automatically replicated by the subscription as the
>> subscription does not have tracking info for that table. So to add new
>> table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
>> subscriber otherwise the data won't be replicated.
> 
> Couldn't the subscriber automatically add tracking info when apply
> stream data arrives for a relation it has not seen before?
> 

Sure, but it has many caveats:
- what if the table does not exist
- what if it exists and already has data
- what if the table is rarely written to

We can't control any of that until we have DDL replication/automatic
structure dumping. Once we have those, we can add options to control the
default behavior per subscriber, but with the current feature set, anything
that does not require user action will behave non-deterministically, which
is usually confusing.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2016-12-19 Thread Craig Ringer
On 20 Dec. 2016 1:27 am, "Peter Eisentraut" <peter.eisentr...@2ndquadrant.com> wrote:

On 12/19/16 4:30 AM, Petr Jelinek wrote:
> So existing table data can be copied once subscription is created, but
> also new tables can be added and their data copied. This is where the
> REFRESH PUBLICATION comes into play. Adding table to publication does
> not make it automatically replicated by the subscription as the
> subscription does not have tracking info for that table. So to add new
> table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
> subscriber otherwise the data won't be replicated.

Couldn't the subscriber automatically add tracking info when apply
stream data arrives for a relation it has not seen before?


If no table has been created by the user and we start trying to apply a
data stream, apply will break.

Since manual action is needed to create the destination I don't see a
problem with requiring manual enabling too, personally.

Let the fully transparent way wait until we can do DDL replication in v11+


Re: [HACKERS] Logical replication existing data copy

2016-12-19 Thread Peter Eisentraut
On 12/19/16 4:30 AM, Petr Jelinek wrote:
> So existing table data can be copied once subscription is created, but
> also new tables can be added and their data copied. This is where the
> REFRESH PUBLICATION comes into play. Adding table to publication does
> not make it automatically replicated by the subscription as the
> subscription does not have tracking info for that table. So to add new
> table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on
> subscriber otherwise the data won't be replicated.

Couldn't the subscriber automatically add tracking info when apply
stream data arrives for a relation it has not seen before?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

