Re: [HACKERS] pglogical - logical replication contrib module

2016-01-16 Thread Steve Singer

On 01/15/2016 12:07 PM, Petr Jelinek wrote:

That's a bug, fixed.



Can you post an updated patch with whatever fixes you have made so far?


There are several statuses the table goes through, during the COPY 
it's in synchronizing status, so next logical step seemed to be 
synchronized. Maybe it should be renamed to 'replicating' instead as 
that's what it actually means (table has finished synchronization and 
is now replicating normally).




I agree 'replicating' is clearer



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical - logical replication contrib module

2016-01-22 Thread Steve Singer
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested


This reply covers a 10,000-foot review of the feature (some of my other
replies to the thread cover specifics that came up in testing; a code-level
review will come later).

1) Do we want logical replication in core/contrib

Ten years ago a popular feeling in the PostgreSQL project was that replication
didn't belong in core because there were too many different styles.  People
then went on to complain that there were too many replication projects to
choose from and that they were hard to use and had lots of corner cases.  The
evolution of WAL-based replication showed us how popular in-core replication
is.  Users like being able to use in-core features, and our community process
tends to produce better-quality in-core solutions than external projects.

I am of the opinion that if we can come up with a solution that meets some
common use cases then it would be good to have those features in core/contrib.
At this stage I am not going to get into a discussion of a contrib extension
versus a built-in, non-extension feature.  I don't think a single replication
solution will ever meet all use cases.  I feel that the extensible
infrastructure we have so far built for logical replication means that people
who want to develop solutions for use cases not covered will be in a good
position.  This doesn't mean we can't or shouldn't try to cover some use cases
in core.


2) Does this patch provide a set of logical replication features that meet many 
popular use-cases

Below I will review some use-cases and try to assess how pglogical meets them.

 ** Streaming Postgresql Upgrade

pg_upgrade is great for many situations, but sometimes you don't want an
in-place upgrade; you want a streaming upgrade instead.  Possibly because
you don't want application downtime but instead just want to point your
applications at the upgraded database server in a controlled manner.  Other
times you might want to upgrade to a newer version of PG while keeping the
option of rolling back to the older version if things go badly.

I think pglogical should be able to handle this use case pretty well (assuming
the source version of PG is actually new enough to include pglogical).
Support for replicating sequences would need to be added before this is
smooth, but once sequence support is in place I think this would work well.
I also don't see any reason why you couldn't replicate from 9.7 -> 9.6,
since the wire format is abstracted from the internal representation.  This is
of course dependent on the application not doing anything that is inherently
incompatible between the two versions.


** Query only replicas (with temp tables or additional indexes)

Sometimes you want a replica for long-running or heavy queries.  Requirements
for temp tables, additional indexes, or maybe the effect on vacuum mean that
our existing WAL-based replicas are unsuitable.

I think pglogical should be able to handle this use case pretty well with the 
caveat being that your replica is an asynchronous replica and will always lag
the origin by some amount.

** Replicating a subset of tables into a different database

Sometimes you want to replicate a handful of tables from one database to
another database.  Maybe the first database is the system of record for the
data and the second database needs an up-to-date copy for querying.

Pglogical should meet this use case pretty well; it has flexible support for
selecting which tables get replicated from which source.  Pglogical doesn't
have any facilities to rename the tables between the origin and replica, but
those could be added later.
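
As a rough sketch of what I mean (based on the pglogical functions as I
understand them from the docs; the exact signatures may differ in the patch,
and the set and table names here are made up):

-- on the origin (the system of record)
SELECT pglogical.create_replication_set('reporting');
SELECT pglogical.replication_set_add_table('reporting', 'public.orders', true);
SELECT pglogical.replication_set_add_table('reporting', 'public.customers', true);

-- on the querying database, subscribe to just that set
SELECT pglogical.create_subscription('reporting_sub',
    'host=origin dbname=prod', '{reporting}');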

** Sharding

Systems that do application-level sharding (or even sharding with an FDW)
often have non-sharded tables that need to be available on all shards for
referential integrity or joins.  Logical replication is one way to make sure
that the replicated data gets to all the shards.  Sharding systems also
sometimes want to take the data from individual shards and replicate it to a
consolidation server for reporting purposes.

Pglogical seems to meet this use case.  I guess you would have a designated
origin for the shared/global data that all shards would subscribe to, with a
set containing the designated data.  For the consolidation use case you would
have the consolidation server subscribe to all shards.
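
Something like the following sketch for the consolidation direction (all
names are made up and the signatures are assumed, not taken from the patch):

-- on the consolidation server, one subscription per shard
SELECT pglogical.create_subscription('shard1_sub',
    'host=shard1 dbname=app', '{shard_set}');
SELECT pglogical.create_subscription('shard2_sub',
    'host=shard2 dbname=app', '{shard_set}');
-- the global/shared data flows the other way: every shard subscribes
-- to a set on the designated origin containing the global tables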

I am less clear about how someone would want DDL changes to work for these 
cases.  The DDL support in the patch is pretty limited so I am not going to 
think much now about how we would want DDL to work.


** Schema changes involving rewriting big tables

Sometimes you have a DDL change on a large table that will involve a table
rewrite, and the best way of deploying the change is to make the DDL change

Re: [HACKERS] pglogical - logical replication contrib module

2016-02-02 Thread Steve Singer
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:tested, failed

Here is some more review

+- `pglogical.replication_set_add_table(set_name name, table_name regclass, 
synchronize boolean)`
+  Adds a table to replication set.
+
+  Parameters:
+  - `set_name` - name of the existing replication set
+  - `table_name` - name or OID of the table to be added to the set
+  - `synchronize` - if true, the table data is synchronized on all subscribers
+which are subscribed to given replication set, default false
+

The argument to this function is actually named "relation", not "table_name",
though we might want to update the function to name the argument table_name.

Also, we don't explain what 'synchronize' means.  I first thought that a value
of false would mean that existing data won't be copied but any new changes
will be.  A value of false actually seems to mean that nothing will happen
with the table until the synchronize function is manually called.  We seem to
be using the word 'synchronize' in a different sense in different places,
which I find confusing (i.e. synchronize_data and synchronize_structure in
create_subscription).
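
To make the two readings concrete (the set and table names are made up, and
the call is positional to sidestep the relation/table_name naming question):

SELECT pglogical.replication_set_add_table('myset', 'public.t1', true);
-- existing rows of t1 are copied to subscribers of 'myset', then changes stream

SELECT pglogical.replication_set_add_table('myset', 'public.t2', false);
-- as far as I can tell, nothing happens for t2 until the separate
-- synchronize function is called manually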





*** a/contrib/pglogical/pglogical_sync.c
--- b/contrib/pglogical/pglogical_sync.c
+ static void
+ dump_structure(PGLogicalSubscription *sub, const char *snapshot)
+ {
+   charpg_dump[MAXPGPATH];
+   uint32  version;
+   int res;
+   StringInfoData  command;
+
+   if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
+   elog(ERROR, "pglogical subscriber init failed to find pg_dump relative 
to binary %s",
+my_exec_path);
+
+   if (version / 100 != PG_VERSION_NUM / 100)
+   elog(ERROR, "pglogical subscriber init found pg_dump with wrong major 
version %d.%d, expected %d.%d",
+version / 100 / 100, version / 100 % 100,
+PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
+
+   initStringInfo(&command);
+ #if PG_VERSION_NUM < 90500
+   appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N 
pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
+ #else
+   appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f 
\"/tmp/pglogical-%d.dump\" \"%s\"",

1) I am not sure we can assume/require that the pg_dump binary be in the same
location as the postgres binary.  I don't think we've ever required that
client binaries (i.e. psql, pg_dump, pg_restore ...) be in the same directory
as postgres.  pg_upgrade does require this, so maybe this isn't a problem in
practice, but I thought I'd point it out.  Ideally we wouldn't need to call an
external program to get a schema dump, but turning pg_dump into a library is
beyond the scope of this patch.


2) I don't think we can hard-code /tmp as the directory for the schema dump.
I don't think this will work on most Windows systems, and even on a Unix
system $TMPDIR might be set to something else.  Maybe writing this into
pgsql_tmp would be a better choice.

Further down, in
pglogical_sync_subscription(PGLogicalSubscription *sub)
+   switch (status)
+   {
+   /* Already synced, nothing to do except cleanup. */
+   case SYNC_STATUS_READY:
+   MemoryContextDelete(myctx);
+   return;
+   /* We can recover from crashes during these. */
+   case SYNC_STATUS_INIT:
+   case SYNC_STATUS_CATCHUP:
+   break;
+   default:
+   elog(ERROR,
+"subscriber %s initialization failed during nonrecoverable 
step (%c), please try the setup again",
+sub->name, status);
+   break;
+   }

I think the default case needs to do something to unregister the background
worker.  We already discussed trying to get the error message to the user in a
better way; either way, there isn't any sense in this background worker being
launched again if the error is nonrecoverable.


+
+   tables = copy_replication_sets_data(sub->origin_if->dsn,
+   sub->target_if->dsn,
+   snapshot,
+   sub->replication_sets);
+
+   /* Store info about all the synchronized tables. */
+   StartTransactionCommand();
+   foreach (lc, tables)

Shouldn't we be storing the info about the synchronized tables as part of the 
same transaction that does the sync?  


I'll keep going through the code as I have time.  I think it is appropriate
to move this to the next CF since the current CF is past its end date and the
patch has received some review.  When you have an updated version of the
patch, post it; don't wait until March.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pglogical - logical replication contrib module

2016-02-02 Thread Steve Singer

On 01/26/2016 10:43 AM, Craig Ringer wrote:
On 23 January 2016 at 11:17, Steve Singer <st...@ssinger.info> wrote:





** Schema changes involving rewriting big tables

Sometimes you have a DDL change on a large table that will involve
a table rewrite and the best way of deploying the change is to
make the DDL change
on a replicate then once it is finished promote the replica to the
origin in some controlled fashion.  This avoids having to lock the
table on the origin for hours.

pglogical seems to allow minor schema changes on the replica such
as changing a type but it doesn't seem to allow a DO INSTEAD
trigger on the replica.  I don't think pglogical currently meets
this use case particularly well


I'm not sure I fully understand that one.


Say you have a table A with column b, and in the next version of your
application you want to create a second table B that has column b:


create table B (b text);
insert into B select b from a;
alter table a drop column b;

but you want to do this on a replica because it is a very big table and
you want to minimize downtime.


You could have a trigger on the replica that performed the writes on B.b
instead of A, except triggers don't seem to get run on the replica.
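
Something along these lines is what I had in mind (a sketch only, with
made-up names; I assume the apply worker runs with session_replication_role
set to 'replica', so an ordinary trigger is skipped unless it is marked to
fire during replication):

CREATE FUNCTION redirect_b() RETURNS trigger AS $$
BEGIN
    INSERT INTO b(b) VALUES (NEW.b);  -- send the incoming value to the new table
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER redirect_a BEFORE INSERT OR UPDATE ON a
    FOR EACH ROW EXECUTE PROCEDURE redirect_b();

-- would also need to be enabled for replication apply:
ALTER TABLE a ENABLE REPLICA TRIGGER redirect_a;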




 Craig Ringer http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Steve



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-09-05 Thread Steve Singer

On 08/31/2016 04:51 PM, Petr Jelinek wrote:

Hi,

and one more version with bug fixes, improved code docs and couple 
more tests, some general cleanup and also rebased on current master 
for the start of CF.






To get the 'subscription' TAP tests to pass I need to set

export PGTZ=+02

Shouldn't the expected output be with reference to PST8PDT?




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-09-05 Thread Steve Singer

On 09/05/2016 03:58 PM, Steve Singer wrote:

On 08/31/2016 04:51 PM, Petr Jelinek wrote:

Hi,

and one more version with bug fixes, improved code docs and couple 
more tests, some general cleanup and also rebased on current master 
for the start of CF.








A few more things I noticed when playing with the patches

1. Creating a subscription to yourself ends pretty badly:
the CREATE SUBSCRIPTION command seems to get stuck, and you can't kill
it.  The background process seems to be waiting for a transaction to
commit (I assume the create subscription command's transaction).  I had to
kill -9 the various processes to get things to stop.  Getting confused about
hostnames and ports is a common operator error.


2. Failures during the initial subscription  aren't recoverable

For example

on db1
  create table a(id serial4 primary key,b text);
  insert into a(b) values ('1');
  create publication testpub for table a;

on db2
  create table a(id serial4 primary key,b text);
  insert into a(b) values ('1');
  create subscription testsub connection 'host=localhost port=5440 
dbname=test' publication testpub;


I then get in my db2 log

ERROR:  duplicate key value violates unique constraint "a_pkey"
DETAIL:  Key (id)=(1) already exists.
LOG:  worker process: logical replication worker 16396 sync 16387 (PID 
10583) exited with exit code 1

LOG:  logical replication sync for subscription testsub, table a started
ERROR:  could not crate replication slot "testsub_sync_a": ERROR: 
replication slot "testsub_sync_a" already exists



LOG:  worker process: logical replication worker 16396 sync 16387 (PID 
10585) exited with exit code 1

LOG:  logical replication sync for subscription testsub, table a started
ERROR:  could not crate replication slot "testsub_sync_a": ERROR: 
replication slot "testsub_sync_a" already exists



and it keeps looping.
If I then truncate "a" on db2 it doesn't help.  (I'd expect the initial
subscription to work at that point.)


If I then do on db2
 drop subscription testsub cascade;

I still see a slot in use on db1

select * FROM pg_replication_slots ;
   slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
 testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-09-18 Thread Steve Singer

On 09/08/2016 06:59 PM, Petr Jelinek wrote:
- the CREATE SUBSCRIPTION also tries to check if the specified 
connection connects back to same db (although that check is somewhat 
imperfect) and if it gets stuck on create slot it should be normally 
cancelable (that should solve the issue Steve Singer had) 




When I create my subscriber database by doing a physical backup of the
publisher cluster (with cp, before I add any data) then I am unable to
create the subscription.

ie
initdb ../data
cp -r ../data ../data2
./postgres -D ../data
./postgres -D ../data2


This makes sense when I look at your code, but it might not be what we want.

I had the same issue when I created my subscriber cluster with pg_basebackup
(the timeline on the destination cluster still shows as 1).
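
For what it's worth, both clusters report the same identifier after a
physical copy, which is presumably what the connects-back-to-the-same-db
check keys on:

SELECT system_identifier FROM pg_control_system();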






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-09-18 Thread Steve Singer

On 09/08/2016 06:59 PM, Petr Jelinek wrote:

Hi,

Updated version, this should address most of the things in Peter's 
reviews so far, not all though as some of it needs more discussion.




Another bug report.

I had subscribed a subscriber database to a publication with 1 table
create table a (a serial4 primary key, b text);

* I then dropped column b on the subscriber
* inserted some rows on the publisher
* Noticed the expected error about column b not existing in the 
subscriber log

* Added column c on the subscriber, then added column b after column c

I now get the following stack trace

#1  0x007dc8f9 in cstring_to_text (
s=0x16f238af0 )
at varlena.c:152
#2  0x008046a3 in InputFunctionCall (
flinfo=flinfo@entry=0x7fffa02d0250,
str=str@entry=0x16f238af0 0x16f238af0>, typioparam=typioparam@entry=25, typmod=typmod@entry=-1) at 
fmgr.c:1909

#3  0x00804971 in OidInputFunctionCall (functionId=,
str=0x16f238af0 ,
typioparam=25, typmod=-1) at fmgr.c:2040
#4  0x006aa485 in SlotStoreCStrings (slot=0x2748670,
values=0x7fffa02d0330) at apply.c:569
#5  0x006ab45c in handle_insert (s=0x274d088) at apply.c:756
#6  0x006abcea in handle_message (s=0x7fffa02d3e20) at apply.c:978
#7  LogicalRepApplyLoop (last_received=117457680) at apply.c:1146
#8  0x006ac37e in ApplyWorkerMain (main_arg=)
at apply.c:1530


In SlotStoreCStrings values only has 2 elements but natts is 4




Changes:
- I moved the publication.c to pg_publication.c, subscription.c to 
pg_subscription.c.

- changed \drp and \drs to \dRp and \dRs
- fixed definitions of the catalogs (BKI_ROWTYPE_OID)
- changed some GetPublication calls to get_publication_name
- fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
- fixed get_object_address_publication_rel
- fixed the dependencies between pkeys and publications; for this I
actually had to add a new interface to dependency.c that allows dropping a
single dependency

- fixed the 'for all tables' and 'for tables all in schema' publications
- changed the alter publication from FOR to SET
- added more test cases for the publication DDL
- fixed compilation of subscription patch alone and docs
- changed subpublications to name[]
- added check for publication list duplicates
- made the subscriptions behave more like they are inside the database 
instead of shared catalog (even though the catalog is still shared)
- added options for CREATE SUBSCRIPTION to optionally not create a
slot and not do the initial data sync - that should solve the
complaint about CREATE SUBSCRIPTION always connecting
- the CREATE SUBSCRIPTION also tries to check if the specified 
connection connects back to same db (although that check is somewhat 
imperfect) and if it gets stuck on create slot it should be normally 
cancelable (that should solve the issue Steve Singer had)

- fixed the tests to work in any timezone
- added DDL regress tests for subscription
- added proper detection of missing schemas and tables on subscriber
- rebased on top of 19acee8 as the DefElem changes broke the patch

The table sync is still far from ready.







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-09-20 Thread Steve Singer

On Tue, 20 Sep 2016, Peter Eisentraut wrote:


On 9/18/16 4:17 PM, Steve Singer wrote:




I think if we want to prevent the creation of subscriptions that point
to self, then we need to create a magic token when the postmaster starts
and check for that when we connect.  So more of a running-instance
identifier instead of an instance-on-disk identifier.

The other option is that we just allow it and make it more robust.


I think we should go with the second option for now. I feel that the effort
is better spent making sure that initial syncs that don't finish subscribing
(for whatever reason) can be aborted, instead of trying to build a concept of
node identity before we really need it.


Steve



--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-11-11 Thread Steve Singer
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This causes the pgbench tests to fail (consistently) with

not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/


When I run pgbench manually I get (-t 10 --rate=10 --latency-limit=1 -n -r)

number of transactions actually processed: 10/10
number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting.

number of transactions actually processed: 0/10
number of transactions skipped: 10 (100.000 %)



@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 {
     printf("number of transactions per client: %d\n", nxacts);
     printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-          total->cnt - total->skipped, nxacts * nclients);
+          total->cnt, nxacts * nclients);

I think you want ntx instead of total->cnt here. 




The new status of this patch is: Waiting on Author

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-10-30 Thread Steve Singer

On 10/24/2016 09:22 AM, Petr Jelinek wrote:

Hi,

attached is updated version of the patch.

There are quite a few improvements and restructuring, I fixed all the
bugs and basically everything that came up from the reviews and was
agreed on. There are still couple of things missing, ie column type
definition in protocol and some things related to existing data copy.


Here are a few things I've noticed so far.

+
+CREATE SUBSCRIPTION mysub WITH CONNECTION dbname=foo host=bar 
user=repuser PUBLICATION mypub;

+
+  
+  

The documentation above doesn't match the syntax; CONNECTION needs to be
in single quotes, not double quotes.

I think you want
+
+CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar 
user=repuser' PUBLICATION mypub;

+
+  
+  



I am not sure if this is a known issue covered by your comments about 
data copy but I am still having issues with error reporting on a failed 
subscription.


I created a subscription, dropped the subscription and created a second 
one.  The second subscription isn't active but shows no errors.



P: create publication mypub for table public.a;
S: create subscription mysub with connection 'dbname=test host=localhost 
port=5440' publication mypub;

P:  insert into a(b) values ('t');
S: select * FROM a;
 a | b
---+---
 1 | t
(1 row)

Everything is good
Then I do
S: drop subscription mysub;

S: create subscription mysub2 with connection 'dbname=test 
host=localhost port=5440' publication mypub;

P:  insert into a(b) values ('f');
S: select * FROM a;
 a | b
---+---
 1 | t

The data doesn't replicate


select * FROM pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 16398 | mysub2  |     |       |              |                    |                       |                |
(1 row)


The only thing in my log is

2016-10-30 15:27:27.038 EDT [6028] NOTICE:  dropped replication slot 
"mysub" on publisher
2016-10-30 15:27:36.072 EDT [6028] NOTICE:  created replication slot 
"mysub2" on publisher

2016-10-30 15:27:36.082 EDT [6028] NOTICE:  synchronized table states


I'd expect an error in the log or something.
However, if I delete everything from the table on the subscriber then 
the subscription proceeds


I think there are still problems with signal handling in the initial sync.

If I try to drop mysub2 (while the subscription is stuck, instead of
deleting the data) the drop hangs.
If I then try to kill the postmaster for the subscriber nothing happens; I
have to send it a -9 to make it go away.


However once I do that and then restart the postmaster for the 
subscriber I start to see the duplicate key errors in the log


2016-10-30 16:00:54.635 EDT [7018] ERROR:  duplicate key value violates 
unique constraint "a_pkey"

2016-10-30 16:00:54.635 EDT [7018] DETAIL:  Key (a)=(1) already exists.
2016-10-30 16:00:54.635 EDT [7018] CONTEXT:  COPY a, line 1
2016-10-30 16:00:54.637 EDT [7007] LOG:  worker process: logical 
replication worker 16400 sync 16387 (PID 7018) exited with exit code 1


I'm not sure why I didn't get those until I restarted the postmaster, but
it seems to happen whenever I drop a subscription and then create a new one.
Creating the second subscription from the same psql session as I create/drop
the first seems important in reproducing this.





I am also having issues dropping a second subscription from the same 
psql session


(table a is empty on both nodes to avoid duplicate key errors)
S: create subscription sub1 with connection  'host=localhost dbname=test 
port=5440' publication mypub;
S: create subscription sub2 with connection  'host=localhost dbname=test 
port=5440' publication mypub;

S: drop subscription sub1;
S: drop subscription sub2;

At this point the drop subscription hangs.








The biggest changes are:

I added one more prerequisite patch (the first one) which adds ephemeral
slots (or well implements UI on top of the code that was mostly already
there). The ephemeral slots are different in that they go away either on
error or when session is closed. This means the initial data sync does
not have to worry about cleaning up the slots after itself. I think this
will be useful in other places as well (for example basebackup). I
originally wanted to call them temporary slots in the UI but since the
behavior is bit different from temp tables I decided to go with what the
underlying code calls them in UI as well.

I also split out the libpqwalreceiver rewrite to separate patch which
does just the re-architecture and does not really add new functionality.
And I did the re-architecture bit differently based on the review.

There is now new executor module in execReplication.c, no new nodes but
several utility commands. I moved there the tuple lookup functions from
apply and also wrote new interfaces for doing inserts/updates/deletes to
a

Re: [HACKERS] Logical Replication WIP

2016-11-05 Thread Steve Singer

On 10/31/2016 06:38 AM, Petr Jelinek wrote:
There are some fundamental issues with initial sync that need to be 
discussed on list but this one is not known. I'll try to convert this 
to test case (seems like useful one) and fix it, thanks for the 
report. In meantime I realized I broke the last patch in the series 
during rebase so attached is the fixed version. It also contains the 
type info in the protocol.





I don't know if this is covered by the known initial_sync problems or not


If I have an 'all tables' publication and then create a new table, the
data doesn't seem to replicate to the new table.



P: create table a(a serial4 primary key, b text);
S: create table a(a serial4 primary key, b text);
P: create publication mypub for all tables;
S: create subscription mysub connection 'host=localhost dbname=test 
port=5441' publication mypub;

P: create table b(a serial4 primary key, b text);
P: insert into b(b) values ('foo2');
P: insert into a(b) values ('foo3');

Then I check my subscriber

select * FROM a;
 a |  b
---+--
 1 | foo
 2 | foo3
(2 rows)

test=# select * FROM b;
 a | b
---+---
(0 rows)


However, if the table isn't on the subscriber I do get an error:

ie

P: create table c(a serial4 primary key, b text);
P: insert into c(b) values('foo');

2016-11-05 11:49:31.456 EDT [14938] FATAL:  the logical replication 
target public.c not found
2016-11-05 11:49:31.457 EDT [13703] LOG:  worker process: logical 
replication worker 16457 (PID 14938) exited with exit code 1


but if I then add the table
S: create table c(a serial4 primary key, b text);
2016-11-05 11:51:08.583 EDT [15014] LOG:  logical replication apply for 
subscription mysub started


but the data doesn't replicate to table c either.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP

2016-11-13 Thread Steve Singer

On 10/31/2016 06:38 AM, Petr Jelinek wrote:

On 31/10/16 00:52, Steve Singer wrote:
There are some fundamental issues with initial sync that need to be 
discussed on list but this one is not known. I'll try to convert this 
to test case (seems like useful one) and fix it, thanks for the 
report. In meantime I realized I broke the last patch in the series 
during rebase so attached is the fixed version. It also contains the 
type info in the protocol.




Attached are some proposed documentation updates (to be applied on top of
your 20161031 patch set).


Also


  Publication


+  
+The tables are matched using fully qualified table name. Renaming of
+tables or schemas is not supported.
+  

Is renaming of tables any less supported than other DDL operations?
For example:

alter table nokey2 rename to nokey3
select * FROM pg_publication_tables ;
 pubname | schemaname | tablename
-++---
 tpub| public | nokey3
(1 row)


If I then kill the postmaster on my subscriber and restart it, I get

2016-11-13 16:17:11.341 EST [29488] FATAL:  the logical replication 
target public.nokey3 not found
2016-11-13 16:17:11.342 EST [29272] LOG:  worker process: logical 
replication worker 41076 (PID 29488) exited with exit code 1
2016-11-13 16:17:16.350 EST [29496] LOG:  logical replication apply for 
subscription nokeysub started
2016-11-13 16:17:16.358 EST [29498] LOG:  logical replication sync for 
subscription nokeysub, table nokey2 started
2016-11-13 16:17:16.515 EST [29498] ERROR:  table public.nokey2 not 
found on publisher
2016-11-13 16:17:16.517 EST [29272] LOG:  worker process: logical 
replication worker 41076 sync 24688 (PID 29498) exited with exit code 1


but if I then rename the table on the subscriber everything seems to work.

(I suspect the need to kill+restart is a bug; I've seen other instances
where a hard restart of the subscriber is required following changes)



I am also having issues adding a table to a publication; it doesn't seem
to work.


P: create publication tpub for table a;
S: create subscription mysub connection 'host=localhost dbname=test 
port=5440' publication tpub;

P: insert into a(b) values ('1');
P: alter publication tpub add table b;
P: insert into b(b) values ('1');
P: insert into a(b) values ('2');


select * FROM pg_publication_tables ;
 pubname | schemaname | tablename
-++---
 tpub| public | a
 tpub| public | b


but


S:  select * FROM b;
 a | b
---+---
(0 rows)
S: select * FROM a;
 a | b
---+---
 5 | 1
 6 | 2
(2 rows)


diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
new file mode 100644
index 63af1a5..89723c3
*** a/doc/src/sgml/logical-replication.sgml
--- b/doc/src/sgml/logical-replication.sgml
***
*** 95,110 
  how the table is accessed. Each table can be added to multiple
  publications if needed.  Publications may currently only contain
  tables. Objects must be added explicitly, except when a publication
! is created for ALL TABLES. There is no default name for
! a publication which specifies all tables.


  Publications can choose to limit the changes they produce to show
  any combination of INSERT, UPDATE and
  DELETE in a similar way to the way triggers are fired by
! particular event types. Only tables with a REPLICA IDENTITY
! index can be added to a publication which replicates UPDATE
! and DELETE operation.


  The definition of a publication object will be included within
--- 95,110 
  how the table is accessed. Each table can be added to multiple
  publications if needed.  Publications may currently only contain
  tables. Objects must be added explicitly, except when a publication
! is created for ALL TABLES.


  Publications can choose to limit the changes they produce to show
  any combination of INSERT, UPDATE and
  DELETE in a similar way to the way triggers are fired by
! particular event types. If a table without a REPLICA IDENTITY
! index is added to a publication which replicates UPDATE or
! DELETE operations then subsequent UPDATE
! or DELETE operations will fail on the publisher.


  The definition of a publication object will be included within
***
*** 195,214 


  A conflict will produce an error and will stop the replication; it
! must be resolved manually by the user.


! The resolution can be done either by changing data on the subscriber
! so that it does not conflict with incoming change or by skipping the
! transaction that conflicts with the existing data. The transaction
! can be skipped by calling the
! 
! pg_replication_origin_advance() function
! with a node_name corresponding to the subscription name. T

Re: [HACKERS] Logical Replication WIP

2016-11-20 Thread Steve Singer

On Sun, 20 Nov 2016, Petr Jelinek wrote:


On 13/11/16 23:02, Steve Singer wrote:



There is one exception though:

*** 195,214 


  A conflict will produce an error and will stop the replication; it
! must be resolved manually by the user.


! The resolution can be done either by changing data on the subscriber
! so that it does not conflict with incoming change or by skipping the
! transaction that conflicts with the existing data. The transaction
! can be skipped by calling the
! 
! pg_replication_origin_advance() function
! with a node_name corresponding to the subscription name. The
! current position of origins can be seen in the
! 
! pg_replication_origin_status system view.
!   
  
  


I don't see why this needs to be removed? Maybe it could be improved but
certainly not removed?



Sorry, I was confused. I noticed that the function was missing in the patch
and thought it was documentation for a function that you had removed from
recent versions of the patch, rather than a reference to a function that is
already committed.
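
For reference, the skip procedure the documentation describes would look
roughly like this (the origin name and LSN are made up, and the naming
follows the patch's documentation rather than anything committed):

SELECT * FROM pg_replication_origin_status;  -- find the origin and its position
SELECT pg_replication_origin_advance('mysub', '0/1566E40'::pg_lsn);
-- advances the origin past the conflicting transaction so apply can resume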




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 logical replication - walsender keepalive replies

2014-06-30 Thread Steve Singer

In 9.4 we've added the below block of code to walsender.c:

/*
 * We only send regular messages to the client for full decoded
 * transactions, but a synchronous replication and walsender shutdown
 * possibly are waiting for a later location. So we send pings
 * containing the flush location every now and then.
*/
if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
{
WalSndKeepalive(true);
waiting_for_ping_response = true;
}


I am finding that my logical replication reader is spending a tremendous
amount of time sending feedback to the server because a keepalive reply
was requested.  My flush pointer is smaller than sentPtr, which I see as
the normal case (the client hasn't confirmed all the WAL it has been
sent).  My client queues the records it receives and only confirms once it
actually processes the record.


So the sequence looks something like

Server Sends LSN 0/1000
Server Sends LSN 0/2000
Server Sends LSN 0/3000
Client confirms LSN 0/2000


I don't see why all these keepalive replies are needed in this case
(the timeout value is bumped way up; it's the above block that is
triggering the reply request, not something related to a timeout).


Steve






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Steve Singer
I've encountered a corrupt pg_control  file on my 9.4 development 
cluster.  I've mostly been using the cluster for changeset extraction / 
slony testing.


This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change 
discussed in another thread) but would have had the initdb done with an 
earlier 9.4 snapshot.




/usr/local/pgsql94wal/bin$ ./pg_controldata ../data
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:937
Catalog version number:   201405111
Database system identifier:   6014096177254975326
Database cluster state:   in production
pg_control last modified: Tue 08 Jul 2014 06:15:58 PM EDT
Latest checkpoint location:   5/44DC5FC8
Prior checkpoint location:5/44C58B88
Latest checkpoint's REDO location:5/44DC5FC8
Latest checkpoint's REDO WAL file:000100050044
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  0/1558590
Latest checkpoint's NextOID:  505898
Latest checkpoint's NextMultiXactId:  3285
Latest checkpoint's NextMultiOffset:  6569
Latest checkpoint's oldestXID:1281
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Time of latest checkpoint:Tue 08 Jul 2014 06:15:23 PM EDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:logical
Current wal_log_hints setting:off
Current max_connections setting:  200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 65793
Date/time type storage:   floating-point numbers
Float4 argument passing:  by reference
Float8 argument passing:  by reference
Data page checksum version:   2602751502
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$


Before this, postgres crashed and seemed to have problems recovering.  I
might have hit Ctrl-C, but I didn't do anything drastic like issue a kill -9.



test1 [unknown] 2014-07-08 18:15:18.986 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:20.482 EDTWARNING:  terminating 
connection because of crash of another server process
test1 [unknown] 2014-07-08 18:15:20.482 EDTDETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and 
exit, because another server process exited abnormally and possibly 
corrupted shared memory.
test1 [unknown] 2014-07-08 18:15:20.482 EDTHINT:  In a moment you should 
be able to reconnect to the database and repeat your command.
  2014-07-08 18:15:20.483 EDTLOG:  all server processes terminated; 
reinitializing
  2014-07-08 18:15:20.720 EDTLOG:  database system was interrupted; 
last known up at 2014-07-08 18:15:15 EDT
  2014-07-08 18:15:20.865 EDTLOG:  database system was not properly 
shut down; automatic recovery in progress

  2014-07-08 18:15:20.954 EDTLOG:  redo starts at 5/41023848
  2014-07-08 18:15:23.153 EDTLOG:  unexpected pageaddr 4/D8DC6000 in 
log segment 000100050044, offset 14442496

  2014-07-08 18:15:23.153 EDTLOG:  redo done at 5/44DC5F60
  2014-07-08 18:15:23.153 EDTLOG:  last completed transaction was at 
log time 2014-07-08 18:15:17.874937-04
test2 [unknown] 2014-07-08 18:15:24.247 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:24.772 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.281 EDTFATAL:  the database system 
is in recovery mode
test1 [unknown] 2014-07-08 18:15:25.547 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.548 EDTFATAL:  the database system 
is in recovery mode
test3 [unknown] 2014-07-08 18:15:25.549 EDTFATAL:  the database system 
is in recovery mode
test4 [unknown] 2014-07-08 18:15:25.557 EDTFATAL:  the database system 
is in recovery mode
test5 [unknown] 2014-07-08 18:15:25.582 EDTFATAL:  the database system 
is in recovery mode
test2 [unknown] 2014-07-08 18:15:25.584 EDTFATAL:  the database system 
is in recove

Re: [HACKERS] 9.4 pg_control corruption

2014-07-08 Thread Steve Singer

On 07/08/2014 10:14 PM, Tom Lane wrote:

Steve Singer  writes:

I've encountered a corrupt pg_control  file on my 9.4 development
cluster.  I've mostly been using the cluster for changeset extraction /
slony testing.
This is a 9.4 (currently commit 6ad903d70a440e  + a walsender change
discussed in another thread) but would have had the initdb done with an
earlier 9.4 snapshot.

Somehow or other you missed the update to pg_control version number 942.
There's no obvious reason to think that this pg_control file is corrupt
on its own terms, but the pg_controldata version you're using expects
the 942 layout.  The fact that the server wasn't complaining about this
suggests that you've not recompiled the server, or at least not xlog.c.
Possibly the odd failure to restart indicates that you have a partially
updated server executable?



The server is complaining about that; it started to after the crash
(which is why I ran pg_controldata).


ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
  2014-07-08 22:28:57.796 EDTFATAL:  database files are incompatible 
with server
  2014-07-08 22:28:57.796 EDTDETAIL:  The database cluster was 
initialized with PG_CONTROL_VERSION 937, but the server was compiled 
with PG_CONTROL_VERSION 942.

  2014-07-08 22:28:57.796 EDTHINT:  It looks like you need to initdb.
ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$


The server seemed fine (and it was 9.4, because I was using 9.4 features).
The server crashed.
The server performed crash recovery.
The server wouldn't start, and pg_controldata shows the attached
output.


I wasn't recompiling or reinstalling around this time either.




regards, tom lane






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical replication - walsender keepalive replies

2014-07-14 Thread Steve Singer

On 07/06/2014 10:11 AM, Andres Freund wrote:

Hi Steve,



Right. I thought about this for a while, and I think we should change
two things. For one, don't request replies here. It's simply not needed,
as this isn't dealing with timeouts. For another don't just check ->flush
< sentPtr but also && ->write < sentPtr. The reason we're sending these
feedback messages is to inform the 'logical standby' that there's been
WAL activity which it can't see because they don't correspond to
anything that's logically decoded (e.g. vacuum stuff).
Would that suit your needs?

Greetings,


Yes I think that will work for me.
I tested with the attached patch that I think  does what you describe 
and it seems okay.





Andres Freund



diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
new file mode 100644
index 3189793..844a5de
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** WalSndWaitForWal(XLogRecPtr loc)
*** 1203,1211 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
  		{
! 			WalSndKeepalive(true);
  			waiting_for_ping_response = true;
  		}
  
--- 1203,1213 
  		 * possibly are waiting for a later location. So we send pings
  		 * containing the flush location every now and then.
  		 */
! 		if (MyWalSnd->flush < sentPtr &&
! 			MyWalSnd->write < sentPtr &&
! 			!waiting_for_ping_response)
  		{
! 			WalSndKeepalive(false);
  			waiting_for_ping_response = true;
  		}
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical replication - walsender keepalive replies

2014-08-11 Thread Steve Singer

On 07/14/2014 01:19 PM, Steve Singer wrote:

On 07/06/2014 10:11 AM, Andres Freund wrote:

Hi Steve,



Right. I thought about this for a while, and I think we should change
two things. For one, don't request replies here. It's simply not needed,
as this isn't dealing with timeouts. For another don't just check 
->flush

< sentPtr but also && ->write < sentPtr. The reason we're sending these
feedback messages is to inform the 'logical standby' that there's been
WAL activity which it can't see because they don't correspond to
anything that's logically decoded (e.g. vacuum stuff).
Would that suit your needs?

Greetings,


Yes I think that will work for me.
I tested with the attached patch that I think  does what you describe 
and it seems okay.






Any feedback on this?  Do we want that change for 9.4, or do we want 
something else?






Andres Freund









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 9.4 logical decoding assertion

2014-08-14 Thread Steve Singer

I hit the following on 9.4 testing logical decoding.


TRAP: FailedAssertion("!(prev_first_lsn < cur_txn->first_lsn)", File: 
"reorderbuffer.c", Line: 618)

LOG:  server process (PID 3801) was terminated by signal 6: Aborted

Unfortunately I don't have a core file and I haven't been able to 
reproduce this.










--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical decoding assertion

2014-08-15 Thread Steve Singer

On 08/15/2014 09:42 AM, Andres Freund wrote:

On 2014-08-14 16:03:08 -0400, Steve Singer wrote:

I hit the following on 9.4 testing logical decoding.


TRAP: FailedAssertion("!(prev_first_lsn < cur_txn->first_lsn)", File:
"reorderbuffer.c", Line: 618)
LOG:  server process (PID 3801) was terminated by signal 6: Aborted

I saw that recently while hacking around, but I thought it was because
of stuff I'd added. But apparently not.

Hm. I think I see how that might happen. It might be possible (and
harmless) if two subxacts of the same toplevel xact have the same
first_lsn. But if it's not just <= vs < it'd be worse.


Unfortunately I don't have a core file and I haven't been able to reproduce
this.

Any information about the workload? Any chance you still have the data
directory around?


I was running the Slony regression tests, but I ran the same test script a
number of times afterwards and the problem didn't reproduce itself.


The last thing the tests did before the crash was part of the slony 
failover process.


I am doing my testing running with all 5 nodes/databases under the same 
postmaster (giving something like 20 replication slots open)


A few milliseconds before, one of the connections had just done a
START_REPLICATION SLOT "slon_4_2" LOGICAL 0/32721A58

and then that connection reported the socket being closed, but because so
much was going on concurrently I can't say for sure whether that connection
experienced the assert or was closed because another backend asserted.



I haven't done an initdb since, so I have the data directory but I've 
dropped and recreated all of my slots many times since so the wal files 
are long gone.




Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?

2017-04-16 Thread Steve Singer

On 04/10/2017 01:28 PM, Fujii Masao wrote:

Hi,

  src/backend/replication/logical/launcher.c
  * Worker started and attached to our shmem. This check is safe
  * because only launcher ever starts the workers, so nobody can steal
  * the worker slot.

The tablesync patch enabled even worker to start another worker.
So the above assumption is not valid for now.

This issue seems to cause the corner case where the launcher picks up
the same worker slot that previously-started worker has already picked
up to start another worker.

Regards,



I'm not sure if what I am seeing is related to this or another issue.

If I create a subscription for a publication that doesn't exist (yet), I see
'publication "mypub" does not exist' in the subscriber's log.

If I then create the publication on the publisher, the error doesn't go away;
the worker process keeps spinning with the same error.

If I then drop the subscription and recreate it, it works.
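
A rough reproduction (object names are made up):

-- on the subscriber, while the publication does not yet exist on the publisher
CREATE SUBSCRIPTION mysub CONNECTION 'host=pub dbname=test' PUBLICATION mypub;
-- subscriber log: publication "mypub" does not exist

-- on the publisher, afterwards
CREATE PUBLICATION mypub FOR ALL TABLES;
-- the worker keeps reporting the same error; only dropping and recreating
-- the subscription clears it:
DROP SUBSCRIPTION mysub;
CREATE SUBSCRIPTION mysub CONNECTION 'host=pub dbname=test' PUBLICATION mypub;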




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 10 beta docs: different replication solutions

2017-07-30 Thread Steve Singer


We don't seem to describe logical replication on

https://www.postgresql.org/docs/10/static/different-replication-solutions.html

The attached patch adds a section.

Steve


diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..1329d1f 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,26 @@ protocol to make nodes agree on a serializable transactional order.

   
 
+
+  
+   Logical Replication
+   
+
+
+  Logical replication allows a database server to send a stream of
+  data modifications to another server.
+  PostgreSQL logical replication constructs
+  a stream of logical data modifications from the WAL.
+
+
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow in multiple
+ directions. For more information on logical replication, see .
+
+   
+  
+ 
   
Trigger-Based Master-Standby Replication

@@ -290,6 +310,7 @@ protocol to make nodes agree on a serializable transactional order.
  Shared Disk Failover
  File System Replication
  Write-Ahead Log Shipping
+ Logical Replication
  Trigger-Based Master-Standby Replication
  Statement-Based Replication Middleware
  Asynchronous Multimaster Replication
@@ -304,6 +325,7 @@ protocol to make nodes agree on a serializable transactional order.
  NAS
  DRBD
  Streaming Repl.
+ Logical Repl.
  Slony
  pgpool-II
  Bucardo
@@ -315,6 +337,7 @@ protocol to make nodes agree on a serializable transactional order.
  shared disk
  disk blocks
  WAL
+ Logical decoding
  table rows
  SQL
  table rows
@@ -330,6 +353,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ •
 
 
 
@@ -337,6 +361,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ •
  
  •
  •
@@ -349,6 +374,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  •
  
+ 
  •
  
  
@@ -360,6 +386,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync off
  •
+ •
  
  •
  
@@ -371,6 +398,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  with sync on
  
+ 
  •
  
  •
@@ -385,6 +413,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ •
 
 
 
@@ -393,6 +422,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  •
+ •
  
  •
  •
@@ -403,6 +433,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ 
  •
  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 10 beta docs: different replication solutions

2017-07-31 Thread Steve Singer

On Mon, 31 Jul 2017, Merlin Moncure wrote:


On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer  wrote:


We don't seem to describe logical replication on

https://www.postgresql.org/docs/10/static/different-replication-solutions.html

The attached patch adds a section.


This is a good catch.  Two quick observations:

1) Super pedantic point. I don't like the 'repl.' abbreviation in the
'most common implementation' both for the existing hs/sr and for the
newly added logical.


Updated



2) This lingo:
+ Logical replication allows the data changes from individual tables
+ to be replicated. Logical replication doesn't require a particular server
+ to be designated as a master or a slave but allows data to flow
in multiple
+ directions. For more information on logical replication, see
.

Is good, but I would revise it just a bit to emphasize the
subscription nature of logical replication to link the concepts
expressed strongly in the main section.  For example:

Logical replication allows the data changes [remove: "from individual
tables to be replicated"] to be published to subscriber nodes.  Data
can flow in any direction between nodes on a per-table basis; there is
no concept of a master server.  Conflict resolution must be handled
completely by the application.  For more information on...

what do you think?


Sounds good.

I've incorporated these changes into an updated patch, but I changed the
language around conflict resolution.  Conflict resolution could be handled
by triggers or by adding extra columns to the primary key; that wouldn't be
'completely by the application'.





merlin

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..19b26f8 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,27 @@ protocol to make nodes agree on a serializable transactional order.

   
 
+
+  
+   Logical Replication
+   
+
+
+  Logical replication allows a database server to send a stream of
+  data modifications to another server.
+  PostgreSQL logical replication constructs
+  a stream of logical data modifications from the WAL.
+
+
+  Logical replication allows the data changes to be published to subscriber
+  nodes. Data can flow in any direction between nodes on a per-table basis;
+  there is no concept of a master server. PostgreSQL
+  does not include support for conflict resolution.
+  For more information on logical replication, see .
+   
+   
+  
+ 
   
Trigger-Based Master-Standby Replication

@@ -290,6 +311,7 @@ protocol to make nodes agree on a serializable transactional order.
  Shared Disk Failover
  File System Replication
  Write-Ahead Log Shipping
+ Logical Replication
  Trigger-Based Master-Standby Replication
  Statement-Based Replication Middleware
  Asynchronous Multimaster Replication
@@ -303,7 +325,8 @@ protocol to make nodes agree on a serializable transactional order.
  Most Common Implementation
  NAS
  DRBD
- Streaming Repl.
+ Streaming Replication.
+ Logical Replication.
  Slony
  pgpool-II
  Bucardo
@@ -315,6 +338,7 @@ protocol to make nodes agree on a serializable transactional order.
  shared disk
  disk blocks
  WAL
+ Logical decoding
  table rows
  SQL
  table rows
@@ -330,6 +354,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ •
 
 
 
@@ -337,6 +362,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  
+ •
  
  •
  •
@@ -349,6 +375,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  •
  
+ 
  •
  
  
@@ -360,6 +387,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  with sync off
  •
+ •
  
  •
  
@@ -371,6 +399,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  with sync on
  
+ 
  •
  
  •
@@ -385,6 +414,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ •
 
 
 
@@ -393,6 +423,7 @@ protocol to make nodes agree on a serializable transactional order.
  
  
  •
+ •
  
  •
  •
@@ -403,6 +434,7 @@ protocol to make nodes agree on a serializable transactional order.
  •
  •
  •
+ 
  •
  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2015-09-08 Thread Steve Singer

On 08/31/2015 10:07 AM, Kevin Grittner wrote:


Kevin,

I've started to do a review on this patch but I am a bit confused with 
some of what I am seeing.


The attached testcase fails if I replace the cursor in your test case with 
direct selects from the table.
I would have expected this to generate the snapshot too old error as 
well, but it doesn't.




#   Failed test 'expect "snapshot too old" error'
#   at t/002_snapshot_too_old_select.pl line 64.
#  got: ''
# expected: '72000'
# Looks like you failed 1 test of 9.
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests

Am I misunderstanding something or is the patch not working as expected?





As discussed when the "proof of concept" patch was submitted during
9.5 development, here is a version intended to be considered for
commit to 9.6, with the following changes:

1. It is configured using time rather than number of transactions.
Not only was there unanimous agreement here that this was better,
but the EDB customer who had requested this feature and who had
been testing it independently made the same request.

2. The "proof of concept" patch only supported heap and btree
checking; this supports all index types.

3. Documentation has been added.

4. Tests have been added.  They are currently somewhat minimal,
since this is using a whole new technique for testing from any
existing committed tests -- I wanted to make sure that this
approach to testing was OK with everyone before expanding it.  If
it is, I assume we will want to move some of the more generic
portions to a .pm file to make it available for other tests.

Basically, this patch aims to limit bloat when there are snapshots
that are kept registered for prolonged periods.  The immediate
reason for this is a customer application that keeps read-only
cursors against fairly static data open for prolonged periods, and
automatically fields SQLSTATE 72000 to re-open them if necessary.
When used, it should also provide some protections against extreme
bloat from forgotten "idle in transaction" connections which are
left holding a snapshot.

Once a snapshot reaches the age threshold, it can be terminated if it
reads data modified after the snapshot was built.  It is expected
that useful ranges will normally be somewhere from a few hours to a
few days.

By default old_snapshot_threshold is set to -1, which disables the
new behavior.
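
A minimal illustration of the configuration (just a sketch; the duration
units are assumed to follow the normal GUC syntax, and the parameter is
set in postgresql.conf):

-- postgresql.conf:  old_snapshot_threshold = '2h'   (-1, the default, disables it)
SHOW old_snapshot_threshold;
-- A snapshot older than the threshold that reads data modified after it
-- was taken fails with SQLSTATE 72000, "snapshot too old".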

The customer has been testing a preliminary version of this
time-based patch for several weeks, and is happy with the results
-- it is preventing bloat for them and not generating "snapshot too
old" errors at unexpected times.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






002_snapshot_too_old_select.pl
Description: Perl program

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator

2013-09-18 Thread Steve Singer

On 09/18/2013 11:39 AM, Stephen Frost wrote:

* Kevin Grittner (kgri...@ymail.com) wrote:

= and <> aren't listed above even though they do a byte-for-byte
comparison because, well, I guess because we have chosen to treat
two UTF8 strings which produce the same set of glyphs using
different bytes as unequal.  :-/

I tend to side with Andres on this case actually- we're being asked to
store specific UTF8 bytes by the end user.  That is not the same as
treating two different numerics which are the same *number* as
different because they have different binary representations, which is
entirely an internal-to-postgres consideration.



The problem is that there are datatypes (citext, postgis, ...) that have 
defined = to return true when comparing two values that are different, 
not just stored differently.  Are you saying that matviews should 
update only when the = operator of the datatype returns false, and if 
people don't like this behaviour they should fix the datatypes?
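
To make the citext case concrete, a small example (nothing here depends on
the patch; it just shows = reporting two differently-stored values as equal):

CREATE EXTENSION IF NOT EXISTS citext;
SELECT 'Steve'::citext = 'STEVE'::citext AS equal_by_type,           -- true
       'Steve'::citext::text = 'STEVE'::citext::text AS same_bytes;  -- false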




Thanks,

Stephen




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-19 Thread Steve Singer

On 09/12/2013 06:27 PM, Kevin Grittner wrote:

Attached is a patch for a bit of infrastructure I believe to be
necessary for correct behavior of REFRESH MATERIALIZED VIEW
CONCURRENTLY as well as incremental maintenance of matviews.


Here is attempt at a review of the v1 patch.

There has been a heated discussion on list about whether we even want this 
type of operator. I will try to summarize it here.


The arguments against it are
* that a record_image_identical call on two records that returns false 
can still return true under the equality operator, and the records might 
(or might not) appear to be the same to users
* Once we expose this as an operator via SQL someone will find it and 
use it, and then complain when we change things such as the internal 
representation of a datatype, which might break their queries
* The differences between = and record_image_identical might not be 
understood by everyone who finds the operator, leading to confusion
* Using a criterion other than equality (the = operator provided by the 
datatype) might cause problems if we later provide 'on change' triggers 
for materialized views


The arguments for this patch are
* We want the materialized view to return the same value as would be 
returned if the query were executed directly.  This not only means that 
the values should be the same according to a datatype's = operator but 
that they should appear the same 'to the eyeball'.
* Supporting the materialized view refresh check via SQL makes a lot of 
sense but doing that requires exposing something via SQL
* If we adequately document what we mean by record_image_identical and 
the operator we pick for this then users shouldn't be surprised at what 
they get if they use this


I think there is agreement that better (as in more obscure) operators 
than === and !== need to be picked, and we need to find a place in the 
user-docs to warn users of the behaviour of these operators.   Hannu has 
proposed


*==*   "binary equal, surely very equal by any other definition as well"
!==?  "maybe not equal" -- "binary inequal, may still be equal by
other comparison methods"

and no one has yet objected to this.  My vote would be to update the patch to 
use those operator names and then figure out where we can document them.  If it 
means we have to add a section describing the equal operator for records as well, then 
maybe we should do that so we can get on with things.


Code Review


The record_image_eq and record_image_cmp functions are VERY similar.  It seems 
to me that you could merge the meat of these functions into a common function 
(that isn't exposed via SQL) that then behaves differently in 2 or 3 spots 
based on a parameter that controls whether it detoasts the tuple for the compare or 
returns false on the equals.

Beyond that I don't see any issues with the code.

You call out a question about whether comparing two records that have a different 
number of columns should always be an error, or only an error when they 
match on the columns they do have in common.

The current behaviour is

 select * FROM r1,r2 where r1<==r2;
 a | b | a | b | c
---+---+---+---+---
 1 | 1 | 1 | 2 | 1
(1 row)

update r2 set b=1;
UPDATE 1
test=# select * FROM r1,r2 where r2<==r1;
ERROR:  cannot compare record types with different numbers of columns

This seems like a bad idea to me.  I don't like that I get a type comparison 
error only sometimes, based on the values of the data in the column.  If I'm a 
user testing code that compares two records with this operator and I test my 
query with one value pair and it works, then I'd naively expect to get a true 
or false on all other value sets of the same record type.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-20 Thread Steve Singer

On 09/20/2013 08:38 AM, Kevin Grittner wrote:
Did you look at the record_eq and record_cmp functions which exist 
before this patch?  Is there a reason we should do it one way for the 
default operators and another way for this alternative? 
Do you think we should change record_eq and record_cmp to do things 
the way you suggest? 


I think the record_eq and record_cmp functions would be better if they 
shared code as well, but I don't think changing that should be part of 
this patch, since you aren't otherwise touching those functions. I know some 
people dislike code that is switch-based and prefer duplication; my 
preference is to avoid duplication.


This seems like a bad idea to me.  I don't like that I get a type comparison
error only sometimes based on the values of the data in the column.  If I'm
a user testing code that compares two records with this operator and I test my
query with 1 value pair then and it works then I'd naively expect to get a
true or false on all other value sets of the same record type.


Again, this is a result of following the precedent set by the
existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1 < r2;
  c1 | c2 | c1 | c2 | c3
----+----+----+----+----
   1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1 < r2;
ERROR:  cannot compare record types with different numbers of columns

Would you be in favor of forcing a change to the record comparison
operators which have been in use for years?  If not, is there a
good reason to have them behave differently in this regard?

--
I hadn't picked up on the fact that you copied that part of the behaviour from 
the existing comparison operators.
No, we would need a really good reason for changing the behaviour of the 
comparison operators, and I don't think we have that.  I agree that the 
binary identical operators should behave similarly to the existing 
comparison operators and error out on the column number mismatch after 
comparing the columns that are present in both.


Steve


Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-21 Thread Steve Singer

On 09/20/2013 06:33 AM, Andres Freund wrote:

Hi,




The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed prevent WAL file
   removal. What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/wal file removal if the standby is gone
* How does the userinterface look to remove a slot if a standby is gone
* How do we decide/control which commands use a slot in which cases?


I think we are going to want to be flexible enough to support users with 
a couple of different use-cases.
* Some people will want to keep xmin pegged and prevent WAL removal so a 
standby with a slot can always catch up.
* Most people will want to say keep X megabytes of WAL (if needed by a 
behind slot) and keep xmin pegged so that the WAL can be consumed by a 
logical plugin.


I can see us also implementing a restore_command that the walsender 
could use to get archived segments, but for logical replication xmin 
would still need to be low enough.


I don't think the current patch set is incompatible with us later 
implementing any of the above. I'd rather see us focus on getting the 
core functionality committed and worry about a good interface for 
managing slots later.



Greetings, Andres Freund 


Steve



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Steve Singer

On 09/24/2013 11:21 AM, Andres Freund wrote:

Not having a consumer of the walsender interface included sounds like a
bad idea to me, even if it were only useful for testing. Now, you could
argue it should be in /contrib - and I wouldn't argue against that
except it shares code with the rest of src/bin/pg_basebackup.


+1 on pg_receivellog (or whatever better name we pick) being somewhere.
I found the pg_receivellog code very useful as an example and for 
debugging/development purposes.
It isn't something that I see as useful for the average user, and I think 
the use-cases it meets are closer to other things we usually put in /contrib.


Steve



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-25 Thread Steve Singer

On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit "Rename various
"freeze multixact" variables".
And I am even partially guilty for that patch...

Rebased patches attached.


While testing the logical replication changes against my WIP logical 
slony I am sometimes getting error messages from the WAL sender of the form:

unexpected duplicate for tablespace  X relfilenode  X

The call stack is

HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
 DecodeCommit
.
.
.


I am working off something based on your version 
e0acfeace6d695c229efd5d78041a1b734583431



Any ideas?


Greetings,

Andres Freund







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-25 Thread Steve Singer

On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit "Rename various
"freeze multixact" variables".
And I am even partially guilty for that patch...

Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I
am sometimes getting error messages from the WAL sender of the form:
unexpected duplicate for tablespace  X relfilenode  X

Any chance you could provide a setup to reproduce the error?



The steps to build a setup that should reproduce this error are:

1.  I had to apply the attached patch on top of your logical replication 
branch so my pg_decode_init would know if it was being called as part of 
an INIT_REPLICATION or START_REPLICATION.
Unless I have misunderstood something you will probably want to merge 
this fix in.


2.  Get my WIP for adding logical support to slony from: 
g...@github.com:ssinger/slony1-engine.git branch logical_repl  
(4af1917f8418a)
(My code changes to slony are more prototype level code quality than 
production code quality)


3.
cd slony1-engine
./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever)
make
make install

4.   Grab the clustertest framework JAR from 
https://github.com/clustertest/clustertest-framework and build up a 
clustertest jar file


5.  Create a file
slony1-engine/clustertest/conf/java.conf
that contains the path to the above JAR file  as a shell variable 
assignment: ie

CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar

6.
cp clustertest/conf/disorder.properties.sample 
clustertest/conf/disorder.properties



edit disorder.properties to have the proper values for your 
environment.  All 6 databases can point at the same postgres instance; 
this test will only actually use 2 of them (so far).


7. Run the test
cd clustertest
./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database 
test1 and replicate the data into test2 (which is a different database 
on the same postmaster)


If this setup seems like too much effort I can request one of the 
commitfest VMs from Josh and get everything set up there for you.


Steve


Any ideas?

I'll look into it. Could you provide any context to what you're doing
that's being decoded?

Greetings,

Andres Freund



>From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer 
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT
 REPLICATION

---
 src/backend/replication/walsender.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 
 	initStringInfo(&output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 	   NIL,	replay_read_page,
 	   WalSndPrepareWrite, WalSndWriteData);
 
-- 
1.7.10.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-26 Thread Steve Singer

On 09/25/2013 01:20 PM, Steve Singer wrote:

On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote:
This patch set now fails to apply because of the commit "Rename 
various

"freeze multixact" variables".
And I am even partially guilty for that patch...

Rebased patches attached.
While testing the logical replication changes against my WIP logical 
slony I

am sometimes getting error messages from the WAL sender of the form:
unexpected duplicate for tablespace  X relfilenode  X

Any chance you could provide a setup to reproduce the error?



The steps to build a setup that should reproduce this error are:

1.  I had apply the attached patch on top of your logical replication 
branch so my pg_decode_init  would now if it was being called as part 
of a INIT_REPLICATION or START_REPLICATION.
Unless I have misunderstood something you probably will want to merge 
this fix in


2.  Get my WIP for adding logical support to slony from: 
g...@github.com:ssinger/slony1-engine.git branch logical_repl 
(4af1917f8418a)
(My code changes to slony are more prototype level code quality than 
production code quality)


3.
cd slony1-engine
./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever)
make
make install

4.   Grab the clustertest framework JAR from 
https://github.com/clustertest/clustertest-framework and build up a 
clustertest jar file


5.  Create a file
slony1-engine/clustertest/conf/java.conf
that contains the path to the above JAR file  as a shell variable 
assignment: ie
CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar 



6.
cp clustertest/conf/disorder.properties.sample 
clustertest/conf/disorder.properties



edit disorder.properites to have the proper values for your 
environment.  All 6 databases can point at the same postgres instance, 
this test will only actually use 2 of them(so far).


7. Run the test
cd clustertest
./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database 
test1 and replicate the data into test2 (which is a different database 
on the same postmaster)


If this setup seems like too much effort I can request one of the 
commitfest VM's from Josh and get everything setup there for you.


Steve


Any ideas?

I'll look into it. Could you provide any context to what youre doing
that's being decoded?




I've determined that in this test the walsender seems to be hitting 
this when it is decoding the transactions that are behind the slonik 
commands to add tables to replication (set add table, set add 
sequence).  This is before the SUBSCRIBE SET is submitted.


I've also noticed something else that is strange (but might be 
unrelated).  If I stop my slon process and restart it I get messages like:


WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the 
walsender before the restart.


If I force it to restart replication from 0/A9320B00 I see data rows that I 
appear to have already seen before the restart.
I think this is happening when I process the data for 0/A9320B00 but 
my slon is killed before the feedback message gets sent. Is this expected?





Greetings,

Andres Freund









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-27 Thread Steve Singer

On 09/26/2013 02:47 PM, Steve Singer wrote:



I've determined that when in this test the walsender seems to be 
hitting this when it is decode the transactions that are behind the 
slonik commands to add tables to replication (set add table, set add 
sequence).  This is before the SUBSCRIBE SET is submitted.


I've also noticed something else that is strange (but might be 
unrelated).  If I stop my slon process and restart it I get messages 
like:


WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the 
walsender before the restart.


If force it to restart replication from 0/A9320B00 I see datarows that 
I appear to have already seen before the restart.
I think this is happening when I process the data for 0/A9320B00 but 
don't get the feedback message my slon was killed. Is this expected?





I've further narrowed this down to something (or some combination of things) 
that the _disorder_replica.altertableaddTriggers(1) stored function does
(i.e. @SLONYNAMESPACE@.altertableaddTriggers(int)).

Which is essentially
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL 
commands from psql, but I must be missing something.
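
For reference, the rough shape of what I have been issuing from psql is below
(a self-contained sketch with dummy table and trigger-function names; the real
Slony config-lock table and trigger functions are different):

CREATE TABLE sl_config_lock (dummy int);
CREATE TABLE some_user_table (id int primary key);
CREATE FUNCTION noop_trg() RETURNS trigger LANGUAGE plpgsql
    AS $$ BEGIN RETURN NULL; END $$;

BEGIN;
LOCK TABLE sl_config_lock IN EXCLUSIVE MODE;
LOCK TABLE some_user_table IN EXCLUSIVE MODE;
CREATE TRIGGER deny_access BEFORE INSERT OR UPDATE OR DELETE ON some_user_table
    FOR EACH ROW EXECUTE PROCEDURE noop_trg();
CREATE TRIGGER log_truncate BEFORE TRUNCATE ON some_user_table
    FOR EACH STATEMENT EXECUTE PROCEDURE noop_trg();
CREATE TRIGGER deny_truncate BEFORE TRUNCATE ON some_user_table
    FOR EACH STATEMENT EXECUTE PROCEDURE noop_trg();
COMMIT;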


I can replicate this when just using the test_decoding plugin.











Greetings,

Andres Freund













--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.1

2013-09-28 Thread Steve Singer

On 09/27/2013 11:44 AM, Andres Freund wrote:

I'm encountering a make error:
Gah. Lastminute changes. Always the same... Updated patch attached.

Greetings,

Andres Freund




I'm still encountering an error in the make.

make clean
.
.make[3]: Entering directory 
`/usr/local/src/postgresql/src/bin/pg_basebackup'

rm -f pg_basebackup pg_receivexlog pg_recvlogical(X) \
pg_basebackup.o pg_receivexlog.o pg_recvlogical.o \
receivelog.o streamutil.o
/bin/sh: 1: Syntax error: "(" unexpected
make[3]: *** [clean] Error 2

I had to add quotes to the clean command to make it work









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-28 Thread Steve Singer

On 09/27/2013 05:18 PM, Andres Freund wrote:

Hi Steve,

On 2013-09-27 17:06:59 -0400, Steve Singer wrote:

I've determined that when in this test the walsender seems to be hitting
this when it is decode the transactions that are behind the slonik
commands to add tables to replication (set add table, set add sequence).
This is before the SUBSCRIBE SET is submitted.

I've also noticed something else that is strange (but might be unrelated).
If I stop my slon process and restart it I get messages like:

WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the
walsender before the restart.

Uh, that looks like I fumbled some comparison. Let me check.


I've further narrowed this down to something (or the combination of) what
the  _disorder_replica.altertableaddTriggers(1);
stored function does.  (or @SLONYNAMESPACE@.altertableaddTriggers(int);

Which is essentially
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL commands
from psql, but I must be missing something.

I can replicate this when just using the test_decoding plugin.

Thanks. That should get me started with debugging. Unless it's possibly
fixed in the latest version, one bug fixed there might cause something
like this if the moon stands exactly right?


The latest version has NOT fixed the problem.

Also, I was a bit inaccurate in my previous descriptions. To clarify:

1.   I sometimes am getting that 'unexpected duplicate' error
2.   The 'set add table' command, which triggers those functions that create and 
configure triggers, is actually causing the walsender to hit the 
following assertion

2  0x00773d47 in ExceptionalCondition (
conditionName=conditionName@entry=0x8cf400 "!(ent->cmin == 
change->tuplecid.cmin)", errorType=errorType@entry=0x7ab830 
"FailedAssertion",

fileName=fileName@entry=0x8cecc3 "reorderbuffer.c",
lineNumber=lineNumber@entry=1162) at assert.c:54
#3  0x00665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610,
rb=) at reorderbuffer.c:1162
#4  ReorderBufferCommit (rb=0x1b6e4f8, xid=,
commit_lsn=3461001952, end_lsn=) at reorderbuffer.c:1285
#5  0x0065f0f7 in DecodeCommit (xid=,
nsubxacts=, sub_xids=, ninval_msgs=16,
msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928,
ctx=0x1adb928) at decode.c:477


I had added an assert(false) to the code where the 'unexpected duplicate' 
error was logged to make spotting this easier, but yesterday I didn't 
double check that I was hitting the assertion I added versus this other 
one.   I can't yet say whether these are two unrelated issues or whether I'd get to 
the 'unexpected duplicate' message immediately after.






Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-09-29 Thread Steve Singer

On 09/28/2013 03:03 PM, Kevin Grittner wrote:


+To support matching of rows which include elements without a default
+B-tree operator class, the following operators are defined for composite
+type comparison:
+*=,
+*<>,
+*<,
+*<=,
+*>, and
+*>=.
+These operators are also used internally to maintain materialized views,
+and may be useful to replication software.  To ensure that all user
+visible changes are detected, even when the equality operator for the
+type treats old and new values as equal, the byte images of the stored
+data are compared.  While ordering is deterministic, it is not generally
+useful except to facilitate merge joins.  Ordering may differ between
+system architectures and major releases of
+PostgreSQL.
+   


How about

 To support matching of rows which include elements without a default
B-tree operator class, the following operators are defined for composite
type comparison:
*=,
*<>,
*<,
*<=,
*>, and
   *>=.

These operators compare the internal binary representation of the two 
rows.   Two rows might have a different binary representation even 
though comparing the two rows with the equality operator returns true.   
The ordering of rows under these comparison operators is deterministic 
but not otherwise meaningful.  These operators are used internally for 
materialized views and might be useful for other specialized purposes 
such as replication, but are not intended to be generally useful for 
writing queries.
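
A short example of the sort of difference I mean (using the operator spelling
from the patch under review, so this assumes the patch is applied):

-- numeric retains the display scale, so 1.0 and 1.00 are stored differently
-- even though = treats them as equal; the byte-image operator sees the difference.
SELECT ROW(1.0::numeric) =  ROW(1.00::numeric) AS equal,        -- true
       ROW(1.0::numeric) *= ROW(1.00::numeric) AS image_equal;  -- false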





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-01 Thread Steve Singer

On 09/30/2013 06:44 PM, Andres Freund wrote:

Hi,

The series from friday was a bit too buggy - obviously I was too
tired. So here's a new one:


With this series I've also noticed
#2  0x007741a7 in ExceptionalCondition (
conditionName=conditionName@entry=0x7c2908 "!(!(tuple->t_infomask & 
0x1000))", errorType=errorType@entry=0x7acc70 "FailedAssertion",

fileName=fileName@entry=0x91767e "tqual.c",
lineNumber=lineNumber@entry=1608) at assert.c:54
54abort();


 0x007a4432 in HeapTupleSatisfiesMVCCDuringDecoding (
htup=0x10bfe48, snapshot=0x108b3d8, buffer=310) at tqual.c:1608
#4  0x0049d6b7 in heap_hot_search_buffer (tid=tid@entry=0x10bfe4c,
relation=0x7fbebbcd89c0, buffer=310, snapshot=0x10bfda0,
heapTuple=heapTuple@entry=0x10bfe48,
all_dead=all_dead@entry=0x7fff4aa3866f "\001\370\375\v\001",
first_call=1 '\001') at heapam.c:1756
#5  0x004a8174 in index_fetch_heap (scan=scan@entry=0x10bfdf8)
at indexam.c:539
#6  0x004a82a8 in index_getnext (scan=0x10bfdf8,
direction=direction@entry=ForwardScanDirection) at indexam.c:622
#7  0x004a6fa9 in systable_getnext (sysscan=sysscan@entry=0x10bfd48)
at genam.c:343
#8  0x0076df40 in RelidByRelfilenode (reltablespace=0,
relfilenode=529775) at relfilenodemap.c:214
---Type  to continue, or q  to quit---
#9  0x00664ad7 in ReorderBufferCommit (rb=0x1082d98,
xid=, commit_lsn=4638756800, end_lsn=,
commit_time=commit_time@entry=433970378426176) at reorderbuffer.c:1320

In addition to some of the other ones I've posted about.



* fix pg_recvlogical makefile (Thanks Steve)
* fix two commits not compiling properly without later changes (Thanks Kevin)
* keep track of commit timestamps
* fix bugs with option passing in test_logical_decoding
* actually parse option values in test_decoding instead of just using the
   option name
* don't use anonymous structs in unions. That's compiler specific (msvc
   and gcc) before C11 on which we can't rely. That unfortunately will
   break output plugins because ReorderBufferChange need to qualify
   old/new tuples now
* improve error handling/cleanup in test_logical_decoding
* some minor cleanups

Patches attached, git tree updated.

Greetings,

Andres Freund







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] record identical operator - Review

2013-10-03 Thread Steve Singer

On 09/30/2013 09:08 AM, Kevin Grittner wrote:

Steve Singer  wrote:


How about

To support matching of rows which include elements without a default
  B-tree operator class, the following operators are defined for composite
  type comparison:
  *=,
  *<>,
  *<,
  *<=,
  *>, and
  *>=.

These operators compare the internal binary representation of the two
rows.  Two rows might have a different binary representation even
though comparisons of the two rows with the equality operator is true.
The ordering of rows under these comparision operators is deterministic
but not otherwise meaningful.  These operators are used internally for
materialized views and might be useful for other specialized purposes
such as replication but are not intended to be generally useful for
writing queries.

I agree that's an improvement.  Thanks!



Are there any outstanding issues on this patch preventing it from being 
committed?
I think we have discussed this patch enough such that we now have 
consensus on proceeding with adding a record identical operator to SQL.

No one has objected to the latest names of the operators.

You haven't adjusted the patch to reduce the duplication between the 
equality and comparison functions; if you disagree with me and feel that 
doing so would increase the code complexity and be inconsistent with how 
we do things elsewhere, that is fine.


Steve




--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-03 Thread Steve Singer

On 10/03/2013 12:38 PM, Andres Freund wrote:
Does your code use SELECT FOR UPDATE/SHARE on system or 
treat_as_catalog tables? Greetings, Andres Freund 


Yes.
It declares sl_table and sl_sequence and sl_set as catalog.

It does a
SELECT ..
from @NAMESPACE@.sl_table T, @NAMESPACE@.sl_set S,
"pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN,
"pg_catalog".pg_index PGX, "pg_catalog".pg_class PGXC
where ... for update

in the code being executed by the 'set add table'.

(We also do select for update commands in many other places during 
cluster configuration commands)




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-07 Thread Steve Singer

On 10/03/2013 04:00 PM, Andres Freund wrote:
Ok, there were a couple of bugs because I thought mxacts wouldn't need 
to be supported. So far your testcase doesn't crash the database 
anymore - it spews some internal errors though, so I am not sure if 
it's entirely fixed for you. Thanks for testing and helping! I've 
pushed the changes to the git tree, they aren't squashed yet and 
there's some further outstanding stuff, so I won't repost the series 
yet. Greetings, Andres Freund 
When I run your updated version (from Friday, not what you posted today) 
against a more recent version of my slony changes I can get the test 
case to pass about two-thirds of the time.  The failures are due to an issue in 
slon itself that I need to fix.


I see lots of
0LOG:  tx with subtxn 58836

but they seem harmless.

Thanks



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.1

2013-10-08 Thread Steve Singer

On 10/07/2013 09:32 AM, Andres Freund wrote:

Todo:
* rename treat_as_catalog_table, after agreeing on the new name
* rename remaining timetravel function names
* restrict SuspendDecodingSnapshots usage to RelationInitPhysicalAddr,
   that ought to be enough.
* add InLogicalDecoding() function.
* throw away older data when reading xl_running_xacts records, to deal
   with immediate shutdowns/crashes


What is your current plan for decoding sequence updates?  Is this 
something that you were going to hold off on supporting until a future 
version? (I know this was discussed a while ago but I don't remember 
where it stands now.)


From a Slony point of view this isn't a big deal, I can continue to 
capture sequence changes in sl_seqlog when I create each SYNC event and 
then just replicate the INSERT statements in sl_seqlog via logical 
decoding.  I can see why someone building a replication system not based 
on the concept of a SYNC would have a harder time with this.


I am guessing we would want to pass sequence operations to the plugins 
as we encounter the WAL for them out-of-band of any transaction.   This 
would mean that a set of operations like


begin;
insert into a (id) values(4);
insert into a (id) values(nextval('some_seq'));
commit

would be replayed on the replicas as
setval('some_seq',100);
begin;
insert into a (id) values (4);
insert into a (id) values (100);
commit;




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.5

2013-11-09 Thread Steve Singer

On 11/05/2013 10:21 AM, Andres Freund wrote:

Hi,

Attached to this mail and in the xlog-decoding-rebasing-remapping branch
in my git[1] repository you can find the next version of the patchset that:
* Fixes full table rewrites of catalog tables using the method Robert
   prefers (which is to log rewrite mappings to disk)
* Extract the REPLICA IDENTITY as configured with ALTER TABLE for the
   old tuple for UPDATEs and DELETEs
* Much better support for synchronous replication
* Better resource cleanup (as in we need less local WAL available)
* Lots of smaller fixes
The change around REPLICA IDENTITY is *incompatible* to older output
plugins since we now log tuples using the table's TupleDesc, not the
indexes.


My updated plugin is getting rows with 
change->tp.oldtuple as NULL on updates, either with the default PRIMARY 
KEY identity or with a FULL identity.


When I try the test_decoding plugin on UPDATE I get rows like:

table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:1 
ii_reserved[int8]:144 ii_total_sold[int8]:911


which I think is only data from the new tuple.  The lack of "old-key" 
in the output makes me think the test decoding plugin also isn't getting 
the old tuple.


(This is with your patch-set rebased on top of 
ac4ab97ec05ea900db0f14d428cae2e79832e02d, which includes the patches 
Robert committed the other day; I can't rule out that I didn't break 
something in the rebase.)







Robert, I'd be very grateful if you could have a look at patch 0007
implementing what we've discussed. I kept it separate to make it easier
to look at it in isolation, but I think in the end it partially should
be merged into the wal_level=logical patch.
I still think the "wide cmin/cmax" solution is more elegant and has
wider applicability, but this works as well although it's about 5 times
the code.

Comments?

[1]: 
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary
Greetings,

Andres Freund







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.5

2013-11-09 Thread Steve Singer

On 11/09/2013 05:42 PM, Andres Freund wrote:

On 2013-11-09 17:36:49 -0500, Steve Singer wrote:

On 11/05/2013 10:21 AM, Andres Freund wrote:

Hi,

Attached to this mail and in the xlog-decoding-rebasing-remapping branch
in my git[1] repository you can find the next version of the patchset that:
* Fixes full table rewrites of catalog tables using the method Robert
   prefers (which is to log rewrite mappings to disk)
* Extract the REPLICA IDENTITY as configured with ALTER TABLE for the
   old tuple for UPDATEs and DELETEs
* Much better support for synchronous replication
* Better resource cleanup (as in we need less local WAL available)
* Lots of smaller fixes
The change around REPLICA IDENTITY is *incompatible* to older output
plugins since we now log tuples using the table's TupleDesc, not the
indexes.

My updated plugin is getting rows with
change->tp.oldtuple as NULL on updates either with the default PRIMARY KEY
identify or with a  FULL identity.

When I try the test_decoding plugin on UPDATE I get rows like:

table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:1
ii_reserved[int8]:144 ii_total_sold[int8]:911

which I think is only data from the new tuple.The lack of "old-key" in
the output makes me think the test decoding plugin also isn't getting the
old tuple.

(This is with your patch-set rebased ontop of
ac4ab97ec05ea900db0f14d428cae2e79832e02d which includes the patches Robert
committed the other day, I can't rule out that I didn't break something in
the rebase).

I've pushed an updated tree to git, that contains that
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-remapping
git://git.postgresql.org/git/users/andresfreund/postgres.git

and some more fixes. I'll send out an email with details sometime soon.


93c5c2a171455763995cef0afa907bcfaa405db4

Still gives me the following:
update  disorder.do_inventory set ii_in_stock=2 where ii_id=251;
UPDATE 1
test1=# LOG:  tuple in table with oid: 35122 without primary key

\d disorder.do_inventory
   Table "disorder.do_inventory"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 ii_id         | bigint | not null
 ii_in_stock   | bigint | 
 ii_reserved   | bigint | 
 ii_total_sold | bigint | 
Indexes:
"do_inventory_pkey" PRIMARY KEY, btree (ii_id)
Foreign-key constraints:
"do_inventory_item_ref" FOREIGN KEY (ii_id) REFERENCES 
disorder.do_item(i_id) ON DELETE CASCADE

Referenced by:
TABLE "disorder.do_item" CONSTRAINT "do_item_inventory_ref" FOREIGN 
KEY (i_id) REFERENCES disorder.do_inventory(ii_id) DEFERRABLE INITIALLY 
DEFERRED
TABLE "disorder.do_restock" CONSTRAINT "do_restock_inventory_ref" 
FOREIGN KEY (r_i_id) REFERENCES disorder.do_inventory(ii_id) ON DELETE 
CASCADE

Triggers:
_disorder_replica_truncatetrigger BEFORE TRUNCATE ON 
disorder.do_inventory FOR EACH STATEMENT EXECUTE PROCEDURE 
_disorder_replica.log_truncate('3')

Disabled triggers:
_disorder_replica_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON 
disorder.do_inventory FOR EACH ROW EXECUTE PROCEDURE 
_disorder_replica.denyaccess('_disorder_replica')
_disorder_replica_truncatedeny BEFORE TRUNCATE ON 
disorder.do_inventory FOR EACH STATEMENT EXECUTE PROCEDURE 
_disorder_replica.deny_truncate()

Replica Identity: FULL


The test decoder plugin gives me:

table "do_inventory": UPDATE: old-pkey:


a) The table does have a primary key
b) I don't get anything in the old key when I was expecting all the columns
c)  If I change the table to use the pkey index with
alter table disorder.do_inventory  replica identity using index 
do_inventory_pkey;


The LOG message on the update goes away but the output of the test 
decoder plugin goes back to


table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:5 
ii_reserved[int8]:144 ii_total_sold[int8]:911


Which I suspect means oldtuple is back to null




Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.5

2013-11-10 Thread Steve Singer

On 11/10/2013 09:41 AM, Andres Freund wrote:

Still give me the following:
update  disorder.do_inventory set ii_in_stock=2 where ii_id=251;
UPDATE 1
test1=# LOG:  tuple in table with oid: 35122 without primary key
Hm. Could it be that you still have an older "test_decoding" plugin
lying around? The current one doesn't contain that string
anymore. That'd explain the problems.
In v6.4 the output plugin API was changed that plain heaptuples are
passed for the "old" key, although with non-key columns set to
NULL. Earlier it was a "index tuple" as defined by the indexes
TupleDesc.


Grrr, yeah, that was the problem; I had compiled but not installed the 
newer plugin. Sorry.




a) The table does have a primary key
b) I don't get anything in the old key when I was expecting all the rows
c)  If I change the table to use the pkey index with
alter table disorder.do_inventory  replica identity using index
do_inventory_pkey;

The LOG message on the update goes away but the output of the test decoder
plugin goes back to

table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:5
ii_reserved[int8]:144 ii_total_sold[int8]:911

Which I suspect means oldtuple is back to null

Which is legitimate though, if you don't update the primary (or
explicitly chosen candidate) key. Those only get logged if there's
actual changes in those columns.
Makes sense?

Is the expectation that plugin writers will call
RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
to figure out what the identity key is?

How do we feel about having the decoder logic populate change.oldtuple 
with the identity
on UPDATE statements when it is null?  The logic I have now is to use 
oldtuple if it is not null, otherwise go figure out which columns from 
the identity key we should be using.   I think most plugins that do 
anything useful with an update will need to duplicate that logic.









Greetings,

Andres Freund

--
  Andres Freundhttp://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2015-11-08 Thread Steve Singer

On 10/15/2015 05:47 PM, Kevin Grittner wrote:
All other issues raised by Álvaro and Steve have been addressed, 
except for this one, which I will argue against: 


I've been looking through the updated patch


In snapmgr.c


+ * XXX: If we can trust a read of an int64 value to be atomic, we can 
skip the

+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)


Was your intent with the XXX for it to be a TODO to only acquire the lock 
on platforms without atomic 64-bit operations?


On a more general note:

I've tried various manual tests of this feature and it sometimes works 
as expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync 
with how it does work.


I'd expect the following to be sufficient to generate the error (see the SQL sketch after this list):

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table.  The snapshot 
should be too old
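
In SQL terms, the sketch I have in mind is roughly this (two psql sessions;
old_snapshot_threshold is assumed to be set to about a minute in
postgresql.conf, and t / v are just placeholder table and column names):

-- Session 1: take a snapshot that can see the rows.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t;

-- Session 2: after the threshold has passed, modify the rows and vacuum.
UPDATE t SET v = v + 1;
VACUUM t;

-- Session 1: after waiting past the threshold again, read the table.
SELECT count(*) FROM t;   -- expected: ERROR: snapshot too old (SQLSTATE 72000)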



For example it seems that in test 002 the select_ok on conn1 following 
the vacuum but right before the final sleep is critical to the snapshot 
too old error showing up (ie if I remove that select_ok but leave in the 
sleep I don't get the error)


Is this intended and if so is there a better way we can explain how 
things work?



Also is 0 intended to be an acceptable value for old_snapshot_threshold 
and if so what should we expect to see then?






So if I understand correctly, every call to BufferGetPage needs to have
a TestForOldSnapshot immediately afterwards?  It seems easy to later
introduce places that fail to test for old snapshots.  What happens if
they do?  Does vacuum remove tuples anyway and then the query returns
wrong results?  That seems pretty dangerous.  Maybe the snapshot could
be an argument of BufferGetPage?

There are 486 occurences of BufferGetPage in the source code, and
this patch follows 36 of them with TestForOldSnapshot.  This only
needs to be done when a page is going to be used for a scan which
produces user-visible results.  That is, it is *not* needed for
positioning within indexes to add or vacuum away entries, for heap
inserts or deletions (assuming the row to be deleted has already
been found).  It seems wrong to modify about 450 BufferGetPage
references to add a NULL parameter; and if we do want to make that
noop change as insurance, it seems like it should be a separate
patch, since the substance of this patch would be buried under the
volume of that.

I will add this to the November CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions

2013-06-24 Thread Steve Singer

On 05/28/2013 04:41 PM, Szymon Guz wrote:

Hi,
I've got a patch.

This is for a plpython enhancement.

There is an item at the TODO list 
http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages
"Fix loss of information during conversion of numeric type to Python 
float"


This patch uses a decimal.Decimal type from Python standard library 
for the plpthon function numeric argument instead of float.


Patch contains changes in code, documentation and tests.

Most probably there is something wrong, as this is my first Postgres 
patch :)




Thanks for contributing.

This patch applies cleanly against master and compiles with warnings

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]


You can avoid this by moving the declaration of decimal and decimal_dict 
to be at the top of the function where mainmod is declared.


Also in this function you've introduced places where it returns with an 
error (the PLy_elog(ERROR...) calls) before decrementing the reference to 
mainmod. I think you can decrement the mainmod reference after the call 
to SetItemString, before your changes that import the Decimal module.



The patch works as expected; I am able to write Python functions that 
take numerics as arguments and work with them.  I can adjust the decimal 
context precision inside of my function.


One concern I have is that this patch makes pl/python functions 
involving numerics more than 3 times as slow as before.



create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric,b numeric) returns numeric as $$
if a==None:
  return b
return a+b
$$ language plpythonu;
create aggregate sm(basetype=numeric, sfunc=x,stype=numeric);


test=# select sm(a) from b;
sm
--
 50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms.

Would it be faster to call numeric_send instead of numeric_out and then 
convert the sequence of Int16's to a tuple of digits that can be passed 
into the Decimal constructor? I think this is worth trying and testing.
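
To sketch what I mean at the Python level (this only illustrates the Decimal
tuple constructor, not the C-side conversion; cdecimal is ignored here):

DO $$
# decimal.Decimal can be built from a (sign, digits, exponent) tuple, so no
# string parsing is needed; the base-10000 digit words from numeric_send
# would still have to be split into individual decimal digits first.
from decimal import Decimal
d = Decimal((0, (5, 0, 0, 0, 5, 0, 0, 0), 0))   # builds 50005000
plpy.info(d)
$$ LANGUAGE plpythonu;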



Documentation
=
Your patched version of the docs say

  PostgreSQL real, double, and numeric are converted to
   Python Decimal. This type is imported from decimal.Decimal.



I don't think this is correct; as far as I can tell you're not changing 
the behaviour for the postgresql real and double types, which continue to use 
floating point.






   PostgreSQL real and double are converted to
   Python float.





   PostgreSQL numeric is converted to
   Python Decimal. This type is imported from
decimal.Decimal.




Maybe?

Steve



thanks,
Szymon







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions

2013-06-25 Thread Steve Singer

On 06/25/2013 06:42 AM, Szymon Guz wrote:




Hi,
I've attached a new patch. I've fixed all the problems you've found, 
except for the efficiency problem, which has been described in 
previous email.


thanks,
Szymon



This version of the patch addresses the issues I mentioned.  Thanks for 
looking into whether the performance issue is with our conversions to 
strings or inherent in the Python decimal type.  I guess we 
(PostgreSQL) can't do much about it.   A runtime switch to use cdecimal 
if it is available is a good idea, but I agree with you that could be a 
different patch.


One minor thing I noticed in this round:

 PLy_elog(ERROR, "could not import module 'decimal'");

I think it should have "decimal" in double quotes.

I think this patch is ready for a committer to look at it.

Steve







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions

2013-06-26 Thread Steve Singer

On 06/26/2013 04:47 PM, Szymon Guz wrote:





Attached patch has all changes against trunk code.

There is added a function for conversion from Postgres numeric to 
Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it 
is available. It is an external library, quite fast, but may be not 
available. If it is not available, then decimal.Decimal will be used. 
It is in standard Python library, however it is rather slow.


The initialization is done in the conversion function, the pointer to 
a proper Decimal constructor is stored as static variable inside the 
function and is lazy initialized.


The documentation is updated.



I've tested this version with python 2.7 with and without cdecimal and 
also with 3.3 that has the faster decimal performance. It seems fine.


The v5 version of the patch makes only white-space changes to 
plpy_main.c; you should exclude those from the patch if you're making a new 
version (I have done this in the v6 version I'm attaching).



Tests for python 2 and 3 have been added. They work only with standard 
decimal.Decimal, as the type is printed in the *.out files. I think 
there is nothing we can do with that now.





I think we should make test_type_conversion_numeric do something 
that generates the same output in both cases, i.e.
plpy.info(str(x)).  The downside of having the test fail on installs with 
cdecimal installed is much greater than any benefit we get by ensuring 
that the type is really Decimal.
I've attached a v6 version of the patch that does this; do you agree 
with my thinking?





Steve







regards,
Szymon




diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
new file mode 100644
index aaf758d..da27874
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
*** $$ LANGUAGE plpythonu;
*** 308,321 

   
  
   

!PostgreSQL real, double,
!and numeric are converted to
!Python float.  Note that for
!the numeric this loses information and can lead to
!incorrect results.  This might be fixed in a future
!release.

   
  
--- 308,326 

   
  
+ 	 
+   
+PostgreSQL real and double are converted to
+Python float.
+   
+  
+ 
   

!PostgreSQL numeric is converted to
!Python Decimal. This type is imported from 
! 	   cdecimal package if it is available. If cdecimal
! 	   cannot be used, then decimal.Decimal will be used.

   
  
diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
new file mode 100644
index 4641345..46308ed
*** a/src/pl/plpython/expected/plpython_types.out
--- b/src/pl/plpython/expected/plpython_types.out
*** CONTEXT:  PL/Python function "test_type_
*** 213,248 
  (1 row)
  
  CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$
! plpy.info(x, type(x))
  return x
  $$ LANGUAGE plpythonu;
! /* The current implementation converts numeric to float. */
  SELECT * FROM test_type_conversion_numeric(100);
! INFO:  (100.0, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
! 100.0
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(-100);
! INFO:  (-100.0, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!-100.0
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(50.5);
! INFO:  (50.5, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
   50.5
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(null);
! INFO:  (None, <type 'NoneType'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
--- 213,264 
  (1 row)
  
  CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$
! plpy.info(str(x))
  return x
  $$ LANGUAGE plpythonu;
! /* The current implementation converts numeric to decimal.Decimal. */
  SELECT * FROM test_type_conversion_numeric(100);
! INFO:  100
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!   100
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(-100);
! INFO:  -100
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!  -100
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(50.5);
! INFO:  50.5
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
   50.5
  (1 row)
  
+ SELECT *

Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions

2013-06-28 Thread Steve Singer

On 06/27/2013 05:04 AM, Szymon Guz wrote:
On 27 June 2013 05:21, Steve Singer <mailto:st...@ssinger.info>> wrote:


On 06/26/2013 04:47 PM, Szymon Guz wrote:






Hi Steve,
thanks for the changes.

Your idea about common code for decimal and cdecimal is good, but not 
good enough: we need the class name, not the value.


I've changed the code from str(x) to x.__class__.__name__ so the 
function prints the class name (which is Decimal for both packages), not 
the value. We still need the class-name check; the value is returned 
by the function and is checked a couple of lines lower in the file.


patch is attached.



I think the value is more important than the name; I want the tests 
to make sure that the conversion is actually converting properly.  With 
your method of getting the class name without the module we can have both.


The attached patch should print the value and the class name but not the 
module name.
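
For anyone following along, here is a minimal standalone sketch (not part
of the patch) of why x.__class__.__name__ gives the same answer for both
implementations while the full type name does not; it assumes cdecimal may
or may not be installed:

import decimal

d = decimal.Decimal("50.5")
print(type(d))                # <class 'decimal.Decimal'> -- includes the module
print(d.__class__.__name__)   # Decimal -- module-independent

try:
    import cdecimal           # optional C implementation; may not be installed
    print(cdecimal.Decimal("50.5").__class__.__name__)   # also Decimal
except ImportError:
    pass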


Steve



thanks,
Szymon








diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
new file mode 100644
index aaf758d..da27874
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
*** $$ LANGUAGE plpythonu;
*** 308,321 

   
  
   

!PostgreSQL real, double,
!and numeric are converted to
!Python float.  Note that for
!the numeric this loses information and can lead to
!incorrect results.  This might be fixed in a future
!release.

   
  
--- 308,326 

   
  
+ 	 
+   
+PostgreSQL real and double are converted to
+Python float.
+   
+  
+ 
   

!PostgreSQL numeric is converted to
!Python Decimal. This type is imported from 
! 	   cdecimal package if it is available. If cdecimal
! 	   cannot be used, then decimal.Decimal will be used.

   
  
diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out
new file mode 100644
index 4641345..e602336
*** a/src/pl/plpython/expected/plpython_types.out
--- b/src/pl/plpython/expected/plpython_types.out
*** CONTEXT:  PL/Python function "test_type_
*** 213,248 
  (1 row)
  
  CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$
! plpy.info(x, type(x))
  return x
  $$ LANGUAGE plpythonu;
! /* The current implementation converts numeric to float. */
  SELECT * FROM test_type_conversion_numeric(100);
! INFO:  (100.0, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
! 100.0
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(-100);
! INFO:  (-100.0, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!-100.0
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(50.5);
! INFO:  (50.5, <type 'float'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
   50.5
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(null);
! INFO:  (None, <type 'NoneType'>)
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
--- 213,264 
  (1 row)
  
  CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$
! plpy.info(x,x.__class__.__name__)
  return x
  $$ LANGUAGE plpythonu;
! /* The current implementation converts numeric to Decimal. */
  SELECT * FROM test_type_conversion_numeric(100);
! INFO:  (Decimal('100'), 'Decimal')
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!   100
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(-100);
! INFO:  (Decimal('-100'), 'Decimal')
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
!  -100
  (1 row)
  
  SELECT * FROM test_type_conversion_numeric(50.5);
! INFO:  (Decimal('50.5'), 'Decimal')
  CONTEXT:  PL/Python function "test_type_conversion_numeric"
   test_type_conversion_numeric 
  --
   50.5
  (1 row)
  
+ SELECT * FROM test_type_conversion_numeric(1234567890.0987654321);
+ INFO:  (Decimal('1234567890.0987654321'), 'Decimal')
+ CONTEXT:  PL/Python function "test_type_conversion_numeric"
+  test_type_conversion_numeric 
+ --
+ 1234567890.0987654321
+ (1 row)
+ 
+ 

Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-07-05 Thread Steve Singer

On 07/05/2013 08:03 AM, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:
Tried that, too, and problem persists.  The log shows the last commit 
on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb. 

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; not sure whether it's just a
bug that's hidden by either not doing optimizations or by adding more
elog()s, or whether it's a compiler bug.


I am getting the same test failure Kevin is seeing.
This is on a x64 Debian wheezy machine with
gcc (Debian 4.7.2-5) 4.7.2

Building with -O0 results in passing tests.




Greetings,

Andres Freund







Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-07-05 Thread Steve Singer

On 07/05/2013 09:34 AM, Andres Freund wrote:

On 2013-07-05 09:28:45 -0400, Steve Singer wrote:

On 07/05/2013 08:03 AM, Andres Freund wrote:

On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote:

Tried that, too, and problem persists.  The log shows the last commit on
your branch as 022c2da1873de2fbc93ae524819932719ca41bdb.

Ok. I think I have a slight idea what's going on. Could you check
whether recompiling with -O0 "fixes" the issue?

There's something strange going on here; not sure whether it's just a
bug that's hidden by either not doing optimizations or by adding more
elog()s, or whether it's a compiler bug.

I am getting the same test failure Kevin is seeing.
This is on a x64 Debian wheezy machine with
gcc (Debian 4.7.2-5) 4.7.2

Building with -O0 results in passing tests.

Does the patch from
http://archives.postgresql.org/message-id/20130705132513.GB11640%40awork2.anarazel.de
or the git tree (which is rebased ontop of the mvcc catalog commit from
robert which needs some changes) fix it, even with optimizations?



Yes your latest git tree the tests pass with -O2



Greetings,

Andres Freund







Re: [HACKERS] changeset generation v5-01 - Patches & git tree

2013-07-05 Thread Steve Singer

On 06/14/2013 06:51 PM, Andres Freund wrote:

The git tree is at:
git://git.postgresql.org/git/users/andresfreund/postgres.git branch 
xlog-decoding-rebasing-cf4
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4



We discussed issues related to passing options to the plugins a number 
of months ago ( 
http://www.postgresql.org/message-id/20130129015732.ga24...@awork2.anarazel.de)


I'm still having issues with the syntax you describe there.


START_LOGICAL_REPLICATION "1" 0/0 ("foo","bar")
 unexpected termination of replication stream: ERROR:  foo requires a 
parameter


START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")
"START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")": ERROR:  syntax error

START_LOGICAL_REPLICATION "1" 0/0 ("foo")
works okay



Steve





On 2013-06-15 00:48:17 +0200, Andres Freund wrote:

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required,
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid: required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId: required, simple
0007: Adjust Satisfies* interface: required, mechanical,
0008: Allow walsender to attach to a database: required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter: required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for an Relation's primary key: required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program: not required but useful
0016: Add test_logical_decoding extension; not required, but contains
   the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Version v5-01 attached

Greetings,

Andres Freund









Re: [HACKERS] Replication identifiers, take 3

2014-09-27 Thread Steve Singer

On 09/26/2014 06:05 PM, Andres Freund wrote:

On 2014-09-26 14:57:12 -0400, Robert Haas wrote:
Sure, it'll possibly not be trivial to move them elsewhere. On the other
hand, the padding bytes have been unused for 8+ years without somebody
laying "claim" on them but "me". I don't think it's a good idea to leave
them there unused when nobody even has proposed another use for a long
while. That'll just end up with them continuing to be unused.  And
there's actually four more consecutive bytes on 64bit systems that are
unused.

Should there really be a dire need after that, we can simply bump the
record size. WAL volume wise it'd not be too bad to make the record a
tiny bit bigger - the header is only a relatively small fraction of the
entire content.


If we were now increasing the WAL record size anyway for some unrelated 
reason, would we be willing to increase it by a further 2 bytes for the 
node identifier?
If the answer is 'no', then I don't think we can justify using the 2 
padding bytes just because they are there and have been unused for 
years.  But if the answer is 'yes', that we feel this is important enough 
to justify a slightly (2 byte) larger WAL record header, then we shouldn't 
use the excuse of maybe needing those 2 bytes for something else.  When 
something else comes along that needs the WAL space we'll have to 
increase the record size.


To say that if some other patch comes along that needs the space we'll 
redo this feature to use the method Robert describes is unrealistic.  If 
we think that the replication identifier isn't 
general/important/necessary enough to justify 2 bytes of WAL header space, 
then we should start out with something that doesn't use the WAL header.
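
For reference, this is the record header being discussed, roughly as it
appears in the 9.4 sources (an illustration only, not a proposed change);
the padding comment marks the two bytes in question:

typedef struct XLogRecord
{
    uint32        xl_tot_len;   /* total length of entire record */
    TransactionId xl_xid;       /* xact id */
    uint32        xl_len;       /* total length of rmgr data */
    uint8         xl_info;      /* flag bits */
    RmgrId        xl_rmid;      /* resource manager for this record */
    /* 2 bytes of padding here -- currently unused */
    XLogRecPtr    xl_prev;      /* ptr to previous record in log */
    pg_crc32      xl_crc;       /* CRC for this record */
} XLogRecord;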




Steve


I've also wondered about that. Perhaps we simply should have an
additional 'name' column indicating the replication solution?

Yeah, maybe, but there's still the question of substructure within the
non-replication-solution part of the name.  Not sure if we can assume
that a bipartite identifier, specifically, is right, or whether some
solutions will end up with different numbers of components.

Ah. I thought you only wanted to suggest a separator between the
replication solution and its internal data. But you actually want to
suggest an internal separator to be used in the solution's namespace?
I'm fine with that. I don't think we can suggest much beyond that -
different solutions will have fundamentally differing requirements about
which information to store.

Agreed.

So, let's recommend underscores as that separator?

Greetings,

Andres Freund







Re: [HACKERS] Replication identifiers, take 3

2014-09-27 Thread Steve Singer

On 09/26/2014 10:21 AM, Andres Freund wrote:

On 2014-09-26 09:53:09 -0400, Robert Haas wrote:

On Fri, Sep 26, 2014 at 5:05 AM, Andres Freund  wrote:

Let me try to summarize the information requirements for each of these
things.  For #1, you need to know, after crash recovery, for each
standby, the last commit LSN which the client has confirmed via a
feedback message.

I'm not sure I understand what you mean here? This is all happening on
the *standby*. The standby needs to know, after crash recovery, the
latest commit LSN from the primary that it has successfully replayed.

Ah, sorry, you're right: so, you need to know, after crash recovery,
for each machine you are replicating *from*, the last transaction (in
terms of LSN) from that server that you successfully replayed.

Precisely.


I don't think a solution which logs the change of origin will be
simpler. When the origin is in every record, you can filter without keep
track of any state. That's different if you can switch the origin per
tx. At the very least you need a in memory entry for the origin.

But again, that complexity pertains only to logical decoding.
Somebody who wants to tweak the WAL format for an UPDATE in the future
doesn't need to understand how this works, or care.

I agree that that's a worthy goal. But I don't see how this isn't the
case with what I propose? This isn't happening on the level of
individual rmgrs/ams - there've been two padding bytes after 'xl_rmid'
in struct XLogRecord for a long time.

There's also the significant advantage that not basing this on the xid
allows it to work correctly with records not tied to a
transaction. There's not that much of that happening yet, but I've
several features in mind:

* separately replicate 2PC commits. 2PC commits don't have an xid
   anymore... With some tooling on top replication 2PC in two phases
   allow for very cool stuff. Like optionally synchronous multimaster
   replication.
* I have a pending patch that allows to send 'messages' through logical
   decoding - yielding a really fast and persistent queue. For that it's
   useful have transactional *and* nontransactional messages.
* Sanely replicating CONCURRENTLY stuff gets harder if you tie things to
   the xid.


At what point in the decoding stream should something related to a 
CONCURRENTLY command show up?
Also, for a logical message queue, why couldn't you have an xid associated 
with the message even when the transaction contains nothing else?






Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload

2014-10-19 Thread Steve Singer

On 10/16/2014 11:34 PM, Craig Ringer wrote:



Given the generally positive reception to this, here's a patch.

The first patch adds an errhint_log , akin to the current errdetail_log,
so we can send a different HINT to the server log than we do to the client.


The patch behaves as you describe.  I feel that this feature would be 
useful, and you implemented the earlier suggestions: the reload notice is 
not sent to the client; the client instead just gets a hint about 
checking the server log.


You follow the pattern set with detail_log, which makes sense.  The 
variable name "hint_log" doesn't make it obvious to me that the hint 
goes to the server log but not to the client.  The comment for 
errhint_log should maybe say that explicitly.
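
To make that concrete, here is a sketch (not copied from the patch) of how
a call site might use the proposed errhint_log() next to a client-visible
errhint(), mirroring the existing errdetail()/errdetail_log() split; the
variable name and message wording are illustrative only:

ereport(FATAL,
        (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
         errmsg("Peer authentication failed for user \"%s\"", user_name),
         /* sent to the client */
         errhint("See the server error log for additional information."),
         /* written only to the server log */
         errhint_log("pg_hba.conf has been changed since last server "
                     "configuration reload. Reload the server configuration "
                     "to apply the changes.")));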



One question about the code:

Does errfinish (elog.c, around line 505) need to free hint_log?  (I 
would assume it does.)


Other than that the patch looks good to me.


-

Something else I noticed while testing.  This isn't introduced by your 
patch, but I am wondering if it is an existing bug.  If I set up my 
configuration like this:



#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
hba_file = 'ConfigDir/pg_hba2.conf' # host-based authentication file

and start postgres like

./postgres -D ../data

it looks for pg_hba2.conf at bin/ConfigDir/pg_hba2.conf (relative to 
the bin directory I started it from).


Then if I change my pg_hba.conf and do a reload I get the following in 
the log


LOG:  parameter "hba_file" cannot be changed without restarting the server
LOG:  configuration file 
"/usr/local/pgsql95git/bin/../data/postgresql.conf" contains errors; 
unaffected changes were applied



set_config_option is comparing the relative path with the absolute path.

Steve



(Even if DETAIL was appropriate for this info, which it isn't, I can't
use errdetail_log because it's already used for other information in
some of the same error sites.)

The second patch adds a test during errors to report if pg_hba.conf is
stale, or if pg_ident.conf is stale.


Typical output, client:

psql: FATAL:  Peer authentication failed for user "fred"
HINT:  See the server error log for additional information.


Typical output, server:

LOG:  provided user name (fred) and authenticated user name (craig) do
not match
FATAL:  Peer authentication failed for user "fred"
DETAIL:  Connection matched pg_hba.conf line 84: "local   all
   all peer"
HINT:  pg_hba.conf has been changed since last server configuration
reload. Reload the server configuration to apply the changes.



I've added this to the next CF.










Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Steve Singer

On 10/16/2014 08:08 AM, Andres Freund wrote:

On 2014-10-16 08:04:17 -0400, Jan Wieck wrote:

Hi,

PostgreSQL has for ages defined INT64_FORMAT and UINT64_FORMAT in
pg_config.h. This commit

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ce486056ecd28050

moved those definitions to c.h, which breaks compilation of all released
Slony-I versions against current master. Can those be moved back to where
they used to be?

Well, you could add additional configure stuff to also emit what you
want.


Slony uses the definitions in external tools, like slon and slonik, to
format sequence numbers in log output.

Then it should include c.h/postgres_fe.h?


So the header of c.h says "Note that the definitions here are not 
intended to be exposed to clients"

but
postgres_fe.h says "This should be the first file included by PostgreSQL 
client libraries and"


Should client programs that live outside the postgres source tree be 
including postgres_fe.h?  I have a feeling the answer is no.  If the 
answer is no, then why does "make install" install postgres_fe.h?


Slonik used to include postgres_fe.h, but back in 2011 or so we stopped 
doing so because it was causing issues (I think on win32 builds).


Maybe slony client programs shouldn't be trying to steal portability 
definitions from postgres headers, but I doubt we are the only ones 
doing that.  It isn't a big deal for slony to define its own 
INT64_FORMAT for 9.5+, but third-party projects that have been including 
pg_config.h will hit similar issues.  If there was a good reason for the 
change then fine (Postgres isn't intended to be a general purpose C 
portability layer).
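
As a minimal sketch (assuming a C99 compiler with <inttypes.h>, and not
what slony actually does), an out-of-tree client could carry its own
fallback instead of reading pg_config.h:

#include <inttypes.h>

#ifndef INT64_FORMAT
#define INT64_FORMAT "%" PRId64
#endif

#ifndef UINT64_FORMAT
#define UINT64_FORMAT "%" PRIu64
#endif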




Steve


Greetings,

Andres Freund







Re: [HACKERS] Moving of INT64_FORMAT to c.h

2014-10-22 Thread Steve Singer

On 10/22/2014 04:17 PM, Robert Haas wrote:



Slonik used to include postgre_fe.h but back in 2011 or so we stopped doing
so because it was causing issues (I think on win32 builds)

That seems like something we ought to consider fixing, but obviously
we'd need more details.

I'll try to find some time to get a windows environment running and 
figure out what the problems were.
It's also possible that I removed the postgres_fe include at the same 
time as I was fixing other win32 issues, and that the postgres_fe include 
wasn't causing a problem; it was just removed because it was unnecessary.


At the moment slony only includes the server include dir if we are 
building with pgport (which we normally do on windows).
We have had reports of issues like the one described in 
(http://www.slony.info/bugzilla/show_bug.cgi?id=315), where some installs 
make us pick up server and client includes from different versions of PG.






[HACKERS] "snapshot too large" error when initializing logical replication (9.4)

2014-10-25 Thread Steve Singer
I sometimes get the error "snapshot too large" from my logical 
replication walsender process in response to a CREATE_REPLICATION_SLOT.


This is in SnapBuildExportSnapshot in snapbuild.c

newxcnt is 212 at that point

I have max_connections = 200

procArray->maxProcs=212

Should we be testing
newxcnt > GetMaxSnapshotXidCount()

instead of
newxcnt >= GetMaxSnapshotXidCount()








[HACKERS] logical decoding - reading a user catalog table

2014-10-25 Thread Steve Singer

My logical decoding plugin is occasionally getting this  error

"could not resolve cmin/cmax of catalog tuple"

I get this when my output plugin is trying to read one of the user 
defined catalog tables (user_catalog_table=true)



I am not sure if this is a bug in the time-travel support of logical 
decoding or if I'm just using it wrong (i.e. not getting a 
sufficient lock on the relation or something).




This is the interesting part of the stack trace

#4  0x0091bbc8 in HeapTupleSatisfiesHistoricMVCC 
(htup=0x7fffcf42a900,

snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5  0x004aedf3 in heapgetpage (scan=0x28d7080, page=0) at 
heapam.c:399

#6  0x004b0182 in heapgettup_pagemode (scan=0x28d7080,
dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7  0x004b1ba6 in heap_getnext (scan=0x28d7080,
direction=ForwardScanDirection) at heapam.c:1475
#8  0x7f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118,
origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88, 
set_id=0x7fffcf42ab84)

at slony_logical.c:663
#9  0x7f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0,
relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x007497d4 in change_cb_wrapper (cache=0x28cbda8, 
txn=0x28cbec0,

relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704



Here is what the code in lookupSlonyInfo is doing
--

  sltable_oid = get_relname_relid("sl_table",slony_namespace);

  sltable_rel = relation_open(sltable_oid,AccessShareLock);
  tupdesc=RelationGetDescr(sltable_rel);
  scandesc=heap_beginscan(sltable_rel, 
GetCatalogSnapshot(sltable_oid),0,NULL);

  reloid_attnum = get_attnum(sltable_oid,"tab_reloid");

  if(reloid_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_reloid column");
  set_attnum = get_attnum(sltable_oid,"tab_set");

  if(set_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_set column");
  tableid_attnum = get_attnum(sltable_oid, "tab_id");

  if(tableid_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_id column");

  while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))







Re: [HACKERS] logical decoding - reading a user catalog table

2014-10-28 Thread Steve Singer

On 10/28/2014 01:31 PM, Andres Freund wrote:

On 2014-10-25 18:18:07 -0400, Steve Singer wrote:

My logical decoding plugin is occasionally getting this  error

"could not resolve cmin/cmax of catalog tuple"

I get this when my output plugin is trying to read one of the user defined
catalog tables (user_catalog_table=true)

Hm. That should obviously not happen.

Could you describe how that table is modified? Does that bug happen
initially, or only after a while?


It doesn't happen right away; in this case it was maybe 4 minutes after 
creating the slot.
The error also doesn't always happen when I run this test workload, 
but it is reproducible with some trying.
I don't do anything special to that table: it gets created, then I do 
inserts on it.  I don't do an alter table or anything fancy like that.  
I was running the slony failover test (all nodes under the same 
postmaster), which involves the occasional dropping and recreating of 
databases along with normal query load + replication.


I'll send you tar of the data directory off list with things in this state.


Do you have a testcase that would allow me to easily reproduce the
problem?


I don't have an isolated test case that does this.  The test that I'm 
hitting this with does lots of stuff and doesn't even always hit this.



I am not sure if this is a bug in the time-travel support in the logical
decoding support of if I'm just using it wrong (ie not getting a sufficient
lock on the relation or something).

I don't know yet...


This is the interesting part of the stack trace

#4  0x0091bbc8 in HeapTupleSatisfiesHistoricMVCC
(htup=0x7fffcf42a900,
 snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5  0x004aedf3 in heapgetpage (scan=0x28d7080, page=0) at
heapam.c:399
#6  0x004b0182 in heapgettup_pagemode (scan=0x28d7080,
 dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7  0x004b1ba6 in heap_getnext (scan=0x28d7080,
 direction=ForwardScanDirection) at heapam.c:1475
#8  0x7f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118,
 origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88,
set_id=0x7fffcf42ab84)
 at slony_logical.c:663
#9  0x7f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0,
 relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0,
 relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704



Here is what the code in lookupSlonyInfo is doing
--

   sltable_oid = get_relname_relid("sl_table",slony_namespace);

   sltable_rel = relation_open(sltable_oid,AccessShareLock);
   tupdesc=RelationGetDescr(sltable_rel);
   scandesc=heap_beginscan(sltable_rel,
GetCatalogSnapshot(sltable_oid),0,NULL);
   reloid_attnum = get_attnum(sltable_oid,"tab_reloid");

   if(reloid_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_reloid column");
   set_attnum = get_attnum(sltable_oid,"tab_set");

   if(set_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_set column");
   tableid_attnum = get_attnum(sltable_oid, "tab_id");

   if(tableid_attnum == InvalidAttrNumber)
  elog(ERROR,"sl_table does not have a tab_id column");

   while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))

(Except missing spaces ;)) I don't see anything obviously wrong with
this.

Greetings,

Andres Freund







Re: [HACKERS] "snapshot too large" error when initializing logical replication (9.4)

2014-10-30 Thread Steve Singer

On 10/28/2014 01:27 PM, Andres Freund wrote:

Hi,

On 2014-10-25 18:09:36 -0400, Steve Singer wrote:

I sometimes get the error "snapshot too large" from my logical replication
walsender process when in response to a CREATE_REPLICATION_SLOT.

Yes. That's possible if 'too much' was going on until a consistent point
was reached.  I think we can just use a much larger size for the array
if necessary.

I've attached patch for this. Could you try whether that helps? I don't
have a testcase handy that reproduces the problem.


This patch seems to fix things.
I've done numerous runs of the test I was doing before with your 
patch applied and don't seem to be hitting this issue anymore.




This is in SnapBuildExportSnapshot in snapbuild.c

newxcnt is 212 at that point

I have max_connections = 200

procArray->maxProcs=212

Should we be testing
newxcnt > GetMaxSnapshotXidCount()

instead of
newxcnt >= GetMaxSnapshotXidCount()

It actually looks correct to me now - newxcnt is used as an offset into
an array of size GetMaxSnapshotXidCount().
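
In other words (a rough sketch, not the exact snapbuild.c code): newxcnt is
the next free index into an array with GetMaxSnapshotXidCount() slots, so
the overflow test has to fire as soon as newxcnt reaches that count:

if (newxcnt >= GetMaxSnapshotXidCount())
    elog(ERROR, "snapshot too large");

newxip[newxcnt++] = xid;    /* safe: newxcnt was < GetMaxSnapshotXidCount() */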

Greetings,

Andres Freund









Re: [HACKERS] tracking commit timestamps

2014-11-05 Thread Steve Singer

On 11/05/2014 11:23 AM, Jim Nasby wrote:



Except that commit time is not guaranteed unique *even on a single 
system*. That's my whole point. If we're going to bother with all the 
commit time machinery it seems really silly to not provide a way to 
uniquely order every commit.


Clearly trying to uniquely order commits across multiple systems is a 
far larger problem, and I'm not suggesting we attempt that. But for a 
single system AIUI all we need to do is expose the LSN of each commit 
record and that will give you the exact and unique order in which 
transactions committed.


This isn't a hypothetical feature either; if we had this, logical 
replication systems wouldn't have to try and fake this via batches. 
You could actually recreate exactly what data was visible at what time 
to all transactions, not just repeatable read ones (as long as you 
kept snapshot data as well, which isn't hard).


As for how much data to keep, if you have a process that's doing 
something to record this information permanently all it needs to do is 
keep an old enough snapshot around. That's not that hard to do, even 
from user space.


+1 for this.

It isn't just 'replication' systems that need to get the 
commit order of transactions on a single system.  I have an application 
(not slony) where we want to query a table but order the output based on 
the transaction commit order of when the insert into the table was done 
(think of a queue).  I'm not replicating the output but passing the data 
to other applications for further processing.  If I just had the commit 
timestamp I would need to put in some other condition to break ties in a 
consistent way.  I think being able to get an ordering by commit LSN is 
what I really want in this case, not the timestamp.


Logical decoding is one solution to this (that I was considering) but 
being able to do something like

select * FROM event_log order by commit_id would be a lot simpler.
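
Something along these lines -- purely illustrative; commit_order_of() is a
made-up function standing in for "the commit LSN of the transaction that
inserted this row", which is exactly what doesn't exist today:

SELECT *
  FROM event_log
 WHERE commit_order_of(xmin) > :last_seen_position
 ORDER BY commit_order_of(xmin);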








Re: [HACKERS] tracking commit timestamps

2014-11-05 Thread Steve Singer

On 11/05/2014 05:43 PM, Andres Freund wrote:

On 2014-11-05 17:17:05 -0500, Steve Singer wrote:
Imo that's essentially a different feature. What you essentially would
need here is a 'commit sequence number' - but no timestamps. And
probably to be useful that number has to be 8 bytes in itself.


I think this gets to the heart of some of the differing views people 
have expressed on this patch


Is this patch supposed to:

A)  Add commit timestamp tracking but nothing more

B) Add infrastructure to store commit timestamps and provide a facility 
for storing additional bits of data extensions might want to be 
associated with the commit


C).  Add commit timestamps and node identifiers to commits

If the answer is (A) then I can see why some people are objecting to 
adding extradata.  If the answer is (B) then it's fair to ask how well 
this patch handles the various types of things people might want to 
attach to the commit record (such as the LSN).  I think the problem is 
that once you start providing a facility extensions can use to store 
data along with the commit record, then being restricted to 4 or 8 bytes 
is very limiting.  It also doesn't allow you to load two extensions at 
once on a system.  You wouldn't be able to have both the 
'steve_commit_order' extension and BDR installed at the same time.  I 
don't think this patch does a very good job at (B), but it wasn't 
intended to.


If what we are really doing is (C), and we are just calling the space 
'extradata' until we get the logical identifier stuff in and then we are 
going to rename extradata to nodeid, then we should say so.  If we are 
really concerned about the pg_upgrade impact of expanding the record 
later, then maybe we should add 4 bytes of padding to the 
CommitTimeStampEntry now but leave manipulating the node id until later.


Steve





Greetings,

Andres Freund







Re: [HACKERS] tracking commit timestamps

2014-11-09 Thread Steve Singer

On 11/07/2014 07:07 PM, Petr Jelinek wrote:


The list of what is useful might be long, but we can't have everything 
there as there are space constraints, and LSN is another 8 bytes and I 
still want to have some bytes for storing the "origin" or whatever you 
want to call it there, as that's the one I personally have biggest 
use-case for.
So this would be ~24bytes per txid already, hmm I wonder if we can 
pull some tricks to lower that a bit.




The reason why Jim and myself are asking for the LSN and not just the 
timestamp is that I want to be able to order the transactions. Jim 
pointed out earlier in the thread that just ordering on timestamp allows 
for multiple transactions with the same timestamp.


Maybe we don't need the entire LSN to solve that.  If you already have 
the commit timestamp maybe you only need another byte or two of 
granularity to order transactions that are within the same microsecond.






Re: [HACKERS] tracking commit timestamps

2014-11-10 Thread Steve Singer

On 11/10/2014 08:39 AM, Petr Jelinek wrote:

On 09/11/14 17:57, Steve Singer wrote:

On 11/07/2014 07:07 PM, Petr Jelinek wrote:

The list of what is useful might be long, but we can't have everything
there as there are space constraints, and LSN is another 8 bytes and I
still want to have some bytes for storing the "origin" or whatever you
want to call it there, as that's the one I personally have biggest
use-case for.
So this would be ~24bytes per txid already, hmm I wonder if we can
pull some tricks to lower that a bit.



The reason why Jim and myself are asking for the LSN and not just the
timestamp is that I want to be able to order the transactions. Jim
pointed out earlier in the thread that just ordering on timestamp allows
for multiple transactions with the same timestamp.

Maybe we don't need the entire LSN to solve that.  If you already have
the commit timestamp maybe you only need another byte or two of
granularity to order transactions that are within the same microsecond.



Hmm maybe just one part of LSN, but I don't really like that either, 
if we want to store LSN we should probably store it as is as somebody 
might want to map it to txid for other reasons.


I did the calculation above wrong btw, it's actually 20 bytes not 24 
bytes per record, I am inclined to just say we can live with that.


Since we agreed that the (B) case is not really feasible and we are 
doing the (C), I also wonder if extradata should be renamed to nodeid 
(even if it's not used at this point as nodeid). And then there is 
question about the size of it, since the nodeid itself can live with 2 
bytes probably ("64k of nodes ought to be enough for everybody" ;) ).
Or leave the extradata as is but use as reserved space for future use 
and not expose it at this time on SQL level at all?


I guess Andres could answer what suits him better here.



I am happy with renaming extradata to nodeid and not exposing it at this 
time.


If we feel that commit order (i.e. LSN or something equivalent) is really 
a different patch/feature from commit timestamps, then I am okay with that 
too, but we should make sure the documentation warns users of the commit 
timestamp that two transactions might have the same timestamp and that 
the commit order might not be the same as ordering by the commit 
timestamp.








Re: [HACKERS] logical decoding - reading a user catalog table

2014-11-13 Thread Steve Singer

On 11/13/2014 02:44 PM, Andres Freund wrote:

H



I've pushed a fix for a bug that could possibly also cause
this. Although it'd be odd that it always hits the user catalog
table. Except if your tests mostly modify the slony tables, but do not
do much DDL otherwise?


The test I was running doesn't do DDL, so only the user catalog tables 
would have changes being tracked.


I still sometimes get the error. When I get some time on the weekend I'll 
work on some detailed instructions on how to grab and set up the various 
components to replicate this test.


Also, since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure 
in a normal client connection, not the walsender:


#3  0x006b4978 in GetSerializableTransactionSnapshotInt (
snapshot=snapshot@entry=0xbfa8a0 ,
sourcexid=sourcexid@entry=0) at predicate.c:1738
#4  0x006b66c3 in GetSafeSnapshot (origSnapshot=)
at predicate.c:1517
#5  GetSerializableTransactionSnapshot (
snapshot=0xbfa8a0 ) at predicate.c:1598
#6  0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7  0x006c0e35 in exec_simple_query (
query_string=0x1fd01b8 "select ev_origin, ev_seqno, 
ev_timestamp,ev_snapshot, 
\"pg_catalog\".txid_snapshot_xmin(ev_snapshot), 
\"pg_catalog\".txid_snapshot_xmax(ev_snapshot), 
coalesce(ev_provider_xid,\""...)

at postgres.c:959
#8  PostgresMain (argc=, argv=argv@entry=0x1f5ab50,


I have no idea if this has anything to do with your recent changes or 
some other change. I haven't so far been able to replicate that since 
the first time I saw it.





I'll send you tar of the data directory off list with things in this state.


Do you have a testcase that would allow me to easily reproduce the
problem?

I don't have a isolated test case that does this.  The test that I'm hitting
this with does lots of stuff and doesn't even always hit this.

If it still happens, could you send me instructions of how to reproduce
the problem after cloning the necessary source repositories? It's quite
hard to validate a possible fix otherwise.

Greetings,

Andres Freund







Re: [HACKERS] tracking commit timestamps

2014-11-14 Thread Steve Singer

On 11/14/2014 08:21 PM, Simon Riggs wrote:

The requested information is already available, as discussed. Logical
decoding adds commit ordering for *exactly* the purpose of using it
for replication, available to all solutions. This often requested
feature has now been added and doesn't need to be added twice.

So what we are discussing is adding a completely superfluous piece of
information.

Not including the LSN info does nothing to trigger based replication,
which will no doubt live on happily for many years. But adding LSN
will slow down logical replication, for no purpose at all.



Simon,
The use cases I'm talking about aren't really replication related. Often 
I have come across systems that want to do something such as 'select * 
from orders where X > the_last_row_I_saw order by X' and then do further 
processing on the order.


This is kind of awkward to do today because you don't have a good 
candidate for 'X' to order on.  Using either a sequence or an insert-row 
timestamp doesn't work well, because a transaction with a lower value for 
X might end up committing after a higher value has already appeared in a 
query result.
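
Schematically, assuming a hypothetical orders table keyed by a
sequence-assigned id:

-- session A: INSERT INTO orders ...   -- allocates id 100, commits last
-- session B: INSERT INTO orders ...   -- allocates id 101, commits first
-- A reader running the query below between the two commits sees id 101,
-- remembers 101 as the_last_row_I_saw, and never picks up id 100.
SELECT * FROM orders WHERE id > :the_last_row_I_saw ORDER BY id;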


Yes, you could set up a logical wal slot and listen on the stream of 
inserts into your order table, but that's a lot of administration overhead 
compared to just issuing an SQL query for what really is a query-type 
operation.


Using the commit timestamp for my X sounded very tempting but could 
allow duplicates.


One could argue that this patch is about replication features, and that 
providing commit ordering for query purposes should be a separate patch 
on top of this infrastructure.  I see merit in smaller, more focused 
patches, but that requires leaving the door open to easily extending 
things later.


It could also be that I'm the only one who wants to order and filter 
queries in this manner (but that would surprise me).  If the commit LSN 
has limited appeal and we decide we don't want it at all, then we 
shouldn't add it.  I've seen this type of requirement in a number of 
different systems at a number of different companies.  I've generally 
seen it dealt with either by selecting rows behind the last now() 
timestamp seen and then filtering out already-processed rows, or by 
tracking the 'processed' state of each row individually (i.e. performing 
an update on each row once it's been processed), which performs poorly.


Steve









Re: [HACKERS] logical decoding - reading a user catalog table

2014-11-16 Thread Steve Singer

On 11/13/2014 02:44 PM, Andres Freund wrote:

Hi Steve,



If it still happens, could you send me instructions of how to reproduce
the problem after cloning the necessary source repositories? It's quite
hard to validate a possible fix otherwise.


1. Install PG 9.4

2. Perform an initdb
   max_connections = 200
   wal_level=logical
   max_wal_senders=50
   wal_keep_segments = 100
   wal_sender_timeout = 600s
   max_replication_slots = 120



Create a user slon with superuser.
Create a user test.
(Set passwords for them; you'll use them later.)

Configure pg_hba.conf to allow slon to connect as a replication user
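
For example (illustrative entries only -- adjust addresses and auth method
to your environment; the test needs both normal and replication
connections for slon):

host    all            slon    127.0.0.1/32    md5
host    replication    slon    127.0.0.1/32    md5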


3. Download slony from github (https://github.com/ssinger/slony1-engine.git)
   checkout  the branch logical_remote_helper
./configure --with-pgconfigdir=/path_to_your_pgcbindir
make
make install

4. Download clustertest & compile clustertest from 
(https://github.com/clustertest/clustertest-framework).

5. In the slony source tree cd to the clustertest directory
6. cp conf/disorder.properties.sample to conf/disorder.properties
Edit the file to have the correct paths, ports, users, passwords.

7 cp conf/java.properties.sample to conf/java.properties edit the file 
to point at the JAR you downloaded earlier.


I run with all 5 databases on the same PG cluster. Performance will be 
much better with 5 different clusters, but I recommend trying to emulate 
my configuration as closely as possible to replicate this.


To run the tests then do
./run_all_disorder_tests.sh


At the moment (commit df5acfd1a3) it is configured to run just the 
Failover node test cases where I am seeing this, not the entire suite.


It typically takes between 30 minutes and an hour to run, though.

I installed things following the above steps on a different system than 
my usual development laptop and I have been unable to reproduce the 
error so far (on that system).  But I am still able to reproduce it on 
occasion on my normal development laptop.





Greetings,

Andres Freund







Re: [HACKERS] logical decoding - reading a user catalog table

2014-11-17 Thread Steve Singer

On 11/16/2014 04:49 PM, Steve Singer wrote:



I installed things following the above steps on a different system 
than my usual development laptop and I have been unable to reproduce 
the error so for (on that system).  But I am still able to reproduce 
it on occasion on my normal development laptop.






After continuously running the test I was eventually able to reproduce 
the error on the other system as well.




Greetings,

Andres Freund











Re: [HACKERS] logical decoding - reading a user catalog table

2014-11-17 Thread Steve Singer

On 11/17/2014 10:37 AM, Andres Freund wrote:

On 2014-11-13 22:23:02 -0500, Steve Singer wrote:


Also since updating (to 2c267e47afa4f9a7c) I've seen a assertion failure in
a normal client connection, not the walsender

#3  0x006b4978 in GetSerializableTransactionSnapshotInt (
 snapshot=snapshot@entry=0xbfa8a0 ,
 sourcexid=sourcexid@entry=0) at predicate.c:1738
#4  0x006b66c3 in GetSafeSnapshot (origSnapshot=)
 at predicate.c:1517
#5  GetSerializableTransactionSnapshot (
 snapshot=0xbfa8a0 ) at predicate.c:1598
#6  0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7  0x006c0e35 in exec_simple_query (
 query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp,
ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),
\"pg_catalog\".txid_snapshot_xmax(ev_snapshot),
coalesce(ev_provider_xid,\""...)
 at postgres.c:959
#8  PostgresMain (argc=, argv=argv@entry=0x1f5ab50,


I have no idea if this has anything to do with your recent changes or some
other change. I haven't so far been able to replicate that since the first
time I saw it.
That crash is decidedly odd. Any chance you still have the full
backtrace around?


Yes I still have the core file



This is in the SSI code... I'm not immediately seeing how this could be
related to logical decoding. It can't even be a imported snapshot,
because the exported snapshot is marked repeatable read.

Are you actually using serializable transactions? If so, how and why?


Yes, the test client that performs the 'simulated workload' does some 
serializable transactions.  It connects as a normal client via JDBC, 
does a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE"), 
and then does some DML. Why? Because it seemed like a good thing to 
include in the test suite.


You're right, this might have nothing to do with logical decoding.  I 
haven't been able to reproduce it again either; I can't even say 
whether this problem was introduced into 9.4 in the past month or if it 
has been around much longer and I just haven't hit it before.







Greetings,

Andres Freund







Re: [HACKERS] logical decoding - reading a user catalog table

2014-11-17 Thread Steve Singer

On 11/17/2014 11:34 AM, Andres Freund wrote:

Hi,

Kevin: CCed you, because it doesn't really look like a logical decoding
related issue.

On 2014-11-17 11:25:40 -0500, Steve Singer wrote:

On 11/17/2014 10:37 AM, Andres Freund wrote:

On 2014-11-13 22:23:02 -0500, Steve Singer wrote:


Also since updating (to 2c267e47afa4f9a7c) I've seen a assertion failure in
a normal client connection, not the walsender

#3  0x006b4978 in GetSerializableTransactionSnapshotInt (
 snapshot=snapshot@entry=0xbfa8a0 ,
 sourcexid=sourcexid@entry=0) at predicate.c:1738
#4  0x006b66c3 in GetSafeSnapshot (origSnapshot=)
 at predicate.c:1517
#5  GetSerializableTransactionSnapshot (
 snapshot=0xbfa8a0 ) at predicate.c:1598
#6  0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7  0x006c0e35 in exec_simple_query (
 query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp,
ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot),
\"pg_catalog\".txid_snapshot_xmax(ev_snapshot),
coalesce(ev_provider_xid,\""...)
 at postgres.c:959
#8  PostgresMain (argc=, argv=argv@entry=0x1f5ab50,


I have no idea if this has anything to do with your recent changes or some
other change. I haven't so far been able to replicate that since the first
time I saw it.
That crash is decidedly odd. Any chance you still have the full
backtrace around?

Yes I still have the core file

Cool, could you show the full thing? Or did you just snip it because
it's just the Assert/ExceptionalCondition thing?


I just snipped the exception stuff. Here is the full thing

(gdb) backtrace
#0  0x7f6fb22e8295 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f6fb22eb438 in __GI_abort () at abort.c:90
#2  0x007a08e7 in ExceptionalCondition (
conditionName=conditionName@entry=0x918e88 
"!(TransactionIdFollows(snapshot->xmin, PredXact->SxactGlobalXmin))",

errorType=errorType@entry=0x7da390 "FailedAssertion",
fileName=fileName@entry=0x9182c3 "predicate.c",
lineNumber=lineNumber@entry=1738) at assert.c:54
#3  0x006b4978 in GetSerializableTransactionSnapshotInt (
snapshot=snapshot@entry=0xbfa8a0 ,
sourcexid=sourcexid@entry=0) at predicate.c:1738
#4  0x006b66c3 in GetSafeSnapshot (origSnapshot=)
at predicate.c:1517
#5  GetSerializableTransactionSnapshot (
snapshot=0xbfa8a0 ) at predicate.c:1598
#6  0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7  0x006c0e35 in exec_simple_query (
query_string=0x1fd01b8 "select ev_origin, ev_seqno, 
ev_timestamp,ev_snapshot, 
\"pg_catalog\".txid_snapshot_xmin(ev_snapshot), 
\"pg_catalog\".txid_snapshot_xmax(ev_snapshot), 
coalesce(ev_provider_xid,\""...)

at postgres.c:959
#8  PostgresMain (argc=, argv=argv@entry=0x1f5ab50,
---Type  to continue, or q  to quit---
dbname=0x1f5ab30 "test2", username=) at postgres.c:4016
#9  0x0046a08e in BackendRun (port=0x1f83010) at postmaster.c:4123
#10 BackendStartup (port=0x1f83010) at postmaster.c:3797
#11 ServerLoop () at postmaster.c:1576
#12 0x00665eae in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1f59d00) at postmaster.c:1223
#13 0x0046ab58 in main (argc=3, argv=0x1f59d00) at main.c:227




Could you print *snapshot in frame #3?


(gdb) p *snapshot
$1 = {satisfies = 0x7d0330 , xmin = 412635,
  xmax = 412637, xip = 0x1f86e40, xcnt = 1, subxcnt = 0, subxip = 
0x1fd2190,
  suboverflowed = 0 '\000', takenDuringRecovery = 0 '\000', copied = 0 
'\000',

  curcid = 0, active_count = 0, regd_count = 0}
(gdb)



This is in the SSI code... I'm not immediately seeing how this could be
related to logical decoding. It can't even be a imported snapshot,
because the exported snapshot is marked repeatable read.

Are you actually using serializable transactions? If so, how and why?

Yes, the test client that performs the 'simulated workload' does some
serializable transactions.  It connects as a normal client via JDBC and does
a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE") and then
does some DML. Why? because it seemed like a good thing to include in the
test suite.

Yes, it's a good reason, as the above backtrace proves ;). I'm just
trying to understand under which circumstances it happens.

The above backtrace looks like it can only be triggered if you do a
BEGIN TRANSACTION SERIALIZABLE DEFERRABLE READ ONLY; Is that something
you do?


The slony remote listener connection does this when it selects from 
sl_event.  We switched to this shortly after 9.1 came out.
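
Schematically, the listener session does something like the following (a
sketch only; the column list and WHERE clause are abbreviated from the
query string in the backtrace above):

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;
SELECT ev_origin, ev_seqno, ev_timestamp, ev_snapshot, ...
  FROM sl_event
 WHERE ...;
COMMIT;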






Re: [HACKERS] tracking commit timestamps

2014-11-19 Thread Steve Singer

On 11/19/2014 08:22 AM, Alvaro Herrera wrote:


I think we're overblowing the pg_upgrade issue.  Surely we don't need to
preserve commit_ts data when upgrading across major versions; and
pg_upgrade is perfectly prepared to remove old data when upgrading
(actually it just doesn't copy it; consider pg_subtrans or pg_serial,
for instance.)  If we need to change binary representation in a future
major release, we can do so without any trouble.



That sounds reasonable. I am okay with Petr removing the LSN portion of 
this patch.






Re: [HACKERS] pglogical - logical replication contrib module

2016-01-08 Thread Steve Singer

On 12/31/2015 06:34 PM, Petr Jelinek wrote:

Hi,

I'd like to submit the replication solution which is based on the 
pglogical_output [1] module (which is obviously needed for this to 
compile).




Hi,

make check  gives me


for extra in ../../contrib/pglogical_output contrib/pglogical; do make 
-C '../..'/$extra DESTDIR='/usr/local/src/postgresql'/tmp_install 
install >>'/usr/local/src/postgresql'/tmp_install/log/install.log || 
exit; done
make[1]: *** ../../../../contrib/pglogical_output: No such file or 
directory.  Stop.

../../src/Makefile.global:325: recipe for target 'temp-install' failed
make: *** [temp-install] Error 2
ssinger@ssinger-laptop:/usr/local/src/postgresql/contrib/pglogical$

The attached patch fixes that, but then the tests create the database 
'contrib_regression' rather than 'regression'; 
changing pglogical.provider_dsn = 'contrib_regression' still leaves me 
with a lot of failures.



diff --git a/contrib/pglogical/Makefile b/contrib/pglogical/Makefile
new file mode 100644
index 1640f63..a4dab88
*** a/contrib/pglogical/Makefile
--- b/contrib/pglogical/Makefile
*** include $(top_srcdir)/contrib/contrib-gl
*** 27,34 
  # typical installcheck users do not have (e.g. buildfarm clients).
  @installcheck: ;
  
! EXTRA_INSTALL += $(top_srcdir)/contrib/pglogical_output
! EXTRA_REGRESS_OPTS += $(top_srcdir)/contrib/pglogical/regress-postgresql.conf
  
  override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/contrib/pglogical_output
  
--- 27,34 
  # typical installcheck users do not have (e.g. buildfarm clients).
  @installcheck: ;
  
! EXTRA_INSTALL += contrib/pglogical_output
! EXTRA_REGRESS_OPTS += --temp-config=regress-postgresql.conf
  
  override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/contrib/pglogical_output
  



Re: [HACKERS] pglogical - logical replication contrib module

2016-01-09 Thread Steve Singer

On 12/31/2015 06:34 PM, Petr Jelinek wrote:

Hi,

I'd like to submit the replication solution which is based on the 
pglogical_output [1] module (which is obviously needed for this to 
compile).


The pglogical contrib module provides extension which does the 
master-slave logical replication based on the logical decoding.


The basic documentation is in README.md, I didn't bother making sgml 
docs yet since I expect that there will be ongoing changes happening 
and it's easier for me to update the markdown docs than sgml. I will 
do the conversion once we start approaching committable state.
I am going to send my comments/issues out in batches as I find them 
instead of waiting till I look over everything.



I find this part of the documentation a bit unclear


+Once the provider node is setup, subscribers can be subscribed to it. 
First the

+subscriber node must be created:
+
+SELECT pglogical.create_node(
+node_name := 'subscriber1',
+dsn := 'host=thishost port=5432 dbname=db'
+);
+

My initial reading was that I should execute this on the provider node.  
Perhaps instead

-
Once the provider node is set up you can then create subscriber nodes.  
Create the subscriber nodes and

then execute the following commands on each subscriber node:

create extension pglogical;

select pglogical.create_node(node_name:='subscriberX', dsn:='host=thishost 
dbname=db port=5432');


---

Also the documentation for create_subscription talks about

+  - `synchronize_structure` - specifies if to synchronize structure from
+provider to the subscriber, default true



I did the following

test2=# select pglogical.create_subscription(subscription_name:='default 
sub',provider_dsn:='host=localhost dbname=test1 port=5436');

 create_subscription
-
   247109879


Which then resulted in the following showing up in my  PG log

LOG:  worker process: pglogical apply 16542:247109879 (PID 4079) exited 
with exit code 1
ERROR:  replication slot name "pgl_test2_test1_default sub" contains 
invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT 
"pgl_test2_test1_default sub" LOGICAL pglogical_output": status 
PGRES_FATAL_ERROR: ERROR:  replication slot name 
"pgl_test2_test1_default sub" contains invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.



The create_subscription command should check that the subscription name 
is valid (i.e. that it meets the rules that will be applied to the 
replication slot name derived from it).
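
For reference, a subscription name that already satisfies the slot-name
rules works fine, e.g.:

SELECT pglogical.create_subscription(
    subscription_name := 'default_sub',
    provider_dsn := 'host=localhost dbname=test1 port=5436'
);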


I wondered how I could fix my mistake.

The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name, 
ifexists bool)`

+  Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR:  function pglogical.pglogical_drop_subscription(unknown, boolean) 
does not exist



The command is actually called pglogical.drop_subscription the docs 
should be fixed to show the actual command name



I then wanted to add a second table to my database. ('b').

select pglogical.replication_set_add_table('default','public.b',true);
 replication_set_add_table
---
 t
(1 row)

In my pglog I then got

LOG:  starting sync of table public.b for subscriber defaultsub
ERROR:  replication slot name "pgl_test2_test1_defaultsub_public.b" 
contains invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.
FATAL:  could not send replication command "CREATE_REPLICATION_SLOT 
"pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status 
PGRES_FATAL_ERROR: ERROR:  replication slot name 
"pgl_test2_test1_defaultsub_public.b" contains invalid character
HINT:  Replication slot names may only contain lower case letters, 
numbers, and the underscore character.



I then did

test1=# select pglogical.replication_set_remove_table('default','public.b');
 replication_set_remove_table
--
 t
(1 row)


but my log still keep repeating the error, so I tried connecting to the 
replica and did the same


test2=# select pglogical.replication_set_remove_table('default','public.b');
ERROR:  replication set mapping -303842815:16726 not found

Is there any way to recover from this situation?

The documenation says I can drop a replication set, maybe that will let 
replication continue.


+- `pglogical.delete_replication_set(set_name text)`
+  Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR:  function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');
   ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Re: [HACKERS] pglogical - logical replication contrib module

2016-01-10 Thread Steve Singer

On 01/09/2016 01:30 PM, Steve Singer wrote:

On 12/31/2015 06:34 PM, Petr Jelinek wrote:



I'm not really sure what to do to 'recover' my cluster at this point 
so I'll send this off and rebuild my cluster and start over.





I had a setup test1--->test2 (with 2 tables in the default set)

I then created a third database (all three hosted on the same PG cluster)

In the third database (test3)
test3=# create extension pglogical;
CREATE EXTENSION
test3=# select pglogical.create_node(node_name:='test3', 
dsn:='host=localhost dbname=test3 port=5436');

 create_node
-
  2001662995
(1 row)

test3=# select 
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost 
dbname=test2 port=5436');

 create_subscription
-
  2974019075

It copied the schema over but not the data (if I use test2 as the 
provider_dsn then it does copy the data).


I then tried inserting a row into a table on test1.  Things crashed and 
after crash recovery I keep getting


2016-01-10 13:03:15 EST LOG:  database system is ready to accept connections
2016-01-10 13:03:15 EST LOG:  autovacuum launcher started
2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
2016-01-10 13:03:15 EST test2LOG:  starting logical decoding for slot "pgl_test3_test2_defaultsub"
2016-01-10 13:03:15 EST test2DETAIL:  streaming transactions committing after 0/18292D8, reading WAL from 0/18292D8
2016-01-10 13:03:15 EST test2LOG:  logical decoding found consistent point at 0/18292D8
2016-01-10 13:03:15 EST test2DETAIL:  Logical decoding will begin using saved snapshot.
TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c", Line: 1127)

2016-01-10 13:03:15 EST test2LOG:  unexpected EOF on standby connection
2016-01-10 13:03:15 EST LOG:  worker process: pglogical apply 17016:2974019075 (PID 24746) was terminated by signal 6: Aborted

The stack trace is

#3  0x007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1127
#4  0x007c503e in SearchSysCache (cacheId=cacheId@entry=47,
key1=, key2=key2@entry=0, key3=key3@entry=0,
key4=key4@entry=0) at syscache.c:981
#5  0x006996d4 in replorigin_by_name (
roname=0xe51f30 "pgl_test2_test1_defaultsub",
missing_ok=missing_ok@entry=0 '\000') at origin.c:216
#6  0x7fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
at pglogical_apply.c:235
#7  replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
#8  apply_work (streamConn=streamConn@entry=0xe84fb0) at pglogical_apply.c:1309

#9  0x7fdb54a911cc in pglogical_apply_main (main_arg=)
at pglogical_apply.c:1691
#10 0x00674912 in StartBackgroundWorker () at bgworker.c:726
#11 0x0067f7e2 in do_start_bgworker (rw=0xe03890) at postmaster.c:5501

#12 maybe_start_bgworker () at postmaster.c:5676
#13 0x00680206 in sigusr1_handler (postgres_signal_arg=) at postmaster.c:4937
#14 
#15 0x7fdb54fa2293 in __select_nocancel ()
at ../sysdeps/unix/syscall-template.S:81
#16 0x00468285 in ServerLoop () at postmaster.c:1648
#17 0x0068161e in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0xddede0) at postmaster.c:1292
#18 0x0046979d in main (argc=3, argv=0xddede0) at main.c:223



I tried dropping the subscription and re-adding it.  I keep getting

2016-01-10 13:21:48 EST test1LOG:  logical decoding found consistent point at 0/1830080
2016-01-10 13:21:48 EST test1DETAIL:  There are no running transactions.
2016-01-10 13:21:48 EST test1LOG:  exported logical decoding snapshot: "04DE-1" with 0 transaction IDs

2016-01-10 13:21:48 EST test3ERROR:  relation "a" already exists
2016-01-10 13:21:48 EST test3STATEMENT:  CREATE TABLE a (
a integer NOT NULL,
b integer
);



pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a 
ssinger
pg_restore: [archiver (db)] could not execute query: ERROR: relation "a" 
already exists

Command was: CREATE TABLE a (
a integer NOT NULL,
b integer
);



2016-01-10 13:21:48 EST ERROR:  could not execute command "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data" --exit-on-error -1 -d "host=localhost dbname=test3 port=5436" "/tmp/pglogical-28079.dump""
2016-01-10 13:21:48 EST test1LOG:  unexpected EOF on client connection with an open transaction
2016-01-10 13:21:48 EST LOG:  worker process: pglogical apply 17016:844915593 (PID 28079) exited with exit code 1
2016-01-10 13:21:48 EST ERROR:  subscriber defaultsub4 initialization failed duri

Re: [HACKERS] [PATCH 09/14] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader

2012-11-16 Thread Steve Singer

On 12-11-14 08:17 PM, Andres Freund wrote:

For the regular satisfies routines this is needed in prepareation of logical
decoding. I changed the non-regular ones for consistency as well.

The naming between htup, tuple and similar is rather confused, I could not find
any consistent naming anywhere.

This is preparatory work for the logical decoding feature which needs to be
able to get to a valid relfilenode from when checking the visibility of a
tuple.



I have taken a look at this patch.   The patch does what it says, it 
changes a bunch of
HeapTupleSatisfies routines to take a HeapTuple structure instead of 
a HeapTupleHeader.

It also sets the HeapTuple.t_tableOid value before calling these routines.

The patch does not modify these routines to actually do anything useful 
with the additional data fields though it does add some assertions to 
make sure t_tableOid is set.


The patch compiles cleanly and the unit tests pass.

This patch  does not seem to depend on any of the other patches in this 
set and applies cleanly against master.  The patch doesn't actually add 
any functionality, unless someone sees a reason for complaining about 
this that I don't see, then I think it can be committed.


Steve


---
  contrib/pgrowlocks/pgrowlocks.c  |  2 +-
  src/backend/access/heap/heapam.c | 13 ++
  src/backend/access/heap/pruneheap.c  | 16 ++--
  src/backend/catalog/index.c  |  2 +-
  src/backend/commands/analyze.c   |  3 ++-
  src/backend/commands/cluster.c   |  2 +-
  src/backend/commands/vacuumlazy.c|  3 ++-
  src/backend/storage/lmgr/predicate.c |  2 +-
  src/backend/utils/time/tqual.c   | 50 +---
  src/include/utils/snapshot.h |  4 +--
  src/include/utils/tqual.h| 20 +++
  11 files changed, 83 insertions(+), 34 deletions(-)







Re: [HACKERS] logical changeset generation v3 - Source for Slony

2012-11-19 Thread Steve Singer

First, you can add me to the list of people saying 'wow', I'm impressed.

The approach I am taking to reviewing this is to try and answer the 
following questions:


1) How might a future version of slony be able to use logical 
replication as described by your patch and design documents

and what would that look like.

2) What functionality is missing from the patch set that would stop me 
from implementing or prototyping the above.




Connecting slon to the remote postgresql


Today the slony remote listener thread queries a bunch of events from 
sl_event for a batch of SYNC events. Then the remote helper thread 
queries data from sl_log_1 and sl_log_2.  I see this changing: instead, 
the slony remote listener thread would connect to the remote system and 
get a logical replication stream.


  1) Would slony connect as a normal client connection and call 
something like 'select pg_slony_process_xlog(...)' to get a bunch of 
logical replication change records to process.
  OR
  2) Would slony connect as a replication connection similar to how the 
pg_receivelog program does today and then process the logical changeset 
outputs, instead of writing them to a file (as pg_receivelog does).

It seems that the second approach is what is encouraged.  I think we 
would put a lot of the pg_receivelog functionality into slon and it 
would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the 
slony logical replication plugin.  Slon would also have to provide 
feedback to the walsender about what it has processed so the origin 
database knows what catalog snapshots can be expired.  Based on 
eyeballing pg_receivelog.c it seems that about half the code in the 700-line 
file is related to command line arguments etc., and the other half is 
related to looping over the copy out stream, sending feedback and other 
things that we would need to duplicate in slon.


pg_receivelog.c has a comment:

/*
 * We have to use postgres.h not postgres_fe.h here, because there's so much
 * backend-only stuff in the XLOG include files we need.  But we need a
 * frontend-ish environment otherwise.  Hence this ugly hack.
 */

This looks like more of a carryover from pg_receivexlog.c.  From what I 
can tell we can eliminate the postgres.h include if we also eliminate 
the utils/datetime.h and utils/timestamp.h and instead add in:


#include "postgres_fe.h"
#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE 2440588
#define SECS_PER_DAY 86400
#define USECS_PER_SEC INT64CONST(1000000)
typedef int64 XLogRecPtr;
#define InvalidXLogRecPtr 0

If there is a better way of getting these defines someone should speak 
up.   I recall that in the past slon actually did include postgres.h and 
it caused some issues (I think with MSVC win32 builds).  Since 
pg_receivelog.c will be used as a starting point/sample for third 
parties to write client programs it would be better if it didn't 
encourage client programs to include postgres.h
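
To make the intent of those defines concrete, here is a rough sketch (mine, not taken from pg_receivelog.c) of the sort of frontend helper they enable, e.g. for printing the server's timestamps while handling feedback:

#include <time.h>

/*
 * Sketch only, assuming the defines above and integer datetimes:
 * convert a PostgreSQL timestamp (microseconds since 2000-01-01) to a
 * Unix time_t without pulling in utils/timestamp.h.
 */
static time_t
pg_timestamp_to_unix(int64 ts)
{
    int64 secs = ts / USECS_PER_SEC;

    return (time_t) (secs +
        (int64) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY);
}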



The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client we 
will need to write a slony plugin.


As I understand the plugin API:
* A walsender is processing through WAL records, each time it sees a 
COMMIT WAL record it will call my plugins

.begin
.change (for each change in the transaction)
.commit

* The plugin for a particular stream/replication client will see one 
transaction at a time passed to it in commit order.  It won't see 
.change(t1) followed by .change (t2), followed by a second .change(t1).  
The reorder buffer code hides me from all that complexity (yah)


From a slony point of view I think the output of the plugin will be 
rows, suitable to be passed to COPY IN of the form:


origin_id, table_namespace,table_name,command_type, 
cmd_updatencols,command_args


This is basically the Slony 2.2 sl_log format minus a few columns we no 
longer need (txid, actionseq).
command_args is a postgresql text array of column=value pairs.  Ie [ 
{id=1},{name='steve'},{project='slony'}]
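
As a rough illustration of that row format (purely a sketch; the names are mine, escaping is hand-waved, and a real plugin would presumably build this with the backend's StringInfo routines rather than snprintf):

#include <stdio.h>

/*
 * Sketch only: format one decoded change as the kind of line that could
 * be handed to COPY IN on the subscriber, in the sl_log-like layout
 * described above.  Escaping of the argument array is ignored here.
 */
static void
format_change_row(char *buf, size_t buflen,
                  int origin_id,
                  const char *nspname, const char *relname,
                  char command_type,          /* 'I', 'U' or 'D' */
                  int cmd_updatencols,
                  const char *command_args)   /* e.g. "{id,1,name,steve}" */
{
    snprintf(buf, buflen, "%d\t%s\t%s\t%c\t%d\t%s",
             origin_id, nspname, relname, command_type,
             cmd_updatencols, command_args);
}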


I don't think our output plugin will be much more complicated than the 
test_decoding plugin.  I suspect we will want to give it the ability to 
filter out non-replicated tables.   We will also have to filter out 
change records that didn't originate on the local-node that aren't part 
of a cascaded subscription.  Remember that in a two node cluster  slony 
will have connections from A-->B  and from B--->A even if user tables 
only flow one way. Data that is replicated from A into B will show up in 
the WAL stream for B.


Exactly how we do this filtering is an open question,  I think the 
output plugin will at a minimum need to know:


a) What the slony node id is of the node it is running on.  This is easy 
to figure out if the output plugin is able/allowed to query its 
database.  Will this be possible? I would expect to be able to query the 
database as it exists now (at plugin invocation time), not as it existed 
in the pa

Re: [HACKERS] logical changeset generation v3 - Source for Slony

2012-11-19 Thread Steve Singer

On 12-11-18 11:07 AM, Andres Freund wrote:

Hi Steve!


I think we should provide some glue code to do this, otherwise people
will start replicating all the bugs I hacked into this... More
seriously: I think we should have support code here, no user will want
to learn the intracacies of feedback messages and such. Where that would
live? No idea.


libpglogicalrep.so ?


I wholeheartedly aggree. It should also be cleaned up a fair bit before
others copy it should we not go for having some client side library.

Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication-slot, ...);
while ((message = StreamingLLogNextMessage(state)))
{
  write(outfd, message->data, message->length);
  if (received_100_messages)
  {
   fsync(outfd);
   StreamingLLogConfirm(message);
  }
}

Although I guess thats not good enough because StreamingLLogNextMessage
would be blocking, but that shouldn't be too hard to work around.



How about we pass a timeout value to StreamingLLogNextMessage (..) where 
it returns if no data is available after the timeout to give the caller 
a chance to do something else.
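
Something along these lines (still the same hypothetical client API as in the sketch above, just with the timeout added; none of these functions exist yet):

/*
 * Pseudo-code continuing the sketch above.  A NULL return with no error
 * would mean "no message within timeout_ms", giving the caller a chance
 * to send feedback, check for shutdown, etc. before polling again.
 */
while (!shutdown_requested)
{
    message = StreamingLLogNextMessage(state, 1000 /* timeout_ms */);

    if (message == NULL)
    {
        StreamingLLogSendFeedback(state);   /* hypothetical housekeeping */
        continue;
    }

    apply_message(message);                 /* consumer-supplied */
    StreamingLLogConfirm(message);
}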



This is basically the Slony 2.2 sl_log format minus a few columns we no
longer need (txid, actionseq).
command_args is a postgresql text array of column=value pairs.  Ie [
{id=1},{name='steve'},{project='slony'}]

It seems to me that that makes escaping unnecessarily complicated, but
given you already have all the code... ;)


When I look at the actual code/representation we picked it is closer to 
{column1,value1,column2,value2...}





I don't think our output plugin will be much more complicated than the
test_decoding plugin.

Good. Thats the idea ;). Are you ok with the interface as it is now or
would you like to change something?


I'm going to think about this some more and maybe try to write an 
example plugin before I can say anything with confidence.




Yes. We will also need something like that. If you remember the first
prototype we sent to the list, it included the concept of an
'origin_node' in wal record. I think you actually reviewed that one ;)

That was exactly aimed at something like this...

Since then my thoughts about how the origin_id looks like have changed a
bit:
- origin id is internally still represented as an uint32/Oid
   - never visible outside of wal/system catalogs
- externally visible it gets
   - assigned an uuid
   - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
   - SET change_origin_uuid = 'uuid';
   - SET change_origin_name = 'user-settable-name';
   - defaults to the local node
- decoding callbacks get passed the origin of a change
   - txn->{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins,
   so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through prague
with Marko here...
So would the uuid and optional name assignment be done in the output 
plugin or somewhere else?
When/how does the uuid get generated, and where do we store it so the 
same uuid gets returned when postgres restarts?  Slony today stores all 
this type of stuff in user-level tables and user-level functions 
(because it has no other choice).  What is the connection between 
these values and the 'slot-id' in your proposal for the init arguments? 
Does the slot-id need to be the external uuid of the other end, or is 
there no direct connection?


Today slony allows us to replicate between two databases in the same 
postgresql cluster (I use this for testing all the time)
Slony also allows for two different 'slony clusters' to be setup in the 
same database (or so I'm told, I don't think I have ever tried this myself).


plugin functions that let me query the local database and then return 
the  uuid and origin_name would work in this model.
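
For what it's worth, the kind of lookup I have in mind is roughly this (a sketch only; the sequence name is a placeholder, and it assumes the plugin is allowed ordinary SPI access at init time):

#include "postgres.h"
#include "executor/spi.h"

/*
 * Sketch: fetch the local slony node id from a cluster-specific
 * sequence.  Object names here are made up.
 */
static int
get_local_node_id(void)
{
    int         node_id = -1;

    SPI_connect();

    if (SPI_execute("SELECT last_value FROM _mycluster.sl_local_node_id",
                    true, 1) == SPI_OK_SELECT && SPI_processed == 1)
    {
        bool        isnull;
        Datum       d = SPI_getbinval(SPI_tuptable->vals[0],
                                      SPI_tuptable->tupdesc, 1, &isnull);

        if (!isnull)
            node_id = (int) DatumGetInt64(d);
    }

    SPI_finish();
    return node_id;
}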


+1 on being able to mark the 'change origin' in a SET command when the 
replication process is pushing data into the replica.



Exactly how we do this filtering is an open question,  I think the output
plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on.  This is easy to
figure out if the output plugin is able/allowed to query its database.  Will
this be possible? I would expect to be able to query the database as it
exists now(at plugin invocation time) not as it existed in the past when the
WAL was generated.   In addition to the node ID I can see us wanting to be
able to query other slony tables (sl_table,sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access
to the current database but it won't be all that cheap, so doing it
frequently is not a good idea.
The reason its not cheap is that you basically need to teardown the
postgres internal caches if you switch the timestream in which you are
working.

Would go something like:

TransactionContext = AllocSetCreate(

Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel

2012-12-02 Thread Steve Singer

On 12-11-14 08:17 PM, Andres Freund wrote:

I am getting errors like the following when I try to use either your 
test_decoding plugin or my own (which does even less than yours)



LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
WARNING:  connecting to
WARNING:  Initiating logical rep
LOG:  computed new xmin: 773
LOG:  start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG:  got new xmin 773 at 25124280
LOG:  found initial snapshot (via running xacts). Done: 1
WARNING:  reached consistent point, stopping!
WARNING:  Starting logical replication
LOG:  start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG:  found initial snapshot (via running xacts). Done: 1
FATAL:  cannot read pg_class without having selected a database
TRAP: FailedAssertion("!(SHMQueueEmpty(&(MyProc->myProcLocks[i])))", 
File: "proc.c", Line: 759)


This seems to be happening under the call at reorderbuffer.c:832:

    if (!SnapBuildHasCatalogChanges(NULL, xid, &change->relnode))


The sequence of events I do is:
1. start pg_receivellog
2. run a checkpoint
3. Attach to the walsender process with gdb
4. Start a new client connection with psql and do 'INSERT INTO a values 
(1)' twice.


(skipping step 3 doesn't make a difference)







This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of 
interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be 
understood
* logging more data into wal to facilitate logical decoding
* wal decoding into an reorderbuffer
* shared library output plugins with 5 callbacks
  * init
  * begin
  * change
  * commit
* walsender infrastructure to stream out changes and to keep the global xmin low 
enough
  * INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built 
and returns
* initial LSN
* replication slot identifier
* id of a pg_export() style snapshot
  * START_LOGICAL_REPLICATION $id $lsn; streams out changes
  * uses named output plugins for output specification

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning
   transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after
   INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the
   outside.
* more frequent xl_running_xid's so xmin can be upped more frequently
* add STOP_LOGICAL_REPLICATION $id
---
  src/backend/access/heap/heapam.c|  280 +-
  src/backend/access/transam/xlog.c   |1 +
  src/backend/catalog/index.c |   74 ++
  src/backend/replication/Makefile|2 +
  src/backend/replication/logical/Makefile|   19 +
  src/backend/replication/logical/decode.c|  496 ++
  src/backend/replication/logical/logicalfuncs.c  |  247 +
  src/backend/replication/logical/reorderbuffer.c | 1156 +++
  src/backend/replication/logical/snapbuild.c | 1144 ++
  src/backend/replication/repl_gram.y |   32 +-
  src/backend/replication/repl_scanner.l  |2 +
  src/backend/replication/walsender.c |  566 ++-
  src/backend/storage/ipc/procarray.c |   23 +
  src/backend/storage/ipc/standby.c   |8 +-
  src/backend/utils/cache/inval.c |2 +-
  src/backend/utils/cache/relcache.c  |3 +-
  src/backend/utils/misc/guc.c|   11 +
  src/backend/utils/time/tqual.c  |  249 +
  src/bin/pg_controldata/pg_controldata.c |2 +
  src/include/access/heapam_xlog.h|   23 +
  src/include/access/transam.h|5 +
  src/include/access/xlog.h   |3 +-
  src/include/catalog/index.h |4 +
  src/include/nodes/nodes.h   |2 +
  src/include/nodes/replnodes.h   |   22 +
  src/include/replication/decode.h|   21 +
  src/include/replication/logicalfuncs.h  |   44 +
  src/include/replication/output_plugin.h |   76 ++
  src/include/replication/reorderbuffer.h |  284 ++
  src/include/replication/snapbuild.h |  128 +++
  src/include/replication/walsender.h |1 +
  src/include/replication/walsender_private.h |   34 +-
  src/include/storage/itemptr.h   |3 +
  src/include/storage/sinval.h|2 +
  src/include/utils/tqual.h   |   31 +-
  35 files changed, 4966 insertions(+), 34 deletions(-)
  create mode 100644 src/backend/replication/logical/Makefile
  create mode 100644 src/backend/replication/logical/decode.c

Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel

2012-12-03 Thread Steve Singer

On 12-12-03 07:42 AM, Andres Freund wrote:

Two things:
1) Which exact options are you using for pg_receivellog? Not "-d
replication" by any chance?


Yes, that is exactly what I'm doing.  Using a real database name instead 
makes this go away.


Thanks


This seems to produce exactly that kind off error messages. I added some
error checking around that. Pushed.

Thanks!

Andres Freund

--
  Andres Freund    http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services








Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel

2012-12-03 Thread Steve Singer

On 12-12-03 09:48 AM, Andres Freund wrote:

On 2012-12-03 09:35:55 -0500, Steve Singer wrote:

On 12-12-03 07:42 AM, Andres Freund wrote:
Yes that is exactly what I'm doing. Using a real database name 
instead makes this go away. 

Was using "replication" an accident or do you think it makes sense in
some way?


The 'replication' line in pg_hba.conf made me think that the database 
name had to be replication for walsender connections.



Greetings,

Andres Freund

--
  Andres Freund    http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services








Re: [HACKERS] Logical decoding & exported base snapshot

2012-12-11 Thread Steve Singer

On 12-12-11 06:52 PM, Andres Freund wrote:

Hi,




Problem 1:

One problem I see is that while exporting a snapshot solves the
visibility issues of the table's contents it does not protect against
schema changes. I am not sure whether thats a problem.

If somebody runs a CLUSTER or something like that, the table's contents
will be preserved including MVCC semantics. That's fine.
The more problematic cases I see are TRUNCATE, DROP and ALTER
TABLE. Imagine the following:

S1: INIT_LOGICAL_REPLICATION
S1: get snapshot: 0333-1
S2: ALTER TABLE foo ALTER COLUMN blub text USING (blub::text);
S3: pg_dump --snapshot 0333-1
S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the
ALTER TABLE but showing only rows visible to our snapshot. After
START_LOGICAL_REPLICATION all changes after the xlog position from
INIT_LOGICAL_REPLICATION will be returned though - including all the
tuples from the ALTER TABLE and potentially - if some form of schema
capturing was in place - the ALTER TABLE itself. The copied schema would
have the new format already though.

Does anybody see that as a problem or is it just a case of PEBKAC? One
argument for the latter is that thats already a problematic case for
normal pg_dump's. Its just that the window is a bit larger here.


Is there any way to detect this situation as part of the pg_dump?  If I 
could detect this, abort my pg_dump then go and get a new snapshot then 
I don't see this as a big deal.  I can live with telling users, "don't 
do DDL like things while subscribing a new node, if you do the 
subscription will restart". I am less keen on telling users "don't do 
DDL like things while subscribing a new node or the results will be 
unpredictable"


I'm trying to convince myself that I will be able to take a pg_dump from 
an exported snapshot, plus the changes made between the snapshot and 
some later point in time, and turn the results into a consistent 
database.  What if something like this comes along:


INIT REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot


The schema I get as part of the pg_dump won't have bar because it has 
been dropped, even though it will have the rows that existed with bar.
I then go to process the INSERT statement.  It will have a WAL record 
with column data for bar, and the logical replication replay will look up 
the catalog rows from before the ALTER TABLE, so it will generate a 
logical INSERT record with bar.  That will fail on the replica.


I'm worried that it will be difficult to programmatically stitch together 
the inconsistent snapshot from the pg_dump plus the logical records 
generated in between the snapshot and the dump (along with any record of 
the DDL, if it exists).






Problem 2:

Control Flow.

To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
needs to be alive. That's currently solved by exporting the snapshot in
the walsender connection that did the INIT_LOGICAL_REPLICATION. The
question is how long should we preserve that snapshot?

There is no requirement - and there *cannot* be one in the general case,
the slot needs to be usable after a restart - that
START_LOGICAL_REPLICATION has to be run in the same replication
connection as INIT.

Possible solutions:
1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
confirms that logical replication initialization is finished. Before
that the walsender connection cannot be used for anything else.

2) we remove the snapshot as soon as any other commend is received, this
way the replication connection stays usable, e.g. to issue a
START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
case the snapshot would have to be imported *before* the next command
was received as SET TRANSACTION SNAPSHOT requires the source transaction
to be still open.

Opinions?


Option 2 sounds more flexible.  Is it more difficult to implement?

Andres
--
  Andres Freund    http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services








Re: [HACKERS] Logical decoding & exported base snapshot

2012-12-13 Thread Steve Singer

On 12-12-12 06:20 AM, Andres Freund wrote:

Possible solutions:
1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
confirms that logical replication initialization is finished. Before
that the walsender connection cannot be used for anything else.

2) we remove the snapshot as soon as any other commend is received, this
way the replication connection stays usable, e.g. to issue a
START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
case the snapshot would have to be imported *before* the next command
was received as SET TRANSACTION SNAPSHOT requires the source transaction
to be still open.
Option 2 sounds more flexible.  Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires
knowledge about logical replication in more parts of walsender, but it
should be ok.

Note btw, that my description of 1) was easy to misunderstand. The
"that" in "Before that the walsender connection cannot be used for
anything else." is the answer from the client, not the usage of the
exported snapshot. Once the snapshot has been imported into other
session(s) the source doesn't need to be alive anymore.
Does that explanation change anything?


I think I understood you were saying the walsender connection can't be 
used for anything else (ie streaming WAL) until the exported snapshot 
has been imported.  I think your clarification is still consistent with 
this?


WIth option 2 I can still get the option 1 behaviour by not sending the 
next command to the walsender until I am done importing the snapshot.  
However if I want to start processing WAL before the snapshot has been 
imported I can do that with option 2.


I am not sure I would need to do that, I'm just saying having the option 
is more flexible.







Greetings,

Andres Freund

--
  Andres Freund    http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services








[HACKERS] AIX buildfarm member

2013-01-11 Thread Steve Singer
The only animal in the buildfarm running AIX is grebe 
(http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD)


It is likely that the server running this animal will go away sometime 
this year and the machine replacing it isn't running AIX.


If someone else in the community is running PostgreSQL on AIX then it 
would be good if they setup a buildfarm member, perhaps with a more 
recent version of AIX.



Steve




Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-19 Thread Steve Singer

On 13-01-09 03:07 PM, Tom Lane wrote:

Andres Freund  writes:

Well, I *did* benchmark it as noted elsewhere in the thread, but thats
obviously just machine (E5520 x 2) with one rather restricted workload
(pgbench -S -jc 40 -T60). At least its rather palloc heavy.
Here are the numbers:
before:
#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 
101449.857665
after:
#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 
101673.400992
So on my system if there is a difference, its positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this
--- its numbers contain a lot of communication overhead, not to mention
that pgbench itself can be a bottleneck.  It struck me that we have a
recent test case that's known to be really palloc-intensive, namely
Pavel's example here:
http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit
78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
reduced the data-copying overhead for that).  On my Fedora 16 machine
(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
I get a runtime for Pavel's example of 17023 msec (average over five
runs).  I then applied oprofile and got a breakdown like this:

   samples|  %|
--
108409 84.5083 /home/tgl/testversion/bin/postgres
 13723 10.6975 /lib64/libc-2.14.90.so
  3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
1096010.1495  AllocSetAlloc
6325  5.8572  MemoryContextAllocZeroAligned
6225  5.7646  base_yyparse
3765  3.4866  copyObject
2511  2.3253  MemoryContextAlloc
2292  2.1225  grouping_planner
2044  1.8928  SearchCatCache
1956  1.8113  core_yylex
1763  1.6326  expression_tree_walker
1347  1.2474  MemoryContextCreate
1340  1.2409  check_stack_depth
1276  1.1816  GetCachedPlan
1175  1.0881  AllocSetFree
1106  1.0242  GetSnapshotData
1106  1.0242  _SPI_execute_plan
1101  1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
Now I get an average runtime of roughly 16.7 seconds, a full 2% faster, which is a
bit astonishing, particularly because the oprofile results haven't moved
much:

107642 83.7427 /home/tgl/testversion/bin/postgres
 14677 11.4183 /lib64/libc-2.14.90.so
  3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %symbol name
10038 9.3537  AllocSetAlloc
6392  5.9562  MemoryContextAllocZeroAligned
5763  5.3701  base_yyparse
4810  4.4821  copyObject
2268  2.1134  grouping_planner
2178  2.0295  core_yylex
1963  1.8292  palloc
1867  1.7397  SearchCatCache
1835  1.7099  expression_tree_walker
1551  1.4453  check_stack_depth
1374  1.2803  _SPI_execute_plan
1282  1.1946  MemoryContextCreate
1187  1.1061  AllocSetFree
...
653   0.6085  palloc0
...
552   0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so
how did that get faster?

I notice that the postgres executable is about 0.2% smaller, presumably
because a whole lot of inlined fetches of CurrentMemoryContext are gone.
This makes me wonder if my result is due to chance improvements of cache
line alignment for inner loops.

I would like to know if other people get comparable results on other
hardware (non-Intel hardware would be especially interesting).  If this
result holds up across a range of platforms, I'll withdraw my objection
to making palloc a plain function.

regards, tom lane



Sorry for the delay I only read this thread today.


I just tried Pavel's test on a POWER5 machine with an older version of 
gcc (see the grebe buildfarm animal for details)


78a5e738e:   37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't bothered 
to run oprofile or tprof to get a breakdown of the functions since Andres has 
already removed this from his patch.

Steve






Re: [HACKERS] logical changeset generation v4

2013-01-19 Thread Steve Singer

On 13-01-14 08:38 PM, Andres Freund wrote:

Hi everyone,

Here is the newest version of logical changeset generation.



2) Currently the logical replication infrastructure assigns a 'slot-id'
when a new replica is setup. That slot id isn't really nice
(e.g. "id-321578-3"). It also requires that [18] keeps state in a global
variable to make writing regression tests easy.

I think it would be better to make the user specify those replication
slot ids, but I am not really sure about it.


Shortly after trying out the latest version I hit the following scenario
1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name 
but it won't do anything because I have no free slots.  I can't free the 
slots with -F because I have no clue what the names of the slots are.
I can figure the names out by looking in pg_llog, but my replication 
program can't do that, so it won't be able to clean up from a failed attempt.


I agree with you that we should make the user program specify a slot, we 
eventually might want to provide a view that shows the currently 
allocated slots. For a logical based slony I would just generate the 
slot name based on the remote node id.  If walsender generates the slot 
name then I would need to store a mapping between slot names and slons 
so when a slon restarted it would know which slot to resume using.   I'd 
have to use a table in the slony schema on the remote database for 
this.  There would always be a risk of losing track of a slot id if the 
slon crashed after getting the slot number but before committing the 
mapping on the remote database.
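
Concretely, I am thinking of nothing fancier than deriving the name from the cluster name and the remote node id (a sketch only; the cluster name would have to be restricted to slot-legal characters):

#include <stdio.h>

/*
 * Sketch: build a deterministic slot name from the slony cluster name
 * and the remote node id, so a restarted slon can recompute it without
 * storing any slot-name mapping.  Name/format are illustrative only.
 */
static void
slony_slot_name(char *buf, size_t buflen,
                const char *cluster_name, int remote_node_id)
{
    snprintf(buf, buflen, "slony_%s_node_%d", cluster_name, remote_node_id);
}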






3) Currently no options can be passed to an output plugin. I am thinking
about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the now widely
used ('option' ['value'], ...) syntax and pass that to the output
plugin's initialization function.


I think we discussed this last CF, I like this idea.


4) Does anybody object to:
-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);

-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';

-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';


+1



5) Currently its only allowed to access catalog tables, its fairly
trivial to extend this to additional tables if you can accept some
(noticeable but not too big) overhead for modifications on those tables.

I was thinking of making that an option for tables, that would be useful
for replication solutions configuration tables.


I think this will make the life of anyone developing a new replication 
system easier.  Slony has a lot of infrastructure for allowing slonik 
scripts to wait for configuration changes to propagate everywhere before 
making other configuration changes because you can get race conditions.  
If I were designing a new replication system and I had this feature then 
I would try to use it to come up with a simpler model of propagating 
configuration changes.




Andres Freund




Steve





Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-21 Thread Steve Singer

On 13-01-21 02:28 AM, Andres Freund wrote:


I haven't removed it from the patch afaik, so it would be great to get a
profile here! Its "only" for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...


Here is the output from tprof

Here is the baseline:

Average time 37862

Total Ticks For All Processes (./postgres_perftest) = 19089



Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  2270   0.99 aset.c 3bfb84b0
.base_yyparse   1217   0.53 gram.c fe644  168ec
.text   1015   0.44 copyfuncs.c 11bfb4   4430
.MemoryContextAllocZero  535   0.23 mcxt.c 3b990 f0
.MemoryContextAlloc  533   0.23 mcxt.c 3b780 e0
.check_stack_depth   392   0.17 postgres.c 568ac100
.core_yylex  385   0.17 gram.c fb5b4   1c90
.expression_tree_walker  347   0.15 nodeFuncs.c 50da4750
.AllocSetFree308   0.13 aset.c 3c9981b0
.grouping_planner242   0.11 planner.c 2399f0   1d10
.SearchCatCache  198   0.09 catcache.c 7ec20550
._SPI_execute_plan   195   0.09 spi.c 2f0c187c0
.pfree   195   0.09 mcxt.c 3b3b0 70
query_dependencies_walker185   0.08 setrefs.c 2559441b0
.GetSnapshotData 183   0.08 procarray.c 69efc460
.query_tree_walker   176   0.08 nodeFuncs.c 50ae4210
.strncpy 168   0.07 strncpy.s ba080130
.fmgr_info_cxt_security  166   0.07 fmgr.c 3f7b0850
.transformStmt   159   0.07 analyze.c 29091c   12d0
.text141   0.06 parse_collate.c 28ddf8  1
.ExecInitExpr137   0.06 execQual.c 17f18c   15f0
.fix_expr_common 132   0.06 setrefs.c 2557e4160
.standard_ExecutorStart  127   0.06 execMain.c 1d9a00940
.GetCachedPlan   125   0.05 plancache.c ce664310
.strcpy  121   0.05 noname 3bd401a8


With your changes (same test as I described before)

Average: 37938


Total Ticks For All Processes (./postgres_perftest) = 19311

Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  2162   2.17 aset.c 3bfb84b0
.base_yyparse   1242   1.25 gram.c fdc7c  167f0
.text   1028   1.03 copyfuncs.c 11b4d0   4210
.palloc  553   0.56 mcxt.c 3b4c8 d0
.MemoryContextAllocZero  509   0.51 mcxt.c 3b9e8 f0
.core_yylex  413   0.41 gram.c fac2c   1c60
.check_stack_depth   404   0.41 postgres.c 56730100
.expression_tree_walker  320   0.32 nodeFuncs.c 50d28750
.AllocSetFree261   0.26 aset.c 3c9981b0
._SPI_execute_plan   232   0.23 spi.c 2ee6247c0
.GetSnapshotData 221   0.22 procarray.c 69d54460
.grouping_planner211   0.21 planner.c 237b60   1cf0
.MemoryContextAlloc  190   0.19 mcxt.c 3b738 e0
.query_tree_walker   184   0.18 nodeFuncs.c 50a68210
.SearchCatCache  182   0.18 catcache.c 7ea08550
.transformStmt   181   0.18 analyze.c 28e774   12d0
query_dependencies_walker180   0.18 setrefs.c 25397c1b0
.strncpy 175   0.18 strncpy.s b9a60130
.MemoryContextCreate 167   0.17 mcxt.c 3bad8160
.pfree   151   0.15 mcxt.c 3b208 70
.strcpy  150   0.15 noname 3bd401a8
.fmgr_info_cxt_security  146   0.15 fmgr.c 3f790850
.text132   0.13 parse_collate.c 28bc50  1
.ExecInitExpr125   0.13 execQual.c 17de28   15e0
.expression_tree_mutator 124   0.12 nodeFuncs.c 53268   1080
.strcmp  122   0.12 noname 1e44158
.fix_expr_common 117   0.12 setrefs.c 25381c160





Greetings,

Andres Freund







Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

2013-01-21 Thread Steve Singer

On 13-01-21 12:15 PM, Andres Freund wrote:

On 2013-01-21 11:59:18 -0500, Steve Singer wrote:

On 13-01-21 02:28 AM, Andres Freund wrote:

I haven't removed it from the patch afaik, so it would be great to get a
profile here! Its "only" for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...

Here is the output from tprof

Here is the baseline:

Any chance to do the test on top of HEAD? The elog stuff has gone in only
afterwards and might actually affect this enough to be relevant.


HEAD(:fb197290)
Average: 28877

.AllocSetAlloc  1442   1.90 aset.c 3a9384b0
.base_yyparse   1220   1.61 gram.c fdbc0  168ec
.MemoryContextAlloc  485   0.64 mcxt.c 3a154 e0
.core_yylex  407   0.54 gram.c fab30   1c90
.AllocSetFree320   0.42 aset.c 3b3181b0
.MemoryContextAllocZero  316   0.42 mcxt.c 3a364 f0
.grouping_planner271   0.36 planner.c 2b0ce8   1d10
.strncpy 256   0.34 strncpy.s b8ca0130
.expression_tree_walker  222   0.29 nodeFuncs.c 4f734750
._SPI_execute_plan   221   0.29 spi.c 2fb230840
.SearchCatCache  182   0.24 catcache.c 7d870550
.GetSnapshotData 161   0.21 procarray.c 68acc460
.fmgr_info_cxt_security  155   0.20 fmgr.c 3e130850
.pfree   152   0.20 mcxt.c 39d84 70
.expression_tree_mutator 151   0.20 nodeFuncs.c 51c84   1170
.check_stack_depth   142   0.19 postgres.c 5523c100
.text126   0.17 parse_collate.c 233d90  1
.transformStmt   125   0.16 analyze.c 289e88   12c0
.ScanKeywordLookup   123   0.16 kwlookup.c f7ab4154
.new_list120   0.16 list.c 40f74 b0
.subquery_planner115   0.15 planner.c 2b29f8dc0
.GetCachedPlan   115   0.15 plancache.c cdab0310
.ExecInitExpr114   0.15 execQual.c 17e690   15f0
.set_plan_refs   113   0.15 setrefs.c 252630cb0
.standard_ExecutorStart  110   0.14 execMain.c 1d9264940
.heap_form_tuple 107   0.14 heaptuple.c 177c842f0
.query_tree_walker   105   0.14 nodeFuncs.c 4f474210



HEAD + patch:
Average: 29035

Total Ticks For All Processes (./postgres_perftest) = 15044

Subroutine Ticks%   Source Address  Bytes
== = == == ===  =
.AllocSetAlloc  1422   1.64 aset.c 3a9384b0
.base_yyparse   1201   1.38 gram.c fd1e8  167f0
.palloc  470   0.54 mcxt.c 39e04 d0
.core_yylex  364   0.42 gram.c fa198   1c60
.MemoryContextAllocZero  282   0.33 mcxt.c 3a324 f0
._SPI_execute_plan   250   0.29 spi.c 2f8c18840
.AllocSetFree244   0.28 aset.c 3b3181b0
.strncpy 234   0.27 strncpy.s b86a0130
.expression_tree_walker  229   0.26 nodeFuncs.c 4f698750
.grouping_planner176   0.20 planner.c 2aea84   1d30
.SearchCatCache  170   0.20 catcache.c 7d638550
.GetSnapshotData 168   0.19 procarray.c 68904460
.assign_collations_walker155   0.18 parse_collate.c 231f4c7e0
.subquery_planner141   0.16 planner.c 2b07b4dc0
.fmgr_info_cxt_security  141   0.16 fmgr.c 3e110850
.check_stack_depth   140   0.16 postgres.c 550a0100
.ExecInitExpr138   0.16 execQual.c 17d2f8   15e0
.pfree   137   0.16 mcxt.c 39b44 70
.transformStmt   132   0.15 analyze.c 287dec   12c0
.new_list128   0.15 list.c 40f44 90
.expression_tree_mutator 125   0.14 nodeFuncs.c 51bd8   1080
.preprocess_expression   118   0.14 planner.c 2adf541a0
.strcmp  118   0.14 noname 1e44158
.standard_ExecutorStart  115   0.13 execMain.c 1d77c0940
.MemoryContextAlloc  111   0.13 mcxt.c 3a074 e0
.set_plan_refs   109   0.13 setrefs.c 2506c4ca0





Otherwise I have to say I am a bit confused by the mighty difference in
costs attributed to AllocSetAlloc given the amount of calls should be
exactly the same and the number of "trampoline" function calls should
also stay the same.
Greetings,

Andres Freund

--
  Andres Freund    http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services








Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-24 Thread Steve Singer

On 13-01-24 06:40 AM, Andres Freund wrote:


Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?


I was able to get something that generated output for INSERT statements 
in a format similar to what a modified slony apply trigger would want.  
This was with the list of tables to replicate hard-coded in the plugin.  
This was with the patchset from the last commitfest.  I had gotten a bit 
hung up on the UPDATE and DELETE support because slony allows you to use 
an arbitrary user specified unique index as your key.  It looks like 
better support for tables with a unique non-primary key is in the most 
recent patch set.  I am hoping to have time this weekend to update my 
plugin to use parameters passed in on the init and other updates in the 
most recent version.  If I make some progress I will post a link to my 
progress at the end of the weekend.  My big issue is that I have limited 
time to spend on this.




BTW, why does all the transaction reordering stuff has to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded
data should ever allowed to leave the postgres cluster. And to be fair
it makes writing an output plugin *way* much easier. Check
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4
If you skip over tuple_to_stringinfo(), which is just pretty generic
scaffolding for converting a whole tuple to a string, writing out the
changes in some format by now is pretty damn simple.



I think we will find that the  replication systems won't be the only 
users of this feature.  I have often seen systems that have a logging 
requirement for auditing purposes or to log then reconstruct the 
sequence of changes made to a set of tables in order to feed a 
downstream application.  Triggers and a journaling table are the 
traditional way of doing this but it should be pretty easy to write a 
plugin to accomplish the same thing that should give better 
performance.  If the reordering stuff wasn't in core this would be much 
harder.




How much of this infrastructure is to support replicating DDL changes? IOW,
if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much unless we add in other code that
allows us to check whether the current definition of a table is still
the same as it was back when the tuple was logged.


Any other features or requirements that could be dropped? I think it's clear at 
this stage that
this patch is not going to be committed as it is. If you can reduce it to a
fraction of what it is now, that fraction might have a chance. Otherwise,
it's just going to be pushed to the next commitfest as whole, and we're
going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as
unsupported:
- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but thats just a minor thing.

I think what we can do more realistically than to chop of required parts
of changeset extraction is to start applying some of the preliminary
patches independently:
- the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
   relnode) should be independently committable if a bit boring
- allowing walsenders to connect to a database possibly needs an interface 
change
   but otherwise it should be fine to go in independently. It also has
   other potential use-cases, so I think thats fair.
- logging xl_running_xact's more frequently could also be committed
   independently and makes sense independently as it allows a standby to
   enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The
   fact that no invalid value for command ids exists is imo an oversight
- the *Satisfies change could be applied and they are imo ready but
   there's no use-case for it without the rest, so I am not sure whether
   theres a point
- currently not separately available, but we could add wal_level=logical
   independently. There would be no user of it, but it would be partial
   work. That includes the relcache support for keeping track of the
   primary key which already is available separately.


Greetings,

Andres Freund







Re: [HACKERS] Event Triggers: adding information

2013-01-24 Thread Steve Singer

On 13-01-24 05:43 AM, Dimitri Fontaine wrote:

Robert Haas  writes:

On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine wrote:

   - Context exposing and filtering


I'm not feeling very sanguine about any of this.  I feel strongly that
we should try to do something that's more principled than just
throwing random junk into ProcessUtility.


The goal is to allow for advanced users of that feature to get at the
sequence, constraints and index names that have been picked
automatically by PostgreSQL when doing some CREATE TABLE statement that
involves them:

CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that
whole feature and at different levels, do you have a strong opinion on
whether you really need that to operate at the logical level?




I haven't been following this thread very closely and I'm not exactly 
sure what you're asking.


If the question is what a replication system would need to replicate DDL 
commands, then I need a method of translating whatever the DDL trigger 
is passed into either the 'CREATE TABLE public.foo(id serial primary 
key, f1 text)' OR some set of equivalent commands that will allow me to 
create the same objects on the replica. I don't have any brilliant 
insight on how I would go about doing it.  Honestly I haven't spent a 
lot of time thinking about it in the past 8 months.


If you're asking what I would need for a trigger to automatically add a 
new table to replication then I think I would only need to know (or be 
able to obtain) the fully qualified name of the table and then in 
another trigger call be given the name of the fully qualified name of 
the sequence.  The triggers would need to be able to do DML to 
configuration tables.  I don't need to know that a table and sequence 
are connected if all I want to do is replicate them.







Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-26 Thread Steve Singer

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:


Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?


I was able to get something that generated output for INSERT 
statements in a format similar to what a modified slony apply trigger 
would want.  This was with the list of tables to replicate hard-coded 
in the plugin.  This was with the patchset from the last commitfest.I 
had gotten a bit hung up on the UPDATE and DELETE support because 
slony allows you to use an arbitrary user specified unique index as 
your key.  It looks like better support for tables with a unique 
non-primary key is in the most recent patch set.  I am hoping to have 
time this weekend to update my plugin to use parameters passed in on 
the init and other updates in the most recent version.  If I make some 
progress I will post a link to my progress at the end of the weekend.  
My big issue is that I have limited time to spend on this.




This isn't a complete review, just a few questions I've hit so far that I 
thought I'd ask to see if I'm missing something related to updates.



*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*** extern bool ReindexIsProcessingHeap(Oid
*** 114,117 
--- 114,121 
  extern bool ReindexIsProcessingIndex(Oid indexOid);
  extern Oid      IndexGetRelation(Oid indexId, bool missing_ok);

+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+                                    int16 *nratts, int16 *attnums,
+                                    Oid *atttypids, Oid *opclasses);
+
  #endif   /* INDEX_H */


I don't see this defined anywhere could it be left over from a previous 
version of the patch?



In decode.c
DecodeUpdate:
+
+   /*
+* FIXME: need to get/save the old tuple as well if we want primary key
+* changes to work.
+*/
+   change->newtuple = ReorderBufferGetTupleBuf(reorder);

I also don't see any code in heap_update to find + save the old primary 
key values like you added to heap_delete.   You didn't list "Add ability 
to change the primary key on an UPDATE" in the TODO so I'm wondering if 
I'm missing something.  Is there another way I can get the primary key 
values for the old_tuple?


Also,

I think the name of the test contrib module was changed but you didn't 
update the make file. This fixes it


diff --git a/contrib/Makefile b/contrib/Makefile
index 1cc30fe..36e6bfe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,7 +50,7 @@ SUBDIRS = \
tcn \
test_parser \
test_decoding   \
-   test_logical_replication \
+   test_logical_decoding \
tsearch2\
unaccent\
vacuumlo\









Re: [HACKERS] Event Triggers: adding information

2013-01-27 Thread Steve Singer

On 13-01-26 11:11 PM, Robert Haas wrote:

On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine
 wrote:

My understanding is that if the command string we give to event triggers
is ambiguous (sub-object names, schema qualifications, etc), it becomes
useless for logical replication use. I'll leave it to the consumers of
that to speak up now.


Yeah, that's probably true.  I think it might be useful for other
purposes, but I think we need a bunch of infrastructure we don't have
yet to make logical replication of DDL a reality.



I agree.  Does anyone have a specific use case other than DDL 
replication where an ambiguous command string would be useful? Even for 
use cases like automatically removing a table from replication when it 
is dropped, I would want to be able to determine which table is being 
dropped unambiguously. Could I determine that from an oid? I suspect so, 
but parsing a command string and then trying to figure out the table 
from the search_path doesn't sound very appealing.



Well, the point is that if you have a function that maps a parse tree
onto an object name, any API or ABI changes can be reflected in an
updated definition for that function.  So suppose I have the command
"CREATE TABLE public.foo (a int)".  And we have a call
pg_target_object_namespace(), which will return "public" given the
parse tree for the foregoing command.  And we have a call
pg_target_object_name(), which will return "foo".  We can whack around
the underlying parse tree representation all we want and still not
break anything - because any imaginable parse tree representation will
allow the object name and object namespace to be extracted.  Were that
not possible it could scarcely be called a parse tree any longer.


How do you get the fully qualified type of the first column?
col1=pg_target_get_column(x, 0)
pg_target_get_type(col1);

or something similar.
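
As a concrete (if entirely hypothetical) sketch, an event trigger 
written against that style of API might look like the following; 
neither accessor exists today, they are just the names floated above:

CREATE OR REPLACE FUNCTION on_create_table() RETURNS event_trigger AS $$
BEGIN
    -- both accessors are hypothetical; they would read the parse tree
    -- of the command that fired this trigger
    RAISE NOTICE 'created %.%',
        pg_target_object_namespace(), pg_target_object_name();
    -- the type of the first column would need yet more calls along the
    -- lines of pg_target_get_type(pg_target_get_column(...))
END;
$$ LANGUAGE plpgsql;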

I think that could work, but we would be adding a lot of API functions to 
get at all the various bits of info one would want the API to expose.  I 
also suspect that executing triggers that had to make lots of function 
calls to walk a tree would be much slower than an extension that could 
just walk the parse tree or some other abstract tree-like structure.


Steve




Re: [HACKERS] logical changeset generation v4

2013-01-27 Thread Steve Singer

On 13-01-22 11:30 AM, Andres Freund wrote:

Hi,

I pushed a new rebased version (the xlogreader commit made it annoying
to merge).

The main improvements are
* way much coherent code internally for intializing logical rep
* explicit control over slots
* options for logical replication



Exactly what is the syntax for using that?  My reading of your changes to 
repl_gram.y makes me think that any of the following should work (but 
they don't):


START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
 ERROR:  syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
 ERROR:  syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
 ERROR:  syntax error: unexpected character "("

I'm also attaching a patch to pg_receivellog that allows you to specify 
these options on the command line.  I'm not saying it is appropriate to 
be adding more bells and whistles to the utilities two weeks into the CF, 
but I found this useful for testing so I'm sharing it.






Greetings,

Andres Freund

--
  Andres Freundhttp://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services





From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001
From: Steve Singer 
Date: Sun, 27 Jan 2013 12:24:33 -0500
Subject: [PATCH] allow pg_receivellog to pass plugin options from the command line to the plugin

---
 src/bin/pg_basebackup/pg_receivellog.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
index 04bedbe..30b3cea 100644
--- a/src/bin/pg_basebackup/pg_receivellog.c
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -54,7 +54,7 @@ static XLogRecPtr	startpos;
 static bool do_init_slot = false;
 static bool do_start_slot = false;
 static bool do_stop_slot = false;
-
+static const char * plugin_opts="";
 
 static void usage(void);
 static void StreamLog();
@@ -84,6 +84,7 @@ usage(void)
 	printf(_("  -s, --status-interval=INTERVAL\n"
 			 "                         time between status packets sent to server (in seconds)\n"));
 	printf(_("  -S, --slot=SLOT        use existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("  -o, --options=OPTIONS  A comma separated list of options to the plugin\n"));
 	printf(_("\nAction to be performed:\n"));
 	printf(_("  --init                 initiate a new replication slot (for the slotname see --slot)\n"));
 	printf(_("  --start                start streaming in a replication slot (for the slotname see --slot)\n"));
@@ -264,8 +265,8 @@ StreamLog(void)
 slot);
 
 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos,plugin_opts);
 	res = PQexec(conn, query);
 	if (PQresultStatus(res) != PGRES_COPY_BOTH)
 	{
@@ -560,6 +561,7 @@ main(int argc, char **argv)
 		{"init", no_argument, NULL, 1},
 		{"start", no_argument, NULL, 2},
 		{"stop", no_argument, NULL, 3},
+		{"options",required_argument,NULL,'o'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -584,7 +586,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:",
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:o:",
 			long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -659,6 +661,10 @@ main(int argc, char **argv)
 			case 3:
 				do_stop_slot = true;
 				break;
+			case 'o':
+				if (optarg != NULL)
+					plugin_opts = pg_strdup(optarg);
+				break;
 /* action */
 
 			default:
-- 
1.7.0.4




Re: [HACKERS] Patch to add a primary key using an existing index

2010-11-25 Thread Steve Singer

On 10-11-22 03:24 PM, Steve Singer wrote:

On 10-11-22 09:37 AM, Gurjeet Singh wrote:

On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer 


Almost fixed.
I still get an unexpected difference.

! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique
index.
CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a , b);
-- should fail; WITH INDEX option specified more than once.
ALTER TABLE rpi_test ADD PRIMARY KEY (a, b)
--- 35,41 
-- should fail; non-unique
ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1');
ERROR: "rpi_idx1" is not a unique index
! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique
index.


The attached version of the patch gets your regression tests to pass.

I'm going to mark this as ready for a committer.






replace_pkey_index.revised2.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-06 Thread Steve Singer

On 10-12-06 06:56 PM, Greg Smith wrote:

Tom Lane wrote:

The various testing that's been reported so far is all for
Linux and thus doesn't directly address the question of whether other
kernels will have similar performance properties.


Survey of some popular platforms:






So my guess is that some small percentage of Windows users might notice
a change here, and some testing on FreeBSD would be useful too. That's
about it for platforms that I think anybody needs to worry about.


If you tell me which options to pgbench and which .conf file settings 
you'd like to see I can probably arrange to run some tests on AIX.






--
Greg Smith   2ndQuadrant usg...@2ndquadrant.comBaltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance":http://www.2ndQuadrant.com/books






Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-07 Thread Steve Singer

On 10-12-06 09:00 PM, Josh Berkus wrote:

Steve,


If you tell me which options to pgbench and which .conf file settings
you'd like to see I can probably arrange to run some tests on AIX.


Compile and run test_fsync in PGSRC/src/tools/fsync.



Attached are runs against two different disk sub-systems from a server 
running AIX 5.3.


The first one is against the local disks


Loops = 1

Simple write:
8k write  60812.454/second

Compare file sync methods using one write:
open_datasync 8k write  162.160/second
open_sync 8k write  158.472/second
8k write, fdatasync 158.157/second
8k write, fsync  45.382/second

Compare file sync methods using two writes:
2 open_datasync 8k writes79.472/second
2 open_sync 8k writes80.095/second
8k write, 8k write, fdatasync   159.268/second
8k write, 8k write, fsync44.725/second

Compare open_sync with different sizes:
open_sync 16k write 162.017/second
2 open_sync 8k writes79.709/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close   45.361/second
8k write, close, fsync   36.311/second





The below profile is from the same machine using an IBM DS 6800 SAN for 
storage.



Loops = 1

Simple write:
8k write  75933.027/second

Compare file sync methods using one write:
open_datasync 8k write 2762.801/second
open_sync 8k write 2453.822/second
8k write, fdatasync2867.331/second
8k write, fsync1094.048/second

Compare file sync methods using two writes:
2 open_datasync 8k writes  1287.845/second
2 open_sync 8k writes  1332.084/second
8k write, 8k write, fdatasync  1966.411/second
8k write, 8k write, fsync  1048.354/second

Compare open_sync with different sizes:
open_sync 16k write2281.425/second
2 open_sync 8k writes  1401.561/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 1298.404/second
8k write, close, fsync 1188.582/second






Re: [HACKERS] Patch to add a primary key using an existing index

2011-01-16 Thread Steve Singer

I've taken a look at this version of the patch.


Submission Review

This version of the patch applies cleanly to master. It matches your git 
repo and includes test + docs.


Usability Review
---

The command syntax now matches what was discussed during the last cf.

The text of the notice:

test=# alter table a add constraint acons unique using index aind2;
NOTICE:  ALTER TABLE / ADD UNIQUE USING INDEX will rename index "aind2" 
to "acons"
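
For reference, a minimal end-to-end sequence for the syntax under review 
(the table definition below is made up; the index and constraint names 
match the notice above):

CREATE TABLE a (x integer, y integer);
CREATE UNIQUE INDEX aind2 ON a (x, y);
ALTER TABLE a ADD CONSTRAINT acons UNIQUE USING INDEX aind2;
-- the NOTICE above is emitted here, and aind2 is renamed to acons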




Documentation
--

I've attached a patch (to be applied on top of your latest patch) with 
some editorial changes I'd recommend to your documentation.  I feel it 
reads a bit clearer (but others should chime in if they disagree or have 
better wordings)


 A git tree with changes rebased to master + this change is available 
at https://github.com/ssinger/postgres/tree/ssinger/constraint_with_index



Code Review
---

src/backend/parser/parse_utilcmd.c: 1452
You're calling strdup on the attribute name.  I don't have a good enough 
grasp on the code to be able to trace this through to where the memory 
gets freed.  Does it get freed?  Should/could this be a call to pstrdup?


Feature Test
-

I wasn't able to find any issues in my testing

I'm marking this as returned with feedback pending your answer on the 
possible memory leak above but I think the patch is very close to being 
ready.



Steve Singer


diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 83d2fbb..0b486ab 100644
*** a/doc/src/sgml/ref/alter_table.sgml
--- b/doc/src/sgml/ref/alter_table.sgml
*** ALTER TABLE table_constraint_using_index
  
   
!   This form adds a new PRIMARY KEY or UNIQUE
constraint to the table using an existing index. All the columns of the
index will be included in the constraint.
   
  
   
!   The index should be UNIQUE, and should not be a partial index
!   or an expressional index.
   
  
   
!   This can be helpful in situations where one wishes to recreate or
!   REINDEX the index of a PRIMARY KEY or a
!   UNIQUE constraint, but dropping and recreating the constraint
!   to acheive the effect is not desirable. See the illustrative example below.
   
  
   
   
If a constraint name is provided then the index will be renamed to that
!   name, else the constraint will be named to match the index name.
   
  
  
--- 242,270 
  ADD table_constraint_using_index
  
   
!   This form adds a new PRIMARY KEY or UNIQUE
constraint to the table using an existing index. All the columns of the
index will be included in the constraint.
   
  
   
!   The index should be a UNIQUE index. A partial index
! 	  or an expressional index is not allowed.
   
  
   
!   Adding a constraint using an existing index can be helpful in situations 
! 	  where you wish to rebuild an index used for a  
! 	  PRIMARY KEY or a UNIQUE constraint,
! 	  but dropping and recreating the constraint
!   is not desirable. See the illustrative example below.
   
  
   
   
If a constraint name is provided then the index will be renamed to that
!   name of the constraint. Otherwise the constraint will be named to match 
! 	  the index name.
   
  
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_hostname and pg_stat_activity

2011-01-18 Thread Steve Singer


Here is my review for this patch


Submission Review

-Patch applies cleanly

-Patch does not include documentation changes.  At a minimum, update the 
table that lists what pg_stat_activity and pg_stat_replication include 
in monitoring.sgml, but I propose more below.


-No tests are included, but writing unit tests that depend on producing 
the same output involving the client's hostname is not really possible.



Usability review


See my comments below in the testing section.  The patch does do what it 
says, but the log_hostname issue is a usability issue (it is not obvious 
why you get only null when log_hostname is turned off).  Documenting it 
might be fine.  If log_hostname were new to 9.1 I would encourage 
renaming it to something that implies it does more than just control 
logging output, but I don't see this as a good enough reason to rename a 
parameter.


I think being able to see the hostnames connections in pg_stat_activity 
come from is useful and it is a feature we don't already have.




Feature Test


If my connection is authorized through a line in pg_hba that uses 
client_hostname then the column shows what I expect even with 
log_hostname set to off.


However if I connect with a line in pg_hba that matches on an IP network 
then my client_hostname is always null unless log_hostname is set to 
true.  This is consistent with the behavior you describe but I think the 
average user will find it a bit confusing.  Having a column that is 
always null unless a GUC is set is less than ideal but I understand why 
log_hostname isn't on by default.
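
For illustration, the new column can be checked with something along 
these lines; with the patch applied and either log_hostname = on or a 
hostname-based pg_hba line, client_hostname is populated, otherwise it 
is NULL:

SELECT datname, usename, client_addr, client_hostname
  FROM pg_stat_activity;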


I would be inclined to add an entry for these views in catalogs.sgml, 
where we can then give users a pointer that they need to set 
log_hostname to get anything out of this column.


If I connect through unix sockets (say psql -h /tmp --port 5433) I was 
also expecting to see "[local]" in client_hostname, but I am just 
getting NULL.  Since this lookup is static I don't see why it should be 
dependent on log_hostname (even with log_hostname set I'm not seeing 
[local]).


I have not tested this with ipv6

Performance Review
--
The lookup is done when the connection is established, not each time the 
view is queried.  I don't see any performance issues.



Coding Review
-

As Alvaro pointed out BackendStatusShmemSize should be updated.

To answer his question about why clientaddr works: clientaddr is a 
SockAddr, which is a structure not a pointer, so the data gets copied by 
value to beentry.  That won't work for strings.  I have no issues with 
how you're allocating space in beentry and then strlcpy'ing the values.


I see no issues with the implementation

I'm marking this as 'Waiting for Author' pending documentation changes 
and maybe a fix for the behaviour with unix sockets.


Steve





Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state

2013-01-27 Thread Steve Singer

On 13-01-24 11:15 AM, Steve Singer wrote:

On 13-01-24 06:40 AM, Andres Freund wrote:


Fair enough. I am also working on a user of this infrastructure but that
doesn't help you very much. Steve Singer seemed to make some stabs at
writing an output plugin as well. Steve, how far did you get there?


I was able to get something that generated output for INSERT 
statements in a format similar to what a modified slony apply trigger 
would want.  This was with the list of tables to replicate hard-coded 
in the plugin.  This was with the patchset from the last commitfest.  I 
had gotten a bit hung up on the UPDATE and DELETE support because 
slony allows you to use an arbitrary user specified unique index as 
your key.  It looks like better support for tables with a unique 
non-primary key is in the most recent patch set.  I am hoping to have 
time this weekend to update my plugin to use parameters passed in on 
the init and other updates in the most recent version.  If I make some 
progress I will post a link to my progress at the end of the weekend.  
My big issue is that I have limited time to spend on this.






A few more comments:

In decode.c DecodeDelete

+   if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+   {
+       elog(DEBUG2, "huh, no primary key for a delete on wal_level = logical?");
+       return;
+   }
+

I think we should be passing deletes with candidate key data logged to 
the plugin.  If the table isn't a replicated table then ignoring the 
delete is fine.  If the table is a replicated table but someone has 
deleted the unique index from the table, then the plugin will receive 
INSERT changes on the table but not DELETE changes.  If this happens the 
plugin wouldn't have any way of knowing that it is missing delete 
changes.  If my plugin gets passed a DELETE change record but with no 
key data then my plugin could do any of the following:

1.  Start screaming for help (ie log errors)
2.  Drop the table from replication
3.  Pass the delete (with no key values) onto the replication client and 
let it deal with it (see 1 and 2)


Also, 'huh' isn't one of our standard log message phrases :)


How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see 
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer.  Is there a reason 
why this can't be easily added?


Also, what do we want to do about TRUNCATE support?  I could always 
leave a TRUNCATE trigger in place that logs the truncate to an 
sl_truncates table and have my replication daemon respond to the insert 
on sl_truncates by actually truncating the data on the replica.
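
Roughly the following, as a sketch (the sl_truncates layout and names 
here are made up for illustration, and some_replicated_table is just a 
placeholder):

CREATE TABLE sl_truncates (nspname name, relname name, trunc_time timestamptz);

CREATE OR REPLACE FUNCTION log_truncate() RETURNS trigger AS $$
BEGIN
    -- statement-level trigger: just record which table was truncated
    INSERT INTO sl_truncates (nspname, relname, trunc_time)
    VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, now());
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER capture_truncate
    BEFORE TRUNCATE ON some_replicated_table
    FOR EACH STATEMENT EXECUTE PROCEDURE log_truncate();

The INSERT into sl_truncates is decoded like any other insert, so the 
daemon on the replica can see it and issue the matching TRUNCATE there.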


I've spent some time this weekend updating my prototype plugin that 
generates slony 2.2 style COPY output.  I have attached my progress here 
(also https://github.com/ssinger/slony1-engine/tree/logical_repl).  I 
have not gotten as far as modifying slon to act as a logical log 
receiver, or made a version of the slony apply trigger that would 
process these changes.  I haven't looked into the details of what is 
involved in setting up a subscription with the snapshot exporting.


I couldn't get the options on the START REPLICATION command to parse, so 
I just hard-coded some list building code in the init method.  I do plan 
on passing the list of tables to replicate from the replica to the 
plugin (because this list comes from the replica).  Passing what could 
be a few thousand table names as a list of arguments is a bit ugly and I 
admit my list processing code is rough.  Does this make us want to 
reconsider the format of the option_list?


I guess I should provide an opinion on whether the patch in this CF, if 
committed, could be used to act as a source for slony instead of the log 
trigger.


The biggest missing piece is the one I mentioned in my email yesterday: 
we aren't logging the old primary key on row UPDATEs.  I don't see how 
to build a credible replication system that doesn't allow users to 
update any column of a row.
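
A trivial illustration (t and its id column are placeholders): if the 
key itself is what changed, the apply side has nothing with which to 
find the old row:

-- on the origin
UPDATE t SET id = 10 WHERE id = 1;
-- without the old key (id = 1) in the change stream, the replica
-- cannot locate the row it is supposed to update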


The other issues I've raised (DecodeDelete hiding bad deletes, 
replication options not parsing for me) look like easy fixes.


No WAL decoding support for sequences or truncate are things that I 
could work around by doing things much like slony does today.  The SYNC 
can still capture the sequence changes in a table (where the INSERTs 
would be logged) and I can have a trigger capture truncates.


I mostly did this review from the point of view of someone trying to use 
the feature; I haven't done a line-by-line review of the code.


I suspect Andres can address these issues and get an updated patch out 
during this CF.   I think a more detailed code review by someone more 
familiar with postgres internals will reveal a handful of other issues 
that hopefully can be fixed without a lot of effort. If this were the 
only patch in the commitfest I would encourage Andres to push t
