Re: [HACKERS] pglogical - logical replication contrib module
On 01/15/2016 12:07 PM, Petr Jelinek wrote:
> That's a bug, fixed.

Can you post an updated patch with whatever fixes you have made so far?

> There are several statuses the table goes through; during the COPY it's in
> synchronizing status, so the next logical step seemed to be synchronized.
> Maybe it should be renamed to 'replicating' instead as that's what it
> actually means (the table has finished synchronization and is now
> replicating normally).

I agree 'replicating' is clearer.
Re: [HACKERS] pglogical - logical replication contrib module
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This reply covers a 10,000 foot level review of the feature (some of my other replies to the thread cover specifics that came up in testing, and code-level review will come later).

1) Do we want logical replication in core/contrib?

Ten years ago a popular feeling in the postgresql project was that replication didn't belong in core because there were too many different styles. People then went on to complain that there were too many replication projects to choose from, that they were hard to use, and that they had lots of corner cases. The evolution of WAL-based replication showed us how popular in-core replication is. Users like being able to use in-core features, and our community process tends to produce better quality in-core solutions than external projects. I am of the opinion that if we can come up with a solution that meets some common use cases then it would be good to have those features in core/contrib. At this stage I am not going to get into a discussion of a contrib extension versus something built-in that is not an extension.

I don't think a single replication solution will ever meet all use-cases. I feel that the extensible infrastructure we have so far built for logical replication means that people who want to develop solutions for use-cases not covered will be in a good position. This doesn't mean we can't or shouldn't try to cover some use cases in core.

2) Does this patch provide a set of logical replication features that meet many popular use-cases?

Below I will review some use-cases and try to assess how pglogical meets them.

** Streaming PostgreSQL upgrade

pg_upgrade is great for many situations, but sometimes you don't want an in-place upgrade; you want a streaming upgrade instead, possibly because you don't want application downtime but instead just want to point your applications at the upgraded database server in a controlled manner. Other times you might want the option of upgrading to a newer version of PG while maintaining the option of rolling back to the older version if things go badly.

I think pglogical should be able to handle this use case pretty well (assuming the source version of PG is actually new enough to include pglogical). Support for replicating sequences would need to be added before this is smooth, but once sequence support was added I think this would work well. I also don't see any reason why you couldn't replicate from 9.7 -> 9.6, since the wire format is abstracted from the internal representation. This is of course dependent on the application not doing anything that is inherently incompatible between the two versions.

** Query-only replicas (with temp tables or additional indexes)

Sometimes you want a replica for long-running or heavy queries. Requirements for temp tables, additional indexes, or maybe the effect on vacuum mean that our existing WAL-based replicas are unsuitable. I think pglogical should be able to handle this use case pretty well, with the caveat that your replica is an asynchronous replica and will always lag the origin by some amount.

** Replicating a subset of tables into a different database

Sometimes you want to replicate a handful of tables from one database to another database. Maybe the first database is the system of record for the data and the second database needs an up-to-date copy for querying.
Pglogical should meet this use case pretty well; it has flexible support for selecting which tables get replicated from which source (a sketch follows at the end of this message). Pglogical doesn't have any facilities to rename the tables between the origin and replica, but they could be added later.

** Sharding

Systems that do application-level sharding (or even sharding with an FDW) often have non-sharded tables that need to be available on all shards for referential integrity or joins. Logical replication is one way to make sure that the replicated data gets to all the shards. Sharding systems also sometimes want to take the data from individual shards and replicate it to a consolidation server for reporting purposes.

Pglogical seems to meet this use case. I guess you would have a designated origin for the shared/global data that all shards would subscribe to, with a set containing the designated data. For the consolidation use case you would have the consolidation server subscribe to all shards.

I am less clear about how someone would want DDL changes to work for these cases. The DDL support in the patch is pretty limited, so I am not going to think much now about how we would want DDL to work.

** Schema changes involving rewriting big tables

Sometimes you have a DDL change on a large table that will involve a table rewrite, and the best way of deploying the change is to make the DDL change on a replica and then, once it is finished, promote the replica to the origin in some controlled fashion. This avoids having to lock the table on the origin for hours. Pglogical seems to allow minor schema changes on the replica, such as changing a type, but it doesn't seem to allow a DO INSTEAD trigger on the replica. I don't think pglogical currently meets this use case particularly well.
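As a hedged illustration of the 'subset of tables' use case above: a minimal sketch using the patch's SQL interface. The set name, table name, and DSN are hypothetical, and the create_subscription parameter names are my assumption based on the patch's documentation, not verified against this exact version:

    -- on the origin:
    SELECT pglogical.create_replication_set('reporting_set');
    SELECT pglogical.replication_set_add_table('reporting_set', 'public.orders', true);

    -- on the replica:
    SELECT pglogical.create_subscription(
        subscription_name := 'reporting_sub',
        provider_dsn := 'host=origin dbname=prod',
        replication_sets := ARRAY['reporting_set']);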
Re: [HACKERS] pglogical - logical replication contrib module
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: tested, failed

Here is some more review.

    +- `pglogical.replication_set_add_table(set_name name, table_name regclass, synchronize boolean)`
    +  Adds a table to replication set.
    +
    +  Parameters:
    +  - `set_name` - name of the existing replication set
    +  - `table_name` - name or OID of the table to be added to the set
    +  - `synchronize` - if true, the table data is synchronized on all subscribers
    +    which are subscribed to given replication set, default false

The argument to this function is actually named "relation", not "table_name", though we might want to update the function to name the argument table_name. Also, we don't explain what 'synchronize' means. I first thought that a value of false would mean that existing data won't be copied but any new changes will be. A value of false actually seems to mean that nothing will happen with the table until the synchronize function is manually called (a sketch follows at the end of this message). We seem to be using the word 'synchronize' in different senses in different places, which I find confusing (ie. synchronize_data and synchronize_structure in create_subscription).

    *** a/contrib/pglogical/pglogical_sync.c
    --- b/contrib/pglogical/pglogical_sync.c
    + static void
    + dump_structure(PGLogicalSubscription *sub, const char *snapshot)
    + {
    +     char           pg_dump[MAXPGPATH];
    +     uint32         version;
    +     int            res;
    +     StringInfoData command;
    +
    +     if (find_other_exec_version(my_exec_path, PGDUMP_BINARY, &version, pg_dump))
    +         elog(ERROR, "pglogical subscriber init failed to find pg_dump relative to binary %s",
    +              my_exec_path);
    +
    +     if (version / 100 != PG_VERSION_NUM / 100)
    +         elog(ERROR, "pglogical subscriber init found pg_dump with wrong major version %d.%d, expected %d.%d",
    +              version / 100 / 100, version / 100 % 100,
    +              PG_VERSION_NUM / 100 / 100, PG_VERSION_NUM / 100 % 100);
    +
    +     initStringInfo(&command);
    + #if PG_VERSION_NUM < 90500
    +     appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -N pglogical_origin -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",
    + #else
    +     appendStringInfo(&command, "%s --snapshot=\"%s\" -s -N %s -F c -f \"/tmp/pglogical-%d.dump\" \"%s\"",

1) I am not sure we can assume/require that the pg_dump binary be in the same location as the postgres binary. I don't think we've ever required that client binaries (ie. psql, pg_dump, pg_restore ...) be in the same directory as postgres. pg_upgrade does require this, so maybe this isn't a problem in practice, but I thought I'd point it out. Ideally we wouldn't need to call an external program to get a schema dump, but turning pg_dump into a library is beyond the scope of this patch.

2) I don't think we can hard-code /tmp as the directory for the schema dump. I don't think this will work on most Windows systems, and even on a unix system $TMPDIR might be set to something else. Maybe writing this into pgsql_tmp would be a better choice.

Further down, in pglogical_sync_subscription(PGLogicalSubscription *sub):

    +     switch (status)
    +     {
    +         /* Already synced, nothing to do except cleanup. */
    +         case SYNC_STATUS_READY:
    +             MemoryContextDelete(myctx);
    +             return;
    +         /* We can recover from crashes during these. */
    +         case SYNC_STATUS_INIT:
    +         case SYNC_STATUS_CATCHUP:
    +             break;
    +         default:
    +             elog(ERROR,
    +                  "subscriber %s initialization failed during nonrecoverable step (%c), please try the setup again",
    +                  sub->name, status);
    +             break;
    +     }

I think the default case needs to do something to unregister the background worker.
We already discussed trying to get the error message to a user in a better way. Either way, there isn't any sense in this background worker being launched again if the error is nonrecoverable.

    +     tables = copy_replication_sets_data(sub->origin_if->dsn,
    +                                         sub->target_if->dsn,
    +                                         snapshot,
    +                                         sub->replication_sets);
    +
    +     /* Store info about all the synchronized tables. */
    +     StartTransactionCommand();
    +     foreach (lc, tables)

Shouldn't we be storing the info about the synchronized tables as part of the same transaction that does the sync?

I'll keep going through the code as I have time. I think it is appropriate to move this to the next CF since the CF is past its end date and the patch has received some review. When you have an updated version of the patch, post it; don't wait until March.
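To make the two 'synchronize' behaviours discussed above concrete, a hedged sketch; the table and subscription names are hypothetical, and pglogical.alter_subscription_resynchronize_table() is my assumption for the "synchronize function" referred to earlier:

    -- synchronize = true: existing rows are copied to subscribers right away
    SELECT pglogical.replication_set_add_table('default', 'public.t1', true);

    -- synchronize = false (the default): the table is added to the set, but
    -- nothing happens until a synchronize call is made manually, e.g.
    SELECT pglogical.replication_set_add_table('default', 'public.t2', false);
    SELECT pglogical.alter_subscription_resynchronize_table('mysub', 'public.t2');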
Re: [HACKERS] pglogical - logical replication contrib module
On 01/26/2016 10:43 AM, Craig Ringer wrote:
> On 23 January 2016 at 11:17, Steve Singer <st...@ssinger.info> wrote:
>> ** Schema changes involving rewriting big tables
>> Sometimes you have a DDL change on a large table that will involve a table
>> rewrite and the best way of deploying the change is to make the DDL change
>> on a replica then once it is finished promote the replica to the origin in
>> some controlled fashion. This avoids having to lock the table on the
>> origin for hours. pglogical seems to allow minor schema changes on the
>> replica such as changing a type but it doesn't seem to allow a DO INSTEAD
>> trigger on the replica. I don't think pglogical currently meets this use
>> case particularly well
>
> I'm not sure I fully understand that one.

Say you have table A with column b, and in the next version of your application you want to create a second table B that has column b:

    create table B (b text);
    insert into B select b from a;
    alter table a drop column b;

but you want to do this on a replica because it is a very big table and you want to minimize downtime. You could have a trigger on the replica that performed updates on B.b instead of A, except triggers don't seem to get run on the replica.

Steve
Re: [HACKERS] Logical Replication WIP
On 08/31/2016 04:51 PM, Petr Jelinek wrote:
> Hi, and one more version with bug fixes, improved code docs and a couple
> more tests, some general cleanup and also rebased on current master for
> the start of CF.

To get the 'subscription' TAP tests to pass I need to set

    export PGTZ=+02

Shouldn't the expected output be with reference to PST8PDT?
Re: [HACKERS] Logical Replication WIP
On 09/05/2016 03:58 PM, Steve Singer wrote:
> On 08/31/2016 04:51 PM, Petr Jelinek wrote:
>> Hi, and one more version with bug fixes, improved code docs and a couple
>> more tests, some general cleanup and also rebased on current master for
>> the start of CF.

A few more things I noticed when playing with the patches.

1. Creating a subscription to yourself ends pretty badly. The CREATE SUBSCRIPTION command seems to get stuck, and you can't kill it. The background process seems to be waiting for a transaction to commit (I assume the create subscription command). I had to kill -9 the various processes to get things to stop. Getting confused about hostnames and ports is a common operator error.

2. Failures during the initial subscription aren't recoverable. For example, on db1:

    create table a(id serial4 primary key, b text);
    insert into a(b) values ('1');
    create publication testpub for table a;

on db2:

    create table a(id serial4 primary key, b text);
    insert into a(b) values ('1');
    create subscription testsub connection 'host=localhost port=5440 dbname=test' publication testpub;

I then get in my db2 log:

    ERROR: duplicate key value violates unique constraint "a_pkey"
    DETAIL: Key (id)=(1) already exists.
    LOG: worker process: logical replication worker 16396 sync 16387 (PID 10583) exited with exit code 1
    LOG: logical replication sync for subscription testsub, table a started
    ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists
    LOG: worker process: logical replication worker 16396 sync 16387 (PID 10585) exited with exit code 1
    LOG: logical replication sync for subscription testsub, table a started
    ERROR: could not crate replication slot "testsub_sync_a": ERROR: replication slot "testsub_sync_a" already exists

and it keeps looping. If I then truncate "a" on db2 it doesn't help (I'd expect the initial subscription to work at that point). If I then do, on db2:

    drop subscription testsub cascade;

I still see a slot in use on db1:

    select * FROM pg_replication_slots;
       slot_name    |  plugin  | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
    ----------------+----------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
     testsub_sync_a | pgoutput | logical   |  16384 | test     | f      |            |      |         1173 | 0/1566E08   | 0/1566E40
    (1 row)
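A hedged note on cleanup, in case someone else hits this: the leftover sync slot can be removed on the publisher by hand. pg_drop_replication_slot() and pg_replication_slots are existing core facilities; the slot name is just the one from the log above:

    -- on db1 (the publisher); the slot must be inactive:
    SELECT slot_name, active FROM pg_replication_slots;
    SELECT pg_drop_replication_slot('testsub_sync_a');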
Re: [HACKERS] Logical Replication WIP
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> - the CREATE SUBSCRIPTION also tries to check if the specified connection
>   connects back to same db (although that check is somewhat imperfect) and
>   if it gets stuck on create slot it should be normally cancelable (that
>   should solve the issue Steve Singer had)

When I create my subscriber database by doing a physical backup of the publisher cluster (with cp, before I add any data) then I am unable to subscribe. ie.

    initdb ../data
    cp -r ../data ../data2
    ./postgres -D ../data
    ./postgres -D ../data2

This makes sense when I look at your code, but it might not be what we want. I had the same issue when I created my subscriber cluster with pg_basebackup (the timeline on the destination cluster still shows as 1).
Re: [HACKERS] Logical Replication WIP
On 09/08/2016 06:59 PM, Petr Jelinek wrote:
> Hi, Updated version, this should address most of the things in Peter's
> reviews so far, not all though as some of it needs more discussion.

Another bug report. I had subscribed a subscriber database to a publication with one table:

    create table a (a serial4 primary key, b text);

* I then dropped column b on the subscriber
* inserted some rows on the publisher
* noticed the expected error about column b not existing in the subscriber log
* added column c on the subscriber, then added column b after column c

I now get the following stack trace:

    #1  0x007dc8f9 in cstring_to_text (s=0x16f238af0) at varlena.c:152
    #2  0x008046a3 in InputFunctionCall (flinfo=flinfo@entry=0x7fffa02d0250,
        str=str@entry=0x16f238af0, typioparam=typioparam@entry=25,
        typmod=typmod@entry=-1) at fmgr.c:1909
    #3  0x00804971 in OidInputFunctionCall (functionId=<optimized out>,
        str=0x16f238af0, typioparam=25, typmod=-1) at fmgr.c:2040
    #4  0x006aa485 in SlotStoreCStrings (slot=0x2748670, values=0x7fffa02d0330) at apply.c:569
    #5  0x006ab45c in handle_insert (s=0x274d088) at apply.c:756
    #6  0x006abcea in handle_message (s=0x7fffa02d3e20) at apply.c:978
    #7  LogicalRepApplyLoop (last_received=117457680) at apply.c:1146
    #8  0x006ac37e in ApplyWorkerMain (main_arg=<optimized out>) at apply.c:1530

In SlotStoreCStrings, values only has 2 elements but natts is 4.

> Changes:
> - I moved the publication.c to pg_publication.c, subscription.c to pg_subscription.c.
> - changed \drp and \drs to \dRp and \dRs
> - fixed definitions of the catalogs (BKI_ROWTYPE_OID)
> - changed some GetPublication calls to get_publication_name
> - fixed getObjectIdentityParts for OCLASS_PUBLICATION_REL
> - fixed get_object_address_publication_rel
> - fixed the dependencies between pkeys and publications, for this I actually
>   had to add new interface to dependency.c that allows dropping single dependency
> - fixed the 'for all tables' and 'for tables all in schema' publications
> - changed the alter publication from FOR to SET
> - added more test cases for the publication DDL
> - fixed compilation of subscription patch alone and docs
> - changed subpublications to name[]
> - added check for publication list duplicates
> - made the subscriptions behave more like they are inside the database
>   instead of shared catalog (even though the catalog is still shared)
> - added options for CREATE SUBSCRIPTION to optionally not create slot and
>   not do the initial data sync - that should solve the complaint about
>   CREATE SUBSCRIPTION always connecting
> - the CREATE SUBSCRIPTION also tries to check if the specified connection
>   connects back to same db (although that check is somewhat imperfect) and
>   if it gets stuck on create slot it should be normally cancelable (that
>   should solve the issue Steve Singer had)
> - fixed the tests to work in any timezone
> - added DDL regress tests for subscription
> - added proper detection of missing schemas and tables on subscriber
> - rebased on top of 19acee8 as the DefElem changes broke the patch
>
> The table sync is still far from ready.
Re: [HACKERS] Logical Replication WIP
On Tue, 20 Sep 2016, Peter Eisentraut wrote:
> On 9/18/16 4:17 PM, Steve Singer wrote:
>> I think if we want to prevent the creation of subscriptions that point to
>> self, then we need to create a magic token when the postmaster starts and
>> check for that when we connect. So more of a running-instance identifier
>> instead of an instance-on-disk identifier.
>
> The other option is that we just allow it and make it more robust.

I think we should go with the second option for now. I feel that the effort is better spent making sure that initial syncs that don't complete (for whatever reason) can be aborted, instead of trying to build a concept of node identity before we really need it.

Steve
Re: [HACKERS] pgbench regression test failure
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

This causes the pgbench tests to fail (consistently) with

    not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/

When I run pgbench manually (-t 10 --rate=10 --latency-limit=1 -n -r) I get

    number of transactions actually processed: 10/10
    number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting

    number of transactions actually processed: 0/10
    number of transactions skipped: 10 (100.000 %)

    @@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     {
         printf("number of transactions per client: %d\n", nxacts);
         printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
    -            total->cnt - total->skipped, nxacts * nclients);
    +            total->cnt, nxacts * nclients);

I think you want ntx instead of total->cnt here.

The new status of this patch is: Waiting on Author
Re: [HACKERS] Logical Replication WIP
On 10/24/2016 09:22 AM, Petr Jelinek wrote:
> Hi, attached is updated version of the patch. There are quite a few
> improvements and restructuring, I fixed all the bugs and basically
> everything that came up from the reviews and was agreed on. There are
> still a couple of things missing, ie column type definition in protocol
> and some things related to existing data copy.

Here are a few things I've noticed so far.

    +CREATE SUBSCRIPTION mysub WITH CONNECTION dbname=foo host=bar user=repuser PUBLICATION mypub;

The documentation above doesn't match the syntax; CONNECTION needs to be in single quotes. I think you want

    +CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub;

I am not sure if this is a known issue covered by your comments about data copy, but I am still having issues with error reporting on a failed subscription. I created a subscription, dropped the subscription, and created a second one. The second subscription isn't active but shows no errors.

    P: create publication mypub for table public.a;
    S: create subscription mysub with connection 'dbname=test host=localhost port=5440' publication mypub;
    P: insert into a(b) values ('t');
    S: select * FROM a;
     a | b
    ---+---
     1 | t
    (1 row)

Everything is good. Then I do

    S: drop subscription mysub;
    S: create subscription mysub2 with connection 'dbname=test host=localhost port=5440' publication mypub;
    P: insert into a(b) values ('f');
    S: select * FROM a;
     a | b
    ---+---
     1 | t

The data doesn't replicate.

    select * FROM pg_stat_subscription;
     subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time
    -------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
     16398 | mysub2  |     |       |              |                    |                       |                |
    (1 row)

The only thing in my log is

    2016-10-30 15:27:27.038 EDT [6028] NOTICE: dropped replication slot "mysub" on publisher
    2016-10-30 15:27:36.072 EDT [6028] NOTICE: created replication slot "mysub2" on publisher
    2016-10-30 15:27:36.082 EDT [6028] NOTICE: synchronized table states

I'd expect an error in the log or something. However, if I delete everything from the table on the subscriber then the subscription proceeds.

I think there are still problems with signal handling in the initial sync. If I try to drop mysub2 (while the subscription is stuck, instead of deleting the data) the drop hangs. If I then try to kill the postmaster for the subscriber nothing happens; I have to send it a -9 to make it go away. However, once I do that and then restart the postmaster for the subscriber, I start to see the duplicate key errors in the log:

    2016-10-30 16:00:54.635 EDT [7018] ERROR: duplicate key value violates unique constraint "a_pkey"
    2016-10-30 16:00:54.635 EDT [7018] DETAIL: Key (a)=(1) already exists.
    2016-10-30 16:00:54.635 EDT [7018] CONTEXT: COPY a, line 1
    2016-10-30 16:00:54.637 EDT [7007] LOG: worker process: logical replication worker 16400 sync 16387 (PID 7018) exited with exit code 1

I'm not sure why I didn't get those until I restarted the postmaster, but it seems to happen whenever I drop a subscription and then create a new one.
Creating the second subscription from the same psql session as I create/drop the first seems important in reproducing this.

I am also having issues dropping a second subscription from the same psql session (table a is empty on both nodes to avoid duplicate key errors):

    S: create subscription sub1 with connection 'host=localhost dbname=test port=5440' publication mypub;
    S: create subscription sub2 with connection 'host=localhost dbname=test port=5440' publication mypub;
    S: drop subscription sub1;
    S: drop subscription sub2;

At this point the drop subscription hangs.

> The biggest changes are: I added one more prerequisite patch (the first
> one) which adds ephemeral slots (or well, implements UI on top of the code
> that was mostly already there). The ephemeral slots are different in that
> they go away either on error or when the session is closed. This means the
> initial data sync does not have to worry about cleaning up the slots after
> itself. I think this will be useful in other places as well (for example
> basebackup). I originally wanted to call them temporary slots in the UI
> but since the behavior is a bit different from temp tables I decided to go
> with what the underlying code calls them in the UI as well.
>
> I also split out the libpqwalreceiver rewrite to a separate patch which
> does just the re-architecture and does not really add new functionality.
> And I did the re-architecture bit differently based on the review. There
> is now a new executor module in execReplication.c, no new nodes but
> several utility commands. I moved there the tuple lookup functions from
> apply and also wrote new interfaces for doing inserts/updates/deletes to a
Re: [HACKERS] Logical Replication WIP
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list but this one is not known. I'll try to convert this to a
> test case (seems like a useful one) and fix it, thanks for the report. In
> the meantime I realized I broke the last patch in the series during rebase
> so attached is the fixed version. It also contains the type info in the
> protocol.

I don't know if this is covered by the known initial-sync problems or not, but if I have an 'all tables' publication and then create a new table, the data doesn't seem to replicate to the new table.

    P: create table a(a serial4 primary key, b text);
    S: create table a(a serial4 primary key, b text);
    P: create publication mypub for all tables;
    S: create subscription mysub connection 'host=localhost dbname=test port=5441' publication mypub;
    P: create table b(a serial4 primary key, b text);
    P: insert into b(b) values ('foo2');
    P: insert into a(b) values ('foo3');

Then I check my subscriber:

    select * FROM a;
     a |  b
    ---+------
     1 | foo
     2 | foo3
    (2 rows)

    test=# select * FROM b;
     a | b
    ---+---
    (0 rows)

However, if the table isn't on the subscriber I do get an error, ie.

    P: create table c(a serial4 primary key, b text);
    P: insert into c(b) values('foo');

    2016-11-05 11:49:31.456 EDT [14938] FATAL: the logical replication target public.c not found
    2016-11-05 11:49:31.457 EDT [13703] LOG: worker process: logical replication worker 16457 (PID 14938) exited with exit code 1

but if I then add the table

    S: create table c(a serial4 primary key, b text);

    2016-11-05 11:51:08.583 EDT [15014] LOG: logical replication apply for subscription mysub started

the data doesn't replicate to table c either.
Re: [HACKERS] Logical Replication WIP
On 10/31/2016 06:38 AM, Petr Jelinek wrote:
> On 31/10/16 00:52, Steve Singer wrote:
> There are some fundamental issues with initial sync that need to be
> discussed on list but this one is not known. I'll try to convert this to a
> test case (seems like a useful one) and fix it, thanks for the report. In
> the meantime I realized I broke the last patch in the series during rebase
> so attached is the fixed version. It also contains the type info in the
> protocol.

Attached are some proposed documentation updates (to be applied on top of your 20161031 patch set).

Also, in Publication:

    +
    +    The tables are matched using fully qualified table name. Renaming of
    +    tables or schemas is not supported.
    +

Is renaming of tables any less supported than other DDL operations? For example:

    alter table nokey2 rename to nokey3;
    select * FROM pg_publication_tables;
     pubname | schemaname | tablename
    ---------+------------+-----------
     tpub    | public     | nokey3
    (1 row)

If I then kill the postmaster on my subscriber and restart it, I get

    2016-11-13 16:17:11.341 EST [29488] FATAL: the logical replication target public.nokey3 not found
    2016-11-13 16:17:11.342 EST [29272] LOG: worker process: logical replication worker 41076 (PID 29488) exited with exit code 1
    2016-11-13 16:17:16.350 EST [29496] LOG: logical replication apply for subscription nokeysub started
    2016-11-13 16:17:16.358 EST [29498] LOG: logical replication sync for subscription nokeysub, table nokey2 started
    2016-11-13 16:17:16.515 EST [29498] ERROR: table public.nokey2 not found on publisher
    2016-11-13 16:17:16.517 EST [29272] LOG: worker process: logical replication worker 41076 sync 24688 (PID 29498) exited with exit code 1

but if I then rename the table on the subscriber everything seems to work. (I suspect the need to kill+restart is a bug; I've seen other instances where a hard restart of the subscriber following changes is required.)

I am also having issues adding a table to a publication; it doesn't seem to work:

    P: create publication tpub for table a;
    S: create subscription mysub connection 'host=localhost dbname=test port=5440' publication tpub;
    P: insert into a(b) values ('1');
    P: alter publication tpub add table b;
    P: insert into b(b) values ('1');
    P: insert into a(b) values ('2');

    select * FROM pg_publication_tables;
     pubname | schemaname | tablename
    ---------+------------+-----------
     tpub    | public     | a
     tpub    | public     | b

but

    S: select * FROM b;
     a | b
    ---+---
    (0 rows)

    S: select * FROM a;
     a | b
    ---+---
     5 | 1
     6 | 2
    (2 rows)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
new file mode 100644
index 63af1a5..89723c3
*** a/doc/src/sgml/logical-replication.sgml
--- b/doc/src/sgml/logical-replication.sgml
***************
*** 95,110 ****
    how the table is accessed. Each table can be added to multiple
    publications if needed. Publications may currently only contain
    tables. Objects must be added explicitly, except when a publication
!   is created for ALL TABLES. There is no default name for
!   a publication which specifies all tables.

    Publications can choose to limit the changes they produce to show
    any combination of INSERT, UPDATE and
    DELETE in a similar way to the way triggers are fired by
!   particular event types. Only tables with a REPLICA IDENTITY
!   index can be added to a publication which replicates UPDATE
!   and DELETE operation.

    The definition of a publication object will be included within
--- 95,110 ----
    how the table is accessed. Each table can be added to multiple
    publications if needed. Publications may currently only contain
    tables. Objects must be added explicitly, except when a publication
!   is created for ALL TABLES.

    Publications can choose to limit the changes they produce to show
    any combination of INSERT, UPDATE and
    DELETE in a similar way to the way triggers are fired by
!   particular event types. If a table without a REPLICA IDENTITY
!   index is added to a publication which replicates UPDATE or
!   DELETE operations then subsequent UPDATE
!   or DELETE operations will fail on the publisher.

    The definition of a publication object will be included within
***************
*** 195,214 ****
    A conflict will produce an error and will stop the replication; it
!   must be resolved manually by the user.
!   The resolution can be done either by changing data on the subscriber
!   so that it does not conflict with incoming change or by skipping the
!   transaction that conflicts with the existing data. The transaction
!   can be skipped by calling the
!   pg_replication_origin_advance() function
!   with a node_name corresponding to the subscription name. The
!   current position of origins can be seen in the
!   pg_replication_origin_status system view.
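As a hedged aside on the conflict-skipping procedure described in the hunk above: pg_replication_origin_advance() and pg_replication_origin_status already exist in core; the origin/subscription name and LSN below are hypothetical, and the apply worker should not be running while the origin is advanced:

    -- find the origin's current position:
    SELECT * FROM pg_replication_origin_status;
    -- advance it past the conflicting transaction to skip it:
    SELECT pg_replication_origin_advance('mysub', '0/1566E40'::pg_lsn);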
Re: [HACKERS] Logical Replication WIP
On Sun, 20 Nov 2016, Petr Jelinek wrote:
> On 13/11/16 23:02, Steve Singer wrote:
> There is one exception though:
>
>     *** 195,214 ****
>         A conflict will produce an error and will stop the replication; it
>     !   must be resolved manually by the user.
>     !   The resolution can be done either by changing data on the subscriber
>     !   so that it does not conflict with incoming change or by skipping the
>     !   transaction that conflicts with the existing data. The transaction
>     !   can be skipped by calling the
>     !   pg_replication_origin_advance() function
>     !   with a node_name corresponding to the subscription name. The
>     !   current position of origins can be seen in the
>     !   pg_replication_origin_status system view.
>
> I don't see why this needs to be removed? Maybe it could be improved but
> certainly not removed?

Sorry, I was confused. I noticed that the function was missing in the patch and thought it was documentation for a function that you had removed from recent versions of the patch, versus referencing a function that is already committed.
[HACKERS] 9.4 logical replication - walsender keepalive replies
In 9.4 we've added the below block of code to walsender.c:

    /*
     * We only send regular messages to the client for full decoded
     * transactions, but a synchronous replication and walsender shutdown
     * possibly are waiting for a later location. So we send pings
     * containing the flush location every now and then.
     */
    if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
    {
        WalSndKeepalive(true);
        waiting_for_ping_response = true;
    }

I am finding that my logical replication reader is spending a tremendous amount of time sending feedback to the server because a keepalive reply was requested. My flush pointer is smaller than sentPtr, which I see as the normal case (the client hasn't confirmed all the WAL it has been sent). My client queues the records it receives and only confirms when it actually processes the record. So the sequence looks something like:

    Server sends LSN 0/1000
    Server sends LSN 0/2000
    Server sends LSN 0/3000
    Client confirms LSN 0/2000

I don't see why all these keepalive replies are needed in this case (the timeout value is bumped way up; it's the above block that is triggering the reply request, not something related to timeout).

Steve
[HACKERS] 9.4 pg_control corruption
I've encountered a corrupt pg_control file on my 9.4 development cluster. I've mostly been using the cluster for changeset extraction / slony testing. This is a 9.4 (currently commit 6ad903d70a440e + a walsender change discussed in another thread) but would have had the initdb done with an earlier 9.4 snapshot.

    /usr/local/pgsql94wal/bin$ ./pg_controldata ../data
    WARNING: Calculated CRC checksum does not match value stored in file.
    Either the file is corrupt, or it has a different layout than this program
    is expecting. The results below are untrustworthy.

    pg_control version number:            937
    Catalog version number:               201405111
    Database system identifier:           6014096177254975326
    Database cluster state:               in production
    pg_control last modified:             Tue 08 Jul 2014 06:15:58 PM EDT
    Latest checkpoint location:           5/44DC5FC8
    Prior checkpoint location:            5/44C58B88
    Latest checkpoint's REDO location:    5/44DC5FC8
    Latest checkpoint's REDO WAL file:    000100050044
    Latest checkpoint's TimeLineID:       1
    Latest checkpoint's PrevTimeLineID:   1
    Latest checkpoint's full_page_writes: on
    Latest checkpoint's NextXID:          0/1558590
    Latest checkpoint's NextOID:          505898
    Latest checkpoint's NextMultiXactId:  3285
    Latest checkpoint's NextMultiOffset:  6569
    Latest checkpoint's oldestXID:        1281
    Latest checkpoint's oldestXID's DB:   1
    Latest checkpoint's oldestActiveXID:  0
    Latest checkpoint's oldestMultiXid:   1
    Latest checkpoint's oldestMulti's DB: 1
    Time of latest checkpoint:            Tue 08 Jul 2014 06:15:23 PM EDT
    Fake LSN counter for unlogged rels:   0/1
    Minimum recovery ending location:     0/0
    Min recovery ending loc's timeline:   0
    Backup start location:                0/0
    Backup end location:                  0/0
    End-of-backup record required:        no
    Current wal_level setting:            logical
    Current wal_log_hints setting:        off
    Current max_connections setting:      200
    Current max_worker_processes setting: 8
    Current max_prepared_xacts setting:   0
    Current max_locks_per_xact setting:   64
    Maximum data alignment:               8
    Database block size:                  8192
    Blocks per segment of large relation: 131072
    WAL block size:                       8192
    Bytes per WAL segment:                16777216
    Maximum length of identifiers:        64
    Maximum columns in an index:          32
    Maximum size of a TOAST chunk:        1996
    Size of a large-object chunk:         65793
    Date/time type storage:               floating-point numbers
    Float4 argument passing:              by reference
    Float8 argument passing:              by reference
    Data page checksum version:           2602751502
    ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

Before this, postgres crashed and seemed to have problems recovering. I might have hit Ctrl-C but I didn't do anything drastic like issue a kill -9.

    test1 [unknown] 2014-07-08 18:15:18.986 EDT FATAL: the database system is in recovery mode
    test1 [unknown] 2014-07-08 18:15:20.482 EDT WARNING: terminating connection because of crash of another server process
    test1 [unknown] 2014-07-08 18:15:20.482 EDT DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    test1 [unknown] 2014-07-08 18:15:20.482 EDT HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2014-07-08 18:15:20.483 EDT LOG: all server processes terminated; reinitializing
    2014-07-08 18:15:20.720 EDT LOG: database system was interrupted; last known up at 2014-07-08 18:15:15 EDT
    2014-07-08 18:15:20.865 EDT LOG: database system was not properly shut down; automatic recovery in progress
    2014-07-08 18:15:20.954 EDT LOG: redo starts at 5/41023848
    2014-07-08 18:15:23.153 EDT LOG: unexpected pageaddr 4/D8DC6000 in log segment 000100050044, offset 14442496
    2014-07-08 18:15:23.153 EDT LOG: redo done at 5/44DC5F60
    2014-07-08 18:15:23.153 EDT LOG: last completed transaction was at log time 2014-07-08 18:15:17.874937-04
    test2 [unknown] 2014-07-08 18:15:24.247 EDT FATAL: the database system is in recovery mode
    test2 [unknown] 2014-07-08 18:15:24.772 EDT FATAL: the database system is in recovery mode
    test2 [unknown] 2014-07-08 18:15:25.281 EDT FATAL: the database system is in recovery mode
    test1 [unknown] 2014-07-08 18:15:25.547 EDT FATAL: the database system is in recovery mode
    test2 [unknown] 2014-07-08 18:15:25.548 EDT FATAL: the database system is in recovery mode
    test3 [unknown] 2014-07-08 18:15:25.549 EDT FATAL: the database system is in recovery mode
    test4 [unknown] 2014-07-08 18:15:25.557 EDT FATAL: the database system is in recovery mode
    test5 [unknown] 2014-07-08 18:15:25.582 EDT FATAL: the database system is in recovery mode
    test2 [unknown] 2014-07-08 18:15:25.584 EDT FATAL: the database system is in recovery mode
Re: [HACKERS] 9.4 pg_control corruption
On 07/08/2014 10:14 PM, Tom Lane wrote:
> Steve Singer writes:
>> I've encountered a corrupt pg_control file on my 9.4 development cluster.
>> I've mostly been using the cluster for changeset extraction / slony
>> testing. This is a 9.4 (currently commit 6ad903d70a440e + a walsender
>> change discussed in another thread) but would have had the initdb done
>> with an earlier 9.4 snapshot.
>
> Somehow or other you missed the update to pg_control version number 942.
> There's no obvious reason to think that this pg_control file is corrupt on
> its own terms, but the pg_controldata version you're using expects the 942
> layout. The fact that the server wasn't complaining about this suggests
> that you've not recompiled the server, or at least not xlog.c. Possibly
> the odd failure to restart indicates that you have a partially updated
> server executable?

The server is complaining about that; it started to after the crash (which is why I ran pg_controldata):

    ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$ ./postgres -D ../data
    2014-07-08 22:28:57.796 EDT FATAL: database files are incompatible with server
    2014-07-08 22:28:57.796 EDT DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 937, but the server was compiled with PG_CONTROL_VERSION 942.
    2014-07-08 22:28:57.796 EDT HINT: It looks like you need to initdb.
    ssinger@ssinger-laptop:/usr/local/pgsql94wal/bin$

The server seemed fine (and it was 9.4, because I was using 9.4 features). The server crashed. The server performed crash recovery. Then the server wouldn't start, and pg_controldata shows the attached output. I wasn't recompiling or reinstalling around this time either.
Re: [HACKERS] 9.4 logical replication - walsender keepalive replies
On 07/06/2014 10:11 AM, Andres Freund wrote:
> Hi Steve,
>
> Right. I thought about this for a while, and I think we should change two
> things. For one, don't request replies here. It's simply not needed, as
> this isn't dealing with timeouts. For another, don't just check
> ->flush < sentPtr but also && ->write < sentPtr. The reason we're sending
> these feedback messages is to inform the 'logical standby' that there's
> been WAL activity which it can't see because they don't correspond to
> anything that's logically decoded (e.g. vacuum stuff). Would that suit
> your needs?
>
> Greetings,
> Andres Freund

Yes, I think that will work for me. I tested with the attached patch, which I think does what you describe, and it seems okay.

    diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
    index 3189793..844a5de
    *** a/src/backend/replication/walsender.c
    --- b/src/backend/replication/walsender.c
    *** WalSndWaitForWal(XLogRecPtr loc)
    *** 1203,1211 ****
           * possibly are waiting for a later location. So we send pings
           * containing the flush location every now and then.
           */
    !     if (MyWalSnd->flush < sentPtr && !waiting_for_ping_response)
          {
    !         WalSndKeepalive(true);
              waiting_for_ping_response = true;
          }
    --- 1203,1213 ----
           * possibly are waiting for a later location. So we send pings
           * containing the flush location every now and then.
           */
    !     if (MyWalSnd->flush < sentPtr &&
    !         MyWalSnd->write < sentPtr &&
    !         !waiting_for_ping_response)
          {
    !         WalSndKeepalive(false);
              waiting_for_ping_response = true;
          }
Re: [HACKERS] 9.4 logical replication - walsender keepalive replies
On 07/14/2014 01:19 PM, Steve Singer wrote:
> On 07/06/2014 10:11 AM, Andres Freund wrote:
>> Hi Steve,
>> Right. I thought about this for a while, and I think we should change two
>> things. For one, don't request replies here. It's simply not needed, as
>> this isn't dealing with timeouts. For another, don't just check
>> ->flush < sentPtr but also && ->write < sentPtr. The reason we're sending
>> these feedback messages is to inform the 'logical standby' that there's
>> been WAL activity which it can't see because they don't correspond to
>> anything that's logically decoded (e.g. vacuum stuff). Would that suit
>> your needs?
>
> Yes, I think that will work for me. I tested with the attached patch that
> I think does what you describe and it seems okay.

Any feedback on this? Do we want that change for 9.4, or do we want something else?
[HACKERS] 9.4 logical decoding assertion
I hit the following on 9.4 while testing logical decoding:

    TRAP: FailedAssertion("!(prev_first_lsn < cur_txn->first_lsn)", File: "reorderbuffer.c", Line: 618)
    LOG: server process (PID 3801) was terminated by signal 6: Aborted

Unfortunately I don't have a core file and I haven't been able to reproduce this.
Re: [HACKERS] 9.4 logical decoding assertion
On 08/15/2014 09:42 AM, Andres Freund wrote:
> On 2014-08-14 16:03:08 -0400, Steve Singer wrote:
>> I hit the following on 9.4 testing logical decoding.
>> TRAP: FailedAssertion("!(prev_first_lsn < cur_txn->first_lsn)", File:
>> "reorderbuffer.c", Line: 618)
>> LOG: server process (PID 3801) was terminated by signal 6: Aborted
>
> I saw that recently while hacking around, but I thought it was because of
> stuff I'd added. But apparently not. Hm. I think I see how that might
> happen. It might be possible (and harmless) if two subxacts of the same
> toplevel xact have the same first_lsn. But if it's not just <= vs < it'd
> be worse.
>
>> Unfortunately I don't have a core file and I haven't been able to
>> reproduce this.
>
> Any information about the workload? Any chance you still have the data
> directory around?

I was running the slony regression tests, but I ran the same test script a number of times afterwards and the problem didn't reproduce itself. The last thing the tests did before the crash was part of the slony failover process. I am doing my testing with all 5 nodes/databases under the same postmaster (giving something like 20 replication slots open). A few milliseconds before the crash, one of the connections had just done a

    START_REPLICATION SLOT "slon_4_2" LOGICAL 0/32721A58

and then that connection reported the socket being closed, but because so much was going on concurrently I can't say for sure if that connection experienced the assert or was closed because another backend asserted. I haven't done an initdb since, so I still have the data directory, but I've dropped and recreated all of my slots many times since, so the WAL files are long gone.
Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?
On 04/10/2017 01:28 PM, Fujii Masao wrote:
> Hi,
>
> src/backend/replication/logical/launcher.c
>
>     * Worker started and attached to our shmem. This check is safe
>     * because only launcher ever starts the workers, so nobody can steal
>     * the worker slot.
>
> The tablesync patch enabled even a worker to start another worker. So the
> above assumption is not valid now. This issue seems to cause the corner
> case where the launcher picks up the same worker slot that a
> previously-started worker has already picked up to start another worker.

I'm not sure if what I am seeing is related to this or another issue. If I create a subscription for a publication that doesn't exist (yet), I see 'publication "mypub" does not exist' in the subscriber's log. If I then create the publication on the publisher, the error doesn't go away; the worker process keeps spinning with the same error. If I then drop the subscription and recreate it, it works.
[HACKERS] 10 beta docs: different replication solutions
We don't seem to describe logical replication on https://www.postgresql.org/docs/10/static/different-replication-solutions.html

The attached patch adds a section.

Steve

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..1329d1f 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,26 @@ protocol to make nodes agree on a serializable transactional order.
+
+   Logical Replication
+
+   Logical replication allows a database server to send a stream of
+   data modifications to another server.
+   PostgreSQL logical replication constructs
+   a stream of logical data modifications from the WAL.
+
+   Logical replication allows the data changes from individual tables
+   to be replicated. Logical replication doesn't require a particular server
+   to be designated as a master or a slave but allows data to flow in multiple
+   directions. For more information on logical replication, see .
+
@@ -290,6 +310,7 @@ protocol to make nodes agree on a serializable transactional order.
   Shared Disk Failover
   File System Replication
   Write-Ahead Log Shipping
+  Logical Replication
   Trigger-Based Master-Standby Replication
   Statement-Based Replication Middleware
   Asynchronous Multimaster Replication
@@ -304,6 +325,7 @@ protocol to make nodes agree on a serializable transactional order.
   NAS
   DRBD
   Streaming Repl.
+  Logical Repl.
   Slony
   pgpool-II
   Bucardo
@@ -315,6 +337,7 @@ protocol to make nodes agree on a serializable transactional order.
   shared disk
   disk blocks
   WAL
+  Logical decoding
   table rows
   SQL
   table rows
@@ -330,6 +353,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
@@ -337,6 +361,7 @@ protocol to make nodes agree on a serializable transactional order.
+  •
   •
   •
@@ -349,6 +374,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
+  •
@@ -360,6 +386,7 @@ protocol to make nodes agree on a serializable transactional order.
   with sync off
   •
+  •
   •
@@ -371,6 +398,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   with sync on
+  •
   •
@@ -385,6 +413,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
@@ -393,6 +422,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
+  •
   •
   •
@@ -403,6 +433,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
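As a hedged companion to the new section: a minimal PostgreSQL 10 logical replication setup. The host, database, and table names are hypothetical; CREATE PUBLICATION and CREATE SUBSCRIPTION are the commands the new docs describe.

    -- on the publisher:
    CREATE TABLE public.orders (id int PRIMARY KEY, total numeric);
    CREATE PUBLICATION mypub FOR TABLE public.orders;

    -- on the subscriber (the table definition must already exist there):
    CREATE TABLE public.orders (id int PRIMARY KEY, total numeric);
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=prod'
        PUBLICATION mypub;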
Re: [HACKERS] 10 beta docs: different replication solutions
On Mon, 31 Jul 2017, Merlin Moncure wrote:
> On Sun, Jul 30, 2017 at 8:34 PM, Steve Singer wrote:
>> We don't seem to describe logical replication on
>> https://www.postgresql.org/docs/10/static/different-replication-solutions.html
>> The attached patch adds a section.
>
> This is a good catch. Two quick observations:
>
> 1) Super pedantic point. I don't like the 'repl.' abbreviation in the
> 'most common implementation' both for the existing hs/sr and for the newly
> added logical.

Updated.

> 2) This lingo:
>
> +   Logical replication allows the data changes from individual tables
> +   to be replicated. Logical replication doesn't require a particular server
> +   to be designated as a master or a slave but allows data to flow in multiple
> +   directions. For more information on logical replication, see .
>
> is good, but I would revise it just a bit to emphasize the subscription
> nature of logical replication to link the concepts expressed strongly in
> the main section. For example:
>
>   Logical replication allows the data changes [remove: "from individual
>   tables to be replicated"] to be published to subscriber nodes. Data can
>   flow in any direction between nodes on a per-table basis; there is no
>   concept of a master server. Conflict resolution must be handled
>   completely by the application. For more information on...
>
> What do you think?

Sounds good. I've incorporated these changes into an updated patch, but I changed the language around conflict resolution. Conflict resolution could be handled by triggers or by adding extra columns to the primary key; that wouldn't be 'completely by the application'.

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 138bdf2..19b26f8 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -158,6 +158,27 @@ protocol to make nodes agree on a serializable transactional order.
+
+   Logical Replication
+
+   Logical replication allows a database server to send a stream of
+   data modifications to another server.
+   PostgreSQL logical replication constructs
+   a stream of logical data modifications from the WAL.
+
+   Logical replication allows the data changes to be published to subscriber
+   nodes. Data can flow in any direction between nodes on a per-table basis;
+   there is no concept of a master server. PostgreSQL
+   does not include support for conflict resolution.
+   For more information on logical replication, see .
+
@@ -290,6 +311,7 @@ protocol to make nodes agree on a serializable transactional order.
   Shared Disk Failover
   File System Replication
   Write-Ahead Log Shipping
+  Logical Replication
   Trigger-Based Master-Standby Replication
   Statement-Based Replication Middleware
   Asynchronous Multimaster Replication
@@ -303,7 +325,8 @@ protocol to make nodes agree on a serializable transactional order.
   Most Common Implementation
   NAS
   DRBD
-  Streaming Repl.
+  Streaming Replication
+  Logical Replication
   Slony
   pgpool-II
   Bucardo
@@ -315,6 +338,7 @@ protocol to make nodes agree on a serializable transactional order.
   shared disk
   disk blocks
   WAL
+  Logical decoding
   table rows
   SQL
   table rows
@@ -330,6 +354,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
@@ -337,6 +362,7 @@ protocol to make nodes agree on a serializable transactional order.
+  •
   •
   •
@@ -349,6 +375,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
+  •
@@ -360,6 +387,7 @@ protocol to make nodes agree on a serializable transactional order.
   with sync off
   •
+  •
   •
@@ -371,6 +399,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   with sync on
+  •
   •
@@ -385,6 +414,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
@@ -393,6 +423,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
+  •
   •
   •
@@ -403,6 +434,7 @@ protocol to make nodes agree on a serializable transactional order.
   •
   •
   •
+  •
Re: [HACKERS] snapshot too old, configured by time
On 08/31/2015 10:07 AM, Kevin Grittner wrote:

Kevin, I've started to do a review of this patch but I am a bit confused by some of what I am seeing. The attached testcase fails: I replaced the cursor in your test case with direct selects from the table. I would have expected this to generate the snapshot too old error as well, but it doesn't.

    # Failed test 'expect "snapshot too old" error'
    # at t/002_snapshot_too_old_select.pl line 64.
    #          got: ''
    #     expected: '72000'
    # Looks like you failed 1 test of 9.
    Dubious, test returned 1 (wstat 256, 0x100)
    Failed 1/9 subtests

Am I misunderstanding something, or is the patch not working as expected?

> As discussed when the "proof of concept" patch was submitted during 9.5
> development, here is a version intended to be considered for commit to
> 9.6, with the following changes:
>
> 1. It is configured using time rather than number of transactions. Not
> only was there unanimous agreement here that this was better, but the EDB
> customer who had requested this feature and who had been testing it
> independently made the same request.
>
> 2. The "proof of concept" patch only supported heap and btree checking;
> this supports all index types.
>
> 3. Documentation has been added.
>
> 4. Tests have been added. They are currently somewhat minimal, since this
> is using a whole new technique for testing from any existing committed
> tests -- I wanted to make sure that this approach to testing was OK with
> everyone before expanding it. If it is, I assume we will want to move some
> of the more generic portions to a .pm file to make it available for other
> tests.
>
> Basically, this patch aims to limit bloat when there are snapshots that
> are kept registered for prolonged periods. The immediate reason for this
> is a customer application that keeps read-only cursors against fairly
> static data open for prolonged periods, and automatically fields SQLSTATE
> 72000 to re-open them if necessary. When used, it should also provide some
> protections against extreme bloat from forgotten "idle in transaction"
> connections which are left holding a snapshot. Once a snapshot reaches the
> age threshold, it can be terminated if it reads data modified after the
> snapshot was built. It is expected that useful ranges will normally be
> somewhere from a few hours to a few days. By default old_snapshot_threshold
> is set to -1, which disables the new behavior.
>
> The customer has been testing a preliminary version of this time-based
> patch for several weeks, and is happy with the results -- it is preventing
> bloat for them and not generating "snapshot too old" errors at unexpected
> times.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

(attachment: 002_snapshot_too_old_select.pl)
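For anyone following along, a minimal sketch of the behaviour being tested; the table name is hypothetical, and old_snapshot_threshold is the GUC the patch adds (it can only be set at server start):

    -- in postgresql.conf (requires a restart):
    --     old_snapshot_threshold = '1min'

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM t;   -- builds the snapshot
    -- ... wait past the threshold while another session updates and vacuums t ...
    SELECT count(*) FROM t;   -- may now fail with SQLSTATE 72000, "snapshot too old"
    COMMIT;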
Re: [HACKERS] record identical operator
On 09/18/2013 11:39 AM, Stephen Frost wrote: * Kevin Grittner (kgri...@ymail.com) wrote: = and <> aren't listed above even though they do a byte-for-byte comparison because, well, I guess because we have chosen to treat two UTF8 strings which produce the same set of glyphs using different bytes as unequal. :-/ I tend to side with Andres on this case actually- we're being asked to store specific UTF8 bytes by the end user. That is not the same as treating two different numerics which are the same *number* as different because they have different binary representations, which is entirely an internal-to-postgres consideration. The problem is that there are datatypes (citext, postgis, ...) that have defined = to return true when comparing two values that are different, not just stored differently. Are you saying that matviews should update only when the = operator of the datatype returns false, and that if people don't like this behaviour they should fix the datatypes? Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
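To make the citext case concrete, a sketch using the record-image operator spelling that was eventually committed (*=), rather than the === spelling from the v1 patch:

    CREATE EXTENSION citext;
    SELECT 'PostgreSQL'::citext = 'postgresql'::citext;             -- true: citext compares case-insensitively
    SELECT ROW('PostgreSQL'::citext) *= ROW('postgresql'::citext);  -- false: the stored bytes differ

The two values are "equal" by the datatype's own definition yet visibly different, which is the case a matview refresh has to care about.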
Re: [HACKERS] record identical operator - Review
On 09/12/2013 06:27 PM, Kevin Grittner wrote: Attached is a patch for a bit of infrastructure I believe to be necessary for correct behavior of REFRESH MATERIALIZED VIEW CONCURRENTLY as well as incremental maintenance of matviews. Here is an attempt at a review of the v1 patch. There has been a heated discussion on list about whether we even want this type of operator. I will try to summarize it here.

The arguments against it are:
* a record_image_identical call on two records that returns false can still return true under the equality operator, and the records might (or might not) appear to be the same to users
* once we expose this as an operator via SQL someone will find it and use it, and then complain when we change things such as the internal representation of a datatype, which might break their queries
* the difference between = and record_image_identical might not be understood by everyone who finds the operator, leading to confusion
* using a criterion other than equality (the = operator provided by the datatype) might cause problems if we later provide 'on change' triggers for materialized views

The arguments for this patch are:
* we want the materialized view to return the same value as would be returned if the query were executed directly. This not only means that the values should be the same according to a datatype's = operator but that they should appear the same 'to the eyeball'.
* supporting the materialized view refresh check via SQL makes a lot of sense, but doing that requires exposing something via SQL
* if we adequately document what we mean by record_image_identical and the operator we pick for this, then users shouldn't be surprised at what they get if they use this

I think there is agreement that better (as in more obscure) operators than === and !== need to be picked, and we need to find a place in the user docs to warn users of the behaviour of these operators. Hannu has proposed

*==* "binary equal, surely very equal by any other definition as well"
!==? "maybe not equal" -- "binary inequal, may still be equal by other comparison methods"

and no one has yet objected to this. My vote would be to update the patch to use those operator names and then figure out where we can document them. If it means we have to add a section describing the equality operator for records as well, then maybe we should do that so we can get on with things.

Code Review
The record_image_eq and record_image_cmp functions are VERY similar. It seems to me that you could move the meat of these functions into a common function (that isn't exposed via SQL) that then behaves differently in 2 or 3 spots based on a parameter that controls whether it detoasts the tuple for the compare or returns false on the equals. Beyond that I don't see any issues with the code. You call out a question about whether two records being compared that have a different number of columns should always be an error, or only an error when they match on the columns they do have in common. The current behaviour is

select * FROM r1,r2 where r1<==r2;
 a | b | a | b | c
---+---+---+---+---
 1 | 1 | 1 | 2 | 1
(1 row)

update r2 set b=1;
UPDATE 1
test=# select * FROM r1,r2 where r2<==r1;
ERROR: cannot compare record types with different numbers of columns

This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes, based on the values of the data in the column.
If I'm a user testing code that compares two records with this operator, and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/20/2013 08:38 AM, Kevin Grittner wrote: Did you look at the record_eq and record_cmp functions which exist before this patch? Is there a reason we should do it one way for the default operators and another way for this alternative? Do you think we should change record_eq and record_cmp to do things the way you suggest? I think the record_eq and record_cmp functions would be better if they shared code as well, but I don't think changing that should be part of this patch; you aren't otherwise touching those functions. I know some people dislike code that is switch based and prefer duplication; my preference is to avoid duplication. This seems like a bad idea to me. I don't like that I get a type comparison error only sometimes based on the values of the data in the column. If I'm a user testing code that compares two records with this operator and I test my query with one value pair and it works, then I'd naively expect to get a true or false on all other value sets of the same record type. Again, this is a result of following the precedent set by the existing record comparison operators.

test=# create table r1 (c1 int, c2 int);
CREATE TABLE
test=# create table r2 (c1 int, c2 int, c3 int);
CREATE TABLE
test=# insert into r1 values (1, 2);
INSERT 0 1
test=# insert into r2 values (3, 4, 5);
INSERT 0 1
test=# select * from r1 join r2 on r1 < r2;
 c1 | c2 | c1 | c2 | c3
----+----+----+----+----
  1 |  2 |  3 |  4 |  5
(1 row)

test=# update r1 set c1 = 3, c2 = 4;
UPDATE 1
test=# select * from r1 join r2 on r1 < r2;
ERROR: cannot compare record types with different numbers of columns

Would you be in favor of forcing a change to the record comparison operators which have been in use for years? If not, is there a good reason to have them behave differently in this regard? -- I hadn't picked up on that you copied that part of the behaviour from the existing comparison operators. No, we would need a really good reason for changing the behaviour of the comparison operators, and I don't think we have that. I agree that the binary identical operators should behave similarly to the existing comparison operators and error out on the column number mismatch after comparing the columns that are present in both. Steve Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On 09/20/2013 06:33 AM, Andres Freund wrote: Hi, The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file removal? What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/WAL file removal if the standby is gone?
* How does the user interface look to remove a slot if a standby is gone?
* How do we decide/control which commands use a slot in which cases?

I think we are going to want to be flexible enough to support users with a couple of different use-cases.
* Some people will want to keep xmin pegged and prevent WAL removal so a standby with a slot can always catch up.
* Most people will want to say keep X megabytes of WAL (if needed by a behind slot) and keep xmin pegged so that the WAL can be consumed by a logical plugin.

I can see us also implementing a restore_command that the walsender could use to get archived segments, but for logical replication xmin would still need to be low enough. I don't think the current patch set is incompatible with us later implementing any of the above. I'd rather see us focus on getting the core functionality committed and worry about a good interface for managing slots later. Greetings, Andres Freund Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
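For readers coming to this later: slot management did eventually get a SQL-level interface in 9.4. A sketch of that interface (not what the patch set under review exposed at the time; the slot name is illustrative):

    SELECT * FROM pg_create_logical_replication_slot('myslot', 'test_decoding');
    SELECT slot_name, plugin, restart_lsn, catalog_xmin FROM pg_replication_slots;
    SELECT pg_drop_replication_slot('myslot');  -- releases the pegged xmin and retained WAL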
Re: [HACKERS] logical changeset generation v6
On 09/24/2013 11:21 AM, Andres Freund wrote: Not having a consumer of the walsender interface included sounds like a bad idea to me, even if it were only useful for testing. Now, you could argue it should be in /contrib - and I wouldn't argue against that except it shares code with the rest of src/bin/pg_basebackup. +1 on pg_receivellog (or whatever better name we pick) being somewhere. I found the pg_receivellog code very useful as an example and for debugging/development purposes. It isn't something that I see as useful for the average user, and I think the use-cases it meets are closer to other things we usually put in /contrib. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
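As an aside for anyone debugging without the binary: the SQL-level functions that eventually shipped with the feature cover similar ground, e.g.:

    -- read pending changes without consuming them
    SELECT * FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL);
    -- read and consume (advances the slot)
    SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);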
Re: [HACKERS] logical changeset generation v6
On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various "freeze multixact" variables". And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form:

unexpected duplicate for tablespace X relfilenode X

The call stack is:

HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
DecodeCommit
...

I am working off something based on your version e0acfeace6d695c229efd5d78041a1b734583431 Any ideas? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 11:08 AM, Andres Freund wrote: On 2013-09-25 11:01:44 -0400, Steve Singer wrote: On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various "freeze multixact" variables". And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X Any chance you could provide a setup to reproduce the error? The steps to build a setup that should reproduce this error are:

1. I had to apply the attached patch on top of your logical replication branch so my pg_decode_init would know if it was being called as part of an INIT_REPLICATION or START_REPLICATION. Unless I have misunderstood something you probably will want to merge this fix in.

2. Get my WIP for adding logical support to slony from: g...@github.com:ssinger/slony1-engine.git branch logical_repl (4af1917f8418a) (My code changes to slony are more prototype level code quality than production code quality.)

3. cd slony1-engine
./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever)
make
make install

4. Grab the clustertest framework JAR from https://github.com/clustertest/clustertest-framework and build up a clustertest jar file.

5. Create a file slony1-engine/clustertest/conf/java.conf that contains the path to the above JAR file as a shell variable assignment, ie:
CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar

6. cp clustertest/conf/disorder.properties.sample clustertest/conf/disorder.properties
Edit disorder.properties to have the proper values for your environment. All 6 databases can point at the same postgres instance; this test will only actually use 2 of them (so far).

7. Run the test:
cd clustertest
./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database test1 and replicate the data into test2 (which is a different database on the same postmaster). If this setup seems like too much effort I can request one of the commitfest VM's from Josh and get everything setup there for you. Steve Any ideas? I'll look into it. Could you provide any context to what you're doing that's being decoded? Greetings, Andres Freund

>From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT REPLICATION

---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 	initStringInfo(&output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 		NIL, replay_read_page, WalSndPrepareWrite, WalSndWriteData);
--
1.7.10.4

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 01:20 PM, Steve Singer wrote: On 09/25/2013 11:08 AM, Andres Freund wrote: Any ideas? I'll look into it. Could you provide any context to what you're doing that's being decoded? I've determined that in this test the walsender seems to be hitting this when it decodes the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like:

WARNING: Starting logical replication from 0/a9321360
ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00

where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. If I force it to restart replication from 0/A9320B00 I see data rows that I appear to have already seen before the restart. I think this is happening when I process the data for 0/A9320B00 but my slon was killed before the feedback message went out. Is this expected? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On 09/26/2013 02:47 PM, Steve Singer wrote: I've determined that in this test the walsender seems to be hitting this when it decodes the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). I've further narrowed this down to something (or some combination of things) that the _disorder_replica.altertableaddTriggers(1) stored function does (or @SLONYNAMESPACE@.altertableaddTriggers(int)), which is essentially:

* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL commands from psql, but I must be missing something. I can replicate this when just using the test_decoding plugin. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
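For anyone who wants to try the psql route, a rough sketch of what that stored function does (trigger names and function signatures taken from the slony schema used in this test; the real function does more bookkeeping):

    BEGIN;
    LOCK TABLE _disorder_replica.sl_config_lock IN EXCLUSIVE MODE;
    LOCK TABLE disorder.do_inventory IN EXCLUSIVE MODE;
    -- deny access trigger
    CREATE TRIGGER _disorder_replica_denyaccess
        BEFORE INSERT OR UPDATE OR DELETE ON disorder.do_inventory
        FOR EACH ROW EXECUTE PROCEDURE _disorder_replica.denyaccess('_disorder_replica');
    -- truncate log trigger
    CREATE TRIGGER _disorder_replica_truncatetrigger
        BEFORE TRUNCATE ON disorder.do_inventory
        FOR EACH STATEMENT EXECUTE PROCEDURE _disorder_replica.log_truncate('3');
    -- deny truncate trigger
    CREATE TRIGGER _disorder_replica_truncatedeny
        BEFORE TRUNCATE ON disorder.do_inventory
        FOR EACH STATEMENT EXECUTE PROCEDURE _disorder_replica.deny_truncate();
    COMMIT;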
Re: [HACKERS] logical changeset generation v6.1
On 09/27/2013 11:44 AM, Andres Freund wrote: I'm encountering a make error: Gah. Lastminute changes. Always the same... Updated patch attached. Greetings, Andres Freund I'm still encountering an error in the make:

make clean
...
make[3]: Entering directory `/usr/local/src/postgresql/src/bin/pg_basebackup'
rm -f pg_basebackup pg_receivexlog pg_recvlogical(X) \
	pg_basebackup.o pg_receivexlog.o pg_recvlogical.o \
	receivelog.o streamutil.o
/bin/sh: 1: Syntax error: "(" unexpected
make[3]: *** [clean] Error 2

I had to add quotes to the clean commands to make it work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On 09/27/2013 05:18 PM, Andres Freund wrote: Hi Steve, On 2013-09-27 17:06:59 -0400, Steve Singer wrote: I've determined that in this test the walsender seems to be hitting this when it decodes the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like: WARNING: Starting logical replication from 0/a9321360 ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00 Where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. Uh, that looks like I fumbled some comparison. Let me check. I've further narrowed this down to something (or some combination of things) that the _disorder_replica.altertableaddTriggers(1) stored function does (or @SLONYNAMESPACE@.altertableaddTriggers(int)), which is essentially:
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger
I am not yet able to replicate the error by issuing the same SQL commands from psql, but I must be missing something. I can replicate this when just using the test_decoding plugin. Thanks. That should get me started with debugging. Unless it's possibly fixed in the latest version — one bug fixed there might cause something like this if the moon stands exactly right? The latest version has NOT fixed the problem. Also, I was a bit inaccurate in my previous descriptions. To clarify:

1. I sometimes am getting that 'unexpected duplicate' error.
2. The 'set add table' which triggers those functions that create and configure triggers is actually causing the walsender to hit the following assertion:

#2 0x00773d47 in ExceptionalCondition (
    conditionName=conditionName@entry=0x8cf400 "!(ent->cmin == change->tuplecid.cmin)",
    errorType=errorType@entry=0x7ab830 "FailedAssertion",
    fileName=fileName@entry=0x8cecc3 "reorderbuffer.c",
    lineNumber=lineNumber@entry=1162) at assert.c:54
#3 0x00665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610, rb=) at reorderbuffer.c:1162
#4 ReorderBufferCommit (rb=0x1b6e4f8, xid=, commit_lsn=3461001952, end_lsn=) at reorderbuffer.c:1285
#5 0x0065f0f7 in DecodeCommit (xid=, nsubxacts=, sub_xids=, ninval_msgs=16, msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928, ctx=0x1adb928) at decode.c:477

I had added an assert(false) to the code where the 'unexpected duplicate' error was logged to make spotting this easier, but yesterday I didn't double check whether I was hitting the assertion I added versus this other one. I can't yet say if these are two unrelated issues or if I'd get to the 'unexpected duplicate' message immediately after. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/28/2013 03:03 PM, Kevin Grittner wrote:

+To support matching of rows which include elements without a default
+B-tree operator class, the following operators are defined for composite
+type comparison:
+*=, *<>, *<, *<=, *>, and *>=.
+These operators are also used internally to maintain materialized views,
+and may be useful to replication software. To ensure that all user
+visible changes are detected, even when the equality operator for the
+type treats old and new values as equal, the byte images of the stored
+data are compared. While ordering is deterministic, it is not generally
+useful except to facilitate merge joins. Ordering may differ between
+system architectures and major releases of
+PostgreSQL.

How about:

To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: *=, *<>, *<, *<=, *>, and *>=. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparing the two rows with the equality operator returns true. The ordering of rows under these comparison operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication, but are not intended to be generally useful for writing queries.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
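A concrete illustration of the binary-representation point, using the operators as named above (numeric retains trailing zeroes in its stored form, so the two rows below are equal but not image-identical):

    SELECT '1.0'::numeric = '1.00'::numeric;             -- true: same number
    SELECT ROW('1.0'::numeric) *= ROW('1.00'::numeric);  -- false: different stored bytes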
Re: [HACKERS] logical changeset generation v6.2
On 09/30/2013 06:44 PM, Andres Freund wrote: Hi, The series from Friday was a bit too buggy - obviously I was too tired. So here's a new one: With this series I've also noticed

#2 0x007741a7 in ExceptionalCondition (
    conditionName=conditionName@entry=0x7c2908 "!(!(tuple->t_infomask & 0x1000))",
    errorType=errorType@entry=0x7acc70 "FailedAssertion",
    fileName=fileName@entry=0x91767e "tqual.c",
    lineNumber=lineNumber@entry=1608) at assert.c:54
54	abort();
#3 0x007a4432 in HeapTupleSatisfiesMVCCDuringDecoding (htup=0x10bfe48, snapshot=0x108b3d8, buffer=310) at tqual.c:1608
#4 0x0049d6b7 in heap_hot_search_buffer (tid=tid@entry=0x10bfe4c, relation=0x7fbebbcd89c0, buffer=310, snapshot=0x10bfda0, heapTuple=heapTuple@entry=0x10bfe48, all_dead=all_dead@entry=0x7fff4aa3866f "\001\370\375\v\001", first_call=1 '\001') at heapam.c:1756
#5 0x004a8174 in index_fetch_heap (scan=scan@entry=0x10bfdf8) at indexam.c:539
#6 0x004a82a8 in index_getnext (scan=0x10bfdf8, direction=direction@entry=ForwardScanDirection) at indexam.c:622
#7 0x004a6fa9 in systable_getnext (sysscan=sysscan@entry=0x10bfd48) at genam.c:343
#8 0x0076df40 in RelidByRelfilenode (reltablespace=0, relfilenode=529775) at relfilenodemap.c:214
#9 0x00664ad7 in ReorderBufferCommit (rb=0x1082d98, xid=, commit_lsn=4638756800, end_lsn=, commit_time=commit_time@entry=433970378426176) at reorderbuffer.c:1320

in addition to some of the other ones I've posted about.

* fix pg_recvlogical makefile (Thanks Steve)
* fix two commits not compiling properly without later changes (Thanks Kevin)
* keep track of commit timestamps
* fix bugs with option passing in test_logical_decoding
* actually parse option values in test_decoding instead of just using the option name
* don't use anonymous structs in unions. That's compiler specific (msvc and gcc) before C11, which we can't rely on. That unfortunately will break output plugins, because ReorderBufferChange now needs to qualify old/new tuples
* improve error handling/cleanup in test_logical_decoding
* some minor cleanups

Patches attached, git tree updated. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] record identical operator - Review
On 09/30/2013 09:08 AM, Kevin Grittner wrote: Steve Singer wrote: How about: To support matching of rows which include elements without a default B-tree operator class, the following operators are defined for composite type comparison: *=, *<>, *<, *<=, *>, and *>=. These operators compare the internal binary representation of the two rows. Two rows might have a different binary representation even though comparing the two rows with the equality operator returns true. The ordering of rows under these comparison operators is deterministic but not otherwise meaningful. These operators are used internally for materialized views and might be useful for other specialized purposes such as replication, but are not intended to be generally useful for writing queries. I agree that's an improvement. Thanks! Are there any outstanding issues on this patch preventing it from being committed? I think we have discussed this patch enough that we now have consensus on proceeding with adding a record identical operator to SQL. No one has objected to the latest names of the operators. You haven't adjusted the patch to reduce the duplication between the equality and comparison functions; if you disagree with me and feel that doing so would increase the code complexity and be inconsistent with how we do things elsewhere, that is fine. Steve -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 10/03/2013 12:38 PM, Andres Freund wrote: Does your code use SELECT FOR UPDATE/SHARE on system or treat_as_catalog tables? Greetings, Andres Freund Yes. It declares sl_table and sl_sequence and sl_set as catalog. It does a

SELECT ...
  from @NAMESPACE@.sl_table T, @NAMESPACE@.sl_set S,
       "pg_catalog".pg_class PGC, "pg_catalog".pg_namespace PGN,
       "pg_catalog".pg_index PGX, "pg_catalog".pg_class PGXC
 where ...
   for update

in the code being executed by the 'set add table'. (We also do select for update commands in many other places during cluster configuration commands.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.2
On 10/03/2013 04:00 PM, Andres Freund wrote: Ok, there were a couple of bugs because I thought mxacts wouldn't need to be supported. So far your testcase doesn't crash the database anymore - it spews some internal errors though, so I am not sure if it's entirely fixed for you. Thanks for testing and helping! I've pushed the changes to the git tree; they aren't squashed yet and there's some further outstanding stuff, so I won't repost the series yet. Greetings, Andres Freund When I run your updated version (from Friday, not what you posted today) against a more recent version of my slony changes I can get the test case to pass 2/3rds of the time. The failures are due to an issue in slon itself that I need to fix. I see lots of

LOG: tx with subtxn 58836

but they seem harmless. Thanks -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.1
On 10/07/2013 09:32 AM, Andres Freund wrote: Todo:
* rename treat_as_catalog_table, after agreeing on the new name
* rename remaining timetravel function names
* restrict SuspendDecodingSnapshots usage to RelationInitPhysicalAddr, that ought to be enough
* add InLogicalDecoding() function
* throw away older data when reading xl_running_xacts records, to deal with immediate shutdowns/crashes

What is your current plan for decoding sequence updates? Is this something that you were going to hold off on supporting till a future version? (I know this was discussed a while ago but I don't remember where it stands now.) From a Slony point of view this isn't a big deal; I can continue to capture sequence changes in sl_seqlog when I create each SYNC event and then just replicate the INSERT statements on sl_seqlog via logical decoding. I can see why someone building a replication system not based on the concept of a SYNC would have a harder time with this. I am guessing we would want to pass sequence operations to the plugins as we encounter the WAL for them, out-of-band of any transaction. This would mean that a set of operations like

begin;
insert into a (id) values (4);
insert into a (id) values (nextval('some_seq'));
commit;

would be replayed on the replicas as

setval('some_seq', 100);
begin;
insert into a (id) values (4);
insert into a (id) values (100);
commit;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.5
On 11/05/2013 10:21 AM, Andres Freund wrote: Hi, Attached to this mail and in the xlog-decoding-rebasing-remapping branch in my git[1] repository you can find the next version of the patchset that:
* Fixes full table rewrites of catalog tables using the method Robert prefers (which is to log rewrite mappings to disk)
* Extracts the REPLICA IDENTITY as configured with ALTER TABLE for the old tuple for UPDATEs and DELETEs
* Much better support for synchronous replication
* Better resource cleanup (as in we need less local WAL available)
* Lots of smaller fixes

The change around REPLICA IDENTITY is *incompatible* with older output plugins since we now log tuples using the table's TupleDesc, not the indexes'. My updated plugin is getting rows with change->tp.oldtuple as NULL on updates, with either the default PRIMARY KEY identity or with a FULL identity. When I try the test_decoding plugin on UPDATE I get rows like:

table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:1 ii_reserved[int8]:144 ii_total_sold[int8]:911

which I think is only data from the new tuple. The lack of "old-key" in the output makes me think the test_decoding plugin also isn't getting the old tuple. (This is with your patch set rebased on top of ac4ab97ec05ea900db0f14d428cae2e79832e02d, which includes the patches Robert committed the other day; I can't rule out that I didn't break something in the rebase.) Robert, I'd be very grateful if you could have a look at patch 0007 implementing what we've discussed. I kept it separate to make it easier to look at it in isolation, but I think in the end it partially should be merged into the wal_level=logical patch. I still think the "wide cmin/cmax" solution is more elegant and has wider applicability, but this works as well, although it's about 5 times the code. Comments? [1]: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
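For reference, the configuration knob being discussed is the ALTER TABLE clause, shown here with the syntax that later shipped in 9.4, against the table used in this test:

    -- default: the old tuple carries only the primary key columns
    ALTER TABLE disorder.do_inventory REPLICA IDENTITY DEFAULT;
    -- use an explicit candidate-key index
    ALTER TABLE disorder.do_inventory REPLICA IDENTITY USING INDEX do_inventory_pkey;
    -- log the entire old row
    ALTER TABLE disorder.do_inventory REPLICA IDENTITY FULL;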
Re: [HACKERS] logical changeset generation v6.5
On 11/09/2013 05:42 PM, Andres Freund wrote: On 2013-11-09 17:36:49 -0500, Steve Singer wrote: My updated plugin is getting rows with change->tp.oldtuple as NULL on updates, with either the default PRIMARY KEY identity or with a FULL identity. The lack of "old-key" in the output makes me think the test_decoding plugin also isn't getting the old tuple either. I've pushed an updated tree to git that contains that fix: http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-remapping git://git.postgresql.org/git/users/andresfreund/postgres.git and some more fixes. I'll send out an email with details sometime soon.
93c5c2a171455763995cef0afa907bcfaa405db4 still gives me the following:

update disorder.do_inventory set ii_in_stock=2 where ii_id=251;
UPDATE 1
test1=# LOG: tuple in table with oid: 35122 without primary key

\d disorder.do_inventory
        Table "disorder.do_inventory"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 ii_id         | bigint | not null
 ii_in_stock   | bigint |
 ii_reserved   | bigint |
 ii_total_sold | bigint |
Indexes:
    "do_inventory_pkey" PRIMARY KEY, btree (ii_id)
Foreign-key constraints:
    "do_inventory_item_ref" FOREIGN KEY (ii_id) REFERENCES disorder.do_item(i_id) ON DELETE CASCADE
Referenced by:
    TABLE "disorder.do_item" CONSTRAINT "do_item_inventory_ref" FOREIGN KEY (i_id) REFERENCES disorder.do_inventory(ii_id) DEFERRABLE INITIALLY DEFERRED
    TABLE "disorder.do_restock" CONSTRAINT "do_restock_inventory_ref" FOREIGN KEY (r_i_id) REFERENCES disorder.do_inventory(ii_id) ON DELETE CASCADE
Triggers:
    _disorder_replica_truncatetrigger BEFORE TRUNCATE ON disorder.do_inventory FOR EACH STATEMENT EXECUTE PROCEDURE _disorder_replica.log_truncate('3')
Disabled triggers:
    _disorder_replica_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON disorder.do_inventory FOR EACH ROW EXECUTE PROCEDURE _disorder_replica.denyaccess('_disorder_replica')
    _disorder_replica_truncatedeny BEFORE TRUNCATE ON disorder.do_inventory FOR EACH STATEMENT EXECUTE PROCEDURE _disorder_replica.deny_truncate()
Replica Identity: FULL

The test_decoding plugin gives me:

table "do_inventory": UPDATE: old-pkey:

a) The table does have a primary key.
b) I don't get anything in the old key, when I was expecting all the columns.
c) If I change the table to use the pkey index with

alter table disorder.do_inventory replica identity using index do_inventory_pkey;

the LOG message on the update goes away, but the output of the test_decoding plugin goes back to

table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:5 ii_reserved[int8]:144 ii_total_sold[int8]:911

which I suspect means oldtuple is back to null. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.5
On 11/10/2013 09:41 AM, Andres Freund wrote: Still gives me the following: update disorder.do_inventory set ii_in_stock=2 where ii_id=251; UPDATE 1 test1=# LOG: tuple in table with oid: 35122 without primary key Hm. Could it be that you still have an older "test_decoding" plugin lying around? The current one doesn't contain that string anymore. That'd explain the problems. In v6.4 the output plugin API was changed so that plain heaptuples are passed for the "old" key, although with non-key columns set to NULL. Earlier it was an "index tuple" as defined by the index's TupleDesc. Grrr, yeah, that was the problem: I had compiled but not installed the newer plugin. Sorry. a) The table does have a primary key b) I don't get anything in the old key when I was expecting all the columns c) If I change the table to use the pkey index with alter table disorder.do_inventory replica identity using index do_inventory_pkey; the LOG message on the update goes away but the output of the test_decoding plugin goes back to table "do_inventory": UPDATE: ii_id[int8]:251 ii_in_stock[int8]:5 ii_reserved[int8]:144 ii_total_sold[int8]:911 Which I suspect means oldtuple is back to null Which is legitimate though, if you don't update the primary (or explicitly chosen candidate) key. Those only get logged if there are actual changes in those columns. Makes sense? Is the expectation that plugin writers will call RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY) to figure out what the identity key is? How do we feel about having the decoder logic populate change.oldtuple with the identity on UPDATE statements when it is null? The logic I have now is to use oldtuple if it is not null, and otherwise go figure out which columns from the identity key we should be using. I think most plugins that do anything useful with an update will need to duplicate that logic. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
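A sketch of the behaviour described above, assuming the pkey-based replica identity on the test table — the old key is only logged when a key column actually changes:

    -- key column untouched: no old tuple appears in the change stream
    UPDATE disorder.do_inventory SET ii_in_stock = 2 WHERE ii_id = 251;
    -- key column modified: the old key values are logged for the plugin
    UPDATE disorder.do_inventory SET ii_id = 252 WHERE ii_id = 251;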
Re: [HACKERS] snapshot too old, configured by time
On 10/15/2015 05:47 PM, Kevin Grittner wrote: All other issues raised by Álvaro and Steve have been addressed, except for this one, which I will argue against: I've been looking through the updated patch. In snapmgr.c:

+ * XXX: If we can trust a read of an int64 value to be atomic, we can skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO to only acquire the lock on platforms without atomic 64bit operations? On a more general note: I've tried various manual tests of this feature and it sometimes works as expected and sometimes doesn't. I'm getting the feeling that how I expect it to work isn't quite in sync with how it does work. I'd expect the following to be sufficient to generate the error:

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot should be too old

For example it seems that in test 002 the select_ok on conn1 following the vacuum but right before the final sleep is critical to the "snapshot too old" error showing up (i.e. if I remove that select_ok but leave in the sleep, I don't get the error). Is this intended, and if so is there a better way we can explain how things work? Also, is 0 intended to be an acceptable value for old_snapshot_threshold, and if so what should we expect to see then? So if I understand correctly, every call to BufferGetPage needs to have a TestForOldSnapshot immediately afterwards? It seems easy to later introduce places that fail to test for old snapshots. What happens if they do? Does vacuum remove tuples anyway and then the query returns wrong results? That seems pretty dangerous. Maybe the snapshot could be an argument of BufferGetPage? There are 486 occurrences of BufferGetPage in the source code, and this patch follows 36 of them with TestForOldSnapshot. This only needs to be done when a page is going to be used for a scan which produces user-visible results. That is, it is *not* needed for positioning within indexes to add or vacuum away entries, for heap inserts, or for deletions (assuming the row to be deleted has already been found). It seems wrong to modify about 450 BufferGetPage references to add a NULL parameter; and if we do want to make that noop change as insurance, it seems like it should be a separate patch, since the substance of this patch would be buried under the volume of that. I will add this to the November CF. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
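For concreteness, the two-session sequence above as a rough psql transcript (assuming old_snapshot_threshold = '1min' in postgresql.conf; as noted, whether the error actually fires can also depend on intervening page accesses):

    -- T1
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM t;     -- snapshot taken here

    -- T2, after more than the threshold has elapsed
    UPDATE t SET x = x + 1;
    VACUUM t;

    -- T1, after another wait
    SELECT count(*) FROM t;     -- expected: ERROR: snapshot too old (SQLSTATE 72000)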
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
On 05/28/2013 04:41 PM, Szymon Guz wrote: Hi, I've got a patch. This is for a plpython enhancement. There is an item at the TODO list http://wiki.postgresql.org/wiki/Todo#Server-Side_Languages "Fix loss of information during conversion of numeric type to Python float" This patch uses the decimal.Decimal type from the Python standard library for the plpython function numeric argument instead of float. Patch contains changes in code, documentation and tests. Most probably there is something wrong, as this is my first Postgres patch :) Thanks for contributing. This patch applies cleanly against master and compiles with warnings:

plpy_main.c: In function ‘PLy_init_interp’:
plpy_main.c:157:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
plpy_main.c:161:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

You can avoid this by moving the declaration of decimal and decimal_dict to the top of the function, where mainmod is declared. Also, in this function you've introduced places where it returns with an error (the PLy_elog(ERROR...) calls) before decrementing the reference to mainmod. I think you can decrement the mainmod reference after the call to SetItemString, before your changes that import the Decimal module. The patch works as expected: I am able to write python functions that take numerics as arguments and work with them, and I can adjust the decimal context precision inside of my function. One concern I have is that this patch makes pl/python functions involving numerics more than 3 times as slow as before.

create temp table b(a numeric);
insert into b select generate_series(1,10000);

create or replace function x(a numeric, b numeric) returns numeric as $$
if a == None:
    return b
return a + b
$$ language plpythonu;

create aggregate sm(basetype=numeric, sfunc=x, stype=numeric);

test=# select sm(a) from b;
    sm
----------
 50005000
(1 row)

Time: 565.650 ms

versus before the patch this was taking in the range of 80ms. Would it be faster to call numeric_send instead of numeric_out and then convert the sequence of int16's to a tuple of digits that can be passed into the Decimal constructor? I think this is worth trying and testing.

Documentation
=============
Your patched version of the docs says

PostgreSQL real, double, and numeric are converted to Python Decimal. This type is imported from decimal.Decimal.

I don't think this is correct; as far as I can tell you're not changing the behaviour for the postgresql real and double types, they continue to use floating point. How about:

PostgreSQL real and double are converted to Python float. PostgreSQL numeric is converted to Python Decimal. This type is imported from decimal.Decimal.

Maybe? Steve thanks, Szymon -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
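On the numeric_send idea: the attraction is that Python's Decimal can be constructed directly from a sign/digits/exponent tuple, skipping string parsing entirely. A rough illustration of that constructor in plpython terms (illustrative only — the actual conversion would happen in the C code, reading numeric's int16 digit groups):

    DO $$
    from decimal import Decimal
    # Decimal((sign, digits, exponent)); sign 0 means positive
    d = Decimal((0, (3, 1, 4), -2))   # == Decimal('3.14')
    plpy.info(d)
    $$ LANGUAGE plpythonu;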
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
On 06/25/2013 06:42 AM, Szymon Guz wrote: Hi, I've attached a new patch. I've fixed all the problems you've found, except for the efficiency problem, which has been described in a previous email. thanks, Szymon This version of the patch addresses the issues I mentioned. Thanks for looking into whether the performance issue is with our conversion to strings or inherent in the Python decimal type. I guess we (PostgreSQL) can't do much about it. A runtime switch to use cdecimal if it is available is a good idea, but I agree with you that could be a different patch. One minor thing I noticed in this round: PLy_elog(ERROR, "could not import module 'decimal'"); I think should have "decimal" in double quotes. I think this patch is ready for a committer to look at. Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
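For what it's worth, the runtime switch is just the usual import fallback, sketched here in plpython terms (the patch itself would do the equivalent in C at module-initialization time; illustrative only):

    DO $$
    try:
        import cdecimal as dec   # faster C implementation, if installed
    except ImportError:
        import decimal as dec    # stdlib fallback
    plpy.info(dec.Decimal("50.5"))
    $$ LANGUAGE plpythonu;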
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
On 06/26/2013 04:47 PM, Szymon Guz wrote: Attached patch has all changes against trunk code. There is added a function for conversion from Postgres numeric to Python Decimal. The Decimal type is taken from cdecimal.Decimal, if it is available. It is an external library, quite fast, but may be not available. If it is not available, then decimal.Decimal will be used. It is in standard Python library, however it is rather slow. The initialization is done in the conversion function, the pointer to a proper Decimal constructor is stored as static variable inside the function and is lazy initialized. The documentation is updated. I've tested this version with python 2.7 with and without cdecimal and also with 3.3 that has the faster decimal performance. It seems fine. The v5 version of the patch makes only white-space changes to plpy_main.c you should excluded that from the patch if your making a new version (I have done this in the v6 version I'm attaching) Tests for python 2 and 3 have been added. They work only with standard decimal.Decimal, as the type is printed in the *.out files. I think there is nothing we can do with that now. I think we should make test_type_conversion_numeric to do something that generates the same output in both cases. ie py.info(str(x)). I downside of having the test fail on installs with cdecimal installed is much greater than any benefit we get by ensuring that the type is really decimal. I've attached a v6 version of the patch that does this, do you agree with my thinking? Steve regards, Szymon diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml new file mode 100644 index aaf758d..da27874 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 308,321 !PostgreSQL real, double, !and numeric are converted to !Python float. Note that for !the numeric this loses information and can lead to !incorrect results. This might be fixed in a future !release. --- 308,326 + + +PostgreSQL real and double are converted to +Python float. + + + !PostgreSQL numeric is converted to !Python Decimal. This type is imported from ! cdecimal package if it is available. If cdecimal ! cannot be used, then decimal.Decimal will be used. diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out new file mode 100644 index 4641345..46308ed *** a/src/pl/plpython/expected/plpython_types.out --- b/src/pl/plpython/expected/plpython_types.out *** CONTEXT: PL/Python function "test_type_ *** 213,248 (1 row) CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$ ! plpy.info(x, type(x)) return x $$ LANGUAGE plpythonu; ! /* The current implementation converts numeric to float. */ SELECT * FROM test_type_conversion_numeric(100); ! INFO: (100.0, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! 100.0 (1 row) SELECT * FROM test_type_conversion_numeric(-100); ! INFO: (-100.0, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- !-100.0 (1 row) SELECT * FROM test_type_conversion_numeric(50.5); ! INFO: (50.5, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- 50.5 (1 row) SELECT * FROM test_type_conversion_numeric(null); ! INFO: (None, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- --- 213,264 (1 row) CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$ ! 
plpy.info(str(x)) return x $$ LANGUAGE plpythonu; ! /* The current implementation converts numeric to decimal.Decimal. */ SELECT * FROM test_type_conversion_numeric(100); ! INFO: 100 CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! 100 (1 row) SELECT * FROM test_type_conversion_numeric(-100); ! INFO: -100 CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! -100 (1 row) SELECT * FROM test_type_conversion_numeric(50.5); ! INFO: 50.5 CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- 50.5 (1 row) + SELECT *
Re: [HACKERS] [PATCH] Fix conversion for Decimal arguments in plpython functions
On 06/27/2013 05:04 AM, Szymon Guz wrote: On 27 June 2013 05:21, Steve Singer <mailto:st...@ssinger.info>> wrote: On 06/26/2013 04:47 PM, Szymon Guz wrote: Hi Steve, thanks for the changes. You're idea about common code for decimal and cdecimal is good, however not good enough. I like the idea of common code for decimal and cdecimal. But we need class name, not the value. I've changed the code from str(x) to x.__class__.__name__ so the function prints class name (which is Decimal for both packages), not the value. We need to have the class name check. The value is returned by the function and is a couple of lines lower in the file. patch is attached. I think the value is more important than the name, I want to the tests to make sure that the conversion is actually converting properly. With your method of getting the class name without the module we can have both. The attached patch should print the value and the class name but not the module name. Steve thanks, Szymon diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml new file mode 100644 index aaf758d..da27874 *** a/doc/src/sgml/plpython.sgml --- b/doc/src/sgml/plpython.sgml *** $$ LANGUAGE plpythonu; *** 308,321 !PostgreSQL real, double, !and numeric are converted to !Python float. Note that for !the numeric this loses information and can lead to !incorrect results. This might be fixed in a future !release. --- 308,326 + + +PostgreSQL real and double are converted to +Python float. + + + !PostgreSQL numeric is converted to !Python Decimal. This type is imported from ! cdecimal package if it is available. If cdecimal ! cannot be used, then decimal.Decimal will be used. diff --git a/src/pl/plpython/expected/plpython_types.out b/src/pl/plpython/expected/plpython_types.out new file mode 100644 index 4641345..e602336 *** a/src/pl/plpython/expected/plpython_types.out --- b/src/pl/plpython/expected/plpython_types.out *** CONTEXT: PL/Python function "test_type_ *** 213,248 (1 row) CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$ ! plpy.info(x, type(x)) return x $$ LANGUAGE plpythonu; ! /* The current implementation converts numeric to float. */ SELECT * FROM test_type_conversion_numeric(100); ! INFO: (100.0, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! 100.0 (1 row) SELECT * FROM test_type_conversion_numeric(-100); ! INFO: (-100.0, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- !-100.0 (1 row) SELECT * FROM test_type_conversion_numeric(50.5); ! INFO: (50.5, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- 50.5 (1 row) SELECT * FROM test_type_conversion_numeric(null); ! INFO: (None, ) CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- --- 213,264 (1 row) CREATE FUNCTION test_type_conversion_numeric(x numeric) RETURNS numeric AS $$ ! plpy.info(x,x.__class__.__name__) return x $$ LANGUAGE plpythonu; ! /* The current implementation converts numeric to Decimal. */ SELECT * FROM test_type_conversion_numeric(100); ! INFO: (Decimal('100'), 'Decimal') CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! 100 (1 row) SELECT * FROM test_type_conversion_numeric(-100); ! INFO: (Decimal('-100'), 'Decimal') CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- ! -100 (1 row) SELECT * FROM test_type_conversion_numeric(50.5); ! 
INFO: (Decimal('50.5'), 'Decimal') CONTEXT: PL/Python function "test_type_conversion_numeric" test_type_conversion_numeric -- 50.5 (1 row) + SELECT * FROM test_type_conversion_numeric(1234567890.0987654321); + INFO: (Decimal('1234567890.0987654321'), 'Decimal') + CONTEXT: PL/Python function "test_type_conversion_numeric" + test_type_conversion_numeric + -- + 1234567890.0987654321 + (1 row) + +
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On 07/05/2013 08:03 AM, Andres Freund wrote: On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote: Tried that, too, and problem persists. The log shows the last commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb. Ok. I think I have a slight idea what's going on. Could you check whether recompiling with -O0 "fixes" the issue? There's something strange going on here, not sure whether it's just a bug that's hidden, by either not doing optimizations or by adding more elog()s, or whether it's a compiler bug. I am getting the same test failure Kevin is seeing. This is on an x64 Debian wheezy machine with gcc (Debian 4.7.2-5) 4.7.2. Building with -O0 results in passing tests. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On 07/05/2013 09:34 AM, Andres Freund wrote: On 2013-07-05 09:28:45 -0400, Steve Singer wrote: On 07/05/2013 08:03 AM, Andres Freund wrote: On 2013-06-27 21:52:03 -0700, Kevin Grittner wrote: Tried that, too, and problem persists. The log shows the last commit on your branch as 022c2da1873de2fbc93ae524819932719ca41bdb. Ok. I think I have a slight idea what's going on. Could you check whether recompiling with -O0 "fixes" the issue? There's something strange going on here, not sure whether it's just a bug that's hidden, by either not doing optimizations or by adding more elog()s, or whether it's a compiler bug. I am getting the same test failure Kevin is seeing. This is on an x64 Debian wheezy machine with gcc (Debian 4.7.2-5) 4.7.2. Building with -O0 results in passing tests. Does the patch from http://archives.postgresql.org/message-id/20130705132513.GB11640%40awork2.anarazel.de or the git tree (which is rebased on top of the mvcc catalog commit from Robert, which needs some changes) fix it, even with optimizations? Yes, with your latest git tree the tests pass with -O2. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] changeset generation v5-01 - Patches & git tree
On 06/14/2013 06:51 PM, Andres Freund wrote: The git tree is at: git://git.postgresql.org/git/users/andresfreund/postgres.git branch xlog-decoding-rebasing-cf4 http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/xlog-decoding-rebasing-cf4

We discussed issues related to passing options to the plugins a number of months ago (http://www.postgresql.org/message-id/20130129015732.ga24...@awork2.anarazel.de). I'm still having issues with the syntax you describe there.

START_LOGICAL_REPLICATION "1" 0/0 ("foo","bar")
unexpected termination of replication stream: ERROR: foo requires a parameter

START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")
"START_LOGICAL_REPLICATION "1" 0/0 ("foo" "bar")": ERROR: syntax error

START_LOGICAL_REPLICATION "1" 0/0 ("foo")
works okay

Steve

On 2013-06-15 00:48:17 +0200, Andres Freund wrote: Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid: required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId: required, simple
0007: Adjust Satisfies* interface: required, mechanical
0008: Allow walsender to attach to a database: required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter: required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key: required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program: not required but useful
0016: Add test_logical_decoding extension; not required, but contains the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Version v5-01 attached

Greetings, Andres Freund
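For context on the option-passing problem above: on the plugin side the options eventually arrive as a List of DefElem nodes, and the "foo requires a parameter" error corresponds to a DefElem with a NULL arg. A sketch of how a plugin might validate them, assuming the option list is handed to a startup-style callback; this is only illustrative, since the API was still in flux in this patch series, and foo_value is a made-up piece of plugin state:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/parsenodes.h"   /* DefElem */

static char *foo_value = NULL;  /* illustrative plugin-private state */

static void
parse_plugin_options(List *options)
{
    ListCell   *lc;

    foreach(lc, options)
    {
        DefElem    *elem = (DefElem *) lfirst(lc);

        if (strcmp(elem->defname, "foo") == 0)
        {
            if (elem->arg == NULL)      /* ("foo") with no value */
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("foo requires a parameter")));
            foo_value = pstrdup(strVal(elem->arg));
        }
        else
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("option \"%s\" not recognized", elem->defname)));
    }
}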
Re: [HACKERS] Replication identifiers, take 3
On 09/26/2014 06:05 PM, Andres Freund wrote: On 2014-09-26 14:57:12 -0400, Robert Haas wrote: Sure, it'll possibly not be trivial to move them elsewhere. On the other hand, the padding bytes have been unused for 8+ years without somebody laying "claim" on them but "me". I don't think it's a good idea to leave them there unused when nobody even has proposed another use for a long while. That'll just end up with them continuing to be unused. And there are actually four more consecutive bytes on 64bit systems that are unused. Should there really be a dire need after that, we can simply bump the record size. WAL volume wise it'd not be too bad to make the record a tiny bit bigger - the header is only a relatively small fraction of the entire content.

If we were now increasing the WAL record size anyway for some unrelated reason, would we be willing to increase it by a further 2 bytes for the node identifier? If the answer is 'no' then I don't think we can justify using the 2 padding bytes just because they are there and have been unused for years. But if the answer is 'yes', that we feel this is important enough to justify a slightly (2 byte) larger WAL record header, then we shouldn't use the excuse of maybe needing those 2 bytes for something else. When something else comes along that needs the WAL space, we'll have to increase the record size. To say that if some other patch comes along that needs the space we'll redo this feature to use the method Robert describes is unrealistic. If we think that the replication identifier isn't general/important/necessary enough to justify 2 bytes of WAL header space, then we should start out with something that doesn't use the WAL header.

Steve

I've also wondered about that. Perhaps we simply should have an additional 'name' column indicating the replication solution? Yeah, maybe, but there's still the question of substructure within the non-replication-solution part of the name. Not sure if we can assume that a bipartite identifier, specifically, is right, or whether some solutions will end up with different numbers of components. Ah. I thought you only wanted to suggest a separator between the replication solution and its internal data. But you actually want to suggest an internal separator to be used in the solution's namespace? I'm fine with that. I don't think we can suggest much beyond that - different solutions will have fundamentally differing requirements about which information to store. Agreed. So, let's recommend underscores as that separator?

Greetings, Andres Freund
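For orientation, the struct being argued over looks approximately like this in the 9.4-era sources (sketched from memory, so treat the exact layout as indicative rather than authoritative):

typedef struct XLogRecord
{
    uint32        xl_tot_len;  /* total len of entire record */
    TransactionId xl_xid;      /* xact id */
    uint32        xl_len;      /* total len of rmgr data */
    uint8         xl_info;     /* flag bits */
    RmgrId        xl_rmid;     /* resource manager for this record */
    /* 2 unused padding bytes here: the proposed home for the origin id.
     * On 64-bit builds, MAXALIGN of the 28-byte header leaves 4 more. */
    XLogRecPtr    xl_prev;     /* ptr to previous record in log */
    pg_crc32      xl_crc;      /* CRC for this record */
} XLogRecord;

The 28-byte header rounding up to 32 under 8-byte alignment is where the "four more consecutive bytes on 64bit systems" in the quoted text comes from.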
Re: [HACKERS] Replication identifiers, take 3
On 09/26/2014 10:21 AM, Andres Freund wrote: On 2014-09-26 09:53:09 -0400, Robert Haas wrote: On Fri, Sep 26, 2014 at 5:05 AM, Andres Freund wrote: Let me try to summarize the information requirements for each of these things. For #1, you need to know, after crash recovery, for each standby, the last commit LSN which the client has confirmed via a feedback message. I'm not sure I understand what you mean here? This is all happening on the *standby*. The standby needs to know, after crash recovery, the latest commit LSN from the primary that it has successfully replayed. Ah, sorry, you're right: so, you need to know, after crash recovery, for each machine you are replicating *from*, the last transaction (in terms of LSN) from that server that you successfully replayed. Precisely.

I don't think a solution which logs the change of origin will be simpler. When the origin is in every record, you can filter without keeping track of any state. That's different if you can switch the origin per tx. At the very least you need an in-memory entry for the origin.

But again, that complexity pertains only to logical decoding. Somebody who wants to tweak the WAL format for an UPDATE in the future doesn't need to understand how this works, or care. I agree that that's a worthy goal. But I don't see how this isn't the case with what I propose? This isn't happening on the level of individual rmgrs/ams - there've been two padding bytes after 'xl_rmid' in struct XLogRecord for a long time.

There's also the significant advantage that not basing this on the xid allows it to work correctly with records not tied to a transaction. There's not that much of that happening yet, but I've several features in mind:

* Separately replicate 2PC commits. 2PC commits don't have an xid anymore... With some tooling on top, replicating 2PC in two phases allows for very cool stuff. Like optionally synchronous multimaster replication.

* I have a pending patch that allows sending 'messages' through logical decoding - yielding a really fast and persistent queue. For that it's useful to have transactional *and* nontransactional messages.

* Sanely replicating CONCURRENTLY stuff gets harder if you tie things to the xid. At what point in the decoding stream should something related to a CONCURRENTLY command show up?

Also, for a logical message queue why couldn't you have a xid associated with the message that had nothing else in the transaction?
Re: [HACKERS] [PATCH] HINT: pg_hba.conf changed since last config reload
On 10/16/2014 11:34 PM, Craig Ringer wrote: Given the generally positive reception to this, here's a patch. The first patch adds an errhint_log, akin to the current errdetail_log, so we can send a different HINT to the server log than we do to the client.

The patch behaves as you describe. I feel that this feature would be useful, and you implemented the suggestions given earlier in the thread: the reload notice is not sent to the client, which instead just gets a hint about checking the server log. You follow the pattern set with errdetail_log, which makes sense. The variable name "hint_log" doesn't make it obvious to me that the hint goes to the server log but not the client; the comment for errhint_log should maybe explicitly say that.

One question about the code: does errfinish (elog.c at around line 505) need to free hint_log? (I would assume it does.) Other than that the patch looks good to me.

Something else I noticed while testing. This isn't introduced by your patch, but I am wondering if it is an existing bug. If I set up my configuration like this:

#data_directory = 'ConfigDir' # use data in another directory
# (change requires restart)
hba_file = 'ConfigDir/pg_hba2.conf' # host-based authentication file

and start postgres like

./postgres -D ../data

it looks for pg_hba2.conf at bin/ConfigDir/pg_hba2.conf (relative to the bin directory I started it from). Then if I change my pg_hba.conf and do a reload I get the following in the log:

LOG: parameter "hba_file" cannot be changed without restarting the server
LOG: configuration file "/usr/local/pgsql95git/bin/../data/postgresql.conf" contains errors; unaffected changes were applied

set_config_option is comparing the relative path with the absolute path.

Steve

(Even if DETAIL was appropriate for this info, which it isn't, I can't use errdetail_log because it's already used for other information in some of the same error sites.) The second patch adds a test during errors to report if pg_hba.conf is stale, or if pg_ident.conf is stale.

Typical output, client:
psql: FATAL: Peer authentication failed for user "fred"
HINT: See the server error log for additional information.

Typical output, server:
LOG: provided user name (fred) and authenticated user name (craig) do not match
FATAL: Peer authentication failed for user "fred"
DETAIL: Connection matched pg_hba.conf line 84: "local all all peer"
HINT: pg_hba.conf has been changed since last server configuration reload. Reload the server configuration to apply the changes.

I've added this to the next CF.
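For concreteness, an error site using the split would look roughly like this. This is only a sketch: it assumes errhint_log takes printf-style arguments exactly like errhint (which is what "akin to errdetail_log" implies), and the username variable is a stand-in:

ereport(FATAL,
        (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION),
         errmsg("Peer authentication failed for user \"%s\"", username),
         /* what the client sees */
         errhint("See the server error log for additional information."),
         /* what only the server log sees, via the new errhint_log */
         errhint_log("pg_hba.conf has been changed since last server "
                     "configuration reload. Reload the server "
                     "configuration to apply the changes.")));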
Re: [HACKERS] Moving of INT64_FORMAT to c.h
On 10/16/2014 08:08 AM, Andres Freund wrote: On 2014-10-16 08:04:17 -0400, Jan Wieck wrote: Hi, PostgreSQL has for ages defined INT64_FORMAT and UINT64_FORMAT in pg_config.h. This commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ce486056ecd28050 moved those definitions to c.h, which breaks compilation of all released Slony-I versions against current master. Can those be moved back to where they used to be? Well, you could add additional configure stuff to also emit what you want. Slony uses the definitions in external tools, like slon and slonik, to format sequence numbers in log output. Then it should include c.h/postgres_fe.h?

So the header of c.h says "Note that the definitions here are not intended to be exposed to clients" but postgres_fe.h says "This should be the first file included by PostgreSQL client libraries and". Should client programs that live outside the postgres source tree be including postgres_fe.h? I have a feeling the answer is no. If the answer is no, then why does a make install install postgres_fe.h?

Slonik used to include postgres_fe.h, but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds). Maybe slony client programs shouldn't be trying to steal portability definitions from postgres headers, but I doubt we are the only ones doing that. It isn't a big deal for slony to define its own INT64_FORMAT for 9.5+, but third party projects that have been including pg_config.h will hit similar issues. If there was good reason for the change then fine (Postgres isn't intended to be a general purpose C portability layer).

Steve

Greetings, Andres Freund
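Defining our own is indeed easy. For what it's worth, the least fragile route for a client program that just wants to print 64-bit values is to sidestep INT64_FORMAT entirely: C99 guarantees %lld with a long long argument, so casting at the call site is always correct regardless of how the platform defines int64. A minimal sketch (assumes a C99 compiler, not the actual slony code):

#include <stdio.h>
#include <stdint.h>

static void
log_seqno(int64_t seqno)
{
    /* cast makes the argument match %lld on every platform */
    printf("next sequence value: %lld\n", (long long) seqno);
}

Guessing between "%ld" and "%lld" with preprocessor tests is what configure-time probes exist to avoid; the cast-to-long-long idiom needs no probe at all.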
Re: [HACKERS] Moving of INT64_FORMAT to c.h
On 10/22/2014 04:17 PM, Robert Haas wrote: Slonik used to include postgres_fe.h but back in 2011 or so we stopped doing so because it was causing issues (I think on win32 builds). That seems like something we ought to consider fixing, but obviously we'd need more details. I'll try to find some time to get a Windows environment running and figure out what the problems were. It's also possible that I removed the postgres_fe include at the same time as I was fixing other win32 issues, and that the include wasn't actually causing a problem; it was just removed because it was unnecessary. At the moment slony only includes the server include dir if we are building with pgport (which we normally do on windows). We have had reports of issues like described (http://www.slony.info/bugzilla/show_bug.cgi?id=315) where some installs make us pick up server and client includes from different versions of PG.
[HACKERS] "snapshot too large" error when initializing logical replication (9.4)
I sometimes get the error "snapshot too large" from my logical replication walsender process in response to a CREATE_REPLICATION_SLOT. This is in SnapBuildExportSnapshot in snapbuild.c. newxcnt is 212 at that point; I have max_connections = 200 and procArray->maxProcs = 212. Should we be testing newxcnt > GetMaxSnapshotXidCount() instead of newxcnt >= GetMaxSnapshotXidCount()?
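For anyone puzzling over the off-by-one question: it hinges on whether newxcnt is the count of slots already used or the index of the last used slot. A self-contained sketch of the pattern (not the snapbuild.c source) showing why ">=" is the correct test when the counter doubles as the next write offset:

#include "postgres.h"

/*
 * xcnt counts slots already used in an array of capacity max, so valid
 * indexes are 0 .. max-1 and ">=" is the right overflow check before
 * writing into slot *xcnt.
 */
static void
append_xid(TransactionId *xip, int *xcnt, int max, TransactionId xid)
{
    if (*xcnt >= max)            /* array full: all max slots used */
        elog(ERROR, "snapshot too large");
    xip[(*xcnt)++] = xid;        /* write slot *xcnt, then advance */
}

(As the follow-up below concludes, the existing check is therefore correct; the real problem was the array simply being too small for the workload.)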
[HACKERS] logical decoding - reading a user catalog table
My logical decoding plugin is occasionally getting this error: "could not resolve cmin/cmax of catalog tuple". I get this when my output plugin is trying to read one of the user defined catalog tables (user_catalog_table=true). I am not sure if this is a bug in the time-travel support in the logical decoding support or if I'm just using it wrong (ie not getting a sufficient lock on the relation or something).

This is the interesting part of the stack trace:

#4 0x0091bbc8 in HeapTupleSatisfiesHistoricMVCC (htup=0x7fffcf42a900, snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5 0x004aedf3 in heapgetpage (scan=0x28d7080, page=0) at heapam.c:399
#6 0x004b0182 in heapgettup_pagemode (scan=0x28d7080, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7 0x004b1ba6 in heap_getnext (scan=0x28d7080, direction=ForwardScanDirection) at heapam.c:1475
#8 0x7f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118, origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88, set_id=0x7fffcf42ab84) at slony_logical.c:663
#9 0x7f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0, relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0, relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704

Here is what the code in lookupSlonyInfo is doing:

sltable_oid = get_relname_relid("sl_table",slony_namespace);
sltable_rel = relation_open(sltable_oid,AccessShareLock);
tupdesc=RelationGetDescr(sltable_rel);
scandesc=heap_beginscan(sltable_rel, GetCatalogSnapshot(sltable_oid),0,NULL);
reloid_attnum = get_attnum(sltable_oid,"tab_reloid");
if(reloid_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_reloid column");
set_attnum = get_attnum(sltable_oid,"tab_set");
if(set_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_set column");
tableid_attnum = get_attnum(sltable_oid, "tab_id");
if(tableid_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_id column");
while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))
Re: [HACKERS] logical decoding - reading a user catalog table
On 10/28/2014 01:31 PM, Andres Freund wrote: On 2014-10-25 18:18:07 -0400, Steve Singer wrote: My logical decoding plugin is occasionally getting this error "could not resolve cmin/cmax of catalog tuple". I get this when my output plugin is trying to read one of the user defined catalog tables (user_catalog_table=true). Hm. That should obviously not happen. Could you describe how that table is modified? Does that bug happen initially, or only after a while?

It doesn't happen right away; in this case it was maybe 4 minutes after creating the slot. The error also doesn't always happen when I run this test workload, but it is reproducible with some trying. I don't do anything special to that table: it gets created, then I do inserts on it. I don't do an alter table or anything fancy like that. I was running the slony failover test (all nodes under the same postmaster), which involves the occasional dropping and recreating of databases along with normal query load + replication. I'll send you a tar of the data directory off list with things in this state.

Do you have a testcase that would allow me to easily reproduce the problem? I don't have an isolated test case that does this. The test that I'm hitting this with does lots of stuff and doesn't even always hit this.

I am not sure if this is a bug in the time-travel support in the logical decoding support or if I'm just using it wrong (ie not getting a sufficient lock on the relation or something). I don't know yet...

This is the interesting part of the stack trace:

#4 0x0091bbc8 in HeapTupleSatisfiesHistoricMVCC (htup=0x7fffcf42a900, snapshot=0x7f786ffe92d8, buffer=10568) at tqual.c:1631
#5 0x004aedf3 in heapgetpage (scan=0x28d7080, page=0) at heapam.c:399
#6 0x004b0182 in heapgettup_pagemode (scan=0x28d7080, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:747
#7 0x004b1ba6 in heap_getnext (scan=0x28d7080, direction=ForwardScanDirection) at heapam.c:1475
#8 0x7f787002dbfb in lookupSlonyInfo (tableOid=91754, ctx=0x2826118, origin_id=0x7fffcf42ab8c, table_id=0x7fffcf42ab88, set_id=0x7fffcf42ab84) at slony_logical.c:663
#9 0x7f787002b7a3 in pg_decode_change (ctx=0x2826118, txn=0x28cbec0, relation=0x7f787a3446a8, change=0x7f786ffe3268) at slony_logical.c:237
#10 0x007497d4 in change_cb_wrapper (cache=0x28cbda8, txn=0x28cbec0, relation=0x7f787a3446a8, change=0x7f786ffe3268) at logical.c:704

Here is what the code in lookupSlonyInfo is doing:

sltable_oid = get_relname_relid("sl_table",slony_namespace);
sltable_rel = relation_open(sltable_oid,AccessShareLock);
tupdesc=RelationGetDescr(sltable_rel);
scandesc=heap_beginscan(sltable_rel, GetCatalogSnapshot(sltable_oid),0,NULL);
reloid_attnum = get_attnum(sltable_oid,"tab_reloid");
if(reloid_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_reloid column");
set_attnum = get_attnum(sltable_oid,"tab_set");
if(set_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_set column");
tableid_attnum = get_attnum(sltable_oid, "tab_id");
if(tableid_attnum == InvalidAttrNumber)
    elog(ERROR,"sl_table does not have a tab_id column");
while( (tuple = heap_getnext(scandesc,ForwardScanDirection) ))

(Except missing spaces ;)) I don't see anything obviously wrong with this.

Greetings, Andres Freund
Re: [HACKERS] "snapshot too large" error when initializing logical replication (9.4)
On 10/28/2014 01:27 PM, Andres Freund wrote: Hi, On 2014-10-25 18:09:36 -0400, Steve Singer wrote: I sometimes get the error "snapshot too large" from my logical replication walsender process in response to a CREATE_REPLICATION_SLOT. Yes. That's possible if 'too much' was going on until a consistent point was reached. I think we can just use a much larger size for the array if necessary. I've attached a patch for this. Could you try whether that helps? I don't have a testcase handy that reproduces the problem.

This patch seems to fix things. I've done numerous runs of the test I was doing before with your patch applied and don't seem to be having this issue anymore.

This is in SnapBuildExportSnapshot in snapbuild.c. newxcnt is 212 at that point; I have max_connections = 200 and procArray->maxProcs = 212. Should we be testing newxcnt > GetMaxSnapshotXidCount() instead of newxcnt >= GetMaxSnapshotXidCount()? It actually looks correct to me now: newxcnt is used as an offset into an array of size GetMaxSnapshotXidCount().

Greetings, Andres Freund
Re: [HACKERS] tracking commit timestamps
On 11/05/2014 11:23 AM, Jim Nasby wrote: Except that commit time is not guaranteed unique *even on a single system*. That's my whole point. If we're going to bother with all the commit time machinery, it seems really silly not to provide a way to uniquely order every commit. Clearly trying to uniquely order commits across multiple systems is a far larger problem, and I'm not suggesting we attempt that. But for a single system AIUI all we need to do is expose the LSN of each commit record and that will give you the exact and unique order in which transactions committed. This isn't a hypothetical feature either; if we had this, logical replication systems wouldn't have to try and fake this via batches. You could actually recreate exactly what data was visible at what time to all transactions, not just repeatable read ones (as long as you kept snapshot data as well, which isn't hard). As for how much data to keep, if you have a process that's doing something to record this information permanently, all it needs to do is keep an old enough snapshot around. That's not that hard to do, even from user space.

+1 for this. It isn't just 'replication' systems that have a need for getting the commit order of transactions on a single system. I have an application (not slony) where we want to query a table but order the output based on the transaction commit order of when the insert into the table was done (think of a queue). I'm not replicating the output but passing the data to other applications for further processing. If I just had the commit timestamp I would need to put in some other condition to break ties in a consistent way. I think being able to get an ordering by commit LSN is what I really want in this case, not the timestamp. Logical decoding is one solution to this (that I was considering), but being able to do something like select * FROM event_log order by commit_id would be a lot simpler.
Re: [HACKERS] tracking commit timestamps
On 11/05/2014 05:43 PM, Andres Freund wrote: On 2014-11-05 17:17:05 -0500, Steve Singer wrote: Imo that's essentially a different feature. What you essentially would need here is a 'commit sequence number' - but no timestamps. And probably to be useful that number has to be 8 bytes in itself.

I think this gets to the heart of some of the differing views people have expressed on this patch. Is this patch supposed to:

A) Add commit timestamp tracking but nothing more
B) Add infrastructure to store commit timestamps and provide a facility for storing additional bits of data extensions might want to be associated with the commit
C) Add commit timestamps and node identifiers to commits

If the answer is (A) then I can see why some people are objecting to adding extradata. If the answer is (B) then it's fair to ask how well this patch handles the various types of things people might want to attach to the commit record (such as the LSN). I think the problem is that once you start providing a facility extensions can use to store data along with the commit record, being restricted to 4 or 8 bytes is very limiting. It also doesn't allow you to load two extensions at once on a system: you wouldn't be able to have both the 'steve_commit_order' extension and BDR installed at the same time. I don't think this patch does a very good job at (B), but it wasn't intended to.

If what we are really doing is (C), and we are just calling the space 'extradata' until we get the logical identifier stuff in and then we are going to rename extradata to nodeid, then we should say so. If we are really concerned about the pg_upgrade impact of expanding the record later, then maybe we should add 4 bytes of padding to the CommitTimeStampEntry now and leave manipulating the node id until later.

Steve

Greetings, Andres Freund
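To make the size argument concrete, the record at stake is roughly the per-transaction SLRU entry below. This is a sketch of option (C) with the padding suggestion applied, not the committed layout; the field names are illustrative:

typedef struct CommitTimestampEntry
{
    TimestampTz time;       /* 8 bytes: the commit timestamp itself */
    uint32      nodeid;     /* the 'extradata' / node identifier slot */
    uint32      padding;    /* reserved so the on-disk format can grow
                             * later without a pg_upgrade flag day */
} CommitTimestampEntry;

Every byte here is multiplied by every transaction ever committed, which is why the thread haggles over whether an 8-byte LSN can ride along.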
Re: [HACKERS] tracking commit timestamps
On 11/07/2014 07:07 PM, Petr Jelinek wrote: The list of what is useful might be long, but we can't have everything there as there are space constraints, and LSN is another 8 bytes and I still want to have some bytes for storing the "origin" or whatever you want to call it there, as that's the one I personally have biggest use-case for. So this would be ~24bytes per txid already, hmm I wonder if we can pull some tricks to lower that a bit. The reason why Jim and myself are asking for the LSN and not just the timestamp is that I want to be able to order the transactions. Jim pointed out earlier in the thread that just ordering on timestamp allows for multiple transactions with the same timestamp. Maybe we don't need the entire LSN to solve that. If you already have the commit timestamp maybe you only need another byte or two of granularity to order transactions that are within the same microsecond. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 11/10/2014 08:39 AM, Petr Jelinek wrote: On 09/11/14 17:57, Steve Singer wrote: On 11/07/2014 07:07 PM, Petr Jelinek wrote: The list of what is useful might be long, but we can't have everything there as there are space constraints, and LSN is another 8 bytes and I still want to have some bytes for storing the "origin" or whatever you want to call it there, as that's the one I personally have biggest use-case for. So this would be ~24bytes per txid already, hmm I wonder if we can pull some tricks to lower that a bit. The reason why Jim and myself are asking for the LSN and not just the timestamp is that I want to be able to order the transactions. Jim pointed out earlier in the thread that just ordering on timestamp allows for multiple transactions with the same timestamp. Maybe we don't need the entire LSN to solve that. If you already have the commit timestamp maybe you only need another byte or two of granularity to order transactions that are within the same microsecond. Hmm maybe just one part of LSN, but I don't really like that either, if we want to store LSN we should probably store it as is as somebody might want to map it to txid for other reasons. I did the calculation above wrong btw, it's actually 20 bytes not 24 bytes per record, I am inclined to just say we can live with that. Since we agreed that the (B) case is not really feasible and we are doing the (C), I also wonder if extradata should be renamed to nodeid (even if it's not used at this point as nodeid). And then there is question about the size of it, since the nodeid itself can live with 2 bytes probably ("64k of nodes ought to be enough for everybody" ;) ). Or leave the extradata as is but use as reserved space for future use and not expose it at this time on SQL level at all? I guess Andres could answer what suits him better here. I am happy with renaming extradata to nodeid and not exposing it at this time. If we feel that commit-order (ie LSN or something equivalent) is really a different patch/feature than commit-timestamp then I am okay with that also but we should make sure to warn users of the commit-timestamp in the documentation that two transactions might have the same timestamp and that the commit order might not be the same as ordering by the commit timestamp. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/13/2014 02:44 PM, Andres Freund wrote: Hi, I've pushed a fix for a bug that could possibly also cause this. Although it'd be odd that it always hits the user catalog table. Except if your tests mostly modify the slony tables, but do not do much DDL otherwise?

The test I was running doesn't do DDL, so only the user catalog tables would have changes being tracked. I still sometimes get the error. When I get some time on the weekend I'll work on some detailed instructions on how to grab and set up the various components to replicate this test.

Also, since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender:

#3 0x006b4978 in GetSerializableTransactionSnapshotInt (snapshot=snapshot@entry=0xbfa8a0, sourcexid=sourcexid@entry=0) at predicate.c:1738
#4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517
#5 GetSerializableTransactionSnapshot (snapshot=0xbfa8a0) at predicate.c:1598
#6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7 0x006c0e35 in exec_simple_query (query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\""...) at postgres.c:959
#8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50,

I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate it since the first time I saw it.

I'll send you a tar of the data directory off list with things in this state. Do you have a testcase that would allow me to easily reproduce the problem? I don't have an isolated test case that does this. The test that I'm hitting this with does lots of stuff and doesn't even always hit this. If it still happens, could you send me instructions on how to reproduce the problem after cloning the necessary source repositories? It's quite hard to validate a possible fix otherwise.

Greetings, Andres Freund
Re: [HACKERS] tracking commit timestamps
On 11/14/2014 08:21 PM, Simon Riggs wrote: The requested information is already available, as discussed. Logical decoding adds commit ordering for *exactly* the purpose of using it for replication, available to all solutions. This often requested feature has now been added and doesn't need to be added twice. So what we are discussing is adding a completely superfluous piece of information. Not including the LSN info does nothing to trigger based replication, which will no doubt live on happily for many years. But adding LSN will slow down logical replication, for no purpose at all.

Simon,

The use cases I'm talking about aren't really replication related. Often I have come across systems that want to do something such as 'select * from orders where X > the_last_row_I_saw order by X' and then do further processing on the order. This is kind of awkward to do today because you don't have a good candidate for 'X' to order on. Using either a sequence or an insert-row timestamp doesn't work well, because a transaction with a lower value for X might end up committing after the higher value is already in a query result. Yes, you could set up a logical wal slot and listen on the stream of inserts into your order table, but that's a lot of administration overhead compared to just issuing an SQL query for what really is a query-type operation. Using the commit timestamp for my X sounded very tempting but could allow duplicates.

One could argue that this patch is about replication features, and providing commit ordering for query purposes should be a separate patch to add that on top of this infrastructure. I see merit to smaller more focused patches, but that requires leaving the door open to easily extending things later. It could also be that I'm the only one who wants to order and filter queries in this manner (but that would surprise me). If the commit lsn has limited appeal and we decide we don't want it at all, then we shouldn't add it. I've seen this type of requirement in a number of different systems at a number of different companies. I've generally seen it dealt with by either selecting rows behind the last now() timestamp seen and then filtering out already processed rows, or by tracking the 'processed' state of each row individually (ie performing an update on each row once it's been processed), which performs poorly.

Steve
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/13/2014 02:44 PM, Andres Freund wrote: Hi Steve, If it still happens, could you send me instructions of how to reproduce the problem after cloning the necessary source repositories? It's quite hard to validate a possible fix otherwise.

1. Install PG 9.4.

2. Perform an initdb, then set in postgresql.conf:
max_connections = 200
wal_level = logical
max_wal_senders = 50
wal_keep_segments = 100
wal_sender_timeout = 600s
max_replication_slots = 120
Create a user slon with superuser and a user test (set passwords for them; you'll use them later). Configure pg_hba.conf to allow slon to connect as a replication user.

3. Download slony from github (https://github.com/ssinger/slony1-engine.git) and check out the branch logical_remote_helper, then:
./configure --with-pgconfigdir=/path_to_your_pgcbindir
make
make install

4. Download clustertest & compile clustertest from (https://github.com/clustertest/clustertest-framework).

5. In the slony source tree cd to the clustertest directory.

6. cp conf/disorder.properties.sample to conf/disorder.properties and edit the file to have the correct paths, ports, users, passwords.

7. cp conf/java.properties.sample to conf/java.properties and edit the file to point at the JAR you downloaded earlier.

I run with all 5 databases on the same PG cluster. Performance will be much better with 5 different clusters, but I recommend trying to emulate my configuration as much as possible to replicate this. To run the tests then do:
./run_all_disorder_tests.sh

At the moment (commit df5acfd1a3) it is configured to just run the Failover node test cases where I am seeing this, not the entire suite. It typically takes between 30 minutes and an hour to run though.

I installed things following the above steps on a different system than my usual development laptop, and I have been unable to reproduce the error so far (on that system). But I am still able to reproduce it on occasion on my normal development laptop.

Greetings, Andres Freund
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/16/2014 04:49 PM, Steve Singer wrote: I installed things following the above steps on a different system than my usual development laptop and I have been unable to reproduce the error so for (on that system). But I am still able to reproduce it on occasion on my normal development laptop. After continuously running the test I was eventually able to reproduce the error on the other system as well. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/17/2014 10:37 AM, Andres Freund wrote: On 2014-11-13 22:23:02 -0500, Steve Singer wrote: Also since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender:

#3 0x006b4978 in GetSerializableTransactionSnapshotInt (snapshot=snapshot@entry=0xbfa8a0, sourcexid=sourcexid@entry=0) at predicate.c:1738
#4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517
#5 GetSerializableTransactionSnapshot (snapshot=0xbfa8a0) at predicate.c:1598
#6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7 0x006c0e35 in exec_simple_query (query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\""...) at postgres.c:959
#8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50,

I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate it since the first time I saw it. That crash is decidedly odd. Any chance you still have the full backtrace around?

Yes, I still have the core file.

This is in the SSI code... I'm not immediately seeing how this could be related to logical decoding. It can't even be an imported snapshot, because the exported snapshot is marked repeatable read. Are you actually using serializable transactions? If so, how and why?

Yes, the test client that performs the 'simulated workload' does some serializable transactions. It connects as a normal client via JDBC and does a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE") and then does some DML. Why? Because it seemed like a good thing to include in the test suite. You're right, this might have nothing to do with logical decoding. I haven't been able to reproduce it again either; I can't even say if this problem was introduced to 9.4 in the past month or if it has been around much longer and I just haven't hit it before.

Greetings, Andres Freund
Re: [HACKERS] logical decoding - reading a user catalog table
On 11/17/2014 11:34 AM, Andres Freund wrote: Hi, Kevin: CCed you, because it doesn't really look like a logical decoding related issue. On 2014-11-17 11:25:40 -0500, Steve Singer wrote: On 11/17/2014 10:37 AM, Andres Freund wrote: On 2014-11-13 22:23:02 -0500, Steve Singer wrote: Also since updating (to 2c267e47afa4f9a7c) I've seen an assertion failure in a normal client connection, not the walsender (backtrace as quoted above, frames #3 through #8, ending in GetSerializableTransactionSnapshotInt at predicate.c:1738). I have no idea if this has anything to do with your recent changes or some other change. I haven't so far been able to replicate it since the first time I saw it. That crash is decidedly odd. Any chance you still have the full backtrace around? Yes, I still have the core file. Cool, could you show the full thing? Or did you just snip it because it's just the Assert/ExceptionalCondition thing?

I just snipped the exception stuff. Here is the full thing:

(gdb) backtrace
#0 0x7f6fb22e8295 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x7f6fb22eb438 in __GI_abort () at abort.c:90
#2 0x007a08e7 in ExceptionalCondition (conditionName=conditionName@entry=0x918e88 "!(TransactionIdFollows(snapshot->xmin, PredXact->SxactGlobalXmin))", errorType=errorType@entry=0x7da390 "FailedAssertion", fileName=fileName@entry=0x9182c3 "predicate.c", lineNumber=lineNumber@entry=1738) at assert.c:54
#3 0x006b4978 in GetSerializableTransactionSnapshotInt (snapshot=snapshot@entry=0xbfa8a0, sourcexid=sourcexid@entry=0) at predicate.c:1738
#4 0x006b66c3 in GetSafeSnapshot (origSnapshot=<optimized out>) at predicate.c:1517
#5 GetSerializableTransactionSnapshot (snapshot=0xbfa8a0) at predicate.c:1598
#6 0x007d16dd in GetTransactionSnapshot () at snapmgr.c:200
#7 0x006c0e35 in exec_simple_query (query_string=0x1fd01b8 "select ev_origin, ev_seqno, ev_timestamp, ev_snapshot, \"pg_catalog\".txid_snapshot_xmin(ev_snapshot), \"pg_catalog\".txid_snapshot_xmax(ev_snapshot), coalesce(ev_provider_xid,\""...) at postgres.c:959
#8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f5ab50, dbname=0x1f5ab30 "test2", username=<optimized out>) at postgres.c:4016
#9 0x0046a08e in BackendRun (port=0x1f83010) at postmaster.c:4123
#10 BackendStartup (port=0x1f83010) at postmaster.c:3797
#11 ServerLoop () at postmaster.c:1576
#12 0x00665eae in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1f59d00) at postmaster.c:1223
#13 0x0046ab58 in main (argc=3, argv=0x1f59d00) at main.c:227

Could you print *snapshot in frame #3?

(gdb) p *snapshot
$1 = {satisfies = 0x7d0330, xmin = 412635, xmax = 412637, xip = 0x1f86e40, xcnt = 1, subxcnt = 0, subxip = 0x1fd2190, suboverflowed = 0 '\000', takenDuringRecovery = 0 '\000', copied = 0 '\000', curcid = 0, active_count = 0, regd_count = 0}
(gdb)

This is in the SSI code... I'm not immediately seeing how this could be related to logical decoding. It can't even be an imported snapshot, because the exported snapshot is marked repeatable read. Are you actually using serializable transactions? If so, how and why? Yes, the test client that performs the 'simulated workload' does some serializable transactions. It connects as a normal client via JDBC and does a prepareStatement("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE") and then does some DML. Why? Because it seemed like a good thing to include in the test suite. Yes, it's a good reason as the above backtrace proves ;). I'm just trying to understand under which circumstances it happens. The above backtrace looks like it can only be triggered if you do a BEGIN TRANSACTION SERIALIZABLE DEFERRABLE READ ONLY; Is that something you do?

The slony remote listener connection does this when it selects from sl_event. We switched to this shortly after 9.1 came out.
Re: [HACKERS] tracking commit timestamps
On 11/19/2014 08:22 AM, Alvaro Herrera wrote: I think we're overblowing the pg_upgrade issue. Surely we don't need to preserve commit_ts data when upgrading across major versions; and pg_upgrade is perfectly prepared to remove old data when upgrading (actually it just doesn't copy it; consider pg_subtrans or pg_serial, for instance.) If we need to change binary representation in a future major release, we can do so without any trouble. That sounds reasonable. I am okay with Petr removing the LSN portion of this patch.
Re: [HACKERS] pglogical - logical replication contrib module
On 12/31/2015 06:34 PM, Petr Jelinek wrote: Hi, I'd like to submit the replication solution which is based on the pglogical_output [1] module (which is obviously needed for this to compile).

Hi, make check gives me:

for extra in ../../contrib/pglogical_output contrib/pglogical; do make -C '../..'/$extra DESTDIR='/usr/local/src/postgresql'/tmp_install install >>'/usr/local/src/postgresql'/tmp_install/log/install.log || exit; done
make[1]: *** ../../../../contrib/pglogical_output: No such file or directory. Stop.
../../src/Makefile.global:325: recipe for target 'temp-install' failed
make: *** [temp-install] Error 2
ssinger@ssinger-laptop:/usr/local/src/postgresql/contrib/pglogical$

The attached patch fixes that, but it then is creating the test database 'contrib_regression' not 'regression'; changing pglogical.provider_dsn = 'contrib_regression' still leaves me with a lot of failures.

diff --git a/contrib/pglogical/Makefile b/contrib/pglogical/Makefile
new file mode 100644
index 1640f63..a4dab88
*** a/contrib/pglogical/Makefile
--- b/contrib/pglogical/Makefile
*************** include $(top_srcdir)/contrib/contrib-gl
*** 27,34 ****
  # typical installcheck users do not have (e.g. buildfarm clients).
  @installcheck: ;
! EXTRA_INSTALL += $(top_srcdir)/contrib/pglogical_output
! EXTRA_REGRESS_OPTS += $(top_srcdir)/contrib/pglogical/regress-postgresql.conf
  override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/contrib/pglogical_output
--- 27,34 ----
  # typical installcheck users do not have (e.g. buildfarm clients).
  @installcheck: ;
! EXTRA_INSTALL += contrib/pglogical_output
! EXTRA_REGRESS_OPTS += --temp-config=regress-postgresql.conf
  override CPPFLAGS := $(CPPFLAGS) -I$(top_srcdir)/contrib/pglogical_output
Re: [HACKERS] pglogical - logical replication contrib module
On 12/31/2015 06:34 PM, Petr Jelinek wrote: Hi, I'd like to submit the replication solution which is based on the pglogical_output [1] module (which is obviously needed for this to compile). The pglogical contrib module provides extension which does the master-slave logical replication based on the logical decoding. The basic documentation is in README.md, I didn't bother making sgml docs yet since I expect that there will be ongoing changes happening and it's easier for me to update the markdown docs than sgml. I will do the conversion once we start approaching committable state.

I am going to send my comments/issues out in batches as I find them instead of waiting till I look over everything.

I find this part of the documentation a bit unclear:

+Once the provider node is setup, subscribers can be subscribed to it. First the
+subscriber node must be created:
+
+SELECT pglogical.create_node(
+node_name := 'subscriber1',
+dsn := 'host=thishost port=5432 dbname=db'
+);
+

My initial reading was that I should execute this on the provider node. Perhaps instead: "Once the provider node is set up you can then create subscriber nodes. Create the subscriber nodes and then execute the following commands on each subscriber node:

create extension pglogical
select pglogical.create_node(node_name:='subscriberX',dsn:='host=thishost dbname=db port=5432');"

---

Also, the documentation for create_subscription talks about

+ - `synchronize_structure` - specifies if to synchronize structure from
+provider to the subscriber, default true

I did the following:

test2=# select pglogical.create_subscription(subscription_name:='default sub',provider_dsn:='host=localhost dbname=test1 port=5436');
 create_subscription
-
 247109879

Which then resulted in the following showing up in my PG log:

LOG: worker process: pglogical apply 16542:247109879 (PID 4079) exited with exit code 1
ERROR: replication slot name "pgl_test2_test1_default sub" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT "pgl_test2_test1_default sub" LOGICAL pglogical_output": status PGRES_FATAL_ERROR: ERROR: replication slot name "pgl_test2_test1_default sub" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.

The create_subscription command should check that the subscription name is valid (meets the rules that will be applied against the slot command).

I wondered how I could fix my mistake. The docs say

+- `pglogical.pglogical_drop_subscription(subscription_name name, ifexists bool)`
+ Disconnects the subscription and removes it from the catalog.
+

test2=# select pglogical.pglogical_drop_subscription('default sub', true);
ERROR: function pglogical.pglogical_drop_subscription(unknown, boolean) does not exist

The command is actually called pglogical.drop_subscription; the docs should be fixed to show the actual command name.

I then wanted to add a second table to my database ('b').

select pglogical.replication_set_add_table('default','public.b',true);
 replication_set_add_table
---
 t
(1 row)

In my pglog I then got:

LOG: starting sync of table public.b for subscriber defaultsub
ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
FATAL: could not send replication command "CREATE_REPLICATION_SLOT "pgl_test2_test1_defaultsub_public.b" LOGICAL pglogical_output": status PGRES_FATAL_ERROR: ERROR: replication slot name "pgl_test2_test1_defaultsub_public.b" contains invalid character
HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.

I then did

test1=# select pglogical.replication_set_remove_table('default','public.b');
 replication_set_remove_table
--
 t
(1 row)

but my log still kept repeating the error, so I tried connecting to the replica and did the same:

test2=# select pglogical.replication_set_remove_table('default','public.b');
ERROR: replication set mapping -303842815:16726 not found

Is there any way to recover from this situation? The documentation says I can drop a replication set; maybe that will let replication continue.

+- `pglogical.delete_replication_set(set_name text)`
+ Removes the replication set.
+

select pglogical.delete_replication_set('default');
ERROR: function pglogical.delete_replication_set(unknown) does not exist
LINE 1: select pglogical.delete_replication_set('default');
               ^
HINT: No function matches the given name and argument types. You might need to add explicit type c
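One cheap way to get that up-front check: build the slot name the apply worker will later use and run it through the same validation the walsender applies. This is only a sketch; the name format and variable names are inferred from the slot names in the logs above, but ReplicationSlotValidateName() is the existing backend routine that produces those HINTs:

/* in create_subscription, before inserting the catalog row */
char *slot_name = psprintf("pgl_%s_%s_%s",
                           subscriber_dbname,      /* e.g. "test2" */
                           provider_name,          /* e.g. "test1" */
                           subscription_name);     /* e.g. "default sub" */

/* rejects "pgl_test2_test1_default sub" immediately, with the same HINT */
ReplicationSlotValidateName(slot_name, ERROR);

That turns the delayed background-worker failure loop into an immediate, actionable error at subscription-creation time.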
Re: [HACKERS] pglogical - logical replication contrib module
On 01/09/2016 01:30 PM, Steve Singer wrote: On 12/31/2015 06:34 PM, Petr Jelinek wrote:

I'm not really sure what to do to 'recover' my cluster at this point, so I'll send this off and rebuild my cluster and start over.

I had a setup test1--->test2 (with 2 tables in the default set). I then created a third database (all three hosted on the same PG cluster). In the third database (test3):

test3=# create extension pglogical;
CREATE EXTENSION
test3=# select pglogical.create_node(node_name:='test3', dsn:='host=localhost dbname=test3 port=5436');
 create_node
-
 2001662995
(1 row)

test3=# select pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost dbname=test2 port=5436');
 create_subscription
-
 2974019075

It copied the schema over but not the data (if I use test2 as the provider_dsn then it does copy the data). I then tried inserting a row into a table on test1. Things crashed, and after crash recovery I keep getting:

2016-01-10 13:03:15 EST LOG: database system is ready to accept connections
2016-01-10 13:03:15 EST LOG: autovacuum launcher started
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST LOG: starting apply for subscription defaultsub
2016-01-10 13:03:15 EST test2LOG: starting logical decoding for slot "pgl_test3_test2_defaultsub"
2016-01-10 13:03:15 EST test2DETAIL: streaming transactions committing after 0/18292D8, reading WAL from 0/18292D8
2016-01-10 13:03:15 EST test2LOG: logical decoding found consistent point at 0/18292D8
2016-01-10 13:03:15 EST test2DETAIL: Logical decoding will begin using saved snapshot.
TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c", Line: 1127)
2016-01-10 13:03:15 EST test2LOG: unexpected EOF on standby connection
2016-01-10 13:03:15 EST LOG: worker process: pglogical apply 17016:2974019075 (PID 24746) was terminated by signal 6: Aborted

The stack trace is:

#3 0x007b83af in SearchCatCache (cache=0xe27d18, v1=15015784, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1127
#4 0x007c503e in SearchSysCache (cacheId=cacheId@entry=47, key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0, key4=key4@entry=0) at syscache.c:981
#5 0x006996d4 in replorigin_by_name (roname=0xe51f30 "pgl_test2_test1_defaultsub", missing_ok=missing_ok@entry=0 '\000') at origin.c:216
#6 0x7fdb54a908d3 in handle_origin (s=0x7ffd873f6da0) at pglogical_apply.c:235
#7 replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
#8 apply_work (streamConn=streamConn@entry=0xe84fb0) at pglogical_apply.c:1309
#9 0x7fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>) at pglogical_apply.c:1691
#10 0x00674912 in StartBackgroundWorker () at bgworker.c:726
#11 0x0067f7e2 in do_start_bgworker (rw=0xe03890) at postmaster.c:5501
#12 maybe_start_bgworker () at postmaster.c:5676
#13 0x00680206 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4937
#14 <signal handler called>
#15 0x7fdb54fa2293 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#16 0x00468285 in ServerLoop () at postmaster.c:1648
#17 0x0068161e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xddede0) at postmaster.c:1292
#18 0x0046979d in main (argc=3, argv=0xddede0) at main.c:223

I tried dropping the subscription and re-adding it. I keep getting:

2016-01-10 13:21:48 EST test1LOG: logical decoding found consistent point at 0/1830080
2016-01-10 13:21:48 EST test1DETAIL: There are no running transactions.
2016-01-10 13:21:48 EST test1LOG: exported logical decoding snapshot: "04DE-1" with 0 transaction IDs
2016-01-10 13:21:48 EST test3ERROR: relation "a" already exists
2016-01-10 13:21:48 EST test3STATEMENT: CREATE TABLE a (
a integer NOT NULL,
b integer
);
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a ssinger
pg_restore: [archiver (db)] could not execute query: ERROR: relation "a" already exists
Command was: CREATE TABLE a (
a integer NOT NULL,
b integer
);
2016-01-10 13:21:48 EST ERROR: could not execute command "/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data" --exit-on-error -1 -d "host=localhost dbname=test3 port=5436" "/tmp/pglogical-28079.dump""
2016-01-10 13:21:48 EST test1LOG: unexpected EOF on client connection with an open transaction
2016-01-10 13:21:48 EST LOG: worker process: pglogical apply 17016:844915593 (PID 28079) exited with exit code 1
2016-01-10 13:21:48 EST ERROR: subscriber defaultsub4 initialization failed duri
Re: [HACKERS] [PATCH 09/14] Adjust all *Satisfies routines to take a HeapTuple instead of a HeapTupleHeader
On 12-11-14 08:17 PM, Andres Freund wrote: For the regular satisfies routines this is needed in preparation for logical decoding. I changed the non-regular ones for consistency as well. The naming between htup, tuple and similar is rather confused; I could not find any consistent naming anywhere. This is preparatory work for the logical decoding feature, which needs to be able to get to a valid relfilenode when checking the visibility of a tuple.

I have taken a look at this patch. The patch does what it says: it changes a bunch of HeapTupleSatisfies routines to take a HeapTuple structure instead of a HeapTupleHeader. It also sets the HeapTuple.t_tableOid value before calling these routines. The patch does not modify these routines to actually do anything useful with the additional data fields, though it does add some assertions to make sure t_tableOid is set. The patch compiles cleanly and the unit tests pass. This patch does not seem to depend on any of the other patches in this set and applies cleanly against master. The patch doesn't actually add any functionality; unless someone sees a reason for complaining about this that I don't see, I think it can be committed.

Steve

---
 contrib/pgrowlocks/pgrowlocks.c      |  2 +-
 src/backend/access/heap/heapam.c     | 13 ++
 src/backend/access/heap/pruneheap.c  | 16 ++--
 src/backend/catalog/index.c          |  2 +-
 src/backend/commands/analyze.c       |  3 ++-
 src/backend/commands/cluster.c       |  2 +-
 src/backend/commands/vacuumlazy.c    |  3 ++-
 src/backend/storage/lmgr/predicate.c |  2 +-
 src/backend/utils/time/tqual.c       | 50 +---
 src/include/utils/snapshot.h         |  4 +--
 src/include/utils/tqual.h            | 20 +++
 11 files changed, 83 insertions(+), 34 deletions(-)
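To make the interface change concrete, here is the shape of it for one of the routines, with a typical call site; a sketch based on the review above, not a quote of the patch:

/* old */
extern bool HeapTupleSatisfiesMVCC(HeapTupleHeader tuple,
                                   Snapshot snapshot, Buffer buffer);
/* new */
extern bool HeapTupleSatisfiesMVCC(HeapTuple htup,
                                   Snapshot snapshot, Buffer buffer);

/* callers now wrap the on-page header in a HeapTupleData, filling in
 * t_tableOid, which is what the new assertions verify */
HeapTupleData tup;

tup.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tup.t_len = ItemIdGetLength(itemid);
tup.t_tableOid = RelationGetRelid(relation);
valid = HeapTupleSatisfiesMVCC(&tup, snapshot, buffer);

Carrying the owning relation's OID alongside the header is what later lets the historic-MVCC code map a tuple back to a relfilenode during decoding.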
Re: [HACKERS] logical changeset generation v3 - Source for Slony
First, you can add me to the list of people saying 'wow', I'm impressed. The approach I am taking to reviewing this is to try to answer the following questions:

1) How might a future version of slony be able to use logical replication as described by your patch and design documents, and what would that look like?
2) What functionality is missing from the patch set that would stop me from implementing or prototyping the above?

Connecting slon to the remote postgresql
========================================

Today the slony remote listener thread queries a bunch of events from sl_event for a batch of SYNC events. Then the remote helper thread queries data from sl_log_1 and sl_log_2. I see this changing: instead, the slony remote listener thread would connect to the remote system and get a logical replication stream.

1) Would slony connect as a normal client connection and call something like 'select pg_slony_process_xlog(...)' to get a bunch of logical replication change records to process, OR
2) Would slony connect as a replication connection, similar to how the pg_receivelog program does today, and then process the logical changeset outputs instead of writing them to a file (as pg_receivelog does)?

It seems that the second approach is what is encouraged. I think we would put a lot of the pg_receivelog functionality into slon and it would issue a command like INIT_LOGICAL_REPLICATION 'slony' to use the slony logical replication plugin. Slon would also have to provide feedback to the walsender about what it has processed, so the origin database knows which catalog snapshots can be expired. Based on eyeballing pg_receivelog.c, it seems that about half the code in the 700-line file is related to command line arguments etc., and the other half is related to looping over the copy out stream, sending feedback, and other things that we would need to duplicate in slon.

pg_receivelog.c has this comment:

/*
 * We have to use postgres.h not postgres_fe.h here, because there's so much
 * backend-only stuff in the XLOG include files we need. But we need a
 * frontend-ish environment otherwise. Hence this ugly hack.
 */

This looks like more of a carryover from pg_receivexlog.c. From what I can tell we can eliminate the postgres.h include if we also eliminate the utils/datetime.h and utils/timestamp.h includes, and instead add:

#include "postgres_fe.h"

#define POSTGRES_EPOCH_JDATE 2451545
#define UNIX_EPOCH_JDATE 2440588
#define SECS_PER_DAY 86400
#define USECS_PER_SEC INT64CONST(1000000)

typedef int64 XLogRecPtr;
#define InvalidXLogRecPtr 0

If there is a better way of getting these defines someone should speak up. I recall that in the past slon actually did include postgres.h and it caused some issues (I think with MSVC win32 builds). Since pg_receivelog.c will be used as a starting point/sample for third parties to write client programs, it would be better if it didn't encourage client programs to include postgres.h.

The Slony Output Plugin
=======================

Once we've modified slon to connect as a logical replication client we will need to write a slony plugin. As I understand the plugin API:

* A walsender is processing through WAL records; each time it sees a COMMIT WAL record it will call my plugin's
  .begin
  .change (for each change in the transaction)
  .commit
* The plugin for a particular stream/replication client will see one transaction at a time, passed to it in commit order. It won't see .change(t1) followed by .change(t2), followed by a second .change(t1).
The reorder buffer code hides me from all that complexity (yah).

From a slony point of view I think the output of the plugin will be rows, suitable to be passed to COPY IN, of the form (a sketch follows below):

origin_id, table_namespace, table_name, command_type, cmd_updatencols, command_args

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs, ie [ {id=1},{name='steve'},{project='slony'} ].

I don't think our output plugin will be much more complicated than the test_decoding plugin. I suspect we will want to give it the ability to filter out non-replicated tables. We will also have to filter out change records that didn't originate on the local node, when they aren't part of a cascaded subscription. Remember that in a two node cluster slony will have connections from A-->B and from B--->A even if user tables only flow one way. Data that is replicated from A into B will show up in the WAL stream for B. Exactly how we do this filtering is an open question; I think the output plugin will at a minimum need to know:

a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the pa
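To make that row format concrete, here is a minimal, purely illustrative sketch of how such a row could be rendered; the SlonyChange type and emit_change() are invented for this example and are not part of any plugin API:

    #include <stdio.h>

    /* Hypothetical, simplified representation of one decoded change; a real
     * plugin would get this information from the decoding callbacks. */
    typedef struct SlonyChange
    {
        int         origin_id;        /* slony node id the change came from */
        const char *table_namespace;
        const char *table_name;
        char        command_type;     /* 'I', 'U' or 'D' */
        int         ncols;
        const char *colnames[16];
        const char *colvalues[16];
    } SlonyChange;

    /* Emit one sl_log-style row suitable for COPY IN:
     * origin_id, namespace, name, command_type, ncols, {col1,val1,...}
     * (escaping of values is omitted to keep the sketch short). */
    static void
    emit_change(FILE *out, const SlonyChange *c)
    {
        int i;

        fprintf(out, "%d\t%s\t%s\t%c\t%d\t{",
                c->origin_id, c->table_namespace, c->table_name,
                c->command_type, c->ncols);
        for (i = 0; i < c->ncols; i++)
            fprintf(out, "%s%s,%s", i > 0 ? "," : "",
                    c->colnames[i], c->colvalues[i]);
        fprintf(out, "}\n");
    }

    int
    main(void)
    {
        SlonyChange c = {1, "public", "foo", 'I', 2,
                         {"id", "name"}, {"1", "'steve'"}};

        emit_change(stdout, &c);
        return 0;
    }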
Re: [HACKERS] logical changeset generation v3 - Source for Slony
On 12-11-18 11:07 AM, Andres Freund wrote: Hi Steve!

I think we should provide some glue code to do this, otherwise people will start replicating all the bugs I hacked into this... More seriously: I think we should have support code here, no user will want to learn the intricacies of feedback messages and such. Where would that live? No idea. libpglogicalrep.so?

I wholeheartedly agree. It should also be cleaned up a fair bit before others copy it, should we not go for having some client side library. Imo the library could very roughly be something like:

state = SetupStreamingLLog(replication-slot, ...);
while ((message = StreamingLLogNextMessage(state)))
{
    write(outfd, message->data, message->length);
    if (received_100_messages)
    {
        fsync(outfd);
        StreamingLLogConfirm(message);
    }
}

Although I guess that's not good enough because StreamingLLogNextMessage would be blocking, but that shouldn't be too hard to work around.

How about we pass a timeout value to StreamingLLogNextMessage(..) where it returns if no data is available after the timeout, to give the caller a chance to do something else (sketched below)?

This is basically the Slony 2.2 sl_log format minus a few columns we no longer need (txid, actionseq). command_args is a postgresql text array of column=value pairs. Ie [ {id=1},{name='steve'},{project='slony'}]

It seems to me that that makes escaping unnecessarily complicated, but given you already have all the code... ;)

When I look at the actual code/representation we picked, it is closer to {column1,value1,column2,value2...}

I don't think our output plugin will be much more complicated than the test_decoding plugin.

Good. That's the idea ;). Are you ok with the interface as it is now, or would you like to change something?

I'm going to think about this some more and maybe try to write an example plugin before I can say anything with confidence.

Yes. We will also need something like that. If you remember the first prototype we sent to the list, it included the concept of an 'origin_node' in a wal record. I think you actually reviewed that one ;) That was exactly aimed at something like this... Since then my thoughts about how the origin_id looks like have changed a bit:

- origin id is internally still represented as an uint32/Oid
  - never visible outside of wal/system catalogs
- externally visible it gets
  - assigned an uuid
  - optionally assigned a user defined name
- user settable (permissions?) origin when executing sql:
  - SET change_origin_uuid = 'uuid';
  - SET change_origin_name = 'user-settable-name';
  - defaults to the local node
- decoding callbacks get passed the origin of a change
  - txn->{origin_uuid, origin_name, origin_internal?}
- the init decoding callback can setup an array of interesting origins, so the others don't even get the ReorderBuffer treatment

I have to thank the discussion on -hackers and a march through Prague with Marko here...

So would the uuid and optional name assignment be done in the output plugin or somewhere else? When/how does the uuid get generated, and where do we store it so the same uuid gets returned when postgres restarts? Slony today stores all this type of stuff in user-level tables and user-level functions (because it has no other choice). What is the connection between these values and the 'slot-id' in your proposal for the init arguments? Does the slot-id need to be the external uuid of the other end, or is there no direct connection?
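Picking up the timeout suggestion from above, the hypothetical client loop might become something like this; all of these library names are still made up, continuing the sketch quoted earlier:

    /* StreamingLLogNextMessageTimeout() is assumed to return NULL when no
     * message arrives within the timeout (in ms), giving the caller a
     * chance to do other work between messages. */
    int nmessages = 0;

    for (;;)
    {
        LLogMessage *message = StreamingLLogNextMessageTimeout(state, 1000);

        if (message == NULL)
        {
            send_feedback(state);      /* or any other periodic work */
            continue;
        }

        write(outfd, message->data, message->length);

        if (++nmessages % 100 == 0)
        {
            fsync(outfd);
            StreamingLLogConfirm(message);
        }
    }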
Today slony allows us to replicate between two databases in the same postgresql cluster (I use this for testing all the time). Slony also allows two different 'slony clusters' to be set up in the same database (or so I'm told, I don't think I have ever tried this myself). Plugin functions that let me query the local database and then return the uuid and origin_name would work in this model.

+1 on being able to mark the 'change origin' in a SET command when the replication process is pushing data into the replica.

Exactly how we do this filtering is an open question; I think the output plugin will at a minimum need to know: a) What the slony node id is of the node it is running on. This is easy to figure out if the output plugin is able/allowed to query its database. Will this be possible? I would expect to be able to query the database as it exists now (at plugin invocation time), not as it existed in the past when the WAL was generated. In addition to the node ID I can see us wanting to be able to query other slony tables (sl_table, sl_set etc...)

Hm. There is no fundamental reason not to allow normal database access to the current database, but it won't be all that cheap, so doing it frequently is not a good idea. The reason it's not cheap is that you basically need to tear down the postgres internal caches if you switch the timestream in which you are working. Would go something like:

TransactionContext = AllocSetCreate(
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-11-14 08:17 PM, Andres Freund wrote:

I am getting errors like the following when I try to use either your test_decoding plugin or my own (which does even less than yours):

LOG: database system is ready to accept connections
LOG: autovacuum launcher started
WARNING: connecting to
WARNING: Initiating logical rep
LOG: computed new xmin: 773
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: got new xmin 773 at 25124280
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 0/17F5D58, scrolled back to 0/17F4000
LOG: found initial snapshot (via running xacts). Done: 1
FATAL: cannot read pg_class without having selected a database
TRAP: FailedAssertion("!(SHMQueueEmpty(&(MyProc->myProcLocks[i])))", File: "proc.c", Line: 759)

This seems to be happening under the calls at reorderbuffer.c:832:

if (!SnapBuildHasCatalogChanges(NULL, xid, &change->relnode))

The sequence of events I do is:
1. start pg_receivellog
2. run a checkpoint
3. attach to the walsender process with gdb
4. start a new client connection with psql and do 'INSERT INTO a values (1)' twice
(skipping step 3 doesn't make a difference)

This introduces several things:
* 'reorderbuffer' module which reassembles transactions from a stream of interspersed changes
* 'snapbuilder' which builds catalog snapshots so that tuples from wal can be understood
* logging more data into wal to facilitate logical decoding
* wal decoding into a reorderbuffer
* shared library output plugins with 5 callbacks
  * init
  * begin
  * change
  * commit
* walsender infrastructure to stream out changes and to keep the global xmin low enough
  * INIT_LOGICAL_REPLICATION $plugin; waits till a consistent snapshot is built and returns
    * initial LSN
    * replication slot identifier
    * id of a pg_export() style snapshot
  * START_LOGICAL_REPLICATION $id $lsn; streams out changes
  * uses named output plugins for output specification

Todo:
* testing infrastructure (isolationtester)
* persistence/spilling to disk of built snapshots, longrunning transactions
* user docs
* more frequent lowering of xmins
* more docs about the internals
* support for user declared catalog tables
* actual exporting of initial pg_export snapshots after INIT_LOGICAL_REPLICATION
* own shared memory segment instead of piggybacking on walsender's
* nicer interface between snapbuild.c, reorderbuffer.c, decode.c and the outside.
* more frequent xl_running_xid's so xmin can be upped more frequently
* add STOP_LOGICAL_REPLICATION $id

---
 src/backend/access/heap/heapam.c                |  280 +-
 src/backend/access/transam/xlog.c               |    1 +
 src/backend/catalog/index.c                     |   74 ++
 src/backend/replication/Makefile                |    2 +
 src/backend/replication/logical/Makefile        |   19 +
 src/backend/replication/logical/decode.c        |  496 ++
 src/backend/replication/logical/logicalfuncs.c  |  247 +
 src/backend/replication/logical/reorderbuffer.c | 1156 +++
 src/backend/replication/logical/snapbuild.c     | 1144 ++
 src/backend/replication/repl_gram.y             |   32 +-
 src/backend/replication/repl_scanner.l          |    2 +
 src/backend/replication/walsender.c             |  566 ++-
 src/backend/storage/ipc/procarray.c             |   23 +
 src/backend/storage/ipc/standby.c               |    8 +-
 src/backend/utils/cache/inval.c                 |    2 +-
 src/backend/utils/cache/relcache.c              |    3 +-
 src/backend/utils/misc/guc.c                    |   11 +
 src/backend/utils/time/tqual.c                  |  249 +
 src/bin/pg_controldata/pg_controldata.c         |    2 +
 src/include/access/heapam_xlog.h                |   23 +
 src/include/access/transam.h                    |    5 +
 src/include/access/xlog.h                       |    3 +-
 src/include/catalog/index.h                     |    4 +
 src/include/nodes/nodes.h                       |    2 +
 src/include/nodes/replnodes.h                   |   22 +
 src/include/replication/decode.h                |   21 +
 src/include/replication/logicalfuncs.h          |   44 +
 src/include/replication/output_plugin.h         |   76 ++
 src/include/replication/reorderbuffer.h         |  284 ++
 src/include/replication/snapbuild.h             |  128 +++
 src/include/replication/walsender.h             |    1 +
 src/include/replication/walsender_private.h     |   34 +-
 src/include/storage/itemptr.h                   |    3 +
 src/include/storage/sinval.h                    |    2 +
 src/include/utils/tqual.h                       |   31 +-
 35 files changed, 4966 insertions(+), 34 deletions(-)
 create mode 100644 src/backend/replication/logical/Makefile
 create mode 100644 src/backend/replication/logical/decode.c
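For concreteness, the walsender command flow described above could be driven from libpq roughly as follows; the connection-string form is an assumption (see the -d replication discussion further down), and the slot id and LSN are example values from this thread:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn;
        PGresult   *res;

        conn = PQconnectdb("host=localhost dbname=postgres replication=database");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Build a slot and wait for a consistent snapshot; per the
         * description above, the result carries the initial LSN, the slot
         * identifier and the id of an exportable snapshot. */
        res = PQexec(conn, "INIT_LOGICAL_REPLICATION 'test_decoding'");
        PQclear(res);

        /* Later, possibly from a different connection: stream changes. */
        res = PQexec(conn, "START_LOGICAL_REPLICATION 'id-321578-3' 0/17F5D58");
        /* ... consume the COPY BOTH stream, as pg_receivellog does ... */
        PQclear(res);

        PQfinish(conn);
        return 0;
    }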
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 07:42 AM, Andres Freund wrote: Two things: 1) Which exact options are you using for pg_receivellog? Not "-d replication" by any chance?

Yes, that is exactly what I'm doing. Using a real database name instead makes this go away. Thanks.

This seems to produce exactly that kind of error message. I added some error checking around that. Pushed.

Thanks!
Re: [HACKERS] [PATCH 11/14] Introduce wal decoding via catalog timetravel
On 12-12-03 09:48 AM, Andres Freund wrote: On 2012-12-03 09:35:55 -0500, Steve Singer wrote: On 12-12-03 07:42 AM, Andres Freund wrote:

Yes, that is exactly what I'm doing. Using a real database name instead makes this go away.

Was using "replication" an accident or do you think it makes sense in some way?

The 'replication' line in pg_hba.conf made me think that the database name had to be 'replication' for walsender connections.
Re: [HACKERS] Logical decoding & exported base snapshot
On 12-12-11 06:52 PM, Andres Freund wrote: Hi,

Problem 1: One problem I see is that while exporting a snapshot solves the visibility issues of the table's contents, it does not protect against schema changes. I am not sure whether that's a problem. If somebody runs a CLUSTER or something like that, the table's contents will be preserved, including MVCC semantics. That's fine. The more problematic cases I see are TRUNCATE, DROP and ALTER TABLE. Imagine the following:

S1: INIT_LOGICAL_REPLICATION
S1: get snapshot: 0333-1
S2: ALTER TABLE foo ALTER COLUMN blub text USING (blub::text);
S3: pg_dump --snapshot 0333-1
S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the ALTER TABLE but showing only rows visible to our snapshot. After START_LOGICAL_REPLICATION all changes after the xlog position from INIT_LOGICAL_REPLICATION will be returned though - including all the tuples from the ALTER TABLE and potentially - if some form of schema capturing was in place - the ALTER TABLE itself. The copied schema would have the new format already though. Does anybody see that as a problem or is it just a case of PEBKAC? One argument for the latter is that that's already a problematic case for normal pg_dumps. It's just that the window is a bit larger here.

Is there any way to detect this situation as part of the pg_dump? If I could detect this, abort my pg_dump, then go and get a new snapshot, then I don't see this as a big deal. I can live with telling users "don't do DDL-like things while subscribing a new node; if you do, the subscription will restart". I am less keen on telling users "don't do DDL-like things while subscribing a new node or the results will be unpredictable".

I'm trying to convince myself whether I will be able to take a pg_dump from an exported snapshot, plus the changes made between the snapshot and some later time, and turn the results into a consistent database. What if something like this comes along:

INIT REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot

The schema I get as part of the pg_dump won't have bar because it has been dropped, even though it will have the rows that existed with bar. I then go to process the INSERT statement. It will have a WAL record with column data for bar, and the logical replication replay will look up the catalog rows from before the ALTER TABLE, so it will generate a logical INSERT record with bar. That will fail on the replica. I'm worried that it will be difficult to programmatically stitch together the inconsistent snapshot from the pg_dump plus the logical records generated in between the snapshot and the dump (along with any record of the DDL, if it exists).

Problem 2: Control Flow. To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction needs to be alive. That's currently solved by exporting the snapshot in the walsender connection that did the INIT_LOGICAL_REPLICATION. The question is how long should we preserve that snapshot? There is no requirement - and there *cannot* be one in the general case, the slot needs to be usable after a restart - that START_LOGICAL_REPLICATION has to be run in the same replication connection as INIT. Possible solutions:

1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else.
2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to still be open. Opinions?

Option 2 sounds more flexible. Is it more difficult to implement?
Re: [HACKERS] Logical decoding & exported base snapshot
On 12-12-12 06:20 AM, Andres Freund wrote:

Possible solutions: 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else. 2) we remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command was received, as SET TRANSACTION SNAPSHOT requires the source transaction to still be open.

Option 2 sounds more flexible. Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires knowledge about logical replication in more parts of walsender, but it should be ok. Note btw, that my description of 1) was easy to misunderstand. The "that" in "Before that the walsender connection cannot be used for anything else." is the answer from the client, not the usage of the exported snapshot. Once the snapshot has been imported into other session(s), the source doesn't need to be alive anymore. Does that explanation change anything?

I think I understood: you were saying the walsender connection can't be used for anything else (i.e. streaming WAL) until the exported snapshot has been imported. I think your clarification is still consistent with this? With option 2 I can still get the option 1 behaviour by not sending the next command to the walsender until I am done importing the snapshot. However, if I want to start processing WAL before the snapshot has been imported, I can do that with option 2. I am not sure I would need to do that; I'm just saying having the option is more flexible.
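A rough libpq sketch of what option 2 would look like from the client side, assuming INIT_LOGICAL_REPLICATION has handed back a snapshot identifier; error handling is abbreviated:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Import the snapshot that the walsender exported, on a *normal*
     * connection, before sending anything else on the walsender
     * connection. snapshot_id is whatever the walsender handed back. */
    static int
    import_snapshot(PGconn *conn, const char *snapshot_id)
    {
        char        sql[256];
        PGresult   *res;

        res = PQexec(conn, "BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ");
        PQclear(res);

        /* Must happen while the source transaction still holds the snapshot. */
        snprintf(sql, sizeof(sql), "SET TRANSACTION SNAPSHOT '%s'", snapshot_id);
        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "import failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return -1;
        }
        PQclear(res);

        /* ... run the initial data COPY under the imported snapshot ... */
        return 0;
    }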
[HACKERS] AIX buildfarm member
The only animal in the buildfarm running AIX is grebe (http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=grebe&br=HEAD). It is likely that the server running this animal will go away sometime this year, and the machine replacing it isn't running AIX. If someone else in the community is running PostgreSQL on AIX, it would be good if they set up a buildfarm member, perhaps with a more recent version of AIX.

Steve
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-09 03:07 PM, Tom Lane wrote: Andres Freund writes:

Well, I *did* benchmark it as noted elsewhere in the thread, but that's obviously just one machine (E5520 x 2) with one rather restricted workload (pgbench -S -jc 40 -T60). At least it's rather palloc heavy. Here are the numbers:

before: 101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
after:  101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992

So on my system if there is a difference, it's positive (0.12%).

pgbench-based testing doesn't fill me with a lot of confidence for this --- its numbers contain a lot of communication overhead, not to mention that pgbench itself can be a bottleneck. It struck me that we have a recent test case that's known to be really palloc-intensive, namely Pavel's example here: http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwat8e2bd2pwnkm7i7...@mail.gmail.com

I set up a non-cassert build of commit 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that reduced the data-copying overhead for that). On my Fedora 16 machine (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2)) I get a runtime for Pavel's example of 17023 msec (average over five runs). I then applied oprofile and got a breakdown like this:

samples|      %|
------------------
 108409 84.5083 /home/tgl/testversion/bin/postgres
  13723 10.6975 /lib64/libc-2.14.90.so
   3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10960    10.1495  AllocSetAlloc
 6325     5.8572  MemoryContextAllocZeroAligned
 6225     5.7646  base_yyparse
 3765     3.4866  copyObject
 2511     2.3253  MemoryContextAlloc
 2292     2.1225  grouping_planner
 2044     1.8928  SearchCatCache
 1956     1.8113  core_yylex
 1763     1.6326  expression_tree_walker
 1347     1.2474  MemoryContextCreate
 1340     1.2409  check_stack_depth
 1276     1.1816  GetCachedPlan
 1175     1.0881  AllocSetFree
 1106     1.0242  GetSnapshotData
 1106     1.0242  _SPI_execute_plan
 1101     1.0196  extract_query_dependencies_walker

I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt. Now I get an average runtime of 1 ms, a full 2% faster, which is a bit astonishing, particularly because the oprofile results haven't moved much:

 107642 83.7427 /home/tgl/testversion/bin/postgres
  14677 11.4183 /lib64/libc-2.14.90.so
   3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so

samples  %        symbol name
10038     9.3537  AllocSetAlloc
 6392     5.9562  MemoryContextAllocZeroAligned
 5763     5.3701  base_yyparse
 4810     4.4821  copyObject
 2268     2.1134  grouping_planner
 2178     2.0295  core_yylex
 1963     1.8292  palloc
 1867     1.7397  SearchCatCache
 1835     1.7099  expression_tree_walker
 1551     1.4453  check_stack_depth
 1374     1.2803  _SPI_execute_plan
 1282     1.1946  MemoryContextCreate
 1187     1.1061  AllocSetFree
...
  653     0.6085  palloc0
...
  552     0.5144  MemoryContextAlloc

The number of calls of AllocSetAlloc certainly hasn't changed at all, so how did that get faster? I notice that the postgres executable is about 0.2% smaller, presumably because a whole lot of inlined fetches of CurrentMemoryContext are gone. This makes me wonder if my result is due to chance improvements of cache line alignment for inner loops. I would like to know if other people get comparable results on other hardware (non-Intel hardware would be especially interesting). If this result holds up across a range of platforms, I'll withdraw my objection to making palloc a plain function.

regards, tom lane

Sorry for the delay; I only read this thread today.
I just tried Pavel's test on a POWER5 machine with an older version of gcc (see the grebe buildfarm animal for details):

78a5e738e: 37874.855 (average of 6 runs)
78a5e738 + palloc.h + mcxt.c: 38076.8035

The functions do seem to slightly slow things down on POWER. I haven't bothered to run oprofile or tprof to get a breakdown of the functions since Andres has already removed this from his patch.

Steve
Re: [HACKERS] logical changeset generation v4
On 13-01-14 08:38 PM, Andres Freund wrote: Hi everyone, Here is the newest version of logical changeset generation.

2) Currently the logical replication infrastructure assigns a 'slot-id' when a new replica is setup. That slot id isn't really nice (e.g. "id-321578-3"). It also requires that [18] keeps state in a global variable to make writing regression tests easy. I think it would be better to make the user specify those replication slot ids, but I am not really sure about it.

Shortly after trying out the latest version I hit the following scenario:

1. I started pg_receivellog but mistyped the name of my plugin
2. It looped and used up all of my logical replication slots

I killed pg_receivellog and restarted it with the correct plugin name, but it won't do anything because I have no free slots. I can't free the slots with -F because I have no clue what the names of the slots are. I can figure the names out by looking in pg_llog, but my replication program can't do that, so it won't be able to clean up after a failed attempt. I agree with you that we should make the user program specify a slot; we eventually might want to provide a view that shows the currently allocated slots.

For a logical based slony I would just generate the slot name based on the remote node id (see the sketch below). If walsender generates the slot name then I would need to store a mapping between slot names and slons, so when a slon restarted it would know which slot to resume using. I'd have to use a table in the slony schema on the remote database for this. There would always be a risk of losing track of a slot id if the slon crashed after getting the slot number but before committing the mapping on the remote database.

3) Currently no options can be passed to an output plugin. I am thinking about making "INIT_LOGICAL_REPLICATION 'plugin'" accept the now widely used ('option' ['value'], ...) syntax and pass that to the output plugin's initialization function.

I think we discussed this last CF; I like this idea.

4) Does anybody object to:

-- allocate a permanent replication slot
INIT_LOGICAL_REPLICATION 'plugin' 'slotname' (options);
-- stream data
START_LOGICAL_REPLICATION 'slotname' 'recptr';
-- deallocate a permanent replication slot
FREE_LOGICAL_REPLICATION 'slotname';

+1

5) Currently it's only allowed to access catalog tables; it's fairly trivial to extend this to additional tables if you can accept some (noticeable but not too big) overhead for modifications on those tables. I was thinking of making that an option for tables; that would be useful for replication solutions' configuration tables.

I think this will make the life of anyone developing a new replication system easier. Slony has a lot of infrastructure for allowing slonik scripts to wait for configuration changes to propagate everywhere before making other configuration changes, because you can get race conditions. If I were designing a new replication system and I had this feature, I would try to use it to come up with a simpler model of propagating configuration changes.

Steve
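A tiny illustration of the slot-naming idea above; cluster_name and remote_node_id are assumed to be values the slon already knows:

    /* Illustrative only: derive the slot name from the slony cluster name
     * and the remote node id, so a restarted slon can recompute it without
     * storing a mapping anywhere. */
    char slotname[NAMEDATALEN];

    snprintf(slotname, sizeof(slotname), "slony_%s_node_%d",
             cluster_name, remote_node_id);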
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 02:28 AM, Andres Freund wrote: I haven't removed it from the patch afaik, so it would be great to get a profile here! It's "only" for xlogdump, but that tool helped me immensely and I don't want to maintain it independently...

Here is the output from tprof. Here is the baseline:

Average time 37862

Total Ticks For All Processes (./postgres_perftest) = 19089

Subroutine                  Ticks  %     Source           Address  Bytes
==========                  =====  ====  ======           =======  =====
.AllocSetAlloc               2270  0.99  aset.c           3bfb84b0
.base_yyparse                1217  0.53  gram.c           fe644 168ec
.text                        1015  0.44  copyfuncs.c      11bfb4 4430
.MemoryContextAllocZero       535  0.23  mcxt.c           3b990 f0
.MemoryContextAlloc           533  0.23  mcxt.c           3b780 e0
.check_stack_depth            392  0.17  postgres.c       568ac100
.core_yylex                   385  0.17  gram.c           fb5b4 1c90
.expression_tree_walker       347  0.15  nodeFuncs.c      50da4750
.AllocSetFree                 308  0.13  aset.c           3c9981b0
.grouping_planner             242  0.11  planner.c        2399f0 1d10
.SearchCatCache               198  0.09  catcache.c       7ec20550
._SPI_execute_plan            195  0.09  spi.c            2f0c187c0
.pfree                        195  0.09  mcxt.c           3b3b0 70
query_dependencies_walker     185  0.08  setrefs.c        2559441b0
.GetSnapshotData              183  0.08  procarray.c      69efc460
.query_tree_walker            176  0.08  nodeFuncs.c      50ae4210
.strncpy                      168  0.07  strncpy.s        ba080130
.fmgr_info_cxt_security       166  0.07  fmgr.c           3f7b0850
.transformStmt                159  0.07  analyze.c        29091c 12d0
.text                         141  0.06  parse_collate.c  28ddf8 1
.ExecInitExpr                 137  0.06  execQual.c       17f18c 15f0
.fix_expr_common              132  0.06  setrefs.c        2557e4160
.standard_ExecutorStart       127  0.06  execMain.c       1d9a00940
.GetCachedPlan                125  0.05  plancache.c      ce664310
.strcpy                       121  0.05  noname           3bd401a8

With your changes (same test as I described before):

Average: 37938

Total Ticks For All Processes (./postgres_perftest) = 19311

Subroutine                  Ticks  %     Source           Address  Bytes
==========                  =====  ====  ======           =======  =====
.AllocSetAlloc               2162  2.17  aset.c           3bfb84b0
.base_yyparse                1242  1.25  gram.c           fdc7c 167f0
.text                        1028  1.03  copyfuncs.c      11b4d0 4210
.palloc                       553  0.56  mcxt.c           3b4c8 d0
.MemoryContextAllocZero       509  0.51  mcxt.c           3b9e8 f0
.core_yylex                   413  0.41  gram.c           fac2c 1c60
.check_stack_depth            404  0.41  postgres.c       56730100
.expression_tree_walker       320  0.32  nodeFuncs.c      50d28750
.AllocSetFree                 261  0.26  aset.c           3c9981b0
._SPI_execute_plan            232  0.23  spi.c            2ee6247c0
.GetSnapshotData              221  0.22  procarray.c      69d54460
.grouping_planner             211  0.21  planner.c        237b60 1cf0
.MemoryContextAlloc           190  0.19  mcxt.c           3b738 e0
.query_tree_walker            184  0.18  nodeFuncs.c      50a68210
.SearchCatCache               182  0.18  catcache.c       7ea08550
.transformStmt                181  0.18  analyze.c        28e774 12d0
query_dependencies_walker     180  0.18  setrefs.c        25397c1b0
.strncpy                      175  0.18  strncpy.s        b9a60130
.MemoryContextCreate          167  0.17  mcxt.c           3bad8160
.pfree                        151  0.15  mcxt.c           3b208 70
.strcpy                       150  0.15  noname           3bd401a8
.fmgr_info_cxt_security       146  0.15  fmgr.c           3f790850
.text                         132  0.13  parse_collate.c  28bc50 1
.ExecInitExpr                 125  0.13  execQual.c       17de28 15e0
.expression_tree_mutator      124  0.12  nodeFuncs.c      53268 1080
.strcmp                       122  0.12  noname           1e44158
.fix_expr_common              117  0.12  setrefs.c        25381c160
Re: [HACKERS] Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
On 13-01-21 12:15 PM, Andres Freund wrote: On 2013-01-21 11:59:18 -0500, Steve Singer wrote: On 13-01-21 02:28 AM, Andres Freund wrote:

I haven't removed it from the patch afaik, so it would be great to get a profile here! It's "only" for xlogdump, but that tool helped me immensely and I don't want to maintain it independently...

Here is the output from tprof. Here is the baseline:

Any chance to do the test on top of HEAD? The elog stuff has gone in only afterwards and might actually affect this enough to be relevant.

HEAD (fb197290):

Average: 28877

.AllocSetAlloc               1442  1.90  aset.c           3a9384b0
.base_yyparse                1220  1.61  gram.c           fdbc0 168ec
.MemoryContextAlloc           485  0.64  mcxt.c           3a154 e0
.core_yylex                   407  0.54  gram.c           fab30 1c90
.AllocSetFree                 320  0.42  aset.c           3b3181b0
.MemoryContextAllocZero       316  0.42  mcxt.c           3a364 f0
.grouping_planner             271  0.36  planner.c        2b0ce8 1d10
.strncpy                      256  0.34  strncpy.s        b8ca0130
.expression_tree_walker       222  0.29  nodeFuncs.c      4f734750
._SPI_execute_plan            221  0.29  spi.c            2fb230840
.SearchCatCache               182  0.24  catcache.c       7d870550
.GetSnapshotData              161  0.21  procarray.c      68acc460
.fmgr_info_cxt_security       155  0.20  fmgr.c           3e130850
.pfree                        152  0.20  mcxt.c           39d84 70
.expression_tree_mutator      151  0.20  nodeFuncs.c      51c84 1170
.check_stack_depth            142  0.19  postgres.c       5523c100
.text                         126  0.17  parse_collate.c  233d90 1
.transformStmt                125  0.16  analyze.c        289e88 12c0
.ScanKeywordLookup            123  0.16  kwlookup.c       f7ab4154
.new_list                     120  0.16  list.c           40f74 b0
.subquery_planner             115  0.15  planner.c        2b29f8dc0
.GetCachedPlan                115  0.15  plancache.c      cdab0310
.ExecInitExpr                 114  0.15  execQual.c       17e690 15f0
.set_plan_refs                113  0.15  setrefs.c        252630cb0
.standard_ExecutorStart       110  0.14  execMain.c       1d9264940
.heap_form_tuple              107  0.14  heaptuple.c      177c842f0
.query_tree_walker            105  0.14  nodeFuncs.c      4f474210

HEAD + patch:

Average: 29035

Total Ticks For All Processes (./postgres_perftest) = 15044

Subroutine                  Ticks  %     Source           Address  Bytes
==========                  =====  ====  ======           =======  =====
.AllocSetAlloc               1422  1.64  aset.c           3a9384b0
.base_yyparse                1201  1.38  gram.c           fd1e8 167f0
.palloc                       470  0.54  mcxt.c           39e04 d0
.core_yylex                   364  0.42  gram.c           fa198 1c60
.MemoryContextAllocZero       282  0.33  mcxt.c           3a324 f0
._SPI_execute_plan            250  0.29  spi.c            2f8c18840
.AllocSetFree                 244  0.28  aset.c           3b3181b0
.strncpy                      234  0.27  strncpy.s        b86a0130
.expression_tree_walker       229  0.26  nodeFuncs.c      4f698750
.grouping_planner             176  0.20  planner.c        2aea84 1d30
.SearchCatCache               170  0.20  catcache.c       7d638550
.GetSnapshotData              168  0.19  procarray.c      68904460
.assign_collations_walker     155  0.18  parse_collate.c  231f4c7e0
.subquery_planner             141  0.16  planner.c        2b07b4dc0
.fmgr_info_cxt_security       141  0.16  fmgr.c           3e110850
.check_stack_depth            140  0.16  postgres.c       550a0100
.ExecInitExpr                 138  0.16  execQual.c       17d2f8 15e0
.pfree                        137  0.16  mcxt.c           39b44 70
.transformStmt                132  0.15  analyze.c        287dec 12c0
.new_list                     128  0.15  list.c           40f44 90
.expression_tree_mutator      125  0.14  nodeFuncs.c      51bd8 1080
.preprocess_expression        118  0.14  planner.c        2adf541a0
.strcmp                       118  0.14  noname           1e44158
.standard_ExecutorStart       115  0.13  execMain.c       1d77c0940
.MemoryContextAlloc           111  0.13  mcxt.c           3a074 e0
.set_plan_refs                109  0.13  setrefs.c        2506c4ca0

Otherwise I have to say I am a bit confused by the mighty difference in costs attributed to AllocSetAlloc, given the number of calls should be exactly the same and the number of "trampoline" function calls should also stay the same.
Greetings, Andres Freund
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure, but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin, and with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user-specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this.

BTW, why does all the transaction reordering stuff have to be in core?

It didn't use to, but people argued pretty damned hard that no undecoded data should ever be allowed to leave the postgres cluster. And to be fair it makes writing an output plugin *way* easier. Check http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/test_decoding/test_decoding.c;hb=xlog-decoding-rebasing-cf4 If you skip over tuple_to_stringinfo(), which is just pretty generic scaffolding for converting a whole tuple to a string, writing out the changes in some format by now is pretty damn simple.

I think we will find that the replication systems won't be the only users of this feature. I have often seen systems that have a logging requirement for auditing purposes, or that log and then reconstruct the sequence of changes made to a set of tables in order to feed a downstream application. Triggers and a journaling table are the traditional way of doing this, but it should be pretty easy to write a plugin that accomplishes the same thing with better performance. If the reordering stuff wasn't in core this would be much harder.

How much of this infrastructure is to support replicating DDL changes? IOW, if we drop that requirement, how much code can we slash?

Unfortunately I don't think too much, unless we add in other code that allows us to check whether the current definition of a table is still the same as it was back when the tuple was logged.

Any other features or requirements that could be dropped? I think it's clear at this stage that this patch is not going to be committed as it is. If you can reduce it to a fraction of what it is now, that fraction might have a chance. Otherwise, it's just going to be pushed to the next commitfest as a whole, and we're going to be having the same doubts and discussions then.

One thing that reduces complexity is to declare the following as unsupported:

- CREATE TABLE foo(data text);
- DECODE UP TO HERE;
- INSERT INTO foo(data) VALUES(very-long-to-be-externally-toasted-tuple);
- DROP TABLE foo;
- DECODE UP TO HERE;

but that's just a minor thing.
I think what we can do more realistically than to chop off required parts of changeset extraction is to start applying some of the preliminary patches independently:

- the relmapper/relfilenode changes + pg_relation_by_filenode(spc, relnode) should be independently committable, if a bit boring
- allowing walsenders to connect to a database possibly needs an interface change, but otherwise it should be fine to go in independently. It also has other potential use-cases, so I think that's fair.
- logging xl_running_xact's more frequently could also be committed independently, and makes sense independently as it allows a standby to enter HS faster if the master is busy
- Introducing InvalidCommandId should be relatively uncontroversial. The fact that no invalid value for command ids exists is imo an oversight.
- the *Satisfies changes could be applied, and imo they are ready, but there's no use-case for them without the rest, so I am not sure whether there's a point
- currently not separately available, but we could add wal_level=logical independently. There would be no user of it, but it would be partial work. That includes the relcache support for keeping track of the primary key, which already is available separately.

Greetings, Andres Freund
Re: [HACKERS] Event Triggers: adding information
On 13-01-24 05:43 AM, Dimitri Fontaine wrote: Robert Haas writes: On Mon, Jan 21, 2013 at 12:27 PM, Dimitri Fontaine wrote:

- Context exposing and filtering

I'm not feeling very sanguine about any of this. I feel strongly that we should try to do something that's more principled than just throwing random junk into ProcessUtility.

The goal is to allow advanced users of that feature to get at the sequence, constraint and index names that have been picked automatically by PostgreSQL when doing some CREATE TABLE statement that involves them:

CREATE TABLE foo (id serial PRIMARY KEY, f1 text);

Andres and Steve, as you're the next ones to benefit directly from that whole feature, and at different levels, do you have a strong opinion on whether you really need that to operate at the logical level?

I haven't been following this thread very closely and I'm not exactly sure what you're asking. If the question is what a replication system would need to replicate DDL commands, then I need a method of translating whatever the DDL trigger is passed into either the 'CREATE TABLE public.foo(id serial primary key, f1 text)' OR some set of equivalent commands that will allow me to create the same objects on the replica. I don't have any brilliant insight on how I would go about doing it. Honestly I haven't spent a lot of time thinking about it in the past 8 months.

If you're asking what I would need for a trigger to automatically add a new table to replication, then I think I would only need to know (or be able to obtain) the fully qualified name of the table, and then in another trigger call be given the fully qualified name of the sequence. The triggers would need to be able to do DML to configuration tables. I don't need to know that a table and sequence are connected if all I want to do is replicate them.
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure, but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin, and with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user-specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to my progress at the end of the weekend. My big issue is that I have limited time to spend on this.

This isn't a complete review, just a few questions I've hit so far that I thought I'd ask to see if I'm missing something related to updates.

*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
*** extern bool ReindexIsProcessingHeap(Oid
*** 114,117 ****
--- 114,121 ----
  extern bool ReindexIsProcessingIndex(Oid indexOid);
  extern Oid IndexGetRelation(Oid indexId, bool missing_ok);
+ extern void relationFindPrimaryKey(Relation pkrel, Oid *indexOid,
+                                    int16 *nratts, int16 *attnums, Oid *atttypids,
+                                    Oid *opclasses);
+
  #endif /* INDEX_H */

I don't see this defined anywhere; could it be left over from a previous version of the patch?

In decode.c, DecodeUpdate:

+
+	/*
+	 * FIXME: need to get/save the old tuple as well if we want primary key
+	 * changes to work.
+	 */
+	change->newtuple = ReorderBufferGetTupleBuf(reorder);

I also don't see any code in heap_update to find and save the old primary key values like you added to heap_delete. You didn't list "Add ability to change the primary key on an UPDATE" in the TODO, so I'm wondering if I'm missing something. Is there another way I can get the primary key values for the old_tuple?

Also, I think the name of the test contrib module was changed but you didn't update the makefile. This fixes it:

diff --git a/contrib/Makefile b/contrib/Makefile
index 1cc30fe..36e6bfe 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -50,7 +50,7 @@ SUBDIRS = \
 		tcn \
 		test_parser \
 		test_decoding \
-		test_logical_replication \
+		test_logical_decoding \
 		tsearch2 \
 		unaccent \
 		vacuumlo \
Re: [HACKERS] Event Triggers: adding information
On 13-01-26 11:11 PM, Robert Haas wrote: On Fri, Jan 25, 2013 at 11:58 AM, Dimitri Fontaine wrote:

My understanding is that if the command string we give to event triggers is ambiguous (sub-object names, schema qualifications, etc), it becomes useless for logical replication use. I'll leave it to the consumers of that to speak up now.

Yeah, that's probably true. I think it might be useful for other purposes, but I think we need a bunch of infrastructure we don't have yet to make logical replication of DDL a reality.

I agree. Does anyone have a specific use case other than DDL replication where an ambiguous command string would be useful? Even for use cases like automatically removing a table from replication when it is dropped, I would want to be able to determine which table is being dropped unambiguously. Could I determine that from an oid? I suspect so, but parsing a command string and then trying to figure out the table from the search_path doesn't sound very appealing.

Well, the point is that if you have a function that maps a parse tree onto an object name, any API or ABI changes can be reflected in an updated definition for that function. So suppose I have the command "CREATE TABLE public.foo (a int)". And we have a call pg_target_object_namespace(), which will return "public" given the parse tree for the foregoing command. And we have a call pg_target_object_name(), which will return "foo". We can whack around the underlying parse tree representation all we want and still not break anything - because any imaginable parse tree representation will allow the object name and object namespace to be extracted. Were that not possible it could scarcely be called a parse tree any longer.

How do you get the fully qualified type of the first column? Something like:

col1 = pg_target_get_column(x, 0);
pg_target_get_type(col1);

or something similar. I think that could work, but we would be adding a lot of API functions to get all the various bits of info one would want the API to expose. I also suspect executing triggers that had to make lots of function calls to walk a tree would be much slower than an extension that could just walk the parse tree or some other abstract tree-like structure.

Steve
Re: [HACKERS] logical changeset generation v4
On 13-01-22 11:30 AM, Andres Freund wrote: Hi, I pushed a new rebased version (the xlogreader commit made it annoying to merge). The main improvements are:
* way more coherent code internally for initializing logical rep
* explicit control over slots
* options for logical replication

Exactly what is the syntax for using that? My reading of your changes to repl_gram.y makes me think that any of the following should work (but they don't):

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1')
ERROR: syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1' 'val1')
ERROR: syntax error: unexpected character "("

START_LOGICAL_REPLICATION 'slon1' 0/0 ('opt1','opt2')
ERROR: syntax error: unexpected character "("

I'm also attaching a patch to pg_receivellog that allows you to specify these options on the command line. I'm not saying I think that it is appropriate to be adding more bells and whistles to the utilities two weeks into the CF, but I found this useful for testing, so I'm sharing it.

>From 176087bacec6cbf0b86e4ffeb918f41b4a5b8d7a Mon Sep 17 00:00:00 2001
From: Steve Singer
Date: Sun, 27 Jan 2013 12:24:33 -0500
Subject: [PATCH] allow pg_receivellog to pass plugin options from the command
 line to the plugin

---
 src/bin/pg_basebackup/pg_receivellog.c | 14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivellog.c b/src/bin/pg_basebackup/pg_receivellog.c
index 04bedbe..30b3cea 100644
--- a/src/bin/pg_basebackup/pg_receivellog.c
+++ b/src/bin/pg_basebackup/pg_receivellog.c
@@ -54,7 +54,7 @@ static XLogRecPtr startpos;
 static bool do_init_slot = false;
 static bool do_start_slot = false;
 static bool do_stop_slot = false;
-
+static const char * plugin_opts="";
 
 static void usage(void);
 static void StreamLog();
@@ -84,6 +84,7 @@ usage(void)
 	printf(_("  -s, --status-interval=INTERVAL\n"
 			 "                         time between status packets sent to server (in seconds)\n"));
 	printf(_("  -S, --slot=SLOT        use existing replication slot SLOT instead of starting a new one\n"));
+	printf(_("  -o, --options=OPTIONS  a comma separated list of options to the plugin\n"));
 	printf(_("\nAction to be performed:\n"));
 	printf(_("      --init             initiate a new replication slot (for the slotname see --slot)\n"));
 	printf(_("      --start            start streaming in a replication slot (for the slotname see --slot)\n"));
@@ -264,8 +265,8 @@ StreamLog(void)
 			 slot);
 
 	/* Initiate the replication stream at specified location */
-	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X",
-			 slot, (uint32) (startpos >> 32), (uint32) startpos);
+	snprintf(query, sizeof(query), "START_LOGICAL_REPLICATION '%s' %X/%X (%s)",
+			 slot, (uint32) (startpos >> 32), (uint32) startpos, plugin_opts);
 	res = PQexec(conn, query);
 	if (PQresultStatus(res) != PGRES_COPY_BOTH)
 	{
@@ -560,6 +561,7 @@ main(int argc, char **argv)
 		{"init", no_argument, NULL, 1},
 		{"start", no_argument, NULL, 2},
 		{"stop", no_argument, NULL, 3},
+		{"options", required_argument, NULL, 'o'},
 		{NULL, 0, NULL, 0}
 	};
 	int			c;
@@ -584,7 +586,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:",
+	while ((c = getopt_long(argc, argv, "f:nvd:h:p:U:wWP:s:S:o:",
 							long_options, &option_index)) != -1)
 	{
 		switch (c)
@@ -659,6 +661,10 @@ main(int argc, char **argv)
 			case 3:
 				do_stop_slot = true;
 				break;
+			case 'o':
+				if (optarg != NULL)
+					plugin_opts = pg_strdup(optarg);
+				break;
 
 			/* action */
 			default:
-- 
1.7.0.4
Re: [HACKERS] Patch to add a primary key using an existing index
On 10-11-22 03:24 PM, Steve Singer wrote: On 10-11-22 09:37 AM, Gurjeet Singh wrote: On Sat, Nov 20, 2010 at 9:00 AM, Steve Singer

Almost fixed. I still get an unexpected difference:

! DETAIL: cannot create PRIMARY KEY/UNIQUE constraint with a non-unique index.
CREATE UNIQUE INDEX rpi_idx2 ON rpi_test(a, b);
-- should fail; WITH INDEX option specified more than once.
ALTER TABLE rpi_test ADD PRIMARY KEY (a, b)
--- 35,41 ----
-- should fail; non-unique
ALTER TABLE rpi_test ADD primary key(a, b) WITH (INDEX = 'rpi_idx1');
ERROR: "rpi_idx1" is not a unique index
! DETAIL: Cannot create PRIMARY KEY/UNIQUE constraint using a non-unique index.

The attached version of the patch gets your regression tests to pass. I'm going to mark this as ready for a committer.

[Attachment: replace_pkey_index.revised2.patch.gz]
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 06:56 PM, Greg Smith wrote: Tom Lane wrote: The various testing that's been reported so far is all for Linux and thus doesn't directly address the question of whether other kernels will have similar performance properties.

Survey of some popular platforms: So my guess is that some small percentage of Windows users might notice a change here, and some testing on FreeBSD would be useful too. That's about it for platforms that I think anybody needs to worry about.

If you tell me which options to pgbench and which .conf file settings you'd like to see, I can probably arrange to run some tests on AIX.
Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4
On 10-12-06 09:00 PM, Josh Berkus wrote: Steve, If you tell me which options to pgbench and which .conf file settings you'd like to see I can probably arrange to run some tests on AIX. Compile and run test_fsync in PGSRC/src/tools/fsync.

Attached are runs against two different disk sub-systems from a server running AIX 5.3. The first one is against the local disks:

Loops = 1

Simple write:
        8k write                      60812.454/second

Compare file sync methods using one write:
        open_datasync 8k write          162.160/second
        open_sync 8k write              158.472/second
        8k write, fdatasync             158.157/second
        8k write, fsync                  45.382/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes        79.472/second
        2 open_sync 8k writes            80.095/second
        8k write, 8k write, fdatasync   159.268/second
        8k write, 8k write, fsync        44.725/second

Compare open_sync with different sizes:
        open_sync 16k write             162.017/second
        2 open_sync 8k writes            79.709/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
        8k write, fsync, close           45.361/second
        8k write, close, fsync           36.311/second

The below profile is from the same machine using an IBM DS 6800 SAN for storage:

Loops = 1

Simple write:
        8k write                      75933.027/second

Compare file sync methods using one write:
        open_datasync 8k write         2762.801/second
        open_sync 8k write             2453.822/second
        8k write, fdatasync            2867.331/second
        8k write, fsync                1094.048/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes      1287.845/second
        2 open_sync 8k writes          1332.084/second
        8k write, 8k write, fdatasync  1966.411/second
        8k write, 8k write, fsync      1048.354/second

Compare open_sync with different sizes:
        open_sync 16k write            2281.425/second
        2 open_sync 8k writes          1401.561/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different descriptor.)
        8k write, fsync, close         1298.404/second
        8k write, close, fsync         1188.582/second
Re: [HACKERS] Patch to add a primary key using an existing index
I've taken a look at this version of the patch.

Submission Review
-----------------
This version of the patch applies cleanly to master. It matches your git repo and includes tests + docs.

Usability Review
----------------
The command syntax now matches what was discussed during the last CF. The text of the notice:

test=# alter table a add constraint acons unique using index aind2;
NOTICE:  ALTER TABLE / ADD UNIQUE USING INDEX will rename index "aind2" to "acons"

Documentation
-------------
I've attached a patch (to be applied on top of your latest patch) with some editorial changes I'd recommend to your documentation. I feel it reads a bit more clearly (but others should chime in if they disagree or have better wordings).

A git tree with the changes rebased to master plus this change is available at https://github.com/ssinger/postgres/tree/ssinger/constraint_with_index

Code Review
-----------
src/backend/parser/parse_utilcmd.c:1452

You're calling strdup on the attribute name. I don't have a good enough grasp on the code to trace this through to where the memory gets freed. Does it get freed? Should/could this be a call to pstrdup?

Feature Test
------------
I wasn't able to find any issues in my testing.

I'm marking this as returned with feedback pending your answer on the possible memory leak above, but I think the patch is very close to being ready.

Steve Singer

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 83d2fbb..0b486ab 100644
*** a/doc/src/sgml/ref/alter_table.sgml
--- b/doc/src/sgml/ref/alter_table.sgml
*** ALTER TABLE table_constraint_using_index
! This form adds a new PRIMARY KEY or UNIQUE constraint to the table
  using an existing index. All the columns of the index will be included
  in the constraint.
! The index should be UNIQUE, and should not be a partial index
! or an expressional index.
! This can be helpful in situations where one wishes to recreate or
! REINDEX the index of a PRIMARY KEY or a
! UNIQUE constraint, but dropping and recreating the constraint
! to acheive the effect is not desirable. See the illustrative example below.
  If a constraint name is provided then the index will be renamed to that
! name, else the constraint will be named to match the index name.
--- 242,270 ----
  ADD table_constraint_using_index
! This form adds a new PRIMARY KEY or UNIQUE constraint to the table
  using an existing index. All the columns of the index will be included
  in the constraint.
! The index should be a UNIQUE index. A partial index
! or an expressional index is not allowed.
! Adding a constraint using an existing index can be helpful in situations
! where you wish to rebuild an index used for a
! PRIMARY KEY or a UNIQUE constraint,
! but dropping and recreating the constraint
! is not desirable. See the illustrative example below.
  If a constraint name is provided then the index will be renamed to that
! name of the constraint. Otherwise the constraint will be named to match
! the index name.
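The rebuild scenario the documentation describes can be sketched end to end (a hedged example using the syntax under review here; the table and index names are made up):

  -- Existing table whose primary key index we want to rebuild:
  CREATE TABLE t (id int PRIMARY KEY);

  -- Build a replacement index without blocking writes:
  CREATE UNIQUE INDEX CONCURRENTLY t_id_new ON t (id);

  -- Swap the constraint over to the new index; since no constraint name
  -- differs, the index is renamed to t_pkey with a NOTICE:
  BEGIN;
  ALTER TABLE t DROP CONSTRAINT t_pkey;
  ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY USING INDEX t_id_new;
  COMMIT;

The window where the table lacks its constraint is confined to the final short transaction, which is the point of the feature.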
Re: [HACKERS] log_hostname and pg_stat_activity
Here is my review of this patch.

Submission Review
-----------------
- Patch applies cleanly.
- Patch does not include documentation changes. At a minimum: update the table that lists what pg_stat_activity and pg_stat_replication include in monitoring.sgml, but I propose more below.
- No tests are included, but writing regression tests that depend on producing the same output involving the client's hostname is not possible.

Usability Review
----------------
See my comments below in the testing section. The patch does do what it says, but the log_hostname issue is a usability issue (it not being obvious why you get only null when log_hostname is turned off). Documenting it might be fine. If log_hostname were new to 9.1 I would encourage renaming it to something that implies it does more than just control logging output, but I don't see this as a good enough reason to rename an existing parameter.

I think being able to see the hostnames that connections in pg_stat_activity come from is useful, and it is a feature we don't already have.

Feature Test
------------
If my connection is authorized through a line in pg_hba that matches on a hostname, then the column shows what I expect even with log_hostname set to off. However, if I connect with a line in pg_hba that matches on an IP network, then my client_hostname is always null unless log_hostname is set to true. This is consistent with the behavior you describe, but I think the average user will find it a bit confusing. Having a column that is always null unless a GUC is set is less than ideal, but I understand why log_hostname isn't on by default. I would be inclined to add an entry for these views in catalogs.sgml, where we can then give users a pointer that they need to set log_hostname to get anything out of this column.

If I connect through unix sockets (say psql -h /tmp --port 5433) I was also expecting to see "[local]" in client_hostname, but I am just getting NULL. Since this lookup is static, I don't see why it should be dependent on log_hostname (even with log_hostname set I'm not seeing [local]).

I have not tested this with IPv6.

Performance Review
------------------
The lookup is done when the connection is established, not each time the view is queried. I don't see any performance issues.

Coding Review
-------------
As Alvaro pointed out, BackendStatusShmemSize should be updated. To answer his question about why clientaddr works: clientaddr is a SockAddr, which is a structure, not a pointer, so the data gets copied by value to beentry. That won't work for strings. I have no issues with how you're allocating space in beentry and then strlcpy'ing the values.

I see no issues with the implementation.

I'm marking this as 'Waiting for Author' pending documentation changes and maybe a fix for the behaviour with unix sockets.

Steve
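To make the behaviour concrete, this is how one would exercise the new column (a hedged example; the column name comes from the patch, and log_hostname cannot be set per-session, so it goes in postgresql.conf followed by a reload):

  -- In postgresql.conf: log_hostname = on, then reload the server.
  -- From a TCP client connection:
  SELECT datname, client_addr, client_hostname
    FROM pg_stat_activity;
  -- Expect client_hostname to be NULL when log_hostname is off,
  -- or when reverse DNS has no entry for client_addr.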
Re: [HACKERS] logical changeset generation v4 - Heikki's thoughts about the patch state
On 13-01-24 11:15 AM, Steve Singer wrote: On 13-01-24 06:40 AM, Andres Freund wrote: Fair enough. I am also working on a user of this infrastructure but that doesn't help you very much. Steve Singer seemed to make some stabs at writing an output plugin as well. Steve, how far did you get there?

I was able to get something that generated output for INSERT statements in a format similar to what a modified slony apply trigger would want. This was with the list of tables to replicate hard-coded in the plugin, and with the patchset from the last commitfest. I had gotten a bit hung up on the UPDATE and DELETE support because slony allows you to use an arbitrary user-specified unique index as your key. It looks like better support for tables with a unique non-primary key is in the most recent patch set. I am hoping to have time this weekend to update my plugin to use parameters passed in on the init and other updates in the most recent version. If I make some progress I will post a link to it at the end of the weekend. My big issue is that I have limited time to spend on this.

A few more comments:

In decode.c, DecodeDelete:

+ if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+ {
+     elog(DEBUG2, "huh, no primary key for a delete on wal_level = logical?");
+     return;
+ }

I think we should be passing deletes with candidate key data logged to the plugin. If the table isn't a replicated table then ignoring the delete is fine. If the table is a replicated table but someone has deleted the unique index from the table, then the plugin will receive INSERT changes on the table but not DELETE changes. If this happens, the plugin wouldn't have any way of knowing that it is missing delete changes. If my plugin gets passed a DELETE change record but with no key data, then my plugin could do any of:

1. Start screaming for help (i.e. log errors)
2. Drop the table from replication
3. Pass the delete (with no key values) on to the replication client and let it deal with it (see 1 and 2)

Also, 'huh' isn't one of our standard log message phrases :)

How do you plan on dealing with sequences? I don't see my plugin being called on sequence changes, and I don't see XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why this can't be easily added?

Also, what do we want to do about TRUNCATE support? I could always leave a TRUNCATE trigger in place that logged the truncate to an sl_truncates table and have my replication daemon respond to the insert on the sl_truncates table by actually truncating the data on the replica.

I've spent some time this weekend updating my prototype plugin that generates slony 2.2 style COPY output. I have attached my progress here (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have not gotten as far as modifying slon to act as a logical log receiver, or made a version of the slony apply trigger that would process these changes. I haven't looked into the details of what is involved in setting up a subscription with the snapshot exporting.

I couldn't get the options on the START REPLICATION command to parse, so I just hard-coded some list-building code in the init method. I do plan on passing the list of tables to replicate from the replica to the plugin (because this list comes from the replica). Passing what could be a few thousand table names as a list of arguments is a bit ugly, and I admit my list processing code is rough. Does this make us want to reconsider the format of the option_list?
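The TRUNCATE workaround mentioned above (a trigger logging into sl_truncates) could look roughly like this — a hedged sketch; the table layout, function name, and target table are illustrative, not Slony's actual schema:

  CREATE TABLE sl_truncates (
      tab_schema  name,
      tab_name    name,
      trunc_time  timestamptz DEFAULT now()
  );

  CREATE FUNCTION log_truncate() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      -- Record which table was truncated; the replication daemon would
      -- replay the TRUNCATE on the replica when this row arrives.
      INSERT INTO sl_truncates (tab_schema, tab_name)
      VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME);
      RETURN NULL;
  END;
  $$;

  CREATE TRIGGER capture_truncate
      AFTER TRUNCATE ON some_replicated_table
      FOR EACH STATEMENT EXECUTE PROCEDURE log_truncate();

Since the insert into sl_truncates is an ordinary logged INSERT, it flows through the decoding plugin like any other change, which is what makes the workaround viable without WAL-level truncate support.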
I guess I should provide an opinion on whether I think the patch in this CF, if committed, could be used to act as a source for slony instead of the log trigger.

The biggest missing piece, as I mentioned in my email yesterday, is that we aren't logging the old primary key on row UPDATEs. I don't see building a credible replication system where you don't allow users to update any column of a row.

The other issues I've raised (DecodeDelete hiding bad deletes, replication options not parsing for me) look like easy fixes. No WAL decoding support for sequences or truncate are things that I could work around by doing things much like slony does today. The SYNC can still capture the sequence changes in a table (where the INSERTs would be logged) and I can have a trigger capture truncates.

I mostly did this review from the point of view of someone trying to use the feature; I haven't done a line-by-line review of the code. I suspect Andres can address these issues and get an updated patch out during this CF. I think a more detailed code review by someone more familiar with postgres internals will reveal a handful of other issues that hopefully can be fixed without a lot of effort. If this were the only patch in the commitfest I would encourage Andres to push t