Re: [HACKERS] Crash on promotion when recovery.conf is renamed
On Sat, Dec 17, 2016 at 9:23 PM, Magnus Hagander wrote:
> On Fri, Dec 16, 2016 at 7:08 AM, Michael Paquier wrote:
>> Looking at PrescanPreparedTransactions(), I am thinking as well that it
>> would be better to get a hard failure when bumping on a corrupted 2PC file.
>> Future files are one thing, but corrupted files should be treated more
>> carefully.
>
> Again without looking at it, I agree (so much easier that way :P). Ignoring
> corruption is generally a bad idea. Failing hard makes the user notice the
> error, and makes it possible to initiate recovery from a standby or from
> backups or something, or to *intentionally* remove/clear/ignore it.

And I am finishing with the two patches attached:
- 0001 changes the 2PC checks so that corrupted entries are FATAL.
  PrescanPreparedTransactions() is used when a hot standby is initialized.
  In this case a failure protects the range of XIDs generated, potentially
  saving from corruption of data. At the end of recovery, this is done
  before any on-disk actions are taken.
- 0002 is the patch that Heikki has sent previously to minimize the window
  between the end-of-recovery record write and the timeline history file
  archiving.
I am attaching that to the next CF.
--
Michael

From 0e816454c57b851f4aa2743ea776b1e604a27d77 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 20 Dec 2016 16:47:35 +0900
Subject: [PATCH 2/2] Minimize window between history file and end-of-recovery
 record

Once a standby node is promoted, this makes the assignment of the new
timeline number booked earlier, as the history file gets archived
immediately. This way the other nodes are aware that this new timeline
number is taken and should not be assigned to other nodes. The window
between the moment the history file is archived and the moment the
end-of-recovery record is written cannot be reduced to zero, but this way
it is minimized as much as possible. The new order of actions also
prevents a corrupted data directory on failure.
---
 src/backend/access/transam/xlog.c | 27 ++++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ea2c523e6e..f23feb47e8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7210,6 +7210,24 @@ StartupXLOG(void)
 		else
 			snprintf(reason, sizeof(reason), "no recovery target specified");
 
+		/*
+		 * We are now done reading the old WAL. Turn off archive fetching if it
+		 * was active, and make a writable copy of the last WAL segment. (Note
+		 * that we also have a copy of the last block of the old WAL in readBuf;
+		 * we will use that below.)
+		 */
+		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
+
+		/*
+		 * Write the timeline history file, and have it archived. After this
+		 * point (or rather, as soon as the file is archived), the timeline
+		 * will appear as "taken" in the WAL archive and to any standby servers.
+		 * If we crash before actually switching to the new timeline, standby
+		 * servers will nevertheless think that we switched to the new timeline,
+		 * and will try to connect to the new timeline. To minimize the window
+		 * for that, try to do as little as possible between here and writing the
+		 * end-of-recovery record.
+		 */
 		writeTimeLineHistory(ThisTimeLineID, recoveryTargetTLI,
 							 EndRecPtr, reason);
 	}
@@ -7219,15 +7237,6 @@ StartupXLOG(void)
 	XLogCtl->PrevTimeLineID = PrevTimeLineID;
 
 	/*
-	 * We are now done reading the old WAL. Turn off archive fetching if it
-	 * was active, and make a writable copy of the last WAL segment. (Note
-	 * that we also have a copy of the last block of the old WAL in readBuf;
-	 * we will use that below.)
-	 */
-	if (ArchiveRecoveryRequested)
-		exitArchiveRecovery(EndOfLogTLI, EndOfLog);
-
-	/*
 	 * Prepare to write WAL starting at EndOfLog position, and init xlog
 	 * buffer cache using the block containing the last record from the
 	 * previous incarnation.
--
2.11.0

From 4037db778a4353fc79b15eb683a5b9e3f648db24 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 20 Dec 2016 16:41:44 +0900
Subject: [PATCH 1/2] Change detection of corrupted 2PC files as FATAL

When scanning 2PC files, be it for initializing a hot standby node or
finding the XID range at the end of recovery, bumping on a corrupted 2PC
file is reported as a WARNING and the file is blindly ignored. If it
happened that the corrupted file was actually legit, there is a risk of
corruption, so switch that to a hard failure.
---
 src/backend/access/transam/twophase.c | 23 ++++++++++-------------
 src/backend/access/transam/xlog.c     | 10 +++++-----
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 5415604993..9cbcb83062 100644
--- a/src/backend/access/transam/twophase.c
+++
Re: [HACKERS] increasing the default WAL segment size
Hello all,

On Mon, Dec 19, 2016 at 3:14 PM, Beena Emerson wrote:
> Hello all,
>
> Please find attached a patch to make the WAL segment size initdb-configurable.
>
> The attached patch removes the --with-wal-segsize configure option and adds a
> new initdb option --wal-segsize. initdb passes the wal-segsize value in an
> environment variable which is used to overwrite the GUC value
> wal_segment_size and set the internal variables XLogSegSize and
> XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not
> changed, but I have increased the maximum size to 1GB.
>
> Since XLOG_SEG_SIZE is now variable, it could not be used directly in the
> src/bin modules and a few macros, and a few changes had to be made:
> - In guc.c, remove GUC_UNIT_XSEGS, which used XLOG_SEG_SIZE, and
>   introduce show functions for the GUCs which used the unit (min_wal_size
>   and max_wal_size).
> - For pg_basebackup, add a new replication command SHOW_WAL_SEGSZ to
>   fetch wal_segment_size in bytes.
> - pg_controldata, pg_resetxlog, and pg_rewind fetch the xlog_seg_size from
>   the ControlFile.
> - Since pg_xlogdump reads the WAL files, it uses the file size to
>   determine the xlog_seg_size.
> - In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled
>   with random data and written to a temporary file to check for any
>   write/fsync error before performing the tests. Since it does not affect
>   the actual performance results, the XLOG_SEG_SIZE in the module is
>   replaced with the default value (16MB).
>
> Please note that the documents are not updated in this patch.
>
> Feedback and suggestions are welcome.

This patch has been added to the commit fest
(https://commitfest.postgresql.org/12/921/).

After further testing, I found that the pg_standby contrib module does not
work with the changes. I will fix it in the next version of the patch.
Comments on the current patch are welcome.

Thank you,

Beena Emerson

Have a Great Day!
Re: [HACKERS] too low cost of Bitmap index scan
2016-12-19 23:28 GMT+01:00 Robert Haas:
> On Sat, Dec 17, 2016 at 3:30 AM, Pavel Stehule wrote:
>> -> Bitmap Heap Scan on "Zasilka" (cost=5097.39..5670.64 rows=1 width=12)
>>    (actual time=62.253..62.400 rows=3 loops=231)
>> ...
>> When I disable bitmap scan, then the query is 6x faster
>>
>> -> Index Scan using "Zasilka_idx_Dopravce" on "Zasilka"
>>    (cost=0.00..30489.04 rows=1 width=12) (actual time=15.651..17.187 rows=3
>>    loops=231)
>>    Index Cond: ("Dopravce" = "Dopravce_Ridic_1"."ID")
>>    Filter: (("StavDatum" > (now() - '10 days'::interval)) AND (("Stav" =
>>    34) OR ("Stav" = 29) OR ("Stav" = 180) OR ("Stav" = 213) OR ("Stav" =
>>    46) OR (("Produkt" = 33) AND ("Stav" = 179)) OR ((("ZpetnaZasilka" =
>>    '-1'::integer) OR ("PrimaZasilka" = '-1'::integer)) AND ("Stav" = 40
>>    Rows Removed by Filter: 7596
>
> I'm not sure, but my guess would be that the query planner isn't
> getting a very accurate selectivity estimate for that giant filter
> condition, and that's why the cost estimate is off.

Maybe the operator cost is too high?

Regards

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] Logical Replication WIP
On 2016-12-19 08:04, Erik Rijkers wrote:
> On 2016-12-18 11:12, Petr Jelinek wrote:

(now using the latest patchset:)
  0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch
  0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch
  0003-Define-logical-replication-protocol-and-output-plugi-v14.patch
  0004-Add-logical-replication-workers-v14.patch
  0005-Add-separate-synchronous-commit-control-for-logical--v14.patch

Sometimes replication (caused by a pgbench run) runs for a few seconds,
replicating all 4 pgbench tables correctly, but never longer than 10 to 20
seconds.

I've concocted pgbench_derail.sh. It assumes 2 instances are running,
initially without the publication and subscription. There are two separate
installations, on the same machine. To start up the two instances I use
instances.sh:

# ./instances.sh
#!/bin/sh
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
pg_stuff_dir=$HOME/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$project1
server_dir2=$pg_stuff_dir/pg_installations/pgsql.$project2
data_dir1=$server_dir1/data
data_dir2=$server_dir2/data
options1=" -c wal_level=logical -c max_replication_slots=10
 -c max_worker_processes=12 -c max_logical_replication_workers=10
 -c max_wal_senders=10 -c logging_collector=on
 -c log_directory=$server_dir1 -c log_filename=logfile.${project1} "
options2=" -c wal_level=replica -c max_replication_slots=10
 -c max_worker_processes=12 -c max_logical_replication_workers=10
 -c max_wal_senders=10 -c logging_collector=on
 -c log_directory=$server_dir2 -c log_filename=logfile.${project2} "
which postgres
export PATH=$PATH1; postgres -D $data_dir1 -p $port1 ${options1} &
export PATH=$PATH2; postgres -D $data_dir2 -p $port2 ${options2} &
# end ./instances.sh

#--- pgbench_derail.sh
#!/bin/sh
# assumes both instances are running
# clear logs
# echo >
#   $HOME/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
# echo >
#   $HOME/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2
port1=6972
port2=6973
function cb()
{
  # display the 4 pgbench tables' accumulated content as md5s
  # a,b,t,h stand for: pgbench_accounts, -branches, -tellers, -history
  for port in $port1 $port2
  do
    md5_a=$(echo "select * from pgbench_accounts order by aid" |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_b=$(echo "select * from pgbench_branches order by bid" |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_t=$(echo "select * from pgbench_tellers  order by tid" |psql -qtAXp$port|md5sum|cut -b 1-9)
    md5_h=$(echo "select * from pgbench_history  order by aid,bid,tid"|psql -qtAXp$port|md5sum|cut -b 1-9)
    cnt_a=$(echo "select count(*) from pgbench_accounts"|psql -qtAXp $port)
    cnt_b=$(echo "select count(*) from pgbench_branches"|psql -qtAXp $port)
    cnt_t=$(echo "select count(*) from pgbench_tellers" |psql -qtAXp $port)
    cnt_h=$(echo "select count(*) from pgbench_history" |psql -qtAXp $port)
    printf "$port a,b,t,h: %6d %6d %6d %6d" $cnt_a $cnt_b $cnt_t $cnt_h
    echo -n " $md5_a $md5_b $md5_t $md5_h"
    if   [[ $port -eq $port1 ]]; then echo " master"
    elif [[ $port -eq $port2 ]]; then echo " replica"
    else                              echo " ERROR"
    fi
  done
}
echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port1 \
&& echo "
drop table if exists pgbench_accounts;
drop table if exists pgbench_branches;
drop table if exists pgbench_tellers;
drop table if exists pgbench_history;" | psql -X -p $port2 \
&& pgbench -p $port1 -qis 1 \
&& echo "alter table pgbench_history replica identity full;" | psql -1p $port1 \
&& pg_dump -F c -p $port1 \
     -t pgbench_accounts \
     -t pgbench_branches \
     -t pgbench_tellers \
     -t pgbench_history \
   | pg_restore -p $port2 -d testdb
echo "$(cb)"
sleep 2
echo "$(cb)"
echo "create publication pub1 for all tables;"
  | psql -p $port1 -aqtAX
echo "
create subscription sub1 connection 'port=${port1}' publication pub1 with (disabled);
alter subscription sub1 enable;
" | psql -p $port2 -aqtAX
#
# repeat a short (10 s) pgbench run to show that during such
# short runs the logical replication often remains intact.
# Longer pgbench runs always derail the logical replication of one or more
# of these 4 tables.
#
# bug: pgbench_history no longer replicates
# sometimes also the other 3 tables become de-synced.
echo "$(cb)"
echo "-- pgbench -c 1 -T 10 -P 5 (short run, first)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo "$(cb)"
echo "-- pgbench -c 1 -T 10 -P 5 (short run, second)"
pgbench -c 1 -T 10 -P 5
sleep 2
echo "$(cb)"
echo "-- pgbench -c 1 -T 120 -P 15 (long run)"
Re: [HACKERS] Logical decoding on standby
On 07/12/16 07:05, Craig Ringer wrote:
> On 21 November 2016 at 16:17, Craig Ringer wrote:
>> Hi all
>>
>> I've prepared a working initial, somewhat raw implementation for
>> logical decoding on physical standbys.
>
> Hi all
>
> I've attached a significantly revised patch, which now incorporates
> safeguards to ensure that we prevent decoding if the master has not
> retained needed catalogs, and cancel decoding sessions that are holding
> up apply because they need too-old catalogs.
>
> The biggest change in this patch, and the main intrusive part, is that
> procArray->replication_slot_catalog_xmin is no longer directly used by
> vacuum. Instead, a new ShmemVariableCache->oldestCatalogXmin field is
> added, with a corresponding CheckPoint field. Vacuum notices if
> procArray->replication_slot_catalog_xmin has advanced past
> ShmemVariableCache->oldestCatalogXmin and writes a new xact rmgr
> record with the new value before it copies it to oldestCatalogXmin.
> This means that a standby can now reliably tell when catalogs are
> about to be removed or become candidates for removal, so it can pause
> redo until logical decoding sessions on the standby advance far enough
> that their catalog_xmin passes that point. It also means that if our
> hot_standby_feedback somehow fails to lock in the catalogs our slots
> need on a standby, we can cancel sessions with a conflict with
> recovery.
>
> If wal_level is < logical this won't do anything, since
> replication_slot_catalog_xmin and oldestCatalogXmin will both always
> be 0.
>
> Because oldestCatalogXmin advances eagerly as soon as vacuum sees the
> new replication_slot_catalog_xmin, this won't impact catalog bloat.
>
> Ideally this mechanism won't generally actually be needed, since
> hot_standby_feedback stops the master from removing needed catalogs,
> and we make an effort to ensure that the standby has
> hot_standby_feedback enabled and is using a replication slot.
> We cannot prevent the user from dropping and re-creating the physical
> slot on the upstream, though, and it doesn't look simple to stop them
> turning off hot_standby_feedback or turning off use of a physical slot
> after creating logical slots, either. So we try to stop users shooting
> themselves in the foot, but if they do it anyway we notice and cope
> gracefully. Logging catalog_xmin also helps slots created on standbys
> know where to start, and makes sure we can deal gracefully with a race
> between hs_feedback and slot creation on a standby.

Hi,

If this mechanism would not be needed most of the time, wouldn't it be
better to not have it, and instead have a way to ask the physical slot
what the currently reserved catalog_xmin is? (In most cases the standby
should actually know what it is, since it's sending the hs_feedback, but
the first slot created on such a standby may need to check.)

WRT preventing hs_feedback from being turned off, we can refuse to start
with hs_feedback off when there are logical slots detected. We can also
refuse to connect to the master without a physical slot if there are
logical slots detected. I don't see a problem with either of those. You
may ask what happens if the user drops the slot and recreates it, or
somehow otherwise messes up catalog_xmin on the master. Well, considering
that under my proposal we'd first (on connect) check the slot for
catalog_xmin, we'd know about it, so we'd either mark the local slots
broken/drop them, or plainly refuse to connect to the master the same way
as if it didn't have the required WAL anymore (not sure which behavior is
better). Note that the user could mess up catalog_xmin even in your design
the same way, so it's not really a regression.

In general I don't think that it's necessary to WAL-log anything for this.
It will not work without a slot, and therefore not via archiving anyway,
so writing to WAL does not seem to buy us anything.
There are some interesting side effects of cascading (i.e., having A->B->C
replication and creating a logical slot on C), but those should not be
insurmountable. Plus it might even be okay to only allow creating logical
slots on standbys connected directly to the master in v1.

That's about the approach, but since there are prerequisite patches in the
patchset that don't really depend on the approach, I will comment on them
as well.

0001 and 0002 add testing infrastructure and look fine to me, possibly
committable. But in 0003 I don't understand the following code:

> +	if (endpos != InvalidXLogRecPtr && !do_start_slot)
> +	{
> +		fprintf(stderr,
> +				_("%s: cannot use --create-slot or --drop-slot together with --endpos\n"),
> +				progname);
> +		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> +				progname);
> +		exit(1);
> +	}

Why are --create-slot and --endpos not allowed together?

0004 again looks good but depends on 0003. 0005 is timeline following,
which is IMHO ready for committer, as is
Re: [HACKERS] invalid combination of options "-D - -F t -X stream" in pg_basebackup
On Tue, Dec 20, 2016 at 1:43 AM, Magnus Hagander wrote:
> On Mon, Dec 19, 2016 at 5:39 PM, Fujii Masao wrote:
>> Hi,
>>
>> Isn't it better to forbid the combination of the options "-D -", "-F t"
>> and "-X stream" in pg_basebackup? This is obviously an invalid setting,
>> and the docs warn about this as follows. But currently users can specify
>> such a setting and pg_basebackup can exit unexpectedly with an error.
>>
>> ---
>> If the value - (dash) is specified as target directory, the tar contents
>> will be written to standard output, suitable for piping to for example
>> gzip. This is only possible if the cluster has no additional tablespaces.
>> ---
>
> Yes, definitely. I'd say that's an oversight in implementing the support
> for stream-to-tar that it did not detect this issue.
>
> Do you want to provide a patch for it, or should I?

What about the attached patch?

+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);

I added the above hint message because other codes checking invalid options
also have such hint messages. But there is no additional useful information
about valid combinations of options in the help messages, so I'm a bit
tempted to remove the above hint message.

Regards,

--
Fujii Masao

*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
***************
*** 2268,2273 **** main(int argc, char **argv)
--- 2268,2283 ----
  		exit(1);
  	}
  
+ 	if (format == 't' && streamwal && strcmp(basedir, "-") == 0)
+ 	{
+ 		fprintf(stderr,
+ 				_("%s: cannot stream transaction logs in tar mode to stdout\n"),
+ 				progname);
+ 		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ 				progname);
+ 		exit(1);
+ 	}
+ 
  	if (replication_slot && !streamwal)
  	{
  		fprintf(stderr,

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Tue, Dec 20, 2016 at 2:31 PM, Masahiko Sawada wrote:
> Do we need to consider the sorting method and the selecting k-th
> latest LSN method?

Honestly, nah. Tests are showing that there are many more bottlenecks
before that with just memory allocation and parsing.
--
Michael
Re: [HACKERS] pg_background contrib module proposal
On Tue, Dec 20, 2016 at 12:21 AM, David Fetter wrote:
> On Thu, Nov 24, 2016 at 09:16:53AM +0530, amul sul wrote:
>> Hi All,
>>
>> I would like to take over the pg_background patch and repost it for
>> discussion and review.
>
> This looks great.
>
> Sadly, it now breaks initdb when applied atop
> dd728826c538f000220af98de025c00114ad0631 with:
>
> performing post-bootstrap initialization ... TRAP:
> FailedAssertion("!(const Node*)(rinfo))->type) == T_RestrictInfo))",
> File: "costsize.c", Line: 660)
> sh: line 1:  2840 Aborted (core dumped)
> "/home/shackle/pggit/postgresql/tmp_install/home/shackle/10/bin/postgres"
> --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true template1
> > /dev/null

I've compiled the binary with the --enable-cassert flag and the
pg_background patch on top of the described commit, as well as on the
latest HEAD, but I am not able to reproduce this crash. See this:

[VM postgresql]$ /home/amul/Public/pg_inst/pg-master/bin/postgres --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true template1

PostgreSQL stand-alone backend 10devel
backend> CREATE EXTENSION pg_background;
backend> select * from pg_extension;
	 1: extname	(typeid = 19, len = 64, typmod = -1, byval = f)
	 2: extowner	(typeid = 26, len = 4, typmod = -1, byval = t)
	 3: extnamespace	(typeid = 26, len = 4, typmod = -1, byval = t)
	 4: extrelocatable	(typeid = 16, len = 1, typmod = -1, byval = t)
	 5: extversion	(typeid = 25, len = -1, typmod = -1, byval = f)
	 6: extconfig	(typeid = 1028, len = -1, typmod = -1, byval = f)
	 7: extcondition	(typeid = 1009, len = -1, typmod = -1, byval = f)
	 1: extname = "plpgsql"	(typeid = 19, len = 64, typmod = -1, byval = f)
	 2: extowner = "10"	(typeid = 26, len = 4, typmod = -1, byval = t)
	 3: extnamespace = "11"	(typeid = 26, len = 4, typmod = -1, byval = t)
	 4: extrelocatable = "f"	(typeid = 16, len = 1, typmod = -1, byval = t)
	 5: extversion = "1.0"	(typeid = 25, len = -1, typmod = -1, byval = f)
	 1: extname = "pg_background"	(typeid = 19, len = 64, typmod =
-1, byval = f)
	 2: extowner = "10"	(typeid = 26, len = 4, typmod = -1, byval = t)
	 3: extnamespace = "11"	(typeid = 26, len = 4, typmod = -1, byval = t)
	 4: extrelocatable = "t"	(typeid = 16, len = 1, typmod = -1, byval = t)
	 5: extversion = "1.0"	(typeid = 25, len = -1, typmod = -1, byval = f)
backend>

I hope I am not missing anything.

Thanks !

Regards,
Amul
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Mon, Dec 19, 2016 at 9:49 PM, Fujii Masao wrote:
> On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier wrote:
>> On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao wrote:
>>> Attached is the modified version of the patch. Barring objections, I will
>>> commit this version.
>>
>> There is a whitespace:
>> $ git diff master --check
>> src/backend/replication/syncrep.c:39: trailing whitespace.
>> + *
>
> Okey, pushed the patch with this fix. Thanks!

Thank you for reviewing and committing!

> Regarding this feature, there are some loose ends. We should work on
> and complete them until the release.
>
> (1)
> Which synchronous replication method, priority or quorum, should be
> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> a priority-based sync replication is chosen for keeping backward
> compatibility. However some hackers argued to change this decision
> so that a quorum commit is chosen because they think that most users
> prefer a quorum.
>
> (2)
> There will still be many source comments and documentation passages that
> we need to update, for example, in high-availability.sgml. We need to
> check and update them thoroughly.

Will try to update them.

> (3)
> The priority value is assigned to each standby listed in s_s_names
> even in quorum commit, though those priority values are not used at all.
> Users can see those priority values in pg_stat_replication.
> Isn't this confusing? If yes, it might be better to always assign 1 as
> the priority, for example.
>
> Any other?

Do we need to consider the sorting method and the selecting k-th latest
LSN method?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Declarative partitioning - another take
On 2016/12/20 12:59, Robert Haas wrote:
> On Sun, Dec 18, 2016 at 10:00 PM, Amit Langote wrote:
>> Here are updated patches including the additional information.
>
> Thanks. Committed 0001. Will have to review the others when I'm less tired.

Thanks!

> BTW, elog() is only supposed to be used for can't-happen error
> messages; when it is used, no translation is done. So this is wrong:
>
>     if (skip_validate)
>         elog(NOTICE, "skipping scan to validate partition constraint");

You're right. I was using it for debugging when I first wrote that code,
but then thought it would be better to eventually turn that into
ereport(INFO/NOTICE) for the final submission, which I missed doing. Sorry
about that. Here's a patch.

Thanks,
Amit

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c219b03dd..6a179596ce 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13297,8 +13297,10 @@ ATExecAttachPartition(List **wqueue, Relation rel, PartitionCmd *cmd)
 		}
 	}
 
+	/* It's safe to skip the validation scan after all */
 	if (skip_validate)
-		elog(NOTICE, "skipping scan to validate partition constraint");
+		ereport(INFO,
+				(errmsg("skipping scan to validate partition constraint")));
 
 	/*
 	 * Set up to have the table to be scanned to validate the partition
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 99e20eb922..2a324c0b49 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3179,7 +3179,7 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4);
 ALTER TABLE list_parted2 DETACH PARTITION part_3_4;
 ALTER TABLE part_3_4 ALTER a SET NOT NULL;
 ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4);
-NOTICE:  skipping scan to validate partition constraint
+INFO:  skipping scan to validate partition constraint
 -- check validation when attaching range partitions
 CREATE TABLE range_parted (
 	a
 	int,
@@ -3204,7 +3204,7 @@ CREATE TABLE part2 (
 	b int NOT NULL CHECK (b >= 10 AND b < 18)
 );
 ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20);
-NOTICE:  skipping scan to validate partition constraint
+INFO:  skipping scan to validate partition constraint
 -- check that leaf partitions are scanned when attaching a partitioned
 -- table
 CREATE TABLE part_5 (
@@ -3219,7 +3219,7 @@ ERROR:  partition constraint is violated by some row
 DELETE FROM part_5_a WHERE a NOT IN (3);
 ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL;
 ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5);
-NOTICE:  skipping scan to validate partition constraint
+INFO:  skipping scan to validate partition constraint
 -- check that the table being attached is not already a partition
 ALTER TABLE list_parted2 ATTACH PARTITION part_2 FOR VALUES IN (2);
 ERROR:  "part_2" is already a partition
Re: [HACKERS] Declarative partitioning - another take
On Sun, Dec 18, 2016 at 10:00 PM, Amit Langote wrote:
> Here are updated patches including the additional information.

Thanks. Committed 0001. Will have to review the others when I'm less
tired.

BTW, elog() is only supposed to be used for can't-happen error
messages; when it is used, no translation is done. So this is wrong:

    if (skip_validate)
        elog(NOTICE, "skipping scan to validate partition constraint");

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Measuring replay lag
On Mon, Dec 19, 2016 at 10:46 PM, Simon Riggs wrote:
> On 26 October 2016 at 11:34, Thomas Munro wrote:
>
>> It works by taking advantage of the { time, end-of-WAL } samples that
>> sending servers already include in message headers to standbys. That
>> seems to provide a pretty good proxy for when the WAL was written, if
>> you ignore messages where the LSN hasn't advanced. The patch
>> introduces a new GUC replay_lag_sample_interval, defaulting to 1s, to
>> control how often the standby should record these timestamped LSNs
>> into a small circular buffer. When its recovery process eventually
>> replays a timestamped LSN, the timestamp is sent back to the upstream
>> server in a new reply message field. The value visible in
>> pg_stat_replication.replay_lag can then be updated by comparing with
>> the current time.
>
> Why not just send back the lag as calculated by max_standby_streaming_delay?
> I.e. at the end of replay of each chunk record the current delay in
> shmem, then send it back periodically.
>
> If we have two methods of calculation it would be confusing.

Hmm. If I understand correctly, GetStandbyLimitTime is measuring something
a bit different: it computes how long it has been since the recovery
process received the chunk that it's currently trying to apply, most
interestingly in the case where we are waiting due to conflicts. It
doesn't include time in the walsender, on the network, in the walreceiver,
or writing, flushing and reading before it arrives in the recovery
process. If I'm reading it correctly, it only updates XLogReceiptTime when
it's completely caught up applying all earlier chunks, so when it falls
behind, its measure of lag has a growing-only phase and a reset that can
only be triggered by catching up to the latest chunk. That seems OK for
its intended purpose of putting a cap on the delay introduced by
conflicts. But that's not what I'm trying to provide here.
The purpose of this proposal is to show the replay_lag as judged by the
sending server: in the case of a primary server, that is an indication of
how long commits done here will take to show up to users over there, and
how long a COMMIT with remote_apply will take to come back. It measures
the WAL's whole journey, and does so in a smooth way that shows accurate
information even if the standby never quite catches up during long
periods.

Example 1: Suppose I have two servers right next to each other, and the
primary server has periods of high activity which exceed the standby's
replay rate, perhaps because of slower/busier hardware, or because of
conflicts with other queries, or because our single-core 'REDO' can't
always keep up with multi-core 'DO'. By the logic of
max_standby_streaming_delay, if it never catches up to the latest chunk
but remains a fluctuating number of chunks behind, then AIUI the standby
will compute a constantly increasing lag. By my logic, the primary will
tell you quite accurately how far behind the standby's recovery is at
regular intervals, showing replay_lag fluctuating up and down as
appropriate, even if it never quite catches up. It can do that because it
has a buffer full of regularly spaced samples to work through, and even
if you exceed the buffer size (8192 seconds' worth by default) it'll just
increase the interval between samples.

Example 2: Suppose I have servers on opposite sides of the world with a
ping time of 300ms. By the logic used for max_standby_streaming_delay,
the lag computed by the standby would be close to zero when there is no
concurrent activity to conflict with. I don't think that's what users
other than the recovery-conflict resolution code want to know. By my
logic, replay_lag computed by the primary would show 300ms plus a tiny
bit more, which is how long it takes for committed transactions to be
visible to user queries on the standby, and for us to know that that is
the case.
That's interesting because it tells you how long synchronous_commit = remote_apply would make you wait (if that server is waited for according to syncrep config). In summary, the max_standby_streaming_delay approach only measures activity inside the recovery process on the standby, and only uses a single variable for timestamp tracking, so although it's semi-related it's not what I wanted to show. (I suppose there might be an argument that max_standby_streaming_delay should also track received-on-standby-time for each sampled LSN in a circular buffer, and then use that information to implement max_standby_streaming_delay more fairly. We only need to cancel queries that conflict with WAL records that have truly been waiting max_standby_streaming_delay since receive time, instead of cancelling everything that conflicts with recovery until we're caught up to the last chunk as we do today as soon as max_standby_streaming_delay is exceeded while trying to apply *any* WAL record. This may not make any sense or be worth
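The sampling scheme described above can be sketched in a few dozen lines. This is a hedged, self-contained illustration, not the patch's code: the names LagTracker, lag_record and lag_consume, and the tiny buffer size, are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins for the real WAL pointer and timestamp types. */
typedef uint64_t XLogRecPtr;
typedef int64_t TimestampTz;

#define LAG_BUFFER_SIZE 8       /* the real buffer would be much larger */

typedef struct
{
    XLogRecPtr  lsn;            /* end-of-WAL taken from a message header */
    TimestampTz time;           /* send time taken from the same header */
} LagSample;

typedef struct
{
    LagSample   buf[LAG_BUFFER_SIZE];
    int         head;           /* next slot to write */
    int         tail;           /* oldest unconsumed sample */
    int         count;
} LagTracker;

/* Record one timestamped LSN, at most every replay_lag_sample_interval. */
void
lag_record(LagTracker *t, XLogRecPtr lsn, TimestampTz now)
{
    if (t->count == LAG_BUFFER_SIZE)
        return;                 /* full: a real tracker widens the interval */
    t->buf[t->head].lsn = lsn;
    t->buf[t->head].time = now;
    t->head = (t->head + 1) % LAG_BUFFER_SIZE;
    t->count++;
}

/*
 * When recovery has replayed up to replayed_lsn, consume every sample at or
 * before it and return the newest consumed send-time, to be reported back to
 * the upstream server (which subtracts it from its current time), or -1 if
 * no sample has been reached yet.
 */
TimestampTz
lag_consume(LagTracker *t, XLogRecPtr replayed_lsn)
{
    TimestampTz result = -1;

    while (t->count > 0 && t->buf[t->tail].lsn <= replayed_lsn)
    {
        result = t->buf[t->tail].time;
        t->tail = (t->tail + 1) % LAG_BUFFER_SIZE;
        t->count--;
    }
    return result;
}
```

Because samples are consumed one by one as replay passes them, the reported lag can fluctuate up and down even when the standby never fully catches up, which is the behavior Example 1 relies on.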
Re: pg_authid.rolpassword format (was Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol)
On Sat, Dec 17, 2016 at 5:48 PM, Michael Paquier wrote: > On Sun, Dec 18, 2016 at 3:59 AM, Robert Haas wrote: >> On Fri, Dec 16, 2016 at 5:30 PM, Michael Paquier >> wrote: >>> From the discussions of last year on -hackers, it was decided to *not* >>> have an additional column per complaints from a couple of hackers >>> (Robert you were in this set at this point), ... >> >> Hmm, I don't recall taking that position, but then there are a lot of >> things that I ought to recall and don't. (Ask my wife!) > > [... digging objects of the past ...] > From the past thread: > https://www.postgresql.org/message-id/ca+tgmoy790rphhbogxmbtg6mzsendoxdbxebekaet9zpz8g...@mail.gmail.com > The complaint is directed at multiple verifiers per user, > though, not at having the type in a separate column. Yes, I rather like the separate column. But since Heikki is doing the work (or if he is) I'm not going to gripe too much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
On Mon, Dec 19, 2016 at 6:35 PM, Thomas Munro wrote: > On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas wrote: >> On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro >> wrote: >>> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro >>> wrote: Here's a new version to apply on top of dsa-v7.patch. >>> >>> Here's a version to go with dsa-v8.patch. >> >> All right, so I've committed this, but not before (1) renaming some >> things that you added, (2) adding documentation for the new LWLock >> tranche, (3) not creating the DSA in the "simulated DSM using >> backend-private memory" case, and (4) some cosmetic tweaks. > > Thanks! Hmm, so now in rare circumstances we can finish up running > ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL. > This is something to be careful about. Yes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication WIP
On 19/12/16 15:39, Steve Singer wrote: > On 12/18/2016 09:04 PM, Petr Jelinek wrote: >> On 18/12/16 19:02, Steve Singer wrote: >> >>> pg_dump is also generating warnings >>> >>> pg_dump: [archiver] WARNING: don't know how to set owner for object type >>> SUBSCRIPTION >>> >>> I know that the plan is to add proper ACLs for publications and >>> subscriptions later. I don't know if we want to leave the warning in >>> until then or do something about it. >>> >> No, ACLs are separate from owner. This is a thinko on my side. I was >> thinking we can live without ALTER ... OWNER TO for now, but we actually >> need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO >> for both PUBLICATION and SUBSCRIPTION. > > > When I try to restore my pg_dump with publications I get > > ./pg_dump -h localhost --port 5440 test |./psql -h localhost --port > 5440 test2 > > > ALTER TABLE > CREATE PUBLICATION > ERROR: unexpected command tag "PUBLICATION" > > This comes from a > ALTER PUBLICATION mypub OWNER TO ssinger; > > > Does the OWNER TO clause need to be added to AlterPublicationStmt: > instead of AlterOwnerStmt ? Nah, that's just a bug in the command tag string we return in utility.c. I noticed this myself after sending v14; it's a one-line fix. > Also we should update the tab-complete for ALTER PUBLICATION to show the > OWNER TO option + the \h help in psql and the reference SGML > Yeah. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On Thu, Dec 15, 2016 at 3:17 PM, Michael Paquier wrote: > In the case where the binaries are *not* built with libidn, I think > that we had better reject valid UTF-8 strings directly and just allow > ASCII? SASLprep is a no-op on ASCII characters. > > Thoughts about this approach? And Heikki has mentioned to me that he'd prefer not having an extra dependency for the normalization, which is LGPL-licensed by the way. So I have looked at the SASLprep business to see what should be done to get a complete implementation in core, completely independent of anything known. The first thing is to be able to understand in the SCRAM code if a string is UTF-8 or not, and this code is in src/common/. pg_wchar.c offers a set of routines exactly for this purpose, but it is built with libpq and not available from src/common/. So instead of moving the whole file, I'd like to create a new file, src/common/utf8.c, which includes pg_utf_mblen() and pg_utf8_islegal(). On top of that I think that having a routine able to check a full string would be useful for many users, as pg_utf8_islegal() can only check a single character. If the password string is found to be in UTF-8 format, SASLprep is applied. If not, the string is copied as-is, with perhaps unexpected effects for the client. But the client is in trouble already if it is not using UTF-8. Then comes the real business... Note that's my first time touching encoding, particularly UTF-8, in depth, so please be nice. I may write things that are incorrect or sound so from here :) The second thing is the normalization itself. Per RFC 4013, NFKC needs to be applied to the string. The operation is described completely in [1], and consists of 1) a compatibility decomposition of the bytes of the string, followed by 2) a canonical composition. About 1). The compatibility decomposition is defined in [2], "by recursively applying the canonical and compatibility mappings, then applying the canonical reordering algorithm".
The canonical and compatibility mappings are data available in UnicodeData.txt, the 6th column of the set defined in [3] to be precise. The meaning of the decomposition mappings is defined in [2] as well. The canonical decomposition basically looks up a given UTF-8 character and replaces it with the multiple characters that form its decomposed shape. The compatibility mapping should be applied as well, but [5], a perl tool called charlint.pl that does this normalization work, does not care about this phase... Do we? About 2)... Once the decomposition has been applied, those bytes need to be recomposed using the Canonical_Combining_Class field of UnicodeData.txt in [3], which is the 3rd column of the set. Its values are defined in [4]. Another interesting thing: charlint.pl [5] does not care about this phase either. I am wondering if we should not just drop this part as well... Once 1) and 2) are done, NFKC is complete, and so is SASLprep. So what we need on the Postgres side is a mapping table with the following fields: 1) Hexadecimal sequence of the UTF-8 character. 2) Its canonical combining class. 3) The kind of decomposition mapping, if defined. 4) The decomposition mapping, in hexadecimal format. Based on what I looked at, either perl or python could be used to process UnicodeData.txt and to generate a header file that would be included in the tree. There are 30k entries in UnicodeData.txt, 5k of which have a mapping, so that will result in a sizable table. One thing to improve performance would be to store the length of the table in a static variable, order the entries by their hexadecimal keys, and do a binary search to find an entry. We could as well use fancier things like a set of tables forming a radix tree keyed on the decomposed bytes. We should end up doing just one table lookup per character anyway. In conclusion, at this point I am looking for feedback regarding the following items: 1) Where to put the UTF-8 check routines and what to move.
2) How to generate the mapping table using UnicodeData.txt. I'd think that using perl would be better. 3) The shape of the mapping table, which depends on how many operations we want to support in the normalization of the strings. The decisions for those items will drive the implementation in one sense or another. [1]: http://www.unicode.org/reports/tr15/#Description_Norm [2]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Character_Decomposition_Mappings [3]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#UnicodeData.txt [4]: http://www.unicode.org/Public/5.1.0/ucd/UCD.html#Canonical_Combining_Class_Values [5]: https://www.w3.org/International/charlint/ Heikki, others, thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
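To make the table shape concrete, here is a hedged sketch of what the generated header and the binary-search lookup could look like. The struct layout and function name are invented for illustration; the three sample entries do reflect UnicodeData.txt (the accented letters decompose canonically to a base letter plus a combining accining accent is a typo risk, so to be plain: U+00C0 and U+00E9 decompose to a base letter plus a combining accent, and U+0301 has combining class 230), but a real header would carry the full set of entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the hypothetical generated mapping table. */
typedef struct
{
    uint32_t    codepoint;      /* the character, as a code point */
    uint8_t     comb_class;     /* Canonical_Combining_Class */
    uint8_t     decomp_len;     /* 0 if the character has no mapping */
    uint32_t    decomp[4];      /* decomposition mapping, if any */
} UnicodeEntry;

/* Entries must be sorted by codepoint for the binary search below. */
static const UnicodeEntry mapping[] = {
    {0x00C0, 0, 2, {0x0041, 0x0300}},   /* A-grave -> A + combining grave */
    {0x00E9, 0, 2, {0x0065, 0x0301}},   /* e-acute -> e + combining acute */
    {0x0301, 230, 0, {0}},              /* combining acute accent itself */
};

/* Binary search; NULL means the character maps to itself (class 0). */
const UnicodeEntry *
find_entry(uint32_t cp)
{
    int         lo = 0;
    int         hi = (int) (sizeof(mapping) / sizeof(mapping[0])) - 1;

    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;

        if (mapping[mid].codepoint == cp)
            return &mapping[mid];
        if (mapping[mid].codepoint < cp)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return NULL;
}
```

Since most characters have no entry, a miss is the common case, which would be an argument for the radix-tree variant if this lookup ever showed up in profiles.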
Re: [HACKERS] Logical replication existing data copy
On 19/12/16 18:25, Peter Eisentraut wrote: > On 12/19/16 4:30 AM, Petr Jelinek wrote: >> So existing table data can be copied once subscription is created, but >> also new tables can be added and their data copied. This is where the >> REFRESH PUBLICATION comes into play. Adding table to publication does >> not make it automatically replicated by the subscription as the >> subscription does not have tracking info for that table. So to add new >> table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on >> subscriber otherwise the data won't be replicated. > > Couldn't the subscriber automatically add tracking info when apply > stream data arrives for a relation it has not seen before? > Sure, but it has many caveats: - what if the table does not exist - what if it exists and already has data - what if the table is rarely written to We can't control any of that until we have DDL replication/automatic structure dumping. Once we have those, we can add options to control default behavior per subscriber, but with the current feature set, anything that does not require user action will behave non-deterministically, which is usually confusing. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor correction in alter_table.sgml
On 2016/12/20 4:08, Peter Eisentraut wrote: > On 11/30/16 8:47 PM, Amit Langote wrote: >>> So maybe something like >>> >>> All the forms of ALTER TABLE that act on a single table, >>> except RENAME and SET SCHEMA, can be combined into a >>> list of multiple alterations to be applied together. >> >> Updated patch attached. > > Could you please rebase this patch? Some new stuff about partitions has > been added in the paragraph you are changing. Sure, here it is. Thanks, Amit diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 8ea6624147..cce853be8a 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -767,13 +767,12 @@ ALTER TABLE [ IF EXISTS ] name - All the actions except RENAME, - SET TABLESPACE, SET SCHEMA, - ATTACH PARTITION, and - DETACH PARTITION can be combined into - a list of multiple alterations to apply in parallel. For example, it - is possible to add several columns and/or alter the type of several - columns in a single command. This is particularly useful with large + All the forms of ALTER TABLE that act on a single table, + except RENAME, SET SCHEMA, + ATTACH PARTITION, and DETACH PARTITION + can be combined into a list of multiple alterations to be applied together. + For example, it is possible to add several columns and/or alter the type of + several columns in a single command. This is particularly useful with large tables, since only one pass over the table need be made. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication existing data copy
On 20 Dec. 2016 1:27 am, "Peter Eisentraut" < peter.eisentr...@2ndquadrant.com> wrote: On 12/19/16 4:30 AM, Petr Jelinek wrote: > So existing table data can be copied once subscription is created, but > also new tables can be added and their data copied. This is where the > REFRESH PUBLICATION comes into play. Adding table to publication does > not make it automatically replicated by the subscription as the > subscription does not have tracking info for that table. So to add new > table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on > subscriber otherwise the data won't be replicated. Couldn't the subscriber automatically add tracking info when apply stream data arrives for a relation it has not seen before? If no table has been created by the user and we start trying to apply a data stream, apply will break. Since manual action is needed to create the destination, I don't see a problem with requiring manual enabling too, personally. Let the fully transparent way wait until we can do DDL replication in v11+.
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
On Tue, Dec 20, 2016 at 11:12 AM, Robert Haas wrote: > On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro > wrote: >> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro >> wrote: >>> Here's a new version to apply on top of dsa-v7.patch. >> >> Here's a version to go with dsa-v8.patch. > > All right, so I've committed this, but not before (1) renaming some > things that you added, (2) adding documentation for the new LWLock > tranche, (3) not creating the DSA in the "simulated DSM using > backend-private memory" case, and (4) some cosmetic tweaks. Thanks! Hmm, so now in rare circumstances we can finish up running ExecXXXInitializeDSM, but later have estate->es_query_dsa == NULL. This is something to be careful about. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] too low cost of Bitmap index scan
On Sat, Dec 17, 2016 at 3:30 AM, Pavel Stehule wrote: > -> Bitmap Heap Scan on "Zasilka" (cost=5097.39..5670.64 rows=1 width=12) > (actual time=62.253..62.400 rows=3 loops=231) ... > When I disable bitmap scan, then the query is 6x faster >-> Index Scan using "Zasilka_idx_Dopravce" on "Zasilka" > (cost=0.00..30489.04 rows=1 width=12) (actual time=15.651..17.187 rows=3 > loops=231) > Index Cond: ("Dopravce" = "Dopravce_Ridic_1"."ID") >Filter: (("StavDatum" > (now() - '10 days'::interval)) AND (("Stav" = > 34) OR ("Stav" = 29) OR ("Stav" = 180) OR ("Stav" = 213) OR ("Stav" = 46) OR > (("Produkt" = 33) AND ("Stav" = 179)) OR ((("ZpetnaZasilka" = '-1'::integer) > OR ("PrimaZasilka" = '-1'::integer)) AND ("Stav" = 40 > Rows Removed by Filter: 7596 I'm not sure, but my guess would be that the query planner isn't getting a very accurate selectivity estimate for that giant filter condition, and that's why the cost estimate is off. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cache Hash Index meta page.
On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy wrote: > Shouldn't _hash_doinsert() be using the cache, too >> > > Yes, we have an opportunity there, added same in code. But one difference > is at the end at-least once we need to read the meta page to split and/or > write. Performance improvement might not be as much as read-only. > Why would you need to read it at least once in one case but not the other? > >> I think it's probably not a good idea to cache the entire metapage. The >> only things that you are "really" trying to cache, I think, are >> hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire >> HashMetaPageData structure is 696 bytes on my machine, and it doesn't >> really make sense to copy the whole thing into memory if you only need 16 >> bytes of it. It could even be dangerous -- if somebody tries to rely on >> the cache for some other bit of data and we're not really guaranteeing that >> it's fresh enough for that. >> >> I'd suggest defining a new structure HashMetaDataCache with members >> hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a >> comment explaining that we only care about having the data be fresh enough >> to test whether the bucket mapping we computed for a tuple is still >> correct, and that for that to be the case we only need to know whether a >> bucket has suffered a new split since we last refreshed the cache. >> > > It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3 > uint32s) but we also need > > *uint32 hashm_spares[HASH_MAX_SPLITPOINTS],* for bucket number to > block mapping in "BUCKET_TO_BLKNO(metap, bucket)". > > Note : #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) = > 35*4 = 140 bytes. > Well, I guess that makes it more appealing to cache the whole page at least in terms of raw number of bytes, but I suppose my real complaint here is that there don't seem to be any clear rules for where, whether, and to what extent we can rely on the cache to be valid.
Without that, I think this patch is creating an extremely serious maintenance hazard for anyone who wants to try to modify this code in the future. A future patch author needs to be able to tell what data they can use and what data they can't use, and why. Apart from this, there seems to be a pre-existing bug in _hash_doinsert(). > +* XXX this is useless code if we are only storing hash keys. > > + */ > > +if (itemsz > HashMaxItemSize((Page) metap)) > > +ereport(ERROR, > > +(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), > > + errmsg("index row size %zu exceeds hash maximum %zu", > > +itemsz, HashMaxItemSize((Page) metap)), > > + errhint("Values larger than a buffer page cannot be > indexed."))); > > "metap" (HashMetaPage) and Page are different data structures; their member > types are not in sync, so we should not typecast blindly as above. I think we > should remove this part of the code as we only store hash keys. So I have > removed the same but kept the assert below as it is. > Any existing bugs should be the subject of a separate patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
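For context, the bucket mapping that the cached fields have to keep valid is tiny. Below is a hedged sketch modeled on hash's bucket computation, using the HashMetaDataCache structure suggested above; the function name is invented, and a real version would also need the spares array for BUCKET_TO_BLKNO:

```c
#include <assert.h>
#include <stdint.h>

/* The proposed cache: just enough to recompute a tuple's bucket. */
typedef struct HashMetaDataCache
{
    uint32_t    hmc_maxbucket;  /* highest bucket currently in use */
    uint32_t    hmc_highmask;   /* mask for the current, doubled range */
    uint32_t    hmc_lowmask;    /* mask for the previous, smaller range */
} HashMetaDataCache;

/*
 * Map a hash value to a bucket using only the cached fields.  If a split
 * has happened since the cache was refreshed, the result can be stale --
 * which is exactly the staleness the freshness rules discussed above
 * would have to spell out and handle.
 */
uint32_t
cached_hashkey2bucket(uint32_t hashkey, const HashMetaDataCache *c)
{
    uint32_t    bucket = hashkey & c->hmc_highmask;

    if (bucket > c->hmc_maxbucket)
        bucket = bucket & c->hmc_lowmask;
    return bucket;
}
```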
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro wrote: > On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro > wrote: >> Here's a new version to apply on top of dsa-v7.patch. > > Here's a version to go with dsa-v8.patch. All right, so I've committed this, but not before (1) renaming some things that you added, (2) adding documentation for the new LWLock tranche, (3) not creating the DSA in the "simulated DSM using backend-private memory" case, and (4) some cosmetic tweaks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for changes to recovery.conf API
On 19 December 2016 at 21:29, Peter Eisentraut wrote: > On 12/16/16 8:52 PM, Robert Haas wrote: >> If the explanation is just a few sentences long, I see no reason not >> to include it in the release notes. > > As far as I can tell from the latest posted patch, the upgrade > instructions are > > - To trigger recovery, create an empty file recovery.trigger instead of > recovery.conf. > > - All parameters formerly in recovery.conf are now regular > postgresql.conf parameters. For backward compatibility, recovery.conf > is read after postgresql.conf, but the parameters can now be put into > postgresql.conf if desired. > > Some of that might be subject to patch review, but it's probably not > going to be much longer than that. So I think that will fit well into > the usual release notes section. Although that's right, there is more detail. The current list is this, based upon discussion at the Tokyo dev meeting:
* Any use of the earlier recovery.conf or any of the old recovery parameter names causes an ERROR, so we have a clear API break
* To trigger recovery, create an empty recovery.trigger file. Same as recovery.conf with standby_mode = off
* To trigger standby, create an empty standby.trigger file. Same as recovery.conf with standby_mode = on
* Trigger files can be placed in a writable directory outside of the data directory, trigger_directory (I would also like to rename the concept of "trigger file" to another word/phrase not already used for something else, c.f. table triggers)
* recovery.conf parameters are all moved to postgresql.conf, with these changes:
  a) standby_mode no longer exists
  b) trigger_file is replaced by promote_trigger_file; equivalent parameters will be added for
  c) recovery_trigger_file and
  d) standby_trigger_file
  e) recovery_target is a new parameter with the two-part content mentioned upthread, e.g. 'xid 1234'
  f) recovery_target and all trigger_file parameters are now reloadable, not server start
  g) recovery parameters are shown in postgresql.conf.sample, but commented out
  h) recovery.conf.sample is removed
which leaves many of the parameters with the same name and meaning as before, but enough change to be a clear API break that needs people to understand the changes
* pg_basebackup -R will generate a parameter file written to the data directory that will allow it to work, even when postgresql.conf is in a different directory (this bit was the most awkward part of the list of changes)
* hot_standby parameter would be removed
I think I can work this into the docs without anything worth further discussion, but let me write that first. The release notes can warn about the old API generating an ERROR, with multiple links to discussion of how it now works. Merry Christmas everybody, new patch in time for New Year, further discussion welcome -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
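For readers trying to picture the end state, a hypothetical configuration fragment under the proposed API might look like the following; the parameter names follow the list above and their exact spellings are still subject to review:

```
# postgresql.conf (hypothetical; names per the proposal above)
recovery_target = 'xid 1234'             # two-part form: target type, then value
promote_trigger_file = 'promote.trigger'
trigger_directory = '/var/run/postgresql'  # writable dir outside the datadir

# To start the server as a standby, instead of recovery.conf with
# standby_mode = on, one would create an empty trigger file:
#     touch /var/run/postgresql/standby.trigger
```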
Re: [HACKERS] Proposal: add error message in backend/catalog/index.c
On 12/13/16 1:15 AM, Ioseph Kim wrote: > I propose to append an error message when the index name and table name are > the same. > > > example: > > postgres@postgres=# create table t (a int not null, constraint t primary > key (a)); > ERROR: relation "t" already exists The code to detect that would probably be fairly complicated for such a rare case. A more principled approach might be to create an error context for the index creation so that you can tell what is going on when the error happens. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackups and slots
On 12/17/16 9:55 AM, Magnus Hagander wrote: > That makes a lot of sense now that we have temporary replication slots, > I agree. And then max_replication_slots could be something like > persistent_replication_slots? I was thinking more along the lines that each walsender gets one fixed slot, which can be temporary or persistent. Or maybe default max_replication_slots to something like -1, which means equal to max_wal_senders. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal for changes to recovery.conf API
On 12/16/16 8:52 PM, Robert Haas wrote: > If the explanation is just a few sentences long, I see no reason not > to include it in the release notes. As far as I can tell from the latest posted patch, the upgrade instructions are - To trigger recovery, create an empty file recovery.trigger instead of recovery.conf. - All parameters formerly in recovery.conf are now regular postgresql.conf parameters. For backward compatibility, recovery.conf is read after postgresql.conf, but the parameters can now be put into postgresql.conf if desired. Some of that might be subject to patch review, but it's probably not going to be much longer than that. So I think that will fit well into the usual release notes section. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time to drop old-style (V0) functions?
On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut wrote: > On 12/9/16 7:52 AM, Robert Haas wrote: >> It's kind of ironic, at least IMHO, that the DirectFunctionCall >> macros are anything but direct. Each one is a non-inlined function >> call that does a minimum of 8 variable assignments before actually >> calling the function. > > If this is a problem (it might be), then we can just make those calls, > er, direct C function calls to a different function. For example, > > result = DatumGetObjectId(DirectFunctionCall1(oidin, > CStringGetDatum(pro_name_or_oid))); > > could just be > > result = oidin_internal(pro_name_or_oid); Yeah, true -- that works for simple cases, and might be beneficial where you're doing lots and lots of calls in a tight loop. For the more general problem, I wonder if we should try to figure out something where we have one calling convention for "simple" functions (those that need little or none of the information in fcinfo and can pretty much just take their SQL args as C args) and another for "complex" functions (where we do the full push-ups). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time to drop old-style (V0) functions?
On 12/9/16 7:52 AM, Robert Haas wrote: > It's kind of ironic, at least IMHO, that the DirectFunctionCall > macros are anything but direct. Each one is a non-inlined function > call that does a minimum of 8 variable assignments before actually > calling the function. If this is a problem (it might be), then we can just make those calls, er, direct C function calls to a different function. For example, result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(pro_name_or_oid))); could just be result = oidin_internal(pro_name_or_oid); -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minor correction in alter_table.sgml
On 11/30/16 8:47 PM, Amit Langote wrote: >> So maybe something like >> >> All the forms of ALTER TABLE that act on a single table, >> except RENAME and SET SCHEMA, can be combined into a >> list of multiple alterations to be applied together. > > Updated patch attached. Could you please rebase this patch? Some new stuff about partitions has been added in the paragraph you are changing. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_background contrib module proposal
On Thu, Nov 24, 2016 at 09:16:53AM +0530, amul sul wrote: > Hi All, > > I would like to take over pg_background patch and repost for > discussion and review. This looks great. Sadly, it now breaks initdb when applied atop dd728826c538f000220af98de025c00114ad0631 with: performing post-bootstrap initialization ... TRAP: FailedAssertion("!(const Node*)(rinfo))->type) == T_RestrictInfo))", File: "costsize.c", Line: 660) sh: line 1: 2840 Aborted (core dumped) "/home/shackle/pggit/postgresql/tmp_install/home/shackle/10/bin/postgres" --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
On Mon, Dec 19, 2016 at 11:23 AM, Alvaro Herrera wrote: > David G. Johnston wrote: > > > Being able to do more conditional work in psql would make setting up more > > robust scripts easier and without either losing transaction capabilities > or > > session pooling for improved performance when large numbers of small > > commands are run in between flow control done in bash. > > Have you tried to script processes in shell using a single background > psql process with which the shell code communicates using a pipe? I've > long been curious about that approach, but never had a strong need > enough to actually write the code. It should be possible. > I've envisioned and read up a bit on the approach, but the cost-benefit hasn't yet made actually doing it worthwhile. I pretty much run all of the scripts I've been writing myself -- the cost-benefit ratio is likely to change once they are turned over to a non-programmer to run. David J.
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
David G. Johnston wrote: > Being able to do more conditional work in psql would make setting up more > robust scripts easier and without either losing transaction capabilities or > session pooling for improved performance when large numbers of small > commands are run in between flow control done in bash. Have you tried to script processes in shell using a single background psql process with which the shell code communicates using a pipe? I've long been curious about that approach, but never had a strong need enough to actually write the code. It should be possible. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
> > > > > -- this is how I fake an 'exit 0' now: > > \set work_needs_to_be_done t > > select > > case > > when :'work_needs_to_be_done'::boolean then '' > > else '\q' > > end as cmd > > \gset > > :cmd > > > > -- ridiculous example to illustrate complications in remembering past > > commands > > select > > case > >when random() < 0.5 then '\ir my_script.sql' > >when random() < 0.7 'select * from a_table; select count(*) from > > another_table;' > >else 'select null as foo;' > > end as cmd > > \gset > > :cmd > > > > And even then, things get complicated, because an \ir include which > makes it > > this iteration might not make it the next, and the \endwhile might have > been > > inside that include, or vice-versa, an included file starts a \while it > > doesn't finish. > > I see your point. Just out of curiosity, why in the world don't you > use something other than psql for scripting? I mean, if you accessed > the data from Perl or Python or > $INSERT_YOUR_FAVORITE_SCRIPTING_LANGUAGE_HERE, you could do all of > this stuff very easily without any contortions. I've always thought > of psql as something that's fine for interactive use and goofy trivial > scripting but not really suitable for serious work. I grant that you > seem to be making it serve the purpose, but, man. > Since you asked: Heh. I *don't* do the second example, I was just pointing out that those things could be done, not that they should be done, and how hard it would be to implement loops when the source code is potentially coming from a stream. My current client does use mostly python, but also perl, and ruby, and PHP and, reactjs, and $NEW_THING. Here are the reasons I often prefer psql: - Wiring up a python script to do one if-then in the middle of 40 SQL statements goes a long way toward obfuscating what SQL is going to be run. - Packaging up the SQL statements in a DO $$ $$; block conceals what statements were run, and how long they took. 
- In Python, etc, it's up to me to show rowcounts and timings. - On very small docker-ish systems, the fewer things I have to install, the better, and golly, python is large these days. - When doing work for regulated industry clients (SOX, HIPAA, school district PII, etc), the auditors like seeing clearly what SQL _will_ run, what SQL _did_ run, and what was affected. psql scripts with echo-queries set and captured output do that nicely. Installing extra scripting languages gives them the vapors, and now we need an auditor that thinks they know two languages, not one. I'm not saying it makes sense, I'm saying fewer dependencies gets an auditor's checkbox checked sooner. > Right, I think that while/for can be left for another time, as long as > the plan doesn't preclude doing it someday. +1
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
On Mon, Dec 19, 2016 at 10:30 AM, Robert Haas wrote: > > I see your point. Just out of curiosity, why in the world don't you > use something other than psql for scripting? I mean, if you accessed > the data from Perl or Python or > $INSERT_YOUR_FAVORITE_SCRIPTING_LANGUAGE_HERE, you could do all of > this stuff very easily without any contortions. I've always thought > of psql as something that's fine for interactive use and goofy trivial > scripting but not really suitable for serious work. I grant that you > seem to be making it serve the purpose, but, man. > I'm coming to the realization that this sentiment, when applied to my primary application, is probably correct... In my situation the scripting language of choice is Bash - which largely acts as glue for programs such as psql, pdftk, enscript, and the R language. Being able to do more conditional work in psql would make setting up more robust scripts easier and without either losing transaction capabilities or session pooling for improved performance when large numbers of small commands are run in between flow control done in bash. David J.
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
2016-12-19 18:30 GMT+01:00 Robert Haas: > On Sat, Dec 17, 2016 at 3:39 PM, Corey Huinker > wrote: > >> To implement \while, we'd also need to remember previous commands so > >> that when we reach the end of the loop, we can rewind and put all of > >> those commands back on the stack to be executed again, or perhaps to > >> be skipped if the \while condition turns out now to be false. > > > > This might be what you meant when you said "those commands back on the > > stack", but I think we'd have to remember not a list of commands, but the > > raw string of bytes from the start of the \while to the \endwhile (or > > equivalent), because any psql vars within that block could themselves be > a > > non-parameter part of a command: > > > > -- this is how I fake an 'exit 0' now: > > \set work_needs_to_be_done t > > select > > case > > when :'work_needs_to_be_done'::boolean then '' > > else '\q' > > end as cmd > > \gset > > :cmd > > > > -- ridiculous example to illustrate complications in remembering past > > commands > > select > > case > >when random() < 0.5 then '\ir my_script.sql' > >when random() < 0.7 'select * from a_table; select count(*) from > > another_table;' > >else 'select null as foo;' > > end as cmd > > \gset > > :cmd > > > > And even then, things get complicated, because an \ir include which > makes it > > this iteration might not make it the next, and the \endwhile might have > been > > inside that include, or vice-versa, an included file starts a \while it > > doesn't finish. > > I see your point. Just out of curiosity, why in the world don't you > use something other than psql for scripting? I mean, if you accessed > the data from Perl or Python or > $INSERT_YOUR_FAVORITE_SCRIPTING_LANGUAGE_HERE, you could do all of > this stuff very easily without any contortions. I've always thought > of psql as something that's fine for interactive use and goofy trivial > scripting but not really suitable for serious work. 
I grant that you > seem to be making it serve the purpose, but, man. > The integration of any scripting environment with SQL is much weaker than in psql - scenarios that are easy and natural in psql take contortions elsewhere. It is similar to why some people, like me, prefer PL/pgSQL over Java, Perl, ... Years ago there was a bash shell integrated with Postgres - a nice idea in my view, but that project is dead. > > > So maybe what we store is a stack of buffers that are currently open > (STDIN > > being captured as a buffer only when a \while starts, everything else > being > > files), and additionally have a stack of positions where a \while started > > (buffer_id, position in buffer). > > Yeah, sounds about right. > > > Additionally, we could assert that all \while-\endwhile pairs must > happen in > > the same MainLoop (aka file), and mismatches are an error. > > Sounds prudent. > > > I'm happy to keep sketching out what control structures might look like > and > > how to implement them. But I'm also happy to leave while/for loops out > for > > now. > > Right, I think that while/for can be left for another time, as long as > the plan doesn't preclude doing it someday. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: [HACKERS] Hash Indexes
On Sun, Dec 18, 2016 at 8:54 AM, Amit Kapila wrote: >> I committed remove-hash-wrtbuf and fix_dirty_marking_v1 but I've got >> some reservations about fix_lock_chaining_v1. ISTM that the natural >> fix here would be to change the API contract for _hash_freeovflpage so >> that it doesn't release the lock on the write buffer. Why does it >> even do that? I think that the only reason why _hash_freeovflpage >> should be getting wbuf as an argument is so that it can handle the >> case where wbuf happens to be the previous block correctly. > > Yeah, as of now that is the only case, but for WAL patch, I think we > need to ensure that the action of moving all the tuples to the page > being written and the overflow page being freed needs to be logged > together as an atomic operation. Not really. We can have one operation that empties the overflow page and another that unlinks it and makes it free. > Now apart from that, it is > theoretically possible that write page will remain locked for multiple > overflow pages being freed (when the page being written has enough > space that it can accommodate tuples from multiple overflow pages). I > am not sure if it is worth worrying about such a case because > practically it might happen rarely. So, I have prepared a patch to > retain a lock on wbuf in _hash_freeovflpage() as suggested by you. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PSQL commands: \quit_if, \quit_unless
On Sat, Dec 17, 2016 at 3:39 PM, Corey Huinker wrote: >> To implement \while, we'd also need to remember previous commands so >> that when we reach the end of the loop, we can rewind and put all of >> those commands back on the stack to be executed again, or perhaps to >> be skipped if the \while condition turns out now to be false. > > This might be what you meant when you said "those commands back on the > stack", but I think we'd have to remember not a list of commands, but the > raw string of bytes from the start of the \while to the \endwhile (or > equivalent), because any psql vars within that block could themselves be a > non-parameter part of a command: > > -- this is how I fake an 'exit 0' now: > \set work_needs_to_be_done t > select > case > when :'work_needs_to_be_done'::boolean then '' > else '\q' > end as cmd > \gset > :cmd > > -- ridiculous example to illustrate complications in remembering past > commands > select > case > when random() < 0.5 then '\ir my_script.sql' > when random() < 0.7 then 'select * from a_table; select count(*) from > another_table;' > else 'select null as foo;' > end as cmd > \gset > :cmd > > And even then, things get complicated, because an \ir include which makes it > this iteration might not make it the next, and the \endwhile might have been > inside that include, or vice-versa, an included file starts a \while it > doesn't finish. I see your point. Just out of curiosity, why in the world don't you use something other than psql for scripting? I mean, if you accessed the data from Perl or Python or $INSERT_YOUR_FAVORITE_SCRIPTING_LANGUAGE_HERE, you could do all of this stuff very easily without any contortions. I've always thought of psql as something that's fine for interactive use and goofy trivial scripting but not really suitable for serious work. I grant that you seem to be making it serve the purpose, but, man.
> So maybe what we store is a stack of buffers that are currently open (STDIN > being captured as a buffer only when a \while starts, everything else being > files), and additionally have a stack of positions where a \while started > (buffer_id, position in buffer). Yeah, sounds about right. > Additionally, we could assert that all \while-\endwhile pairs must happen in > the same MainLoop (aka file), and mismatches are an error. Sounds prudent. > I'm happy to keep sketching out what control structures might look like and > how to implement them. But I'm also happy to leave while/for loops out for > now. Right, I think that while/for can be left for another time, as long as the plan doesn't preclude doing it someday. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
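The rewind mechanics sketched in this exchange - remember where a \while started, replay the buffered input until the condition goes false - can be mocked up outside psql. The mini-language and `run()` helper below are invented for illustration only (real psql reads commands through MainLoop, not a list of lines), with a bare `eval` standing in for expression evaluation:

```python
# Toy model of \while/\endwhile: a stack of loop-start positions lets us
# "rewind" and re-test the condition, as discussed for psql's MainLoop.

def run(lines, env):
    out = []
    loop_stack = []                 # positions where a \while started
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("\\while "):
            var = line.split()[1]
            if env[var] > 0:        # condition true: enter loop body
                loop_stack.append(i)
            else:                   # condition false: skip to matching \endwhile
                depth = 1
                while depth:
                    i += 1
                    if lines[i].startswith("\\while "):
                        depth += 1
                    elif lines[i] == "\\endwhile":
                        depth -= 1
        elif line == "\\endwhile":
            i = loop_stack.pop() - 1  # rewind so the \while is re-tested
        elif line.startswith("\\set "):
            _, var, expr = line.split(None, 2)
            env[var] = eval(expr, {}, dict(env))
        else:
            out.append(line)        # "execute" the statement
        i += 1
    return out

result = run([
    "\\set n 3",
    "\\while n",
    "select :n;",
    "\\set n n - 1",
    "\\endwhile",
    "select 'done';",
], {})
print(result)
```

As the thread notes, the real problem is harder: the loop body is a raw byte range that may span \ir includes, so an actual implementation needs a stack of open buffers, not just start positions.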
Re: [HACKERS] Logical replication existing data copy
On 12/19/16 4:30 AM, Petr Jelinek wrote: > So existing table data can be copied once subscription is created, but > also new tables can be added and their data copied. This is where the > REFRESH PUBLICATION comes into play. Adding table to publication does > not make it automatically replicated by the subscription as the > subscription does not have tracking info for that table. So to add new > table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on > subscriber otherwise the data won't be replicated. Couldn't the subscriber automatically add tracking info when apply stream data arrives for a relation it has not seen before? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Creating a DSA area to provide work space for parallel execution
On Sun, Dec 18, 2016 at 10:33 PM, Thomas Munro wrote: > On Sat, Dec 17, 2016 at 5:41 AM, Robert Haas wrote: >> On Wed, Dec 14, 2016 at 3:25 PM, Robert Haas wrote: >>> Thoughts? >> >> Hearing no objections, I've gone ahead and committed this. If that >> makes somebody really unhappy I can revert it, but I am betting that >> the real story is that nobody cares about preserving T_ID(). > > I suppose LWLock could include a uint16 member 'id' without making > LWLock any larger, since it would replace the padding between > 'tranche' and 'state'. But I think a better solution, if anyone > really wants these T_ID numbers from a dtrace script, would be to add > lock address to the existing lwlock probes, and then figure out a way > to add some new probes to report the base and stride in the right > places so you can do the book keeping in dtrace variables. Interesting idea. My bet is that nobody cares about dtrace very much. probes.d was first added in 2006, and continued to gradually get new probes (all from submissions by Robert Lor) until 2009. Since then, nothing has happened except for perfunctory maintenance by various committers trying to solve other problems who had to maintain the work that had already been done whether they cared about it or not. (I notice that the probes lwlock-acquire-or-wait and lwlock-acquire-or-wait-fail have never been documented.) I would be tempted to rip the whole thing out as an attractive nuisance, except that I bet somebody would complain about that... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. sql_inheritance
On Mon, Dec 19, 2016 at 11:48 AM, Alvaro Herrera wrote: > Any particular reason not to change inhOpt to be a simple boolean, and > remove the enum? No, no particular reason. I thought about it, but I didn't really see any advantage in getting rid of the typedef. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning vs. sql_inheritance
Robert Haas wrote: > On Fri, Dec 16, 2016 at 7:39 PM, Tom Lane wrote: > > Peter Eisentraut writes: > >> On 12/16/16 11:05 AM, Robert Haas wrote: > >>> If we were going to do anything about this, > >>> my vote would be to remove sql_inheritance. > > > >> Go for it. > > > >> Let's also remove the table* syntax then. > > > > Meh --- that might break existing queries, to what purpose? > > > > We certainly shouldn't remove query syntax without a deprecation period. > > I'm less concerned about that for GUCs. > > I agree. Patch attached, just removing the GUC and a fairly minimal > amount of the supporting infrastructure. Any particular reason not to change inhOpt to be a simple boolean, and remove the enum? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Quorum commit for multiple synchronous replication.
Fujii Masao wrote: > Regarding this feature, there are some loose ends. We should work on > and complete them before the release. Please list these in https://wiki.postgresql.org/wiki/Open_Items so that we don't forget. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] invalid combination of options "-D - -F t -X stream" in pg_basebackup
On Mon, Dec 19, 2016 at 5:39 PM, Fujii Masao wrote: > Hi, > > Isn't it better to forbid the combination of the options "-D -", "-F t" and > "-X stream" in pg_basebackup? This is obviously an invalid setting and the > docs > warn about this as follows. But currently users can specify such a setting > and > pg_basebackup can exit unexpectedly with an error. > > --- > If the value - (dash) is specified as target directory, the tar contents > will > be written to standard output, suitable for piping to for example gzip. > This is only possible if the cluster has no additional tablespaces. > --- > Yes, definitely. I'd say it's an oversight in the implementation of stream-to-tar support that it did not detect this issue. Do you want to provide a patch for it, or should I? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] invalid combination of options "-D - -F t -X stream" in pg_basebackup
Hi, Isn't it better to forbid the combination of the options "-D -", "-F t" and "-X stream" in pg_basebackup? This is obviously an invalid setting and the docs warn about this as follows. But currently users can specify such a setting and pg_basebackup can exit unexpectedly with an error. --- If the value - (dash) is specified as target directory, the tar contents will be written to standard output, suitable for piping to for example gzip. This is only possible if the cluster has no additional tablespaces. --- Regards, -- Fujii Masao
Re: [HACKERS] pg_background contrib module proposal
2016-12-19 4:21 GMT+05:00 David Fetter: > Couldn't it sleep in increments smaller than a second? Like maybe > milliseconds? Also, it's probably cleaner (or at least more > comprehensible) to write something using format() and dollar quoting > than the line with the double 's. Right. Here's a version which waits for half a second. I do not see how to pass doubles via the dollar sign; pg_background_launch() gets a string as a parameter, it's not EXECUTE USING. Best regards, Andrey Borodin. diff --git a/contrib/pg_background/Makefile b/contrib/pg_background/Makefile index c4e717d..085fbff 100644 --- a/contrib/pg_background/Makefile +++ b/contrib/pg_background/Makefile @@ -6,6 +6,8 @@ OBJS = pg_background.o EXTENSION = pg_background DATA = pg_background--1.0.sql +REGRESS = pg_background + ifdef USE_PGXS PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) diff --git a/contrib/pg_background/expected/pg_background.out b/contrib/pg_background/expected/pg_background.out new file mode 100644 index 000..a6613ce --- /dev/null +++ b/contrib/pg_background/expected/pg_background.out @@ -0,0 +1,31 @@ +CREATE EXTENSION pg_background; +--run 6 workers which wait .0, .1, .2, .3, .4, .5 seconds respectively +CREATE TABLE input as SELECT x FROM generate_series(0,.5,0.1) x ORDER BY x DESC; +CREATE TABLE output(place int,value float); +--sequence for indication of who's finished on which place +CREATE sequence s start 1; +CREATE TABLE handles as SELECT pg_background_launch(format('select pg_sleep(%s); insert into output values (nextval(''s''),%s);',x,x)) h FROM input; +--wait until everyone finishes +SELECT (SELECT * FROM pg_background_result(h) as (x text) limit 1) FROM handles; +x +-- + SELECT 1 + SELECT 1 + SELECT 1 + SELECT 1 + SELECT 1 + SELECT 1 +(6 rows) + +--output results +SELECT * FROM output ORDER BY place; + place | value +---+--- + 1 | 0 + 2 | 0.1 + 3 | 0.2 + 4 | 0.3 + 5 | 0.4 + 6 | 0.5 +(6 rows) + diff --git a/contrib/pg_background/sql/pg_background.sql
b/contrib/pg_background/sql/pg_background.sql new file mode 100644 index 000..770f0b8 --- /dev/null +++ b/contrib/pg_background/sql/pg_background.sql @@ -0,0 +1,12 @@ +CREATE EXTENSION pg_background; + +--run 6 workers which wait .0, .1, .2, .3, .4, .5 seconds respectively +CREATE TABLE input as SELECT x FROM generate_series(0,.5,0.1) x ORDER BY x DESC; +CREATE TABLE output(place int,value float); +--sequence for indication of who's finished on which place +CREATE sequence s start 1; +CREATE TABLE handles as SELECT pg_background_launch(format('select pg_sleep(%s); insert into output values (nextval(''s''),%s);',x,x)) h FROM input; +--wait until everyone finishes +SELECT (SELECT * FROM pg_background_result(h) as (x text) limit 1) FROM handles; +--output results +SELECT * FROM output ORDER BY place; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Retire src/backend/port/dynloader/linux.c ?
On 2016-12-18 22:19:36 -0500, Tom Lane wrote: > Andres Freund writes: > > Shouldn't we just remove that code? > > What for? I every now and then end up looking at it for a few minutes, and wondering what the hell dld is, just to see that it's old stuff. > It's maintenance-free ... hasn't been touched since 2004. > While I agree with you that it's *probably* dead code, it's hard to > see much upside from removing it. > > If we want to get into arguing whether code is dead or not, there's > an awful lot of potentially removable stuff in the tree, but I doubt > it's worth the trouble to figure out what's really dead. Well, it's sometimes annoying to hit code that's actually dead while looking around. But I don't feel that strongly. Greetings, Andres Freund
Re: [HACKERS] postgres_fdw bug in 9.6
Etsuro Fujita writes: > On 2016/12/17 1:13, Tom Lane wrote: >> So I think the rule could be >> "When first asked to produce a path for a given foreign joinrel, collect >> the cheapest paths for its left and right inputs, and make a nestloop path >> (or hashjoin path, if full join) from those, using the join quals needed >> for the current input relation pair. > Seems reasonable. >> Use this as the fdw_outerpath for >> all foreign paths made for the joinrel." > I'm not sure that would work well for foreign joins with sort orders. > Consider a merge join, whose left input is a 2-way foreign join with a > sort order that implements a full join and whose right input is a sorted > local table scan. If the EPQ subplan for the foreign join wouldn't > produce the right sort order, the merge join might break during EPQ > rechecks (note that in this case the EPQ subplan for the foreign join > might produce more than a single row during an EPQ recheck). How so? We only recheck one row at a time, therefore it can be claimed to have any sort order you care about. regards, tom lane
Re: [HACKERS] Logical Replication WIP
On 12/18/2016 09:04 PM, Petr Jelinek wrote: On 18/12/16 19:02, Steve Singer wrote: pg_dump is also generating warnings pg_dump: [archiver] WARNING: don't know how to set owner for object type SUBSCRIPTION I know that the plan is to add proper ACL's for publications and subscriptions later. I don't know if we want to leave the warning in until then or do something about it. No, ACLs are separate from owner. This is thinko on my side. I was thinking we can live without ALTER ... OWNER TO for now, but we actually need it for pg_dump and for REASSIGN OWNED. So now I added the OWNER TO for both PUBLICATION and SUBSCRIPTION. When I try to restore my pg_dump with publications I get ./pg_dump -h localhost --port 5440 test |./psql -h localhost --port 5440 test2 ALTER TABLE CREATE PUBLICATION ERROR: unexpected command tag "PUBLICATION This comes from a ALTER PUBLICATION mypub OWNER TO ssinger; Does the OWNER TO clause need to be added to AlterPublicationStmt: instead of AlterOwnerStmt ? Also we should update the tab-complete for ALTER PUBLICATION to show the OWNER to options + the \h help in psql and the reference SGML -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Quorum commit for multiple synchronous replication.
On Sun, Dec 18, 2016 at 9:36 PM, Michael Paquier wrote: > On Fri, Dec 16, 2016 at 10:42 PM, Fujii Masao wrote: >> Attached is the modified version of the patch. Barring objections, I will >> commit this version. > > There is a whitespace: > $ git diff master --check > src/backend/replication/syncrep.c:39: trailing whitespace. > + * Okay, pushed the patch with this fix. Thanks! Regarding this feature, there are some loose ends. We should work on and complete them before the release. (1) Which synchronous replication method, priority or quorum, should be chosen when neither FIRST nor ANY is specified in s_s_names? Right now, a priority-based sync replication is chosen for keeping backward compatibility. However some hackers argued to change this decision so that a quorum commit is chosen because they think that most users prefer a quorum. (2) There are still many source comments and pieces of documentation that we need to update, for example, in high-availability.sgml. We need to check and update them thoroughly. (3) The priority value is assigned to each standby listed in s_s_names even in quorum commit though those priority values are not used at all. Users can see those priority values in pg_stat_replication. Isn't this confusing? If yes, it might be better to always assign 1 as the priority, for example. Any other? Regards, -- Fujii Masao
Re: [HACKERS] postgres_fdw bug in 9.6
On 2016/12/19 13:59, Ashutosh Bapat wrote: On Fri, Dec 16, 2016 at 9:43 PM, Tom Lanewrote: Etsuro Fujita writes: On 2016/12/16 11:25, Etsuro Fujita wrote: As I said upthread, an alternative I am thinking is (1) to create an equivalent nestloop join path using inner/outer paths of a foreign join path, except when that join path implements a full join, in which case a merge/hash join path is used, (2) store it in fdw_outerpath as-is, and (3) during an EPQ recheck, apply postgresRecheckForeignScan to the outer subplan created from the fdw_outerpath as-is. What do you think about that? Let me explain about #1 and #2 a bit more. What I have in mind is: * modify postgresGetForeignPaths so that a simple foreign table scan path is stored into the base relation's PgFdwRelationInfo. * modify postgresGetForeignJoinPaths so that (a) a local join path for a 2-way foreign join is created using simple foreign table scan paths stored in the base relations' PgFdwRelationInfos, and stored into the join relation's PgFdwRelationInfo. (b) a local join path for a 3-way foreign join, whose left input is a 2-way foreign join, is created using a local join path stored in the left input join relation's PgFdwRelationInfo and a simple foreign table scan path stored into the right input base relation's PgFdwRelationInfo. (c) Likewise for higher level foreign joins. (d) local join paths created are passed to create_foreignscan_path and stored into the fdw_outerpaths of the resulting ForeignPahts. Hm, isn't this overcomplicated? As you said earlier, we don't really care all that much whether the fdw_outerpath is free of lower foreign joins, because EPQ setup will select those lower join's substitute EPQ plans anyway. All that we need is that the EPQ tree be a legal implementation of the join order with join quals applied at the right places. 
So I think the rule could be "When first asked to produce a path for a given foreign joinrel, collect the cheapest paths for its left and right inputs, and make a nestloop path (or hashjoin path, if full join) from those, using the join quals needed for the current input relation pair. Use this as the fdw_outerpath for all foreign paths made for the joinrel." We could use fdw_outerpath of the left and right inputs instead of looking for the cheapest paths. For base relations as left or right relations, use the unparameterized cheapest foreign path. This will ensure that all joins in fdw_outerpath are local joins, making EPQ checks slightly more efficient by avoiding redirection at every foreign path. That seems close to what I had in mind described above. One additional piece of work required for that would be to store the fdw_outerpath into the RelOptInfo's fdw_private, such as PgFdwRelationInfo, before add_path for the foreign-join path containing the fdw_outerpath, because add_path might reject the foreign-join path. I think that additional work would make this rather complicated. Best regards, Etsuro Fujita
[HACKERS] Protect syscache from bloating with negative cache entries
Hello, recently one of my customer stumbled over an immoderate catcache bloat. This is a known issue living on the Todo page in the PostgreSQL wiki. https://wiki.postgresql.org/wiki/Todo#Cache_Usage > Fix memory leak caused by negative catcache entries https://www.postgresql.org/message-id/51c0a1ff.2050...@vmware.com This patch addresses the two cases of syscache bloat by using invalidation callback mechanism. Overview of the patch The bloat is caused by negative cache entries in catcaches. They are crucial for performance but it is a problem that there's no way to remove them. They last for the backends' lifetime. The first patch provides a means to flush catcache negative entries, then defines a relcache invalidation callback to flush negative entries in syscaches for pg_statistic(STATRELATTINH) and pg_attributes(ATTNAME, ATTNUM). The second patch implements a syscache invalidation callback so that deletion of a schema causes a flush for pg_class (RELNAMENSP). Both of the aboves are not hard-coded and defined in cacheinfo using additional four members. Remaining problems Still, catcache can bloat by repeatedly accessing non-existent table with unique names in a permanently-living schema but it seems a bit too artificial (or malicious). Since such negative entries don't have a trigger to remove, caps are needed to prevent them from bloating syscaches, but the limits are hardly seem reasonably determinable. Defects or disadvantages This patch scans over whole the target catcache to find negative entries to remove and it might take a (comparably) long time on a catcache with so many entries. By the second patch, unrelated negative caches may be involved in flushing since they are keyd by hashvalue, not by the exact key values. The attached files are the following. 1. 0001-Cleanup-negative-cache-of-pg_statistic-when-dropping.patch Negative entry flushing by relcache invalidation using relcache invalidation callback. 2. 
0002-Cleanup-negative-cache-of-pg_class-when-dropping-a-s.patch Negative entry flushing by catcache invalidation using catcache invalidation callback. 3. gen.pl a test script for STATRELATTINH bloating. 4. gen2.pl a test script for RELNAMENSP bloating. 3 and 4 are used as the following, ./gen.pl | psql postgres > /dev/null regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 203735e13d2b8584e1ddc652b602465a4f839355 Mon Sep 17 00:00:00 2001 From: Kyotaro HoriguchiDate: Thu, 15 Dec 2016 17:43:03 +0900 Subject: [PATCH 1/2] Cleanup negative cache of pg_statistic when dropping a relation. Accessing columns that don't have statistics causes leaves negative entries in catcache for pg_statstic, but there's no chance to remove them. Especially when repeatedly creating then dropping temporary tables bloats catcache so much that memory pressure can be significant. This patch removes negative entries in STATRELATTINH, ATTNAME and ATTNUM when corresponding relation is dropped. --- src/backend/utils/cache/catcache.c | 57 +++- src/backend/utils/cache/syscache.c | 274 +++-- src/include/utils/catcache.h | 3 + src/include/utils/syscache.h | 2 + 4 files changed, 263 insertions(+), 73 deletions(-) diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 6016d19..c1d9d2f 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -304,10 +304,11 @@ CatCachePrintStats(int code, Datum arg) if (cache->cc_ntup == 0 && cache->cc_searches == 0) continue; /* don't print unused caches */ - elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits", + elog(DEBUG2, "catcache %s/%u: %d tup, %d negtup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits", cache->cc_relname, cache->cc_indexoid, cache->cc_ntup, + cache->cc_nnegtup, cache->cc_searches, cache->cc_hits, cache->cc_neg_hits, @@ -374,6 +375,10 @@ 
CatCacheRemoveCTup(CatCache *cache, CatCTup *ct)
 	/* free associated tuple data */
 	if (ct->tuple.t_data != NULL)
 		pfree(ct->tuple.t_data);
+
+	if (ct->negative)
+		--cache->cc_nnegtup;
+
 	pfree(ct);

 	--cache->cc_ntup;
@@ -637,6 +642,49 @@ ResetCatalogCache(CatCache *cache)
 }

 /*
+ * CleanupCatCacheNegEntries
+ *
+ * Remove negative cache tuples matching a partial key.
+ *
+ */
+void
+CleanupCatCacheNegEntries(CatCache *cache, ScanKeyData *skey)
+{
+	int			i;
+
+	/* If this cache has no negative entries, nothing to do */
+	if (cache->cc_nnegtup == 0)
+		return;
+
+	/* searching with a partial key needs scanning the whole cache */
+	for (i = 0; i < cache->cc_nbuckets; i++)
+	{
+		dlist_head *bucket = &cache->cc_bucket[i];
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, bucket)
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+			bool
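The truncated CleanupCatCacheNegEntries above walks every bucket because a partial key cannot be hashed to a single one. As a rough standalone sketch of that shape (hypothetical names; a plain singly linked list stands in for PostgreSQL's dlist, and an exact integer key for the partial ScanKey):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 8

typedef struct Entry
{
	struct Entry *next;
	int			key;
	bool		negative;		/* a cached "does not exist" marker */
} Entry;

typedef struct Cache
{
	Entry	   *bucket[NBUCKETS];
	int			ntup;			/* all entries */
	int			nnegtup;		/* negative entries only */
} Cache;

/* Insert a (possibly negative) entry, as catcache population would. */
static void
cache_insert(Cache *cache, int key, bool negative)
{
	Entry	   *e = malloc(sizeof(Entry));
	int			b = key % NBUCKETS;

	e->key = key;
	e->negative = negative;
	e->next = cache->bucket[b];
	cache->bucket[b] = e;
	cache->ntup++;
	if (negative)
		cache->nnegtup++;
}

/*
 * Remove all negative entries matching the key, scanning every bucket --
 * the shape of CleanupCatCacheNegEntries, which must walk the whole cache
 * because a partial key does not identify a single hash bucket.
 */
static void
cleanup_neg_entries(Cache *cache, int key)
{
	if (cache->nnegtup == 0)
		return;					/* nothing to do */

	for (int i = 0; i < NBUCKETS; i++)
	{
		Entry	  **prev = &cache->bucket[i];

		while (*prev)
		{
			Entry	   *ct = *prev;

			if (ct->negative && ct->key == key)
			{
				*prev = ct->next;
				free(ct);
				cache->ntup--;
				cache->nnegtup--;
			}
			else
				prev = &ct->next;
		}
	}
}
```

The counter pair mirrors the patch's cc_ntup/cc_nnegtup bookkeeping: the early return when nnegtup is zero is what keeps the full scan cheap in the common case.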
Re: [HACKERS] Measuring replay lag
On Mon, Dec 19, 2016 at 4:03 PM, Peter Eisentraut wrote: > On 11/22/16 4:27 AM, Thomas Munro wrote: >> Thanks very much for testing! New version attached. I will add this >> to the next CF. > > I don't see it there yet. Thanks for the reminder. Added here: https://commitfest.postgresql.org/12/920/ Here's a rebased patch. -- Thomas Munro http://www.enterprisedb.com replay-lag-v14.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make pg_basebackup -x stream the default
On Sat, Dec 17, 2016 at 2:37 PM, Magnus Hagander wrote: > Attached is an updated patch that does this. As a bonus it simplifies the > code a bit. I also fixed an error message that I missed updating in the > previous patch. Looks good to me. Still applies cleanly on head and all tests pass. I have no further comments, although since I'm not an experienced Postgres reviewer, a committer may want to take another look before merging. -- Vladimir Rusinov Storage SRE, Google Ireland Google Ireland Ltd., Gordon House, Barrow Street, Dublin 4, Ireland Registered in Dublin, Ireland Registration Number: 368047
Re: [HACKERS] Measuring replay lag
On 26 October 2016 at 11:34, Thomas Munro wrote: > It works by taking advantage of the { time, end-of-WAL } samples that > sending servers already include in message headers to standbys. That > seems to provide a pretty good proxy for when the WAL was written, if > you ignore messages where the LSN hasn't advanced. The patch > introduces a new GUC replay_lag_sample_interval, defaulting to 1s, to > control how often the standby should record these timestamped LSNs > into a small circular buffer. When its recovery process eventually > replays a timestamped LSN, the timestamp is sent back to the upstream > server in a new reply message field. The value visible in > pg_stat_replication.replay_lag can then be updated by comparing with > the current time. Why not just send back the lag as calculated by max_standby_streaming_delay? I.e. at the end of replay of each chunk, record the current delay in shmem, then send it back periodically. If we have two methods of calculation it would be confusing. Admittedly the approach here is the same one I advocated some years back when Robert and I were discussing time delayed standbys. > Compared to the usual techniques people use to estimate replay lag, > this approach has the following advantages: > > 1. The lag is measured in time, not LSN difference. > 2. The lag time is computed using two observations of a single > server's clock, so there is no clock skew. > 3. The lag is updated even between commits (during large data loads etc). Yes, good reasons. > In the previous version I was effectively showing the ping time > between the servers during idle times when the standby was fully > caught up because there was nothing happening. I decided that was not > useful information and that it's more newsworthy and interesting to > see the estimated replay lag for the most recent real replayed > activity, so I changed that.
> > In the last thread[1], Robert Haas wrote: >> Well, one problem with this is that you can't put a loop inside of a >> spinlock-protected critical section. > > Fixed. > >> In general, I think this is a pretty reasonable way of attacking this >> problem, but I'd say it's significantly under-commented. Where should >> someone go to get a general overview of this mechanism? The answer is >> not "at place XXX within the patch". (I think it might merit some >> more extensive documentation, too, although I'm not exactly sure what >> that should look like.) > > I have added lots of comments. > >> When you overflow the buffer, you could thin in out in a smarter way, >> like by throwing away every other entry instead of the oldest one. I >> guess you'd need to be careful how you coded that, though, because >> replaying an entry with a timestamp invalidates some of the saved >> entries without formally throwing them out. > > Done, by overwriting the newest sample rather than the oldest if the > buffer is full. That seems to give pretty reasonable degradation, > effectively lowering the sampling rate, without any complicated buffer > or rate management code. > >> Conceivably, 0002 could be split into two patches, one of which >> computes "stupid replay lag" considering only records that naturally >> carry timestamps, and a second adding the circular buffer to handle >> the case where much time passes without finding such a record. > > I contemplated this but decided that it'd be best to use ONLY samples > from walsender headers, and never use the time stamps from commit > records for this. If we use times from commit records, then a > cascading sending server will not be able to compute the difference in > time without introducing clock skew (not to mention the difficulty of > combining timestamps from two sources if we try to do both). 
I > figured that it's better to have a value that shows > a cascading sender->standby->cascading sender round trip time that is free of > clock skew, than a master->cascading sender->standby->cascading sender > incomplete round trip that includes clock skew. > > By the same reasoning I decided against introducing a new periodic WAL > record or field from the master to hold extra time stamps in between > commits to do this, in favour of the buffered transient timestamp > approach I took in this patch. That said, I can see there are > arguments for doing it with extra periodic WAL timestamps, if people > don't think it'd be too invasive to mess with the WAL for this, and > don't care about cascading standbys giving skewed readings. One > advantage would be that persistent WAL timestamps would still be able > to provide lag estimates if a standby has been down for a while and > was catching up, and this approach can't until it's caught up due to > lack of buffered transient timestamps. Thoughts? -1 for adding anything to the WAL for this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
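The buffering scheme described in the quoted text can be sketched as a small ring of { time, LSN } samples. The names and the overwrite-the-newest policy below illustrate the idea only; they are not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define LAG_BUFFER_SIZE 4

typedef struct LagSample
{
	uint64_t	lsn;			/* end-of-WAL position from the sender */
	int64_t		sent_time;		/* sender's clock when that LSN was sent */
} LagSample;

typedef struct LagTracker
{
	LagSample	buf[LAG_BUFFER_SIZE];
	int			read;			/* oldest unconsumed sample */
	int			write;			/* next free slot */
	int			count;
} LagTracker;

/*
 * Record a { time, LSN } pair from a walsender message header.  When the
 * buffer is full, overwrite the newest sample instead of the oldest,
 * which effectively lowers the sampling rate (the degradation strategy
 * mentioned above) without any extra buffer management.
 */
static void
lag_track_write(LagTracker *t, uint64_t lsn, int64_t now)
{
	if (t->count == LAG_BUFFER_SIZE)
	{
		/* overwrite newest: step the write pointer back one slot */
		t->write = (t->write + LAG_BUFFER_SIZE - 1) % LAG_BUFFER_SIZE;
		t->count--;
	}
	t->buf[t->write].lsn = lsn;
	t->buf[t->write].sent_time = now;
	t->write = (t->write + 1) % LAG_BUFFER_SIZE;
	t->count++;
}

/*
 * Called when recovery has replayed up to 'lsn': consume every sample at
 * or before that LSN and return the send time of the newest one, or -1
 * if no buffered sample has been replayed yet.
 */
static int64_t
lag_track_read(LagTracker *t, uint64_t lsn)
{
	int64_t		sent_time = -1;

	while (t->count > 0 && t->buf[t->read].lsn <= lsn)
	{
		sent_time = t->buf[t->read].sent_time;
		t->read = (t->read + 1) % LAG_BUFFER_SIZE;
		t->count--;
	}
	return sent_time;
}
```

Because the returned send time and the "now" it is later compared against both come from the sending server's clock, the lag estimate is free of clock skew, which is advantage 2 in the list above.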
Re: [HACKERS] increasing the default WAL segment size
Hello all,

Please find attached a patch to make the WAL segment size configurable at initdb time. The attached patch removes the --with-wal-segsize configure option and adds a new initdb option --wal-segsize. initdb passes the wal-segsize value in an environment variable, which is used to overwrite the GUC value wal_segment_size and to set the internal variables XLogSegSize and XLOG_SEG_SIZE (xlog_internal.h). The default wal_segment_size is not changed, but I have increased the maximum size to 1GB.

Since XLOG_SEG_SIZE is now variable, it can no longer be used directly in src/bin modules and a few macros, and a few changes had to be made:
- In guc.c, remove GUC_UNIT_XSEGS, which used XLOG_SEG_SIZE, and introduce show functions for the GUCs that used the unit (min_wal_size and max_wal_size).
- For pg_basebackup, add a new replication command SHOW_WAL_SEGSZ to fetch the wal_segment_size in bytes.
- pg_controldata, pg_resetxlog and pg_rewind fetch the xlog_seg_size from the ControlFile.
- Since pg_xlogdump reads the WAL files, it uses the file size to determine the xlog_seg_size.
- In pg_test_fsync, a buffer of size XLOG_SEG_SIZE was created, filled with random data and written to a temporary file to check for any write/fsync error before performing the tests. Since this does not affect the actual performance results, the XLOG_SEG_SIZE in that module is replaced with the default value (16MB).

Please note that the documentation is not updated in this patch. Feedback and suggestions are welcome.

-- Beena Emerson Have a Great Day!

initdb-walsegsize_v1.patch Description: Binary data
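Since pg_xlogdump would infer the segment size from a WAL file's size, some validation of that size is implied before trusting it as XLogSegSize. A minimal sketch of such a check, assuming a 1MB lower bound (the 1GB upper bound comes from the proposal above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_WAL_SEGSIZE (1024 * 1024)			/* 1MB, assumed lower bound */
#define MAX_WAL_SEGSIZE (1024 * 1024 * 1024)	/* 1GB, per the proposal */

/*
 * A WAL file's size can serve as the segment size only if it is a power
 * of two within the supported range; anything else indicates a damaged
 * or truncated file rather than a valid segment.
 */
static bool
is_valid_wal_segsize(int64_t size)
{
	return size >= MIN_WAL_SEGSIZE &&
		size <= MAX_WAL_SEGSIZE &&
		(size & (size - 1)) == 0;	/* power-of-two test */
}
```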
Re: [HACKERS] Slow I/O can break throttling of base backup
On Fri, Dec 16, 2016 at 11:24 AM, Antonin Houska wrote: > Antonin Houska wrote: > > > It seems to be my bug. I'll check tomorrow. > > I could reproduce the problem by adding sufficient sleep time to the > loop. > > > Magnus Hagander wrote: > >> I wonder if the else if (sleep > 0) at the bottom of throttle() should just >> be a simple else and always run, resetting last_throttle? > > I agree. In fact, I could simplify the code even more. > > Since (elapsed + sleep) almost equals GetCurrentIntegerTimestamp(), we can > use the following statement unconditionally (I think I tried too hard to avoid > calling GetCurrentIntegerTimestamp too often in the original patch): > > throttled_last = GetCurrentIntegerTimestamp(); > > Thus we can also get rid of the "else" branch that clears both "sleep" and > "wait_result". > > (The patch contains a minor comment refinement that I found useful when seeing > the code after years.) > > Thanks! Applied and backpatched. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
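The agreed fix is to reset the throttling baseline unconditionally rather than only when a sleep was needed. A rough sketch of that logic, with the clock passed in explicitly and illustrative names rather than the actual basebackup.c code:

```c
#include <assert.h>
#include <stdint.h>

typedef struct Throttle
{
	int64_t		bytes_per_sec;	/* configured rate limit */
	int64_t		throttled_last; /* clock value at the last reset */
	int64_t		sum_bytes;		/* bytes sent since the last reset */
} Throttle;

/*
 * Return how long (in microseconds) the caller should sleep to stay
 * under the rate limit, given the current clock reading.  Crucially,
 * the baseline is reset unconditionally, even when slow I/O already
 * consumed the whole interval and no sleep is needed.  Before the fix,
 * skipping the reset in that case let the tracked interval grow without
 * bound and broke the throttling.
 */
static int64_t
throttle(Throttle *t, int64_t sent, int64_t now)
{
	int64_t		elapsed,
				target,
				sleep;

	t->sum_bytes += sent;
	elapsed = now - t->throttled_last;
	/* time the transfer *should* have taken at the configured rate */
	target = t->sum_bytes * 1000000 / t->bytes_per_sec;
	sleep = target - elapsed;
	if (sleep < 0)
		sleep = 0;

	/* always reset the baseline; assume the caller sleeps 'sleep' usec */
	t->throttled_last = now + sleep;
	t->sum_bytes = 0;
	return sleep;
}
```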
Re: [HACKERS] Logical Replication WIP
On 19/12/16 08:04, Erik Rijkers wrote: > On 2016-12-18 11:12, Petr Jelinek wrote: > > (now using the latest patchset:) > > 0001-Add-PUBLICATION-catalogs-and-DDL-v14.patch > 0002-Add-SUBSCRIPTION-catalog-and-DDL-v14.patch > 0003-Define-logical-replication-protocol-and-output-plugi-v14.patch > 0004-Add-logical-replication-workers-v14.patch > 0005-Add-separate-synchronous-commit-control-for-logical--v14.patch > >> BTW you don't need to add a primary key to pgbench_history. Simply ALTER >> TABLE pgbench_history REPLICA IDENTITY FULL; should be enough. > > Either should work, but neither does. > > Set-up: > Before creating the publication/subscription: > On master I run pgbench -qis 1, then set the replica identity (and/or add a > serial column) for pgbench_history, then dump/restore the 4 pgbench > tables from master to replica. > Then enable the publication/subscription. The logs look fine. (Other tests > I've devised earlier (on other tables) still work nicely.) > > Now when I do a pgbench run on master, something like: > >pgbench -c 1 -T 20 -P 1 > > I often see this (when running pgbench): > > ERROR: publisher does not send replica identity column expected by the > logical replication target public.pgbench_tellers > or, sometimes (less often), the same ERROR for pgbench_accounts appears > (as in the subscriber log below) > > -- publisher log > 2016-12-19 07:44:22.738 CET 22690 LOG: logical decoding found > consistent point at 0/14598C78 > 2016-12-19 07:44:22.738 CET 22690 DETAIL: There are no running > transactions.
> 2016-12-19 07:44:22.738 CET 22690 LOG: exported logical decoding > snapshot: "000130FA-1" with 0 transaction IDs > 2016-12-19 07:44:22.886 CET 22729 LOG: starting logical decoding for > slot "sub1" > 2016-12-19 07:44:22.886 CET 22729 DETAIL: streaming transactions > committing after 0/14598CB0, reading WAL from 0/14598C78 > 2016-12-19 07:44:22.886 CET 22729 LOG: logical decoding found > consistent point at 0/14598C78 > 2016-12-19 07:44:22.886 CET 22729 DETAIL: There are no running > transactions. > 2016-12-19 07:45:25.568 CET 22729 LOG: could not receive data from > client: Connection reset by peer > 2016-12-19 07:45:25.568 CET 22729 LOG: unexpected EOF on standby > connection > 2016-12-19 07:45:25.580 CET 26696 LOG: starting logical decoding for > slot "sub1" > 2016-12-19 07:45:25.580 CET 26696 DETAIL: streaming transactions > committing after 0/1468E0D0, reading WAL from 0/1468DC90 > 2016-12-19 07:45:25.589 CET 26696 LOG: logical decoding found > consistent point at 0/1468DC90 > 2016-12-19 07:45:25.589 CET 26696 DETAIL: There are no running > transactions. 
> > -- subscriber log > 2016-12-19 07:44:22.878 CET 17027 LOG: starting logical replication > worker for subscription 24581 > 2016-12-19 07:44:22.883 CET 22726 LOG: logical replication apply for > subscription sub1 started > 2016-12-19 07:45:11.069 CET 22726 WARNING: leaked hash_seq_search scan > for hash table 0x2def1a8 > 2016-12-19 07:45:25.566 CET 22726 ERROR: publisher does not send > replica identity column expected by the logical replication target > public.pgbench_accounts > 2016-12-19 07:45:25.568 CET 16984 LOG: worker process: logical > replication worker 24581 (PID 22726) exited with exit code 1 > 2016-12-19 07:45:25.568 CET 17027 LOG: starting logical replication > worker for subscription 24581 > 2016-12-19 07:45:25.574 CET 26695 LOG: logical replication apply for > subscription sub1 started > 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan > for hash table 0x2def2c8 > 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan > for hash table 0x2def2c8 > 2016-12-19 07:46:10.950 CET 26695 WARNING: leaked hash_seq_search scan > for hash table 0x2def2c8 > > > Sometimes replication (caused by a pgbench run) runs for a few seconds > replicating all 4 pgbench tables correctly, but never longer than 10 to > 20 seconds. > > If you cannot reproduce it with the provided info, I will make a more > precise setup description, but it's so solidly failing here that I hope > that won't be necessary. > Hi, nope, can't reproduce that. I can reproduce the leaked hash_seq_search. The attached patch fixes that. But no issues with replication itself. The error basically means that the pkey on publisher and subscriber isn't the same.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index d795c5b..7ab94f5 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -59,6 +59,7 @@ logicalrep_relmap_invalidate_cb(Datum arg, Oid reloid)
 		if (entry->reloid == reloid)
 		{
 			entry->reloid = InvalidOid;
+			hash_seq_term(&status);
 			break;