Re: Online enabling of checksums
> On Feb 28, 2018, at 6:22, Daniel Gustafsson wrote:
>
>> Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implement a utility for offline checksum enabling, without WAL-logging, surely.
>
> While outside the scope of the patch in question (since it deals with enabling checksums online), such a utility should be perfectly possible to write.

I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), the patch works fine. Do we provide backporting for such features?

Best regards, Andrey Borodin.
Re: Contention preventing locking
On 28.02.2018 16:32, Amit Kapila wrote:
On Mon, Feb 26, 2018 at 8:26 PM, Konstantin Knizhnik wrote:
On 26.02.2018 17:20, Amit Kapila wrote:

Can you please explain how it can be done easily without extra tuple locks?

I have tried to read your patch, but due to the lack of comments it is not clear what you are trying to achieve. As far as I can see, you are changing the locktag passed to LockAcquireExtended by the first waiter for the transaction. How will it achieve the serial waiting protocol (queue the waiters for the tuple) for a particular tuple being updated?

The idea of transaction lock chaining was very simple; I explained it in the first mail in this thread. Assume that transaction T1 has updated tuple R1. Transaction T2 also tries to update this tuple and so waits for T1's XID. If yet another transaction T3 then also tries to update R1, it should wait for T2, not for T1.

Isn't this exactly what we try to do via tuple locks (heap_acquire_tuplock)? Currently, T2, before waiting for T1, will acquire a tuple lock on R1, and T3 will wait on T2 (not on T1) to release the tuple lock on R1; similarly, the other waiters form a queue and are woken one by one. After this, as soon as T2 is woken, it releases the lock on the tuple and tries to fetch the updated tuple. Now, releasing the tuple lock by T2 allows T3 to proceed as well, and as T3 was supposed to wait on T1 (according to the tuple-satisfies API), it will immediately be released and will try to do the same work as T2. One of them will succeed and the other has to re-fetch the updated tuple again.

Yes, but two notes:
1. The tuple lock is used inside the heap_* functions, but not in EvalPlanQualFetch, where the transaction lock is also used.
2. The tuple lock is held until the end of the update, not until commit of the transaction. So another transaction can receive control before this transaction is completed, and contention still takes place.
Contention is reduced and performance is increased only if locks (either the tuple lock or the xid lock) are held until the end of the transaction. Unfortunately, that may lead to deadlock.

My last attempt to reduce contention was to replace the shared lock with an exclusive one in XactLockTableWait and to remove the unlock from this function. So only one transaction can get the xact lock, and it will hold it until the end of the transaction. The tuple lock also seems to be unneeded in this case. It shows better performance on the pgrw test, but on the YCSB benchmark with workload A (50% updates) performance was even worse than with vanilla Postgres. And what is worst of all, there are deadlocks in pgbench tests.

I think in this whole process backends may need to wait multiple times, either on the tuple lock or on the xact lock. It seems the reason for these waits is that we immediately release the tuple lock (acquired by heap_acquire_tuplock) once the transaction on which we were waiting is finished. AFAICU, the reason for releasing the tuple lock immediately instead of at the end of the transaction is that we don't want to accumulate too many locks, as that can lead to unbounded use of shared memory. How about releasing the tuple lock at the end of the transaction unless the transaction acquires more than a certain threshold (say 10 or 50) of such locks, in which case we fall back to the current strategy?

Certainly, I have tested such a version. Unfortunately, it doesn't help. The tuple lock uses the tuple's TID. But once a transaction has made the update, a new version of the tuple is produced with a different TID, and all new transactions will see this version, so they will not notice this lock at all. This is why my first attempt to address contention was to replace the TID lock with a PK (primary key) lock. And it really helps to reduce contention and the degradation of performance with an increasing number of connections. But it is not so easy to correctly extract the PK in all cases.
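The TID-versus-primary-key point can be illustrated with a toy model; the lock keys, TIDs, and transaction names below are schematic, not the actual lock-manager representation:

```python
def enqueue_waiters(key_func):
    """Queue two waiters on a row that T1 updates in between them."""
    locks = {}                      # lock key -> queued transactions
    v1 = {"pk": 1, "tid": (0, 1)}   # row version before T1's update
    locks.setdefault(key_func(v1), []).append("T2")
    v2 = {"pk": 1, "tid": (0, 2)}   # T1's update made a new version
    locks.setdefault(key_func(v2), []).append("T3")
    return locks

# A TID-keyed lock splits the waiters across two unrelated queues,
# because the update changed the TID; a PK-keyed lock keeps one queue.
by_tid = enqueue_waiters(lambda row: ("tid", row["tid"]))
by_pk = enqueue_waiters(lambda row: ("pk", row["pk"]))

assert len(by_tid) == 2                       # no chaining possible
assert by_pk == {("pk", 1): ["T2", "T3"]}     # T3 queues behind T2
```

This is why new transactions, which only ever see the new TID, never notice a lock taken on the old one.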
-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
RE: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Yes, it should not copy those WAL files. Most of the time they are going to be meaningless. See this recent thread:
> https://www.postgresql.org/message-id/20180126023609.GH17847%40paquier.xyz
> So I would rather go this way instead of having to worry about manipulating those WAL segments as you do. Depending on the needs, I think that even a backpatch could be considered.

Thank you for the information. I didn't notice those activities going on around pg_rewind. It's a pity that Chen's patch, which limits the WAL to be copied, is not committed yet. It looks ready for committer.

> > Related to this, shouldn't pg_rewind avoid copying more files and directories like pg_basebackup? Currently, pg_rewind doesn't copy postmaster.pid, postmaster.opts, and temporary files/directories (pgsql_tmp/).
>
> Yes, it should not copy those files. I have a patch in the current CF to do that:
> https://commitfest.postgresql.org/17/1507/

Wow, what a great patch. I think I should look at it. But I'm afraid it won't be backpatched because it's big... Even with your patch and Chen's, my small patch is probably necessary to avoid leaving 0-byte or half-baked files. I'm not sure whether those strangely sized files would cause actual trouble, but maybe it would be healthy to clean things up as much as possible. (Files in pg_twophase/ might emit WARNING messages, garbage server log files might worry the DBA, etc.; yes, these may be just FUD.)

Regards
Takayuki Tsunakawa
Re: Two-phase update of restart_lsn in LogicalConfirmReceivedLocation
On 1 March 2018 at 13:39, Arseny Sher wrote:
> Hello,
>
> In LogicalConfirmReceivedLocation, two fields (data.catalog_xmin and effective_catalog_xmin) of the ReplicationSlot structure are used for advancing the slot's xmin. This avoids a window in which tuples might already have been vacuumed but the slot's state was not yet flushed to disk: if we crash during this window, after restart we would assume that all tuples with xmax > old catalog_xmin are still there, while actually some of them might already be vacuumed. However, the same dodge is not used for restart_lsn advancement. This means that under somewhat unlikely circumstances it is possible that we will start decoding from a segment which was already recycled, making the slot broken. Shouldn't this be fixed?

You mean the small window between

MyReplicationSlot->data.restart_lsn = MyReplicationSlot->candidate_restart_lsn;

and

ReplicationSlotMarkDirty();
ReplicationSlotSave();

in LogicalConfirmReceivedLocation?

We do release the slot spinlock after updating the slot and before dirtying and flushing it. But to make the change visible, someone else would have to call ReplicationSlotsComputeRequiredLSN(). That's possible by the looks of it, and could be caused by a concurrent slot drop, a physical slot confirmation, or another logical slot handling a concurrent confirmation.

For something to break, we'd have to:
* hit this race to update XLogCtl->replicationSlotMinLSN by a concurrent call to ReplicationSlotsComputeRequiredLSN while in LogicalConfirmReceivedLocation
* have the furthest-behind slot be the one in the race window in LogicalConfirmReceivedLocation
* checkpoint here, before the slot is marked dirty
* actually recycle/remove the needed xlog
* crash before writing the new slot state

Checkpoints write out slot state, but only for dirty slots. And we didn't dirty the slot while we had its spinlock; we only dirty it just before writing. So I can't say it's definitely impossible.
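For comparison, the two-field dodge used for catalog_xmin can be sketched as a toy model. The field names loosely mirror the ReplicationSlot structure, but this is an illustration of the ordering, not the real code:

```python
class Slot:
    """Toy replication slot: persisted value vs. effective value."""
    def __init__(self, xmin):
        self.data_catalog_xmin = xmin       # value that gets persisted
        self.effective_catalog_xmin = xmin  # value vacuum actually uses
        self.on_disk = xmin                 # last flushed state

    def advance(self, new_xmin, crash_before_flush=False):
        self.data_catalog_xmin = new_xmin   # update in memory first
        if crash_before_flush:
            return                          # crash window: nothing lost
        self.on_disk = new_xmin             # ReplicationSlotSave()
        self.effective_catalog_xmin = new_xmin  # only now may vacuum advance

s = Slot(100)
s.advance(200, crash_before_flush=True)
# After a restart we reload from disk; because vacuum only honoured
# effective_catalog_xmin, no tuple past 100 was removed prematurely.
assert s.on_disk == 100 and s.effective_catalog_xmin == 100
```

The complaint upthread is that restart_lsn is advanced in a single step, without an "effective" shadow field, so the persisted and globally visible values can disagree across a crash.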
It seems astonishingly unlikely, but that's not always good enough. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench randomness initialization
Hello Tom,

Fabien COELHO writes:

This is a simple patch that does what it says on the tin. I ran into trouble with the pgbench TAP test *even before applying the patch*, but only because I was doing a VPATH build as a user without 'write' on the source tree (001_pgbench_with_server.pl tried to make pgbench create log files there). Bad me. Oddly, that was the only test in the whole tree to have such an issue, so here I add a pre-patch to fix that. Now my review needs a review. :)

Yep. I find the multiple-chdir solution a little bit too extreme. ISTM that it should rather add the correct path to --log-prefix by prepending $node->basedir, like the pgbench function does for -f scripts. See attached.

Hm ... so I tried to replicate this problem, and failed to: the log files get made under the VPATH build directory, as desired, even without this patch. Am I doing something wrong, or is this platform-dependent somehow?

As I recall, it indeed works if the source directories are rw, but fails if they are ro, because then the local mkdir fails. So you would have to do a VPATH build with read-only sources to get the issue the patch is fixing. Otherwise, the issue would have been caught earlier by the buildfarm, which I guess does VPATH compilation and full validation.

-- Fabien.
Re: server crash in nodeAppend.c
On Wed, Feb 28, 2018 at 9:29 PM, Robert Haas wrote:
> Nice test case. I pushed commit ce1663cdcdbd9bf15c81570277f70571b3727dd3, including your test case, to fix this.

Thanks, Robert, for the fix and commit. I have re-verified the commit; it is working fine now.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: CALL optional in PL/pgSQL
2018-03-01 5:51 GMT+01:00 Peter Eisentraut:
> This seems to be a popular issue when porting from PL/SQL, so I'll throw it out here for discussion. Apparently, in PL/SQL you can call another procedure without the CALL keyword. Here is a patch that attempts to implement that in PL/pgSQL as well. It's not very pretty.

CALL is not optional everywhere - I was surprised - it is required in some environments and should not be used in others (like PL/SQL); please correct me if I am wrong. SQL/PSM requires it.

I agree that in this case CALL can be optional - because procedures are called by a different mechanism than functions, there is no additional overhead. It is not strictly necessary, because tools like ora2pg have no problem with procedure identification and the related transformations. But if we allow optional CALL in PL/pgSQL, then we will have an inconsistency between PL/pgSQL and other environments where CALL is required, which is not too nice.

> I seem to recall that there were past discussions about this, with respect to the PERFORM command, but I couldn't find them anymore.
>
> Also, I think PL/SQL allows you to call a procedure with no arguments without parentheses. I have not implemented that. I think it could be done, but it's not very appealing.

I don't like this feature; I don't see any benefit. Functions are a different case - there users can implement pseudo-variables like CURRENT_USER, ...

> If anyone has more details about the PL/SQL side of this, that would be useful. What I could find is that using CALL and not using CALL appear to be equivalent.
>
> -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Synchronous replay take III
On Thu, Mar 1, 2018 at 2:39 PM, Thomas Munro wrote:
> I was pinged off-list by a fellow -hackers denizen interested in the synchronous replay feature and wanting a rebased patch to test. Here it goes, just in time for a Commitfest. Please skip to the bottom of this message for testing notes.

Moved to the next CF based on https://www.postgresql.org/message-id/24193.1519882945%40sss.pgh.pa.us .

-- Thomas Munro http://www.enterprisedb.com
Re: 2018-03 Commitfest starts tomorrow
David Steele writes:
> I'll be starting the Commitfest at midnight AoE (07:00 ET, 13:00 CET), so please get your patches in before then.
> Please remember that if you drop a new and large (or invasive) patch into this CF, it may be moved to the next CF.
> This last CF for PG11 should generally be restricted to patches that have gone through review in prior CFs or are modest in scope.

As of right now, there are 229 entries in this commitfest (not counting the items already committed or RWF). It's hard for me to tell for sure, because the activity log page doesn't stretch back far enough, but I think about 40 of those have been created in the last 24 hours. It's certainly well over 20, because the log does go back far enough to show 21 patch records created since about noon Wednesday UTC.

This is NOT how it is supposed to work. This is gaming the system.

I think that we should summarily bounce to the September 'fest anything submitted in the last two days; certainly anything that's nontrivial. There is no way that we can possibly handle 200+ CF entries in a month. A large fraction of these are going to end up pushed to September no matter what, and I think simple fairness demands that we spend our time on the ones that are not Johnny-come-latelies.

We also need to be pretty hard-nosed about quickly bouncing anything that's not realistically going to be able to get committed this month. I've not gone through the list in detail, but there are at least several waiting-on-author items that have seen no activity since the last 'fest. I'd recommend marking those RWF pretty soon.

regards, tom lane
Re: PATCH: Unlogged tables re-initialization tests
On Thu, Mar 1, 2018 at 9:24 AM, David Steele wrote:
> These tests were originally included in the exclude-unlogged-tables patch [1] to provide coverage for the refactoring of reinit.c.

Hi David,

+# The following tests test symlinks. Windows doesn't have symlinks, so
+# skip on Windows.

Could you please explain this a bit more? I don't see any symlinks being used directly in this test. I do see CREATE TABLESPACE, and that uses symlink(), but win32_port.h converts that to "junctions", apparently the Windows equivalent. Is there some reason that doesn't work with this test?

If, by any chance, you are talking about whatever dark force prevents the "tablespace" regression test from succeeding when run as a user with administrative privileges on Windows, then I would *love* to hear an explanation for that phenomenon and how to fix it, because it currently prevents me from automatically testing all Commitfest patches on the appveyor.com Windows build farm. I know that it passes as a non-administrative user.

-- Thomas Munro http://www.enterprisedb.com
Re: Why chain of snapshots is used in ReorderBufferCommit?
Hi,

On 2018-03-01 08:17:33 +0300, Arseny Sher wrote:
> While prowling through the snapbuild & reorderbuffer code, I wondered: why is a queue of snapshots used for replaying each transaction, instead of just picking up the snapshot from the snapbuilder once, when the COMMIT record is read? I am not aware of any DDL/DML mix which would make this invalid: e.g. we can't insert something into a table in xact T1, then alter a column in xact T2, and then insert something more in T1. All ALTER TABLE forms which are currently interesting from the decoding POV obtain an ACCESS EXCLUSIVE lock, conflicting with any other manipulations on the table:
> https://www.postgresql.org/docs/devel/static/sql-altertable.html

I don't think that's right. For one, there are plenty of types of DDL where no such locking exists; consider e.g. composite datums, where additional columns can be added even after such a type is already used by another table. For another, T1 might need to see a DDL change to a table that has been made by T2 after T1 started, which is not prohibited by locking if T1 first uses the table after T2 commits.

- Andres
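The second scenario can be sketched with a toy decode loop; the catalog versions and column lists here are invented purely for illustration:

```python
# T2 commits an ALTER between two of T1's inserts, so T1's later
# changes need a newer catalog than its earlier ones: decoding must
# use the snapshot current when each change was made, not the one
# taken at T1's COMMIT.
catalogs = {1: ["a"], 2: ["a", "b"]}        # catalog version -> columns

t1_changes = [
    {"values": ("x",), "snapshot": 1},      # insert before T2's ALTER
    {"values": ("x", "y"), "snapshot": 2},  # insert after T2's ALTER
]

def decode(change, catalog_version):
    cols = catalogs[catalog_version]
    if len(change["values"]) != len(cols):
        raise ValueError("tuple shape does not match catalog")
    return dict(zip(cols, change["values"]))

# Replaying each change with its own snapshot works; decoding the
# first change with the commit-time catalog (version 2) would fail.
rows = [decode(c, c["snapshot"]) for c in t1_changes]
assert rows == [{"a": "x"}, {"a": "x", "b": "y"}]
```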
Re: Online enabling of checksums
> On Feb 28, 2018, at 22:06, Robert Haas wrote:
>
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander wrote:
>> Also, if that wasn't clear -- we only do the full page write if there isn't already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that checksum is correct, but on the standby the page contents differ in some way that we don't always WAL-log, like as to hint bits, and there the checksum is incorrect. Then you'll enable checksums when the standby still has some pages without valid checksums, and disaster will ensue.
>
> I think this could be hard to prevent if checksums are turned on and off multiple times.

This seems a 100% valid concern. If pages can be binary-different on master and standby, we have to log the act of checksumming a page, and WAL replay would have to verify that its checksum is OK. What is unclear to me is what the standby should do if it sees an incorrect checksum. Change its page? Request the page from the master? Shut down? Or should we just WAL-log the page whenever it is checksummed by the worker, even if the checksum was correct?

Best regards, Andrey Borodin.
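The hint-bit hazard can be shown with a toy model; crc32 here is just a stand-in for pg_checksum_page(), and the page layout is schematic:

```python
import zlib

def page_checksum(page: bytes) -> int:
    # Stand-in for pg_checksum_page(); any bit flip changes the result.
    return zlib.crc32(page)

master_page = bytearray(8192)       # identical page images to start with
standby_page = bytearray(master_page)
standby_page[42] |= 0x01            # a hint bit set only on the standby

# The checksum computed (and considered valid) on the master does not
# match what the standby would compute for its own copy of the page,
# so skipping the full page write leaves the standby with a page that
# fails verification.
assert page_checksum(bytes(master_page)) != page_checksum(bytes(standby_page))
```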
Re: CALL optional in PL/pgSQL
On Wednesday, February 28, 2018, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> I seem to recall that there were past discussions about this, with respect to the PERFORM command, but I couldn't find them anymore.

I'm thinking you are thinking of this one:

https://www.postgresql.org/message-id/CAFjFpReVcC%2BRE3WBJb2X1-O%3D_%2BORiZggDZ-Orr0QafeuAC7k-w%40mail.gmail.com

David J.
RE: [bug fix] Produce a crash dump before main() on Windows
From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com] > Another idea to add to the current patch is to move the call to SetErrorMode() > to the below function, which is called first in main(). How about this? > > void > pgwin32_install_crashdump_handler(void) > { > SetUnhandledExceptionFilter(crashDumpHandler); > } I moved SetErrorMode() to the beginning of main(). It should be placed before any code which could crash. The current location is a bit late: in fact, write_stderr() crashed when WSAStartup() failed. Regards Takayuki Tsunakawa crash_dump_before_main_v2.patch Description: crash_dump_before_main_v2.patch
Re: [HACKERS] path toward faster partition pruning
On Thu, Mar 1, 2018 at 6:57 AM, Amit Langote wrote:
> On 2018/02/28 19:14, Ashutosh Bapat wrote:
>> On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote:
>>> BTW, should there be a relevant test in partition_join.sql? If yes, attached a patch (partitionwise-join-collation-test-1.patch) to add one.
>>
>> A partition-wise join path will be created but discarded because of higher cost. This test won't see it in that case. So, please add some data like the other tests do, and add a command to analyze the partitioned tables. That protects against something like that.
>
> Thanks for the review.
>
> Hmm, the added test is such that the partition collations won't match, so partition-wise join won't be considered at all due to differing PartitionSchemes, unless I'm missing something.

The point is that we wouldn't know whether PWJ was not selected because of a PartitionScheme mismatch OR because the PWJ paths were expensive compared to non-PWJ, as happens with empty tables. In both cases we will see a non-PWJ "plan", although in one case PWJ was not possible and in the other it was. I think what we want to test is that PWJ is not possible with a collation mismatch.

-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
CALL optional in PL/pgSQL
This seems to be a popular issue when porting from PL/SQL, so I'll throw it out here for discussion. Apparently, in PL/SQL you can call another procedure without the CALL keyword. Here is a patch that attempts to implement that in PL/pgSQL as well. It's not very pretty.

I seem to recall that there were past discussions about this, with respect to the PERFORM command, but I couldn't find them anymore.

Also, I think PL/SQL allows you to call a procedure with no arguments without parentheses. I have not implemented that. I think it could be done, but it's not very appealing.

If anyone has more details about the PL/SQL side of this, that would be useful. What I could find is that using CALL and not using CALL appear to be equivalent.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From ffd93d16fd046bf0352eec914f3ae087390370ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 28 Feb 2018 23:22:17 -0500
Subject: [PATCH] PL/pgSQL: Allow calling procedures without CALL keyword

For compatibility with Oracle, allow calling procedures without the CALL keyword. So

    BEGIN someproc(); END

is the same thing as

    BEGIN CALL someproc(); END

This works as long as someproc is not a keyword.
---
 src/pl/plpgsql/src/expected/plpgsql_call.out | 6 -
 src/pl/plpgsql/src/pl_gram.y | 34 ++--
 src/pl/plpgsql/src/sql/plpgsql_call.sql | 2 ++
 3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index e2442c603c..11288ede87 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -43,6 +43,8 @@ AS $$
 BEGIN
 CALL test_proc3(y);
 CALL test_proc3($1);
+test_proc3(y);
+public.test_proc3(y);
 END;
 $$;
 CALL test_proc4(66);
@@ -51,7 +53,9 @@ SELECT * FROM test1;
 66
 66
-(2 rows)
+ 66
+ 66
+(4 rows)
 DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc2;

diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 688fbd6531..3c9b3e9fce 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -1921,7 +1921,25 @@ stmt_execsql : K_IMPORT
 plpgsql_push_back_token(tok);
 if (tok == '=' || tok == COLON_EQUALS || tok == '[')
 word_is_not_variable(&($1), @1);
- $$ = make_execsql_stmt(T_WORD, @1);
+ else if (tok == '(' &&
+ /* check for keywords that can be followed by ( */
+ ($1.quoted || (strcmp($1.ident, "explain") != 0 &&
+ strcmp($1.ident, "reindex") != 0 &&
+ strcmp($1.ident, "select") != 0 &&
+ strcmp($1.ident, "vacuum") != 0 &&
+ strcmp($1.ident, "values") != 0)))
+ {
+ PLpgSQL_stmt_execsql *new;
+
+ new = palloc0(sizeof(PLpgSQL_stmt_execsql));
+ new->cmd_type = PLPGSQL_STMT_EXECSQL;
+ new->lineno = plpgsql_location_to_lineno(@1);
+ new->sqlstmt = read_sql_stmt(psprintf("CALL %s", quote_identifier(($1).ident)));
+
+ $$ = (PLpgSQL_stmt *) new;
+ }
+ else
+ $$ = make_execsql_stmt(T_WORD, @1);
 }
 | T_CWORD
 {
@@ -1931,7 +1949,19 @@ stmt_execsql : K_IMPORT
 plpgsql_push_back_token(tok);
 if (tok == '=' || tok == COLON_EQUALS || tok == '[')
 cword_is_not_variable(&($1), @1);
- $$ = make_execsql_stmt(T_CWORD, @1);
+ else
Re: [HACKERS] SERIALIZABLE with parallel query
On Mon, Feb 26, 2018 at 6:37 PM, Thomas Munro wrote:
> I've now broken it into two patches.

Rebased.

-- Thomas Munro http://www.enterprisedb.com

0001-Enable-parallel-query-with-SERIALIZABLE-isolatio-v13.patch Description: Binary data
0002-Enable-the-read-only-SERIALIZABLE-optimization-f-v13.patch Description: Binary data
fixing more format truncation issues
In 6275f5d28a1577563f53f2171689d4f890a46881, we fixed warnings from the options -Wformat-overflow and -Wformat-truncation, which are part of -Wall in gcc 7. Here, I propose to dial this up a bit by adding -Wformat-overflow=2 -Wformat-truncation=2, which use some more worst-case approaches for estimating the buffer sizes.

AFAICT, the issues addressed here either can't really happen without trying very hard, or would cause harmless output truncation. Still, it seems good to clean this up properly and not rely on made-up buffer size guesses that turn out to be wrong, even if we don't want to adopt the warning options by default.

One issue that is of external interest is that I increase BGW_MAXLEN from 64 to 96. Apparently, the old value would cause the bgw_name of logical replication workers to be truncated in some circumstances. I have also seen truncated background worker names with third-party packages, so giving some more room here would be useful.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 99316e6db5a98f5daaaf61a75a01b88e7b2f39bf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 28 Feb 2018 23:11:24 -0500
Subject: [PATCH] Fix more format truncation issues

Add compiler warning options -Wformat-overflow=2 -Wformat-truncation=2, supported since GCC 7, and fix the warnings that they create. The issues are all harmless, but some dubious coding patterns are cleaned up.
--- configure| 71 configure.in | 3 ++ contrib/pgstattuple/pgstattuple.c| 2 +- src/backend/commands/explain.c | 5 ++- src/backend/utils/adt/dbsize.c | 2 +- src/backend/utils/adt/float.c| 24 +-- src/backend/utils/adt/formatting.c | 33 +-- src/backend/utils/misc/guc.c | 4 +- src/bin/initdb/initdb.c | 6 +-- src/bin/pg_dump/pg_backup_archiver.c | 2 +- src/bin/pg_dump/pg_backup_tar.c | 2 +- src/bin/pgbench/pgbench.c| 4 +- src/include/postmaster/bgworker.h| 2 +- src/interfaces/libpq/fe-secure-openssl.c | 2 +- src/pl/tcl/pltcl.c | 2 +- 15 files changed, 111 insertions(+), 53 deletions(-) diff --git a/configure b/configure index 7dcca506f8..48a7546513 100755 --- a/configure +++ b/configure @@ -4595,6 +4595,77 @@ if test x"$pgac_cv_prog_cc_cflags__Wformat_security" = x"yes"; then CFLAGS="$CFLAGS -Wformat-security" fi + # gcc 7+ + { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports -Wformat-overflow=2" >&5 +$as_echo_n "checking whether $CC supports -Wformat-overflow=2... " >&6; } +if ${pgac_cv_prog_cc_cflags__Wformat_overflow_2+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS -Wformat-overflow=2" +ac_save_c_werror_flag=$ac_c_werror_flag +ac_c_werror_flag=yes +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv_prog_cc_cflags__Wformat_overflow_2=yes +else + pgac_cv_prog_cc_cflags__Wformat_overflow_2=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +ac_c_werror_flag=$ac_save_c_werror_flag +CFLAGS="$pgac_save_CFLAGS" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_cc_cflags__Wformat_overflow_2" >&5 +$as_echo "$pgac_cv_prog_cc_cflags__Wformat_overflow_2" >&6; } +if test x"$pgac_cv_prog_cc_cflags__Wformat_overflow_2" = x"yes"; then + CFLAGS="$CFLAGS -Wformat-overflow=2" +fi + + { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports -Wformat-truncation=2" >&5 +$as_echo_n "checking whether $CC supports -Wformat-truncation=2... " >&6; } +if ${pgac_cv_prog_cc_cflags__Wformat_truncation_2+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS -Wformat-truncation=2" +ac_save_c_werror_flag=$ac_c_werror_flag +ac_c_werror_flag=yes +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + pgac_cv_prog_cc_cflags__Wformat_truncation_2=yes +else + pgac_cv_prog_cc_cflags__Wformat_truncation_2=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +ac_c_werror_flag=$ac_save_c_werror_flag +CFLAGS="$pgac_save_CFLAGS" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_cc_cflags__Wformat_truncation_2" >&5 +$as_echo "$pgac_cv_prog_cc_cflags__Wformat_truncation_2" >&6; } +if test x"$pgac_cv_prog_cc_cflags__Wformat_truncation_2" = x"yes"; then + CFLAGS="$CFLAGS -Wformat-truncation=2" +fi + # Disable strict-aliasing rules; needed for gcc 3.3+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC supports
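The BGW_MAXLEN point above can be illustrated with a toy model of snprintf into a fixed buffer; the worker name below is invented for illustration, not the exact string the launcher produces:

```python
def set_bgw_name(name: str, bgw_maxlen: int) -> str:
    # Like snprintf into char bgw_name[BGW_MAXLEN]: the buffer keeps
    # at most bgw_maxlen - 1 characters, leaving room for the NUL.
    return name[: bgw_maxlen - 1]

# An invented-but-plausible long logical replication worker name.
name = ("logical replication worker for subscription 16393 "
        "sync 16398 of database postgres")

assert set_bgw_name(name, 64) != name   # truncated with BGW_MAXLEN = 64
assert set_bgw_name(name, 96) == name   # intact with BGW_MAXLEN = 96
```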
chained transactions
This feature is meant to help manage transaction isolation in procedures. I proposed elsewhere a patch that allows running SET TRANSACTION in PL/pgSQL. But if you have complex procedures that commit many times in different branches perhaps, you'd have to do this in every new transaction, which would create code that is difficult to manage. The SQL standard offers the "chained transactions" feature to address this. The new command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN immediately start a new transaction with the characteristics (isolation level, read/write, deferrable) of the previous one. So code that has particular requirements regard transaction isolation and such can use this to simplify code management. In the patch series, 0001 through 0006 is some preparatory code cleanup that is useful independently. 0007 is the implementation of the feature for the main SQL environment, 0008 is for PL/pgSQL. Support in other PLs could be added easily. The patch series also requires the "SET TRANSACTION in PL/pgSQL" patch for use in the test suite. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From ce07129f4d7ba376c59fa5d8244953a4eec05027 Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Fri, 16 Feb 2018 20:29:20 -0500 Subject: [PATCH v1 1/8] Update function comments After a6542a4b6870a019cd952d055d2e7af2da2fe102, some function comments were misplaced. Fix that. --- src/backend/access/transam/xact.c | 40 ++- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index dbaaf8e005..d7688879a3 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -3190,12 +3190,25 @@ PreventTransactionChain(bool isTopLevel, const char *stmtType) } /* - * These two functions allow for warnings or errors if a command is - * executed outside of a transaction block. 
+ * WarnNoTranactionChain + * RequireTransactionChain + * + * These two functions allow for warnings or errors if a command is executed + * outside of a transaction block. This is useful for commands that have no + * effects that persist past transaction end (and so calling them outside a + * transaction block is presumably an error). DECLARE CURSOR is an example. + * While top-level transaction control commands (BEGIN/COMMIT/ABORT) and SET + * that have no effect issue warnings, all other no-effect commands generate + * errors. + * + * If we appear to be running inside a user-defined function, we do not + * issue anything, since the function could issue more commands that make + * use of the current statement's results. Likewise subtransactions. + * Thus this is an inverse for PreventTransactionChain. * - * While top-level transaction control commands (BEGIN/COMMIT/ABORT) and - * SET that have no effect issue warnings, all other no-effect commands - * generate errors. + * isTopLevel: passed down from ProcessUtility to determine whether we are + * inside a function. + * stmtType: statement type name, for warning or error messages. */ void WarnNoTransactionChain(bool isTopLevel, const char *stmtType) @@ -3209,23 +3222,6 @@ RequireTransactionChain(bool isTopLevel, const char *stmtType) CheckTransactionChain(isTopLevel, true, stmtType); } -/* - * RequireTransactionChain - * - * This routine is to be called by statements that must run inside - * a transaction block, because they have no effects that persist past - * transaction end (and so calling them outside a transaction block - * is presumably an error). DECLARE CURSOR is an example. - * - * If we appear to be running inside a user-defined function, we do not - * issue anything, since the function could issue more commands that make - * use of the current statement's results. Likewise subtransactions. - * Thus this is an inverse for PreventTransactionChain. 
- * - * isTopLevel: passed down from ProcessUtility to determine whether we are - * inside a function. - * stmtType: statement type name, for warning or error messages. - */ static void CheckTransactionChain(bool isTopLevel, bool throwError, const char *stmtType) { base-commit: 51057feaa6bd24b51e6a4715c2090491ef037534 prerequisite-patch-id: 767f2e4d21b2ba347d4d08c3257abb91921fdb7b -- 2.16.2 From c1d5189f52537d4632d0a4069a732e8666c123e1 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 16 Feb 2018 20:44:15 -0500 Subject: [PATCH v1 2/8] Rename TransactionChain functions We call this thing a "transaction block" everywhere except in a few functions, where it is mysteriously called a "transaction chain". In the SQL standard, a transaction chain is something different. So rename these functions to match
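Coming back to the feature itself, a sketch of how the proposed AND CHAIN syntax might be used (the syntax is as described in the message above; the accounts table is illustrative):

```sql
-- Transaction characteristics are set once; COMMIT AND CHAIN carries
-- them into each subsequent transaction without re-issuing
-- SET TRANSACTION after every commit.
BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT AND CHAIN;   -- new transaction, still SERIALIZABLE READ WRITE
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;             -- plain commit ends the chain
```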
Re: Support for ECDSA & ed25519 digital signatures in pgcrypto?
On Sun, Feb 4, 2018 at 04:38:24PM +0530, Nilesh Trivedi wrote: > I recently had to build ed25519 digital signature validation in PostgreSQL. > Since pgcrypto doesn't support these methods, I had to look into PL/Python and PL/v8 based implementations. > The experience turned out to be very poor (documented here: > https://gist.github.com/nileshtrivedi/7cd622d4d521986593bff81bfa1e5893) > > I think OpenSSL already supports these encryption methods and it would be great to have them > supported within pgcrypto - especially with the advent of distributed systems like IPFS, and public > blockchains like BitCoin, Ethereum. Elliptic curve cryptography has some major advantages over > RSA: for both security and usability. Some are listed here: https://ed25519.cr.yp.to/ > > Is somebody working on this? I'm not a C programmer but if needed, I can look into implementing > this. I agree there is going to be a lot more focus on ECDSA because elliptic curve cryptography is much more efficient for large key sizes, see: https://momjian.us/main/writings/pgsql/tls.pdf#page=17 and RSA can't support elliptic curve. Chrome accessing mail.google.com is already using ECDSA: https://momjian.us/main/writings/pgsql/tls.pdf#page=16 -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
faster testing with symlink installs
I'm proposing a way to make test runs a bit faster. A fair amount of time is taken by creating the test installation. Not huge compared to the whole test run, but still long enough to get boring. The idea (that I stole from somewhere else) is that we make the installation as symlinks pointing back to the source tree. So after the first run, the installation is automatically up to date after each build. Then we can skip the whole temp install run if we find that we already have one (unless you change the set of installed files in a major way, which is very rare). This also helps if you use make installcheck: You just need to rebuild and restart the server, skipping the make install run. So that's what the patch does: make install INSTALL_SYMLINKS=1 installs files by making relative symlinks. make check TEMP_INSTALL_SYMLINKS=1 creates the temporary installation using symlinks and skips the install step if a tmp_install already exists. It requires the "ln" program from GNU coreutils. So it would be optional. Except ... this doesn't actually work. find_my_exec() resolves symlinks to find the actual program installation, and so for example the installed initdb will look for postgres in src/bin/initdb/. I would like to revert this behavior, because it seems to do more harm than good. The original commit message indicates that this would allow symlinking a program to somewhere outside of the installation tree and still allow it to work and find its friends. But that could just as well be done with a shell script. Reverting this behavior would also allow Homebrew-like installations to work better. The actual installation is in a versioned directory like /usr/local/Cellar/postgresql/10.1/... but symlinked to /usr/local/opt/postgresql/. But because of the symlink resolution, calling /usr/local/opt/postgresql/bin/pg_config returns paths under Cellar, which then causes breakage of software built against that path on postgresql upgrades that change the version number. Thoughts? 
-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From f101d14195d5bc26f09a84e823f06b6c422fe444 Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Sun, 28 Jan 2018 12:24:14 -0500 Subject: [PATCH 1/3] Remove duplicate installation of some header files In non-vpath builds, some header files such as pg_config.h were actually installed twice (to the same target location). Rearrange the make rules a bit to avoid that. --- src/include/Makefile | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/src/include/Makefile b/src/include/Makefile index a689d352b6..2d1baca04a 100644 --- a/src/include/Makefile +++ b/src/include/Makefile @@ -39,13 +39,7 @@ install: all installdirs $(INSTALL_DATA) $(srcdir)/port.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/postgres_fe.h '$(DESTDIR)$(includedir_internal)' $(INSTALL_DATA) $(srcdir)/libpq/pqcomm.h '$(DESTDIR)$(includedir_internal)/libpq' -# These headers are needed for server-side development - $(INSTALL_DATA) pg_config.h '$(DESTDIR)$(includedir_server)' - $(INSTALL_DATA) pg_config_ext.h '$(DESTDIR)$(includedir_server)' - $(INSTALL_DATA) pg_config_os.h '$(DESTDIR)$(includedir_server)' - $(INSTALL_DATA) utils/errcodes.h '$(DESTDIR)$(includedir_server)/utils' - $(INSTALL_DATA) utils/fmgroids.h '$(DESTDIR)$(includedir_server)/utils' - $(INSTALL_DATA) utils/fmgrprotos.h '$(DESTDIR)$(includedir_server)/utils' +# Headers for server-side development # We don't use INSTALL_DATA for performance reasons --- there are a lot of files cp $(srcdir)/*.h '$(DESTDIR)$(includedir_server)'/ || exit; \ chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/*.h || exit; \ @@ -54,7 +48,8 @@ install: all installdirs chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$dir/*.h || exit; \ done ifeq ($(vpath_build),yes) - for file in dynloader.h catalog/schemapg.h parser/gram.h storage/lwlocknames.h utils/probes.h; do \ + for file in 
pg_config.h pg_config_ext.h pg_config_os.h utils/errcodes.h utils/fmgroids.h utils/fmgrprotos.h \ + dynloader.h catalog/schemapg.h parser/gram.h storage/lwlocknames.h utils/probes.h; do \ cp $$file '$(DESTDIR)$(includedir_server)'/$$file || exit; \ chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$file || exit; \ done base-commit: 51057feaa6bd24b51e6a4715c2090491ef037534 -- 2.16.2 From afc5738edc445f577d3b7da5503ff61d1debe953 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sun, 28 Jan 2018 12:24:37 -0500 Subject: [PATCH 2/3] Don't resolve symlinks in find_my_exec() This reverts 336969e490d71c316a42fabeccda87f798e562dd. That feature allowed symlinking a binary to a location outside of the tree and still be able to find the
Re: [bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary
On Thu, Mar 01, 2018 at 01:26:32AM +, Tsunakawa, Takayuki wrote: > BTW, should pg_rewind really copy WAL files from the primary? If the > sole purpose of pg_rewind is to recover an instance to use as a > standby, can pg_rewind just remove all WAL files in the target > directory, because the standby can get WAL files from the primary > and/or archive? Yes, it should not copy those WAL files. Most of the time they are going to be meaningless. See this recent thread: https://www.postgresql.org/message-id/20180126023609.GH17847%40paquier.xyz So I would rather go this way instead of having to worry about manipulating those WAL segments as you do. Depending on the needs, I think that even a backpatch could be considered. > Related to this, shouldn't pg_rewind avoid copying more files and > directories like pg_basebackup? Currently, pg_rewind doesn't copy > postmaster.pid, postmaster.opts, and temporary files/directories > (pg_sql_tmp/). Yes, it should not copy those files. I have a patch in the current CF to do that: https://commitfest.postgresql.org/17/1507/ -- Michael signature.asc Description: PGP signature
Synchronous replay take III
Hi hackers, I was pinged off-list by a fellow -hackers denizen interested in the synchronous replay feature and wanting a rebased patch to test. Here it goes, just in time for a Commitfest. Please skip to the bottom of this message for testing notes. In previous threads[1][2][3] I called this feature proposal "causal reads". That was a terrible name, borrowed from MySQL. While it is probably a useful term of art, for one thing people kept reading it as "casual", which it ain't, and more importantly this patch is only one way to achieve read-follows-write causal consistency. Several others are proposed or exist in forks (user managed wait-for-LSN, global transaction manager, ...). OVERVIEW For writers, it works a bit like RAID mirroring: when you commit a write transaction, it waits until the data has become visible on all elements of the array, and if an array element is not responding fast enough it is kicked out of the array. For readers, it's a little different because you're connected directly to the array elements (rather than going through a central controller), so it uses a system of leases allowing read transactions to know instantly whether they are running on an element that is currently in the array and are therefore able to service synchronous_replay transactions, or should raise an error telling you to go and ask some other element. This is a design choice favouring read-mostly workloads at the expense of write transactions. Hot standbys' whole raison d'être is to move *some* read-only workloads off the primary server. This proposal is for users who are prepared to trade increased primary commit latency for a guarantee about visibility on the standbys, so that *all* read-only work could be moved to hot standbys. 
The guarantee is: When two transactions tx1, tx2 are run with synchronous_replay set to on and tx1 reports successful commit before tx2 begins, then tx2 is guaranteed either to see tx1 or to raise a new error 40P02 if it is run on a hot standby. I have joked that that error means "snapshot too young". You could handle it the same way you handle deadlocks and serialization failures: by retrying, except in this case you might want to avoid that node for a while. Note that this feature is concerned with transaction visibility. It is not concerned with transaction durability. It will happily kick all of your misbehaving or slow standbys out of the array so that you fall back to single-node commit durability. You can express your durability requirement (ie I must have N copies of the data on disk before I tell any external party about a transaction) separately, by configuring regular synchronous replication alongside this feature. I suspect that this feature would be most popular with people who are already using regular synchronous replication though, because they already tolerate higher commit latency. STATUS Here's a quick summary of the status of this proposal as I see it: * Simon Riggs, as the committer most concerned with the areas this proposal touches -- namely streaming replication and specifically syncrep -- has not so far appeared to be convinced by the value of this approach, and has expressed a preference for pursuing client-side or middleware tracked LSN tokens exclusively. I am perceptive enough to see that failing to sell the idea to Simon is probably fatal to the proposal. The main task therefore is to show convincingly that there is a real use case for this high-level design and its set of trade-offs, and that it justifies its maintenance burden. 
* I have tried to show that there are already many users who route their read-only queries to hot standby databases (not just "reporting queries"), and libraries and tools to help people do that using heuristics like "logged in users need fresh data, so primary only" or "this session has written in the past N minutes, so primary only". This proposal would provide a way for those users to do something based on a guarantee instead of such flimsy heuristics. I have tried to show that the libraries used by Python, Ruby, Java etc to achieve that sort of load balancing should easily be able to handle finding read-only nodes, routing read-only queries and dealing with the new error. I do also acknowledge that such libraries could also be used to provide transparent read-my-writes support by tracking LSNs and injecting wait-for-LSN directives with alternative proposals, but that is weaker than a global reads-follow-writes guarantee and the difference can matter. * I have argued that token-based systems are in fact rather complicated[4] and by no means a panacea. As usual, there are a whole bunch of trade-offs. I suspect that this proposal AND fully user-managed causality tokens (no middleware) are both valuable sweet spots for a non-GTM system. * Ants Aasma pointed out that this proposal doesn't provide a read-follows-read guarantee. He is right, and I'm not sure to what extent that is a
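From the client's perspective, the visibility guarantee described earlier might look like this (a sketch; the synchronous_replay GUC name and SQLSTATE 40P02 are as proposed in this thread, and the accounts table is illustrative):

```sql
-- on any node, primary or standby
SET synchronous_replay = on;

BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- If this standby's lease has lapsed (it has been kicked out of the
-- array), the query raises SQLSTATE 40P02 ("snapshot too young").
-- Clients retry as they would for a serialization failure, preferably
-- routing to a different node for a while.
COMMIT;
```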
Re: [HACKERS] path toward faster partition pruning
On 2018/02/28 19:14, Ashutosh Bapat wrote: > On Wed, Feb 28, 2018 at 6:42 AM, Amit Langote wrote: >> BTW, should there be a relevant test in partition_join.sql? If yes, >> attached a patch (partitionwise-join-collation-test-1.patch) to add one. > > A partition-wise join path will be created but discarded because of > higher cost. This test won't see it in that case. So, please add some > data like other tests and add command to analyze the partitioned > tables. That kind of protects from something like that. Thanks for the review. Hmm, the added test is such that the partition collations won't match, so partition-wise join won't be considered at all due to differing PartitionSchemes, unless I'm missing something. Thanks, Amit
[bug fix] pg_rewind creates corrupt WAL files, and the standby cannot catch up the primary
Hello, Our customer hit another bug of pg_rewind with PG 9.5. The attached patch fixes this. PROBLEM After a long run of successful pg_rewind, the synchronized standby could never catch up with the primary, emitting the following message repeatedly: LOG: XX000: could not read from log segment 0006028A0031, offset 16384: No error CAUSE If the primary removes WAL files that pg_rewind is going to get, pg_rewind leaves 0-byte WAL files in the target directory here: [libpq_fetch.c] case FILE_ACTION_COPY: /* Truncate the old file out of the way, if any */ open_target_file(entry->path, true); fetch_file_range(entry->path, 0, entry->newsize); break; After pg_rewind completes successfully, we create recovery.conf and then start the standby in the target cluster. walreceiver receives WAL records and writes them to the 0-byte WAL files. Finally, the xlog reader complains that it cannot read a WAL page. FIX pg_rewind deletes the file when it finds that the primary has deleted it. OTHER THOUGHTS BTW, should pg_rewind really copy WAL files from the primary? If the sole purpose of pg_rewind is to recover an instance to use as a standby, can pg_rewind just remove all WAL files in the target directory, because the standby can get WAL files from the primary and/or archive? Related to this, shouldn't pg_rewind avoid copying more files and directories like pg_basebackup? Currently, pg_rewind doesn't copy postmaster.pid, postmaster.opts, and temporary files/directories (pg_sql_tmp/). Regards Takayuki Tsunakawa pg_rewind_corrupt_wal.patch Description: pg_rewind_corrupt_wal.patch
Re: Online enabling of checksums
On 1 March 2018 at 03:42, Alvaro Herrerawrote: > I noticed that pg_verify_checksum takes an "-o oid" argument to only > check the relation with that OID; but that's wrong, because the number > is a relfilenode, not an OID (since it's compared to the on-disk file > name). I would suggest changing everything to clarify that it's a > pg_class.relfilenode value, otherwise it's going to be very confusing. > Maybe use "-f filenode" if -f is available? > > I see this mistake/misunderstanding enough that I'd quite like to change how we generate relfilenode IDs, making them totally independent of the oid space. Unsure how practical it is, but it'd be so nice to get rid of that trap. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
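For context on why the two are so easily conflated: a relation's relfilenode starts out equal to its OID but diverges after any rewriting operation (VACUUM FULL, TRUNCATE, CLUSTER, ...). The mapping can be inspected from SQL:

```sql
-- relfilenode equals the OID for a freshly created table, but diverges
-- after a rewrite:
SELECT oid, relfilenode FROM pg_class WHERE relname = 'mytable';

-- pg_relation_filenode() also handles mapped relations, for which the
-- relfilenode column in pg_class is 0:
SELECT pg_relation_filenode('mytable');
```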
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 09:00, Justin Pryzbywrote: > > > > - started in single user mode or with system indices disabled? > > why? > > Some of these I suggested just as a datapoint (or other brainstorms I > couldn't > immediately reject). A cluster where someone has UPDATED pg_* (even > pg_statistic) or otherwise hacked on I would tend to think about > differently > than a "pristine" cluster that's never seen anything more interesting than > a > join. How about "has run in a state where system catalog modifications are permitted". Because I've had one seriously frustrating case where that would've been extremely pertinent information. That said, I'm not convinced it's worth the effort and I think this is going off into the weeds a little. Lets focus on Andres's core suggestion (and maybe refine the fsync part to get rid of false positives due to bulk loading) and build on from there in subsequent work. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Online enabling of checksums
> On 01 Mar 2018, at 05:07, Tomas Vondrawrote: > > On 02/28/2018 08:42 PM, Alvaro Herrera wrote: >> I noticed that pg_verify_checksum takes an "-o oid" argument to only >> check the relation with that OID; but that's wrong, because the number >> is a relfilenode, not an OID (since it's compared to the on-disk file >> name). I would suggest changing everything to clarify that it's a >> pg_class.relfilenode value, otherwise it's going to be very confusing. >> Maybe use "-f filenode" if -f is available? > > I'd argue this is merely a mistake in the --help text. Agreed, the --help text isn’t really clear in this case and should be updated to say something along the lines of: printf(_(" -o relfilenode check only relation with specified relfilenode\n")); cheers ./daniel
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 06:28, Justin Pryzbywrote: > On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote: > > On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote: > > > I can see why you'd want that, but as a DBA, I don't necessarily want > > > all of that recorded, especially in a quasi-permanent way. > > > > Huh? You're arguing that we should make it easier for DBAs to hide > > potential causes of corruption? I fail to see when that's necessary / > > desirable? That a cluster ran with fsync=off isn't an embarassing thing, > > nor does it contain private data... > > The more fine grained these are the more useful they can be: > > Running with fsync=off is common advice while loading, so reporting that > "fsync=off at some point" is much less useful than reporting "having to run > recovery from an fsync=off database". > > I propose there could be two bits: > > 1. fsync was off at some point in history when the DB needed recovery; > 2. fsync was off during the current instance; I assume the bit would be > set > whenever fsync=off was set, but could be cleared when the server was > cleanly shut down. If the bit is set during recovery, the first bit > would > be permanently set. Not sure but if this bit is set and then fsync > re-enabled, maybe this could be cleared after any checkpoint and not > just > shutdown ? > > I think that's a very useful way to eliminate false positives and make this diagnostic field more useful. But we'd need to guarantee that when you've switched from fsync=off to fsync=on, we've actually fsync'd every extent of every table and index, whether or not we think they're currently dirty and whether or not the smgr currently has them open. Do something like initdb does to flush everything. Without that, we could have WAL-vs-heap ordering issues and so on *even after a Pg restart* if the system has lots of dirty writeback to do. 
A quick look at CheckPointGuts and CheckPointBuffers suggests we don't presently do that but I admit I haven't read in depth. We force a fsync of the current WAL segment when switching fsync on, but don't do anything else AFAICS. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 05:43, Andres Freundwrote: > Hi, > > a significant number of times during investigations of bugs I wondered > whether running the cluster with various settings, or various tools > could've caused the issue at hand. Therefore I'd like to propose adding > a 'tainted' field to pg_control, that contains some of the "history" of > the cluster. Individual bits inside that field that I can think of right > now are: > - pg_resetxlog was used non-passively on cluster > - ran with fsync=off > - ran with full_page_writes=off > - pg_upgrade was used > > What do others think? > > A huge +1 from me for the idea. I can't even count the number of black box "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has turned out to be down to the user doing something very silly and not saying anything about it. It's only some flags, so putting it in pg_control is arguably somewhat wasteful but so minor as to be of no real concern. And that's probably the best way to make sure it follows the cluster around no matter what backup/restore/copy mechanisms are used and how "clever" they try to be. What I'd _really_ love would be to blow the scope of this up a bit and turn it into a key-events cluster journal, recording key param switches, recoveries (and lsn ranges), pg_upgrade's, etc. But then we'd run into people with weird workloads who managed to make it some massive file, we'd have to make sure we had a way to stop it getting left out of copies/backups, and it'd generally be irritating. So lets not do that. Proper support for class-based logging and multiple outputs would be a good solution for this at some future point. What you propose is simple enough to be quick to implement, adds no admin overhead, and will be plenty useful enough. I'd like to add "postmaster.pid was absent when the cluster started" to this list, please. 
Sure, it's not conclusive, and there are legit reasons why that might be the case, but so often it's somebody kill -9'ing the postmaster, then removing the postmaster.pid and starting up again without killing the workers. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Two small patches for the isolationtester lexer
> On 01 Mar 2018, at 06:01, Tom Lanewrote: > Daniel Gustafsson writes: >> I agree, patch 0002 was broken and the correct fix is a much bigger project - >> one too big for me to tackle right now (but hopefully at some point in the >> near >> future). Thanks for the review of it though! > > OK. I'm going to mark this commitfest entry closed, since the other patch > is in and this one needs a good bit more work. Please start a new thread, > or at least a new CF entry, if you do more work in this area. I absolutely agree, thanks! cheers ./daniel
Re: ON CONFLICT DO UPDATE for partitioned tables
On 2018/03/01 1:03, Robert Haas wrote: > On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera >wrote: >> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1]. >> Following the lead of edd44738bc88 ("Be lazier about partition tuple >> routing.") this incarnation only does the necessary push-ups for the >> specific partition that needs it, at execution time. As far as I can >> tell, it works as intended. >> >> I chose to refuse the case where the DO UPDATE clause causes the tuple >> to move to another partition (i.e. you're updating the partition key of >> the tuple). While it's probably possible to implement that, it doesn't >> seem a very productive use of time. > > I would have thought that to be the only case we could support with > the current infrastructure. Doesn't a correct implementation for any > other case require a global index? I'm thinking that Alvaro is talking here about the DO UPDATE action part, not the conflict checking part. The latter will definitely require global indexes if conflict were to be checked on columns not containing the partition key. The case Alvaro mentions arises after checking the conflict, presumably using an inherited unique index whose keys must include the partition keys. If the conflict action is DO UPDATE and its SET clause changes partition key columns such that the row will have to change the partition, then the current patch will result in an error. I think that's because making update row movement work in this case will require some adjustments to what 2f178441044 (Allow UPDATE to move rows between partitions) implemented. We wouldn't have things set up in the ModifyTableState that the current row-movement code depends on being set up; for example, there wouldn't be per-subplan ResultRelInfo's in the ON CONFLICT DO UPDATE case. The following Assert in ExecUpdate() will fail for instance: map_index = resultRelInfo - mtstate->resultRelInfo; Assert(map_index >= 0 && map_index < mtstate->mt_nplans); Thanks, Amit
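For concreteness, a minimal example of the case the patch refuses (the schema is illustrative; with range partitioning on the key column, the unique index necessarily includes the partition key):

```sql
CREATE TABLE t (a int PRIMARY KEY, b text) PARTITION BY RANGE (a);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0)  TO (10);
CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (10) TO (20);
INSERT INTO t VALUES (5, 'old');

-- Conflict on a = 5; the SET clause changes the partition key so the
-- row would have to move from t1 to t2.  The current patch rejects
-- this with an error rather than attempting update row movement.
INSERT INTO t VALUES (5, 'new')
    ON CONFLICT (a) DO UPDATE SET a = 15, b = 'new';
```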
Re: row filtering for logical replication
On 1 March 2018 at 07:03, Euler Taveira wrote: > Hi, > > The attached patches add support for filtering rows in the publisher. > The output plugin will do the work if a filter was defined in CREATE > PUBLICATION command. An optional WHERE clause can be added after the > table name in the CREATE PUBLICATION such as: > > CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= > 3000); > > Row that doesn't match the WHERE clause will not be sent to the > subscribers. > > Patches 0001 and 0002 are only refactors and can be applied > independently. 0003 doesn't include row filtering on initial > synchronization. > Good idea. I haven't read this yet, but one thing to make sure you've handled is limiting the clause to referencing only the current tuple and the catalogs. User-catalog tables are OK, too, anything that is RelationIsAccessibleInLogicalDecoding(). This means only immutable functions may be invoked, since a stable or volatile function might attempt to access a table. And views must be prohibited or recursively checked. (We have tree walkers that would help with this). It might be worth looking at the current logic for CHECK expressions, since the requirements are similar. In my opinion you could safely not bother with allowing access to user catalog tables in the filter expressions and limit them strictly to immutable functions and the tuple itself. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
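To make the proposed restriction concrete, a sketch using the syntax from the patch (table and column names are illustrative):

```sql
-- acceptable: the filter references only the tuple's own columns and
-- immutable operators
CREATE PUBLICATION pub_mid FOR TABLE departments
    WHERE (id > 2000 AND id <= 3000);

-- would have to be rejected under the rule above: now() is stable,
-- not immutable, and a subquery would access another table during
-- logical decoding
-- CREATE PUBLICATION pub_bad FOR TABLE departments
--     WHERE (created_at < now());
```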
Re: VPATH build from a tarball fails with some gmake versions
On Thu, Mar 1, 2018 at 9:24 AM, Tom Lanewrote: > I tried doing a VPATH build referencing a source directory that I'd > distclean'd > but not maintainer-clean'd, which should simulate the case of a VPATH > build from a tarball. I was quite surprised that it fell over: > > ... > gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement > -Wendif-labels -Wmissing-format-attribute -Wformat-security > -fno-strict-aliasing -fwrapv -g -O2 -I. -I/home/postgres/pgsql/src/bin/psql > -I/home/postgres/pgsql/src/interfaces/libpq -I../../../src/include > -I/home/postgres/pgsql/src/include -D_GNU_SOURCE -c -o sql_help.o > sql_help.c > gcc: sql_help.c: No such file or directory > gcc: no input files > make[3]: *** [sql_help.o] Error 1 > make[3]: Leaving directory `/home/postgres/buildroot/src/bin/psql' > > gmake should be generating a path to sql_help.c, since that exists > in the corresponding source directory, but it does not. I can reproduce > this on RHEL6 with make-3.81-23, but not on Fedora 26 with make-4.2.1-2, > and (curiously) also not on High Sierra with Apple's copy of make 3.81. > I wonder how many other people see it. > > The attached patch seems to fix it, indicating that there's something > about the rules relating sql_help.c and sql_help.h that is confusing > make, such that turning sql_help.c into the primary target for the > create_help rule removes the confusion. > Very odd. I can't reproduce it on Centos6 with make 3.81.23 and a downloaded tarball. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: row filtering for logical replication
On Wed, Feb 28, 2018 at 08:03:02PM -0300, Euler Taveira wrote: > Hi, > > The attached patches add support for filtering rows in the publisher. > The output plugin will do the work if a filter was defined in CREATE > PUBLICATION command. An optional WHERE clause can be added after the > table name in the CREATE PUBLICATION such as: > > CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000); > > Row that doesn't match the WHERE clause will not be sent to the subscribers. > > Patches 0001 and 0002 are only refactors and can be applied > independently. 0003 doesn't include row filtering on initial > synchronization. > > Comments? Great feature! I think a lot of people will like to have the option of trading a little extra CPU on the pub side for savings in network traffic and in work on the sub side. I noticed that the WHERE clause applies to all tables in the publication. Is that actually the right thing? I'm thinking of a case where we have foo(id, ...) and bar(foo_id, ). To slice that correctly, we'd want to filter on id in the foo table and on foo_id in the bar table. In the system as written, that would entail, at least potentially, writing a lot of publications by hand. Something like WHERE ( (table_1,..., table_N) HAS (/* WHERE clause here */) AND (table_N+1,..., table_M) HAS (/* WHERE clause here */) AND ... ) could be one way to specify. I also noticed that in psql, \dRp+ doesn't show the WHERE clause, which it probably should. Does it need regression tests? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: prokind column (was Re: [HACKERS] SQL procedures)
On Wed, Feb 28, 2018 at 05:37:11PM -0500, Peter Eisentraut wrote: > On 2/28/18 15:45, Tom Lane wrote: >> I have reviewed this patch and attach an updated version below. >> I've rebased it up to today, fixed a few minor errors, and adopted >> most of Michael's suggestions. Also, since I remain desperately >> unhappy with putting zeroes into prorettype, I changed it to not >> do that ;-) ... now procedures have VOIDOID as their prorettype, >> and it will be substantially less painful to allow them to return >> some other scalar result in future, should we wish to. I believe >> I've found all the places that were relying on prorettype == 0 as >> a substitute for prokind == 'p'. > > I have just posted "INOUT parameters in procedures", which contains some > of those same changes. So I think we're on the same page. I will work > on consolidating this. Thanks Peter. I have read the patch set that Tom has posted here and hunted for other inconsistencies. It seems to me that you have spotted everything. The changes in ProcedureCreate() are nice. I was wondering as well if it would be worth checking the contents of pg_proc with prorettype = 0 in the regression tests to always make sure that this never happens... Until I saw that opr_sanity.sql was actually already doing that, so procedures are actually breaking that check on HEAD. -- Michael signature.asc Description: PGP signature
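The opr_sanity.sql check in question is essentially of this form (a simplified sketch of the real query):

```sql
-- any row returned here indicates a suspicious pg_proc entry; with
-- Tom's change, procedures carry prorettype = VOIDOID rather than 0,
-- so they no longer trip this check
SELECT p1.oid, p1.proname
FROM pg_proc AS p1
WHERE p1.prorettype = 0;
```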
Re: IndexTupleDSize macro seems redundant
Stephen Frostwrites: > Updated (combined) patch attached for review. I went through and looked > again to make sure there weren't any cases of making an unaligned > pointer to a struct and didn't see any, and I added some comments to > _bt_restore_page(). This seems to have fallen through a crack, so I went ahead and pushed it. regards, tom lane
Re: Cast jsonb to numeric, int, float, bool
On 01.03.2018 00:43, Darafei Praliaskouski wrote: The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested We're using this patch and like it a lot. We store a lot of log-like data in s3-hosted .json.gz files. Sometimes we need to suddenly ingest them, calculate statistics, and check the new what-ifs. We ingest the data into a simple single-column table with a jsonb field, and then perform our calculations on top of it. Without this patch, the only way to get data that is already parsed as numbers inside jsonb into a calculation is via a binary-text-binary transformation. We're relying on the patch for our custom spatial type extension and casting in it. https://github.com/gojuno/lostgis/blob/master/sql/types/type_tpv.sql#L21 For postgres installations without the patch we do WITH INOUT cast stubbing, https://github.com/gojuno/lostgis/blob/master/sql/types/__jsonb_casts.sql - but without the performance benefits of raw C processing :) A thing this patch does not implement is a cast from jsonb to bigint. That would be useful for transforming stored osm_id OpenStreetMap object identifiers. Right now we're stubbing it with a jsonb::numeric::bigint cast, but it would be nice to get rid of the middle step. The new status of this patch is: Ready for Committer Attached is a new version of the patch, in which I removed the duplicated code using a new subroutine JsonbExtractScalar(). I am not sure what is better to do when a JSON item has an unexpected type: throw an error or return SQL NULL. Also, JSON nulls could be converted to SQL NULLs.
I should note here that the expression (jb -> 'key')::datatype can be rewritten with the SQL/JSON function JSON_VALUE: JSON_VALUE(jb, '$.key' RETURNING datatype ERROR ON ERROR) But by the standard, JSON_VALUE tries to cast string JSON items to the specified datatype too, so JSON_VALUE('{"key": "123"}'::jsonb, '$.key' RETURNING int ERROR ON ERROR) does not throw an error but returns 123. We already have jsonpath operators @#, @*, so it might be very useful if our jsonb casts were equivalent to their SQL/JSON analogues. For example, (jb @# '$.key')::datatype could be equivalent to JSON_VALUE(jb, '$.key' RETURNING datatype ERROR ON ERROR) or JSON_VALUE(jb, '$.key' RETURNING datatype [NULL ON ERROR]). But if we want to have NULL ON ERROR behavior (which is the default in SQL/JSON) in casts, then casts should not throw any errors. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From fd71e22f087af2cd14c4a9101c96efb56888ab72 Mon Sep 17 00:00:00 2001 From: Anastasia Date: Thu, 1 Mar 2018 01:55:31 +0300 Subject: [PATCH] Cast jsonb numeric to numeric, int4 and float8. Cast jsonb bool to bool. --- src/backend/utils/adt/jsonb.c | 95 +++ src/include/catalog/pg_cast.h | 5 +++ src/include/catalog/pg_proc.h | 9 + 3 files changed, 109 insertions(+) diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 0f70180..fb907f2 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -1845,3 +1845,98 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_POINTER(out); } + + +/* + * Extract scalar value from raw-scalar pseudo-array jsonb.
+ */ +static JsonbValue * +JsonbExtractScalar(JsonbContainer *jbc, JsonbValue *res) +{ + JsonbIterator *it; + JsonbIteratorToken tok PG_USED_FOR_ASSERTS_ONLY; + JsonbValue tmp; + + if (!JsonContainerIsArray(jbc) || !JsonContainerIsScalar(jbc)) + return NULL; + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + it = JsonbIteratorInit(jbc); + + tok = JsonbIteratorNext(&it, &tmp, true); + Assert(tok == WJB_BEGIN_ARRAY); + Assert(tmp.val.array.nElems == 1 && tmp.val.array.rawScalar); + + tok = JsonbIteratorNext(&it, res, true); + Assert(tok == WJB_ELEM); + Assert(IsAJsonbScalar(res)); + + tok = JsonbIteratorNext(&it, &tmp, true); + Assert(tok == WJB_END_ARRAY); + + tok = JsonbIteratorNext(&it, &tmp, true); + Assert(tok == WJB_DONE); + + return res; +} + +Datum +jsonb_numeric(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbValue v; + + if (!JsonbExtractScalar(&in->root, &v) || v.type != jbvNumeric) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("jsonb value must be numeric"))); + + PG_RETURN_NUMERIC(v.val.numeric); +} + +Datum +jsonb_int4(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbValue v; + + if (!JsonbExtractScalar(&in->root, &v) || v.type != jbvNumeric) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("jsonb value must be numeric"))); + + PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4, + NumericGetDatum(v.val.numeric)))); +} + +Datum
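The error-versus-NULL question Nikita raises can be modeled outside the server. A minimal Python sketch (purely illustrative — the real cast is the C code in the patch above), where `null_on_error=True` mimics SQL/JSON's default NULL ON ERROR behavior and the default mimics the patch's ereport:

```python
import json
from decimal import Decimal

def jsonb_to_numeric(doc, *, null_on_error=False):
    """Toy model of the jsonb -> numeric cast: parse the document,
    require the extracted scalar to be a JSON number. On a type
    mismatch, either raise (the patch's behavior) or return None
    (SQL/JSON's NULL ON ERROR)."""
    # parse_float/parse_int hooks give exact decimal semantics,
    # roughly analogous to the server's numeric type
    value = json.loads(doc, parse_float=Decimal, parse_int=Decimal)
    if isinstance(value, Decimal):
        return value
    if null_on_error:
        return None
    raise ValueError("jsonb value must be numeric")
```

Note that, as in the patch, a JSON string like `"123"` is a mismatch — this is exactly where the cast diverges from standard JSON_VALUE, which would coerce the string.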
Re: Missing comment edit
Kyotaro HORIGUCHI writes: > I happened to find that the comment on formdesc is missing > pg_subscription. Please find the attached patch (I'm sure:) to > fix that. Hmm ... certainly, that comment is now wrong, but I'm kind of inclined to just remove it, because it'll certainly be wrong again in future. It's not telling you anything you can't find out with a trivial search in the same file, so is it worth the maintenance overhead? regards, tom lane
unused includes in test_decoding
Hi, This is a cosmetic patch that removes some unused includes in test_decoding. It seems to have been like this since day 1 (9.4). -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento From c50fa4a67276d423ba92cf85578f38533d1b68e0 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Wed, 28 Feb 2018 23:44:51 + Subject: [PATCH] Remove some unused includes Those includes are not used by test_decoding. --- contrib/test_decoding/test_decoding.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c index 0f18afa..f0b37f6 100644 --- a/contrib/test_decoding/test_decoding.c +++ b/contrib/test_decoding/test_decoding.c @@ -12,25 +12,15 @@ */ #include "postgres.h" -#include "access/sysattr.h" - -#include "catalog/pg_class.h" #include "catalog/pg_type.h" -#include "nodes/parsenodes.h" - -#include "replication/output_plugin.h" #include "replication/logical.h" -#include "replication/message.h" #include "replication/origin.h" #include "utils/builtins.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/rel.h" -#include "utils/relcache.h" -#include "utils/syscache.h" -#include "utils/typcache.h" PG_MODULE_MAGIC; -- 2.7.4
Re: compress method for spgist - 2
Alexander Korotkov writes: > On Thu, Jan 4, 2018 at 1:17 AM, Dagfinn Ilmari Mannsåker > wrote: >> This patch added two copies of the poly_ops row to the "Built-in SP-GiST >> Operator Classes" table in spgist.sgml. > Thanks for fixing this! I'm sure that Teodor will push this after the end of the > New Year holidays in Russia. He didn't ... so I did. regards, tom lane
row filtering for logical replication
Hi, The attached patches add support for filtering rows in the publisher. The output plugin will do the work if a filter was defined in the CREATE PUBLICATION command. An optional WHERE clause can be added after the table name in CREATE PUBLICATION, such as: CREATE PUBLICATION foo FOR TABLE departments WHERE (id > 2000 AND id <= 3000); Rows that don't match the WHERE clause will not be sent to the subscribers. Patches 0001 and 0002 are only refactors and can be applied independently. 0003 doesn't include row filtering on initial synchronization. Comments? -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento From ae95eb51dd72c2e8ba278da950b478f6c6741fc0 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Tue, 27 Feb 2018 02:21:03 + Subject: [PATCH 1/3] Refactor function create_estate_for_relation Relation localrel is the only LogicalRepRelMapEntry structure member that is useful for create_estate_for_relation. --- src/backend/replication/logical/worker.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 04985c9..6820c1a 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -183,7 +183,7 @@ ensure_transaction(void) * This is based on similar code in copy.c */ static EState * -create_estate_for_relation(LogicalRepRelMapEntry *rel) +create_estate_for_relation(Relation rel) { EState *estate; ResultRelInfo *resultRelInfo; @@ -193,12 +193,12 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) rte = makeNode(RangeTblEntry); rte->rtekind = RTE_RELATION; - rte->relid = RelationGetRelid(rel->localrel); - rte->relkind = rel->localrel->rd_rel->relkind; + rte->relid = RelationGetRelid(rel); + rte->relkind = rel->rd_rel->relkind; estate->es_range_table = list_make1(rte); resultRelInfo = makeNode(ResultRelInfo); - InitResultRelInfo(resultRelInfo, 
rel->localrel, 1, NULL, 0); + InitResultRelInfo(resultRelInfo, rel, 1, NULL, 0); estate->es_result_relations = resultRelInfo; estate->es_num_result_relations = 1; @@ -584,7 +584,7 @@ apply_handle_insert(StringInfo s) } /* Initialize the executor state. */ - estate = create_estate_for_relation(rel); + estate = create_estate_for_relation(rel->localrel); remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel)); @@ -688,7 +688,7 @@ apply_handle_update(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel); + estate = create_estate_for_relation(rel->localrel); remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel)); localslot = ExecInitExtraTupleSlot(estate, @@ -806,7 +806,7 @@ apply_handle_delete(StringInfo s) check_relation_updatable(rel); /* Initialize the executor state. */ - estate = create_estate_for_relation(rel); + estate = create_estate_for_relation(rel->localrel); remoteslot = ExecInitExtraTupleSlot(estate, RelationGetDescr(rel->localrel)); localslot = ExecInitExtraTupleSlot(estate, -- 2.7.4 From 8421809e38d3e1d43b00feaac4b8bccaa6738079 Mon Sep 17 00:00:00 2001 From: Euler Taveira Date: Wed, 24 Jan 2018 17:01:31 -0200 Subject: [PATCH 2/3] Rename a WHERE node A WHERE clause will be used for row filtering in logical replication. We already have a similar node: 'WHERE (condition here)'. Let's rename the node to a generic name and use it for row filtering too. 
--- src/backend/parser/gram.y | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index d99f2be..bf32362 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -468,7 +468,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type def_arg columnElem where_clause where_or_current_clause a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound columnref in_expr having_clause func_table xmltable array_expr -ExclusionWhereClause operator_def_arg +OptWhereClause operator_def_arg %type rowsfrom_item rowsfrom_list opt_col_def_list %type opt_ordinality %type ExclusionConstraintList ExclusionConstraintElem @@ -3734,7 +3734,7 @@ ConstraintElem: $$ = (Node *)n; } | EXCLUDE access_method_clause '(' ExclusionConstraintList ')' -opt_definition OptConsTableSpace ExclusionWhereClause +opt_definition OptConsTableSpace OptWhereClause ConstraintAttributeSpec { Constraint *n = makeNode(Constraint); @@ -3831,7 +3831,7 @@ ExclusionConstraintElem: index_elem WITH any_operator } ; -ExclusionWhereClause: +OptWhereClause: WHERE '(' a_expr ')' { $$ = $3; } | /*EMPTY*/{ $$ =
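For what it's worth, the publisher-side behavior that patch 0003 aims for reduces to a predicate applied before each row is handed to the replication stream. A toy Python model (invented API — the actual filtering happens in the C output plugin, not anywhere like this):

```python
def publish(rows, row_filter=None):
    """Toy model of publisher-side row filtering: yield only rows
    satisfying the publication's WHERE predicate; with no filter,
    every row is replicated as today."""
    for row in rows:
        if row_filter is None or row_filter(row):
            yield row
```

For example, the `WHERE (id > 2000 AND id <= 3000)` filter from the proposal corresponds to `row_filter=lambda r: 2000 < r["id"] <= 3000`.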
Re: [HACKERS] pgbench randomness initialization
Fabien COELHO writes: >> This is a simple patch that does what it says on the tin. I ran into >> trouble with the pgbench TAP test *even before applying the patch*, but >> only because I was doing a VPATH build as a user without 'write' >> on the source tree (001_pgbench_with_server.pl tried to make pgbench >> create log files there). Bad me. Oddly, that was the only test in the >> whole tree to have such an issue, so here I add a pre-patch to fix that. >> Now my review needs a review. :) > Yep. I find the multiple chdir solution a little bit too extreme. > ISTM that it should rather add the correct path to --log-prefix by > prepending $node->basedir, like the pgbench function does for -f scripts. > See attached. Hm ... so I tried to replicate this problem, and failed to: the log files get made under the VPATH build directory, as desired, even without this patch. Am I doing something wrong, or is this platform-dependent somehow? regards, tom lane
VPATH build from a tarball fails with some gmake versions
I tried doing a VPATH build referencing a source directory that I'd distclean'd but not maintainer-clean'd, which should simulate the case of a VPATH build from a tarball. I was quite surprised that it fell over: ... gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -O2 -I. -I/home/postgres/pgsql/src/bin/psql -I/home/postgres/pgsql/src/interfaces/libpq -I../../../src/include -I/home/postgres/pgsql/src/include -D_GNU_SOURCE -c -o sql_help.o sql_help.c gcc: sql_help.c: No such file or directory gcc: no input files make[3]: *** [sql_help.o] Error 1 make[3]: Leaving directory `/home/postgres/buildroot/src/bin/psql' gmake should be generating a path to sql_help.c, since that exists in the corresponding source directory, but it does not. I can reproduce this on RHEL6 with make-3.81-23, but not on Fedora 26 with make-4.2.1-2, and (curiously) also not on High Sierra with Apple's copy of make 3.81. I wonder how many other people see it. The attached patch seems to fix it, indicating that there's something about the rules relating sql_help.c and sql_help.h that is confusing make, such that turning sql_help.c into the primary target for the create_help rule removes the confusion. Comments? regards, tom lane diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile index c8eb2f9..4482097 100644 *** a/src/bin/psql/Makefile --- b/src/bin/psql/Makefile *** psql: $(OBJS) | submake-libpq submake-li *** 35,49 help.o: sql_help.h ! sql_help.c: sql_help.h ; ! sql_help.h: create_help.pl $(wildcard $(REFDOCDIR)/*.sgml) $(PERL) $< $(REFDOCDIR) $* psqlscanslash.c: FLEXFLAGS = -Cfe -p -p psqlscanslash.c: FLEX_NO_BACKUP=yes psqlscanslash.c: FLEX_FIX_WARNING=yes ! distprep: sql_help.h psqlscanslash.c install: all installdirs $(INSTALL_PROGRAM) psql$(X) '$(DESTDIR)$(bindir)/psql$(X)' --- 35,49 help.o: sql_help.h ! sql_help.h: sql_help.c ; ! 
sql_help.c: create_help.pl $(wildcard $(REFDOCDIR)/*.sgml) $(PERL) $< $(REFDOCDIR) $* psqlscanslash.c: FLEXFLAGS = -Cfe -p -p psqlscanslash.c: FLEX_NO_BACKUP=yes psqlscanslash.c: FLEX_FIX_WARNING=yes ! distprep: sql_help.c psqlscanslash.c install: all installdirs $(INSTALL_PROGRAM) psql$(X) '$(DESTDIR)$(bindir)/psql$(X)'
Re: prokind column (was Re: [HACKERS] SQL procedures)
On 2/28/18 15:45, Tom Lane wrote: > I have reviewed this patch and attach an updated version below. > I've rebased it up to today, fixed a few minor errors, and adopted > most of Michael's suggestions. Also, since I remain desperately > unhappy with putting zeroes into prorettype, I changed it to not > do that ;-) ... now procedures have VOIDOID as their prorettype, > and it will be substantially less painful to allow them to return > some other scalar result in future, should we wish to. I believe > I've found all the places that were relying on prorettype == 0 as > a substitute for prokind == 'p'. I have just posted "INOUT parameters in procedures", which contains some of those same changes. So I think we're on the same page. I will work on consolidating this. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
INOUT parameters in procedures
This patch set adds support for INOUT parameters to procedures. Currently, INOUT and OUT parameters are not supported. A top-level CALL returns the output parameters as a result row. In PL/pgSQL, I have added special support to pass the output back into the variables, as one would expect. These patches apply on top of the "prokind" patch set v2. (Tom has submitted an updated version of that, which overlaps with some of the changes I've made here. I will work on consolidating that soon.) So ... no OUT parameters, though. I'm struggling to find a way to make this compatible with everything else. For functions, the OUT parameters don't appear in the signature. But that is not how this is specified in the SQL standard for procedures (I think). In PL/pgSQL, you'd expect that CREATE PROCEDURE foo(a int, OUT b int) ... could be called like CALL foo(x, y); but that would require a different way of parsing function invocation. At the top level, it's even more dubious. In DB2, apparently you write CALL foo(123, ?); with a literal ? for the OUT parameters. In Oracle, I've seen CALL ... INTO syntax. Anyway, I'm leaving this out for now. It can be worked around by using INOUT parameters. Future improvements would be mainly syntax/parsing adjustments; the guts that I'm implementing here would remain valid. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 127f3716a28cceca5077786e2cb3717e36dbb426 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Tue, 27 Feb 2018 09:55:32 -0500 Subject: [PATCH v1 1/2] fixup! 
Add prokind column, replacing proisagg and proiswindow --- src/backend/commands/dropcmds.c | 2 +- src/backend/parser/parse_func.c | 6 +++--- src/backend/utils/cache/lsyscache.c | 12 ++-- src/include/utils/lsyscache.h | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c index fc4ce8d22a..45493abf57 100644 --- a/src/backend/commands/dropcmds.c +++ b/src/backend/commands/dropcmds.c @@ -92,7 +92,7 @@ RemoveObjects(DropStmt *stmt) */ if (stmt->removeType == OBJECT_FUNCTION) { - if (get_func_isagg(address.objectId)) + if (get_func_kind(address.objectId) == PROKIND_AGGREGATE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is an aggregate function", diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index 9dbf2c2b63..0b5145f70d 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -2078,7 +2078,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs *func, bool noError) if (objtype == OBJECT_FUNCTION) { /* Make sure it's a function, not a procedure */ - if (oid && get_func_rettype(oid) == InvalidOid) + if (oid && get_func_kind(oid) == PROKIND_PROCEDURE) { if (noError) return InvalidOid; @@ -2109,7 +2109,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs *func, bool noError) } /* Make sure it's a procedure */ - if (get_func_rettype(oid) != InvalidOid) + if (get_func_kind(oid) != PROKIND_PROCEDURE) { if (noError) return InvalidOid; @@ -2145,7 +2145,7 @@ LookupFuncWithArgs(ObjectType objtype, ObjectWithArgs *func, bool noError) } /* Make sure it's an aggregate */ - if (!get_func_isagg(oid)) + if (get_func_kind(oid) != PROKIND_AGGREGATE) { if (noError) return InvalidOid; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 161470aa34..869a937d5a 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1600,20 +1600,20 @@ 
func_parallel(Oid funcid) } /* - * get_func_isagg - *Given procedure id, return whether the function is an aggregate. + * get_func_kind + *Given procedure id, return the function kind (prokind). */ -bool -get_func_isagg(Oid funcid) +char +get_func_kind(Oid funcid) { HeapTuple tp; - boolresult; + charresult; tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(tp)) elog(ERROR, "cache lookup failed for function %u", funcid); - result = ((Form_pg_proc) GETSTRUCT(tp))->prokind == PROKIND_AGGREGATE; + result = ((Form_pg_proc)
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote: > On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote: > > I can see why you'd want that, but as a DBA, I don't necessarily want > > all of that recorded, especially in a quasi-permanent way. > > Huh? You're arguing that we should make it easier for DBAs to hide > potential causes of corruption? I fail to see when that's necessary / > desirable? That a cluster ran with fsync=off isn't an embarrassing thing, > nor does it contain private data... The more fine-grained these are, the more useful they can be: running with fsync=off is common advice while loading, so reporting that "fsync=off at some point" is much less useful than reporting "having to run recovery from an fsync=off database". I propose there could be two bits: 1. fsync was off at some point in history when the DB needed recovery; 2. fsync was off during the current instance; I assume the second bit would be set whenever fsync=off was set, but could be cleared when the server was cleanly shut down. If that bit is set when recovery runs, the first bit would be permanently set. I'm not sure, but if this bit is set and fsync is then re-enabled, maybe it could be cleared after any checkpoint and not just at shutdown? Justin
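Justin's two-bit scheme can be sketched as a toy state machine (Python, all names invented — pg_control is of course a fixed-layout C struct, not anything like this):

```python
# Illustrative model of the proposed taint bits:
#   FSYNC_OFF_CURRENT   - fsync was off during the current instance
#                         (clearable on clean shutdown)
#   FSYNC_OFF_RECOVERED - recovery ever ran while fsync was off (sticky)
FSYNC_OFF_CURRENT = 0x01
FSYNC_OFF_RECOVERED = 0x02

class ControlFile:
    def __init__(self):
        self.taint = 0

    def set_fsync(self, on):
        # turning fsync off taints the current instance
        if not on:
            self.taint |= FSYNC_OFF_CURRENT

    def crash_recovery(self):
        # recovering a cluster that ran with fsync=off is permanent history
        if self.taint & FSYNC_OFF_CURRENT:
            self.taint |= FSYNC_OFF_RECOVERED

    def clean_shutdown(self):
        # a clean shutdown clears only the per-instance bit
        self.taint &= ~FSYNC_OFF_CURRENT
```

This captures the asymmetry in the proposal: the per-instance bit is forgiving (a clean shutdown wipes it), while the "needed recovery with fsync off" bit is the one worth keeping quasi-permanently.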
Re: RFC: Add 'taint' field to pg_control.
Hi, On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote: Unfortunately your list seems to raise the bar to a place I don't see us going soon :( > - pg_control versions used on this cluster (hopefully a full list..obviously >not going back before PG11); That needs arbitrarily much space, which is not feasible for pg_control. Needs to be <= 512 bytes > - did recovery (you could use "needed recovery" instead, but then there's the >question of how reliable that field would be); >+ or: timestamp of most recent recovery (attempt?) What'd that be useful for? > - index corruption detected (on system table?); Note that "CREATE IF NOT >EXIST" doesn't avoid unique key errors on system tables, so it's not useful >to log every ERROR on system tables. I don't think we have an easy way to diagnose this specifically enough > - page %u is uninitialized --- fixing Doesn't really indicate a bug / problem. > - here's one I just dug up: ERROR: missing chunk number 0 for toast value > 10270298 in pg_toast_2619 Hm. > - XID wraparound? why? > - autovacuum disabled? why? > - checksum failure (in system relation?) Hm. > - local_preload_libraries? Hm? > - started in single user mode or with system indices disabled? why? > - hit assertion or PANIC ?? We certainly don't write to persistent data in those cases. > - UPDATE/DELETE/INSERT on system table ? (or maintenance command like >vacuum/analyze/cluster/reindex?) Hm, the former I could see some use for, but it might end up being pretty ugly locking- and layering-wise. Greetings, Andres Freund
Re: [HACKERS] MERGE SQL Statement for PG11
On Fri, Feb 9, 2018 at 6:36 AM, Robert Haas wrote: > Here's my $0.02: I think that new concurrency errors thrown by the > merge code itself deserve strict scrutiny and can survive only if they > have a compelling justification, but if the merge code happens to > choose an action that leads to a concurrency error of its own, I think > that deserves only mild scrutiny. > > On that basis, of the options I listed in > http://postgr.es/m/ca+tgmozdl-caukhkwet7sr7sqr0-e2t91+devhqen5sfqsm...@mail.gmail.com > I like #1 least. > > I also dislike #4 from that list for the reasons stated there. For > example, if you say WHEN MATCHED AND x.some_boolean and then WHEN > MATCHED, you expect that every tuple that hits the latter clause will > have that Boolean as false or null, but #4 makes that not true. > > I think the best options are #2 and #5 -- #2 because it's simple, and > #5 because it's (maybe) more intuitive, albeit at the risk of > livelock. But I'm willing to be convinced of some other option; like > you, I'm willing to be open-minded about this. But, as you say, we > need a coherent, well-considered justification for the selected > option, not just "well, this is what we implemented so you should like > it". The specification should define the implementation, not the > reverse. At first I hated option #2. I'm warming to #2 a lot now, though, because I've come to understand the patch's approach a little better. (Pavan and Simon should still verify that I got things right in my mail from earlier today, though.) It now seems like the patch throws a RC serialization error more or less only due to concurrent deletions (rarely, it will actually be a concurrent update that changed the join qual values of our MERGE). We're already not throwing the error (we just move on to the next input row from the join) when we happen to not have a WHEN NOT MATCHED case. But why even make that distinction? Why not just go ahead and WHEN NOT MATCHED ... 
INSERT when the MERGE happens to have such a clause? The INSERT will succeed, barring any concurrent conflicting insertion by a third session -- a hazard that has nothing to do with RC conflict handling in particular. Just inserting without throwing a RC serialization error is almost equivalent to a new MVCC snapshot being acquired due to a RC conflict, so all of this now looks okay to me. Especially because every other MERGE implementation seems to have serious issues with doing anything too fancy with the MERGE target table's quals within the main ON join. I think that SQL Server actually assumes that you're using the target's PK in a simple equi-join. All the examples look like that, and this assumption is heavily suggested by the "Do not attempt to improve query performance by filtering out rows in the target table in the ON clause" weasel words from their docs, that I referenced in my mail from earlier today. I can get my head around all of this now only because I've come to understand that once we've decided that a given input from the main join is NOT MATCHED, we stick to that determination. We don't bounce around between MATCHED and NOT MATCHED cases during an EPQ update chain walk. We can bounce around between multiple alternative MATCHED merge actions/WHEN cases, but that seems okay because it's still part of essentially the same EPQ update chain walk -- no obvious livelock hazard. It seems fairly clean to restart everything ("goto lmerge") for each and every tuple in the update chain. Maybe we should actually formalize that you're only supposed to do a PK or unique index equi-join within the main ON join, though -- you can do something fancy with the source table, but not the target table. -- Peter Geoghegan
Re: RFC: Add 'taint' field to pg_control.
Hi, On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote: > I can see why you'd want that, but as a DBA, I don't necessarily want > all of that recorded, especially in a quasi-permanent way. Huh? You're arguing that we should make it easier for DBAs to hide potential causes of corruption? I fail to see when that's necessary / desirable? That a cluster ran with fsync=off isn't an embarrassing thing, nor does it contain private data... Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote: > a significant number of times during investigations of bugs I wondered > whether running the cluster with various settings, or various tools > could've caused the issue at hand. Therefore I'd like to propose adding > a 'tainted' field to pg_control, that contains some of the "history" of > the cluster. Individual bits inside that field that I can think of right > now are: > - pg_resetxlog was used non-passively on cluster > - ran with fsync=off > - ran with full_page_writes=off > - pg_upgrade was used What about: - pg_control versions used on this cluster (hopefully a full list..obviously not going back before PG11); - did recovery (you could use "needed recovery" instead, but then there's the question of how reliable that field would be); + or: timestamp of most recent recovery (attempt?) - index corruption detected (on system table?); Note that "CREATE IF NOT EXIST" doesn't avoid unique key errors on system tables, so it's not useful to log every ERROR on system tables. - page %u is uninitialized --- fixing - here's one I just dug up: ERROR: missing chunk number 0 for toast value 10270298 in pg_toast_2619 - XID wraparound? - autovacuum disabled? - checksum failure (in system relation?) - local_preload_libraries? - started in single user mode or with system indices disabled? - hit assertion or PANIC ?? - UPDATE/DELETE/INSERT on system table ? (or maintenance command like vacuum/analyze/cluster/reindex?) Justin
Re: RFC: Add 'taint' field to pg_control.
On 2018-02-28 23:13:44 +0100, Tomas Vondra wrote: > > On 02/28/2018 10:43 PM, Andres Freund wrote: > > Hi, > > > > a significant number of times during investigations of bugs I wondered > > whether running the cluster with various settings, or various tools > > could've caused the issue at hand. Therefore I'd like to propose adding > > a 'tainted' field to pg_control, that contains some of the "history" of > > the cluster. Individual bits inside that field that I can think of right > > now are: > > - pg_resetxlog was used non-passively on cluster > > - ran with fsync=off > > - ran with full_page_writes=off > > - pg_upgrade was used > > > > What do others think? > Isn't the uncertainty usually that you know one of these things happened > on the cluster, but you don't know if that's the actual root cause? > That's my experience, at least. My experience is that you get a bugreport and you have no clue what happened with the cluster. Often the "owner" doesn't have either. Then you go through various theories and end up blaming it on one of these, even though it's quite possibly never happened. Of course this does *not* solve the issue when these actually occurred. It's just a tool to narrow down possible causes / exclude others. Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
On 2/28/18 16:43, Andres Freund wrote: > a significant number of times during investigations of bugs I wondered > whether running the cluster with various settings, or various tools > could've caused the issue at hand. Therefore I'd like to propose adding > a 'tainted' field to pg_control, that contains some of the "history" of > the cluster. Individual bits inside that field that I can think of right > now are: > - pg_resetxlog was used non-passively on cluster > - ran with fsync=off > - ran with full_page_writes=off > - pg_upgrade was used I can see why you'd want that, but as a DBA, I don't necessarily want all of that recorded, especially in a quasi-permanent way. Maybe this could be stored in a separate location, like a small text/log-like file in the data directory. Then DBAs can save or keep or remove that information as they wish. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On 02/28/2018 10:43 PM, Andres Freund wrote: > Hi, > > a significant number of times during investigations of bugs I wondered > whether running the cluster with various settings, or various tools > could've caused the issue at hand. Therefore I'd like to propose adding > a 'tainted' field to pg_control, that contains some of the "history" of > the cluster. Individual bits inside that field that I can think of right > now are: > - pg_resetxlog was used non-passively on cluster > - ran with fsync=off > - ran with full_page_writes=off > - pg_upgrade was used > > What do others think? > Isn't the uncertainty usually that you know one of these things happened on the cluster, but you don't know if that's the actual root cause? That's my experience, at least. That being said, I'm not strongly opposed to adding such field. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] kNN for SP-GiST
Attached is the 3rd version of kNN for SP-GiST. On 09.03.2017 16:52, Alexander Korotkov wrote: Hi, Nikita! I took a look at this patchset. My first notes are the following. * 0003-Extract-index_store_orderby_distances-v02.patch Function index_store_orderby_distances doesn't look general enough for its name. GiST supports ordering only by float4 and float8 distances. SP-GiST also goes that way. But in the future, access methods could support distances of other types. Thus, I suggest renaming the function to index_store_float8_orderby_distances(). This function was renamed. * 0004-Add-kNN-support-to-SP-GiST-v02.patch The patch didn't apply cleanly. The patches were rebased onto current master. *** AUTHORS *** 371,373 --- 376,379 Teodor Sigaev> Oleg Bartunov > + Vlad Sterzhanov > I don't think it is worth mentioning here a GSoC student who made just a dirty prototype and abandoned this work right after the final evaluation. The student's mention was removed. More generally, you switched SP-GiST from a scan stack to a pairing heap. We should check whether it introduces overhead to non-kNN scans. I decided to bring back the scan stack for unordered scans. Also, some benchmarks comparing kNN SP-GiST vs. kNN GiST would be nice now. * 0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch I faced the following warning. spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan' is invalid in C99 [-Wimplicit-function-declaration] return get_float8_nan(); ^ 1 warning generated. Fixed. * 0006-Add-spg_compress-method-to-SP-GiST-v02.patch * 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch I think this is out of scope for kNN SP-GiST. This is very valuable, but a completely separate feature. And it should be posted and discussed separately. The compress method for SP-GiST with poly_ops has already been committed, so I left only the ordering operators for polygons in the last, 6th patch.
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c index f57380a..a4345ef 100644 --- a/src/backend/utils/adt/geo_ops.c +++ b/src/backend/utils/adt/geo_ops.c @@ -41,7 +41,6 @@ enum path_delim static int point_inside(Point *p, int npts, Point *plist); static int lseg_crossing(double x, double y, double px, double py); static BOX *box_construct(double x1, double x2, double y1, double y2); -static BOX *box_fill(BOX *result, double x1, double x2, double y1, double y2); static bool box_ov(BOX *box1, BOX *box2); static double box_ht(BOX *box); static double box_wd(BOX *box); @@ -451,7 +450,7 @@ box_construct(double x1, double x2, double y1, double y2) /* box_fill - fill in a given box struct */ -static BOX * +BOX * box_fill(BOX *result, double x1, double x2, double y1, double y2) { if (x1 > x2) @@ -482,7 +481,7 @@ box_fill(BOX *result, double x1, double x2, double y1, double y2) /* box_copy - copy a box */ BOX * -box_copy(BOX *box) +box_copy(const BOX *box) { BOX *result = (BOX *) palloc(sizeof(BOX)); diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h index a589e42..94806c2 100644 --- a/src/include/utils/geo_decls.h +++ b/src/include/utils/geo_decls.h @@ -182,6 +182,7 @@ typedef struct extern double point_dt(Point *pt1, Point *pt2); extern double point_sl(Point *pt1, Point *pt2); extern double pg_hypot(double x, double y); -extern BOX *box_copy(BOX *box); +extern BOX *box_copy(const BOX *box); +extern BOX *box_fill(BOX *box, double xlo, double xhi, double ylo, double yhi); #endif /* GEO_DECLS_H */ diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index b30b931..da76a76 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -14,9 +14,9 @@ */ #include "postgres.h" +#include "access/genam.h" #include "access/gist_private.h" #include "access/relscan.h" -#include 
"catalog/pg_type.h" #include "miscadmin.h" #include "pgstat.h" #include "lib/pairingheap.h" @@ -540,7 +540,6 @@ getNextNearest(IndexScanDesc scan) { GISTScanOpaque so = (GISTScanOpaque) scan->opaque; bool res = false; - int i; if (scan->xs_hitup) { @@ -561,45 +560,10 @@ getNextNearest(IndexScanDesc scan) /* found a heap item at currently minimal distance */ scan->xs_ctup.t_self = item->data.heap.heapPtr; scan->xs_recheck = item->data.heap.recheck; - scan->xs_recheckorderby = item->data.heap.recheckDistances; - for (i = 0; i < scan->numberOfOrderBys; i++) - { -if (so->orderByTypes[i] == FLOAT8OID) -{ -#ifndef USE_FLOAT8_BYVAL - /* must free any old value to avoid memory leakage */ - if (!scan->xs_orderbynulls[i]) -
Re: Two small patches for the isolationtester lexer
Daniel Gustafsson writes: > On 22 Feb 2018, at 05:10, Tom Lane wrote: >> Actually, looking closer, this would also trigger on '#' used inside a SQL literal, which seems to move the problem cases into the "pretty likely" category instead of the "far-fetched" one. So I'd only be OK with it if we made the lexer smart enough to distinguish inside-a-SQL-literal from not. That might be a good thing anyway, since it would allow us to not choke on literals containing '}', but it'd be a great deal more work. You might be able to steal code from the psql lexer though. > I agree, patch 0002 was broken and the correct fix is a much bigger project - one too big for me to tackle right now (but hopefully at some point in the near future). Thanks for the review of it though! OK. I'm going to mark this commitfest entry closed, since the other patch is in and this one needs a good bit more work. Please start a new thread, or at least a new CF entry, if you do more work in this area. regards, tom lane
Re: Two small patches for the isolationtester lexer
Daniel Gustafsson writes: >> On 22 Feb 2018, at 05:12, Tom Lane wrote: >> Another idea is just to teach addlitchar to realloc the buffer bigger when necessary. > I think this is the best approach for the task, the attached patch changes the static allocation to instead realloc when required. Having an upper limit on the allocation seemed like a good idea to me, but perhaps it’s overthinking and complicating things for no good reason. Yeah, it doesn't seem to me that we need any fixed limit on the length, so I deleted that part of the logic, and pushed it with one or two other cosmetic adjustments. regards, tom lane
Re: Cast jsonb to numeric, int, float, bool
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested We're using this patch and like it a lot. We store a lot of log-like data in S3-hosted .json.gz files. Sometimes we need to ingest them on short notice, calculate statistics, and test new what-if scenarios. We ingest the data into a simple single-column table with a jsonb field and then run our calculations on top of it. Without this patch, the only way to get values already parsed as numbers inside jsonb into a calculation is a binary-text-binary transformation. We also rely on this patch for casts in our custom spatial type extension: https://github.com/gojuno/lostgis/blob/master/sql/types/type_tpv.sql#L21 For Postgres installations without the patch we stub it with WITH INOUT casts, https://github.com/gojuno/lostgis/blob/master/sql/types/__jsonb_casts.sql - but without the performance benefits of raw C processing :) One thing this patch does not implement is a cast from jsonb to bigint. That would be useful for transforming stored osm_id OpenStreetMap object identifiers. Right now we stub it with a jsonb::numeric::bigint cast, but it would be nice to get rid of the middle step. The new status of this patch is: Ready for Committer
RFC: Add 'taint' field to pg_control.
Hi, a significant number of times during investigations of bugs I wondered whether running the cluster with various settings, or various tools could've caused the issue at hand. Therefore I'd like to propose adding a 'tainted' field to pg_control, that contains some of the "history" of the cluster. Individual bits inside that field that I can think of right now are: - pg_resetxlog was used non-passively on cluster - ran with fsync=off - ran with full_page_writes=off - pg_upgrade was used What do others think? Perhaps we should rename 'tainted' to something less "blame-y" sounding, but I can't quite think of something more descriptive than 'cluster_history'. Greetings, Andres Freund
Re: Online enabling of checksums
On 02/28/2018 08:42 PM, Alvaro Herrera wrote: > I noticed that pg_verify_checksum takes an "-o oid" argument to only > check the relation with that OID; but that's wrong, because the number > is a relfilenode, not an OID (since it's compared to the on-disk file > name). I would suggest changing everything to clarify that it's a > pg_class.relfilenode value, otherwise it's going to be very confusing. > Maybe use "-f filenode" if -f is available? > I'd argue this is merely a mistake in the --help text. Firstly, relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly, the SGML docs actually say: -o relfilenode Only validate checksums in the relation with specified relfilenode. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On Wed, Feb 28, 2018 at 8:53 AM, Robert Haaswrote: > On Tue, Feb 27, 2018 at 5:07 PM, Peter Geoghegan wrote: >> I now feel like Simon's suggestion of throwing an error in corner >> cases isn't so bad. It still seems like we could do better, but the >> more I think about it, the less that seems like a cop-out. My reasons >> are: > > I still think we really ought to try not to add a new class of error. I do too. However, it's only fair to acknowledge that the lack of any strong counterproposal after a month or so tells us something. >> * We can all agree that *not* raising an error in the specific way >> Simon proposes is possible, somehow or other. We also all agree that >> avoiding the broader category of RC errors can only be taken so far >> (e.g. in any event duplicate violations errors are entirely possible, >> in RC mode, when a MERGE inserts a row). So this is a question of what >> exact middle ground to take. Neither of the two extremes (throwing an >> error on the first sign of a RC conflict, and magically preventing >> concurrency anomalies) are actually on the table. > > Just because there's no certainty about which behavior is best doesn't > mean that none of them are better than throwing an error. Bear in mind that the patch doesn't throw an error at the first sign of trouble. It throws an error after doing an EPQ style walk, which individually verifies the extra "WHEN ... AND" quals for every tuple in the update chain (ExecUpdate()/ExecDelete() return after first EPQ tuple check for MERGE, so that an ExecMerge() caller can do that extra part with special EPQ slot). The controversial serialization error only comes when the original basic assessment of MATCHED needs to change, without regard to extra "WHEN ... AND" quals (and only when the MERGE statement was written in such a way that that matters -- it must have a WHEN NOT MATCHED clause, too). AFAICT, the patch will only throw an error when the join quals no longer pass -- not when the extra "WHEN ... 
AND" quals no longer pass. That would be far worse, IMV. ExecMerge() will retry until it reaches the end of the update chain, possibly changing its mind repeatedly about which particular WHEN case applies, going back and forth between mergeActions (WHEN cases) as the update chain is walked. The patch can do a lot to roll with conflicts before it gives up. Conflicts are generally caused by concurrent DELETEs, so you could say that this offers a kind of symmetry with concurrent INSERTs causing duplicate violation errors. (Concurrent UPDATEs that change the join qual values could provoke the error too, but presumably we're joining on the PK in most cases, so that seems much less likely than a simple DELETE.) Bear in mind that both the SQL Server and Oracle implementations have many significant restrictions/caveats around doing something cute with the main join quals. It's not surprising that we'd have something like that, too. According to Rahila, Oracle doesn't allow MERGE to update the columns in the join qual at all, and only allows a single WHEN MATCHED case (a DELETE can only come after an UPDATE in Oracle, oddly enough). SQL Server's docs have a warning that states: "Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.". What does that even mean!? I am slightly concerned that the patch may still have livelock hazards in its approach to RC conflict handling. I haven't studied that aspect in enough detail, though. Since we always make forward progress by following an update chain, any problem here is probably fixable. -- Peter Geoghegan
SET TRANSACTION in PL/pgSQL
Currently, you can't run SET TRANSACTION in PL/pgSQL. A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET TRANSACTION does not work anymore if a snapshot is set. Here is a patch to work around that by handling this command separately. I have coded this here bypassing SPI entirely. But there is some overlap with the no_snapshot option in the patch "PL/pgSQL nested CALL with transactions", so maybe a better solution will arise. This will also inform how to tackle this in other PLs. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 8b4d1ca1a90c18eb3a797b3938e804478022543a Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Mon, 26 Feb 2018 11:59:06 -0500 Subject: [PATCH v1] PL/pgSQL: Add support for SET TRANSACTION A normal SQL command run inside PL/pgSQL acquires a snapshot, but SET TRANSACTION does not work anymore if a snapshot is set. So we have to handle this separately. --- .../plpgsql/src/expected/plpgsql_transaction.out | 20 + src/pl/plpgsql/src/pl_exec.c | 49 ++ src/pl/plpgsql/src/pl_funcs.c | 23 ++ src/pl/plpgsql/src/pl_gram.y | 32 +- src/pl/plpgsql/src/pl_scanner.c| 2 + src/pl/plpgsql/src/plpgsql.h | 13 +- src/pl/plpgsql/src/sql/plpgsql_transaction.sql | 19 + 7 files changed, 156 insertions(+), 2 deletions(-) diff --git a/src/pl/plpgsql/src/expected/plpgsql_transaction.out b/src/pl/plpgsql/src/expected/plpgsql_transaction.out index 8ec22c646c..5f569dc64a 100644 --- a/src/pl/plpgsql/src/expected/plpgsql_transaction.out +++ b/src/pl/plpgsql/src/expected/plpgsql_transaction.out @@ -236,6 +236,26 @@ SELECT * FROM test3; 1 (1 row) +-- SET TRANSACTION +DO LANGUAGE plpgsql $$ +BEGIN +PERFORM 1; +RAISE INFO '%', current_setting('transaction_isolation'); +COMMIT; +SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; +PERFORM 1; +RAISE INFO '%', current_setting('transaction_isolation'); +COMMIT; +SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; +RESET TRANSACTION ISOLATION LEVEL; 
+PERFORM 1; +RAISE INFO '%', current_setting('transaction_isolation'); +COMMIT; +END; +$$; +INFO: read committed +INFO: repeatable read +INFO: read committed DROP TABLE test1; DROP TABLE test2; DROP TABLE test3; diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 4ff87e0879..9a25ee9ad9 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -32,6 +32,7 @@ #include "parser/scansup.h" #include "storage/proc.h" #include "tcop/tcopprot.h" +#include "tcop/utility.h" #include "utils/array.h" #include "utils/builtins.h" #include "utils/datum.h" @@ -299,6 +300,8 @@ static int exec_stmt_commit(PLpgSQL_execstate *estate, PLpgSQL_stmt_commit *stmt); static int exec_stmt_rollback(PLpgSQL_execstate *estate, PLpgSQL_stmt_rollback *stmt); +static int exec_stmt_set(PLpgSQL_execstate *estate, + PLpgSQL_stmt_set *stmt); static void plpgsql_estate_setup(PLpgSQL_execstate *estate, PLpgSQL_function *func, @@ -1997,6 +2000,10 @@ exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt) rc = exec_stmt_rollback(estate, (PLpgSQL_stmt_rollback *) stmt); break; + case PLPGSQL_STMT_SET: + rc = exec_stmt_set(estate, (PLpgSQL_stmt_set *) stmt); + break; + default: estate->err_stmt = save_estmt; elog(ERROR, "unrecognized cmd_type: %d", stmt->cmd_type); @@ -4566,6 +4573,48 @@ exec_stmt_rollback(PLpgSQL_execstate *estate, PLpgSQL_stmt_rollback *stmt) return PLPGSQL_RC_OK; } +/* + * exec_stmt_set + * + * Execute SET/RESET statement. + * + * We just parse and execute the statement normally, but we have to do it + * without going through the usual SPI functions, because we don't want to set + * a snapshot for things like SET TRANSACTION. + * + * XXX It might be nice to use SPI_execute(), but in order to not get a + * snapshot, we have to pass read_only = true, which in turn prevents SET + * commands. 
+ */ +static int +exec_stmt_set(PLpgSQL_execstate *estate, PLpgSQL_stmt_set *stmt) +{ + List *raw_parsetree_list; + RawStmt*parsetree; + List *stmt_list; + PlannedStmt *pstmt; + + raw_parsetree_list = pg_parse_query(stmt->expr->query); + Assert(list_length(raw_parsetree_list) == 1); + parsetree = linitial_node(RawStmt, raw_parsetree_list); + + stmt_list = pg_analyze_and_rewrite(parsetree, stmt->expr->query, NULL, 0, NULL); + stmt_list =
2018-03 Commitfest starts tomorrow
Hackers! I'll be starting the Commitfest at midnight AoE (07:00 ET, 13:00 CET) so please get your patches in before then. Please remember that if you drop a new and large (or invasive) patch into this CF it may be moved to the next CF. This last CF for PG11 should generally be restricted to patches that have gone through review in prior CFs or are modest in scope. Thanks, -- -David da...@pgmasters.net
PATCH: Unlogged tables re-initialization tests
These tests were originally included in the exclude unlogged tables patch [1] to provide coverage for the refactoring of reinit.c. After review we found a simpler implementation that did not require the reinit.c refactor so I dropped the tests from that patch. I did not include the refactor here because it's just noise now, but I think the tests still have value. I will add to the 2018-03 CF. -- -David da...@pgmasters.net [1]https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net diff --git a/src/test/recovery/t/014_unlogged_reinit.pl b/src/test/recovery/t/014_unlogged_reinit.pl new file mode 100644 index 00..ac2e251158 --- /dev/null +++ b/src/test/recovery/t/014_unlogged_reinit.pl @@ -0,0 +1,117 @@ +# Tests that unlogged tables are properly reinitialized after a crash. +# +# The behavior should be the same when restoring from a backup but that is not +# tested here (yet). +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 16; + +# Initialize node without replication settings +my $node = get_new_node('main'); + +$node->init; +$node->start; +my $pgdata = $node->data_dir; + +# Create an unlogged table to test that forks other than init are not copied +$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)'); + +my $baseUnloggedPath = $node->safe_psql('postgres', + q{select pg_relation_filepath('base_unlogged')}); + +# Make sure main and init forks exist +ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base'); +ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base'); + +# The following tests test symlinks. Windows doesn't have symlinks, so +# skip on Windows. +my $tablespaceDir = undef; +my $ts1UnloggedPath = undef; + +SKIP: +{ + skip "symlinks not supported on Windows", 2 if ($windows_os); + + # Create unlogged tables in a tablespace + $tablespaceDir = TestLib::tempdir . 
"/ts1"; + + mkdir($tablespaceDir) + or die "unable to mkdir \"$tablespaceDir\""; + + $node->safe_psql('postgres', + "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'"); + $node->safe_psql('postgres', + 'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1'); + + $ts1UnloggedPath = $node->safe_psql('postgres', + q{select pg_relation_filepath('ts1_unlogged')}); + + # Make sure main and init forks exist + ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace'); + ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace'); +} + +# Crash the postmaster +$node->stop('immediate'); + +# Write forks to test that they are removed during recovery +$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"], + 'touch vm fork in base'); +$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"], + 'touch fsm fork in base'); + +# Remove main fork to test that it is recopied from init +unlink("$pgdata/${baseUnloggedPath}") + or die "unable to remove \"${baseUnloggedPath}\""; + +# The following tests test symlinks. Windows doesn't have symlinks, so +# skip on Windows. 
+SKIP: +{ + skip "symlinks not supported on Windows", 2 if ($windows_os); + + # Write forks to test that they are removed by recovery + $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"], + 'touch vm fork in tablespace'); + $node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"], + 'touch fsm fork in tablespace'); + + # Remove main fork to test that it is recopied from init + unlink("$pgdata/${ts1UnloggedPath}") + or die "unable to remove \"${ts1UnloggedPath}\""; +} + +# Start the postmaster +$node->start; + +# Check unlogged table in base +ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base'); +ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base'); +ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base'); +ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base'); + +# Drop unlogged table +$node->safe_psql('postgres', 'DROP TABLE base_unlogged'); + +# The following tests test symlinks. Windows doesn't have symlinks, so +# skip on Windows. +SKIP: +{ + skip "symlinks not supported on Windows", 4 if ($windows_os); + + # Check unlogged table in tablespace + ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace'); + ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace'); + ok(!-f "$pgdata/${ts1UnloggedPath}_vm", 'vm fork not in tablespace'); + ok(!-f "$pgdata/${ts1UnloggedPath}_fsm", 'fsm fork not in tablespace'); + + # Drop unlogged table + $node->safe_psql('postgres', 'DROP TABLE ts1_unlogged'); + + # Drop tablespace + $node->safe_psql('postgres', 'DROP TABLESPACE ts1'); + rmdir($tablespaceDir) + or die "unable to rmdir \"$tablespaceDir\""; +}
PL/pgSQL nested CALL with transactions
So far, a nested CALL or DO in PL/pgSQL would not establish a context where transaction control statements were allowed. This patch fixes that by handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic execution context through and doing the required management around transaction boundaries. This requires a new flag in SPI to run SPI_execute* without snapshot management. This is currently poked directly into the plan struct, requiring access to spi_priv.h. This could use some refinement ideas. Other PLs are currently not supported, but they could be extended in the future using the principles being worked out here. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From fa13c429d1b401643af4fa87f553784edaa1515a Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 28 Feb 2018 14:50:11 -0500 Subject: [PATCH v1] PL/pgSQL: Nested CALL with transactions So far, a nested CALL or DO in PL/pgSQL would not establish a context where transaction control statements were allowed. This fixes that by handling CALL and DO specially in PL/pgSQL, passing the atomic/nonatomic execution context through and doing the required management around transaction boundaries. XXX This requires a new flag in SPI to run SPI_execute* without snapshot management. This is currently poked directly into the plan struct, requiring access to spi_priv.h. This could use some refinement ideas.
--- doc/src/sgml/plpgsql.sgml | 13 +++- src/backend/executor/spi.c | 12 +-- src/backend/tcop/utility.c | 4 +- src/include/executor/spi_priv.h| 1 + src/include/tcop/utility.h | 1 + .../plpgsql/src/expected/plpgsql_transaction.out | 73 +- src/pl/plpgsql/src/pl_exec.c | 86 -- src/pl/plpgsql/src/pl_funcs.c | 25 +++ src/pl/plpgsql/src/pl_gram.y | 37 +- src/pl/plpgsql/src/pl_handler.c| 4 +- src/pl/plpgsql/src/pl_scanner.c| 2 + src/pl/plpgsql/src/plpgsql.h | 16 +++- src/pl/plpgsql/src/sql/plpgsql_transaction.sql | 42 +-- 13 files changed, 273 insertions(+), 43 deletions(-) diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index c1e3c6a19d..04aa1759cd 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -3447,9 +3447,9 @@ Looping Through a Cursor's Result Transaction Management -In procedures invoked by the CALL command from the top -level as well as in anonymous code blocks (DO command) -called from the top level, it is possible to end transactions using the +In procedures invoked by the CALL command +as well as in anonymous code blocks (DO command), +it is possible to end transactions using the commands COMMIT and ROLLBACK. A new transaction is started automatically after a transaction is ended using these commands, so there is no separate START @@ -3479,6 +3479,13 @@ Transaction Management + +Transaction control is only possible in CALL or +DO invocations from the top level or nested +CALL or DO invocations without any +other intervening command. + + A transaction cannot be ended inside a loop over a query result, nor inside a block with exception handlers. diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 9fc4431b80..174ec6cd5a 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -2042,7 +2042,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, * In the first two cases, we can just push the snap onto the stack once * for the whole plan list. 
*/ - if (snapshot != InvalidSnapshot) + if (snapshot != InvalidSnapshot && !plan->no_snapshots) { if (read_only) { @@ -2121,7 +2121,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, * In the default non-read-only case, get a new snapshot, replacing * any that we pushed in a previous cycle. */ - if (snapshot == InvalidSnapshot && !read_only) + if (snapshot == InvalidSnapshot && !read_only && !plan->no_snapshots) { if (pushed_active_snap) PopActiveSnapshot(); @@ -2172,7 +2172,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, * If not read-only mode, advance the command counter before each * command and update the snapshot. */ - if (!read_only) + if (!read_only && !plan->no_snapshots) {
Re: Online enabling of checksums
I noticed that pg_verify_checksum takes an "-o oid" argument to only check the relation with that OID; but that's wrong, because the number is a relfilenode, not an OID (since it's compared to the on-disk file name). I would suggest changing everything to clarify that it's a pg_class.relfilenode value, otherwise it's going to be very confusing. Maybe use "-f filenode" if -f is available? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: handling of heap rewrites in logical decoding
On 2/25/18 07:27, Craig Ringer wrote: > I'm pretty sure we _will_ want the ability to decode and stream rewrite contents for non-IMMUTABLE table rewrites. > Filtering out by default is OK by me, but I think making it impossible to decode is a mistake. So I'm all for the oid option and had written a suggestion for it before I saw you already mentioned it in the next part of your mail. > The main issue with filtering out rewrites by default is that I don't see how, if we default to ignore/filter-out, plugins would indicate "actually I want to choose about this one" or "I understand table rewrites". I'd prefer not to add another change callback. Second version, which uses an OID. I added another field to the output plugin options (next to the output_type) to indicate whether the plugin wants to receive these changes. I added some test cases to test_decoding to show how it works either way. It's a bit messy to pass this setting through to the ReorderBuffer; maybe there is a better idea. But the result seems pretty useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 7463a354906744ec78d1537be6a3993f632f8a3c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sat, 24 Feb 2018 17:03:57 -0500 Subject: [PATCH v2] Handle heap rewrites even better in logical decoding Logical decoding should not publish anything about tables created as part of a heap rewrite during DDL. Those tables don't exist externally, so consumers of logical decoding cannot do anything sensible with that information. In ab28feae2bd3d4629bd73ae3548e671c57d785f0, we worked around this for built-in logical replication, but that was a hack. This is a more proper fix: we mark such transient heaps using the new field pg_class.relwrite, linking to the original relation OID. By default, we ignore them in logical decoding before they get to the output plugin.
Optionally, a plugin can register their interest in getting such changes, if the handle DDL specially, in which case the new field will help them get information about the actual table. --- .../test_decoding/expected/concurrent_ddl_dml.out | 82 -- contrib/test_decoding/expected/ddl.out | 20 +++--- .../test_decoding/specs/concurrent_ddl_dml.spec| 2 +- contrib/test_decoding/sql/ddl.sql | 5 +- contrib/test_decoding/test_decoding.c | 14 doc/src/sgml/catalogs.sgml | 12 doc/src/sgml/logicaldecoding.sgml | 5 ++ src/backend/bootstrap/bootparse.y | 1 + src/backend/catalog/heap.c | 4 ++ src/backend/catalog/toasting.c | 1 + src/backend/commands/cluster.c | 1 + src/backend/commands/tablecmds.c | 1 + src/backend/replication/logical/logical.c | 4 ++ src/backend/replication/logical/reorderbuffer.c| 7 ++ src/backend/replication/pgoutput/pgoutput.c| 26 --- src/include/catalog/heap.h | 1 + src/include/catalog/pg_class.h | 22 +++--- src/include/replication/output_plugin.h| 1 + src/include/replication/reorderbuffer.h| 2 + 19 files changed, 109 insertions(+), 102 deletions(-) diff --git a/contrib/test_decoding/expected/concurrent_ddl_dml.out b/contrib/test_decoding/expected/concurrent_ddl_dml.out index a15bfa292e..1f9e7661b7 100644 --- a/contrib/test_decoding/expected/concurrent_ddl_dml.out +++ b/contrib/test_decoding/expected/concurrent_ddl_dml.out @@ -10,7 +10,7 @@ step s1_insert_tbl1: INSERT INTO tbl1 (val1, val2) VALUES (1, 1); step s2_alter_tbl2_float: ALTER TABLE tbl2 ALTER COLUMN val2 TYPE float; step s1_insert_tbl2: INSERT INTO tbl2 (val1, val2) VALUES (1, 1); step s1_commit: COMMIT; -step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); +step s2_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); data BEGIN @@ -32,16 +32,13 @@ step s2_alter_tbl1_float: ALTER TABLE 
tbl1 ALTER COLUMN val2 TYPE float; -step s2_get_changes: SELECT regexp_replace(data, 'temp_\d+', 'temp') AS data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); +step s2_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); data BEGIN table public.tbl1: INSERT: val1[integer]:1 val2[integer]:1 table public.tbl2: INSERT: val1[integer]:1 val2[integer]:1 COMMIT -BEGIN -table public.pg_temp: INSERT: val1[integer]:1 val2[double precision]:1
Re: [PATCH] Verify Checksums during Basebackups
On 2/28/18 1:08 PM, Michael Banck wrote: > > The attached small patch verifies checksums (in case they are enabled) > during a basebackup. The rationale is that we are reading every block in > this case anyway, so this is a good opportunity to check them as well. > Other and complementary ways of checking the checksums are possible of > course, like the offline checking tool that Magnus just submitted. +1. I've done some work in this area so I have signed up to review. -- -David da...@pgmasters.net
Re: [HACKERS] Cast jsonb to numeric, int, float, bool
01.02.2017 17:41, Anastasia Lubennikova: Now the simplest way to extract booleans and numbers from json/jsonb is to cast it to text and then cast to the appropriate type: postgres=# select 'true'::jsonb::text::bool; bool -- t postgres=# select '1.0'::jsonb::text::numeric; numeric - 1.0 This patch implements direct casts from jsonb numeric (jbvNumeric) to numeric, int4 and float8, and from jsonb bool (jbvBool) to bool. postgres=# select 'true'::jsonb::bool; bool -- t postgres=# select '1.0'::jsonb::numeric; numeric - 1.0 Waiting for your feedback. If you find it useful, I can also add support of json and other types, such as smallint and bigint. I totally forgot about this thread, but it is a small but useful feature. Maybe it's time to dust it off. This patch was made at the request of one of our clients, and though they already use a custom build, I think it would be better to have these casts in core. The patch is attached. There are no tests yet, since I'm not really sure which set of types we want to support. For now, the patch provides jsonb to numeric, int4, float8 and bool. Also, I have some doubts about the castcontext, but 'explicit' seems to be the correct one here. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 03b57ad50b9fe5bc785958bcf16badb58971d489 Author: Anastasia Date: Wed Feb 28 21:14:31 2018 +0300 Cast jsonbNumeric to numeric, int4 and float8.
Cast jsonbBool to bool diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 0f70180..3678dbe 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -1845,3 +1845,129 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS) PG_RETURN_POINTER(out); } + +Datum +jsonb_numeric(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_NUMERIC(v.val.numeric); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_int4(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member.
+ */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4, + NumericGetDatum(v.val.numeric)))); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_float8(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvNumeric) + PG_RETURN_FLOAT8(DatumGetFloat8(DirectFunctionCall1(numeric_float8, + NumericGetDatum(v.val.numeric)))); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json numeric"))); +} + +Datum +jsonb_bool(PG_FUNCTION_ARGS) +{ + Jsonb *in = PG_GETARG_JSONB_P(0); + JsonbIterator *it; + JsonbValue v; + + if (!JB_ROOT_IS_OBJECT(in) + && !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in))) + { + Assert(JB_ROOT_IS_SCALAR(in)); + + it = JsonbIteratorInit(&in->root); + + /* + * A root scalar is stored as an array of one element, so we get the + * array and then its first (and only) member. + */ + (void) JsonbIteratorNext(&it, &v, true); + Assert(v.type == jbvArray); + (void) JsonbIteratorNext(&it, &v, true); + + if (v.type == jbvBool) + PG_RETURN_BOOL(v.val.boolean); + } + + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("key value must be json boolean"))); +} diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h index b447818..ea765df 100644 ---
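A quick way to sanity-check the extraction logic in these cast functions is to model it outside the server. The sketch below is hypothetical Python (JsonbScalar and jsonb_cast are invented names, and the one-element pseudo-array is a simplification of the real jsonb layout, not the PostgreSQL API): a jsonb root scalar is physically a one-element array, so the cast fetches the wrapper, then its only member, and raises the "key value must be json ..." error for any other shape or type.

```python
class JsonbScalar(list):
    """A jsonb root scalar: physically a one-element array carrying a
    raw-scalar flag (simplified stand-in for the real jsonb layout)."""
    raw_scalar = True

def jsonb_cast(jb, want):
    """Model of the patch's jsonb_numeric/int4/float8/bool functions."""
    # Objects and genuine (non-scalar) arrays cannot be cast to a scalar.
    if isinstance(jb, dict) or (isinstance(jb, list)
                                and not getattr(jb, "raw_scalar", False)):
        raise TypeError("cannot cast non-scalar jsonb to %s" % want.__name__)
    member = jb[0]  # second iterator step: the wrapper array's only member
    if want is bool:
        if isinstance(member, bool):
            return member
        raise TypeError("key value must be json boolean")
    # numeric targets (numeric, int4, float8) all go through the numeric value
    if isinstance(member, (int, float)) and not isinstance(member, bool):
        return want(member)
    raise TypeError("key value must be json numeric")
```

As in the patch, a boolean target only accepts a jbvBool value and the numeric targets only accept jbvNumeric; everything else errors.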
Re: Server won't start with fallback setting by initdb.
On Fri, Feb 9, 2018 at 3:08 AM, Kyotaro HORIGUCHI wrote: > I'm not sure such a case happens in the real world nowadays, but > initdb uses the fallback value of max_connections=10, which the > server then rejects since it is not larger than > max_wal_senders (10). > >> postgres: max_wal_senders must be less than max_connections > > I think that we can safely increase the fallback value to 20, with > which the regression tests are known not to fail. I believe that is > preferable to explicitly reducing max_wal_senders in the > generated config file. I confirmed that the regression tests won't fail with > that value. (Except with permanent failure of dynamic shared > memory) I propose an alternative fix: let's instead change the code like this: if (max_wal_senders > MaxConnections) { write_stderr("%s: max_wal_senders may not be more than max_connections\n", progname); ExitPostmaster(1); } That way, the behavior would match the documentation, which says: "WAL sender processes count towards the total number of connections, so the parameter cannot be set higher than max_connections." Alternatively, we could change the documentation to match the code and then do this. But it's not good that the documentation and the code don't agree on whether equal values are acceptable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[PATCH] Verify Checksums during Basebackups
Hi, some installations have data which is only rarely read, and if they are so large that dumps are not routinely taken, data corruption would only be detected with some large delay even with checksums enabled. The attached small patch verifies checksums (in case they are enabled) during a basebackup. The rationale is that we are reading every block in this case anyway, so this is a good opportunity to check them as well. Other and complementary ways of checking the checksums are possible of course, like the offline checking tool that Magnus just submitted. It probably makes sense to use the same approach for determining the segment numbers as Magnus did in his patch, or to refactor that out into a utility function, but I'm sick right now so wanted to submit this for v11 first. I did some light benchmarking and it seems that the performance degradation is minimal, but this could well be platform or architecture-dependent. Right now, the checksums are always checked, but maybe this could be made optional, probably by extending the replication protocol. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr.
Michael Meskes, Jörg Folz, Sascha Heuer diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 185f32a5f9..841471cfce 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -16,6 +16,7 @@ #include #include +#include "access/xlog.h" #include "access/xlog_internal.h" /* for pg_start/stop_backup */ #include "catalog/catalog.h" #include "catalog/pg_type.h" @@ -30,6 +31,8 @@ #include "replication/basebackup.h" #include "replication/walsender.h" #include "replication/walsender_private.h" +#include "storage/bufpage.h" +#include "storage/checksum.h" #include "storage/dsm_impl.h" #include "storage/fd.h" #include "storage/ipc.h" @@ -1199,10 +1202,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf bool missing_ok) { FILE *fp; + BlockNumber blkno = 0; char buf[TAR_SEND_SIZE]; + uint16 checksum; size_t cnt; + char *filename; + int i; pgoff_t len = 0; + char page[BLCKSZ]; size_t pad; + PageHeader phdr; + BlockNumber segno = 0; + bool verify_checksum; fp = AllocateFile(readfilename, "rb"); if (fp == NULL) @@ -1214,10 +1225,54 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf errmsg("could not open file \"%s\": %m", readfilename))); } + /* + * Verify checksums only when checksums are enabled, and the file to + * send is either in one of the default tablespaces (`./global' or + * `./base'), or in an external tablespace with an absolute pathname + * (starting with `/'). + */ + if (DataChecksumsEnabled() && + (strncmp(readfilename, "./global/", 9) == 0 || + strncmp(readfilename, "./base/", 7) == 0 || + strncmp(readfilename, "/", 1) == 0)) + verify_checksum = 1; + else + verify_checksum = 0; + _tarWriteHeader(tarfilename, NULL, statbuf, false); + /* + * Determine segment number for possible checksum verification, as block + * numbers are ongoing. 
The initial block number of a multi-segment file is + * the segment number times RELSEG_SIZE. + */ + filename = basename(readfilename); + if (strstr(filename, ".")) + segno = atoi(strstr(filename, ".")+1); + blkno = segno * RELSEG_SIZE; + while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0) { + if (verify_checksum) + { + /* + * The checksums are verified at block level, so we iterate over + * the buffer in chunks of BLCKSZ. + */ + for (i = 0; i < cnt / BLCKSZ; i++) + { +memcpy(page, (buf + BLCKSZ * i), BLCKSZ); +checksum = pg_checksum_page((char *) page, blkno); +phdr = (PageHeader) page; +if (phdr->pd_checksum != checksum) + ereport(WARNING, + (errmsg("checksum mismatch in file \"%s\", block %d: " + "expected %x, found %x", readfilename, blkno, + checksum, phdr->pd_checksum))); +blkno++; + } + } + /* Send the chunk as a CopyData message */ if (pq_putmessage('d', buf, cnt)) ereport(ERROR, diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index cdf4f5be37..b410311791 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -4,7 +4,7 @@ use Cwd; use Config; use PostgresNode; use TestLib; -use Test::More tests => 79; +use Test::More tests => 83; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -15,7 +15,7 @@ my $tempdir = TestLib::tempdir; my $node = get_new_node('main'); # Initialize node without replication settings -$node->init; +$node->init(extra => [ '--data-checksums' ]); $node->start; my
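The block-numbering convention the comment describes (checksums are computed over a block's global number, so segment N of a relation starts at block N * RELSEG_SIZE) and the per-block verification loop can be sketched as follows. This is hypothetical Python with stand-in checksum functions rather than the real pg_checksum_page, and the BLCKSZ/RELSEG_SIZE values assume a default 8kB-block, 1GB-segment build:

```python
import os

BLCKSZ = 8192          # assumes the default 8kB block size
RELSEG_SIZE = 131072   # blocks per 1GB segment in a default build

def initial_block_number(path):
    """First global block number of a relation segment file:
    '12345' starts at block 0, '12345.2' at 2 * RELSEG_SIZE,
    mirroring the basename/atoi logic in the patch."""
    name = os.path.basename(path)
    segno = int(name.split(".", 1)[1]) if "." in name else 0
    return segno * RELSEG_SIZE

def verify_stream(chunks, blkno, checksum_fn, stored_fn):
    """Walk a stream of read buffers in BLCKSZ steps, comparing a freshly
    computed checksum with the one stored in each page header.
    checksum_fn/stored_fn are stand-ins for pg_checksum_page and the
    pd_checksum header field; returns the block numbers that mismatch."""
    mismatches = []
    for buf in chunks:
        for i in range(len(buf) // BLCKSZ):
            page = buf[i * BLCKSZ:(i + 1) * BLCKSZ]
            if checksum_fn(page, blkno) != stored_fn(page):
                mismatches.append(blkno)
            blkno += 1
    return mismatches
```

Because the checksum mixes in the global block number, starting the counter at segno * RELSEG_SIZE is what makes verification of the second and later segments work at all.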
Re: Parallel index creation & pg_stat_activity
Hi Peter, On 2018-02-28 16:50:44 +, Phil Florent wrote: > With an index creation (create index t1_i1 on t1(c1, c2);) I have this kind > of output : > > ./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event" > busy_pc | distinct_exe | pid | backend_type | query > | wait_event_type | wait_event > -+--+--++---+-+-- > 68 | 1 / 136 | 8262 | client backend | create index t1_i1 on > t1(c1, c2); | IO | DataFileRead > 26 | 1 / 53 | 8262 | client backend | create index t1_i1 on > t1(c1, c2); | | >6 | 1 / 11 | 8262 | client backend | create index t1_i1 on > t1(c1, c2); | IO | BufFileWrite > (3 rows) > No parallel worker. At least one parallel worker was active though, I could > see its work with a direct query on pg_stat_activity or a ps -ef : > > ... > postgres 8262 8230 7 08:54 ?00:22:46 postgres: 11/main: postgres > postgres [local] CREATE INDEX > ... > postgres 9833 8230 23 14:17 ?00:00:33 postgres: 11/main: parallel > worker for PID 8262 > ... Looks like we're not doing a pgstat_report_activity() in the workers? Any argument for not doing so? Greetings, Andres Freund
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
On 02/28/2018 12:04 PM, Masahiko Sawada wrote: Hi, I've created a new thread for changing the AV launcher scheduling. The problem with AV launcher scheduling is described in [1], but I summarize it here. If there is even one database that is at risk of wraparound, currently the AV launcher selects the database having the oldest datfrozenxid until all tables in that database have been vacuumed. As a result, tables in other databases can bloat and go unfrozen, because those databases are not selected by the AV launcher. It makes things worse if the database has a large table that takes a long time to be vacuumed. The attached patch changes the AV launcher scheduling algorithm, based on the idea proposed by Robert, so that it selects the database that has the oldest table when a database is at risk of wraparound. For details of the algorithm please refer to [2]. In this algorithm, the running AV workers advertise the next datfrozenxid in new shared memory. That way, the AV launcher can select the database that has the oldest table in the database cluster. However, this algorithm doesn't support the case where the age of the next datfrozenxid gets greater than autovacuum_vacuum_max_age. This case can happen if users set autovacuum_vacuum_max_age to a small value and vacuuming the whole database takes a long time. But since it's not a common case and it doesn't degrade the current behavior even if it happens, I think it's not a big problem. Feedback is very welcome. [1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 [2] https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center Hello!
I`ve tried to compile your patch on Fedora24 with gcc 6.3.1 20161221: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g3 -O0 -I../../../src/include -D_GNU_SOURCE -c -o autovacuum.o autovacuum.c In file included from ../../../src/include/postgres.h:46:0, from autovacuum.c:62: autovacuum.c: In function ‘get_next_xid_bounds’: autovacuum.c:1979:9: warning: implicit declaration of function ‘TransactionIdIsVaild’ [-Wimplicit-function-declaration] Assert(TransactionIdIsVaild(frozenxid)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^ autovacuum.c:1979:2: note: in expansion of macro ‘Assert’ Assert(TransactionIdIsVaild(frozenxid)); ^~ autovacuum.c:1980:28: error: ‘minmutli’ undeclared (first use in this function) Assert(MultiXactIdIsValid(minmutli)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^ autovacuum.c:1980:2: note: in expansion of macro ‘Assert’ Assert(MultiXactIdIsValid(minmutli)); ^~ autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’ Assert(MultiXactIdIsValid(minmutli)); ^~ autovacuum.c:1980:28: note: each undeclared identifier is reported only once for each function it appears in Assert(MultiXactIdIsValid(minmutli)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^ autovacuum.c:1980:2: note: in expansion of macro ‘Assert’ Assert(MultiXactIdIsValid(minmutli)); ^~ autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’ Assert(MultiXactIdIsValid(minmutli)); ^~ : recipe for target 'autovacuum.o' failed make[3]: *** [autovacuum.o] Error 1 -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On Tue, Feb 27, 2018 at 8:12 PM, Amit Langote wrote: > Ah, OK. I was missing that there is no need to have both parttypcoll and > partcollation in PartitionSchemeData, as the Vars in rel->partexprs are > built from a bare PartitionKey (not PartitionSchemeData), and after that > point, parttypcoll no longer needs to be kept around. > > I noticed that there is a typo in the patch. > > + memcpy(part_scheme->partcollation, partkey->parttypcoll, > > s/parttypcoll/partcollation/g Committed your version. > BTW, should there be a relevant test in partition_join.sql? If yes, > attached a patch (partitionwise-join-collation-test-1.patch) to add one. I don't feel strongly about it, but I'm not going to try to prevent you from adding one, either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
On 28 February 2018 at 05:51, Thomas Munro wrote: > On Wed, Jan 3, 2018 at 4:31 AM, Shubham Barai > wrote: > > I have created new isolation tests. Please have a look at > > the updated patch. > > Hi Shubham, > > Could we please have a rebased version of the gin one? > Sure. I have attached a rebased version. Regards, Shubham Predicate-Locking-in-gin-index_v5.patch Description: Binary data
Re: Online enabling of checksums
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander wrote: > Also if that wasn't clear -- we only do the full page write if there isn't > already a checksum on the page and that checksum is correct. Hmm. Suppose that on the master there is a checksum on the page and that checksum is correct, but on the standby the page contents differ in some way that we don't always WAL-log, such as hint bits, and there the checksum is incorrect. Then you'll enable checksums when the standby still has some pages without valid checksums, and disaster will ensue. I think this could be hard to prevent if checksums are turned on and off multiple times. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] MERGE SQL Statement for PG11
On Tue, Feb 27, 2018 at 5:07 PM, Peter Geoghegan wrote: > I now feel like Simon's suggestion of throwing an error in corner > cases isn't so bad. It still seems like we could do better, but the > more I think about it, the less that seems like a cop-out. My reasons > are: I still think we really ought to try not to add a new class of error. > * We can all agree that *not* raising an error in the specific way > Simon proposes is possible, somehow or other. We also all agree that > avoiding the broader category of RC errors can only be taken so far > (e.g. in any event duplicate violation errors are entirely possible, > in RC mode, when a MERGE inserts a row). So this is a question of what > exact middle ground to take. Neither of the two extremes (throwing an > error on the first sign of a RC conflict, and magically preventing > concurrency anomalies) is actually on the table. Just because there's no certainty about which behavior is best doesn't mean that none of them are better than throwing an error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
OK, time to revive this old thread ... On 09/23/2017 05:27 PM, Tom Lane wrote: > Tomas Vondra writes: [ scalarineqsel may fall over when used by extension operators ] > >> What about using a two-pronged approach: > >> 1) fall back to mid bucket in back branches (9.3 - 10) >> 2) do something smarter (along the lines you outlined) in PG11 > > Sure. We need to test the fallback case anyway. > Attached is a minimal fix adding a flag to convert_numeric_to_scalar, tracking when it fails because of an unsupported data type. If any of the 3 calls (value + lo/hi boundaries) sets it to 'true' we simply fall back to the default estimate (0.5) within the bucket. >>> [ sketch of a more extensible design ] > >> Sounds reasonable to me, I guess - I can't really think of anything >> simpler giving us the same flexibility. > > Actually, on further thought, that's still too simple. If you look > at convert_string_to_scalar() you'll see it's examining all three > values concurrently (the probe value, of one datatype, and two bin > boundary values of possibly a different type). The reason is that > it's stripping off whatever common prefix those strings have before > trying to form a numeric equivalent. While certainly > convert_string_to_scalar is pretty stupid in the face of non-ASCII > sort orders, the prefix-stripping is something I really don't want > to give up. So the design I sketched of considering each value > totally independently isn't good enough. > > We could, perhaps, provide an API that lets an operator estimation > function replace convert_to_scalar in toto, but that seems like > you'd still end up duplicating code in many cases. Not sure about > how to find a happy medium. > I plan to work on this improvement next, once I polish a couple of other patches for the upcoming commit fest.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index fcc8323..5f6019a 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -176,7 +176,7 @@ static bool estimate_multivariate_ndistinct(PlannerInfo *root, static bool convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue, Datum lobound, Datum hibound, Oid boundstypid, double *scaledlobound, double *scaledhibound); -static double convert_numeric_to_scalar(Datum value, Oid typid); +static double convert_numeric_to_scalar(Datum value, Oid typid, bool *unknown_type); static void convert_string_to_scalar(char *value, double *scaledvalue, char *lobound, @@ -4071,10 +4071,15 @@ convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue, case REGDICTIONARYOID: case REGROLEOID: case REGNAMESPACEOID: - *scaledvalue = convert_numeric_to_scalar(value, valuetypid); - *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid); - *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid); - return true; + { + bool unknown_type = false; + + *scaledvalue = convert_numeric_to_scalar(value, valuetypid, &unknown_type); + *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid, &unknown_type); + *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid, &unknown_type); + + return (!unknown_type); + } /* * Built-in string types @@ -4147,7 +4152,7 @@ convert_to_scalar(Datum value, Oid valuetypid, double *scaledvalue, * Do convert_to_scalar()'s work for any numeric data type. */ static double -convert_numeric_to_scalar(Datum value, Oid typid) +convert_numeric_to_scalar(Datum value, Oid typid, bool *unknown_type) { switch (typid) { @@ -4185,9 +4190,11 @@ convert_numeric_to_scalar(Datum value, Oid typid) /* * Can't get here unless someone tries to use scalarineqsel() on an - * operator with one numeric and one non-numeric operand.
+ * operator with one numeric and one non-numeric operand. Or more + * precisely, operand that we don't know how to transform to scalar. */ - elog(ERROR, "unsupported type: %u", typid); + *unknown_type = true; + return 0; }
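The fallback behavior of this minimal fix, which treats a value of an unconvertible type as sitting in the middle of its histogram bucket, can be sketched as follows (hypothetical Python; bucket_fraction and to_scalar are invented names standing in for scalarineqsel's per-bucket computation, not PostgreSQL code):

```python
def bucket_fraction(value, lo, hi, to_scalar):
    """Fraction of the histogram bucket [lo, hi) assumed to lie below
    `value`. `to_scalar` maps a datum to a float and returns None for a
    type it cannot convert -- the analogue of the patch's unknown_type
    flag set by convert_numeric_to_scalar."""
    v, l, h = to_scalar(value), to_scalar(lo), to_scalar(hi)
    if v is None or l is None or h is None:
        return 0.5                 # unsupported type: assume mid-bucket
    if h <= l:
        return 0.5                 # degenerate bucket: same fallback
    return min(max((v - l) / (h - l), 0.0), 1.0)
```

The point of the 0.5 fallback is that it is never catastrophically wrong: the true position is somewhere in the bucket, so mid-bucket bounds the error at half a bucket width.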
Re: Registering LWTRANCHE_PARALLEL_HASH_JOIN
On Tue, Feb 27, 2018 at 3:58 PM, Thomas Munro wrote: > On Wed, Feb 28, 2018 at 8:39 AM, Robert Haas wrote: >> On Sat, Feb 10, 2018 at 6:07 PM, Thomas Munro >> wrote: >>> I forgot to register a display name for LWTRANCHE_PARALLEL_HASH_JOIN, >>> the tranche ID used by the LWLock that Parallel Hash uses when handing >>> out chunks of memory. Please see attached. >> >> I think that you need to insert some weasel words into the >> documentation for this, because I don't think it's really accurate to >> say that it's only used when trying to acquire a new chunk of memory. >> >> Or maybe I'm wrong and it's altogether accurate ... but >> ExecParallelHashMergeCounters doesn't look like an allocation to me, >> and ExecParallelHashTuplePrealloc doesn't really look like an >> allocation either. > > Ok. How about this? > > I noticed that some of the descriptions don't attempt to explain what > activity the lock protects at all, they just say "Waiting for $BLAH > lock". I went the other way and covered the various different uses. > There are 4 uses for the lock but only three things in my list, > because I think "allocate" covers both ExecParallelHashTupleAlloc() > and ExecParallelHashTuplePrealloc(). Well, the trouble with that of course is that if something changes later then we have to update the docs, whereas if we keep it vague then we avoid that. But I've committed that version as you have it and maybe by the time it needs to be updated they'll have made you a committer and you can be the poor shmuck who has to spend time committing fixes like this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ON CONFLICT DO UPDATE for partitioned tables
On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera wrote: > I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1]. > Following the lead of edd44738bc88 ("Be lazier about partition tuple > routing.") this incarnation only does the necessary push-ups for the > specific partition that needs it, at execution time. As far as I can > tell, it works as intended. > > I chose to refuse the case where the DO UPDATE clause causes the tuple > to move to another partition (i.e. you're updating the partition key of > the tuple). While it's probably possible to implement that, it doesn't > seem a very productive use of time. I would have thought that to be the only case we could support with the current infrastructure. Doesn't a correct implementation for any other case require a global index? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Hi, I want to propose a bunch of patches that reduce the WAL traffic generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree and RUM, we can now log index pages of other access methods only once, at the end of the index build process. The implementation is based on generic_xlog. Not only does it decrease the amount of WAL generated, it also completely eliminates WAL overhead in case of an error during index build. I also attached the sql scripts which I used to measure xlog size. They show that pg_wal_lsn_diff for the patched version is 3 to 5 times smaller. Not sure if regression tests are needed, since it is just an optimization, but I don't mind adding them if someone feels that it is necessary. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company commit 179285fb5175d715c20fc95eca3087b6a1899ed9 Author: Anastasia Date: Wed Feb 28 17:45:54 2018 +0300 add function generate_xlog_for_rel() diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c index ce02354..dd2c041 100644 --- a/src/backend/access/transam/generic_xlog.c +++ b/src/backend/access/transam/generic_xlog.c @@ -545,3 +545,34 @@ generic_mask(char *page, BlockNumber blkno) mask_unused_space(page); } + +/* + * Function to write generic xlog for every existing block of a relation. + * Caller is responsible for locking the relation exclusively.
+ */ +void +generate_xlog_for_rel(Relation rel) +{ + BlockNumber blkno; + BlockNumber nblocks; + + nblocks = RelationGetNumberOfBlocks(rel); + + elog(DEBUG2, "generate_xlog_for_rel '%s', nblocks %u BEGIN.", + RelationGetRelationName(rel), nblocks); + + for (blkno = 0; blkno < nblocks; blkno++) + { + Buffer buffer; + GenericXLogState *state; + + CHECK_FOR_INTERRUPTS(); + + buffer = ReadBuffer(rel, blkno); + LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE); + + state = GenericXLogStart(rel); + GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE); + GenericXLogFinish(state); + + UnlockReleaseBuffer(buffer); + } + elog(DEBUG2, "generate_xlog_for_rel '%s' END.", RelationGetRelationName(rel)); +} diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h index b23e1f6..33be157 100644 --- a/src/include/access/generic_xlog.h +++ b/src/include/access/generic_xlog.h @@ -42,4 +42,7 @@ extern const char *generic_identify(uint8 info); extern void generic_desc(StringInfo buf, XLogReaderState *record); extern void generic_mask(char *pagedata, BlockNumber blkno); +/* other utils */ +void generate_xlog_for_rel(Relation rel); + #endif /* GENERIC_XLOG_H */ commit e176bd8f650a4b112fa2e61960a27cb57329138c Author: Anastasia Date: Wed Feb 28 17:53:15 2018 +0300 optimal WAL for gin diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c index 37070b3..c615d3c 100644 --- a/src/backend/access/gin/ginbtree.c +++ b/src/backend/access/gin/ginbtree.c @@ -388,7 +388,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, /* It will fit, perform the insertion */ START_CRIT_SECTION(); - if (RelationNeedsWAL(btree->index)) + if (RelationNeedsWAL(btree->index) && !btree->isBuild) { XLogBeginInsert(); XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD); @@ -409,7 +409,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, MarkBufferDirty(childbuf); } - if (RelationNeedsWAL(btree->index)) + if (RelationNeedsWAL(btree->index) && 
!btree->isBuild) { XLogRecPtr recptr; ginxlogInsert xlrec; @@ -566,7 +566,7 @@ ginPlaceToPage(GinBtree btree, GinBtreeStack *stack, } /* write WAL record */ - if (RelationNeedsWAL(btree->index)) + if (RelationNeedsWAL(btree->index) && !btree->isBuild) { XLogRecPtr recptr; diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c index f9daaba..349baa7 100644 --- a/src/backend/access/gin/gindatapage.c +++ b/src/backend/access/gin/gindatapage.c @@ -592,7 +592,7 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, * Great, all the items fit on a single page. If needed, prepare data * for a WAL record describing the changes we'll make. */ - if (RelationNeedsWAL(btree->index)) + if (RelationNeedsWAL(btree->index) && !btree->isBuild) computeLeafRecompressWALData(leaf); /* @@ -629,6 +629,7 @@ dataBeginPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, * subsequent insertions will probably also go to the end. This packs * the index somewhat tighter when appending to a table, which is very * common. + * */ if (!btree->isBuild) { @@ -718,7 +719,7 @@ dataExecPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack, dataPlaceToPageLeafRecompress(buf, leaf); /* If needed, register WAL data built by computeLeafRecompressWALData */ - if (RelationNeedsWAL(btree->index)) + if
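The central trick in this patch series, suppressing per-insert WAL during the build (the added `!btree->isBuild` tests) and then emitting one full-page image per page at the end, reduces the WAL stream to exactly one record per final page. A toy model of what generate_xlog_for_rel() boils down to (rel and wal are plain Python stand-ins, not PostgreSQL APIs):

```python
def generate_xlog_for_rel(rel, wal):
    """Toy model of the patch's helper: after the index build, walk every
    block of the relation and emit one generic full-page-image record for
    it. `rel` is a list of page buffers and `wal` a list standing in for
    the WAL stream."""
    for blkno, page in enumerate(rel):
        # GenericXLogStart / GenericXLogRegisterBuffer(FULL_IMAGE) /
        # GenericXLogFinish collapse here to appending one record per block.
        wal.append(("FPI", blkno, bytes(page)))
```

However many times a page was modified during the build, it contributes exactly one record here, which is where the reported 3-5x reduction in pg_wal_lsn_diff comes from.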
Re: server crash in nodeAppend.c
On Tue, Feb 27, 2018 at 5:24 AM, Rajkumar Raghuwanshi wrote: > SET parallel_setup_cost=0; > SET parallel_tuple_cost=0; > create or replace function foobar() returns setof text as > $$ select 'foo'::varchar union all select 'bar'::varchar ; $$ > language sql stable; > > postgres=# select foobar(); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > --logfile > 2018-02-26 22:15:52.908 IST [61738] LOG: database system is ready to accept > connections > TRAP: FailedAssertion("!(whichplan >= 0 && whichplan <= node->as_nplans)", > File: "nodeAppend.c", Line: 365) > 2018-02-26 22:17:50.441 IST [61738] LOG: server process (PID 61748) was > terminated by signal 6: Aborted > 2018-02-26 22:17:50.441 IST [61738] DETAIL: Failed process was running: > select foobar(); Nice test case. I pushed commit ce1663cdcdbd9bf15c81570277f70571b3727dd3, including your test case, to fix this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
PATCH: Exclude temp relations from base backup
This is a follow-up patch from the exclude unlogged relations discussion [1]. The patch excludes temporary relations during a base backup, using the existing looks_like_temp_rel_name() function for identification. It shares code to identify database directories with [1], so for now that code has been duplicated in this patch to make it independent. I'll rebase depending on what gets committed first.

Thanks,
--
-David
da...@pgmasters.net

[1] https://www.postgresql.org/message-id/04791bab-cb04-ba43-e9c0-664a4c1ffb2c%40pgmasters.net

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 3cec9e0b0c..3880a9134b 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2549,7 +2549,7 @@ The commands accepted in walsender mode are:
         Various temporary files and directories created during the operation
         of the PostgreSQL server, such as any file or directory beginning
-        with pgsql_tmp.
+        with pgsql_tmp and temporary relations.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 185f32a5f9..f0c3d13b2b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -26,6 +26,7 @@
 #include "nodes/pg_list.h"
 #include "pgtar.h"
 #include "pgstat.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "replication/basebackup.h"
 #include "replication/walsender.h"
@@ -958,6 +959,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 	char		pathbuf[MAXPGPATH * 2];
 	struct stat statbuf;
 	int64		size = 0;
+	const char *lastDir;		/* Split last dir from parent path. */
+	bool		isDbDir = false;	/* Does this directory contain relations? */
+
+	/*
+	 * Determine if the current path is a database directory that can
+	 * contain relations.
+	 *
+	 * Start by finding the location of the delimiter between the parent
+	 * path and the current path.
+	 */
+	lastDir = last_dir_separator(path);
+
+	/* Does this path look like a database path (i.e. all digits)? */
+	if (lastDir != NULL &&
+		strspn(lastDir + 1, "0123456789") == strlen(lastDir + 1))
+	{
+		/* Part of path that contains the parent directory. */
+		int			parentPathLen = lastDir - path;
+
+		/*
+		 * Mark path as a database directory if the parent path is either
+		 * $PGDATA/base or a tablespace version path.
+		 */
+		if (strncmp(path, "./base", parentPathLen) == 0 ||
+			(parentPathLen >= (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+			 strncmp(lastDir - (sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+					 TABLESPACE_VERSION_DIRECTORY,
+					 sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+			isDbDir = true;
+	}
 
 	dir = AllocateDir(path);
 	while ((de = ReadDir(dir, path)) != NULL)
@@ -1007,6 +1038,16 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 		if (excludeFound)
 			continue;
 
+		/* Exclude temporary relations */
+		if (isDbDir && looks_like_temp_rel_name(de->d_name))
+		{
+			elog(DEBUG2,
+				 "temporary relation file \"%s\" excluded from backup",
+				 de->d_name);
+
+			continue;
+		}
+
 		snprintf(pathbuf, sizeof(pathbuf), "%s/%s", path, de->d_name);
 
 		/* Skip pg_control here to back up it last */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 2a18e94ff4..d30a725f90 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -325,7 +325,6 @@
 static void RemovePgTempFilesInDir(const char *tmpdirname, bool missing_ok,
 				   bool unlink_all);
 static void RemovePgTempRelationFiles(const char *tsdirname);
 static void RemovePgTempRelationFilesInDbspace(const char *dbspacedirname);
-static bool looks_like_temp_rel_name(const char *name);
 static void walkdir(const char *path,
 		void (*action) (const char *fname, bool isdir, int elevel),
@@ -3192,7 +3191,7 @@ RemovePgTempRelationFilesInDbspace(const char *dbspacedirname)
 }
 
 /* t_, or t__ */
-static bool
+bool
 looks_like_temp_rel_name(const char *name)
 {
 	int			pos;
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index cdf4f5be37..16bd91f63e 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
Re: Let's remove DSM_INPL_NONE.
On Tue, Feb 27, 2018 at 11:30 PM, Tom Lane wrote:
> I'd be in favor of having some normally-ifdef'd-out facility for causing
> failures of this kind. (As I mentioned upthread, I do not think "always
> fail" is sufficient.) That's very different from having a user-visible
> option, though. We don't expose a GUC that enables CLOBBER_FREED_MEMORY,
> to take a relevant example.

Sure. Feel free to propose something you like.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Kerberos test suite
On 2/27/18 00:56, Thomas Munro wrote: > FWIW it passes for me if I add this: > > +elsif ($^O eq 'freebsd') > +{ > + $krb5_bin_dir = '/usr/local/bin'; > + $krb5_sbin_dir = '/usr/local/sbin'; I suppose you only need the second one, right? The first one should be in the path. > +} > > One thing that surprised me is that your test runs my system's > installed psql command out of my $PATH, not the one under test. Is > that expected? Oh, you probably have a psql in /usr/local/bin, which we prepend to the path, per the above. We should probably append instead. The ldap test suite has the same issue. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Direct converting numeric types to bool
-----Original Message-----
From: n.zhuch...@postgrespro.ru [mailto:n.zhuch...@postgrespro.ru]
Sent: Wednesday, February 28, 2018 6:04 PM
To: pgsql-hackers
Subject: Direct converting numeric types to bool

Attached patch allows direct conversion of numeric types to bool, like integer::bool.
Supported types:
- smallint;
- bigint;
- real;
- double precision;
- decimal(numeric).

This functionality helps with migration from Oracle.

--
Nikita Zhuchkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Hello!
What prevents us from:

postgres=# select 1::bigint::int::boolean;
 bool
------
 t
(1 row)

It is just one additional cast and requires no additional patching.

--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly
On Tue, Feb 27, 2018 at 5:14 PM, Tom Lane wrote:
>> I ran the postgres_fdw regression test with no sleep two times in a
>> CLOBBER_CACHE_ALWAYS-enabled build, and then the regression test with
>> the sleep (60 seconds) two times, but I couldn't reproduce that in both
>> cases. I suspect the changes in the order of the RETURNING output there
>> were still caused by autovacuum kicking in partway through the run. So
>> to make the regression test more stable against autovacuum, I'd propose
>> to modify the regression test to disable autovacuum for the remote table
>> (ie "S 1"."T 1") (and perform VACUUM ANALYZE on that table manually
>> instead) in hopes of getting that fixed. Attached is a patch for that.
>> I think the changes added by the previous patch wouldn't make sense
>> anymore, so I removed those changes.
>
> Ping? We're still seeing those failures on jaguarundi.

This is another patch that got past me. Committed now. I'd be lying if I said I was filled with confidence that this was going to fix it, but I don't know what to do other than keep tinkering with it until we find something that works.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Direct converting numeric types to bool
2018-02-28 16:13 GMT+01:00 Pavel Stehule:
> Hi
>
> 2018-02-28 16:06 GMT+01:00:
>
>> n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:
>>
>>> Attached patch allows direct conversion of numeric types to bool, like
>>> integer::bool.
>>> Supported types:
>>> - smallint;
>>> - bigint;
>>> - real;
>>> - double precision;
>>> - decimal(numeric).
>>>
>>> This functionality helps with migration from Oracle.
>>
> Looks a little obscure for upstream code (it could live outside as an extension).
>
> All the work can be done by a function:
>
> CREATE OR REPLACE FUNCTION public.to_bool(anyelement)
>  RETURNS boolean
>  LANGUAGE sql
>  IMMUTABLE STRICT
> AS $function$
> select $1::int::boolean
> $function$
>
> I really don't see any sense in allowing a cast from double to boolean.

For a long time Oracle did not have a boolean type, so some ugly tricks were necessary there. There is no reason to do the same in Postgres.

> -1 from me
>
> Regards
>
> Pavel
Re: Direct converting numeric types to bool
Hi

2018-02-28 16:06 GMT+01:00:
> n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:
>
>> Attached patch allows direct conversion of numeric types to bool, like
>> integer::bool.
>> Supported types:
>> - smallint;
>> - bigint;
>> - real;
>> - double precision;
>> - decimal(numeric).
>>
>> This functionality helps with migration from Oracle.

Looks a little obscure for upstream code (it could live outside as an extension).

All the work can be done by a function:

CREATE OR REPLACE FUNCTION public.to_bool(anyelement)
 RETURNS boolean
 LANGUAGE sql
 IMMUTABLE STRICT
AS $function$
select $1::int::boolean
$function$

I really don't see any sense in allowing a cast from double to boolean.

-1 from me

Regards

Pavel
Re: Incorrect comments in partition.c
On Wed, Feb 28, 2018 at 3:24 AM, Etsuro Fujita wrote:
> I'll add this to the upcoming commitfest.

Committed. Sorry that I didn't notice this thread sooner (and that the original commits didn't take care of it).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Direct converting numeric types to bool
n.zhuch...@postgrespro.ru writes: > Attached patch allow direct convertion of numeric types to bool like > integer::bool. > Supported types: > - smallint; > - bigint; > - real; > - double precision; > - decimal(numeric). > This functionality is helped with migration from Oracle. I think you forgot to attach the patch, but in any case: is this really a behavior we want? "Oracle has it" is not a good argument in my view, nor do I recall people complaining that they need such a behavior to migrate. regards, tom lane
Re: [HACKERS] [POC] Faster processing at Gather node
On Tue, Feb 27, 2018 at 4:06 PM, Andres Freund wrote:
>> OK, I'll try to check how feasible that would be.
>
> cool.

It's not too hard, but it doesn't really seem to help, so I'm inclined to leave it alone. To make it work, you need to keep two separate counters in the shm_mq_handle, one for the number of bytes since we did an increment and the other for the number of bytes since we sent a signal. I don't really want to introduce that complexity unless there is a benefit.

With just 0001 and 0002: 3968.899 ms, 4043.428 ms, 4042.472 ms, 4142.226 ms
With two-separate-counters.patch added: 4123.841 ms, 4101.917 ms, 4063.368 ms, 3985.148 ms

If you take the total of the 4 times, that's a 0.4% slowdown with the patch applied, but I think that's just noise. It seems possible that it would help with a larger queue and maybe a different query shape, but I really just want to get the optimizations that I've got committed, provided that you find them acceptable, rather than spend a lot of time looking for new optimizations, because:

1. I've got other things to get done.
2. I think that the patches I've got here capture most of the available benefit.
3. This case isn't super-common in the first place -- we generally want to avoid feeding tons of tuples through the Gather.
4. We might abandon the shm_mq approach entirely and switch to something like sticking tuples in DSA using the flexible tuple slot stuff you've proposed elsewhere.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

0002-shm-mq-reduce-receiver-latch-set-v3.patch Description: Binary data
0001-shm-mq-less-spinlocks-v4.patch Description: Binary data
two-separate-counters.patch Description: Binary data
Re: Direct converting numeric types to bool
n.zhuch...@postgrespro.ru wrote on 2018-02-28 18:04:
> Attached patch allows direct conversion of numeric types to bool, like
> integer::bool.
> Supported types:
> - smallint;
> - bigint;
> - real;
> - double precision;
> - decimal(numeric).
>
> This functionality helps with migration from Oracle.

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 686528c..6cbfcf5 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -2970,3 +2970,83 @@ CREATE VIEW user_mappings AS
     FROM _pg_user_mappings;
 
 GRANT SELECT ON user_mappings TO PUBLIC;
+
+-- bool --> smallint(int2)
+CREATE OR REPLACE FUNCTION bool_smallint(boolean)
+  RETURNS smallint AS
+  'bool_int2'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS smallint) WITH FUNCTION bool_smallint(boolean) as implicit;
+
+-- smallint(int2) --> bool
+CREATE OR REPLACE FUNCTION smallint_bool(smallint)
+  RETURNS bool AS
+  'int2_bool'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (smallint AS boolean) WITH FUNCTION smallint_bool(smallint) as implicit;
+
+-- bool --> bigint(int8)
+CREATE OR REPLACE FUNCTION bool_bigint(boolean)
+  RETURNS bigint AS
+  'bool_int8'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS bigint) WITH FUNCTION bool_bigint(boolean) as implicit;
+
+-- bigint(int8) --> bool
+CREATE OR REPLACE FUNCTION bigint_bool(bigint)
+  RETURNS bool AS
+  'int8_bool'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (bigint AS boolean) WITH FUNCTION bigint_bool(bigint) as implicit;
+
+-- double precision(float8) --> bool
+CREATE OR REPLACE FUNCTION double_precision_bool(double precision)
+  RETURNS bool AS
+  'float8_bool'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (double precision AS boolean) WITH FUNCTION double_precision_bool(double precision) as implicit;
+
+-- bool --> double precision(float8)
+CREATE OR REPLACE FUNCTION bool_double_precision(boolean)
+  RETURNS double precision AS
+  'bool_float8'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS double precision) WITH FUNCTION bool_double_precision(boolean) as implicit;
+
+-- real(float4) --> bool
+CREATE OR REPLACE FUNCTION real_bool(real)
+  RETURNS bool AS
+  'float4_bool'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (real AS boolean) WITH FUNCTION real_bool(real) as implicit;
+
+-- bool --> real(float4)
+CREATE OR REPLACE FUNCTION bool_real(boolean)
+  RETURNS real AS
+  'bool_float4'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS real) WITH FUNCTION bool_real(boolean) as implicit;
+
+-- numeric(decimal) --> bool
+CREATE OR REPLACE FUNCTION numeric_bool(decimal)
+  RETURNS bool AS
+  'numeric_bool'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (decimal AS boolean) WITH FUNCTION numeric_bool(decimal) as implicit;
+
+-- bool --> numeric(decimal)
+CREATE OR REPLACE FUNCTION bool_numeric(boolean)
+  RETURNS decimal AS
+  'bool_numeric'
+  LANGUAGE internal IMMUTABLE STRICT PARALLEL SAFE
+  COST 1;
+CREATE CAST (boolean AS decimal) WITH FUNCTION bool_numeric(boolean) as implicit;
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index bc6a3e0..a687104 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -1331,6 +1331,53 @@ i2tof(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT4((float4) num);
 }
 
+/*
+ * float4_bool - converts a float4 number to a bool
+ * epsilon - machine epsilon gives an upper bound on the relative error
+ * due to rounding in floating point arithmetic.
+ */
+Datum
+float4_bool(PG_FUNCTION_ARGS)
+{
+	float		epsilon = 5.96e-08;
+
+	if (fabs(PG_GETARG_FLOAT4(0)) <= epsilon)
+		PG_RETURN_BOOL(false);
+	else
+		PG_RETURN_BOOL(true);
+}
+
+/*
+ * bool_float4 - converts a bool to a float4 number
+ */
+Datum
+bool_float4(PG_FUNCTION_ARGS)
+{
+	if (PG_GETARG_BOOL(0) == false)
+		PG_RETURN_FLOAT4(0);
+	else
+		PG_RETURN_FLOAT4(1);
+}
+
+/*
+ * float8_bool - converts a float8 number to a bool
+ * epsilon - machine epsilon gives an upper bound on the relative error
+ * due to rounding in floating point arithmetic.
+ */
+Datum
+float8_bool(PG_FUNCTION_ARGS)
+{
+	float		epsilon = 1.11e-16;
+
+	if (fabs(PG_GETARG_FLOAT8(0)) <= epsilon)
+		PG_RETURN_BOOL(false);
+	else
+		PG_RETURN_BOOL(true);
+}
+
+/*
+ * bool_float8 - converts a bool to a float8 number
+ */
+Datum
+bool_float8(PG_FUNCTION_ARGS)
+{
+	if (PG_GETARG_BOOL(0) == false)
+		PG_RETURN_FLOAT8(0);
+	else
+		PG_RETURN_FLOAT8(1);
+}
 
 /*
  * ===
diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 559c365..a6ea79b 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -354,6 +354,26 @@
Direct converting numeric types to bool
Attached patch allows direct conversion of numeric types to bool, like integer::bool.
Supported types:
- smallint;
- bigint;
- real;
- double precision;
- decimal(numeric).

This functionality helps with migration from Oracle.

--
Nikita Zhuchkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support
On 2/24/18 18:29, Michael Paquier wrote:
> Sure. But then I think it would be nice to show on screen the reason
> why the test failed, if possible. As of now, if SSL is missing, the
> whole run shows in red without providing much useful information.
> Instead of 0001 as shaped previously, why not also use diag to show
> the failure on the screen?
>
> For example the following block at the top of each test:
> if (!check_pg_config("#define USE_OPENSSL 1"))
> {
>     diag "SSL tests not supported without support in build";
>     die;
> }

I think BAIL_OUT() is intended for this.

>> What I had in mind would consist of something like this in
>> src/test/Makefile:
>>
>> ifeq ($(with_ldap),yes)
>> ifneq (,$(filter ldap,$(YOUR_VARIABLE_HERE)))
>> SUBDIRS += ldap
>> endif
>> endif
>
> OK. So let's call it PG_TEST_EXTRA_INSECURE or PG_TEST_EXTRA, which can
> take 'ldap', 'ssl' or 'ldap,ssl' as valid values. Seeing that
> documented is really necessary in my opinion. Any idea for a better
> name?

I don't have a great idea about the name. The value should be space-separated to work better with make functions.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Opclass parameters
In a message from February 28, 2018 00:46:36, Nikita Glukhov wrote:
> I would like to present a patch set implementing opclass parameters.
>
> This feature was recently presented at pgconf.ru:
> http://www.sai.msu.su/~megera/postgres/talks/opclass_pgconf.ru-2018.pdf
>
> Analogous work was already done by Nikolay Shaplov two years ago:
> https://www.postgresql.org/message-id/5213596.TqFRiqmCTe%40nataraj-amd64
> But these patches are not based on it, although they are very similar.

Hi! You know, I am still working on this issue. When I started working on it, I found out that the options code is not flexible at all, and you can't reuse it for options that are not relOptions. I have only given your patch a short glance so far, but as far as I can see, you start dividing options into global and local ones. I am afraid it will create a greater mess than there is now.

What I am doing right now is creating a new reloption internal API that will allow creating any kind of option in any place using the very same code. I think this should be done first, and then used for opclass parameters and any other kind of options you like. The big patch is here: https://commitfest.postgresql.org/14/992/ (I am afraid it will not apply to current master as it is; it is quite old, but you can get the idea.) The smaller parts of the patch in the current commitfest are:
https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/
You can help review them; then the whole thing will go faster. The second patch is quite trivial. The first will also need the attention of someone who really understands postgres internals.

---

Concerning the patch that you've provided: I've only had a short look, but I already have some questions.

1. I've seen you've added a new attribute to pg_index. Why? As far as I can tell, if you have an index built on several columns (A1, A2, A3), you can set your own opclass for each column, and set individual options for each opclass if we are speaking about options.
So I would expect these options to live not in pg_index but in pg_attribute, and we already have one there: attoptions. I just do not get how you arrived at per-index options. Maybe I should read the code more attentively...

2. Your patch does not provide any example of your new tool's usage. In my prototype patch I showed an implementation of opclass options for intarray. Maybe you should do the same (use my example; it will be easier than doing it from scratch). It will give a better understanding of how this improvement can be used.

3.
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -150,3 +150,8 @@ vacuum btree_tall_tbl;
 -- need to insert some rows to cause the fast root page to split.
 insert into btree_tall_tbl (id, t)
   select g, repeat('x', 100) from generate_series(1, 500) g;
+-- Test unsupported btree opclass parameters
+create index on btree_tall_tbl (id int4_ops(foo=1));
+ERROR: access method "btree" does not support opclass options
+create index on btree_tall_tbl (id default(foo=1));
+ERROR: access method "btree" does not support opclass options

Are you sure we really need these in postgres?

---

So my proposal is the following: let's commit
https://commitfest.postgresql.org/17/1536/
https://commitfest.postgresql.org/17/1489/
I will provide a final patch for the option engine refactoring in the commitfest, and you can write your implementation of opclass options on top of it. We can do it simultaneously. Or I will write opclass options myself... I do not care who does opclass options as long as it is done using the refactored options engine. If you do it, I can review it.

> Opclass parameters can give users the ability to:
> * Define the values of the constants that are hardcoded now in the
> opclasses depending on the indexed data.
> * Specify what to index for non-atomic data types (arrays, json[b],
> tsvector). A partial index can only filter whole rows.
> * Specify what indexing algorithm to use depending on the indexed data.
>
>
> Description of patches:
>
> 1. Infrastructure for opclass parameters.
>
> SQL grammar is changed only for the CREATE INDEX statement: parenthesized
> parameters in reloptions format are added after the column's opclass name.
> The default opclass can be specified with the DEFAULT keyword:
>
> CREATE INDEX idx ON tab USING am (
>   {expr {opclass | DEFAULT} ({name=value} [,...])} [,...]
> );
>
> Example for contrib/intarray:
>
> CREATE INDEX ON arrays USING gist (
>   arr gist__intbig_ops (siglen = 32),
>   arr DEFAULT (numranges = 100)
> );
>
> \d arrays
>          Table "public.arrays"
>  Column |   Type    | Collation | Nullable | Default
> --------+-----------+-----------+----------+---------
>  arr    | integer[] |           |          |
> Indexes:
>     "arrays_arr_arr1_idx" gist (arr gist__intbig_ops (siglen='32'), arr
> gist__int_ops (numranges='100'))
Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept
Thank you for your valuable comments. I've made a few adjustments.

The main goal of my changes is to let long read-only transactions run on a replica when hot_standby_feedback is turned on.

Patch1 - hsfeedback_av_truncate.patch is made to stop ResolveRecoveryConflictWithLock from occurring on the replica after autovacuum lazily truncates the heap on the master, cutting some pages at the end. When hot_standby_feedback is on, we know that autovacuum does not remove anything superfluous that could still be needed on the standby, so there is no need to raise any ResolveRecoveryConflict*.

1) Add an isautovacuum flag to xl_standby_locks and xl_smgr_truncate, which tells us that autovacuum generated them.
2) When autovacuum decides to trim the table (using lazy_truncate_heap), it takes an AccessExclusiveLock and sends this lock to the replica, but the replica should ignore the AccessExclusiveLock if hot_standby_feedback=on.
3) When the autovacuum truncate WAL message is replayed on a replica, it takes an ExclusiveLock on the table, so as not to interfere with read-only requests.

We have two cases of resolving ResolveRecoveryConflictWithLock if the timers (max_standby_streaming_delay and max_standby_archive_delay) have run out:
- the backend is idle in transaction (waiting for input): in this case the backend will be sent SIGTERM;
- the backend transaction is running a query: in this case the running transaction will be aborted.

How to test:
Make an async replica, turn on feedback and reduce max_standby_streaming_delay. Make autovacuum more aggressive:
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we put load on the master and simulate a long transaction with repeated 1-second SEQSCANs on the replica (by calling pg_sleep with 1-second duration every 6 seconds).
MASTER:
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(master.sql: update test set value = value;)

REPLICA:
hot_standby = on
max_standby_streaming_delay = 1s
hot_standby_feedback = on
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT pg_sleep(value) FROM test;
\watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passes without fatal errors. Only sometimes an error occurs, because this test is too synthetic:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
This happens because ResolveRecoveryConflictWithSnapshot is raised while redoing some visibility flags; to avoid this conflict we can run Test2 or increase max_standby_streaming_delay.

Test2:
Here we put load on the master and simulate a long transaction on the replica (by taking a LOCK on the table).

MASTER:
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(master.sql: update test set value = value;)

REPLICA:
hot_standby = on
max_standby_streaming_delay = 1s
hot_standby_feedback = on
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
LOCK TABLE test IN ACCESS SHARE MODE;
select * from test;
\watch 6

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passes without fatal errors.

Test3:
Here we put load on the master and simulate a long transaction with repeated 1-second SEQSCANs on the replica (by calling pg_sleep with 1-second duration every 6 seconds).
MASTER:
start
CREATE TABLE test AS (SELECT id, 200 AS value FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(master.sql: update test set value = value;)

REPLICA:
hot_standby = on
max_standby_streaming_delay = 4s
hot_standby_feedback = on
start
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT pg_sleep(value) FROM test;

--- Autovacuum truncates pages at the end ---

Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version, lazy vacuum truncation passes without fatal errors. This way we can run transactions with SEQSCAN, INDEXSCAN or BITMAPSCAN.

Patch2 - hsfeedback_noninvalide_xmin.patch
When the walsender is initialized, its xmin in PROCARRAY is set to GetOldestXmin() in order to prevent autovacuum running on the master from truncating a relation and removing some pages that are required by the replica. This might happen if the master's autovacuum and the replica's query started simultaneously, and the replica has not yet reported its xmin value.

How to test:
Make an async replica, turn on feedback, reduce max_standby_streaming_delay and aggressive
Re: Reopen logfile on SIGHUP
If there is already SIGUSR1 for logfile reopening, then shouldn't the postmaster watch for it? The postmaster PID is easy to obtain.

On 02/28/2018 01:32 AM, Greg Stark wrote:
> On 27 February 2018 at 14:41, Anastasia Lubennikova wrote:
>> Small patch in the attachment implements logfile reopening on SIGHUP.
>> It only affects the file accessed by the logging collector, whose name
>> you can check with pg_current_logfile().
>
> HUP will cause Postgres to reload its config files. That seems like a
> fine time to reopen the log files as well, but it would be nice if there
> was also some way to get it to *just* do that and not reload the config
> files.
>
> I wonder if it would be easiest to just have the syslogger watch for
> some other signal as well (I'm guessing the syslogger doesn't use
> relcache invalidations so it could reuse USR1, for example). That would
> be a bit inconvenient, as the admins would have to find the syslogger
> and deliver the signal directly rather than through the postmaster, but
> it would be pretty easy for them.

-- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Function to track shmem reinit time
It can be used to accurately calculate server uptime, since you can't rely on pg_postmaster_start_time() for this.

On 02/28/2018 03:11 PM, Anastasia Lubennikova wrote:
> Attached patch introduces a new function pg_shmem_init_time(), which
> returns the time shared memory was last (re)initialized. It is created
> for use by monitoring tools to track backend crashes.
>
> Currently, if the 'restart_after_crash' option is on, postgres will just
> restart, and the only way to know that it happened is to regularly parse
> the logfile, or monitor it, catching restart messages. This approach is
> really inconvenient for users who have gigabytes of logs.
>
> This new function can be periodically called by a monitoring agent, and
> if shmem_init_time doesn't match pg_postmaster_start_time, we know that
> the server crashed and restarted, and we also know the exact time when.
>
> Also, while working on this patch, I noticed a bit of dead code and some
> discordant comments in postmaster.c. I see no reason to leave them as
> is, so there is a small remove_dead_shmem_reinit_code_v0.patch.
>
> --
> Anastasia Lubennikova
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Contention preventing locking
On Mon, Feb 26, 2018 at 8:26 PM, Konstantin Knizhnik wrote:
>
> On 26.02.2018 17:20, Amit Kapila wrote:
>>
>> Can you please explain how it can be done easily without extra tuple
>> locks? I have tried to read your patch but due to lack of comments,
>> it is not clear what you are trying to achieve. As far as I can see
>> you are changing the locktag passed to LockAcquireExtended by the
>> first waiter for the transaction. How will it achieve the serial
>> waiting protocol (queue the waiters for tuple) for a particular tuple
>> being updated?
>>
> The idea of transaction lock chaining was very simple. I have explained it
> in the first mail in this thread.
> Assume that transaction T1 has updated tuple R1.
> Transaction T2 also tries to update this tuple and so waits for T1's XID.
> If then yet another transaction T3 also tries to update R1, then it should
> wait for T2, not for T1.
>

Isn't this exactly what we try to do via tuple locks (heap_acquire_tuplock)? Currently, T2, before waiting for T1, will acquire a tuple lock on R1, and T3 will wait on T2 (not on T1) to release the tuple lock on R1; similarly, the other waiters should form a queue and will be woken one-by-one. After this, as soon as T2 is woken up, it will release the lock on the tuple and will try to fetch the updated tuple. Now, releasing the lock on the tuple by T2 will allow T3 to also proceed, and as T3 was supposed to wait on T1 (according to the tuple-satisfies API), it will immediately be released and will also try to do the same work as is done by T2. One of those will succeed and the other has to re-fetch the updated tuple again.

I think in this whole process backends may need to wait multiple times, either on the tuple lock or the xact lock. It seems the reason for these waits is that we immediately release the tuple lock (acquired by heap_acquire_tuplock) once the transaction on which we were waiting is finished.
AFAICU, the reason for releasing the tuple lock immediately, instead of at the end of the transaction, is that we don't want to accumulate too many locks, as that can lead to unbounded use of shared memory. How about if we release the tuple lock at the end of the transaction, unless the transaction acquires more than a certain threshold (say 10 or 50) of such locks, in which case we fall back to the current strategy?

-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com