[HACKERS] pg9.4b1: unhelpful error message when creating a collation
sh> lsb_release -d
Description:	Ubuntu 14.04 LTS
Codename:	trusty

sh> uname -a
Linux sto 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

sh> locale -a
C C.UTF-8 en_AG en_AG.utf8 en_AU.utf8 en_BW.utf8 en_CA.utf8 en_DK.utf8 en_GB.utf8 en_HK.utf8 en_IE.utf8 en_IN en_IN.utf8 en_NG en_NG.utf8 en_NZ.utf8 en_PH.utf8 en_SG.utf8 en_US.utf8 en_ZA.utf8 en_ZM en_ZM.utf8 en_ZW.utf8 fr_FR.utf8 POSIX

# with postgresql 9.4 installed from apt.postgresql.org.

sql> SELECT VERSION();
PostgreSQL 9.4beta1 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

sql> \dOS+
(same list as above, with locale)

sql> SHOW lc_messages;
en_US.UTF-8

# stuff which is working as expected:

sql> CREATE COLLATION test0 (locale='C');
CREATE COLLATION

sql> CREATE COLLATION test1 (locale='en_US.UTF-8');
CREATE COLLATION

sql> CREATE COLLATION test2 (locale='en_US.UTF8');
CREATE COLLATION

sql> CREATE COLLATION test3 (locale='Foo');
ERROR:  could not create locale test3: No such file or directory
DETAIL:  The operating system could not find any locale data for the locale name Foo.

# stuff which is incoherent:

sql> CREATE COLLATION french (locale='fr_FR.utf8');
ERROR:  could not create locale fr_FR.utf8: Success

The collation creation fails, not sure why yet. However, the error .. success message is especially unhelpful. Raising client_min_messages does not add more information. I could not get the same error with 9.3.4.

-- Fabien.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Priority table or Cache table
On 05/20/2014 01:46 PM, Fujii Masao wrote: On Mon, Mar 17, 2014 at 1:16 PM, Haribabu Kommi kommi.harib...@gmail.com wrote: ... I Implemented a proof of concept patch to see whether the buffer pool split can improve the performance or not. Summary of the changes: 1. The priority buffers are allocated as continuous to the shared buffers. 2. Added new reloption parameter called buffer_pool to specify the buffer_pool user wants the table to use. I'm not sure if storing the information of priority table into database is good because this means that it's replicated to the standby and the same table will be treated with high priority even in the standby server. I can imagine some users want to set different tables as high priority ones in master and standby. There might be a possibility to override this in postgresql.conf for optimising what you described but for most uses it is best to be in the database, at least to get started. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
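To make the catalog-vs-GUC point above concrete: a per-table setting like the proposed buffer_pool reloption would normally be registered through the reloptions API and stored in pg_class.reloptions, which is exactly why it replicates to standbys. The sketch below is hypothetical; the option name, accepted values, and validator are assumptions, not taken from the actual patch.

/*
 * Hypothetical sketch, not the actual patch: registering a per-table
 * string reloption "buffer_pool".  Because reloptions live in
 * pg_class.reloptions, the value is part of the catalogs and is carried
 * to standbys via WAL, which is the concern raised above.
 */
#include "postgres.h"
#include "access/reloptions.h"

static void
validate_buffer_pool(char *value)
{
	/* accepted values are made up for illustration */
	if (value == NULL ||
		(strcmp(value, "default") != 0 && strcmp(value, "priority") != 0))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("buffer_pool must be \"default\" or \"priority\"")));
}

void
register_buffer_pool_reloption(void)
{
	add_string_reloption(RELOPT_KIND_HEAP,
						 "buffer_pool",
						 "Buffer pool this relation should use",
						 "default",
						 validate_buffer_pool);
}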
[HACKERS] Index-only scans for GIST
Hi, hackers! Here are the first results of my work on the GSoC project "Index-only scans for GiST".

1. The current version of my project code is in a forked repository: https://github.com/lubennikovaav/postgres/tree/indexonlygist2 The patch is attached.
- This version supports only one-column indexes.
- The fetch() method is implemented only for the box opclass (because it's trivial).

2. I tested index-only scans with the SQL script box_test.sql, and they are indeed faster (results in box_test_out).

I'll be glad to get your feedback about this feature.

-- Best regards, Lubennikova Anastasia

indexonlygist_2.0.patch Description: Binary data
box_test.sql Description: Binary data
box_test_out.out Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Window function optimisation, allow pushdowns of items matching PARTITION BY clauses
Hello David, sorry for the late response. I will try out your changes from the view of a user in mid-June. However, I can't do a trustworthy code review as I'm not an experienced postgre-hacker (yet). Best Regards Thomas Am 14.04.2014 13:19, schrieb David Rowley: On 14 April 2014 03:31, Tom Lane t...@sss.pgh.pa.us mailto:t...@sss.pgh.pa.us wrote: David Rowley dgrow...@gmail.com mailto:dgrow...@gmail.com writes: On this thread http://www.postgresql.org/message-id/52c6f712.6040...@student.kit.edu there was some discussion around allowing push downs of quals that happen to be in every window clause of the sub query. I've quickly put together a patch which does this (see attached) I think you should have check_output_expressions deal with this, instead. Compare the existing test there for non-DISTINCT output columns. I've moved the code there and it seems like a much better place for it. Thanks for the tip. Oh and I know that my function var_exists_in_all_query_partition_by_clauses has no business in allpaths.c, I'll move it out as soon as I find a better home for it. I might be wrong, but I think you could just embed that search loop in check_output_expressions, and it wouldn't be too ugly. I've changed the helper function to take the TargetEntry now the same as targetIsInSortList does for the hasDistinctOn test and I renamed the helper function to targetExistsInAllQueryPartitionByClauses. It seems much better, but there's probably a bit of room for improving on the names some more. I've included the updated patch with some regression tests. The final test I added tests that an unused window which would, if used, cause the pushdown not to take place still stops the pushdownf from happening... I know you both talked about this case in the other thread, but I've done nothing for it as Tom mentioned that this should likely be fixed elsewhere, if it's even worth worrying about at all. I'm not quite sure if I needed to bother including this test for it, but I did anyway, it can be removed if it is deemed unneeded. Per Thomas' comments, I added a couple of tests that ensure that the order of the columns in the partition by clause don't change the behaviour. Looking at the implementation of the actual code makes this test seems a bit unneeded as really but I thought that the test should not assume anything about the implementation so from that point of view the extra test seems like a good idea. For now I can't think of much else to do for the patch, but I'd really appreciate it Thomas if you could run it through the paces if you can find any time for it, or maybe just give my regression tests a once over to see if you can think of any other cases that should be covered. Regards David Rowley -- == Thomas Mayer Durlacher Allee 61 D-76131 Karlsruhe Telefon: +49-721-2081661 Fax: +49-721-72380001 Mobil: +49-174-2152332 E-Mail: thomas.ma...@student.kit.edu === -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_recvlogical not accepting -I to specify start LSN position
Hi all,

As written in subject, pg_recvlogical does not work properly with option -I but it should:

$ pg_recvlogical -I 0/0
pg_recvlogical: invalid option -- I
Try "pg_recvlogical --help" for more information.

$ pg_recvlogical --help | grep \-I
  -I, --startpos=PTR     where in an existing slot should the streaming start

Attached patch corrects that, reshuffling at the same time the option letters parsed with getopt_long in alphabetical order.

Regards,
--
Michael

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 651cc40..33678d9 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -647,7 +647,7 @@ main(int argc, char **argv)
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "f:F:nvd:h:o:p:U:wWP:s:S:",
+	while ((c = getopt_long(argc, argv, "d:f:F:h:I:no:p:P:s:S:U:vwW",
 							long_options, option_index)) != -1)
 	{
 		switch (c)

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
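For anyone unfamiliar with getopt_long(): the short-option letter must appear in the optstring (with a trailing colon when it takes an argument) even if a matching long option exists, which is exactly what was missing here. A minimal standalone illustration, not pg_recvlogical itself:

#include <stdio.h>
#include <getopt.h>

int
main(int argc, char **argv)
{
	static struct option long_options[] = {
		{"startpos", required_argument, NULL, 'I'},
		{NULL, 0, NULL, 0}
	};
	int			c;

	/*
	 * Without "I:" in the optstring below, "-I 0/0" is rejected with
	 * "invalid option -- I" even though "--startpos=0/0" works.
	 */
	while ((c = getopt_long(argc, argv, "I:", long_options, NULL)) != -1)
	{
		if (c == 'I')
			printf("start position: %s\n", optarg);
	}
	return 0;
}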
Re: [HACKERS] 9.4 btree index corruption
On 05/21/2014 10:22 PM, Jeff Janes wrote:
> Testing partial-write crash-recovery in 9.4 (e12d7320ca494fd05134847e30) with foreign keys, I found some btree index corruption.
>
> 28807 VACUUM 2014-05-21 15:33:46.878 PDT:ERROR: right sibling 4044 of block 460 is not next child 23513 of block 1264 in index foo_p_id_idx
> 28807 VACUUM 2014-05-21 15:33:46.878 PDT:STATEMENT: VACUUM;
>
> It took ~8 hours on 8 cores to encounter this problem. This is a single occurrence; it has not yet been reproduced. I don't know that the partial-write, crash-recovery, or foreign-key parts of this test are important--it could be a more generic problem that only happened to be observed here. Nor do I know yet if it occurs in 9_3_STABLE.
>
> Below is the testing harness and the data directory (massively bloated at 3.7GB once uncompressed). It is currently in wrap-around shutdown, but that is the effect of persistent vacuum failures, not the cause of them. You can restart the data directory and it will repeat the above sibling error once autovac kicks in. I don't know if the bloat is due to the vacuum failure or if it was already in process before the failures started. I've cranked up the logging on that front for future efforts.
>
> I'm using some fast-forward code on the xid consumption so that freezing occurs more often, and some people have expressed reservations that the code might be imperfect, and I can't rule that out as the cause (but I've never traced any other problems back to that code). But it did make it through 4 complete wraps before this problem was encountered, so if that is the problem it must be probabilistic rather than deterministic.
>
> https://drive.google.com/folderview?id=0Bzqrh1SO9FcENWd6ZXlwVWpxU0E&usp=sharing

I downloaded the data directory and investigated. I got this message when I started it up:

20392 2014-05-25 05:51:37.835 PDT:ERROR: right sibling 4044 of block 460 is not next child 23513 of block 86458 in index foo_p_id_idx
20392 2014-05-25 05:51:37.835 PDT:CONTEXT: automatic vacuum of table jjanes.public.foo

Interestingly, it's complaining about parent page 86458, while yours claimed it was 1264. I don't know why; perhaps a huge number of insertions happened after that error, causing the parent-level pages to be split, moving the downlinks it complains about to the right. Did you continue running the test after that error occurred?

This is what the tree looks like around those pages:

Level 1:

+-------------+       +-------------+       +-------------+
| Blk 1264    |       | Blk 160180  |       | Blk 86458   |
|             |       |             |       |             |
| Downlinks:  | <---> | Downlinks:  | <---> | Downlinks:  |
|  ...        |       |  ...        |       |   1269      |
|             |       |             |       |    460      |
|             |       |             |       |  23513      |
+-------------+       +-------------+       +-------------+

Leaf level:

+-------------+       +-------------+       +-------------+       +-------------+
| Blk 1269    |       | Blk 460     |       | Blk 4044    |       | Blk 23513   |
|             | <---> |             | <---> | HALF_DEAD   | <---> |             |
| (314 items) |       | (empty)     |       |             |       | (212 items) |
+-------------+       +-------------+       +-------------+       +-------------+

Leaf block 4044 is marked as half-dead, and there is no downlink to it on the parent block, 86458. That is normal when a vacuum is interrupted during page deletion (in 9.4). The next vacuum ought to continue it. However, because block 460 is empty, the next vacuum that comes along will try to start the page deletion process on that block. Page deletion does a cross-check at the parent level, checking that the next downlink on the parent page points to the right sibling of the page we're about to delete. However, because the right sibling is already marked as half-dead, and the downlink pointing to it has already been removed, the cross-check fails. So, the sanity check is a bit overzealous.
It's normal that the right sibling has no downlink, if it's marked as half-dead. The trivial fix is to just remove the cross-check, but I'd rather not do that because generally-speaking more sanity checks are good. I think we could fetch the right sibling page to see if it's marked as half-dead, and if it is, just give up on deleting the current page. This vacuum will eventually get to the right sibling page and finish the deletion, and the next vacuum will then be able to delete the empty page. Or we could finish the deletion of the right page immediately, but I'd like to keep the rarely-occurring corner-case codepaths to the minimum. With the attached patch, I was able to get past that error, but when VACUUM reaches the end, I got this: jjanes=# vacuum foo; ERROR: database is not accepting commands to avoid wraparound data loss in database jjanes HINT: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared
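A rough sketch of the check Heikki describes, peeking at the right sibling and backing off if it is already half-dead, might look like the helper below. The function name and surrounding control flow are simplified illustrations, not the committed fix; the page-deletion path would call it for the empty page's right sibling before the parent-level cross-check and simply skip the page when it returns true, leaving both pages for a later vacuum.

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Sketch (not the actual fix): report whether the given block, the right
 * sibling of an empty page we are about to delete, is already half-dead.
 * If so, the caller gives up on deleting the empty page for now.
 */
static bool
right_sibling_is_half_dead(Relation rel, BlockNumber rightsib)
{
	Buffer		buf;
	Page		page;
	BTPageOpaque opaque;
	bool		half_dead;

	buf = _bt_getbuf(rel, rightsib, BT_READ);
	page = BufferGetPage(buf);
	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	half_dead = P_ISHALFDEAD(opaque);
	_bt_relbuf(rel, buf);

	return half_dead;
}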
[HACKERS] Could not finish anti-wraparound VACUUM when stop limit is reached
While debugging the B-tree bug that Jeff Janes reported (http://www.postgresql.org/message-id/CAMkU=1y=vwf07ay+cpqk_7fpihrctmssv9y99sbghitkxpb...@mail.gmail.com), a new issue came up: If you reach the xidStopLimit, and try to run VACUUM, it fails with error:

jjanes=# vacuum;
ERROR:  database is not accepting commands to avoid wraparound data loss in database "jjanes"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

The backtrace looks like this:

#0  errstart (elevel=20, filename=0x9590a0 "varsup.c", lineno=120, funcname=0x9593f0 <__func__.10455> "GetNewTransactionId", domain=0x0) at elog.c:249
#1  0x004f7c14 in GetNewTransactionId (isSubXact=0 '\000') at varsup.c:115
#2  0x004f86db in AssignTransactionId (s=0xd62900 <TopTransactionStateData>) at xact.c:510
#3  0x004f84a4 in GetCurrentTransactionId () at xact.c:382
#4  0x0062dc1c in vac_truncate_clog (frozenXID=1493663893, minMulti=1) at vacuum.c:909
#5  0x0062dc06 in vac_update_datfrozenxid () at vacuum.c:888
#6  0x0062cdf6 in vacuum (vacstmt=0x29e05e0, relid=0, do_toast=1 '\001', bstrategy=0x2a5cc38, for_wraparound=0 '\000', isTopLevel=1 '\001') at vacuum.c:294
#7  0x007a3c55 in standard_ProcessUtility (parsetree=0x29e05e0, queryString=0x29dfbf8 "vacuum ;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x29e0988, completionTag=0x7fff9411a490 "") at utility.c:645

So, vac_truncate_clog() tries to get a new transaction ID, which fails because we've already reached the stop-limit. vac_truncate_clog() doesn't really need a new XID to be assigned, though, it only uses it to compare against datfrozenxid to see if wrap-around has already happened, so it could use ReadNewTransactionId() instead.

Jeff's database seems to have wrapped around already, because after fixing the above, I get this:

jjanes=# vacuum;
WARNING:  some databases have not been vacuumed in over 2 billion transactions
DETAIL:  You might have already suffered transaction-wraparound data loss.
VACUUM

We do not truncate clog when wraparound has already happened, so we never get past that point.

Jeff advanced XID counter aggressively with some custom C code, so hitting the actual wrap-around is a case of "don't do that". Still, the case is quite peculiar: pg_controldata says that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So ISTM he managed to not just wrap around, but execute 2 billion more transactions after the wraparound and reach datfrozenxid again. I'm not sure how that happened.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
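The change being described would be roughly the following; this is a sketch of the idea, and the actual commit may differ in detail. ReadNewTransactionId() only reads the next-XID counter under XidGenLock, so it keeps working after xidStopLimit, whereas GetCurrentTransactionId() errors out at that point.

#include "postgres.h"
#include "access/transam.h"		/* ReadNewTransactionId() */

/*
 * Sketch: vac_truncate_clog() only needs an XID to compare against
 * datfrozenxid, so read the counter instead of assigning a new XID.
 */
static TransactionId
xid_for_wraparound_check(void)
{
	/* before: GetCurrentTransactionId() -- fails once xidStopLimit is hit */
	/* after:  just read the counter; no XID is consumed */
	return ReadNewTransactionId();
}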
Re: [HACKERS] pg9.4b1: unhelpful error message when creating a collation
Hi,

On 2014-05-25 09:17:24 +0200, Fabien COELHO wrote:
> sql> CREATE COLLATION french (locale='fr_FR.utf8');
> ERROR:  could not create locale fr_FR.utf8: Success
>
> The collation creation fails, not sure why yet. However, the error .. success message is especially unhelpful.

This seems to be a glibc bug. If a nonexistent locale has already been asked for, errno is set to 0 instead of something sensible. Using a debugger it's possible to see that in an earlier setlocale() call errno is set correctly, but newlocale() then fails without setting errno. The culprit seems to be some caching in glibc's _nl_find_locale(). I am not entirely sure why it's not reliably triggered before 9.4.

Searching for that error turned up:
https://sourceware.org/bugzilla/show_bug.cgi?id=14247
https://bugzilla.redhat.com/show_bug.cgi?id=827510

The latter by Tom Lane ;). Unfortunately not much seems to have happened since.

Except badgering the glibc guys a bit, the only thing I can think of is to fudge errno = 0 into errno = ENOENT. Not nice.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
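A small standalone program along these lines shows the failure mode; whether it reproduces depends on the glibc version (on a fixed glibc both calls set errno), and the locale name is made up. The point is that when newlocale() fails while leaving errno at 0, reporting the error via strerror(errno) or %m produces the confusing "... : Success".

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <locale.h>

int
main(void)
{
	const char *name = "no_SUCH.utf8";	/* intentionally nonexistent */
	locale_t	loc;

	/* First attempt: normally fails with errno = ENOENT. */
	errno = 0;
	loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
	printf("first try:  loc=%p errno=%d (%s)\n", (void *) loc, errno, strerror(errno));

	/*
	 * Second attempt for the same name: on an affected glibc the cached
	 * lookup fails with errno still 0, which is what ends up printed as
	 * "could not create locale ...: Success" in the server log.
	 */
	errno = 0;
	loc = newlocale(LC_ALL_MASK, name, (locale_t) 0);
	printf("second try: loc=%p errno=%d (%s)\n", (void *) loc, errno, strerror(errno));

	return 0;
}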
[HACKERS] [9.5] possible fast path for pinning a page multiple times
Context: A patch from a while ago was rejected: http://www.postgresql.org/message-id/1369886097.23418.0.camel@jdavis Most of the objection seemed to be that extra page pins might happen in some circumstances, such as this one mentioned by Heikki: http://www.postgresql.org/message-id/50fd11c5.1030...@vmware.com That was a single-threaded case, but did represent additional pins being acquired, which could add up to contention over the BufMappingPartitionLock if there is other traffic on that lock partition. Idea: Let's say we have a routine PinBufferTag, that's like PinBuffer but it takes an additional BufferTag argument. When it locks the buffer header, it would also compare the argument to the buffer's tag, and if they don't match, return a status indicating that it's the wrong buffer and don't pin it. In other words, it pins the buffer only if it's the right one. Then, we can just have a backend-local cache that maps BufferTag to buffer ID. If it's missing an entry, or if the entry is wrong, then it just proceeds with the normal BufferAlloc path. But if the cache holds the right value, then we completely bypass the BufMappingPartitionLock while getting the pin. Before I do too much performance testing of this, is it a correct approach? It seems too easy. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
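For readers not steeped in bufmgr.c, here is a rough sketch of what is being proposed. PinBufferTag and its result codes are hypothetical names taken from this mail; usage-count maintenance, the backend-private refcount bookkeeping, and the local BufferTag-to-buffer-id cache itself are elided.

#include "postgres.h"
#include "storage/buf_internals.h"

typedef enum
{
	PIN_OK,				/* pinned, and it is the buffer we expected */
	PIN_WRONG_BUFFER	/* tag mismatch: fall back to normal BufferAlloc() */
} PinTagResult;

/*
 * Sketch: pin the buffer only if, under the buffer-header lock, it still
 * holds the page identified by expected_tag.  On a mismatch the caller
 * takes the regular BufMappingPartitionLock path.
 */
static PinTagResult
PinBufferTag(volatile BufferDesc *buf, const BufferTag *expected_tag)
{
	PinTagResult result;

	LockBufHdr(buf);
	if ((buf->flags & BM_VALID) &&
		BUFFERTAGS_EQUAL(buf->tag, *expected_tag))
	{
		buf->refcount++;	/* real code also records the private ref */
		result = PIN_OK;
	}
	else
		result = PIN_WRONG_BUFFER;
	UnlockBufHdr(buf);

	return result;
}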
Re: [HACKERS] Could not finish anti-wraparound VACUUM when stop limit is reached
On 2014-05-25 11:40:09 -0400, Heikki Linnakangas wrote: So, vac_truncate_clog() tries to get a new transaction ID, which fails because we've already reached the stop-limit. vac_truncate_clog() doesn't really need a new XID to be assigned, though, it only uses it to compare against datfrozenxid to see if wrap-around has already happened, so it could use ReadNewTransactionId() instead. Right. But IIRC we need one in the vicinity anyway to write new pg_database et al rows? Jeff's database seems to have wrapped around already, because after fixing the above, I get this: jjanes=# vacuum; WARNING: some databases have not been vacuumed in over 2 billion transactions DETAIL: You might have already suffered transaction-wraparound data loss. VACUUM We do not truncate clog when wraparound has already happened, so we never get past that point. Jeff advanced XID counter aggressively with some custom C code, so hitting the actual wrap-around is a case of don't do that. Still, the case is quite peculiar: pg_controldata says that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So ISTM he managed to not just wrap around, but execute 2 billion more transactions after the wraparound and reach datfrozenxid again. I'm not sure how that happened. I think that may be explained by Jeff's changes to varsup.c to use up xids more rapidly. If JJ_xid is = 100 it'll possibly jump right over xidStopLimit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion in 9.4 with wal_level=logical
On 2014-04-18 11:50:55 -0300, Alvaro Herrera wrote: Robert Haas wrote: On Thu, Apr 17, 2014 at 10:47 PM, Andres Freund and...@2ndquadrant.com wrote: It's this (older) assertion in HeapTupleHeaderGetCmax(): Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup))); That can allocate memory if xmax is a multixact... Does anybody have a better idea to solve this than adding a CritSectionCount == 0 in there? Blech. Isn't that just nerfing the assertion? Well, that's exactly the point. Most of the time, HeapTupleHeaderGetCmax gets called in a non-critical section, and we want to run the assertion in that case. But it's not huge trouble if the assertion is not run in the rare case where HeapTupleHeaderGetCmax is being used to write a Xlog record. It's a bit painful that HeapTupleHeaderGetUpdateXid allocates memory, but to fix that we would have to remove all allocations from GetMultiXactIdMembers which doesn't sound feasible. Since nobody seemed to have a better idea I've proceeded in doing so... Not pretty. I've verified that the assertion could be triggered before, but not after by doing something like: S1: CREATE TABLE a(); BEGIN; SELECT * FROM pg_class WHERE relname = 'a' FOR SHARE; S2: BEGIN; SELECT * FROM pg_class WHERE relname = 'a' FOR SHARE; COMMIT; S1: ALTER TABLE a RENAME to b; ALTER TABLE b RENAME to a; -- this triggered the assertion That seems a tad too obscure for its own isolationtester test. Thanks for the report, Steve! Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
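For reference, the relaxation Andres describes amounts to guarding the assertion so that the potentially-allocating multixact lookup is skipped inside critical sections, where palloc() is forbidden. Roughly like this; the exact wording of the committed change may differ:

/*
 * Sketch of the guarded assertion in HeapTupleHeaderGetCmax(): only run
 * the check (which may allocate when xmax is a multixact) outside
 * critical sections.
 */
Assert(CritSectionCount > 0 ||
	   TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));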
[HACKERS] PATCH: pgbench / int64 instead of int for xact count
Hi,

I've been running a few longer pgbench tests (~week), and I've run into this:

transaction type: SELECT only
scaling factor: 1250
query mode: simple
number of clients: 32
number of threads: 4
duration: 605000 s
number of transactions actually processed: -1785047856
latency average: -10.846 ms
tps = -2950.492090 (including connections establishing)
tps = -2950.492325 (excluding connections establishing)

The instance was doing ~10k tps for a week, which caused overflow of the int counter used to track the number of transactions. Hence the negative values.

I think we've reached the time when hardware capable of doing this is pretty common (SSDs, ...), so I think it's time to switch the counter to int64.

Tomas

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 6cc06d7..f5b3f60 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -239,7 +239,7 @@ typedef struct
 typedef struct
 {
 	instr_time	conn_time;
-	int			xacts;
+	int64		xacts;
 	int64		latencies;
 	int64		sqlats;
 	int64		throttle_lag;
@@ -2180,7 +2180,7 @@ process_builtin(char *tb)
 
 /* print out results */
 static void
-printResults(int ttype, int normal_xacts, int nclients,
+printResults(int ttype, int64 normal_xacts, int nclients,
 			 TState *threads, int nthreads,
 			 instr_time total_time, instr_time conn_total_time,
 			 int64 total_latencies, int64 total_sqlats,
@@ -2213,13 +2213,13 @@ printResults(int ttype, int normal_xacts, int nclients,
 	if (duration <= 0)
 	{
 		printf("number of transactions per client: %d\n", nxacts);
-		printf("number of transactions actually processed: %d/%d\n",
+		printf("number of transactions actually processed: %ld/%d\n",
 			   normal_xacts, nxacts * nclients);
 	}
 	else
 	{
 		printf("duration: %d s\n", duration);
-		printf("number of transactions actually processed: %d\n",
+		printf("number of transactions actually processed: %ld\n",
 			   normal_xacts);
 	}
 
@@ -2359,7 +2359,7 @@ main(int argc, char **argv)
 	instr_time	start_time;		/* start up time */
 	instr_time	total_time;
 	instr_time	conn_total_time;
-	int			total_xacts = 0;
+	int64		total_xacts = 0;
 	int64		total_latencies = 0;
 	int64		total_sqlats = 0;
 	int64		throttle_lag = 0;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
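The arithmetic behind the overflow is simple: a signed 32-bit counter tops out at 2,147,483,647, which at roughly 10,000 tps is reached after about 2.5 days, well short of the 605,000-second (~1 week) run above. A throwaway check, using only the figures from this report:

#include <stdio.h>
#include <limits.h>

int
main(void)
{
	double		tps = 10000.0;		/* roughly what the run above sustained */
	double		secs_to_overflow = (double) INT_MAX / tps;

	printf("32-bit counter overflows after %.0f s (%.1f days) at %.0f tps\n",
		   secs_to_overflow, secs_to_overflow / 86400.0, tps);
	printf("the run above lasted 605000 s\n");
	return 0;
}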
Re: [HACKERS] pg_recvlogical not accepting -I to specify start LSN position
Hi, On 2014-05-25 22:35:24 +0900, Michael Paquier wrote: As written in subject, pg_recvlogical does not work properly with option -I but it should: $ pg_recvlogical -I 0/0 pg_recvlogical: invalid option -- I Try pg_recvlogical --help for more information. $ pg_recvlogical --help | grep \-I -I, --startpos=PTR where in an existing slot should the streaming start Good catch. Apparently only the long argument version ever worked... Attached patch corrects that, reshuffling at the same time the option letters parsed with getopt_long in alphabetical order. Hm. Not a big fan of this in isolation. In the attached patch I've reordered the options to all be ordered alphabetically, but only inside the section they are in --help. What do you think? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 85f54dafd13238ff831d1c66cddeaf071ad60708 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Sun, 25 May 2014 18:47:05 +0200 Subject: [PATCH] Fix pg_recvlogical to accept the documented -I instead only --startpos. The bug was caused by omitting 'I:' from the short argument list to getopt_long(). To make similar bugs in the future less likely reorder options in --help, long and short option lists to be in the same, alphabetical within groups, order. Report and fix by Michael Paquier, some additional reordering by me. --- src/bin/pg_basebackup/pg_recvlogical.c | 50 +- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c index 651cc40..cbbba9e 100644 --- a/src/bin/pg_basebackup/pg_recvlogical.c +++ b/src/bin/pg_basebackup/pg_recvlogical.c @@ -68,6 +68,8 @@ usage(void) printf(_( %s [OPTION]...\n), progname); printf(_(\nOptions:\n)); printf(_( -f, --file=FILEreceive log into this file. 
- for stdout\n)); + printf(_( -F --fsync-interval=SECS\n + frequency of syncs to the output file (default: %d)\n), (fsync_interval / 1000)); printf(_( -n, --no-loop do not loop on connection lost\n)); printf(_( -v, --verbose output verbose messages\n)); printf(_( -V, --version output version information, then exit\n)); @@ -80,8 +82,7 @@ usage(void) printf(_( -w, --no-password never prompt for password\n)); printf(_( -W, --password force password prompt (should happen automatically)\n)); printf(_(\nReplication options:\n)); - printf(_( -F --fsync-interval=SECS\n - frequency of syncs to the output file (default: %d)\n), (fsync_interval / 1000)); + printf(_( -I, --startpos=PTR where in an existing slot should the streaming start\n)); printf(_( -o, --option=NAME[=VALUE]\n specify option NAME with optional value VALUE, to be passed\n to the output plugin\n)); @@ -89,7 +90,6 @@ usage(void) printf(_( -s, --status-interval=SECS\n time between status packets sent to server (default: %d)\n), (standby_message_timeout / 1000)); printf(_( -S, --slot=SLOTuse existing replication slot SLOT instead of starting a new one\n)); - printf(_( -I, --startpos=PTR where in an existing slot should the streaming start\n)); printf(_(\nAction to be performed:\n)); printf(_( --create create a new replication slot (for the slotname see --slot)\n)); printf(_( --startstart streaming in a replication slot (for the slotname see --slot)\n)); @@ -600,6 +600,7 @@ main(int argc, char **argv) static struct option long_options[] = { /* general options */ {file, required_argument, NULL, 'f'}, + {fsync-interval, required_argument, NULL, 'F'}, {no-loop, no_argument, NULL, 'n'}, {verbose, no_argument, NULL, 'v'}, {version, no_argument, NULL, 'V'}, @@ -612,12 +613,11 @@ main(int argc, char **argv) {no-password, no_argument, NULL, 'w'}, {password, no_argument, NULL, 'W'}, /* replication options */ + {startpos, required_argument, NULL, 'I'}, {option, required_argument, NULL, 'o'}, {plugin, required_argument, NULL, 'P'}, {status-interval, required_argument, NULL, 's'}, - {fsync-interval, required_argument, NULL, 'F'}, {slot, required_argument, NULL, 'S'}, - {startpos, required_argument, NULL, 'I'}, /* action */ {create, no_argument, NULL, 1}, {start, no_argument, NULL, 2}, @@ -647,7 +647,7 @@ main(int argc, char **argv) } } - while ((c = getopt_long(argc, argv, f:F:nvd:h:o:p:U:wWP:s:S:, + while ((c = getopt_long(argc, argv, f:F:nvd:h:p:U:wWI:o:P:s:S:, long_options, option_index)) != -1) { switch (c) @@ -656,6 +656,15 @@ main(int argc, char **argv) case 'f': outfile = pg_strdup(optarg); break; + case 'F': +fsync_interval = atoi(optarg) * 1000; +if
Re: [HACKERS] PATCH: pgbench / int64 instead of int for xact count
Hi, On 2014-05-25 18:05:03 +0200, Tomas Vondra wrote: I've been running a few longer pgbench tests (~week), and I've run into this: number of transactions actually processed: -1785047856 latency average: -10.846 ms tps = -2950.492090 (including connections establishing) tps = -2950.492325 (excluding connections establishing) The instance was doing ~10k tps for a week, which caused overflow of the int counter, used to track number of transactions. Hence the negative values. I think we've reached the time when hardeare capable of doing this is pretty common (SSDs, ...), so I think it's time to switch the counter to int64. Especially when it's perfectly possible to do 500k read only transactions a second... printf(number of transactions per client: %d\n, nxacts); - printf(number of transactions actually processed: %d/%d\n, + printf(number of transactions actually processed: %ld/%d\n, normal_xacts, nxacts * nclients); That's not right though. On windows a long (indicated by the %l) is only 4 bytes wide. Check INT64_FORMAT. That's generated by configure/platform template files and should always be correct. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
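For the record, INT64_FORMAT expands to a string literal holding the platform's conversion specifier ("%ld", "%lld", or "%I64d"), so it is spliced into the format string via C literal concatenation. The snippet below is only an illustration of that usage, with made-up variable names:

#include "postgres_fe.h"	/* defines int64 and INT64_FORMAT */

static void
report_xacts(int64 normal_xacts, int nxacts, int nclients)
{
	printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
		   normal_xacts, nxacts * nclients);
}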
Re: [HACKERS] Sending out a request for more buildfarm animals?
On 2014-05-25 01:02:25 +0200, Tomas Vondra wrote:
> On 14.5.2014 15:17, Andres Freund wrote:
> The cache invalidation bug was apparently fixed, but we're still getting failures (see for example markhor):
> http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=markhor&br=HEAD
> I see there's a transaction (COMMIT+BEGIN) - is this caused by the extremely long runtimes?

Yes, that's the reason. Normally the test doesn't trigger autovacuum at all, but if it's running for a *long* time it can. I haven't yet figured out a good way to deal with that.

Greetings,

Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [GSoC] Clustering in MADlib - status update
Hi, Here is my first report. You can also find it on my Gitlab [0]. Week 1 - 2014/05/25 For this first week, I have written a test script that generates some simple datasets, and produces an image containing the output of the MADlib clustering algorithms. This script can be called like this: ./clustering_test.py new ds0 -n 8 # generates a dataset called ds0 with 8 clusters ./clustering_test.py query ds0 -o output.png # outputs the result of the clustering algorithms applied to ds0 in output.png See ./clustering_test.py -h for all the available options. An example of output can be found here [1].http://git.viod.eu/viod/gsoc_2014/blob/master/clustering_test/example_dataset.png Of course, I will keep improving this test script, as it is still far from perfect; but for now, it does approximately what I want. For next week, I'll start working on the implementation of k-medoids in MADlib. As a reminder, according to the timeline I suggested for the project, this step must be done on May 30. Depending on the problems I will face (mostly lack of knowledge of the codebase, I guess), this might not be finished on time, but it should be done a few days later (by the end of next week, hopefully). Attached is the patch containing everything I have done this week, though the git log might be more convenient to read. Regards, Maxence A. [0] http://git.viod.eu/viod/gsoc_2014/blob/master/reports.rst [1] http://git.viod.eu/viod/gsoc_2014/blob/master/clustering_test/example_dataset.png -- Maxence Ahlouche 06 06 66 97 00 diff --git a/.gitignore b/.gitignore new file mode 100644 index 000..97de20e --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +***/__pycache__/ +**.png \ No newline at end of file diff --git a/autogen_results.py b/autogen_results.py deleted file mode 100755 index 033c309..000 --- a/autogen_results.py +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/python - -import os - -while(True): -os.system(./k-means_test.py --regen -o results/$(date | md5sum | cut -d ' ' -f 1).png) diff --git a/clustering_test.py b/clustering_test.py deleted file mode 100755 index 2afc0d1..000 --- a/clustering_test.py +++ /dev/null @@ -1,35 +0,0 @@ -#!/usr/bin/env python3 - -import argparse -import psycopg2 as pg -import sys - - -class DatabaseConnection(): -db_name = 'madlib' -user = 'madlib' -host = 'localhost' -port = 5432 -table_name = 'tmp_points' -field_name = 'coords' - -def __init__(self): -self.conn = pg.connect(database=self.db_name, user=self.user, host=self.host, port=5432) -self.cur = self.conn.cursor() -self.cur.execute('DROP TABLE IF EXISTS %s CASCADE;' % self.table_name) -self.cur.execute('CREATE TABLE %s (id SERIAL PRIMARY KEY, coords INT[]);' % self.table_name) -self.conn.commit() - -def __del__(self): -self.cur.close() -self.conn.close() - - -def main(args): -parser = argparse.ArgumentParser(description='Visualize output of the clustering algorithms provided by MADlib, in PostgreSQL.') -parser.add_argument('-n', metavar='number of clusters', type=int) - -dc = DatabaseConnection() - -if __name__ == '__main__': -main(sys.argv[1:]) diff --git a/clustering_test/autogen_results.py b/clustering_test/autogen_results.py new file mode 100755 index 000..033c309 --- /dev/null +++ b/clustering_test/autogen_results.py @@ -0,0 +1,6 @@ +#!/usr/bin/python + +import os + +while(True): +os.system(./k-means_test.py --regen -o results/$(date | md5sum | cut -d ' ' -f 1).png) diff --git a/clustering_test/clustering_test.py b/clustering_test/clustering_test.py new file mode 100755 index 000..248b5cf --- /dev/null +++ 
b/clustering_test/clustering_test.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 + +import argparse + +import database as db +import dataset_generator as ds +import visualiser as vs + + +if __name__ == '__main__': +parser = argparse.ArgumentParser( +description='Visualize output of the clustering algorithms provided by ' +'MADlib, in PostgreSQL. You should start by adding a dataset. You need ' +'a PostgreSQL running.') +subparsers = parser.add_subparsers(help='subparsers help', dest='action') + +parser_dataset = subparsers.add_parser('new', help='generate a dataset') +parser_dataset.add_argument( +'dataset_name', +help='the name of the dataset to create', +) +parser_dataset.add_argument( +'-n', +'--nb_clusters', +type=int, +help='the number of clusters composing the new dataset. Defaults to a ' +'random value between 2 and 10.', +) +parser_dataset.add_argument( +'-d', +'--distribution', +choices = ds.gen_cluster.keys(), +help='the distribution of the points in the clusters. Defaults to ' +'uniform.', +) + +parser_query = subparsers.add_parser('query', help='apply clustering algorithms on a dataset') +parser_query.add_argument( +
Re: [HACKERS] pg_upgrade fails: Mismatch of relation OID in database 8.4 - 9.3
Hi, On 2014-05-23 08:23:57 -0600, Jeff Ross wrote: UDB=# \x Expanded display is on. UDB=# SELECT attrelid::regclass, attname, attnum, attlen, * FROM pg_attribute WHERE attrelid = 'masterairportlist'::regclass ORDER BY attnum ASC; UDB=# [ RECORD 1 ]-+-- ... A quick sum over the returned values seems to indicate that it's too large to not have a toast table. Adding up attlen and atttypmod gives a value of 1283. Considering that there additionally has to be VARHDRSZ space for the varlena header and that the maximum storage size for a varchar(n) is n * pg_encoding_max_length(GetDatabaseEncoding()) (usually 4) this seems to indicate that bad things[tm] have been done to the database. I suggest you write a script that does a 'ALTER TABLE $tbl ADD COLUMN toastme text; ALTER TABLE $tbl DROP COLUMN toastme' for all tables. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg9.4b1: unhelpful error message when creating a collation
Andres Freund and...@2ndquadrant.com writes: On 2014-05-25 09:17:24 +0200, Fabien COELHO wrote: sql CREATE COLLATION french (locale='fr_FR.utf8'); ERROR: could not create locale fr_FR.utf8: Success This seems to be a glibc bug. If a nonexistant locale has already been asked for errno is set to 0 instead of something sensible. Searching for that error turned up: https://sourceware.org/bugzilla/show_bug.cgi?id=14247 https://bugzilla.redhat.com/show_bug.cgi?id=827510 The latter by Tom Lane ;). Unfortunately not much seems to have happened since. Yeah. Red Hat seems to be carrying a patch (the RH bugzilla says it was fixed in Fedora 17, and I see correct behavior in current RHEL6) but if the sourceware bug reflects the state of affairs upstream then other distros may still be suffering from the bug. Except badgering the glibc guys a bit, the only thing I can think of is to fudge errno = 0 into errno = ENOENT. Not nice. Yeah, I don't especially care for adding such a workaround. If the consequences were worse than a wrong error message, then maybe --- but adding the workaround might itself sometimes mean a wrong error message, so it's not clear we'd be making much forward progress. Better to pester the glibc boys to fix it properly. In the meantime, you might have some luck asking the Ubuntu packagers to copy Red Hat's fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
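The workaround being weighed here, mapping a spurious errno of 0 to ENOENT before reporting, would look roughly like this in the backend's locale-creation path; this is a sketch of the idea under discussion, not something that was agreed on or committed:

errno = 0;
result = newlocale(LC_COLLATE_MASK, collcollate, NULL);
if (!result)
{
	if (errno == 0)
		errno = ENOENT;		/* paper over glibc leaving errno unset */
	report_newlocale_failure(collcollate);
}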
Re: [HACKERS] [9.5] possible fast path for pinning a page multiple times
Jeff Davis pg...@j-davis.com writes: Idea: Let's say we have a routine PinBufferTag, that's like PinBuffer but it takes an additional BufferTag argument. When it locks the buffer header, it would also compare the argument to the buffer's tag, and if they don't match, return a status indicating that it's the wrong buffer and don't pin it. In other words, it pins the buffer only if it's the right one. Then, we can just have a backend-local cache that maps BufferTag to buffer ID. If it's missing an entry, or if the entry is wrong, then it just proceeds with the normal BufferAlloc path. But if the cache holds the right value, then we completely bypass the BufMappingPartitionLock while getting the pin. Before I do too much performance testing of this, is it a correct approach? It seems too easy. Don't see why it wouldn't work. I think you'd need to test buffer status not just the contents of the tag field, but that's easy enough. Bigger issues are how large we'd want to allow the local cache to get (and what to do to keep it within that) and how much performance we really gain. But those are tuning questions not correctness questions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: pgbench / int64 instead of int for xact count
On 25.5.2014 19:05, Andres Freund wrote: printf(number of transactions per client: %d\n, nxacts); -printf(number of transactions actually processed: %d/%d\n, +printf(number of transactions actually processed: %ld/%d\n, normal_xacts, nxacts * nclients); That's not right though. On windows a long (indicated by the %l) is only 4 bytes wide. Check INT64_FORMAT. That's generated by configure/platform template files and should always be correct. Oh, right. v2 of the patch attached. Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 6cc06d7..2f586da 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -239,7 +239,7 @@ typedef struct typedef struct { instr_time conn_time; - int xacts; + int64 xacts; int64 latencies; int64 sqlats; int64 throttle_lag; @@ -2180,7 +2180,7 @@ process_builtin(char *tb) /* print out results */ static void -printResults(int ttype, int normal_xacts, int nclients, +printResults(int ttype, int64 normal_xacts, int nclients, TState *threads, int nthreads, instr_time total_time, instr_time conn_total_time, int64 total_latencies, int64 total_sqlats, @@ -2213,13 +2213,13 @@ printResults(int ttype, int normal_xacts, int nclients, if (duration = 0) { printf(number of transactions per client: %d\n, nxacts); - printf(number of transactions actually processed: %d/%d\n, + printf(number of transactions actually processed: INT64_FORMAT /%d\n, normal_xacts, nxacts * nclients); } else { printf(duration: %d s\n, duration); - printf(number of transactions actually processed: %d\n, + printf(number of transactions actually processed: INT64_FORMAT \n, normal_xacts); } @@ -2359,7 +2359,7 @@ main(int argc, char **argv) instr_time start_time; /* start up time */ instr_time total_time; instr_time conn_total_time; - int total_xacts = 0; + int64 total_xacts = 0; int64 total_latencies = 0; int64 total_sqlats = 0; int64 throttle_lag = 0; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: pgbench / int64 instead of int for xact count
Tomas Vondra t...@fuzzy.cz writes: On 25.5.2014 19:05, Andres Freund wrote: That's not right though. On windows a long (indicated by the %l) is only 4 bytes wide. Check INT64_FORMAT. That's generated by configure/platform template files and should always be correct. Oh, right. v2 of the patch attached. Hm ... if normal_xacts can overflow an int, shouldn't we expect that the product nxacts * nclients could? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD
Matteo Beccati p...@beccati.com writes: here's the latest version of my uuid changes patch, according to proposal (2) from Tom in the thread about OSSP-UUID[1]. Hmm ... this is not actually what I had in mind. Unless I'm misreading the patch, this nukes the uuid-ossp extension entirely in favor of a new extension uuid (providing the same SQL functions with a different underlying implementation). I don't think this works from the standpoint of providing compatibility for users of the existing extension. In particular, it'd break pg_upgrade (because of the change of the .so library name) as well as straight pg_dump upgrades (which would expect CREATE EXTENSION uuid-ossp to work). Not to mention application code that might expect CREATE EXTENSION uuid-ossp to still work. Another objection is that for people for whom OSSP uuid still works fine, this is forcing them to adopt a new implementation whose compatibility is as yet unproven. What I'd rather do is preserve contrib/uuid-ossp with the same extension and .so name, but with two configure options that select different underlying implementations. In the long run it'd be nice to migrate away from the uuid-ossp extension name, mostly because of the poorly-chosen use of a dash in the name. But I'm not sure how we do that without breaking backwards compatibility, and anyway it's an entirely cosmetic thing that we can worry about later. Anyhow, doing it like that seems like it ought to be a pretty straightforward refactoring of your patch. I could pursue that, or you can. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: pgbench / int64 instead of int for xact count
On 25.5.2014 20:32, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: On 25.5.2014 19:05, Andres Freund wrote: That's not right though. On windows a long (indicated by the %l) is only 4 bytes wide. Check INT64_FORMAT. That's generated by configure/platform template files and should always be correct. Oh, right. v2 of the patch attached. Hm ... if normal_xacts can overflow an int, shouldn't we expect that the product nxacts * nclients could? Maybe, but I saw that as a separate thing, mostly unrelated to the 'hardware got so fast it can overflow' problem. Because nxacts is the parameter we use to define duration, so if it din't overflow in the past, it won't overflow today. OTOH, thanks to hardware improvements we tend to use more clients / higher number of transactions, so fixing this seems like a good idea. Should we change the type of nxacts to int64 (thus allowing '-t' with 64-bit integers), or just the overflow in the printf call? I don't find it very practical to use -t with values not fitting into 32-bits (the -T seems better to do that), so I'm inclined to just fix the printf. Patch v3 attached. Tomas diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index 6cc06d7..cf71c73 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -239,7 +239,7 @@ typedef struct typedef struct { instr_time conn_time; - int xacts; + int64 xacts; int64 latencies; int64 sqlats; int64 throttle_lag; @@ -2180,7 +2180,7 @@ process_builtin(char *tb) /* print out results */ static void -printResults(int ttype, int normal_xacts, int nclients, +printResults(int ttype, int64 normal_xacts, int nclients, TState *threads, int nthreads, instr_time total_time, instr_time conn_total_time, int64 total_latencies, int64 total_sqlats, @@ -2213,13 +2213,13 @@ printResults(int ttype, int normal_xacts, int nclients, if (duration = 0) { printf(number of transactions per client: %d\n, nxacts); - printf(number of transactions actually processed: %d/%d\n, - normal_xacts, nxacts * nclients); + printf(number of transactions actually processed: INT64_FORMAT / INT64_FORMAT \n, + normal_xacts, (int64)nxacts * nclients); } else { printf(duration: %d s\n, duration); - printf(number of transactions actually processed: %d\n, + printf(number of transactions actually processed: INT64_FORMAT \n, normal_xacts); } @@ -2359,7 +2359,7 @@ main(int argc, char **argv) instr_time start_time; /* start up time */ instr_time total_time; instr_time conn_total_time; - int total_xacts = 0; + int64 total_xacts = 0; int64 total_latencies = 0; int64 total_sqlats = 0; int64 throttle_lag = 0; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote: Hello, Since my last proposal didn't get any strong rebuttal, please find attached a more complete version of the IMPORT FOREIGN SCHEMA statement. Thanks! Please to send future patches to this thread so people can track them in their mail. I tried to follow the SQL-MED specification as closely as possible. This adds discoverability to foreign servers. The structure of the statement as I understand it is simple enough: IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO | EXCEPT) table_list ] INTO local_schema. The import_foreign_schema patch adds the infrastructure, and a new FDW routine: typedef List *(*ImportForeignSchema_function) (ForeignServer *server, ImportForeignSchemaStmt * parsetree); This routine must return a list of CreateForeignTableStmt mirroring whatever tables were found on the remote side, which will then be executed. The import_foreign_schema_postgres_fdw patch proposes an implementation of this API for postgres_fdw. It will import a foreign schema using the right types as well as nullable information. In the case of PostgreSQL, the right types are obvious until there's a user-defined one. What do you plan to do in that case? Regarding documentation, I don't really know where it should have been put. If I missed something, let me know and I'll try to correct it. It's not exactly something you missed, but I need to bring it up anyway before we go too far. The standard missed two crucial concepts when this part of it was written: 1. No single per-database-type universal type mapping can be correct. People will have differing goals for type mapping, and writing a whole new FDW for each of those goals is, to put it mildly, wasteful. I will illustrate with a concrete and common example. MySQL's datetime type encourages usages which PostgreSQL's corresponding type, timestamptz, simply disallows, namely '-00-00 00:00:00' as its idea of UNKNOWN or NULL. One way PostgreSQL's mapping could work is to map it to TEXT, which would preserve the strings exactly and be in some sense an identity map. It would also make the type somewhat useless in its original intended form. Another one would map the type is to a composite data type mysql_datetime(tstz timestamptz, is_wacky boolean) which would capture, for example, ('2014-04-01 00:00:00+00', false) for the UTC start of April Fools' Day this year, and (NULL, true) for '-00-00 00:00:00'. There are doubtless others, and there is no principled way to assign any one of them as universally correct. This brings me to the next crucial concept the standard missed: 2. The correct mapping may not be the identity, and furthermore, the inbound and outbound mappings might in general not be mere inversions of each other. MySQL (no aspersions intended) again provides a concrete example with its unsigned integer types. Yes, it's possible to create a domain over INT8 which simulates UINT4, a domain over NUMERIC which simulates UINT8, etc., but again this process's correctness depends on circumstances. To address these problems, I propose the following: - We make type mappings settable at the level of: - FDW - Instance (a.k.a. cluster) - Database - Schema - Table - Column using the existing ALTER command and some way of spelling out how a remote type maps to a local type. 
This would consist of: - The remote type - The local type to which it maps - The inbound transformation (default identity) - The outbound transformation (default identity) At any given level, the remote type would need to be unique. To communicate this to the system, we either invent new syntax, with all the hazards attendant thereto, or we could use JSON or similar serialization. ALTER FOREIGN TABLE foo ADD TYPE MAPPING FROM datetime TO TEXT WITH ( INBOUND TRANSFORMATION IDENTITY, OUTBOUND TRANSFORMATION IDENTITY ) /* Ugh!!! */ vs. ALTER FOREIGN TABLE foo ADD (mapping '{ datetime: text, inbound: IDENTITY, outbound: IDENTITY }') Each FDW would have some set of default mappings and some way to override them as above. What say? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: pgbench / int64 instead of int for xact count
Tomas Vondra t...@fuzzy.cz writes: Should we change the type of nxacts to int64 (thus allowing '-t' with 64-bit integers), or just the overflow in the printf call? I don't find it very practical to use -t with values not fitting into 32-bits (the -T seems better to do that), so I'm inclined to just fix the printf. I concur: I don't see people setting -t higher than INT_MAX, but a -t times -c product exceeding INT_MAX seems more plausible. Patch v3 attached. Looks good, pushed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sending out a request for more buildfarm animals?
Andres Freund and...@2ndquadrant.com writes: On 2014-05-25 01:02:25 +0200, Tomas Vondra wrote: The cache invalidation bug was apparently fixed, but we're still getting failures (see for example markhor): http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=markhorbr=HEAD I see there's a transaction (COMMIT+BEGIN) - is this caused by the extremely long runtimes? Yes, that's the reason. Normally the test doesn't trigger autovacuum at all, but if it's running for a *long* time it can. I haven't yet figured out a good way to deal with that. Any way to make the test print only WAL entries arising from the foreground transaction? If an autovac run can trigger this failure, then I would think it would happen sometimes, probabilistically, even when the test runtime wasn't all that long. That would be very unhappy-making, eg for packagers who would like build runs to reliably work the first time. So I think this is important even without the desire to run CLOBBER_CACHE regression tests. Another idea is to provide a variant expected file, but that seems a bit fragile: if you can get one extra transaction, why not two? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit : On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote: Hello, Since my last proposal didn't get any strong rebuttal, please find attached a more complete version of the IMPORT FOREIGN SCHEMA statement. Thanks! Please to send future patches to this thread so people can track them in their mail. I'll do. I didn't for the previous one because it was a few months ago, and no patch had been added to the commit fest. I tried to follow the SQL-MED specification as closely as possible. This adds discoverability to foreign servers. The structure of the statement as I understand it is simple enough: IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO | EXCEPT) table_list ] INTO local_schema. The import_foreign_schema patch adds the infrastructure, and a new FDW routine: typedef List *(*ImportForeignSchema_function) (ForeignServer *server, ImportForeignSchemaStmt * parsetree); This routine must return a list of CreateForeignTableStmt mirroring whatever tables were found on the remote side, which will then be executed. The import_foreign_schema_postgres_fdw patch proposes an implementation of this API for postgres_fdw. It will import a foreign schema using the right types as well as nullable information. In the case of PostgreSQL, the right types are obvious until there's a user-defined one. What do you plan to do in that case ? The current implementation fetches the types as regtype, and when receiving a custom type, two things can happen: - the type is defined locally: everything will work as expected - the type is not defined locally: the conversion function will fail, and raise an error of the form: ERROR: type schema.typname does not exist Should I add that to the regression test suite ? Regarding documentation, I don't really know where it should have been put. If I missed something, let me know and I'll try to correct it. It's not exactly something you missed, but I need to bring it up anyway before we go too far. The standard missed two crucial concepts when this part of it was written: 1. No single per-database-type universal type mapping can be correct. People will have differing goals for type mapping, and writing a whole new FDW for each of those goals is, to put it mildly, wasteful. I will illustrate with a concrete and common example. MySQL's datetime type encourages usages which PostgreSQL's corresponding type, timestamptz, simply disallows, namely '-00-00 00:00:00' as its idea of UNKNOWN or NULL. One way PostgreSQL's mapping could work is to map it to TEXT, which would preserve the strings exactly and be in some sense an identity map. It would also make the type somewhat useless in its original intended form. Another one would map the type is to a composite data type mysql_datetime(tstz timestamptz, is_wacky boolean) which would capture, for example, ('2014-04-01 00:00:00+00', false) for the UTC start of April Fools' Day this year, and (NULL, true) for '-00-00 00:00:00'. There are doubtless others, and there is no principled way to assign any one of them as universally correct. This brings me to the next crucial concept the standard missed: 2. The correct mapping may not be the identity, and furthermore, the inbound and outbound mappings might in general not be mere inversions of each other. MySQL (no aspersions intended) again provides a concrete example with its unsigned integer types. 
Yes, it's possible to create a domain over INT8 which simulates UINT4, a domain over NUMERIC which simulates UINT8, etc., but again this process's correctness depends on circumstances. To address these problems, I propose the following: - We make type mappings settable at the level of: - FDW - Instance (a.k.a. cluster) - Database - Schema - Table - Column using the existing ALTER command and some way of spelling out how a remote type maps to a local type. This would consist of: - The remote type - The local type to which it maps - The inbound transformation (default identity) - The outbound transformation (default identity) At any given level, the remote type would need to be unique. To communicate this to the system, we either invent new syntax, with all the hazards attendant thereto, or we could use JSON or similar serialization. ALTER FOREIGN TABLE foo ADD TYPE MAPPING FROM datetime TO TEXT WITH ( INBOUND TRANSFORMATION IDENTITY, OUTBOUND TRANSFORMATION IDENTITY ) /* Ugh!!! */ vs. ALTER FOREIGN TABLE foo ADD (mapping '{ datetime: text, inbound: IDENTITY, outbound: IDENTITY }') Each FDW would have some set of default mappings and some way to override them as
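For concreteness, a minimal sketch of the statement structure described above, plus the domain-over-INT8 trick just mentioned; all object names here are hypothetical:

```sql
-- Import two tables from a remote schema into a local one, following the
-- proposed IMPORT FOREIGN SCHEMA structure.
IMPORT FOREIGN SCHEMA public
    FROM SERVER remote_pg
    LIMIT TO (orders, customers)
    INTO local_mirror;

-- A domain over int8 that simulates an unsigned 32-bit integer, as discussed;
-- whether this mapping is "correct" still depends on the application.
CREATE DOMAIN uint4_like AS int8
    CHECK (VALUE >= 0 AND VALUE <= 4294967295);
```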
Re: [HACKERS] 9.4 btree index corruption
On Sun, May 25, 2014 at 7:16 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/21/2014 10:22 PM, Jeff Janes wrote: Testing partial-write crash-recovery in 9.4 (e12d7320ca494fd05134847e30) with foreign keys, I found some btree index corruption. ... https://drive.google.com/folderview?id=0Bzqrh1SO9FcENWd6ZXlwVWpxU0E; usp=sharing I downloaded the data directory and investigated. I got this message when I started it up: 20392 2014-05-25 05:51:37.835 PDT:ERROR: right sibling 4044 of block 460 is not next child 23513 of block 86458 in index foo_p_id_idx 20392 2014-05-25 05:51:37.835 PDT:CONTEXT: automatic vacuum of table jjanes.public.foo Interestingly, it's complaining about parent page 86458, while yours claimed it was 1264. I don't know why; perhaps a huge number of insertions happened after that error, causing the parent level pages to be split, moving the downlinks it complains about to the right. Did you continue running the test after that error occurred? Yes, I didn't set it up to abort upon vacuum errors, so it continued to run until it reached the 1,000,000 wrap around limit. Between when the vacuums started failing and when it reached wrap around limit, it seemed to run normally (other than the increasing bloat and non-advancement of frozenxid). I only noticed the ERROR when I was looking through the log file in postmortem. Looking back, I see 10 errors with 1264, then it switched to 86458. I didn't notice the change and reported only the first ERROR message. I'll apply your patch and see what happens, but 90 further hours of testing gave no more occurrences of the error so I'm afraid that not seeing errors is not much evidence that the error was fixed. So I'll have to rely on your knowledge of the code. This is what the tree looks like around those pages: Level 1: +-+ +-+ +-+ | Blk 1264| | Blk 160180 | | Blk 86458 | | | | | | | | Downlinks: | - | Downlinks: | - | Downlinks: | | ... | | ... | | 1269| | | | | | 460| | | | | |23513| +-+ +-+ +-+ Is this done with the pageinspect extension? With the attached patch, I was able to get past that error, but when VACUUM reaches the end, I got this: jjanes=# vacuum foo; ERROR: database is not accepting commands to avoid wraparound data loss in database jjanes HINT: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions. That seems like an unrelated issue, I'll start a new thread about that. Yeah, that's expected. Since vacuum failures don't abort the test, it ran on until something else stopped it. Many thanks for the testing, Jeff! You're welcome. If only I was as good at fixing things as at breaking them. Cheers, Jeff
[HACKERS] Spreading full-page writes
Here's an idea I tried to explain to Andres and Simon at the pub last night, on how to reduce the spikes in the amount of WAL written at beginning of a checkpoint that full-page writes cause. I'm just writing this down for the sake of the archives; I'm not planning to work on this myself. When you are replaying a WAL record that lies between the Redo-pointer of a checkpoint and the checkpoint record itself, there are two possibilities: a) You started WAL replay at that checkpoint's Redo-pointer. b) You started WAL replay at some earlier checkpoint, and are already in a consistent state. In case b), you wouldn't need to replay any full-page images, normal differential WAL records would be enough. In case a), you do, and you won't be consistent until replaying all the WAL up to the checkpoint record. We can exploit those properties to spread out the spike. When you modify a page and you're about to write a WAL record, check if the page has the BM_CHECKPOINT_NEEDED flag set. If it does, compare the LSN of the page against the *previous* checkpoints redo-pointer, instead of the one's that's currently in-progress. If no full-page image is required based on that comparison, IOW if the page was modified and a full-page image was already written after the earlier checkpoint, write a normal WAL record without full-page image and set a new flag in the buffer header (BM_NEEDS_FPW). Also set a new flag on the WAL record, XLR_FPW_SKIPPED. When checkpointer (or any other backend that needs to evict a buffer) is about to flush a page from the buffer cache that has the BM_NEEDS_FPW flag set, write a new WAL record, containing a full-page-image of the page, before flushing the page. Here's how this works out during replay: a) You start WAL replay from the latest checkpoint's Redo-pointer. When you see a WAL record that's been marked with XLR_FPW_SKIPPED, don't replay that record at all. It's OK because we know that there will be a separate record containing the full-page image of the page later in the stream. b) You are continuing WAL replay that started from an earlier checkpoint, and have already reached consistency. When you see a WAL record that's been marked with XLR_FPW_SKIPPED, replay it normally. It's OK, because the flag means that the page was modified after the earlier checkpoint already, and hence we must have seen a full-page image of it already. When you see one of the WAL records containing a separate full-page-image, ignore it. This scheme make the b-case behave just as if the new checkpoint was never started. The regular WAL records in the stream are identical to what they would've been if the redo-pointer pointed to the earlier checkpoint. And the additional FPW records are simply ignored. In the a-case, it's not be safe to replay the records marked with XLR_FPW_SKIPPED, because they don't contain FPWs, and you have all the usual torn-page hazards that comes with that. However, the separate FPW records that come later in the stream will fix-up those pages. Now, I'm sure there are issues with this scheme I haven't thought about, but I wanted to get this written down. Note this does not reduce the overall WAL volume - on the contrary - but it ought to reduce the spike. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
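A compilable toy model of the buffer-side decision described above may make the scheme easier to follow; the flag names mirror the proposal, but everything else (types, return convention) is invented for this sketch and is not actual backend code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Flag names follow the proposal; the bit values are arbitrary here. */
#define BM_CHECKPOINT_NEEDED  0x01
#define BM_NEEDS_FPW          0x02
#define XLR_FPW_SKIPPED       0x01

/*
 * Decide how to WAL-log a modification to a buffer while a checkpoint is in
 * progress.  Returns the record flag bits to set, and reports via *fpw_now
 * whether a full-page image must be included in this record.  May set
 * BM_NEEDS_FPW in *buf_flags, telling the eventual flush of the buffer to
 * emit a standalone full-page-image record first.
 */
static int
choose_fpw_strategy(unsigned *buf_flags, XLogRecPtr page_lsn,
                    XLogRecPtr prev_ckpt_redo, bool *fpw_now)
{
    int rec_flags = 0;

    *fpw_now = false;
    if (*buf_flags & BM_CHECKPOINT_NEEDED)
    {
        if (page_lsn > prev_ckpt_redo)
        {
            /*
             * The page was already WAL-logged (with a full-page image) after
             * the previous checkpoint: defer the new FPI until buffer flush.
             */
            rec_flags |= XLR_FPW_SKIPPED;
            *buf_flags |= BM_NEEDS_FPW;
        }
        else
            *fpw_now = true;    /* first touch: include the FPI as usual */
    }
    return rec_flags;
}

int
main(void)
{
    unsigned flags = BM_CHECKPOINT_NEEDED;
    bool fpw_now;
    int rec = choose_fpw_strategy(&flags, 2000, 1000, &fpw_now);

    printf("rec_flags=%d fpw_at_flush=%d fpw_now=%d\n",
           rec, (flags & BM_NEEDS_FPW) != 0, fpw_now);
    return 0;
}
```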
Re: [HACKERS] 9.4 btree index corruption
On 05/25/2014 05:45 PM, Jeff Janes wrote: On Sun, May 25, 2014 at 7:16 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/21/2014 10:22 PM, Jeff Janes wrote: Testing partial-write crash-recovery in 9.4 (e12d7320ca494fd05134847e30) with foreign keys, I found some btree index corruption. ... https://drive.google.com/folderview?id=0Bzqrh1SO9FcENWd6ZXlwVWpxU0E; usp=sharing I downloaded the data directory and investigated. I got this message when I started it up: 20392 2014-05-25 05:51:37.835 PDT:ERROR: right sibling 4044 of block 460 is not next child 23513 of block 86458 in index foo_p_id_idx 20392 2014-05-25 05:51:37.835 PDT:CONTEXT: automatic vacuum of table jjanes.public.foo Interestingly, it's complaining about parent page 86458, while yours claimed it was 1264. I don't know why; perhaps a huge number of insertions happened after that error, causing the parent level pages to be split, moving the downlinks it complains about to the right. Did you continue running the test after that error occurred? Yes, I didn't set it up to abort upon vacuum errors, so it continued to run until it reached the 1,000,000 wrap around limit. Between when the vacuums started failing and when it reached wrap around limit, it seemed to run normally (other than the increasing bloat and non-advancement of frozenxid). I only noticed the ERROR when I was looking through the log file in postmortem. Looking back, I see 10 errors with 1264, then it switched to 86458. I didn't notice the change and reported only the first ERROR message. Ok, that explains the difference in the error message. I'll apply your patch and see what happens, but 90 further hours of testing gave no more occurrences of the error so I'm afraid that not seeing errors is not much evidence that the error was fixed. So I'll have to rely on your knowledge of the code. I've pushed the fix. This is what the tree looks like around those pages: Level 1: +-+ +-+ +-+ | Blk 1264| | Blk 160180 | | Blk 86458 | | | | | | | | Downlinks: | - | Downlinks: | - | Downlinks: | | ... | | ... | | 1269| | | | | | 460| | | | | |23513| +-+ +-+ +-+ Is this done with the pageinspect extension? Yeah, with some more tooling on top of it. The elegant diagram I drew by hand. Thanks again! BTW, it would be useful to keep the WAL logs for as long as you have disk space. I'm not sure if that would've helped in this case, but it can be really useful to see all the actions that were done to the page before something goes wrong. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
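For readers wanting to reproduce this kind of inspection, the pageinspect functions referenced above can be queried directly; the index name and block number below are the ones from the error report:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Overall shape of the btree: root block, number of levels, etc.
SELECT * FROM bt_metap('foo_p_id_idx');

-- Per-page summary: level, item count, left/right siblings.
SELECT * FROM bt_page_stats('foo_p_id_idx', 86458);

-- Individual items; on an internal page these are the downlinks.
SELECT itemoffset, ctid, itemlen
FROM bt_page_items('foo_p_id_idx', 86458);
```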
[HACKERS] Could not finish anti-wraparound VACUUM when stop limit is reached
On Sunday, May 25, 2014, Heikki Linnakangas hlinnakan...@vmware.comjavascript:_e(%7B%7D,'cvml','hlinnakan...@vmware.com'); wrote: While debugging the B-tree bug that Jeff Janes reported ( http://www.postgresql.org/message-id/CAMkU=1y=VwF07Ay+Cpqk_ 7fpihrctmssv9y99sbghitkxpb...@mail.gmail.com), a new issue came up: If you reach the xidStopLimit, and try to run VACUUM, it fails with error: jjanes=# vacuum; ERROR: database is not accepting commands to avoid wraparound data loss in database jjanes HINT: Stop the postmaster and vacuum that database in single-user mode. You might also need to commit or roll back old prepared transactions. This problem also afflicted me in 9.3 and 9.2 (and probably existed further back too). I figured it was mostly a barrier to more effective testing, but it would be nice to have it fixed. But I don't understand how you encountered this. I only ran into it when the vacuum had already been started, but not yet completed, by the time the limit was reached. Once it is already reached, how do you even get the vacuum to start? Doesn't it error out right at the beginning. Jeff's database seems to have wrapped around already, because after fixing the above, I get this: Do you have the patch to fix this? jjanes=# vacuum; WARNING: some databases have not been vacuumed in over 2 billion transactions DETAIL: You might have already suffered transaction-wraparound data loss. VACUUM This is odd. When I apply your patch from the other thread to fix the vacuum, and then start up in single-user mode, I can run vacuum to completion and re-open the database. When I first start it up, it says it needs to be vacuumed within 999,935 transactions. There is no indication that it has already suffered a wrap around, just that it was about to do so. We do not truncate clog when wraparound has already happened, so we never get past that point. Jeff advanced XID counter aggressively with some custom C code, so hitting the actual wrap-around is a case of don't do that. Still, the case is quite peculiar: pg_controldata says that nextXid is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So ISTM he managed to not just wrap around, but execute 2 billion more transactions after the wraparound and reach datfrozenxid again. I'm not sure how that happened. If it had actually undergone an undetected wraparound, wouldn't data be disappearing and appearing inappropriately? I think the testing harness should have detected that inconsistency. (Also, the max setting for JJ_xid during the test run was 40, so I don't think it could have blown right past the 1,000,000 safety margin and out the other side without triggering a shutdown). Cheers, Jeff
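For anyone checking how close a cluster is to these limits, the numbers under discussion are visible from SQL:

```sql
-- Per-database distance from the frozen horizon; large ages approach the
-- 2-billion wraparound limit.
SELECT datname, datfrozenxid, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- Current transaction counter, comparable with pg_controldata's nextXid.
SELECT txid_current();
```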
Re: [HACKERS] pg_upgrade fails: Mismatch of relation OID in database 8.4 - 9.3
On 5/25/14, 11:44 AM, Andres Freund wrote: Hi, On 2014-05-23 08:23:57 -0600, Jeff Ross wrote: UDB=# \x Expanded display is on. UDB=# SELECT attrelid::regclass, attname, attnum, attlen, * FROM pg_attribute WHERE attrelid = 'masterairportlist'::regclass ORDER BY attnum ASC; UDB=# [ RECORD 1 ]-+-- ... A quick sum over the returned values seems to indicate that it's too large to not have a toast table. Adding up attlen and atttypmod gives a value of 1283. Considering that there additionally has to be VARHDRSZ space for the varlena header and that the maximum storage size for a varchar(n) is n * pg_encoding_max_length(GetDatabaseEncoding()) (usually 4) this seems to indicate that bad things[tm] have been done to the database. I suggest you write a script that does a 'ALTER TABLE $tbl ADD COLUMN toastme text; ALTER TABLE $tbl DROP COLUMN toastme' for all tables. Greetings, Andres Freund Hi Andres, Yes, that is exactly what I will do before our next test migration. I'd already started on the script since I could not see any downside to adding a column big enough to force a toast table and then dropping it, exactly as Bruce and you suggest and especially if that will let me use pg_upgrade rather than the traditional dump/restore. Could a check like this be added to pg_upgrade? Is there a downside to adding a column big enough to force a toast table and then dropping it for any table that is too large not to have a toast table but doesn't? Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
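A sketch of such a script, run against the old cluster before pg_upgrade with its output fed back into psql; the toastme column name follows Andres' suggestion, while the reltoastrelid filter (restricting it to tables that currently lack a TOAST table) is an addition here rather than part of his advice:

```sql
-- Generate ADD/DROP COLUMN pairs that force creation of a TOAST table.
SELECT 'ALTER TABLE ' || c.oid::regclass ||
       ' ADD COLUMN toastme text; ALTER TABLE ' || c.oid::regclass ||
       ' DROP COLUMN toastme;'
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.reltoastrelid = 0
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```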
Re: [HACKERS] pg_recvlogical not accepting -I to specify start LSN position
On Mon, May 26, 2014 at 2:01 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-05-25 22:35:24 +0900, Michael Paquier wrote: Attached patch corrects that, reshuffling at the same time the option letters parsed with getopt_long in alphabetical order. Hm. Not a big fan of this in isolation. In the attached patch I've reordered the options to all be ordered alphabetically, but only inside the section they are in --help. What do you think? That's fine as well. Thanks. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
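For context, the intended usage being fixed here looks roughly like the following; the slot and database names are placeholders, and the exact flag spellings may differ in the beta under discussion:

```sh
# Create a logical replication slot, then stream changes starting at an LSN.
pg_recvlogical -d mydb -S test_slot --create-slot
pg_recvlogical -d mydb -S test_slot --start -I 0/1000000 -f -
```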
Re: [HACKERS] [PATCH] Replacement for OSSP-UUID for Linux and BSD
Hi Tom, thanks for the feedback. On 25/05/2014 21:10, Tom Lane wrote: Matteo Beccati p...@beccati.com writes: here's the latest version of my uuid changes patch, according to proposal (2) from Tom in the thread about OSSP-UUID[1]. Hmm ... this is not actually what I had in mind. Unless I'm misreading the patch, this nukes the uuid-ossp extension entirely in favor of a new extension uuid (providing the same SQL functions with a different underlying implementation). I don't think this works from the standpoint of providing compatibility for users of the existing extension. In particular, it'd break pg_upgrade (because of the change of the .so library name) as well as straight pg_dump upgrades (which would expect CREATE EXTENSION uuid-ossp to work). Not to mention application code that might expect CREATE EXTENSION uuid-ossp to still work. Another objection is that for people for whom OSSP uuid still works fine, this is forcing them to adopt a new implementation whose compatibility is as yet unproven. What I'd rather do is preserve contrib/uuid-ossp with the same extension and .so name, but with two configure options that select different underlying implementations. Sure, that makes sense. I wasn't actually sure it was ok to keep the OSSP brand even though the extensions didn't use the oosp library, hence the renaming. But I do agree upgrades wouldn't be very easy if we don't. In the long run it'd be nice to migrate away from the uuid-ossp extension name, mostly because of the poorly-chosen use of a dash in the name. But I'm not sure how we do that without breaking backwards compatibility, and anyway it's an entirely cosmetic thing that we can worry about later. Anyhow, doing it like that seems like it ought to be a pretty straightforward refactoring of your patch. I could pursue that, or you can. I do have a system with the ossp library installed: I'd be happy to give it a try tomorrow morning my time, depending on my workload. I'll keep you posted! Cheers -- Matteo Beccati Development Consulting - http://www.beccati.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
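Whatever implementation ends up underneath, the compatibility goal stated above means that existing dumps and application code like the following should keep working unchanged:

```sql
-- The existing extension name (dash and all) must remain valid.
CREATE EXTENSION "uuid-ossp";
SELECT uuid_generate_v1();
SELECT uuid_generate_v4();
```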
[HACKERS] Fwd: Typo fixes in Solution.pm, part of MSVC scripts
Hi all, I noticed a couple of typos in Solution.pm, fixed by the patch attached. Those things have been introduced by commits cec8394 (visual 2013 support) and 0a78320 (latest pgident run, not actually a problem of pgident, because of a misuse of commas). The tab problems should not be backpatched to 9.3, but the use of commas instead of semicolons should be. Regards, -- Michael diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm index 283d399..c0d7f38 100644 --- a/src/tools/msvc/Solution.pm +++ b/src/tools/msvc/Solution.pm @@ -747,13 +747,13 @@ sub new my $self = $classname-SUPER::_new(@_); bless($self, $classname); - $self-{solutionFileVersion} = '12.00'; - $self-{vcver}= '12.00'; - $self-{visualStudioName} = 'Visual Studio 2013'; - $self-{VisualStudioVersion} = '12.0.21005.1', - $self-{MinimumVisualStudioVersion} = '10.0.40219.1', + $self-{solutionFileVersion}= '12.00'; + $self-{vcver} = '12.00'; + $self-{visualStudioName} = 'Visual Studio 2013'; + $self-{VisualStudioVersion}= '12.0.21005.1'; + $self-{MinimumVisualStudioVersion} = '10.0.40219.1'; - return $self; + return $self; } sub GetAdditionalHeaders -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IMPORT FOREIGN SCHEMA statement
On Sun, May 25, 2014 at 11:23:41PM +0200, Ronan Dunklau wrote: Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit : On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote: Hello, Since my last proposal didn't get any strong rebuttal, please find attached a more complete version of the IMPORT FOREIGN SCHEMA statement. Thanks! Please to send future patches to this thread so people can track them in their mail. I'll do. I didn't for the previous one because it was a few months ago, and no patch had been added to the commit fest. Thanks for adding this one :) The import_foreign_schema_postgres_fdw patch proposes an implementation of this API for postgres_fdw. It will import a foreign schema using the right types as well as nullable information. In the case of PostgreSQL, the right types are obvious until there's a user-defined one. What do you plan to do in that case ? The current implementation fetches the types as regtype, and when receiving a custom type, two things can happen: - the type is defined locally: everything will work as expected - the type is not defined locally: the conversion function will fail, and raise an error of the form: ERROR: type schema.typname does not exist Should I add that to the regression test suite ? Yes. In the easy case of PostgreSQL, you might also be able to establish whether the UDT in the already defined locally case above is identical to the one defined remotely, but I don't think it's possible even in principle for non-PostgreSQL remote systems, possibly not even for ones with non-identical architecture, PostgreSQL major version, etc. Regarding documentation, I don't really know where it should have been put. If I missed something, let me know and I'll try to correct it. It's not exactly something you missed, but I need to bring it up anyway before we go too far. The standard missed two crucial concepts when this part of it was written: 1. No single per-database-type universal type mapping can be correct. [snip] To address these problems, I propose the following: - We make type mappings settable at the level of: - FDW - Instance (a.k.a. cluster) - Database - Schema - Table - Column using the existing ALTER command and some way of spelling out how Oops. Forgot to include CREATE in the above. a remote type maps to a local type. This would consist of: - The remote type - The local type to which it maps - The inbound transformation (default identity) - The outbound transformation (default identity) At any given level, the remote type would need to be unique. To communicate this to the system, we either invent new syntax, with all the hazards attendant thereto, or we could use JSON or similar serialization. ALTER FOREIGN TABLE foo ADD TYPE MAPPING FROM datetime TO TEXT WITH ( INBOUND TRANSFORMATION IDENTITY, OUTBOUND TRANSFORMATION IDENTITY ) /* Ugh!!! */ Ugh!!! means I don't think we should do it this way. vs. ALTER FOREIGN TABLE foo ADD (mapping '{ datetime: text, inbound: IDENTITY, outbound: IDENTITY }') Each FDW would have some set of default mappings and some way to override them as above. I understand your points, but I'm not really comfortable with the concept, unless there is something that I missed. My poor communication ability might have a lot to do with it. I assure you my explanation would have been even worse if I had tried it in French, though. :P We can already support this use case through specific-fdw options. Should I add that to postgres_fdw ? 
I believe the capability belongs in our FDW API with the decision of whether to implement it up to FDW authors. They know (or should know) how to throw ERRCODE_FEATURE_NOT_SUPPORTED. Additionally, I don't really see how that would be useful in a general case. With an in-core defined meaning of type transformation, any FDW that doesn't fit exactly into the model would have a hard time. For example, what happens if an FDW is only ever capable of returning text ? That's actually the case where it's most important to have the feature all the way down to the column level. Or if a mapping can only be set at the server or FDW model because it depends on some connection parameter ? ERRCODE_FEATURE_NOT_SUPPORTED. The bulk of the code for managing type mappings would be FDW-specific anyway. The part that actually does the transformations would necessarily be part of each FDW. I omitted opining on whether such transformations should be assumed to be local or remote because I can imagine cases where only local (or only remote) could be correct. What you're proposing looks like a universal option, with a specific syntax,
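As a sketch of the "not supported" escape hatch mentioned above, an FDW could simply raise the error from its import callback; this follows the callback signature quoted earlier in the thread (not necessarily the final committed API), and the FDW name is made up:

```c
#include "postgres.h"
#include "foreign/foreign.h"
#include "nodes/parsenodes.h"

/*
 * Hypothetical my_fdw callback: refuse IMPORT FOREIGN SCHEMA outright.
 * An FDW that cannot map a remote type sensibly could bail out the same
 * way on a per-type basis instead.
 */
static List *
my_fdwImportForeignSchema(ForeignServer *server,
                          ImportForeignSchemaStmt *stmt)
{
    ereport(ERROR,
            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
             errmsg("IMPORT FOREIGN SCHEMA is not supported by my_fdw")));
    return NIL;                 /* not reached; keeps the compiler quiet */
}
```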
Re: [HACKERS] relaying errors from background workers
On Thu, May 22, 2014 at 9:51 AM, Robert Haas robertmh...@gmail.com wrote: Suppose a user backend starts a background worker for some purpose; the background worker dies with an error. The infrastructure we have today is sufficient for the user backend to discover that the worker backend has died, but not why. There might be an error in the server log, but the error information won't be transmitted back to the user backend in any way. I think it would be nice to fix that. I also think it would be nice to be able to relay not only errors, but also messages logged via ereport() or elog() at lower log levels (WARNING, NOTICE, INFO, DEBUG). The design I have in mind is to teach elog.c how to write such messages to a shm_mq. This is in fact one of the major use cases I had in mind when I designed the shm_mq infrastructure, because it seems to me that almost anything we want to do in parallel is likely to want to do this. Even aside from parallelism, it's not too hard to imagine wanting to use background workers to launch a job in the background and then come back later and see what happened. If there was an error, you're going to want go know specifically what went wrong, not just that something went wrong. The main thing I'm not sure about is how to format the message that we write to the shm_mq. One option is to try to use the good old FEBE protocol. This doesn't look entirely straightforward, because send_message_to_frontend() assembles the message using pq_sendbyte() and pq_sendstring(), and then sends it out to the client using pq_endmessage() and pq_flush(). This infrastructure assumes that the connection to the frontend is a socket. It doesn't seem impossible to kludge that infrastructure to be able to send to either a socket or a shm_mq, but I'm not sure whether it's a good idea. I think it will be better to keep assembling part of message similar to current and then have different way of communicating to backend. This will make rethrowing of message to client simpler. Already we have mechanism for reporting to client and server (send_message_to_server_log()/send_message_to_frontend()), so devising a similar way for communicating with backend seems to be a plausible way. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
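A minimal worker-side sketch of the shm_mq direction discussed above, assuming the leader has already created a queue in a dynamic shared memory segment and the worker has attached to it as the sender; real error relaying would frame the individual error fields rather than a bare string:

```c
#include "postgres.h"
#include "storage/shm_mq.h"

/* Attached during worker startup; setup code omitted in this sketch. */
static shm_mq_handle *error_queue;

/*
 * Forward a message produced by ereport()/elog() to the user backend.
 * nowait = false: block until the leader has made room in the queue.
 */
static void
report_to_leader(const char *message)
{
    (void) shm_mq_send(error_queue, strlen(message) + 1, message, false);
}
```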
Re: [HACKERS] pg_upgrade fails: Mismatch of relation OID in database 8.4 - 9.3
Hi, On 2014-05-25 16:59:24 -0600, Jeff Ross wrote: Could a check like this be added to pg_upgrade? Is there a downside to adding a column big enough to force a toast table and then dropping it for any table that is too large not to have a toast table but doesn't? It can take time and permanently 'uses up' a attribute number. So you'd add costs for everyone for the few people that decided to do something dangerous and unsupported... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 btree index corruption
Hi, On 2014-05-25 14:45:38 -0700, Jeff Janes wrote: You're welcome. If only I was as good at fixing things as at breaking them. At the moment there seems to be a bigger shortage of folks being good at breaking stuff - before the release! - than of people fixing the resulting breakage... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL access to database attributes
* Tom Lane (t...@sss.pgh.pa.us) wrote: I don't really object to doing an unlocked check for another such database, but I'm not convinced that additional locking to try to prevent a race is worth its keep. +1 on the nannyism, and +1 to ignoring the race. Thanks, Stephen
Re: [HACKERS] Sending out a request for more buildfarm animals?
On 2014-05-25 16:58:39 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-05-25 01:02:25 +0200, Tomas Vondra wrote: The cache invalidation bug was apparently fixed, but we're still getting failures (see for example markhor): http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=markhorbr=HEAD I see there's a transaction (COMMIT+BEGIN) - is this caused by the extremely long runtimes? Yes, that's the reason. Normally the test doesn't trigger autovacuum at all, but if it's running for a *long* time it can. I haven't yet figured out a good way to deal with that. Any way to make the test print only WAL entries arising from the foreground transaction? None that doesn't suck, so far :(. The least bad I can think of is toe just add a xinfo flag for such 'background' transaction commits. Don't like it much... If an autovac run can trigger this failure, then I would think it would happen sometimes, probabilistically, even when the test runtime wasn't all that long. That would be very unhappy-making, eg for packagers who would like build runs to reliably work the first time. So I think this is important even without the desire to run CLOBBER_CACHE regression tests. Agreed. Another idea is to provide a variant expected file, but that seems a bit fragile: if you can get one extra transaction, why not two? Yeah, that's not going to work. Autovac's transaction could appear at different places in the changestream. We probably don't want a expected file listing all. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers