Re: [HACKERS] GSoC 2015 - mentors, students and admins.
I would also like to participate as a student.

2015-02-16 13:28 GMT-02:00 Dmitry Dolgov 9erthali...@gmail.com:

I would like to participate as a student.

On 10 February 2015 at 03:52, Thom Brown t...@linux.com wrote:

Hi all,

Google Summer of Code 2015 is approaching. I'm intending on registering PostgreSQL again this year. Before I do that, I'd like to have an idea of how many people are interested in being either a student or a mentor. I've volunteered to be admin again, but if anyone else has a strong interest in seeing the projects through this year, please make yourself known.

Who would be willing to mentor projects this year?

Project ideas are welcome, and I've copied many from last year's submissions into this year's wiki page. Please feel free to add more (or delete any that stand no chance or are redundant): https://wiki.postgresql.org/wiki/GSoC_2015

Students can find more information about GSoC here: http://www.postgresql.org/developer/summerofcode

Thanks

Thom
[HACKERS] Query Rewrite with Postgres' materialized views
Hello,

Are there any plans for implementing query rewriting (i.e. allowing the optimizer to substitute materialized views for queries) in upcoming versions? I have recently seen this answer http://www.postgresql.org/message-id/200501041006.18735.j...@agliodbs.com, saying that query rewriting could be achieved using the rules system, but I could not figure out how. Could someone please give me some tips on how to do it? Is it possible to make a rule apply not to all SELECTs on a table but only to a specific query? Or is the trick done with another approach?

Thank you for your time,

-- Eric Grinstein Computer Engineering Student PUC-Rio
Re: [HACKERS] Perl coding error in msvc build system?
This patch has been reviewed by 4 people, resulting in 2 minor suggestions for changes (adding an /m modifier, and removing a bogus 'last'). It has 2 clear upvotes and 0 downvotes. I think it should be revised along the lines suggested and committed without further ado.

My patch actually only covered the first of the two faulty locations I pointed out. Attached is a patch that also fixes the second one. I noticed that DetermineVisualStudioVersion() is never called with an argument, so I removed that branch altogether.

diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 39e41f6..714585f 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -71,17 +71,9 @@ sub DeterminePlatform
 	my $self = shift;

 	# Examine CL help output to determine if we are in 32 or 64-bit mode.
-	$self->{platform} = 'Win32';
-	open(P, "cl /? 2>&1 |") || die "cl command not found";
-	while (<P>)
-	{
-		if (/^\/favor:.+AMD64/)
-		{
-			$self->{platform} = 'x64';
-			last;
-		}
-	}
-	close(P);
+	my $output = `cl /? 2>&1`;
+	$? >> 8 == 0 or die "cl command not found";
+	$self->{platform} = ($output =~ /^\/favor:.+AMD64/m) ? 'x64' : 'Win32';
 	print "Detected hardware platform: $self->{platform}\n";
 }

diff --git a/src/tools/msvc/VSObjectFactory.pm b/src/tools/msvc/VSObjectFactory.pm
index d255bec..b83af40 100644
--- a/src/tools/msvc/VSObjectFactory.pm
+++ b/src/tools/msvc/VSObjectFactory.pm
@@ -92,30 +92,16 @@ sub CreateProject

 sub DetermineVisualStudioVersion
 {
-	my $nmakeVersion = shift;
-
-	if (!defined($nmakeVersion))
-	{
-
-# Determine version of nmake command, to set proper version of visual studio
-# we use nmake as it has existed for a long time and still exists in current visual studio versions
-		open(P, "nmake /? 2>&1 |")
-		  || croak
-			"Unable to determine Visual Studio version: The nmake command wasn't found.";
-		while (<P>)
-		{
-			chomp;
-			if (/(\d+)\.(\d+)\.\d+(\.\d+)?$/)
-			{
-				return _GetVisualStudioVersion($1, $2);
-			}
-		}
-		close(P);
-	}
-	elsif ($nmakeVersion =~ /(\d+)\.(\d+)\.\d+(\.\d+)?$/)
+	# To determine the version of Visual Studio we use nmake, as it has
+	# existed for a long time and still exists in current Visual
+	# Studio versions.
+	my $output = `nmake /? 2>&1`;
+	$? >> 8 == 0
+	  or croak "Unable to determine Visual Studio version: The nmake command wasn't found.";
+	if ($output =~ /(\d+)\.(\d+)\.\d+(\.\d+)?$/m)
 	{
 		return _GetVisualStudioVersion($1, $2);
 	}
+	croak "Unable to determine Visual Studio version: The nmake version could not be determined.";
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: NULL checks of dereferenced pointers in picksplit method of intarray
Michael Paquier michael.paqu...@gmail.com wrote:

Coverity is pointing out that _int_split.c has unnecessary checks for dereferenced pointers in 5 places. Attached is a patch to adjust those things.

Pushed. Thanks!

Also, as far as I understood from this code, no elements manipulated are NULL, perhaps this is worth an assertion?

I'm not clear where you were thinking of, but anyway that seemed like a separate patch if we're going to do it, so I went ahead with pushing the issues Coverity flagged. The arguments to the function don't need such a check because the function is exposed to SQL with the STRICT option (but you probably already knew that). While reviewing the safety of this patch, the only place that I ran across that I felt maybe deserved an assertion was that n >= 0 near the top of copy_intArrayType(), but that seems marginal.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 02/13/2015 02:07 PM, Michael Paquier wrote: Andreas, are you planning to continue working on this patch? Peter has provided some comments that are still unanswered. Yes, but I am quite busy right now. I will try to find time some time later this week. -- Andreas Karlsson
[HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Hi,

In a lot of places in the code we have many structures doing declarations of the type foo[1] to emulate variable-length arrays. Attached is a set of patches aimed at replacing that with FLEXIBLE_ARRAY_MEMBER, to prevent potential failures that could be caused by compiler optimizations and to improve the reports of static code analyzers like Coverity (idea by Tom, patches from me):
- 0001 uses FLEXIBLE_ARRAY_MEMBER in a bunch of trivial places (at least check-world does not complain)
- 0002 does the same for catalog tables
- 0003 changes varlena in c.h. This is done as a separate patch because this needs some other modifications, as variable-length things need to be placed at the end of structures, because of:
-- rolpassword, which should be placed as the last field of pg_authid (and shouldn't there be CATALOG_VARLEN here??)
-- inv_api.c uses bytea in an internal structure without putting it at the end of the structure. For clarity I think that we should just use a bytea pointer and do a sufficient allocation for the duration of the lobj manipulation.
-- Similarly, tuptoaster.c needed to be patched for toast_save_datum

There are as well a couple of things that are not changed on purpose:
- in namespace.h for FuncCandidateList. I tried manipulating it but it resulted in an allocation overflow in PortalHeapMemory
- I don't think that the t_bits fields in htup_details.h should be updated either.

Regards,
-- Michael

From f3389a945811e1db19eb5debc9e41b84fed0a9f5 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 16 Feb 2015 02:44:10 +0900
Subject: [PATCH 1/3] First cut with FLEXIBLE_ARRAY_MEMBER

This is targeted to prevent false positive errors that static code analyzers may report by assuming that some structures have an unvarying size.
---
 contrib/cube/cubedata.h                     |  7 +++
 contrib/intarray/_int.h                     |  4 ++--
 contrib/ltree/ltree.h                       | 14 +++---
 contrib/pg_trgm/trgm.h                      |  2 +-
 src/bin/pg_dump/dumputils.h                 |  2 +-
 src/include/access/gin_private.h            |  4 ++--
 src/include/access/gist_private.h           |  5 +++--
 src/include/access/heapam_xlog.h            |  2 +-
 src/include/access/htup_details.h           |  4 ++--
 src/include/access/spgist_private.h         | 10 +-
 src/include/access/xact.h                   |  8
 src/include/c.h                             |  4 ++--
 src/include/commands/dbcommands.h           |  4 ++--
 src/include/commands/tablespace.h           |  2 +-
 src/include/executor/hashjoin.h             |  2 +-
 src/include/nodes/bitmapset.h               |  2 +-
 src/include/nodes/params.h                  |  2 +-
 src/include/nodes/tidbitmap.h               |  2 +-
 src/include/postgres.h                      |  9 +
 src/include/postmaster/syslogger.h          |  2 +-
 src/include/replication/walsender_private.h |  2 +-
 src/include/storage/bufpage.h               |  3 ++-
 src/include/storage/fsm_internals.h         |  2 +-
 src/include/storage/standby.h               |  4 ++--
 src/include/tsearch/dicts/regis.h           |  2 +-
 src/include/tsearch/dicts/spell.h           |  6 +++---
 src/include/tsearch/ts_type.h               |  6 +++---
 src/include/utils/catcache.h                |  2 +-
 src/include/utils/datetime.h                |  5 +++--
 src/include/utils/geo_decls.h               |  3 ++-
 src/include/utils/jsonb.h                   |  2 +-
 src/include/utils/relmapper.h               |  2 +-
 src/include/utils/varbit.h                  |  3 ++-
 33 files changed, 69 insertions(+), 64 deletions(-)

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index 5d44e11..3535847 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -23,11 +23,10 @@ typedef struct NDBOX
 	unsigned int header;

 	/*
-	 * Variable length array. The lower left coordinates for each dimension
-	 * come first, followed by upper right coordinates unless the point flag
-	 * is set.
+	 * The lower left coordinates for each dimension come first, followed
+	 * by upper right coordinates unless the point flag is set.
 	 */
-	double		x[1];
+	double		x[FLEXIBLE_ARRAY_MEMBER];
 } NDBOX;

 #define POINT_BIT 0x8000

diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index 7f93206..d524f0f 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -73,7 +73,7 @@ typedef struct
 {
 	int32		vl_len_;	/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } GISTTYPE;

 #define ALLISTRUE 0x04

@@ -133,7 +133,7 @@ typedef struct QUERYTYPE
 {
 	int32		vl_len_;	/* varlena header (do not touch directly!) */
 	int32		size;		/* number of ITEMs */
-	ITEM		items[1];	/* variable length array */
+	ITEM		items[FLEXIBLE_ARRAY_MEMBER];
 } QUERYTYPE;

 #define HDRSIZEQT offsetof(QUERYTYPE, items)

diff --git a/contrib/ltree/ltree.h
Re: [HACKERS] GSoC 2015 - mentors, students and admins.
I'm interested in participating in GSoC 2015 as a student. Sincerely, Ivanitskiy Ilya.
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On Sun, Feb 15, 2015 at 11:36:50AM -0500, Tom Lane wrote:

I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. That would be far easier for users to understand and manage, it would be trivial to allow specific high-value transactions to run with a laxer setting, it does not create any implementation-detail-dependent behavior that we'd be having to explain to users forevermore, and (not incidentally) it would be a lot simpler and more trustworthy to implement. There's no well-defined correlation between your setting and the net effect on database bloat, but that's true with the snapshot too old approach as well.

While we have prided ourselves on not generating query cancel messages due to snapshot-too-old, there is the downside of bloat. I understand the need to allow users to make the trade-off between bloat and query cancellation. It seems we already have a mechanism in place that allows tuning of query cancel on standbys vs. preventing standby queries from seeing old data, specifically max_standby_streaming_delay/max_standby_archive_delay. We obsessed about how users were going to react to these odd variables, but there has been little negative feedback.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 2/16/15 2:41 AM, Andres Freund wrote: Since you actually don't want cancellations for the long running reporting queries it very much might be sensible to switch to a more complicated version of HeapTupleSatisfiesVacuum() if there's longrunning queries. One that can recognize if rows are actually visible to any backend, or if there are just snapshots that see older rows. I've previously wondered how hard this would be, but I don't think it's *that* hard. And it'd be a much more general solution. Josh Berkus mentioned on IRC that they're working on improving vacuum, and I think it was something along these lines. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Issue installing doc tools on OSX
On 2/15/15 9:23 PM, David Steele wrote: That seems a bit incredible, since port should be able to resolve the dependencies by itself. I suggest that this should be reported as a bug to MacPorts. Sure, that has been my experience, but the error message was very clear. Unfortunately I did not capture the error before I changed the order and the log file was removed on the next run. How about uninstalling and reinstalling?
Re: [HACKERS] Issue installing doc tools on OSX
On Feb 16, 2015, at 23:18, Peter Eisentraut pete...@gmx.net wrote: On 2/15/15 9:23 PM, David Steele wrote: That seems a bit incredible, since port should be able to resolve the dependencies by itself. I suggest that this should be reported as a bug to MacPorts. Sure, that has been my experience, but the error message was very clear. Unfortunately I did not capture the error before I changed the order and the log file was removed on the next run.

I just tried this on OS X 10.9.5 running MacPorts 2.3.3 and a ports tree as of a few minutes ago. After uninstalling docbook, opensp and openjade, re-installing them with "sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp" completed without a hitch. It even logged: --- Computing dependencies for openjade --- Dependencies to be installed: opensp

BTW, why does the list of suggested packages include docbook-xml? I was under the impression that postgres used only the SGML version of docbook. And I previously only had the SGML version installed, and I'm pretty sure that I was able to build the documentation successfully.

best regards, Florian Pflug
Re: [HACKERS] Commit fest 2015-12 enters money time
I couldn't find operation to add new entry on the current CF. Go to the bottom of this CF: https://commitfest.postgresql.org/4/ Then click on New patch. I couldn't find the New patch link on both of IE and Chrome, even though I logged in. Did you do something special? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier Sent: Tuesday, February 17, 2015 8:34 AM To: Kohei KaiGai Cc: PostgreSQL mailing lists Subject: Re: [HACKERS] Commit fest 2015-12 enters money time On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com: On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier michael.paqu...@gmail.com wrote: In order to move on to the next CF, I am going to go through all the remaining patches and update their status accordingly. And sorry for slacking a bit regarding that. If you wish to complain, of course feel free to post messages on this thread or on the thread related to the patch itself. As all the entries have been either moved or updated, CF 2014-12 is closed, and CF 2015-02 has been changed to In Progress. I tried to add my name on the CF entry for the Join pushdown support for foreign tables patch, however, it was unintentionally marked as rejected on the last CF, even though it should be returned with feedback. https://commitfest.postgresql.org/3/20/ Is it available to move it to the current CF? I don't recall you can... I couldn't find operation to add new entry on the current CF. Go to the bottom of this CF: https://commitfest.postgresql.org/4/ Then click on New patch. 
-- Michael
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: - 0002 does the same for catalog tables - 0003 changes varlena in c.h. This is done as a separate patch because this needs some other modifications as variable-length things need to be placed at the end of structures, because of: -- rolpassword that should be placed as the last field of pg_authid (and shouldn't there be CATALOG_VARLEN here??)

Yes, there should.

-- inv_api.c uses bytea in an internal structure without putting it at the end of the structure. For clarity I think that we should just use a bytea pointer and do a sufficient allocation for the duration of the lobj manipulation.

Hm. I don't really see the problem tbh. There actually is a reason the bytea is at the beginning - the other fields *are* the data part of the bytea (which is just the varlena header).

-- Similarly, tuptoaster.c needed to be patched for toast_save_datum

I'm not a big fan of these two changes. This adds some not-that-small memory allocations to a somewhat hot path, without a big win in clarity. And it doesn't seem to have anything to do with FLEXIBLE_ARRAY_MEMBER.

There are as well a couple of things that are not changed on purpose: - in namespace.h for FuncCandidateList. I tried manipulating it but it resulted in allocation overflow in PortalHeapMemory

Did you change the allocation in FuncnameGetCandidates()? Note the - sizeof(Oid) there.

- I don't think that the t_bits fields in htup_details.h should be updated either.

Why not? Any not-broken code should already use MinHeapTupleSize and similar macros. Generally it's worthwhile to check the code you changed for sizeof(changed_struct) and similar things, because this very well could add bugs. And not all of them will result in an outright visible error like the FuncnameGetCandidates() one.
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -150,7 +150,7 @@ struct HeapTupleHeaderData

 	/* ^ - 23 bytes - ^ */

-	bits8		t_bits[1];		/* bitmap of NULLs -- VARIABLE LENGTH */
+	bits8		t_bits[1];		/* bitmap of NULLs */

 	/* MORE DATA FOLLOWS AT END OF STRUCT */
 };

@@ -579,7 +579,7 @@ struct MinimalTupleData

 	/* ^ - 23 bytes - ^ */

-	bits8		t_bits[1];		/* bitmap of NULLs -- VARIABLE LENGTH */
+	bits8		t_bits[1];		/* bitmap of NULLs */

 	/* MORE DATA FOLLOWS AT END OF STRUCT */
 };

This seems like overeager search/replace.

diff --git a/src/include/nodes/params.h b/src/include/nodes/params.h
index 5b096c5..0ad7ef0 100644
--- a/src/include/nodes/params.h
+++ b/src/include/nodes/params.h
@@ -71,7 +71,7 @@ typedef struct ParamListInfoData
 	ParserSetupHook parserSetup;	/* parser setup hook */
 	void	   *parserSetupArg;
 	int			numParams;		/* number of ParamExternDatas following */
-	ParamExternData params[1];	/* VARIABLE LENGTH ARRAY */
+	ParamExternData params[1];
 } ParamListInfoData;

Eh?

diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 87a3462..73bcefe 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -157,13 +157,13 @@ CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BK

 	/* NOTE: The following fields are not present in tuple descriptors. */

 	/* Column-level access permissions */
-	aclitem		attacl[1];
+	aclitem		attacl[FLEXIBLE_ARRAY_MEMBER];

 	/* Column-level options */
-	text		attoptions[1];
+	text		attoptions[FLEXIBLE_ARRAY_MEMBER];

 	/* Column-level FDW options */
-	text		attfdwoptions[1];
+	text		attfdwoptions[FLEXIBLE_ARRAY_MEMBER];
 #endif
 } FormData_pg_attribute;

Ugh. This really isn't C at all anymore - these structs wouldn't compile if you removed the #ifdef. Maybe we should instead add a 'textarray' type for genbki's benefit?

Generally the catalog changes are much less clearly a benefit imo.
From ad8f54cdb5776146f17d1038bb295b5f13b549f1 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 16 Feb 2015 03:53:38 +0900
Subject: [PATCH 3/3] Update varlena in c.h to use FLEXIBLE_ARRAY_MEMBER

Places using a variable-length variable not at the end of a structure are changed with a workaround, without impacting what those features do.

I vote for rejecting most of this, except a (corrected) version of the pg_authid change. Which doesn't really belong to the rest of the patch anyway ;)

#define VARHDRSZ ((int32) sizeof(int32))

diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
index e01e6aa..d8789a5 100644
--- a/src/include/catalog/pg_authid.h
+++ b/src/include/catalog/pg_authid.h
@@ -56,8 +56,10 @@
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Feb 16, 2015 at 12:00 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might fail. I don't like that. The point of having the command in the first place is to deal with concurrency issues. If it sometimes doesn't work, it's broken.

I don't like it either, although I think it wouldn't come up very often with exclusion constraints - basically, it would rarely be noticed due to the different use cases. To be honest, in suggesting this idea I was hedging against us not figuring out a solution to that problem in time. You didn't like my suggestion of dropping IGNORE entirely, either. I'll do my best to come up with something, but I'm uncomfortable that at this late stage, ON CONFLICT IGNORE support for exclusion constraints seems like a risk to the entire project. I ask that if push comes to shove you show some flexibility here. I'll try my best to ensure that you don't have to even consider committing something with a notable omission. You don't have to give me an answer to this now.

The idea of comparing the TIDs of the tuples as a tie-breaker seems most promising to me. If the conflicting tuple's TID is smaller than the inserted tuple's, super-delete and wait. Otherwise, wait without super-deletion. That's really simple. Do you see a problem with that?

No. I'll work on that, and see how it stands up to stress testing. Come to think of it, that does seem most promising.

BTW, it would be good to explain somewhere in comments or a README the term unprincipled deadlock. It's been a useful concept, and hard to grasp. If you defined it once, with examples and everything, then you could just have "See .../README" in other places that need to refer to it.

Agreed. I have described those in the revised executor README, in case you missed that. Do you think they ought to have their own section? Note that hash indexes have unprincipled deadlock issues, but no one has bothered to fix them.
Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered.

Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

Seems that way. But even if that wasn't true, it wouldn't matter, since I don't see that we have a choice. I don't see how storing the promise token in heap tuples buys us not having to involve heavyweight locking of tokens. (I think you may have made a thinko in suggesting otherwise)

It wouldn't get rid of heavyweight locks, but it would allow getting rid of the procarray changes. The inserter could generate a token, then acquire the hw-lock on the token, and lastly insert the heap tuple with the correct token.

Do you really think that's worth the disk overhead? I generally agree with the zero-overhead principle, which is that anyone not using the feature should pay no price for it (or vanishingly close to no price). Can't say that I have a good sense of the added distributed cost (if any) to be paid by adding new fields to the PGPROC struct (since the PGXACT struct was added in 9.2). Is your only concern that the PGPROC array will now have more fields, making it more complicated? Surely that's a price worth paying to avoid making these heap tuples artificially somewhat larger. You're probably left with tuples that are at least 8 bytes larger, once alignment is taken into consideration. That doesn't seem great.

That couldn't work without further HeapTupleSatisfiesDirty() logic.

Why not?

Just meant that it wasn't sufficient to check xmin == xmax, while allowing that something like that could work with extra work (e.g. the use of infomask bits)...

Ok, so we can't unconditionally always ignore tuples with xmin == xmax.
We would need an infomask bit to indicate super-deletion, and only ignore it if the bit is set.

...which is what you say here. I'm starting to think that we should bite the bullet and consume an infomask bit for this. The infomask bits are a scarce resource, but we should use them when it makes sense. It would be good for forensic purposes too, to leave a trace that a super-deletion happened.

I too think the tqual.c changes are ugly. I can't see a way around using a token lock, though - I would only consider refining the interface by which a waiter becomes aware that it must wait on the outcome of the inserting xact's speculative insertion/value lock (and in particular, that it should not wait on xact outcome). We clearly need something to wait on that isn't an XID... heavyweight locking of a token value is the obvious way of doing that.

Yeah.

Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right... if that can be made to be reliable (always
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark.

Average of 3 runs of read-only pgbench on my system, all with pg_stat_statements activated:

HEAD:  20631
SQRT:  20533
SQRTD: 20592

So using sqrtd the cost is 0.18%. I think that's acceptable.

cheers

andrew
Re: [HACKERS] Commit fest 2015-12 enters money time
On Tue, Feb 17, 2015 at 8:50 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: I couldn't find operation to add new entry on the current CF. Go to the bottom of this CF: https://commitfest.postgresql.org/4/ Then click on New patch. I couldn't find the New patch link on both of IE and Chrome, even though I logged in. Did you do something special? Err... No. Perhaps I can do it only thanks to my superpowers :) Well, no worries I just added it here and it has the same data: https://commitfest.postgresql.org/4/167/ -- Michael
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 21/01/15 17:32, Andrew Dunstan wrote: On 01/21/2015 11:21 AM, Arne Scheffer wrote: Why is it a bad thing to call the column stddev_samp, analogous to the aggregate function, or make a note in the documentation that the sample stddev is used to compute the solution? ... But I will add a note to the documentation, that seems reasonable.

I agree this is worth mentioning in the doc. In any case, here's some review from me:

We definitely want this feature, I wished to have this info many times. The patch has a couple of issues though:

The 1.2 to 1.3 upgrade file is named pg_stat_statements--1.2-1.3.sql and should be renamed to pg_stat_statements--1.2--1.3.sql (two dashes).

There is a new sqrtd function for fast sqrt calculation, which I think is a good idea as it will reduce the overhead of the additional calculation, and this is not something where a little loss of precision would matter too much. The problem is that the new function is actually not used anywhere in the code; I only see use of plain sqrt.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. -- Peter Geoghegan
Re: [HACKERS] Issue installing doc tools on OSX
On 2/16/15 6:10 PM, Florian Pflug wrote: On Feb 16, 2015, at 23:18, Peter Eisentraut pete...@gmx.net wrote: On 2/15/15 9:23 PM, David Steele wrote: That seems a bit incredible, since port should be able to resolve the dependencies by itself. I suggest that this should be reported as a bug to MacPorts. Sure, that has been my experience, but the error message was very clear. Unfortunately I did not capture the error before I changed the order and the log file was removed on the next run. I just tried this on OS X 10.9.5 running MacPorts 2.3.3 and a ports tree as of a few minutes ago. After uninstalling docbook, opensp and openjade, re-installing them with "sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp" completed without a hitch. It even logged: --- Computing dependencies for openjade --- Dependencies to be installed: opensp

Yeah, it worked when I did the uninstall/reinstall.

-- - David Steele da...@pgmasters.net
Re: [HACKERS] Commit fest 2015-12 enters money time
On Tue, Feb 17, 2015 at 7:39 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com: On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier michael.paqu...@gmail.com wrote: In order to move on to the next CF, I am going to go through all the remaining patches and update their status accordingly. And sorry for slacking a bit regarding that. If you wish to complain, of course feel free to post messages on this thread or on the thread related to the patch itself. As all the entries have been either moved or updated, CF 2014-12 is closed, and CF 2015-02 has been changed to In Progress. I tried to add my name on the CF entry for the Join pushdown support for foreign tables patch, however, it was unintentionally marked as rejected on the last CF, even though it should be returned with feedback. https://commitfest.postgresql.org/3/20/ Is it available to move it to the current CF? I don't recall you can... I couldn't find operation to add new entry on the current CF. Go to the bottom of this CF: https://commitfest.postgresql.org/4/ Then click on New patch. -- Michael
Re: [HACKERS] Commit fest 2015-12 enters money time
On Tue, Feb 17, 2015 at 8:50 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: I couldn't find operation to add new entry on the current CF. Go to the bottom of this CF: https://commitfest.postgresql.org/4/ Then click on New patch. I couldn't find the New patch link on both of IE and Chrome, even though I logged in. Did you do something special? Err... No. Perhaps I can do it only thanks to my superpowers :) Well, no worries I just added it here and it has the same data: https://commitfest.postgresql.org/4/167/ That's exactly superpower. Thanks for your help. -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Issue installing doc tools on OSX
On 2/16/15 6:10 PM, Florian Pflug wrote: BTW, why does the list of suggested packages include docbook-xml? That is used for building the man pages.
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: It seems we already have a mechanism in place that allows tuning of query cancel on standbys vs. preventing standby queries from seeing old data, specifically max_standby_streaming_delay/max_standby_archive_delay. We obsessed about how users were going to react to these odd variables, but there has been little negative feedback. FWIW, I think that's a somewhat skewed perception. I think it was right to introduce those, because we didn't really have any alternatives. But max_standby_streaming_delay, max_standby_archive_delay and hot_standby_feedback are among the most frequent triggers for questions and complaints that I/we see. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] NULL checks of dereferenced pointers in picksplit method of intarray
On Tue, Feb 17, 2015 at 6:49 AM, Kevin Grittner kgri...@ymail.com wrote: Michael Paquier michael.paqu...@gmail.com wrote: Coverity is pointing out that _int_split.c has unnecessary checks for dereferenced pointers in 5 places. Attached is a patch to adjust those things. Pushed. Thanks! Thanks. Also, as far as I understood from this code, no elements manipulated are NULL, perhaps this is worth an assertion? I'm not clear where you were thinking of, but anyway that seemed like a separate patch if we're going to do it, so I went ahead with pushing the issue Coverity flagged. The arguments to the function don't need such a check because the function is exposed to SQL with the STRICT option (but you probably already knew that). While reviewing the safety of this patch the only place that I ran across that I felt maybe deserved an assertion was that n >= 0 near the top of copy_intArrayType(), but that seems marginal. Yeah, we don't do that for the other STRICT functions, let's not do it then. -- Michael
[HACKERS] We do not need pg_promote_v4_to_v6_addr/mask
We have some code in the server that attempts to match IPv4 address entries in pg_hba.conf to incoming connections that are in IPv6 protocol but have addresses in the range ::ffff:0.0.0.0/96 (the IPv4-in-IPv6 subrange). As revealed by today's bug report from Hugo Osvaldo Barrera, this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0), as a result of sloppiness with a memcpy() source address. How is it that nobody noticed? Experimentation with a RHEL6 box says that at least on Linux kernels, such cases never arise because the kernel will not report such an address to the postmaster. You can ask to connect with an address like that, viz psql -h ::ffff:7f00:0001 ... but what the kernel will tell the postmaster is that the client address is IPv4-style 127.0.0.1. It's certainly possible that on other platforms there is a visible distinction between ::ffff:7f00:0001 and 127.0.0.1, but the lack of field complaints about this bug suggests that there isn't. Moreover: suppose that on hypothetical platform X the kernel does report such client addresses differently. Anybody on such a platform who's used both types of client addresses with any 9.x release must have ended up making different pg_hba.conf entries for the two cases. It's not impossible that they chose to make the entries meaningfully different; in which case they will not thank us for fixing the code so that the distinction vanishes again. If your platform thinks these are different addresses then you probably do too. In short, then, I'm having second thoughts about the quick-hack patch I committed earlier to fix the memcpy thinko. On platforms where it has any effect at all, that effect will be to make a subtle and perhaps security-relevant change to the interpretation of pg_hba.conf entries, changing what the postmaster has treated them as meaning since 9.0.0. That doesn't sound like the kind of thing we want to do in minor releases.
Therefore, I now think what we ought to do, both in HEAD and in the back branches, is rip out the #ifdef HAVE_IPV6 stanza in check_ip(), delete pg_promote_v4_to_v6_addr and pg_promote_v4_to_v6_mask which will thereby become unused, and remove the single sentence in the manual that claims that IPv4-in-IPv6 connections will match IPv4 pg_hba.conf entries. This amounts to accepting our behavior since 9.0 as being correct. If there was ever a live need for the older behavior, its time has evidently passed. regards, tom lane
Re: [HACKERS] Join push-down support for foreign tables
Let me put some comments in addition to what you're checking now. [design issues] * Cost estimation Estimating the cost of a remote join query is not an obvious issue. In principle, the local side cannot determine the cost of running a remote join without a remote EXPLAIN, because it has no information about the JOIN logic applied on the remote side. We probably have to assume a remote join algorithm, because the local planner has no idea about the remote planner's choice unless the foreign join uses use_remote_estimate. I think it is a reasonable assumption (even if it is incorrect) to calculate the remote join cost based on the local hash-join algorithm. If the user wants more accurate estimation, a remote EXPLAIN will give a more reliable cost. We also need a consensus on whether the cost of remote CPU execution is equivalent to local CPU. If we consider local CPU a scarcer resource than remote CPU, a discount rate will make the planner prefer a remote join over a local one. Once we assume a join algorithm for the remote join and a unit cost for remote CPU, we can calculate a cost for the foreign join based on the local join logic plus the cost of network transfer (maybe fdw_tuple_cost?). * FDW options Unlike a table scan, it is unclear which FDW options we should refer to. Table-level FDW options are associated with a foreign table literally. I think we have two options here: 1. A foreign join refers to the FDW options of the foreign server, and the options of the foreign tables are ignored. 2. A foreign join is prohibited when the relations don't have identical FDW options. My preference is 2. Even for an N-way foreign join, it ensures that all the tables involved in the (N-1)-way foreign join have identical FDW options, so we can build the N-way foreign join with all-identical FDW options. One exception is the updatable flag of postgres_fdw.
It does not make sense on remote join, so I think a mixture of updatable and non-updatable foreign tables should be admitted; however, it is a decision by the FDW driver. Probably, the above points need time to reach consensus. I'd like to see your opinion prior to editing your patch. [implementation issues] The interface does not intend to add a new Path/Plan type for each scan that replaces foreign joins. What postgres_fdw should do is add a ForeignPath towards a particular joinrel, then populate a ForeignScan with the remote join query once it gets chosen by the planner. A few functions added in src/backend/foreign/foreign.c are not called from anywhere, at this moment. create_plan_recurse() is reverted to static. It is needed for the custom-join enhancement, if no other infrastructure can support it. I'll check the code to construct the remote query later. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: Shigeru Hanada [mailto:shigeru.han...@gmail.com] Sent: Monday, February 16, 2015 1:54 PM To: Kaigai Kouhei(海外 浩平) Cc: Robert Haas; PostgreSQL-development Subject: Re: [HACKERS] Join push-down support for foreign tables Kaigai-san, Oops. I rebased the patch onto your v4 custom/foreign join patch. But as you mentioned off-list, I found that a flaw about an inappropriate change to NestPath still remains in the patch... I might have put my dev branch into an unexpected state. I'll check it soon. 2015-02-16 13:13 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com: Hanada-san, Your patch mixes enhancement of the custom-/foreign-scan interface and enhancement of contrib/postgres_fdw... Probably, it is a careless mis-operation. Please make your patch as differences from my infrastructure portion. Also, I noticed this Join pushdown support for foreign tables patch is unintentionally rejected in the last commit fest. https://commitfest.postgresql.org/3/20/ I couldn't register myself as reviewer.
How do I operate it on the new commitfest application? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada Sent: Monday, February 16, 2015 1:03 PM To: Robert Haas Cc: PostgreSQL-development Subject: Re: [HACKERS] Join push-down support for foreign tables Hi I've revised the patch based on Kaigai-san's custom/foreign join patch posted in the thread below. http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8 0 108c...@bpxm15gp.gisp.nec.co.jp Basically not changed from the version in the last CF, but as Robert commented before, N-way (not only 2-way) joins should be supported in the first version by construct SELECT SQL by containing source query in FROM clause as inline views (a.k.a. from clause subquery). 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com: 2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com: On
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask
On 02/16/2015 09:07 PM, Tom Lane wrote: I wrote: We have some code in the server that attempts to match IPv4 address entries in pg_hba.conf to incoming connections that are in IPv6 protocol but have addresses in the range ::ffff:0.0.0.0/96 (the IPv4-in-IPv6 subrange). As revealed by today's bug report from Hugo Osvaldo Barrera, this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0), as a result of sloppiness with a memcpy() source address. How is it that nobody noticed? BTW, a bit of digging in the git logs and mail archives says that the code in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in response to this discussion: http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de So back in 2003 there were Linux boxes that actively transformed IPv4 connection addresses to ::ffff:n.n.n.n format. Current Linux behavior is the exact opposite: even if you try to say ::ffff:n.n.n.n in a connection request, IPv4 is what comes out the other end. I find the same on current OS X btw. So I'm definitely now of the opinion that this is a workaround for a long-deceased Linux kernel bug, and not something we need to continue^X^X^Xresume supporting. Wow, talk about a walk down memory lane. Apparently that thread triggered my rewrite of initdb - I'd totally forgotten that. The proposed course sounds reasonable enough. cheers andrew
Re: [HACKERS] Expanding the use of FLEXIBLE_ARRAY_MEMBER for declarations like foo[1]
Andres Freund and...@2ndquadrant.com writes: On 2015-02-17 05:51:22 +0900, Michael Paquier wrote: diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h index e01e6aa..d8789a5 100644 --- a/src/include/catalog/pg_authid.h +++ b/src/include/catalog/pg_authid.h @@ -56,8 +56,10 @@ CATALOG(pg_authid,1260) BKI_SHARED_RELATION BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MACRO int32 rolconnlimit; /* max connections allowed (-1=no limit) */ /* remaining fields may be null; use heap_getattr to read them! */ -text rolpassword; /* password, if any */ timestamptz rolvaliduntil; /* password expiration time, if any */ +#ifdef CATALOG_VARLEN +text rolpassword; /* password, if any */ +#endif } FormData_pg_authid; That change IIRC is wrong, because it'll make rolvaliduntil NOT NULL (any column that's fixed width and has only fixed-width columns before it is marked as such). You were muttering about a BKI_FORCE_NOT_NULL option ... for symmetry, maybe we could add BKI_FORCE_NULL as well, and then use that for cases like this? Also, if we want to insist that these fields be accessed through heap_getattr, I'd be inclined to put them inside the #ifdef CATALOG_VARLEN to enforce that. I'm generally -1 on reordering any catalog columns as part of this patch. There should be zero user-visible change from it IMO. However, if we stick both those columns inside the ifdef, we don't need to reorder. regards, tom lane
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
Attached is an 0.3 version, rebased over today's HEAD changes (applies to commit 9e3ad1aac52454569393a947c06be0d301749362 or later), and with some better logic for transferring expanded array values into and out of plpgsql functions. Using this example: create or replace function arraysetnum(n int) returns numeric[] as $$ declare res numeric[] := '{}'; begin for i in 1 .. n loop res[i] := i; end loop; return res; end $$ language plpgsql strict; create or replace function arraysumnum(arr numeric[]) returns numeric as $$ declare res numeric := 0; begin for i in array_lower(arr, 1) .. array_upper(arr, 1) loop res := res + arr[i]; end loop; return res; end $$ language plpgsql strict; create or replace function arraytimenum(n int) returns numeric as $$ declare tmp numeric[]; begin tmp := arraysetnum(n); return arraysumnum(tmp); end $$ language plpgsql strict; either of the test cases select arraysumnum(arraysetnum(10)); select arraytimenum(10); involve exactly one coercion from flat to expanded array (during the initial assignment of the '{}' constant to the res variable), no coercions from expanded to flat, and no bulk copy operations. So I'm starting to feel fairly good about this. Obviously there's a nontrivial amount of work to do with integrating the array-code changes and teaching the rest of the array functions about expanded arrays (or at least as many of them as seem performance-critical). But that looks like just a few days of basically-mechanical effort. A larger question is what we ought to do about extending the array-favoring hacks in plpgsql to support this type of optimization for non-built-in types. Realize that what this patch is able to improve are basically two types of cases: * nests of function calls that take and return the same complex datatype, think foo(bar(baz(x))), where x is stored in some flat format but foo() bar() and baz() all agree on an expanded format that's easier to process. 
* plpgsql variables stored in an expanded format that's easier to process for most functions that might work with their values. The first case can be implemented by mutual agreement among the functions of the datatype; it does not need any additional help beyond what's in this patch. But the second case does not work very well unless plpgsql takes some proactive step to force variable values into the expanded format. Otherwise you get a win only if the last assignment to the variable happened to come from a source that supplied a read-write expanded value. You can make that happen with appropriate coding in the plpgsql function, of course, but it's klugy to have to do that. I would not be ashamed to ship this in 9.5 as just an array optimization and leave the larger question for next time ... but it does feel a bit unfinished like this. OTOH, I'm not sure whether the PostGIS folk care all that much about the intermediate-values-in-plpgsql-variables scenario. They didn't bring it up in the discussion a year or so back about their requirements. We do know very well that plpgsql array variables are a performance pain point, so maybe fixing that is enough of a goal for 9.5. (BTW, the nested-function-calls case sure seems like it's dead center of the wheelhouse for JSONB. Just sayin'. I do not myself have time to think about applying this technology to JSONB right now, but does anyone else want to step up?) regards, tom lane diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c index 867035d..860ad78 100644 *** a/src/backend/access/common/heaptuple.c --- b/src/backend/access/common/heaptuple.c *** *** 60,65 --- 60,66 #include access/sysattr.h #include access/tuptoaster.h #include executor/tuptable.h + #include utils/expandeddatum.h /* Does att's datatype allow packing into the 1-byte-header varlena format? 
*/ *** heap_compute_data_size(TupleDesc tupleDe *** 93,105 for (i = 0; i numberOfAttributes; i++) { Datum val; if (isnull[i]) continue; val = values[i]; ! if (ATT_IS_PACKABLE(att[i]) VARATT_CAN_MAKE_SHORT(DatumGetPointer(val))) { /* --- 94,108 for (i = 0; i numberOfAttributes; i++) { Datum val; + Form_pg_attribute atti; if (isnull[i]) continue; val = values[i]; + atti = att[i]; ! if (ATT_IS_PACKABLE(atti) VARATT_CAN_MAKE_SHORT(DatumGetPointer(val))) { /* *** heap_compute_data_size(TupleDesc tupleDe *** 108,118 */ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val)); } else { ! data_length = att_align_datum(data_length, att[i]-attalign, ! att[i]-attlen, val); ! data_length = att_addlength_datum(data_length, att[i]-attlen, val); } } --- 111,131 */ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val)); } +
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 03:03, Andrew Dunstan wrote: On 02/16/2015 08:57 PM, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. Yes there are variations between individual runs so it might be really just that, I can leave it running for much longer time tomorrow. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask
I wrote: We have some code in the server that attempts to match IPv4 address entries in pg_hba.conf to incoming connections that are in IPv6 protocol but have addresses in the range ::ffff:0.0.0.0/96 (the IPv4-in-IPv6 subrange). As revealed by today's bug report from Hugo Osvaldo Barrera, this code has been broken since commit f3aec2c7f51904e7 (shipped in 9.0), as a result of sloppiness with a memcpy() source address. How is it that nobody noticed? BTW, a bit of digging in the git logs and mail archives says that the code in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in response to this discussion: http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de So back in 2003 there were Linux boxes that actively transformed IPv4 connection addresses to ::ffff:n.n.n.n format. Current Linux behavior is the exact opposite: even if you try to say ::ffff:n.n.n.n in a connection request, IPv4 is what comes out the other end. I find the same on current OS X btw. So I'm definitely now of the opinion that this is a workaround for a long-deceased Linux kernel bug, and not something we need to continue^X^X^Xresume supporting. regards, tom lane
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Mon, Feb 16, 2015 at 4:11 PM, Peter Geoghegan p...@heroku.com wrote: Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right...if that can be made to be reliable (always WAL-logged), it could be marginally better than setting xmin to InvalidTransactionId. I'm not a big fan of that. The xmin-invalid bit is currently always just a hint. Well, Andres makes the point that that isn't quite so. Hmm. So the tqual.c routines actually check if (HeapTupleHeaderXminInvalid(tuple)). Which is: #define HeapTupleHeaderXminInvalid(tup) \ ( \ ((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \ HEAP_XMIN_INVALID \ ) What appears to happen if I try to only use the HEAP_XMIN_INVALID bit (and not explicitly set xmin to InvalidTransactionId, and not explicitly check that xmin isn't InvalidTransactionId in each HeapTupleSatisfies* routine) is that after a little while, Jeff Janes' tool shows up spurious unique violations, as if some tuple has become visible when it shouldn't have. I guess this is because the HEAP_XMIN_COMMITTED hint bit can still be set, which in effect invalidates the HEAP_XMIN_INVALID hint bit. It takes about 2 minutes before this happens for the first time when fsync = off, following a fresh initdb, which is unacceptable. I'm not sure if it's worth trying to figure out how HEAP_XMIN_COMMITTED gets set. Not that I'm 100% sure that that's what this really is; it just seems very likely. Attached broken patch (broken_visible.patch) shows the changes made to reveal this problem. Only the changes to tqual.c and heap_delete() should matter here, since I did not test recovery. I also thought about unifying the check for if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), InvalidTransactionId)) with the conventional HeapTupleHeaderXminInvalid() macro, and leaving everything else as-is.
This is no good either, and for similar reasons - control often won't reach the macro, which is behind a check of if (!HeapTupleHeaderXminCommitted(tuple)). The best I think we can hope for is having a dedicated if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), InvalidTransactionId)) macro HeapTupleHeaderSuperDeleted() to do the work each time, which does not need to be checked so often. A second attached patch (compact_tqual_works.patch - which is non-broken, AFAICT) shows how this is possible, while also moving the check further down within each tqual.c routine (which seems more in keeping with the fact that super deletion is a relatively obscure concept). I haven't been able to break this variant using my existing test suite, so this seems promising as a way of reducing tqual.c clutter. However, as I said the other day, this is basically just polish. -- Peter Geoghegan diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 0aa3e57..b777c56 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -2899,7 +2899,7 @@ l1: } else { - HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId); + HeapTupleHeaderSetXminInvalid(tp.t_data); } /* Make sure there is no forward chain link in t_ctid */ @@ -7382,12 +7382,12 @@ heap_xlog_delete(XLogReaderState *record) htup-t_infomask = ~(HEAP_XMAX_BITS | HEAP_MOVED); htup-t_infomask2 = ~HEAP_KEYS_UPDATED; HeapTupleHeaderClearHotUpdated(htup); + if (xlrec-flags XLOG_HEAP_KILLED_SPECULATIVE_TUPLE) + HeapTupleHeaderSetXminInvalid(htup); + fix_infomask_from_infobits(xlrec-infobits_set, htup-t_infomask, htup-t_infomask2); - if (!(xlrec-flags XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)) - HeapTupleHeaderSetXmax(htup, xlrec-xmax); - else - HeapTupleHeaderSetXmin(htup, InvalidTransactionId); + HeapTupleHeaderSetXmax(htup, xlrec-xmax); HeapTupleHeaderSetCmax(htup, FirstCommandId, false); /* Mark the page as a candidate for pruning */ diff --git a/src/backend/utils/time/tqual.c 
b/src/backend/utils/time/tqual.c index 99bb417..fd857b1 100644 --- a/src/backend/utils/time/tqual.c +++ b/src/backend/utils/time/tqual.c @@ -170,13 +170,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer) Assert(ItemPointerIsValid(htup-t_self)); Assert(htup-t_tableOid != InvalidOid); - /* - * Never return super-deleted tuples - */ - if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), - InvalidTransactionId)) - return false; - if (!HeapTupleHeaderXminCommitted(tuple)) { if (HeapTupleHeaderXminInvalid(tuple)) @@ -735,15 +728,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot, snapshot-xmin = snapshot-xmax = InvalidTransactionId; snapshot-speculativeToken = 0; - /* - * Never return super-deleted tuples - * - * XXX: Comment this code out and you'll get conflicts within - * ExecLockUpdateTuple(), which result in an infinite loop. - */ - if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple), - InvalidTransactionId)) - return false; if
Re: [HACKERS] We do not need pg_promote_v4_to_v6_addr/mask
Andrew Dunstan and...@dunslane.net writes: On 02/16/2015 09:07 PM, Tom Lane wrote: BTW, a bit of digging in the git logs and mail archives says that the code in question was originally added in 7.4 (commit 3c9bb8886df7d56a), in response to this discussion: http://www.postgresql.org/message-id/flat/200309012156.05874.t.maekit...@epgmbh.de Wow, talk about a walk down memory lane. Tell me about it ;-). I was entirely astonished to discover that the original author of the promote_v4_to_v6 code was moi. regards, tom lane
Re: [HACKERS] pg_basebackup -x/X doesn't play well with archive_mode wal_keep_segments
On Fri, Feb 13, 2015 at 9:18 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-12 11:44:05 -0800, Sergey Konoplev wrote: On Thu, Feb 12, 2015 at 11:40 AM, Andres Freund and...@2ndquadrant.com wrote: This obviously should not be the case. I'll have a look in a couple of hours. Until then you can likely just work around the problem by creating the archive_status directory. Thank you. Just let me know if you need some extra info or debugging. No need for debugging. It's plain and simply a (cherry-pick) conflict I resolved wrongly during backpatching. 9.3, 9.4 and master do not have that problem. That whole fix was quite painful because every single release had significantly different code :(. pg_basebackup/ is pretty messy. I'm not sure why my testsuite didn't trigger that problem. Possibly because a retry makes things work :( Somewhat luckily it's 9.2 only (9.3, 9.4 and master look correct, earlier releases don't have pg_receivexlog) Are you planning to back-patch the fix to 9.2? Regards, -- Fujii Masao
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 02/16/2015 08:57 PM, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. Actually, sqrt/sqrtd is not called in accumulating the stats, only in the reporting function. So it looks like the difference here should be noise. Maybe we need some longer runs. cheers andrew
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On 17/02/15 02:57, Andrew Dunstan wrote: On 02/16/2015 08:48 PM, Petr Jelinek wrote: On 17/02/15 01:57, Peter Geoghegan wrote: On Mon, Feb 16, 2015 at 4:44 PM, Petr Jelinek p...@2ndquadrant.com wrote: We definitely want this feature, I wished to have this info many times. I would still like to see a benchmark. Average of 3 runs of read-only pgbench on my system all with pg_stat_statement activated: HEAD: 20631 SQRT: 20533 SQRTD: 20592 So using sqrtd the cost is 0.18%. I think that's acceptable. I think so too. I found one more issue with the 1.2--1.3 upgrade script, the DROP FUNCTION pg_stat_statements(); should be DROP FUNCTION pg_stat_statements(bool); since in 1.2 the function identity has changed. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal : REINDEX xxx VERBOSE
Hello, I had a look on gram.y and found other syntaxes using WITH option clause. At Wed, 11 Feb 2015 14:34:17 -0600, Jim Nasby jim.na...@bluetreble.com wrote in 54dbbcc9.1020...@bluetreble.com I suspect at least some of this stems from how command line programs tend to process options before arguments. I tend to agree with you Tom, but I think what's more important is that we're consistent. COPY is already a bit of an oddball because it uses WITH, but both EXPLAIN and VACUUM use parenthesis immediately after the first verb. Introducing a parenthesis version that goes at the end instead of the beginning is just going to make this worse. If we're going to take a stand on this, we need to do it NOW, before we have even more commands that use (). I know you were worried about accepting options anywhere because it leads to reserved words, but perhaps we could support it just for EXPLAIN and VACUUM, and then switch to trailing options if people think that would be better. I agree with the direction, but I see two issues here; how many syntax variants we are allowed to have for one command at a time, and how far we should/may extend the unified options syntax on other commands. Let me put the issues aside for now, VACUUM can have trailing options naturally but it seems to change, but, IMHO, EXPLAIN should have the target statement at the tail end. Are you thinking of syntaxes like following? VACUUM [FULL] [FREEZE] ... [ANALYZE] [tname [(cname, ...)] | VACUUM [({FULL [bool]|FREEZE [bool]|...}[,...])] [tname [(cname, ...)] | VACUUM [tname [(cname, ...)] [[WITH ]({FULL [bool]|FREEZE [bool]|...})] REINDEX [{INDEX|TABLE|...}] name [[WITH] (VERBOSE [bool]|...)] EXPLAIN [[WITH] ({ANALYZE [bool]|VERBOSE [bool]|... [,...]})] statement For concrete examples, the lines prefixed by asterisk are in new syntax. 
VACUUM FULL table1;
VACUUM ANALYZE table1 (col1);
VACUUM (ANALYZE, VERBOSE) table1 (col1);
*VACUUM table1 WITH (FREEZE on)
*VACUUM table1 (cola) WITH (ANALYZE)
*VACUUM table1 WITH (ANALYZE)
*VACUUM table1 (FREEZE on)
The fifth example looks quite odd.
REINDEX INDEX index1 FORCE;
*REINDEX TABLE table1 WITH (VERBOSE on);
*REINDEX TABLE table1 (VERBOSE on, FORCE on);
EXPLAIN (ANALYZE) SELECT 1;
*EXPLAIN WITH (ANALYZE) SELECT 1;
The WITH looks a bit uneasy..
COPY table1 FROM 'file.txt' WITH (FORMAT csv);
Returning to the second issue, the following statements have an option list (or single option) preceded (or not preceded) by the word WITH. The prefixed dollar sign indicates that the syntax is SQL standard according to the PostgreSQL documentation. An asterisk indicates that the line shows the syntax if the new policy is applied. A few other statements, like DECLARE, look to be using WITH as part of an idiom.
CREATE MATERIALIZED VIEW mv1 WITH (storageparam, ...) AS qry WITH DATA;
This is similar to EXPLAIN in the sense that a query follows it, but this syntax can have the second WITH followed by DATA.
CREATE EXTENSION ext1 WITH SCHEMA s1 VERSION v1 FROM over;
*CREATE EXTENSION ext1 WITH (SCHEMA s1, VERSION v1, FROM over);
This seems to fit the unification.
CREATE ROLE role WITH LOGIN;
CREATE ROLE role SUPERUSER, LOGIN;
$CREATE ROLE role WITH ADMIN admin;
*CREATE ROLE role WITH (SUPERUSER, LOGIN);
*CREATE ROLE role (SUPERUSER, LOGIN);
This seems meaninglessly complicated.
GRANT WITH GRANT OPTION;
*GRANT WITH (GRANT on);
Mmm. Seems no reasoning...
CREATE VIEW v1 AS qry WITH CASCADED CHECK OPTION;
*CREATE VIEW v1 AS qry WITH (CASCADED_CHECK);
Weird syntax?
ALTER DATABASE db1 WITH CONNECTION LIMIT 50;
*ALTER DATABASE db1 WITH (CONNECTION_LIMIT 50);
Hardly looks reasonable..
$DECLARE c1 INSENSITIVE SCROLL CURSOR WITH HOLD;
This cannot have another style. Mmm... I'm at a loss what is desirable..
regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] GSoC 2015 - mentors, students and admins.
On 13 February 2015 at 15:40, Thom Brown t...@linux.com wrote: On 12 February 2015 at 14:55, Alexander Korotkov aekorot...@gmail.com wrote: Hi! On Mon, Feb 9, 2015 at 11:52 PM, Thom Brown t...@linux.com wrote: Google Summer of Code 2015 is approaching. I'm intending on registering PostgreSQL again this year. Before I do that, I'd like to have an idea of how many people are interested in being either a student or a mentor. I'm ready to participate as mentor again! Thanks for the interest from mentors and students so far. Could I interest more to volunteer as mentors this year? I'll be registering the project with Google soon so I'll want an idea of how many mentors we'll have available. Any more mentors wish to step forward? Please let me know A.S.A.P. Thanks Thom
Re: [HACKERS] Commit fest 2015-12 enters money time
2015-02-16 12:25 GMT+09:00 Michael Paquier michael.paqu...@gmail.com: On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier michael.paqu...@gmail.com wrote: In order to move on to the next CF, I am going to go through all the remaining patches and update their status accordingly. And sorry for slacking a bit regarding that. If you wish to complain, of course feel free to post messages on this thread or on the thread related to the patch itself. As all the entries have been either moved or updated, CF 2014-12 is closed, and CF 2015-02 has been changed to In Progress. I tried to add my name on the CF entry for the Join pushdown support for foreign tables patch, however, it was unintentionally marked as rejected on the last CF, even though it should be returned with feedback. https://commitfest.postgresql.org/3/20/ Is it possible to move it to the current CF? I couldn't find an operation to add a new entry on the current CF. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote: On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote: It seems we already have a mechanism in place that allows tuning of query cancel on standbys vs. preventing standby queries from seeing old data, specifically max_standby_streaming_delay/max_standby_archive_delay. We obsessed about how users were going to react to these odd variables, but there has been little negative feedback. FWIW, I think that's a somewhat skewed perception. I think it was right to introduce those, because we didn't really have any alternatives. But max_standby_streaming_delay, max_standby_archive_delay and hot_standby_feedback are among the most frequent triggers for questions and complaints that I/we see. Agreed. And a really bad one used to be vacuum_defer_cleanup_age, because of confusing units amongst other things. Which in turn seems fairly close to Kevin's suggestions, unfortunately. /Magnus
Re: [HACKERS] Really bad blowups with hash outer join and nulls
On 16.2.2015 03:38, Andrew Gierth wrote: Tomas == Tomas Vondra tomas.von...@2ndquadrant.com writes: Tomas Improving the estimates is always good, but it's not going to Tomas fix the case of non-NULL values (it shouldn't be all that Tomas difficult to create such examples with a value whose hash starts Tomas with a bunch of zeroes). Right now, I can't get it to plan such an example, because (a) if there are no stats to work from then the planner makes fairly pessimistic assumptions about hash bucket filling, and (b) if there _are_ stats to work from, then a frequently-occurring non-null value shows up as an MCV and the planner takes that into account to calculate bucketsize. The problem could only be demonstrated for NULLs because the planner was ignoring NULL for the purposes of estimating bucketsize, which is correct for all join types except RIGHT and FULL (which, iirc, are more recent additions to the hashjoin repertoire). Oh, right, the estimate fix is probably sufficient then. If you want to try testing it, you may find this useful:

select i, hashint8(i) from unnest(array[1474049294, -1779024306, -1329041947]) u(i);
      i      | hashint8
-------------+----------
  1474049294 |        0
 -1779024306 |        0
 -1329041947 |        0
(3 rows)

(those are the only three int4 values that hash to exactly 0) It's probably possible to construct pathological cases by finding a lot of different values with zeros in the high bits of the hash, but that's something that wouldn't be likely to happen by chance. Yeah, it's probably possible, but it's admittedly considerably harder than I initially thought. For example it could be possible to create the table with no MCV values but sorted so that all the initial values have hashvalue=0, triggering (growEnabled=false). But that's rather unlikely to happen in practice I guess. A more likely failure scenario is a hash join higher up the plan, processing results of other joins etc.
In that case the estimates will be tricky, although the planner chooses quite pessimistic defaults in those cases. -- Tomas Vondra   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
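The pathological scenario Tomas describes — early tuples all hashing to the same bucket, so that doubling the bucket count never relieves it and growth gets disabled — can be illustrated with a toy model. This is a sketch only, not nodeHash.c's actual logic; the class and names are hypothetical:

```python
# Toy model of a hash table that disables growth when doubling the
# bucket count fails to relieve an overflowing bucket -- the
# growEnabled=false behaviour discussed above.  Illustration only,
# not PostgreSQL's nodeHash.c code.

class ToyHashTable:
    def __init__(self, nbuckets=4, bucket_limit=4):
        self.nbuckets = nbuckets
        self.bucket_limit = bucket_limit
        self.grow_enabled = True
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, hashval):
        # Bucket number comes from the low bits of the hash, so values
        # whose hash is exactly 0 always land in bucket 0.
        return hashval % self.nbuckets

    def insert(self, hashval):
        b = self._bucket(hashval)
        self.buckets[b].append(hashval)
        if self.grow_enabled and len(self.buckets[b]) > self.bucket_limit:
            self._grow()

    def _grow(self):
        old = [h for bucket in self.buckets for h in bucket]
        self.nbuckets *= 2
        self.buckets = [[] for _ in range(self.nbuckets)]
        for h in old:
            self.buckets[self._bucket(h)].append(h)
        # Doubling did not split the overflowing bucket: give up on
        # growing for good, like the real code's growEnabled flag.
        if max(len(bucket) for bucket in self.buckets) > self.bucket_limit:
            self.grow_enabled = False

# Feed it values that all hash to 0 (like the three int8 values in
# the query above): growth is disabled and everything piles up in
# one bucket, no matter how many buckets exist.
ht = ToyHashTable()
for _ in range(10):
    ht.insert(0)
```

After the first overflow the table doubles once, sees that the skew is unchanged, and permanently stops growing — which is why a sorted run of same-hash values at the start of the input is the dangerous case.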
Re: [HACKERS] GSoC 2015 - mentors, students and admins.
I would like to participate as student. On 10 February 2015 at 03:52, Thom Brown t...@linux.com wrote: Hi all, Google Summer of Code 2015 is approaching. I'm intending on registering PostgreSQL again this year. Before I do that, I'd like to have an idea of how many people are interested in being either a student or a mentor. I've volunteered to be admin again, but if anyone else has a strong interest of seeing the projects through this year, please let yourself be known. Who would be willing to mentor projects this year? Project ideas are welcome, and I've copied many from last year's submissions into this year's wiki page. Please feel free to add more (or delete any that stand no chance or are redundant): https://wiki.postgresql.org/wiki/GSoC_2015 Students can find more information about GSoC here: http://www.postgresql.org/developer/summerofcode Thanks Thom
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 2015-02-16 05:19:11 +0000, Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Jim Nasby jim.na...@bluetreble.com writes: On 2/15/15 10:36 AM, Tom Lane wrote: I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. We suggested this to our customer and got out of the meeting with it looking like it *might* fly. In the next meeting, however, they said they had run it by others and reviewed the code and it was completely unacceptable -- they would not consider pg with this as the solution. A common use-case is long-running reports hitting relatively stable data in a database that also has tables with a high churn rate (ie: queue tables). In those scenarios your only options right now are to suffer huge amounts of bloat in the high-churn tables or not do your reporting. A simple transaction timeout only solves this by denying you reporting queries. Agreed, but Kevin's proposal has exactly the same problem only worse, because (a) the reporting transactions might or might not fail (per Murphy, they'll fail consistently only when you're on deadline), and (b) if they do fail, there's nothing you can do short of increasing the slack db-wide. These they were comfortable with, and did *not* consider to be unpredictable or something they could not do something about. I really don't feel I can say more than that, though, without disclosing more than I should. I agree that we need to do something about the dangers of long snapshots, I'm not sure this is it. I'm not sure of the contrary either. What I'm definitely not a fan of though, is this implementation. Having to add checks to a large number of places is a recipe for silently wrong query results.
One thing I was wondering about recently was introducing an optimization for repeatedly updated rows into the vacuum code: A row that has xmin = xmax where these have committed can be removed, even if the xid is above the xmin horizon - no other transaction is ever going to see it. While there's some hairy-ness implementing that, it doesn't seem too hard. And there's a fair number of cases where that'd allow less bloat to accumulate. Obviously it'd be better if we had logic to do that for other patterns as well (where the updates aren't in the same xact), but it seems like a good start. There might be something in that, but again it's not much like this patch. The key point I think we're both making is that nondeterministic failures are bad, especially when you're talking about long-running, expensive-to-retry transactions. What the customer most doesn't want to be nondeterministic is whether the error is generated only when the snapshot has been used to read a page which has been modified since the snapshot was taken. If tables or materialized views are set up before a query and then not subsequently modified during the execution of the query, that query must not be canceled even if it runs for days, but it must not cause unconstrained bloat during the run. So far I don't see any way to avoid including the LSN with the snapshot or modifying the index AMs. Let's be clear on the footprint for that; for the btree implementation it is: IIUC you never would want cancellations when accessing the tables these longrunning backends actually access. The errors about too old snapshots are just a stopgap because we can't compute the xmin per table in a meaningful way atm. Right? Is the bloat caused by rows these transactions actually can see or are the not-removed rows newer than the transaction's xmax?
Since you actually don't want cancellations for the long running reporting queries it very much might be sensible to switch to a more complicated version of HeapTupleSatisfiesVacuum() if there are long-running queries. One that can recognize if rows are actually visible to any backend, or if there are just snapshots that see older rows. I've previously wondered how hard this would be, but I don't think it's *that* hard. And it'd be a much more general solution. Greetings, Andres Freund
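Andres's proposed vacuum optimization reduces to a simple decision rule: a committed tuple with xmin = xmax was inserted and deleted by the same transaction, so no snapshot can ever see it, making it removable even above the xmin horizon. A toy sketch of just that rule — not the real HeapTupleSatisfiesVacuum(), which has many more states:

```python
def toy_removable(xmin, xmax, committed, xmin_horizon):
    """Toy decision: can vacuum reclaim this dead tuple version?

    Normal rule: the deleting transaction must have committed and be
    older than the oldest snapshot (the xmin horizon).  The proposed
    shortcut: if xmin == xmax and that transaction committed, no
    other transaction can ever have seen the row, so it is removable
    even when the xid is above the horizon.  Sketch only.
    """
    if not committed:
        return False
    if xmin == xmax:          # inserted and deleted by the same xact
        return True
    return xmax < xmin_horizon
```

The interesting case is the second test: a row churned within one committed transaction is reclaimable regardless of how far back a reporting query's snapshot reaches, which is exactly the bloat pattern of queue tables under a long-running report.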
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-16 11:07:09 +0200, Heikki Linnakangas wrote: On 02/16/2015 02:21 AM, Andres Freund wrote: Furthermore the fact that the origin of records is recorded allows to avoid decoding them in logical decoding. That has both efficiency advantages (we can do so before they are stored in memory/disk) and functionality advantages. Imagine using a logical replication solution to replicate inserts to a single table between two databases where inserts are allowed on both - unless you prevent the replicated inserts from being replicated again you obviously have a loop. This infrastructure lets you avoid that. That makes sense. How does this work if you have multiple replication systems in use in the same cluster? You might use Slony to replicate one table to one system, and BDR to replicate another table with another system. Or the same replication software, but different hosts. It should just work. Replication identifiers are identified by a free form text, each replication solution can add the information/distinguishing data they need in there. Bdr uses something like #define BDR_NODE_ID_FORMAT bdr_UINT64_FORMAT_%u_%u_%u_%s with remote_sysid, remote_tlid, remote_dboid, MyDatabaseId, configurable_name as parameters as a replication identifier name. I've been wondering whether the bdr_ part in the above should be split off into a separate field, similar to how the security label stuff does it. But I don't think it'd really buy us much, especially as we did not do that for logical slot names. Each of the used replication solutions would probably ask their output plugin to only stream locally generated (i.e. origin_id = InvalidRepNodeId) changes, and possibly from a defined list of other known hosts in the cascading case. Does that answer your question?
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote: On 02/16/2015 02:44 AM, Peter Geoghegan wrote: Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered. Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-). I don't think it actually uses the fast path, does it? IIRC that's restricted to LOCKTAG_RELATION when done via LockAcquireExtended + open coded for the VirtualXactLock table. I'm not super worried atm either. Worth checking, but probably not worth obsessing over. Besides, why should one transaction be entitled to insert a conflicting value tuple just because a still running transaction deleted it (having also inserted the tuple itself)? Didn't one prototype version of value locking #2 have that as a bug [1]? In fact, originally, didn't the xmin set to invalid thing come immediately from realizing that that wasn't workable? Ah, right. So the problem was that some code might assume that if you insert a row, delete it in the same transaction, and then insert the same value again, the 2nd insertion will always succeed because the previous insertion effectively acquired a value-lock on the key. Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We would need an infomask bit to indicate super-deletion, and only ignore it if the bit is set.
I'm starting to think that we should bite the bullet and consume an infomask bit for this. The infomask bits are a scarce resource, but we should use them when it makes sense. It would be good for forensic purposes too, to leave a trace that a super-deletion happened. Well, we IIRC don't have any left right now. We could reuse MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd essentially use two infomask bits forever... Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit. Maybe he is right...if that can be made to be reliable (always WAL-logged), it could be marginally better than setting xmin to InvalidTransactionId. I'm not a big fan of that. The xmin-invalid bit is currently always just a hint. We already rely on XMIN_INVALID being set correctly for freezing. C.f. HeapTupleHeaderXminFrozen, HeapTupleHeaderXminInvalid, et al. So it'd not necessarily end up being that bad. And the super deletion could easily just set it in the course of its WAL logging. Greetings, Andres Freund
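For reference, the XMIN_COMMITTED/XMIN_INVALID interplay Andres mentions can be sketched as bit arithmetic. The bit values below are from htup_details.h as best I recall them — treat this as an illustration and verify against the real header:

```python
# Sketch of the xmin infomask bit interplay discussed above.  Bit
# values as recalled from htup_details.h; verify before relying on
# them.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
# "Frozen" is the *combination* of both bits, which is why
# XMIN_INVALID can no longer be treated as just a hint: freezing
# already depends on it being set correctly.
HEAP_XMIN_FROZEN = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID

def xmin_frozen(infomask):
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN

def xmin_invalid(infomask):
    # Invalid means the invalid bit without the committed bit;
    # with both set, the tuple reads as frozen instead.
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID
```

This is the point being argued: because the frozen state is encoded as the pair of bits, a super-deletion that sets XMIN_INVALID (and WAL-logs it) would not be treading on hint-only territory anymore.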
Re: [HACKERS] Replication identifiers, take 4
On 02/16/2015 02:21 AM, Andres Freund wrote: Furthermore the fact that the origin of records is recorded allows to avoid decoding them in logical decoding. That has both efficiency advantages (we can do so before they are stored in memory/disk) and functionality advantages. Imagine using a logical replication solution to replicate inserts to a single table between two databases where inserts are allowed on both - unless you prevent the replicated inserts from being replicated again you obviously have a loop. This infrastructure lets you avoid that. That makes sense. How does this work if you have multiple replication systems in use in the same cluster? You might use Slony to replicate one table to one system, and BDR to replicate another table with another system. Or the same replication software, but different hosts. - Heikki
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On 02/16/2015 02:44 AM, Peter Geoghegan wrote: On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I did not solve the potential for livelocks (discussed at http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). The patch always super-deletes the already-inserted tuple on conflict, and then waits for the other inserter. That would be nice to fix, using one of the ideas from that thread, or something else. How about we don't super-delete at all with exclusion constraints? I'd be willing to accept unprincipled deadlocks for exclusion constraints, because they already exist today for UPSERT/NOSERT type use cases, and with idiomatic usage seem far less likely for the IGNORE variant (especially with exclusion constraints). So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might fail. I don't like that. The point of having the command in the first place is to deal with concurrency issues. If it sometimes doesn't work, it's broken. I can see people using ON CONFLICT UPDATE where they'd almost or actually be better off using a plain UPDATE - that's quite a different situation. I find livelocks to be a *very* scary prospect, and I don't think the remediations that were mentioned are going to fly. It's just too messy, and too much of a niche use case. TBH I am skeptical of the idea that we can fix exclusion constraints properly in this way at all, at least not without the exponential backoff thing, which just seems horrible. The idea of comparing the TIDs of the tuples as a tie-breaker seems most promising to me. If the conflicting tuple's TID is smaller than the inserted tuple's, super-delete and wait. Otherwise, wait without super-deletion. That's really simple. Do you see a problem with that? 
Although you understood what I was on about when I first talked about unprincipled deadlocks, I think that acceptance of that idea came much later from other people, because it's damn complicated. BTW, it would be good to explain somewhere in comments or a README the term unprincipled deadlock. It's been a useful concept, and hard to grasp. If you defined it once, with examples and everything, then you could just have See .../README in other places that need to refer to it. It's not worth it to add some weird Dining philosophers exponential backoff thing to make sure that the IGNORE variant when used with exclusion constraints can never deadlock in an unprincipled way, but if it is worth it then it seems unreasonable to suppose that this patch needs to solve that pre-existing problem. No? The point of solving the existing problem is that it allows us to split the patch, for easier review. Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not normal, and may be where any heavyweight lock costs would really be encountered. Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-). I don't see how storing the promise token in heap tuples buys us not having to involve heavyweight locking of tokens. (I think you may have made a thinko in suggesting otherwise) It wouldn't get rid of heavyweight locks, but it would allow getting rid of the procarray changes.
The inserter could generate a token, then acquire the hw-lock on the token, and lastly insert the heap tuple with the correct token. Are we happy with the way super-deletion works? Currently, the xmin field is set to invalid to mark the tuple as super-deleted. That required checks in HeapTupleSatisfies* functions. One alternative would be to set xmax equal to xmin, and teach the code currently calls XactLockTableWait() to check if xmax=xmin, and not consider the tuple as conflicting. That couldn't work without further HeapTupleSatisfiesDirty() logic. Why not? Besides, why should one transaction be entitled to insert a conflicting value tuple just because a still running transaction deleted it (having also inserted the tuple itself)? Didn't one prototype version of value locking #2 have that as a bug [1]? In fact, originally, didn't the xmin set to invalid thing come immediately from realizing that that wasn't workable? Ah, right. So the problem was that some code might assume that if you insert a row, delete it in the same transaction, and then insert the same value again, the 2nd insertion will always succeed because the previous insertion effectively acquired a
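Heikki's TID tie-breaker from upthread can be stated as a tiny ordering rule: on a conflict between two speculative inserters, only the one whose tuple has the larger TID backs off, so the pair cannot chase each other forever. A hedged sketch (hypothetical names, TIDs modeled as (block, offset) pairs):

```python
def tie_break(my_tid, other_tid):
    """Which action does a speculative inserter take on conflict?

    TIDs are modeled as (block, offset) pairs; Python tuple
    comparison supplies the total order.  Per the suggestion above:
    if the conflicting tuple's TID is smaller than ours,
    super-delete our own tuple and wait; otherwise just wait.  Only
    one side ever backs off, so two inserters cannot livelock by
    repeatedly super-deleting and retrying.  Sketch only.
    """
    if other_tid < my_tid:
        return "super-delete own tuple, then wait"
    return "wait without super-deleting"
```

Because the order on TIDs is total, exactly one of any conflicting pair takes the back-off branch — which is the property that rules out the livelock without any exponential-backoff machinery.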
Re: [HACKERS] Replication identifiers, take 4
On 02/16/2015 11:18 AM, Andres Freund wrote: On 2015-02-16 11:07:09 +0200, Heikki Linnakangas wrote: How does this work if you have multiple replication systems in use in the same cluster? You might use Slony to replicate one table to one system, and BDR to replicate another table with another system. Or the same replication software, but different hosts. It should just work. Replication identifiers are identified by a free form text, each replication solution can add the information/distinguishing data they need in there. Bdr uses something like #define BDR_NODE_ID_FORMAT bdr_UINT64_FORMAT_%u_%u_%u_%s with remote_sysid, remote_tlid, remote_dboid, MyDatabaseId, configurable_name as parameters as a replication identifier name. I've been wondering whether the bdr_ part in the above should be split off into a separate field, similar to how the security label stuff does it. But I don't think it'd really buy us much, especially as we did not do that for logical slot names. Each of the used replication solutions would probably ask their output plugin to only stream locally generated (i.e. origin_id = InvalidRepNodeId) changes, and possibly from a defined list of other known hosts in the cascading case. Does that answer your question? Yes, thanks. Note to self and everyone else looking at this: It's important to keep in mind that the replication IDs are completely internal to the local cluster. They are *not* sent over the wire. That's not completely true if you also use physical replication, though. The replication IDs will be included in the WAL stream. Can you have logical decoding running in a hot standby server? And how does the progress report stuff that's checkpointed in pg_logical/checkpoints work in a hot standby? (Sorry if I'm not making sense, I haven't really thought hard about this myself) At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids.
Two extra bytes is a small price to pay, to make it work more like everything else in the system. - Heikki
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-16 11:34:10 +0200, Heikki Linnakangas wrote: Yes, thanks. Note to self and everyone else looking at this: It's important to keep in mind that the replication IDs are completely internal to the local cluster. They are *not* sent over the wire. Well, if you *want* to, you could send the external (free form text) replication identifiers over the wire in the output plugin. There are scenarios where that might make sense. That's not completely true if you also use physical replication, though. The replication IDs will be included in the WAL stream. Right. But then a physical rep server isn't really a different server. Can you have logical decoding running in a hot standby server? Not at the moment, there's some minor stuff missing (following timelines, enforcing tuple visibility on the primary). And how does the progress report stuff that's checkpointed in pg_logical/checkpoints work in a hot standby? (Sorry if I'm not making sense, I haven't really thought hard about this myself) It doesn't work that greatly atm, they'd need to be WAL logged for that - which would not be hard. It'd be better to include the information in the checkpoint, but that unfortunately doesn't really work, because we store the checkpoint in the control file. And that has to be <= 512 bytes. What, I guess, we could do is log it in the checkpoint, after determining the redo pointer, and store the LSN for it in the checkpoint record proper. That'd mean we'd read one more record at startup, but that isn't particularly bad. At a quick glance, this basic design seems workable. I would suggest expanding the replication IDs to regular 4 byte oids. Two extra bytes is a small price to pay, to make it work more like everything else in the system. I don't know. Growing from 3 to 5 byte overhead per relevant record (or even 0 to 5 in case the padding is reused) is rather noticeable.
If we later find it to be a limit (I seriously doubt that), we can still increase it in a major release without anybody really noticing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
Hello, Thank you for reviewing and testing the patch.

+ /* leave if data cannot be compressed */
+ if (compressed_len == 0)
+     return false;

This should be < 0; pglz_compress returns -1 when compression fails.

+ if (pglz_decompress(block_image, bkpb->bkp_len, record->uncompressBuf,
+                     bkpb->bkp_uncompress_len) == 0)

Similarly, this should be < 0. These have been corrected in the attached. Regarding the sanity checks that have been added recently. I think that they are useful but I am suspecting as well that only a check on the record CRC is done because that's reliable enough and not doing those checks accelerates a bit replay. So I am thinking that we should simply replace them by assertions. Removing the checks makes sense as CRC ensures correctness. Moreover, as the error message for an invalid record length is present in the code, messages for invalid block length can be redundant. Checks have been replaced by assertions in the attached patch. The following if condition in XLogCompressBackupBlock has been modified as follows.

Previous:

+ /*
+  * We recheck the actual size even if pglz_compress() reports success and
+  * see if at least 2 bytes of length have been saved, as this corresponds
+  * to the additional amount of data stored in WAL record for a compressed
+  * block via raw_length when block contains hole.
+  */
+ *len = (uint16) compressed_len;
+ if (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo)
+     return false;
+ return true;

Current:

+ if ((hole_length != 0) &&
+     (*len >= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
+     return false;
+ return true;

This is because the extra information raw_length is included only if the compressed block has a hole in it. Once we get those small issues fixed, I think that it is worth having a committer look at this patch, presumably Fujii-san. Agree.
I will mark this patch as ready for committer. Thank you, Rahila Syed
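The condition discussed above encodes a profitability rule: keep the compressed image only if it saves at least the extra compression metadata the record must then carry. A toy version of that check, with zlib standing in for pglz and an illustrative value for SizeOfXLogRecordBlockImageCompressionInfo:

```python
import zlib

# Stand-in for SizeOfXLogRecordBlockImageCompressionInfo: the extra
# bytes (the raw length) that a compressed image forces the WAL
# record to carry.  The value here is illustrative only.
COMPRESSION_INFO_SIZE = 2

def try_compress(block):
    """Return the compressed image if worthwhile, else None.

    Mirrors the rule in the patch fragment above: even when
    compression succeeds, reject the result unless it saves more
    than the extra per-block metadata it costs.  zlib merely stands
    in for pglz here.
    """
    compressed = zlib.compress(block)
    if len(compressed) >= len(block) - COMPRESSION_INFO_SIZE:
        return None  # not worth it; store the raw (hole-less) image
    return compressed
```

An all-zero page compresses massively and passes the check; data with no redundancy expands under compression and falls back to storing the raw image, exactly the behaviour the `return false` branch produces.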
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 11:30:20 +, Syed, Rahila wrote: - * As a trivial form of data compression, the XLOG code is aware that - * PG data pages usually contain an unused hole in the middle, which - * contains only zero bytes. If hole_length 0 then we have removed - * such a hole from the stored data (and it's not counted in the - * XLOG record's CRC, either). Hence, the amount of block data actually - * present is BLCKSZ - hole_length bytes. + * Block images are able to do several types of compression: + * - When wal_compression is off, as a trivial form of compression, the + * XLOG code is aware that PG data pages usually contain an unused hole + * in the middle, which contains only zero bytes. If length BLCKSZ + * then we have removed such a hole from the stored data (and it is + * not counted in the XLOG record's CRC, either). Hence, the amount + * of block data actually present is length bytes. The hole offset + * on page is defined using hole_offset. + * - When wal_compression is on, block images are compressed using a + * compression algorithm without their hole to improve compression + * process of the page. length corresponds in this case to the length + * of the compressed block. hole_offset is the hole offset of the page, + * and the length of the uncompressed block is defined by raw_length, + * whose data is included in the record only when compression is enabled + * and with_hole is set to true, see below. + * + * is_compressed is used to identify if a given block image is compressed + * or not. Maximum page size allowed on the system being 32k, the hole + * offset cannot be more than 15-bit long so the last free bit is used to + * store the compression state of block image. If the maximum page size + * allowed is increased to a value higher than that, we should consider + * increasing this structure size as well, but this would increase the + * length of block header in WAL records with alignment. 
> + *
> + * with_hole is used to identify the presence of a hole in a block image.
> + * As the length of a block cannot be more than 15-bit long, the extra bit in
> + * the length field is used for this identification purpose.  If the block image
> + * has no hole, it is ensured that the raw size of a compressed block image is
> + * equal to BLCKSZ, hence the contents of XLogRecordBlockImageCompressionInfo
> + * are not necessary.
> + */
>  typedef struct XLogRecordBlockImageHeader
>  {
> -	uint16		hole_offset;	/* number of bytes before hole */
> -	uint16		hole_length;	/* number of bytes in hole */
> +	uint16		length:15,		/* length of block data in record */
> +				with_hole:1;	/* status of hole in the block */
> +
> +	uint16		hole_offset:15,	/* number of bytes before hole */
> +				is_compressed:1;	/* compression status of image */
> +
> +	/* Followed by the data related to compression if block is compressed */
>  } XLogRecordBlockImageHeader;

Yikes, this is ugly.

I think we should change the xlog format so that the block_id (which
currently is XLR_BLOCK_ID_DATA_SHORT/LONG or an actual block id) isn't
the block id but something like XLR_CHUNK_ID. Which is used as is for
XLR_CHUNK_ID_DATA_SHORT/LONG, but for backup blocks can be set to
XLR_CHUNK_BKP_WITH_HOLE, XLR_CHUNK_BKP_COMPRESSED,
XLR_CHUNK_BKP_REFERENCE... The BKP blocks will then follow, storing the
block id following the chunk id.

Yes, that'll increase the amount of data for a backup block by 1 byte,
but I think that's worth it. I'm pretty sure we will be happy about the
added extensibility pretty soon.

Greetings,

Andres Freund

--
Andres Freund	                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
>> Regarding the sanity checks that have been added recently. I think that
>> they are useful but I am suspecting as well that only a check on the
>> record CRC is done because that's reliable enough and not doing those
>> checks accelerates a bit replay. So I am thinking that we should simply
>> replace them by assertions.
>
> Removing the checks makes sense as CRC ensures correctness. Moreover, as
> an error message for an invalid record length is present in the code,
> messages for an invalid block length can be redundant.
>
> Checks have been replaced by assertions in the attached patch.

After more thinking, we may as well simply remove them, an error with
CRC having high chances to complain before reaching this point...

Current code:
+    if ((hole_length != 0) &&
+        (*len <= orig_len - SizeOfXLogRecordBlockImageCompressionInfo))
+        return false;
+    return true;
This makes sense.

Nitpicking 1:
+        Assert(!(blk->with_hole == 1 &&  blk->hole_offset <= 0));
Double-space here.

Nitpicking 2:
char * page
This should be written as "char *page", with the * grouped with the
variable name.
--
Michael
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On 2015-02-16 20:55:20 +0900, Michael Paquier wrote:
> On Mon, Feb 16, 2015 at 8:30 PM, Syed, Rahila rahila.s...@nttdata.com wrote:
>>> Regarding the sanity checks that have been added recently. I think that
>>> they are useful but I am suspecting as well that only a check on the
>>> record CRC is done because that's reliable enough and not doing those
>>> checks accelerates a bit replay. So I am thinking that we should simply
>>> replace them by assertions.
>>
>> Removing the checks makes sense as CRC ensures correctness. Moreover, as
>> an error message for an invalid record length is present in the code,
>> messages for an invalid block length can be redundant.
>>
>> Checks have been replaced by assertions in the attached patch.
>
> After more thinking, we may as well simply remove them, an error with
> CRC having high chances to complain before reaching this point...

Surely not. The existing code explicitly does it like

	if (blk->has_data && blk->data_len == 0)
		report_invalid_record(state,
					  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
					  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

these cross-checks are important. And I see no reason to deviate from
that. The CRC sum isn't foolproof - we intentionally do checks at
several layers. And, as you can see from some other locations, we
actually try to *not* fatally error out when hitting them at times - so
an Assert also is wrong.

Heikki:

	/* cross-check that the HAS_DATA flag is set iff data_length > 0 */
	if (blk->has_data && blk->data_len == 0)
		report_invalid_record(state,
					  "BKPBLOCK_HAS_DATA set, but no data included at %X/%X",
					  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);
	if (!blk->has_data && blk->data_len != 0)
		report_invalid_record(state,
					  "BKPBLOCK_HAS_DATA not set, but data length is %u at %X/%X",
					  (unsigned int) blk->data_len,
					  (uint32) (state->ReadRecPtr >> 32), (uint32) state->ReadRecPtr);

those look like they're missing a "goto err;" to me.
Greetings,

Andres Freund