Re: [HACKERS] Add pg_settings.pending_restart column
Peter Eisentraut-2 wrote So here is a patch for that. It adds a column pending_restart to pg_settings that is true when the configuration file contains a changed setting that requires a restart. We already had the logic to detect such changes, for producing the log entry. I have also set it up so that if you change your mind and undo the setting and reload the server, the pending_restart flag is reset to false. Doc typo: s/of/if/ Otherwise it seems fine but I cannot help but feel that false positives are possible; though nothing that doesn't already exist. Mainly, is the change going to end up only affect the reset or default value but not the currently active value? Instead of a boolean I'd rather have a string/enum that can capture the fact that a reboot is required and the expected outcome. The same field can then communicate non-reboot stuff too - like SIGHUP required or session scope. A simple reboot required boolean api could just as easily be done via a function; and why limit to just reboots and not reconnection or SIGHUP required? Scope creeping but the reboot case doesn't seem that special overall; other than the effort needed to realize the updated value. David J. -- View this message in context: http://postgresql.nabble.com/Add-pg-settings-pending-restart-column-tp5838009p5838020.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] KNN-GiST with recheck
On Thu, Jan 8, 2015 at 1:12 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Dec 16, 2014 at 4:37 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Patch attached. It should be applied on top of my pairing heap patch at http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some caveats: * The signature of the distance function is unchanged, it doesn't get a recheck argument. It is just assumed that if the consistent function sets the recheck flag, then the distance needs to be rechecked as well. We might want to add the recheck argument, like you Alexander did in your patch, but it's not important right now. I didn't get how that expected to work if we have only order by qual without filter qual. In this case consistent function just isn't called at all. * I used the distance term in the executor, although the ORDER BY expr machinery is more general than that. The value returned by the ORDER BY expression doesn't have to be a distance, although that's the only thing supported by GiST and the built-in opclasses. * I short-circuited the planner to assume that the ORDER BY expression always returns a float. That's true today for knn-GiST, but is obviously a bogus assumption in general. This needs some work to get into a committable state, but from a modularity point of view, this is much better than having the indexam to peek into the heap. Nice idea to put reordering into index scan node. Doesn't look like much of overengineering. I'm going to bring it to more commitable state. Following changes has been made in attached patch: * Get sort operators from pathkeys. * Recheck argument of distance function has been reverted. -- With best regards, Alexander Korotkov. knn-gist-recheck-5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] restrict global access to be readonly
happy times wrote Sure, we can utilize the runtime parameter default_transaction_read_only, however, it does not restrict user from changing transaction attribute to non-readonly mode, so is not safe. ISTM that implementing a means to make this setting only super-user changeable would be a quick(er) solution to the problem - the decision as to what to disallow is already decided. It seems like it would have to be a separate GUC but it would require any new SQL but would simply leverage the existing settings system to setup database-user values that only a super-user can change. David J. -- View this message in context: http://postgresql.nabble.com/restrict-global-access-to-be-readonly-tp5837818p5838021.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] forward vs backward slashes in msvc build code
On Sun, Feb 15, 2015 at 12:51 PM, Peter Eisentraut pete...@gmx.net wrote: I understand that on Windows, you can use forward slashes in path names interchangeably with backward slashes (except possibly in the shell). I have developed the attached patch to change the msvc build code to use forward slashes consistently. Together with the other attached patch, which is an unpolished hack, this allows me to run the build.pl script on not-Windows. It won't actually build, but it will create all the project files. That way, I can check whether some makefile hackery would break the Windows build. This sounds like a good idea to improve the checks of the scripts of Windows, like the presence of win32rc files, etc. Could someone verify this please? I tested quickly the second patch with MS 2010 and I am getting a build failure: chkpass cannot complete because of crypt missing. On master build passes. More details here: C:\Users\mpaquier\git\postgres\pgsql.sln (default target) (1) - C:\Users\mpaquier\git\postgres\chkpass.vcxproj (default target) (36) - (Link target) - chkpass.obj : error LNK2019: unresolved external symbol crypt referenced in function chkpass_in [C:\Users\ioltas\git\postgres\chkpass.vcxproj] .\Release\chkpass\chkpass.dll : fatal error LNK1120: 1 unresolved externals [C:\Users\mpaquier\git\postgres\chkpass.vcxproj] 0 Warning(s) 2 Error(s) Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication Helpers WIP for discussion
On 13/02/15 14:04, Petr Jelinek wrote: On 13/02/15 08:48, Michael Paquier wrote: Looking at this patch, I don't see what we actually gain much here except a decoder plugin that speaks a special protocol for a special background worker that has not been presented yet. What actually is the value of that defined as a contrib/ module in-core. Note that we have already test_decoding to basically test the logical decoding facility, used at least at the SQL level to get logical changes decoded. Based on those reasons I am planning to mark this as rejected (it has no documentation as well). So please speak up if you think the contrary, but it seems to me that this could live happily out of core. I think you are missing point of this, it's not meant to be committed in this form at all and even less as contrib module. It was meant as basis for in-core logical replication discussion, but sadly I didn't really have time to pursue it in this CF in the end. That being said and looking at the size of February CF, I think I am fine with dropping this in 9.5 cycle, it does not seem likely that there will be anything useful done with this fast enough to get to 9.5 so there is no point in spending committer resources on it in final CF. I will pick it up again after the CF is done. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Really bad blowups with hash outer join and nulls
Andrew Gierth and...@tao11.riddles.org.uk writes: A quick test suggests that initializing the hash value to ~0 rather than 0 has a curious effect: the number of batches still explodes, but the performance does not suffer the same way. (I think because almost all the batches end up empty.) I think this is worth doing even in the absence of a more general solution; nulls are common enough and important enough that they shouldn't be the worst-case value if it can be avoided. I think that's unlikely to be a path to a good solution. At least part of the problem here is that estimate_hash_bucketsize() supposes that nulls can be ignored --- which is normally true, and invalidates your claim that they're common. But in a RIGHT JOIN situation, they need to be considered as if they were regular keys. That would probably be sufficient to dissuade the planner from choosing a hash join in this example. There may also be something we can do in the executor, but it would take closer analysis to figure out what's going wrong. I don't think kluging the behavior for NULL in particular is the answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
The attached patch is a rebased version of join replacement with foreign-/custom-scan. Here is no feature updates at this moment but SGML documentation is added (according to Michael's comment). This infrastructure allows foreign-data-wrapper and custom-scan- provider to add alternative scan paths towards relations join. From viewpoint of the executor, it looks like a scan on a pseudo- relation that is materialized from multiple relations, even though FDW/CSP internally processes relations join with their own logic. Its basic idea is, (1) scanrelid==0 indicates this foreign/custom scan node runs on a pseudo relation and (2) fdw_ps_tlist and custom_ps_tlist introduce the definition of the pseudo relation, because it is not associated with a tangible relation unlike simple scan case, thus planner cannot know the expected record type to be returned without these additional information. These two enhancement enables extensions to process relations join internally, and to perform as like existing scan node from viewpoint of the core backend. Also, as an aside. I had a discussion with Hanada-san about this interface off-list. He had an idea to keep create_plan_recurse() static, using a special list field in CustomPath structure to chain underlying Path node. If core backend translate the Path node to Plan node if valid list given, extension does not need to call create_plan_recurse() by itself. I have no preference about this. Does anybody have opinion? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai Sent: Thursday, January 15, 2015 8:03 AM To: Robert Haas Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: When custom-scan node replaced a join-plan, it shall have at least two child plan-nodes. The callback handler of PlanCustomPath needs to be able to call create_plan_recurse() to transform the underlying path-nodes to plan-nodes, because this custom-scan node may take other built-in scan or sub-join nodes as its inner/outer input. In case of FDW, it shall kick any underlying scan relations to remote side, thus we may not expect ForeignScan has underlying plans... Do you have an example of this? Yes, even though full code set is too large for patch submission... https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin. It takes GpuHashJoinPath inherited from CustomPath that has multiple underlying scan/join paths. Once it is called back from the backend, it also calls create_plan_recurse() to make inner/outer plan nodes according to the paths. In the result, we can see the following query execution plan that CustomScan takes underlying scan plans. postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2; QUERY PLAN -- Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 width=143) Hash clause 1: (aid = aid) Hash clause 2: (bid = bid) Bulkload: On - Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=409 width=77) - Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: aid nBatches: 1 Buckets: 46000 Memory Usage: 99.99% - Seq Scan on t1 (cost=0.00..734.00 rows=4 width=37) - Custom Scan (MultiHash) (cost=734.00..734.00 rows=4 width=37) hash keys: bid nBatches: 1 Buckets: 46000 Memory Usage: 49.99% - Seq Scan on t2 (cost=0.00..734.00 rows=4 width=37) (13 rows) Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: Robert Haas [mailto:robertmh...@gmail.com] Sent: Thursday, January 15, 2015 2:07 AM To: Kaigai Kouhei(海外 浩平) Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote: When custom-scan node replaced a join-plan, it shall have at least two child plan-nodes. The callback handler of PlanCustomPath needs to be able to call create_plan_recurse() to transform the underlying path-nodes to plan-nodes, because this custom-scan node may take other built-in scan or sub-join nodes as its inner/outer input. In case of FDW, it shall kick any underlying scan relations to remote side, thus we may not expect ForeignScan has underlying plans... Do you
[HACKERS] NOT NULL markings for BKI columns
Hi, 8b38a538c0aa5a432dacd90f10805dc667a3d1a0 changed things so that all columns are checked for NOT NULL ness. Which is fine in general, but it currently does make it impossible to have a varlena column (except OID/INT2 vector...) as a key column in syscache. Even if the column is factually NOT NUL. The rule for determining attribute's NOT NULL setting in bootstrap is: /* * Mark as not null if type is fixed-width and prior columns are too. * This corresponds to case where column can be accessed directly via C * struct declaration. * * oidvector and int2vector are also treated as not-nullable, even though * they are no longer fixed-width. */ #define MARKNOTNULL(att) \ ((att)-attlen 0 || \ (att)-atttypid == OIDVECTOROID || \ (att)-atttypid == INT2VECTOROID) if (MARKNOTNULL(attrtypes[attnum])) { int i; for (i = 0; i attnum; i++) { if (!MARKNOTNULL(attrtypes[i])) break; } if (i == attnum) attrtypes[attnum]-attnotnull = true; } (the rule is also encoded in genbki.pl) Now, you can argue that it's a folly to use the syscache code to access something via a text column (which is what I want to do). I don't agree, but even if you're of that position, it seems worthwhile to mark further catalog columns as NOT NULL in the catalog if that's what the code expects? E.g. pg_(sh)?seclabel.provider should certainly be NOT NULL. pg_extension.extversion as well. There's a couple more I think. So, how about providing bootstrap infrastructure for marking columns as NOT NULL? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mogrify and indent features for jsonb
On 02/15/2015 11:47 AM, Sehrope Sarkuni wrote: For jsonb_indent, how about having it match up closer to the JavaScript JSON.stringify(value, replacer, space)[1]? That way a user can specify the indentation level and optionally filter the fields they'd like to output. In JS, the replacer parameter can be either a JS function or an array of property names. I don't think the former is really possible (in a SQL callable function) but the latter would be a text[]. The space parameter can be either a string (used directly) or a number (corresponding number of spaces). The PG function signatures would be something like: CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[], space text) CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[], space int) For convenience we could also include overloads with replacer removed (since most people probably want the entire object): CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space text) CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space int) Having json_stringify versions of these would be useful as well. I think if you want these things, especially the filtering, you should probably load PLV8. We could probably do the rest, but I'm not sure it's worth doing given that PLV8 is available for all of it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT NULL markings for BKI columns
Andres Freund and...@2ndquadrant.com writes: On 2015-02-15 12:31:10 -0500, Tom Lane wrote: Where are you thinking of sticking that exactly, and will pgindent do something sane with it? Hm, I was thinking about /* extversion should never be null, but the others can be. */ textextversion PG_FORCENOTNULL; /* extension version name */ but pgindent then removes some of the space between text and extversion, making it text extversion PG_FORCENOTNULL;/* extension version name */ an alternative where it doesn't do that is textPG_FORCENOTNULL(extversion);/* extension version name */ Not sure what's the best way here. The former is clearly a lot more sane semantically, so I'd go with that even if the whitespace is a bit less nice. I notice that pgindent does a similar not-very-nice thing with PG_USED_FOR_ASSERTS_ONLY. I wonder if we could hack it to handle those two identifiers specially? BTW, the precedent of PG_USED_FOR_ASSERTS_ONLY would suggest calling this one PG_FORCE_NOT_NULL, or at least using underscores for word breaks in whatever we end up calling it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT NULL markings for BKI columns
On 2015-02-15 12:11:52 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: So, how about providing bootstrap infrastructure for marking columns as NOT NULL? We've not desperately needed it up to now, but if you can think of a clean representation, go for it. I'd want to preserve the property that all columns accessible via C structs are automatically NOT NULL though; too much risk of breakage otherwise. Agreed. I think that the noise of changing all existing attributes to have the marker would far outweigh the cleanliness of being consistent. I was thinking of adding BKI_FORCENOTNULL which would be specified on the attributes you want it. The FORCE in there representing that the default choice is overwritten. On a first glance that doesn't look too hard. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT NULL markings for BKI columns
Andres Freund and...@2ndquadrant.com writes: I was thinking of adding BKI_FORCENOTNULL which would be specified on the attributes you want it. The FORCE in there representing that the default choice is overwritten. Where are you thinking of sticking that exactly, and will pgindent do something sane with it? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: What this patch does is add a GUC call old_snapshot_threshold. It defaults to -1, which leaves behavior matching unpatched code. Above that it allows tuples to be vacuumed away after the number of transaction IDs specified by the GUC have been consumed. TBH, I'm not sure why we'd wish to emulate Oracle's single worst operational feature. I've run into cases where people have suffered horribly bloated databases because of one ill-behaved connection. Some companies don't want to be vulnerable to that and the disruption that recovery from that bloat causes. No doubt, preventing bloat is a good thing, but that doesn't mean this is the best API we could create for the issue. The proposition this patch offers to DBAs is: You can turn this knob to reduce bloat by some hard-to-quantify factor. The cost is that some long-running transactions might fail. You won't know which ones are at risk, the failures won't be the same from time to time, and you won't be able to do anything to spare high-value transactions from that fate except by turning that knob back again globally (which requires a database restart). Maybe refugees from Oracle will think that sounds good, but nobody else will. I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. That would be far easier for users to understand and manage, it would be trivial to allow specific high-value transactions to run with a laxer setting, it does not create any implementation-detail-dependent behavior that we'd be having to explain to users forevermore, and (not incidentally) it would be a lot simpler and more trustworthy to implement. There's no well-defined correlation between your setting and the net effect on database bloat, but that's true with the snapshot too old approach as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mogrify and indent features for jsonb
For jsonb_indent, how about having it match up closer to the JavaScript JSON.stringify(value, replacer, space)[1]? That way a user can specify the indentation level and optionally filter the fields they'd like to output. In JS, the replacer parameter can be either a JS function or an array of property names. I don't think the former is really possible (in a SQL callable function) but the latter would be a text[]. The space parameter can be either a string (used directly) or a number (corresponding number of spaces). The PG function signatures would be something like: CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[], space text) CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[], space int) For convenience we could also include overloads with replacer removed (since most people probably want the entire object): CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space text) CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space int) Having json_stringify versions of these would be useful as well. Regards, -- Sehrope Sarkuni [1]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT NULL markings for BKI columns
Andres Freund and...@2ndquadrant.com writes: So, how about providing bootstrap infrastructure for marking columns as NOT NULL? We've not desperately needed it up to now, but if you can think of a clean representation, go for it. I'd want to preserve the property that all columns accessible via C structs are automatically NOT NULL though; too much risk of breakage otherwise. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NOT NULL markings for BKI columns
On 2015-02-15 12:31:10 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I was thinking of adding BKI_FORCENOTNULL which would be specified on the attributes you want it. The FORCE in there representing that the default choice is overwritten. Where are you thinking of sticking that exactly, and will pgindent do something sane with it? Hm, I was thinking about /* extversion should never be null, but the others can be. */ textextversion PG_FORCENOTNULL; /* extension version name */ but pgindent then removes some of the space between text and extversion, making it text extversion PG_FORCENOTNULL;/* extension version name */ an alternative where it doesn't do that is textPG_FORCENOTNULL(extversion);/* extension version name */ Not sure what's the best way here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files
Christoph Berg c...@df7cb.de writes: gcc5 is lurking in Debian experimental, and it's breaking initdb. FYI, this is now fixed in Red Hat's rawhide version: https://bugzilla.redhat.com/show_bug.cgi?id=1190978 Don't know what the update process is like for Debian's copy, but maybe you could pester the appropriate people to absorb the referenced upstream fix quickly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] restrict global access to be readonly
On Feb15, 2015, at 10:13 , David G Johnston david.g.johns...@gmail.com wrote: happy times wrote Sure, we can utilize the runtime parameter default_transaction_read_only, however, it does not restrict user from changing transaction attribute to non-readonly mode, so is not safe. ISTM that implementing a means to make this setting only super-user changeable would be a quick(er) solution to the problem - the decision as to what to disallow is already decided. It seems like it would have to be a separate GUC but it would require any new SQL but would simply leverage the existing settings system to setup database-user values that only a super-user can change. I've wished for a way prevent regular users for changing specific settings in the past. Maybe we could have ALTER DATABASE database FORCE parameter TO value ALTER ROLE role [ IN DATABASE db ] FORCE parameter TO value In postgresql.conf, we could use a syntax like parameter =!= value to indicate that the parameter value can only be changed by super-users. We' have to figure out what happens if a database- or role-specific FORCEd setting attempts to override a value already FORCEd in postgresql.conf. Ideally, we'd allow database- or role-specific settings created by super-users to override previously FORCEd values, but that would require us to store the role that creates such settings in pg_db_role_setting. For SET clauses attached to functions, we'd complain if they attempt to change a FORCEd value, unless they are called by a super-user, or are marked SECURITY DEFINER and owned by a super-user. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] restrict global access to be readonly
On 2015-02-14 17:28:38 -0600, Jim Nasby wrote: Throw an error in AssignTransactionId/GetNewTransactionId? I see 4 calls to Get*TransactionId in logical replication, though arguably if we're fixing that we should look at doing something special for Slony and the likes. I don't think there are any xid assignments in the logical decoding code. There's a couple error checks erroring out if an xid has been assigned, but those use GetTopTransactionIdIfAny(), i.e. don't assign an id. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Sat, Feb 14, 2015 at 6:42 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Feb 14, 2015 at 7:29 AM, Magnus Hagander mag...@hagander.net wrote: Ok, I've pushed an attempt at doing this. For each mailthread, you can now create annotations. Each annotation is connected to a mail in the thread, and has a free text comment field. The message will then automatically be highlighted out of the archives and a direct link to the message include, alongside the text comment. I *think* this is approximately what people wanted, so let's give it a shot. If you have a chance, please test it out and comment as soon as you can - if I'm completely off track with how it's done, we should back it out and try a different way before we start putting actual valuable data into it, so we don't end up with multiple different ways of doing it in the end... Hmm. This kind of looks like the right idea, but it's hard to use, because all you've got to work with is the subject of the message (which is the same for all), the time it was sent, and the author. In the old system, you could look at the message in the archives and then copy-and-paste in the message ID. That's obviously a bit manual, but it worked. Hmm. Agreed, I think my test-thread was just too short to do that. I've added a box where you can copy/paste the messageid. Back to the manual, but it works. I suppose what would be really spiffy is if the integration of the archives and the CF app could be such that you could see the whole message and then click annotate this message. That would be quite spiffy. A bit more work though, so that will have to go on the longer term todo :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
Here's an updated version of the patch I sent before. Notable changes: * I switched over to calling deserialized objects expanded objects, and the default representation is now called flat or flattened instead of reserialized. Per suggestion from Robert. * I got rid of the bit about detecting read-write pointers by address comparison. Instead there are now two vartag values for R/W and R/O pointers. After further reflection I concluded that my previous worry about wanting copied pointers to automatically become read-only was probably wrong, so there's no need for extra confusion here. * I added support for extracting array elements from expanded values (array_ref). * I hacked plpgsql to force values of array-type variables into expanded form; this is needed to get any win from the array_ref change if the function doesn't do any assignments to elements of the array. This is an improvement over the original patch, which hardwired array_set to force expansion, but I remain unsatisfied with it as a long-term answer. It's not clear that it's always a win to do this (but the tradeoff will change as we convert more array support functions to handle expanded inputs, so it's probably not worth getting too excited about that aspect of it yet). A bigger complaint is that this approach cannot fix things for non-builtin types such as hstore. I'm hesitant to add a pg_type column carrying an expansion function OID, but there may be no other workable answer for extension types. The patch as it stands is able to do nice things with create or replace function arraysetnum(n int) returns numeric[] as $$ declare res numeric[] := '{}'; begin for i in 1 .. n loop res[i] := i; end loop; return res; end $$ language plpgsql strict; create or replace function arraysumnum(arr numeric[]) returns numeric as $$ declare res numeric := 0; begin for i in array_lower(arr, 1) .. array_upper(arr, 1) loop res := res + arr[i]; end loop; return res; end $$ language plpgsql strict; regression=# select arraysumnum(arraysetnum(10)); arraysumnum - 55 (1 row) Time: 304.336 ms (versus approximately 1 minute in 9.4, although these numbers are for cassert builds so should be taken with a grain of salt.) There are still a couple more flattening/expansion conversions than I'd like, in particular the array returned by arraysetnum() gets flattened on its way out, which would be good to avoid. I'm going to stick this into the commitfest even though it's not really close to being committable; I see some other people doing likewise with their pet patches ;-). What it could particularly do with some reviewing help on is exploring the performance changes it creates; what cases does it make substantially worse? regards, tom lane diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c index 867035d..860ad78 100644 *** a/src/backend/access/common/heaptuple.c --- b/src/backend/access/common/heaptuple.c *** *** 60,65 --- 60,66 #include access/sysattr.h #include access/tuptoaster.h #include executor/tuptable.h + #include utils/expandeddatum.h /* Does att's datatype allow packing into the 1-byte-header varlena format? */ *** heap_compute_data_size(TupleDesc tupleDe *** 93,105 for (i = 0; i numberOfAttributes; i++) { Datum val; if (isnull[i]) continue; val = values[i]; ! if (ATT_IS_PACKABLE(att[i]) VARATT_CAN_MAKE_SHORT(DatumGetPointer(val))) { /* --- 94,108 for (i = 0; i numberOfAttributes; i++) { Datum val; + Form_pg_attribute atti; if (isnull[i]) continue; val = values[i]; + atti = att[i]; ! if (ATT_IS_PACKABLE(atti) VARATT_CAN_MAKE_SHORT(DatumGetPointer(val))) { /* *** heap_compute_data_size(TupleDesc tupleDe *** 108,118 */ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val)); } else { ! data_length = att_align_datum(data_length, att[i]-attalign, ! att[i]-attlen, val); ! data_length = att_addlength_datum(data_length, att[i]-attlen, val); } } --- 111,131 */ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val)); } + else if (atti-attlen == -1 + VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val))) + { + /* + * we want to flatten the expanded value so that the constructed + * tuple doesn't depend on it + */ + data_length = att_align_nominal(data_length, atti-attalign); + data_length += EOH_get_flat_size(DatumGetEOHP(val)); + } else { ! data_length = att_align_datum(data_length, atti-attalign, ! atti-attlen, val); ! data_length = att_addlength_datum(data_length, atti-attlen, val); } } *** heap_fill_tuple(TupleDesc tupleDesc, *** 203,212 *infomask |=
Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory
On Sun, Feb 15, 2015 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: I'm going to stick this into the commitfest even though it's not really close to being committable; I see some other people doing likewise with their pet patches ;-). What it could particularly do with some reviewing help on is exploring the performance changes it creates; what cases does it make substantially worse? It's perfectly reasonable to add stuff that isn't committable yet to the CF app; the point of the CF app is to track what needs reviewing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Really bad blowups with hash outer join and nulls
Tom == Tom Lane t...@sss.pgh.pa.us writes: A quick test suggests that initializing the hash value to ~0 rather than 0 has a curious effect: the number of batches still explodes, but the performance does not suffer the same way. (I think because almost all the batches end up empty.) I think this is worth doing even in the absence of a more general solution; nulls are common enough and important enough that they shouldn't be the worst-case value if it can be avoided. Tom I think that's unlikely to be a path to a good solution. It wasn't really intended to be. Tom At least part of the problem here is that Tom estimate_hash_bucketsize() supposes that nulls can be ignored --- Tom which is normally true, and invalidates your claim that they're Tom common. But in a RIGHT JOIN situation, they need to be considered Tom as if they were regular keys. That would probably be sufficient Tom to dissuade the planner from choosing a hash join in this example. I've now tried the attached patch to correct the bucketsize estimates, and it does indeed stop the planner from considering the offending path (in this case it just does the join the other way round). One thing I couldn't immediately see how to do was account for the case where there are a lot of nulls in the table but a strict qual (or an IS NOT NULL) filters them out; this patch will be overly pessimistic about such cases. Do estimates normally try and take things like this into account? I didn't find any other relevant examples. Tom There may also be something we can do in the executor, but it Tom would take closer analysis to figure out what's going wrong. I Tom don't think kluging the behavior for NULL in particular is the Tom answer. The point with nulls is that a hash value of 0 is currently special in two distinct ways: it's always in batch 0 and bucket 0 regardless of how many batches and buckets there are, and it's the result of hashing a null. These two special cases interact in a worst-case manner, so it seems worthwhile to avoid that. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 020558b..1723fd3 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -2505,6 +2505,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, SpecialJoinInfo *sjinfo, SemiAntiJoinFactors *semifactors) { + JoinType jointype = path-jpath.jointype; Path *outer_path = path-jpath.outerjoinpath; Path *inner_path = path-jpath.innerjoinpath; double outer_path_rows = outer_path-rows; @@ -2560,6 +2561,9 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, foreach(hcl, hashclauses) { RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(hcl); + bool keep_nulls = (jointype == JOIN_RIGHT + || jointype == JOIN_FULL + || !op_strict(restrictinfo-hashjoinoperator)); Selectivity thisbucketsize; Assert(IsA(restrictinfo, RestrictInfo)); @@ -2583,7 +2587,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, thisbucketsize = estimate_hash_bucketsize(root, get_rightop(restrictinfo-clause), - virtualbuckets); + virtualbuckets, + keep_nulls); restrictinfo-right_bucketsize = thisbucketsize; } } @@ -2599,7 +2604,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, thisbucketsize = estimate_hash_bucketsize(root, get_leftop(restrictinfo-clause), - virtualbuckets); + virtualbuckets, + keep_nulls); restrictinfo-left_bucketsize = thisbucketsize; } } diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 1ba103c..e8660bc 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -3435,9 +3435,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows) * discourage use of a hash rather strongly if the inner relation is large, * which is what we want. We do not want to hash unless we know that the * inner rel is well-dispersed (or the alternatives seem much worse). + * + * If keep_nulls is true, then we have to treat nulls as a real value, so + * adjust all results according to stanullfrac if available. */ Selectivity -estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets) +estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets, + bool keep_nulls) { VariableStatData vardata; double estfract, @@ -3454,13 +3458,6 @@ estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets) /* Get number of distinct values */ ndistinct = get_variable_numdistinct(vardata, isdefault); - /* If ndistinct isn't real, punt and return 0.1, per comments above */ - if (isdefault) - { - ReleaseVariableStats(vardata); - return (Selectivity) 0.1; - } - /* Get fraction that
Re: [HACKERS] Replication identifiers, take 4
Hi, Here's my next attept attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if a origin is set up. That works surprisingly simply. Other changes: * Locking preventing several backends to replay changes at the same time. This is actually overly restrictive in some cases, but I think good enough for now. * Logical decoding grew a filter_by_origin callback that allows to ignore changes that were replayed on a remote system. Such filters are executed before much is done with records, potentially saving a fair bit of costs. * Rebased. That took a bit due the xlog and other changes. * A couple more SQL interface functions (like dropping a replication identifier). I also want to quickly recap replication identifiers, given that in-person conversations with several people proved that the concept was slightly misunderstood: Think about a logical replication solution trying to replay changes. The postmaster in which the data is replayed into crashes every now and then. Replication identifiers allow you to do something like: do_replication() { source = ConnectToSourceSystem('mysource'); target = ConnectToSourceSystem('target'); # mark we're replayin target.exec($$SELECT pg_replication_identifier_setup_replaying_from('myrep_mysource')$$); # get how far we've replayed last time round remote_lsn = target.exec($$SELECT remote_lsn FROM pg_get_replication_identifier_progress WHERE external_id = 'myrep_mysource'); # and now replay changes copystream = source.exec('START_LOGICAL_REPLICATION SLOT ... START %x', remote_lsn); while (record = copystream.get_record()) { if (record.type = 'begin') { target.exec('BEGIN'); # setup the position of this individual xact target.exec('SELECT pg_replication_identifier_setup_tx_origin($1, $2);', record.origin_lsn, record.origin_commit_timestamp); } else if (record.type = 'change') target.exec(record.change_sql) else if (record.type = 'commit') target.exec('COMMIT'); } } A non pseudocode version of the above would be safe against crashes of both the source and the target system. If the target system crashes the replication identifier logic will recover how far we replayed during crash recovery. If the source system crashes/disconnects we'll have the current value in memory. Note that this works perfectly well if the target system (and obviously the source system, but that's obvious) use synchronous_commit = off - we'll not miss any changes. Furthermore the fact that the origin of records is recorded allows to avoid decoding them in logical decoding. That has both efficiency advantages (we can do so before they are stored in memory/disk) and functionality advantages. Imagine using a logical replication solution to replicate inserts to a single table between two databases where inserts are allowed on both - unless you prevent the replicated inserts from being replicated again you obviously have a loop. This infrastructure lets you avoid that. The SQL interface consists out of: # manage existance of identifiers internal_id pg_replication_identifier_create(external_id); void pg_replication_identifier_drop(external_id); # replay management void pg_replication_identifier_setup_replaying_from(external_id); void pg_replication_identifier_reset_replaying_from(); bool pg_replication_identifier_is_replaying(); void pg_replication_identifier_setup_tx_origin(remote_lsn, remote_commit_time); # replication progress status view SELECT * FROM pgreplication_identifier_progress; # replicatation identifiers SELECT * FROM pg_replication_identifier; Petr has developed (for UDR, i.e. logical replication ontop of 9.4) a SQL reimplementation of replication identifiers and that has proven that for busier workloads doing a table update to store the replication progress indeed has a noticeable overhead. Especially if there's some longer running activity on the standby. The bigger questions I have are: 1) Where to store the origin. I personally still think that using the padding is fine. Now that I have proven that it's pretty simple to store additional information the argument that it might be needed for something else doesn't really hold anymore. But I can live with the other solution as well - 3 bytes additional overhead ain't so bad. 2) If we go with the !REPLICATION_IDENTIFIER_REUSE_PADDING solution, do we want to store the origin only on relevant records? That'd be XLOG_HEAP_INSERT/XLOG_HEAPMULTI_INSERT/XLOG_HEAP_UPDATE // XLOG_XACT_COMMIT/XLOG_XACT_COMMIT_PREPARED. I'm thinking of something
Re: [HACKERS] Replication identifiers, take 4
On 2015-02-16 01:21:55 +0100, Andres Freund wrote: Here's my next attept attempt at producing something we can agree upon. The major change that might achieve that is that I've now provided a separate method to store the origin_id of a node. I've made it conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both paths. That new method uses Heikki's xlog rework to dynamically add the origin to the record if a origin is set up. That works surprisingly simply. Other changes: * Locking preventing several backends to replay changes at the same time. This is actually overly restrictive in some cases, but I think good enough for now. * Logical decoding grew a filter_by_origin callback that allows to ignore changes that were replayed on a remote system. Such filters are executed before much is done with records, potentially saving a fair bit of costs. * Rebased. That took a bit due the xlog and other changes. * A couple more SQL interface functions (like dropping a replication identifier). And here an actual patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 268d52cac6bf7fe1c019fd68248853c7c7ae18b1 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 16 Feb 2015 01:22:08 +0100 Subject: [PATCH] Introduce replication identifiers to keep track of replication progress: v0.6 --- contrib/test_decoding/Makefile |3 +- contrib/test_decoding/expected/replident.out | 84 ++ contrib/test_decoding/sql/replident.sql| 40 + contrib/test_decoding/test_decoding.c | 28 + src/backend/access/rmgrdesc/xactdesc.c | 17 +- src/backend/access/transam/xact.c | 64 +- src/backend/access/transam/xlog.c | 34 +- src/backend/access/transam/xloginsert.c| 22 +- src/backend/access/transam/xlogreader.c| 10 + src/backend/bootstrap/bootstrap.c |5 +- src/backend/catalog/Makefile |2 +- src/backend/catalog/catalog.c |8 +- src/backend/catalog/system_views.sql |7 + src/backend/replication/logical/Makefile |3 +- src/backend/replication/logical/decode.c | 63 +- src/backend/replication/logical/logical.c |5 + src/backend/replication/logical/reorderbuffer.c|5 +- .../replication/logical/replication_identifier.c | 1190 src/backend/storage/ipc/ipci.c |3 + src/backend/utils/cache/syscache.c | 23 + src/backend/utils/misc/guc.c |1 + src/bin/initdb/initdb.c|1 + src/bin/pg_resetxlog/pg_resetxlog.c|3 + src/include/access/xact.h | 10 +- src/include/access/xlog.h |1 + src/include/access/xlogdefs.h |6 + src/include/access/xlogreader.h|9 + src/include/access/xlogrecord.h|5 +- src/include/catalog/indexing.h |6 + src/include/catalog/pg_proc.h | 28 + src/include/catalog/pg_replication_identifier.h| 75 ++ src/include/pg_config_manual.h |6 + src/include/replication/output_plugin.h|8 + src/include/replication/reorderbuffer.h|8 +- src/include/replication/replication_identifier.h | 58 + src/include/utils/syscache.h |2 + src/test/regress/expected/rules.out|5 + src/test/regress/expected/sanity_check.out |1 + 38 files changed, 1816 insertions(+), 33 deletions(-) create mode 100644 contrib/test_decoding/expected/replident.out create mode 100644 contrib/test_decoding/sql/replident.sql create mode 100644 src/backend/replication/logical/replication_identifier.c create mode 100644 src/include/catalog/pg_replication_identifier.h create mode 100644 src/include/replication/replication_identifier.h diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 438be44..f8334cc 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -37,7 +37,8 @@ submake-isolation: submake-test_decoding: $(MAKE) -C $(top_builddir)/contrib/test_decoding -REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel binary prepared +REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel \ + binary prepared replident regresscheck: all | submake-regress submake-test_decoding $(MKDIR_P) regression_output diff --git a/contrib/test_decoding/expected/replident.out b/contrib/test_decoding/expected/replident.out new file mode 100644 index 000..1c508a5 --- /dev/null +++
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks for taking a look at it. That's somewhat cleaned up in the attached patchseries - V2.2. This has been rebased to repair the minor bit-rot pointed out by Thom. I don't really have the energy to review this patch in one batch, so I'd really like to split this into two: I think we're all feeling worn out at this point, by this patch and by others. I do appreciate your making the effort. 1. Solve the existing problem with exclusion constraints that if two insertions happen concurrently, one of them might be aborted with deadlock detected error, rather then conflicting key violation error. That really wouldn't be worth fixing otherwise, but it happens to be a useful stepping stone for this upsert patch. 2. All the rest. I think we need a more pragmatic approach to dealing with the exclusion constraint problems. I took a stab at extracting the parts needed to do 1. See attached. I didn't modify ExecUpdate to use speculative insertions, so the issue remains for UPDATEs, but that's easy to add. Cool. I did not solve the potential for livelocks (discussed at http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). The patch always super-deletes the already-inserted tuple on conflict, and then waits for the other inserter. That would be nice to fix, using one of the ideas from that thread, or something else. How about we don't super-delete at all with exclusion constraints? I'd be willing to accept unprincipled deadlocks for exclusion constraints, because they already exist today for UPSERT/NOSERT type use cases, and with idiomatic usage seem far less likely for the IGNORE variant (especially with exclusion constraints). I can see people using ON CONFLICT UPDATE where they'd almost or actually be better off using a plain UPDATE - that's quite a different situation. I find livelocks to be a *very* scary prospect, and I don't think the remediations that were mentioned are going to fly. It's just too messy, and too much of a niche use case. TBH I am skeptical of the idea that we can fix exclusion constraints properly in this way at all, at least not without the exponential backoff thing, which just seems horrible. We never really debated the options for how to do the speculative insertion and super-deletion. This patch is still based on the quick dirty demo patches I posted about a year ago, in response to issues you found with earlier versions. That might be OK - maybe I really hit the nail on designing those things and got them right on first try - but more likely there are better alternatives. Intuitively, it seem likely that you're right here. However, it was necessary to work through the approach to see what the problems are. For example, the need for modifications to tqual.c became apparent only through putting a full implementation of ON CONFLICT UPDATE through various tests. In general, I've emphasized that the problems with any given value locking implementation are non-obvious. Anyone who thinks that he can see all the problems with his approach to value locking without having a real implementation that is tested for problems like unprincipled deadlocks is probably wrong. That's sort of where I'm coming from with suggesting we allow unprincipled deadlocks with exclusion constraints. I'm not confident that we can find a perfect solution, and know that it's a perfect solution. It's too odd, and too niche a requirement. Although you understood what I was on about when I first talked about unprincipled deadlocks, I think that acceptance of that idea came much later from other people, because it's damn complicated. It's not worth it to add some weird Dining philosophers exponential backoff thing to make sure that the IGNORE variant when used with exclusion constraints can never deadlock in an unprincipled way, but if it is worth it then it seems unreasonable to suppose that this patch needs to solve that pre-existing problem. No? If we do something like an exponential backoff, which I think might work, I fear that that kind of yak-shaving will leave us with something impossible to justify committing; a total mess. Better the devil you know (possible unprincipled deadlocks with the IGNORE variant + exclusion constraints). Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for every insertion? That seems bad for performance reasons. Also, are we happy with adding the new fields to the proc array? Another alternative that was discussed was storing the speculative insertion token on the heap tuple itself. (See http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com) Whatever works, really. I can't say that the performance implications of acquiring that hwlock are at the forefront of my mind. I never found that to be a big problem on an 8 core box, relative to vanilla INSERTs, FWIW - lock contention is not
Re: [HACKERS] Issue installing doc tools on OSX
On 2/15/15 6:31 PM, David Steele wrote: I had a problem installing the doc tools following the directions for OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html. I'm running OSX Yosemite. I got it to work by changing the command from: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp To: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt opensp openjade I didn't capture the error message unfortunately, but it was more or less: unresolved dependency opensp while installing openjade. That seems a bit incredible, since port should be able to resolve the dependencies by itself. I suggest that this should be reported as a bug to MacPorts. Also note that the other listed packages are also not ordered in dependency order. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 2/14/15 7:30 AM, Magnus Hagander wrote: On Mon, Feb 9, 2015 at 4:56 PM, Robert Haas robertmh...@gmail.com Can we make it smarter, so that the kinds of things people produce intending for them to be patches are thought by the CF app to be patches? Doing it wouldn't be too hard, as the code right now is simply: # Attempt to identify the file using magic information mtype = mag.buffer(contents) if mtype.startswith('text/x-diff'): a.ispatch = True else: a.ispatch = False (where mag is the API call into the magic module) So we could easily add for example our own regexp parsing or so. The question is do we want to - because we'll have to maintain it as well. But I guess if we have a restricted enough set of rules, we can probably live with that. As I had described in http://www.postgresql.org/message-id/54dd2413.8030...@gmx.net, this is all but impossible. The above rule is certainly completely detached from the reality of what people actually send in. If you are just ignoring patches that don't match your rule set, this is not going to work very well. I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] restrict global access to be readonly
On 2/14/15 7:24 PM, Tom Lane wrote: Another possibility that would be attractive for replication-related use-cases would be nothing that generates WAL thank you very much. This would be useful, as it essentially simulates a hot standby. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Really bad blowups with hash outer join and nulls
Hi, On 16.2.2015 00:50, Andrew Gierth wrote: Tom == Tom Lane t...@sss.pgh.pa.us writes: I've now tried the attached patch to correct the bucketsize estimates, and it does indeed stop the planner from considering the offending path (in this case it just does the join the other way round). One thing I couldn't immediately see how to do was account for the case where there are a lot of nulls in the table but a strict qual (or an IS NOT NULL) filters them out; this patch will be overly pessimistic about such cases. Do estimates normally try and take things like this into account? I didn't find any other relevant examples. Improving the estimates is always good, but it's not going to fix the case of non-NULL values (it shouldn't be all that difficult to create such examples with a value whose hash starts with a bunch of zeroes). Tom There may also be something we can do in the executor, but it Tom would take closer analysis to figure out what's going wrong. I Tom don't think kluging the behavior for NULL in particular is the Tom answer. The point with nulls is that a hash value of 0 is currently special in two distinct ways: it's always in batch 0 and bucket 0 regardless of how many batches and buckets there are, and it's the result of hashing a null. These two special cases interact in a worst-case manner, so it seems worthwhile to avoid that. I think you're right this is a flaw in general - all it takes is a sufficiently common value with a hash value falling into the first batch (i.e. either 0 or starting with a lot of 0 bits, thus giving hashno==0). I think this might be solved by relaxing the check a bit. Currently we do this (see nodeHash.c:735): if (nfreed == 0 || nfreed == ninmemory) { hashtable-growEnabled = false; } which only disables nbatch growth if *all* the tuples remain in the batch. That's rather strict, and it takes a single tuple to break this. With each nbatch increase we expect 50:50 split, i.e. 1/2 the tuples staying in the batch, 1/2 moved to the new one. So a much higher ratio is suspicious and most likely mean the same hash value, so what if we did something like this: if ((nfreed = 0.9 * (nfreed + ninmemory)) || (nfreed = 0.1 * (nfreed + ninmemory))) { hashtable-growEnabled = false; } which disables nbatch growth if either =90% tuples remained in the first batch, or = 90% tuples were moved from it. The exact thresholds might be set a bit differently, but I think 10:90 sounds about good. Trivial patch implementing this attached - with it the explain analyze looks like this: test=# explain analyze select status, count(*) from q3 left join m3 on (m3.id=q3.id) group by status; QUERY PLAN -- HashAggregate (cost=64717.63..64717.67 rows=4 width=4) (actual time=514.619..514.630 rows=5 loops=1) Group Key: m3.status - Hash Right Join (cost=18100.00..62217.63 rows=50 width=4) (actual time=75.260..467.911 rows=500108 loops=1) Hash Cond: (m3.id = q3.id) - Seq Scan on m3 (cost=0.00..18334.00 rows=100 width=37) (actual time=0.003..91.799 rows=100 loops=1) - Hash (cost=7943.00..7943.00 rows=50 width=33) (actual time=74.916..74.916 rows=50 loops=1) Buckets: 65536 (originally 65536) Batches: 32 (originally 16) Memory Usage: 8824kB - Seq Scan on q3 (cost=0.00..7943.00 rows=50 width=33) (actual time=0.005..27.395 rows=50 loops=1) Planning time: 0.172 ms Execution time: 514.663 ms (10 rows) Without the patch this runs in ~240 seconds and the number of batches explodes to ~131k. In theory it might happen that threre just a few hash values and all of them are exactly the same within the first N bits (thus falling into the first batch), but N+1 bits are different. So the next split would actually help. But I think we can leave that up to probability, just like all the hash code. -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index abd70b3..6b3f471 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -732,7 +732,8 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable) * group any more finely. We have to just gut it out and hope the server * has enough RAM. */ - if (nfreed == 0 || nfreed == ninmemory) +if ((nfreed = 0.9 * (nfreed + ninmemory)) || +(nfreed = 0.1 * (nfreed + ninmemory))) { hashtable-growEnabled = false; #ifdef HJDEBUG -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
[HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
Hi! While playing with the memory context internals some time ago, I've been wondering if there are better ways to allocate memory from the kernel - either tweaking libc somehow, or maybe interacting with kernel directly. I mostly forgot about that topic, but after the local conference last week we went to a pub and one of the things we discussed over a beer was how complex and unintuitive the memory stuff is, because of the libc heuristics, 'sbrk' properties [1] and behavior in the presence of holes, OOM etc. The virtual memory system should handle this to a large degree, but I've repeatedly ran into problem when that was not the case (for example the memory is still part of the virtual address space, and thus counted by OOM). One of the wilder ideas (I mentined beer was involved!) was a memory allocator based on mmap [2], bypassing the libc malloc implementation altogether. mmap() has some nice features (e.g. no issues with returning memory back to the kernel, which may be problem with sbrk). So I hacked a bit and switched the AllocSet implementation to mmap(). And it works surprisingly well, so here is an experimental patch for comments whether this really is a good idea etc. Some parts of the patch are a bit dirty and I've only tested it on x86. Notes - (1) The main changes are mostly trivial, rewriting malloc()/free() to mmap()/munmap(), plus related chages (e.g. mmap() returns (void*)-1 instead of NULL in case of failure). (2) A significant difference is that mmap() can't allocate blocks smaller than page size, which is 4kB on x86. This means the smallest context is 4kB (instead of 1kB), and also affects the growth of block size (it can only grow to 8kB). So this changes the AllocSet internal behavior a bit. (3) As this bypasses libc, it can't use the libc freelists (which are used by malloc). To compensate for this, there's a simple process-level freelist of blocks, shared by all memory contexts. This cache a limited capacity (roughly 4MB per). (4) Some of the comments are obsolete, still referencing malloc/free. Benchmarks -- I've done extensive testing and also benchmrking, and it seems to be no slower than the current implementation, and in some cases is actually a bit faster. a) time pgbench -i -s 300 - pgbench initialisation, measuring the COPY and the total duration. - averages of 3 runs (negligible variations between runs) COPY total - master 0:26.22 1:22 mmap 0:26.35 1:22 Pretty much no difference. b) pgbench -S -c 8 -j 8 -T 60 - short read-only runs (1 minute) after a warmup - min, max, average of 8 runs average min max - master 967859532997981 mmap 982799725999671 That's a rather consistent 1-2% improvement. c) REINDEX pgbench_accounts_pkey - large maintenance_work_mem so that it's in-memory sort - average, min, max of 8 runs (duration in seconds) average min max - master 10.359.64 13.56 mmap 9.859.81 9.90 Again, mostly improvement, except for the minimum where the currect memory context was a bit faster. But overall the mmap-based one is much more consistent. Some of the differences may be due to allocating 4kB blocks from the very start (while the current allocator starts with 1kB, then 2kB and finally 4kB). Ideas, opinions? [1] http://linux.die.net/man/2/sbrk [2] http://linux.die.net/man/2/mmap -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 0759e39..65e7f66 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -89,6 +89,8 @@ #include utils/memdebug.h #include utils/memutils.h +#include sys/mman.h + /* Define this to detail debug alloc information */ /* #define HAVE_ALLOCINFO */ @@ -117,13 +119,11 @@ * wastage.) * */ - +#define PAGE_SIZE 4096 /* getconf PAGE_SIZE */ #define ALLOC_MINBITS 3 /* smallest chunk size is 8 bytes */ -#define ALLOCSET_NUM_FREELISTS 11 +#define ALLOCSET_NUM_FREELISTS 11 /* FIXME probably should be only 9 */ #define ALLOC_CHUNK_LIMIT (1 (ALLOCSET_NUM_FREELISTS-1+ALLOC_MINBITS)) -/* Size of largest chunk that we use a fixed size for */ -#define ALLOC_CHUNK_FRACTION 4 -/* We allow chunks to be at most 1/4 of maxBlockSize (less overhead) */ +/* Size of largest chunk that we use a fixed size for (2kB) */ /* * The first block allocated for an allocset has size initBlockSize. @@ -242,6 +242,31 @@ typedef struct AllocChunkData #define AllocChunkGetPointer(chk) \ ((AllocPointer)(((char *)(chk)) +
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 8:36 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. This kind of looks like the right idea, but it's hard to use, because all you've got to work with is the subject of the message (which is the same for all), the time it was sent, and the author. In the old system, you could look at the message in the archives and then copy-and-paste in the message ID. That's obviously a bit manual, but it worked. Hmm. Agreed, I think my test-thread was just too short to do that. I've added a box where you can copy/paste the messageid. Back to the manual, but it works. Thank you for correcting the issue in a timely manner. I appreciate the effort. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
On 15.2.2015 20:56, Heikki Linnakangas wrote: On 02/15/2015 08:57 PM, Tomas Vondra wrote: One of the wilder ideas (I mentined beer was involved!) was a memory allocator based on mmap [2], bypassing the libc malloc implementation altogether. mmap() has some nice features (e.g. no issues with returning memory back to the kernel, which may be problem with sbrk). So I hacked a bit and switched the AllocSet implementation to mmap(). glibc's malloc() also uses mmap() for larger allocations. Precisely because those allocations can then be handed back to the OS. I don't think we'd want to use mmap() for small allocations either. Let's not re-invent malloc().. malloc() does that only for allocations over MAP_THRESHOLD, which is 128kB by default. Vast majority of blocks we allocate are = 8kB, so mmap() almost never happens. At least that's my understanding, I may be wrong of course. - Heikki -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote: On 15.2.2015 20:56, Heikki Linnakangas wrote: On 02/15/2015 08:57 PM, Tomas Vondra wrote: One of the wilder ideas (I mentined beer was involved!) was a memory allocator based on mmap [2], bypassing the libc malloc implementation altogether. mmap() has some nice features (e.g. no issues with returning memory back to the kernel, which may be problem with sbrk). So I hacked a bit and switched the AllocSet implementation to mmap(). glibc's malloc() also uses mmap() for larger allocations. Precisely because those allocations can then be handed back to the OS. I don't think we'd want to use mmap() for small allocations either. Let's not re-invent malloc().. malloc() does that only for allocations over MAP_THRESHOLD, which is 128kB by default. Vast majority of blocks we allocate are = 8kB, so mmap() almost never happens. The problem is that mmap() is, to my knowledge, noticeably more expensive than sbrk(). Especially with concurrent workloads. Which is why the malloc/libc authors chose to use sbrk... IIRC glibc malloc also batches several allocation into mmap()ed areas after some time. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
On 15.2.2015 21:13, Andres Freund wrote: On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote: malloc() does that only for allocations over MAP_THRESHOLD, which is 128kB by default. Vast majority of blocks we allocate are = 8kB, so mmap() almost never happens. The problem is that mmap() is, to my knowledge, noticeably more expensive than sbrk(). Especially with concurrent workloads. Which is why the malloc/libc authors chose to use sbrk ... Any ideas how to simulate such workloads? None of the tests I've done suggested such issue exists. IIRC glibc malloc also batches several allocation into mmap()ed areas after some time. Maybe, there's certainly a lot of such optimizations in libc. But how do you return memory to system in that case? -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
Jim Nasby jim.na...@bluetreble.com writes: On 2/15/15 10:36 AM, Tom Lane wrote: I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. A common use-case is long-running reports hitting relatively stable data in a database that also has tables with a high churn rate (ie: queue tables). In those scenarios your only options right now are to suffer huge amounts of bloat in the high-churn or not do your reporting. A simple transaction timeout only solves this by denying you reporting queries. Agreed, but Kevin's proposal has exactly the same problem only worse, because (a) the reporting transactions might or might not fail (per Murphy, they'll fail consistently only when you're on deadline), and (b) if they do fail, there's nothing you can do short of increasing the slack db-wide. An idea that I've had on this would be some way to lock down the tables that a long-running transaction could access. That would allow vacuum to ignore any snapshots that transaction had for tables it wasn't accessing. That's something that would be deterministic. There might be something in that, but again it's not much like this patch. The key point I think we're both making is that nondeterministic failures are bad, especially when you're talking about long-running, expensive-to- retry transactions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
Andres Freund and...@2ndquadrant.com writes: On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote: On 15.2.2015 20:56, Heikki Linnakangas wrote: glibc's malloc() also uses mmap() for larger allocations. Precisely because those allocations can then be handed back to the OS. I don't think we'd want to use mmap() for small allocations either. Let's not re-invent malloc().. malloc() does that only for allocations over MAP_THRESHOLD, which is 128kB by default. Vast majority of blocks we allocate are = 8kB, so mmap() almost never happens. The problem is that mmap() is, to my knowledge, noticeably more expensive than sbrk(). Especially with concurrent workloads. Which is why the malloc/libc authors chose to use sbrk... IIRC glibc malloc also batches several allocation into mmap()ed areas after some time. Keep in mind also that aset.c doubles the request size every time it goes back to malloc() for some more space for a given context. So you get up to 128kB pretty quickly. There will be a population of 8K-to-64K chunks that don't ever get returned to the OS but float back and forth between different MemoryContexts as those are created and deleted. I'm inclined to think this is fine and we don't need to improve on it. Part of the reason for my optimism is that on glibc-based platforms, IME PG backends do pretty well at reducing their memory consumption back down to a minimal value after each query. (On other platforms, not so much, but arguably that's libc's fault not ours.) So I'm not really seeing a problem that needs fixed, and definitely not one that a platform-specific fix will do much for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
On 2/15/15 10:36 AM, Tom Lane wrote: Kevin Grittner kgri...@ymail.com writes: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: What this patch does is add a GUC call old_snapshot_threshold. It defaults to -1, which leaves behavior matching unpatched code. Above that it allows tuples to be vacuumed away after the number of transaction IDs specified by the GUC have been consumed. TBH, I'm not sure why we'd wish to emulate Oracle's single worst operational feature. I've run into cases where people have suffered horribly bloated databases because of one ill-behaved connection. Some companies don't want to be vulnerable to that and the disruption that recovery from that bloat causes. No doubt, preventing bloat is a good thing, but that doesn't mean this is the best API we could create for the issue. The proposition this patch offers to DBAs is: You can turn this knob to reduce bloat by some hard-to-quantify factor. The cost is that some long-running transactions might fail. You won't know which ones are at risk, the failures won't be the same from time to time, and you won't be able to do anything to spare high-value transactions from that fate except by turning that knob back again globally (which requires a database restart). Maybe refugees from Oracle will think that sounds good, but nobody else will. I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. That would be far easier for users to understand and manage, it would be trivial to allow specific high-value transactions to run with a laxer setting, it does not create any implementation-detail-dependent behavior that we'd be having to explain to users forevermore, and (not incidentally) it would be a lot simpler and more trustworthy to implement. There's no well-defined correlation between your setting and the net effect on database bloat, but that's true with the snapshot too old approach as well. A common use-case is long-running reports hitting relatively stable data in a database that also has tables with a high churn rate (ie: queue tables). In those scenarios your only options right now are to suffer huge amounts of bloat in the high-churn or not do your reporting. A simple transaction timeout only solves this by denying you reporting queries. An idea that I've had on this would be some way to lock down the tables that a long-running transaction could access. That would allow vacuum to ignore any snapshots that transaction had for tables it wasn't accessing. That's something that would be deterministic. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
On 02/15/2015 08:57 PM, Tomas Vondra wrote: One of the wilder ideas (I mentined beer was involved!) was a memory allocator based on mmap [2], bypassing the libc malloc implementation altogether. mmap() has some nice features (e.g. no issues with returning memory back to the kernel, which may be problem with sbrk). So I hacked a bit and switched the AllocSet implementation to mmap(). glibc's malloc() also uses mmap() for larger allocations. Precisely because those allocations can then be handed back to the OS. I don't think we'd want to use mmap() for small allocations either. Let's not re-invent malloc().. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Really bad blowups with hash outer join and nulls
This came up today on IRC, though I suspect the general problem has been seen before: create table m3 (id uuid, status integer); create table q3 (id uuid); insert into m3 select uuid_generate_v4(), floor(random() * 4)::integer from generate_series(1,100); insert into q3 select id from (select id, row_number() over () as rnum from m3) s where (rnum % 4)=0; insert into q3 select null from generate_series(1,25); -- at this point m3 has a million unique rows, while q3 has -- a quarter of the ids of m3, plus as many nulls -- note that work_mem is at the default 4MB explain analyze select status, count(*) from q3 left join m3 on (m3.id=q3.id) group by status; QUERY PLAN -- HashAggregate (cost=51553.62..51553.67 rows=4 width=4) (actual time=259580.168..259580.179 rows=5 loops=1) Group Key: m3.status - Hash Right Join (cost=12927.31..49596.89 rows=391347 width=4) (actual time=18804.502..258498.897 rows=50 loops=1) Hash Cond: (m3.id = q3.id) - Seq Scan on m3 (cost=0.00..16753.10 rows=1038310 width=20) (actual time=0.011..1618.762 rows=100 loops=1) - Hash (cost=6124.47..6124.47 rows=391347 width=16) (actual time=1742.340..1742.340 rows=50 loops=1) Buckets: 131072 (originally 131072) Batches: 65536 (originally 8) Memory Usage: 8837kB - Seq Scan on q3 (cost=0.00..6124.47 rows=391347 width=16) (actual time=0.033..774.732 rows=50 loops=1) Planning time: 0.178 ms Execution time: 259583.185 ms The memory usage of this query runs to hundreds of megs despite the small work_mem. On 9.3.5, which the user was running, the problem was much more extreme (this is with the default 1MB work_mem, and a data set only a quarter of the size): explain analyze select status, count(*) from q3 left join m3 on (m3.id=q3.id) group by status; QUERY PLAN --- HashAggregate (cost=15020.11..15022.11 rows=200 width=4) (actual time=3733245.942..3733245.952 rows=5 loops=1) - Hash Right Join (cost=3976.50..14395.11 rows=125000 width=4) (actual time=227870.000..3732686.692 rows=125000 loops=1) Hash Cond: (m3.id = q3.id) - Seq Scan on m3 (cost=0.00..4189.59 rows=259659 width=20) (actual time=0.024..807.162 rows=25 loops=1) - Hash (cost=1803.00..1803.00 rows=125000 width=16) (actual time=437.991..437.991 rows=125000 loops=1) Buckets: 4096 Batches: 262144 (originally 8) Memory Usage: 1954kB - Seq Scan on q3 (cost=0.00..1803.00 rows=125000 width=16) (actual time=0.011..191.208 rows=125000 loops=1) Total runtime: 3733269.725 ms (8 rows) This query on 9.3.5 allocated over 2.7GB of RAM on my system. Note the explosion in the number of batches. Tracing execution showed that as the memory limit was reached while processing the large number of nulls, the number of batches would be doubled, then only a tiny number of rows could be written out as being no longer in the current batch, so the limit was then quickly reached again. A key part of the problem here may be the fact that nulls hash to exactly 0, and are therefore always part of the initial in-memory batch regardless of how much splitting happens. If a large subset of the data had some other constant value, it would likely have some non-zero bits in its hash value and therefore stand a good chance of being written out to a batch file before things get too desperate, avoiding the mass explosion. A quick test suggests that initializing the hash value to ~0 rather than 0 has a curious effect: the number of batches still explodes, but the performance does not suffer the same way. (I think because almost all the batches end up empty.) I think this is worth doing even in the absence of a more general solution; nulls are common enough and important enough that they shouldn't be the worst-case value if it can be avoided. (Testing this idea revealed that a regression test for rowsecurity is missing an order by clause and changes result based on hash values) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assessing parallel-safety
It's important for parallelism to work under the extended query protocol, and that's nontrivial. exec_parse_message() sets cursorOptions. exec_bind_message() runs the planner. We don't know if a parallel plan is okay before seeing max_rows in exec_execute_message(). On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote: - For testing purposes, I set this up so that the executor activates parallel mode when running any query that seems to be parallel-safe. It seems nearly certain we'd only want to do this when a parallel plan was actually selected, but this behavior is awfully useful for testing, and revealed a number of bugs in the parallel-safety markings. Why wouldn't that be a good permanent behavior? The testing benefit extends to users. If I wrongly mark a function something other than PROPARALLEL_UNSAFE, I'd rather find out ASAP. - The patch includes a script, fixpgproc.pl, which I used to insert and update the parallel markings on the system catalogs. That is, of course, not intended for commit, but it's mighty useful for experimentation. The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql reset proparallel=u for some built-in functions, such as make_interval. Some categories of PROPARALLEL_SAFE functions deserve more attention: - database_to_xml* need to match proparallel of schema_to_xml*, which are PROPARALLEL_UNSAFE due to the heavyweight lock rule. cursor_to_xml* probably need to do likewise. - Several functions, such as array_in and anytextcat, can call arbitrary I/O functions. This implies a policy that I/O functions must be PROPARALLEL_SAFE. I agree with that decision, but it should be documented. Several json-related functions can invoke X-json cast functions; the marking implies that all such cast functions must be PROPARALLEL_SAFE. That's probably okay, too. - All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or define its own restriction selectivity estimator function. On the other hand, the planner need not check the parallel safety of selectivity estimator functions. If we're already in parallel mode at the moment we enter the planner, we must be planning a query called within a PROPARALLEL_SAFE function. If the query uses an unsafe operator, the containing function was mismarked. - inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information they need indeed be available in workers? - Assuming you don't want to propagate XactLastRecEnd from the slave back to the master, restrict XLogInsert() during parallel mode. Restrict functions that call it, including pg_create_restore_point, pg_switch_xlog and pg_stop_backup. - pg_extension_config_dump modifies the database, so it shall be PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won't reach back to the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED. If a mismarked function elicits ERROR: cannot retain locks acquired while in parallel mode, innocent queries later in the transaction get the same error: begin; select 1; -- ok savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR select 1; -- ERROR rollback; --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace, false, /* leakproof */ false, /* isStrict */ PROVOLATILE_IMMUTABLE,/* volatility */ + PROPARALLEL_UNSAFE, /* parallel safety */ Range constructors are PROPARALLEL_SAFE. --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list, if (queryTree-commandType == CMD_UTILITY) stmt = queryTree-utilityStmt; else - stmt = (Node *) pg_plan_query(queryTree, 0, NULL); + stmt = (Node *) pg_plan_query(queryTree, + fcache-readonly_func ? CURSOR_OPT_PARALLEL_OK : 0, + NULL); (This code is deciding whether to allow parallelism in one statement of an SQL-language function.) Why does it care if the function is read-only? I see that PL/pgSQL never allows parallelism in queries it launches. --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype); extern bool op_hashjoinable(Oid opno,
[HACKERS] Issue installing doc tools on OSX
I had a problem installing the doc tools following the directions for OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html. I'm running OSX Yosemite. I got it to work by changing the command from: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp To: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt opensp openjade I didn't capture the error message unfortunately, but it was more or less: unresolved dependency opensp while installing openjade. Patch is attached. -- - David Steele da...@pgmasters.net diff --git a/doc/src/sgml/docguide.sgml b/doc/src/sgml/docguide.sgml index e0ae262..fa13404 100644 --- a/doc/src/sgml/docguide.sgml +++ b/doc/src/sgml/docguide.sgml @@ -280,7 +280,7 @@ apt-get install docbook docbook-dsssl docbook-xsl libxml2-utils openjade1.3 open para If you use MacPorts, the following will get you set up: programlisting -sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp +sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt opensp openjade /programlisting /para /sect2 signature.asc Description: OpenPGP digital signature
Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator
On 15.2.2015 21:38, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote: On 15.2.2015 20:56, Heikki Linnakangas wrote: glibc's malloc() also uses mmap() for larger allocations. Precisely because those allocations can then be handed back to the OS. I don't think we'd want to use mmap() for small allocations either. Let's not re-invent malloc().. malloc() does that only for allocations over MAP_THRESHOLD, which is 128kB by default. Vast majority of blocks we allocate are = 8kB, so mmap() almost never happens. The problem is that mmap() is, to my knowledge, noticeably more expensive than sbrk(). Especially with concurrent workloads. Which is why the malloc/libc authors chose to use sbrk... IIRC glibc malloc also batches several allocation into mmap()ed areas after some time. Keep in mind also that aset.c doubles the request size every time it goes back to malloc() for some more space for a given context. So you get up to 128kB pretty quickly. That's true, so for sufficiently large contexts we're already using mmap() indirectly, through libc. Some contexts use just 8kB (ALLOCSET_SMALL_MAXSIZE), but that's just a minority. There will be a population of 8K-to-64K chunks that don't ever get returned to the OS but float back and forth between different MemoryContexts as those are created and deleted. I'm inclined to think this is fine and we don't need to improve on it. Sure, but there are scenarios where that can't happen, because the contexts are created 'concurrently' so the blocks can't float between the contexts. And example that comes to mind is array_agg() with many groups, which is made worse by allocating the MemoryContext data in TopMemoryContext, creating 'islands' and making it impossible to release the memory. http://www.postgresql.org/message-id/e010519fbe83b1331ee0dfcb122a6...@fuzzy.cz Part of the reason for my optimism is that on glibc-based platforms, IME PG backends do pretty well at reducing their memory consumption back down to a minimal value after each query. (On other platforms, not so much, but arguably that's libc's fault not ours.) So I'm not really seeing a problem that needs fixed, and definitely not one that a platform-specific fix will do much for. I certainly agree this is not something we need to fix ASAP, and that bypassing the libc may not be the right remedy. That's why I posted it just here (and not to the CF), and marked it as experimental. That however does not mean we can't improve this somehow - from time to time I have to deal with machines where the minimum amount of memory assigned to a process grew over time, gradually increased memory pressure and eventually causing trouble. There are ways to fix this (e.g. by reopening the connections, thus creating a new backend). -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files
Re: Tom Lane 2015-02-15 21030.1424022...@sss.pgh.pa.us Christoph Berg c...@df7cb.de writes: gcc5 is lurking in Debian experimental, and it's breaking initdb. FYI, this is now fixed in Red Hat's rawhide version: https://bugzilla.redhat.com/show_bug.cgi?id=1190978 Don't know what the update process is like for Debian's copy, but maybe you could pester the appropriate people to absorb the referenced upstream fix quickly. Thanks for pushing this towards the gcc people. I've updated the Debian bugs so our gcc maintainers can upload a new version as well. Christoph -- c...@df7cb.de | http://www.df7cb.de/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Selectivity estimation for inet operators
New version of the patch attached with the optimization to break the loop before looking at all of the histogram values. I can reduce join selectivity estimation runtime by reducing the values of the left hand side or both of the sides, if there is interest. Even if the above aspects of the code are really completely right, the comments fail to explain why. I spent a lot of time on the comments, but so far as these points are concerned they still only explain what is being done and not why it's a useful calculation to make. I couldn't write better comments because I don't have strong arguments about it. We can say that we don't try to make use of the both of the endpoints, because we don't know how to combine them. We only use the one with matching family and masklen, and when both of them match we use the distant one to be on the safer side. I added two more sentences to explain the calculation. diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c index 73fc1ca..51a33c2 100644 --- a/src/backend/utils/adt/network_selfuncs.c +++ b/src/backend/utils/adt/network_selfuncs.c @@ -1,32 +1,972 @@ /*- * * network_selfuncs.c * Functions for selectivity estimation of inet/cidr operators * - * Currently these are just stubs, but we hope to do better soon. + * This module provides estimators for the subnet inclusion and overlap + * operators. Estimates are based on null fraction, most common values, + * and histogram of inet/cidr columns. * * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California * * * IDENTIFICATION * src/backend/utils/adt/network_selfuncs.c * *- */ #include postgres.h +#include math.h + +#include access/htup_details.h +#include catalog/pg_operator.h +#include catalog/pg_statistic.h #include utils/inet.h +#include utils/lsyscache.h +#include utils/selfuncs.h +/* Default selectivity for the inet overlap operator */ +#define DEFAULT_OVERLAP_SEL 0.01 + +/* Default selectivity for the various inclusion operators */ +#define DEFAULT_INCLUSION_SEL 0.005 + +/* Default selectivity for specified operator */ +#define DEFAULT_SEL(operator) \ + ((operator) == OID_INET_OVERLAP_OP ? \ + DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL) + +static Selectivity networkjoinsel_inner(Oid operator, + VariableStatData *vardata1, VariableStatData *vardata2); +static Selectivity networkjoinsel_semi(Oid operator, + VariableStatData *vardata1, VariableStatData *vardata2); +static Selectivity mcv_population(float4 *mcv_numbers, int mcv_nvalues); +static Selectivity inet_hist_value_sel(Datum *values, int nvalues, + Datum constvalue, int opr_codenum); +static Selectivity inet_mcv_join_sel(Datum *mcv1_values, + float4 *mcv1_numbers, int mcv1_nvalues, Datum *mcv2_values, + float4 *mcv2_numbers, int mcv2_nvalues, Oid operator); +static Selectivity inet_mcv_hist_sel(Datum *mcv_values, float4 *mcv_numbers, + int mcv_nvalues, Datum *hist_values, int hist_nvalues, + int opr_codenum); +static Selectivity inet_hist_inclusion_join_sel(Datum *hist1_values, + int hist1_nvalues, + Datum *hist2_values, int hist2_nvalues, + int opr_codenum); +static Selectivity inet_semi_join_sel(Datum lhs_value, + bool mcv_exists, Datum *mcv_values, int mcv_nvalues, + bool hist_exists, Datum *hist_values, int hist_nvalues, + double hist_weight, + FmgrInfo *proc, int opr_codenum); +static int inet_opr_codenum(Oid operator); +static int inet_inclusion_cmp(inet *left, inet *right, int opr_codenum); +static int inet_masklen_inclusion_cmp(inet *left, inet *right, + int opr_codenum); +static int inet_hist_match_divider(inet *boundary, inet *query, + int opr_codenum); + +/* + * Selectivity estimation for the subnet inclusion/overlap operators + */ Datum networksel(PG_FUNCTION_ARGS) { - PG_RETURN_FLOAT8(0.001); + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0); + Oid operator = PG_GETARG_OID(1); + List *args = (List *) PG_GETARG_POINTER(2); + int varRelid = PG_GETARG_INT32(3); + VariableStatData vardata; + Node *other; + bool varonleft; + Selectivity selec, +mcv_selec, +non_mcv_selec; + Datum constvalue, + *hist_values; + int hist_nvalues; + Form_pg_statistic stats; + double sumcommon, +nullfrac; + FmgrInfo proc; + + /* + * If expression is not (variable op something) or (something op + * variable), then punt and return a default estimate. + */ + if (!get_restriction_variable(root, args, varRelid, + vardata, other, varonleft)) + PG_RETURN_FLOAT8(DEFAULT_SEL(operator)); + + /* + * Can't do anything useful if the something is not a constant, either. + */ + if (!IsA(other, Const)) + { +
Re: [HACKERS] mogrify and indent features for jsonb
On 02/14/2015 10:06 PM, Andrew Dunstan wrote: Attached is a patch to provide a number of very useful facilities to jsonb that people have asked for. These are based on work by Dmitry Dolgov in his jsonbx extension, but I take responsibility for any bugs. The facilities are: new operations: concatenation:jsonb || jsonb - jsonb deletion: jsonb - text - jsonb deletion: jsonb - int - text new functions: produce indented text: jsonb_indent(jsonb) - text change an element at a path: jsonb_replace(jsonb, text[], jsonb) - jsonb. It would be relatively trivial to add: delete an element at a path: jsonb_delete(jsonb, text[]) - json and I think we should do that for the sake of completeness. The docs might need a little extra work, and the indent code definitely needs work, which I hope to complete in the next day or two, but I wanted to put a stake in the ground. In this version the indent code now works correctly, and there is an additional delete operator: jsonb - text[] - jsonb Which deletes data at the designated path. cheers andrew diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d57243a..9936bff 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10256,6 +10256,30 @@ table2-mapping entryDo all of these key/element emphasisstrings/emphasis exist?/entry entryliteral'[a, b]'::jsonb ?amp; array['a', 'b']/literal/entry /row + row +entryliteral||/literal/entry +entrytypejsonb/type/entry +entryConcatentate these two values to make a new value/entry +entryliteral'[a, b]'::jsonb || '[c, d]'::jsonb/literal/entry + /row + row +entryliteral-/literal/entry +entrytypetext/type/entry +entryDelete the field with this key, or element with this value/entry +entryliteral'{a: b}'::jsonb - 'a' /literal/entry + /row + row +entryliteral-/literal/entry +entrytypeinteger/type/entry +entryDelete the field or element with this index (Negative integers count from the end)/entry +entryliteral'[a, b]'::jsonb - 1 /literal/entry + /row + row +entryliteral-/literal/entry +entrytypetext[]/type/entry +entryDelete the field or element with this path/entry +entryliteral'[a, {b:1}]'::jsonb - '{1,b}'::text[] /literal/entry + /row /tbody /tgroup /table @@ -10760,6 +10784,42 @@ table2-mapping entryliteraljson_strip_nulls('[{f1:1,f2:null},2,null,3]')/literal/entry entryliteral[{f1:1},2,null,3]/literal/entry /row + row + entryparaliteraljsonb_replace(target jsonb, path text[], replacement jsonb)/literal + /para/entry + entryparatypejsonb/type/para/entry + entry + Returns replaceabletarget/replaceable + with the section designated by replaceablepath/replaceable + replaced by replaceablereplacement/replaceable. + /entry + entryliteraljsonb_replace('[{f1:1,f2:null},2,null,3]', '{0,f1},'[2,3,4]')/literal/entry + entryliteral[{f1:[2,3,4],f2:null},2,null,3]/literal +/entry + /row + row + entryparaliteraljsonb_indent(from_json jsonb)/literal + /para/entry + entryparatypetext/type/para/entry + entry + Returns replaceablefrom_json/replaceable + as indented json text. + /entry + entryliteraljsonb_indent('[{f1:1,f2:null},2,null,3]')/literal/entry + entry +programlisting + [ + { + f1: 1, + f2: null + }, + 2, + null, + 3 + ] +/programlisting +/entry + /row /tbody /tgroup /table diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c index 644ea6d..133b9ba 100644 --- a/src/backend/utils/adt/jsonb.c +++ b/src/backend/utils/adt/jsonb.c @@ -77,6 +77,8 @@ static void datum_to_jsonb(Datum val, bool is_null, JsonbInState *result, static void add_jsonb(Datum val, bool is_null, JsonbInState *result, Oid val_type, bool key_scalar); static JsonbParseState * clone_parse_state(JsonbParseState * state); +static char *JsonbToCStringWorker(StringInfo out, JsonbContainer *in, int estimated_len, bool indent); +static void add_indent(StringInfo out, bool indent, int level); /* * jsonb type input function @@ -414,12 +416,40 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype) char * JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len) { + return JsonbToCStringWorker(out, in, estimated_len, false); +} + +/* + * same thing but with indentation turned on + */ + +char * +JsonbToCStringIndent(StringInfo out, JsonbContainer *in, int estimated_len) +{ + return JsonbToCStringWorker(out, in, estimated_len, true); +} + + +/* + * common worker for above two functions + */ +static
[HACKERS] Rationalizing the API for array_ref, array_set, and friends
The four functions array_ref, array_set, array_get_slice, array_set_slice have traditionally declared their array inputs and results as being of type ArrayType *. This is a lie, and has been since Berkeley days, because they actually also support fixed-length array types such as name and point. If anyone tried to reference such arguments/results as an ArrayType struct they'd be in for a surprise. We made it worse when we invented toasting, because the arguments (though not the results, at the moment) might also be toast pointers or short-header varlenas. Aside from being confusing, ISTM this notational abuse runs some risk of bad code generation, because a compiler might think the declaration of these arguments entitles it to assume the pointers are 4-byte-aligned. I'm not aware that we've had any actual bug reports like that, but as the compiler boys get ever tenser, who knows what will happen? Anyway, I'd been quietly averting my eyes from this issue for some time, but preserving this mess is helping to make a mess of my expanded-array patch, so I think it's time to do something about it. Does anyone have an objection to my pushing the attached cleanup patch? It simply fixes the function declarations and adds/subtracts DatumGet/GetDatum calls as necessary. regards, tom lane diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index 3b95552..44d3b3e 100644 *** a/src/backend/commands/extension.c --- b/src/backend/commands/extension.c *** pg_extension_config_dump(PG_FUNCTION_ARG *** 2158,2170 } } ! a = array_set(a, 1, arrayIndex, ! elementDatum, ! false, ! -1 /* varlena array */ , ! sizeof(Oid) /* OID's typlen */ , ! true /* OID's typbyval */ , ! 'i' /* OID's typalign */ ); } repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a); repl_repl[Anum_pg_extension_extconfig - 1] = true; --- 2158,2170 } } ! a = DatumGetArrayTypeP(array_set(PointerGetDatum(a), 1, arrayIndex, ! elementDatum, ! false, ! -1 /* varlena array */ , ! sizeof(Oid) /* OID's typlen */ , ! true /* OID's typbyval */ , ! 'i' /* OID's typalign */ )); } repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a); repl_repl[Anum_pg_extension_extconfig - 1] = true; *** pg_extension_config_dump(PG_FUNCTION_ARG *** 2196,2208 elog(ERROR, extconfig and extcondition arrays do not match); /* Add or replace at same index as in extconfig */ ! a = array_set(a, 1, arrayIndex, ! elementDatum, ! false, ! -1 /* varlena array */ , ! -1 /* TEXT's typlen */ , ! false /* TEXT's typbyval */ , ! 'i' /* TEXT's typalign */ ); } repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a); repl_repl[Anum_pg_extension_extcondition - 1] = true; --- 2196,2208 elog(ERROR, extconfig and extcondition arrays do not match); /* Add or replace at same index as in extconfig */ ! a = DatumGetArrayTypeP(array_set(PointerGetDatum(a), 1, arrayIndex, ! elementDatum, ! false, ! -1 /* varlena array */ , ! -1 /* TEXT's typlen */ , ! false /* TEXT's typbyval */ , ! 'i' /* TEXT's typalign */ )); } repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a); repl_repl[Anum_pg_extension_extcondition - 1] = true; diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c index 0e7400f..e66d7b5 100644 *** a/src/backend/executor/execQual.c --- b/src/backend/executor/execQual.c *** static Datum ExecEvalCurrentOfExpr(ExprS *** 252,263 * * NOTE: if we get a NULL result from a subscript expression, we return NULL * when it's an array reference, or raise an error when it's an assignment. - * - * NOTE: we deliberately refrain from applying DatumGetArrayTypeP() here, - * even though that might seem natural, because this code needs to support - * both varlena arrays and fixed-length array types. DatumGetArrayTypeP() - * only works for the varlena kind. The routines we call in arrayfuncs.c - * have to know the difference (that's what they need refattrlength for). *-- */ static Datum --- 252,257 *** ExecEvalArrayRef(ArrayRefExprState *asta *** 267,274 ExprDoneCond *isDone) { ArrayRef *arrayRef = (ArrayRef *) astate-xprstate.expr; ! ArrayType *array_source; ! ArrayType *resultArray; bool isAssignment = (arrayRef-refassgnexpr != NULL); bool eisnull; ListCell *l; --- 261,267 ExprDoneCond *isDone) { ArrayRef *arrayRef = (ArrayRef *) astate-xprstate.expr; ! Datum array_source; bool isAssignment = (arrayRef-refassgnexpr != NULL); bool eisnull; ListCell *l; *** ExecEvalArrayRef(ArrayRefExprState *asta *** 278,288
Re: [HACKERS] Issue installing doc tools on OSX
On 2/15/15 7:50 PM, Peter Eisentraut wrote: On 2/15/15 6:31 PM, David Steele wrote: I had a problem installing the doc tools following the directions for OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html. I'm running OSX Yosemite. I got it to work by changing the command from: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt openjade opensp To: sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl libxslt opensp openjade I didn't capture the error message unfortunately, but it was more or less: unresolved dependency opensp while installing openjade. That seems a bit incredible, since port should be able to resolve the dependencies by itself. I suggest that this should be reported as a bug to MacPorts. Sure, that has been my experience, but the error message was very clear. Unfortunately I did not capture the error before I changed the order and the log file was removed on the next run. Since I have no easy way to reproduce it I'm not sure it's worth submitting as a bug. However, I thought a reorder of the packages in our docs couldn't hurt, and might help. -- - David Steele da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
On Fri, Feb 13, 2015 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote: In patch 1, sepgsql is also affected by this commit. Note that this commit necessitates an initdb, since stored ACLs are broken. Doesn't that warrant bumping catversion? Yes, but traditionally that is left until the last minute - when the patch is committed. That's why it's called out in the commit message (it isn't otherwise obvious - it's not a common catversion necessitating change like an addition to a system catalog). Patch 2 + * When killing a speculatively-inserted tuple, we set xmin to invalid and +if (!(xlrec-flags XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)) When doing this, should we also set the HEAP_XMIN_INVALID hint bit? reads more of patch Ok, I see we're not doing this because the check for a super deleted tuple is already cheap. Probably worth mentioning that in the comment... See my remarks to Heikki on this. I don't think it adds much. ExecInsert(): + * We don't suppress the effects (or, perhaps, side-effects) of + * BEFORE ROW INSERT triggers. This isn't ideal, but then we + * cannot proceed with even considering uniqueness violations until + * these triggers fire on the one hand, but on the other hand they + * have the ability to execute arbitrary user-defined code which + * may perform operations entirely outside the system's ability to + * nullify. I'm a bit confused as to why we're calling BEFORE triggers out here... hasn't this always been true for both BEFORE *and* AFTER triggers? The comment makes me wonder if there's some magic that happens with AFTER... Yes, but the difference here is that the UPDATE path could be taken (which is sort of like when a before row insert path returns NULL). What I'm calling out is the dependency on firing before row insert triggers to decide if the alternative path must be taken. Roughly speaking, we must perform part of the INSERT (firing of before row insert triggers) in order to decide to do or not do an INSERT. This is, as I said, similar to when those triggers return NULL, and won't matter with idiomatic patterns of before trigger usage. Still feels worth calling out, because sometimes users do foolishly write before triggers with many external side-effects. ExecLockUpdateTuple(): + * Try to lock tuple for update as part of speculative insertion. If + * a qual originating from ON CONFLICT UPDATE is satisfied, update + * (but still lock row, even though it may not satisfy estate's + * snapshot). I find this confusing... which row is being locked? The previously inserted one? Perhaps a better wording would be satisfied, update. Lock the original row even if it doesn't satisfy estate's snapshot. Take a look at the executor README. We're locking the only row that can be locked when an UPSERT non-conclusively thinks to take the UPDATE path: the row that was found during our pre-check. We can only UPDATE when we find such a tuple, and then lock it without finding a row-level conflict. infer_unique_index(): +/* + * We need not lock the relation since it was already locked, either by + * the rewriter or when expand_inherited_rtentry() added it to the query's + * rangetable. + */ +relationObjectId = rt_fetch(parse-resultRelation, parse-rtable)-relid; + +relation = heap_open(relationObjectId, NoLock); Seems like there should be an Assert here. Also, the comment should probably go before the heap_open call. An Assert() of what? Note that the similar function get_relation_info() does about the same thing here. heapam_xlog.h: +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */ I wish this would mention why it's safe to do this. Also, the comment mentions xl_heap_delete when the #define is for XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps: /* * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do * multi-inserts for INSERT ON CONFLICT. */ It's safe, as the comment indicates, because the former is only used for xl_heap_multi_insert records, while the latter is only used for xl_heap_delete records. There is no ambiguity, because naturally we're always able to establish what type of record is under consideration before checking the bit is set. The XLOG_HEAP_* format is used for other flags there, despite the fact that other flags (like XLOG_HEAP_PREFIX_FROM_OLD) can only appear in certain context-appropriate xl_heap_* records. AFAICT, all that I've done that's new here is rely on that, since a bunch of those bits were used up in the last year or two, and the need to even consider bit reuse here is a new problem. I'll review the remaining patches later. Thanks. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Really bad blowups with hash outer join and nulls
Tomas == Tomas Vondra tomas.von...@2ndquadrant.com writes: Tomas Improving the estimates is always good, but it's not going to Tomas fix the case of non-NULL values (it shouldn't be all that Tomas difficult to create such examples with a value whose hash starts Tomas with a bunch of zeroes). Right now, I can't get it to plan such an example, because (a) if there are no stats to work from then the planner makes fairly pessimistic assumptions about hash bucket filling, and (b) if there _are_ stats to work from, then a frequently-occurring non-null value shows up as an MCV and the planner takes that into account to calculate bucketsize. The problem could only be demonstrated for NULLs because the planner was ignoring NULL for the purposes of estimating bucketsize, which is correct for all join types except RIGHT and FULL (which, iirc, are more recent additions to the hashjoin repertoire). If you want to try testing it, you may find this useful: select i, hashint8(i) from unnest(array[1474049294, -1779024306, -1329041947]) u(i); i | hashint8 -+-- 1474049294 |0 -1779024306 |0 -1329041947 |0 (3 rows) (those are the only three int4 values that hash to exactly 0) It's probably possible to construct pathological cases by finding a lot of different values with zeros in the high bits of the hash, but that's something that wouldn't be likely to happen by chance. Tomas I think this might be solved by relaxing the check a bit. Yeah, that looks potentially useful. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Sun, Feb 15, 2015 at 4:59 PM, Peter Eisentraut pete...@gmx.net wrote: I think the old system where the patch submitter declared, this message contains my patch, is the only one that will work. I tend to agree. That being said, calling out latest attachments is also useful (or highlighting that a particular mail has a particular file attached in general). Annotations-wise, I think that it would be nice to be able to modify an annotation, in case a typo is made (the old app supported this). I also think that it's a waste of screen space to show who within the annotation view. Granted, the old app supported this, but I tend to think that if I actually cared who added a certain annotation, I'd be happy to drill down into history. I haven't cared so far, AFAICR. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2014/09/13 0:13, Tom Lane wrote: Albe Laurenz laurenz.a...@wien.gv.at writes: Tom Lane wrote: I'm not sure offhand what the new plan tree ought to look like. We could just generate a ForeignScan node, but that seems like rather a misnomer. Is it worth inventing a new ForeignUpdate node type? Or maybe it'd still be ForeignScan but with a flag field saying hey this is really an update (or a delete). The main benefit I can think of right now is that the EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly the only thing that ever looks at plan trees, so having an outright misleading plan structure is likely to burn us down the line. I can understand these qualms. I wonder if ForeignUpdate is such a good name though, since it would surprise the uninitiate that in the regular (no push-down) case the actual modification is *not* performed by ForeignUpdate. So it should rather be a ForeignModifyingScan, but I personally would prefer a has_side_effects flag on ForeignScan. I was envisioning that the EXPLAIN output would look like Foreign Scan on tab1 Remote SQL: SELECT ... for the normal case, versus Foreign Update on tab1 Remote SQL: UPDATE ... for the pushed-down-update case (and similarly for DELETE). For a non-optimized update it'd still be a ForeignScan underneath a ModifyTable. As for the internal representation, I was thinking of adding a CmdType field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE, CMD_DELETE as allowed values, though possibly in future we'd think of a reason to allow CMD_INSERT there. This is more or less isomorphic to your has_side_effects flag, but it allows distinguishing UPDATE from DELETE which might be useful. The only thing that's bothering me about this concept is that I'm not seeing how to scale it up to handling a pushed-down update on a join, ie, UPDATE foo ... FROM bar ... where both foo and bar are remote. Maybe it's silly to worry about that until join push-down is done; but in that case I'd vote for postponing this whole patch until we have join push-down. I'll re-add this to the final CF. And I'll update the patch. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical Replication Helpers WIP for discussion
On Sun, Feb 15, 2015 at 11:25 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 13/02/15 14:04, Petr Jelinek wrote: On 13/02/15 08:48, Michael Paquier wrote: Looking at this patch, I don't see what we actually gain much here except a decoder plugin that speaks a special protocol for a special background worker that has not been presented yet. What actually is the value of that defined as a contrib/ module in-core. Note that we have already test_decoding to basically test the logical decoding facility, used at least at the SQL level to get logical changes decoded. Based on those reasons I am planning to mark this as rejected (it has no documentation as well). So please speak up if you think the contrary, but it seems to me that this could live happily out of core. I think you are missing point of this, it's not meant to be committed in this form at all and even less as contrib module. It was meant as basis for in-core logical replication discussion, but sadly I didn't really have time to pursue it in this CF in the end. That being said and looking at the size of February CF, I think I am fine with dropping this in 9.5 cycle, it does not seem likely that there will be anything useful done with this fast enough to get to 9.5 so there is no point in spending committer resources on it in final CF. I will pick it up again after the CF is done. OK, thanks for the clarifications. Note that I am marking it as rejected in CF 2014-12 not because it is something that is not wanted, but just not to re-add it to CF 2015-02 which is what returned with feedback actually does... -- Michael
Re: [HACKERS] Commit fest 2015-12 enters money time
On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier michael.paqu...@gmail.com wrote: In order to move on to the next CF, I am going to go through all the remaining patches and update their status accordingly. And sorry for slacking a bit regarding that. If you wish to complain, of course feel free to post messages on this thread or on the thread related to the patch itself. As all the entries have been either moved or updated, CF 2014-12 is closed, and CF 2015-02 has been changed to In Progress. Regards, -- Michael
Re: [HACKERS] Join push-down support for foreign tables
Hanada-san, Your patch mixtures enhancement of custom-/foreign-scan interface and enhancement of contrib/postgres_fdw... Probably, it is a careless mis- operation. Please make your patch as differences from my infrastructure portion. Also, I noticed this Join pushdown support for foreign tables patch is unintentionally rejected in the last commit fest. https://commitfest.postgresql.org/3/20/ I couldn't register myself as reviewer. How do I operate it on the new commitfest application? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada Sent: Monday, February 16, 2015 1:03 PM To: Robert Haas Cc: PostgreSQL-development Subject: Re: [HACKERS] Join push-down support for foreign tables Hi I've revised the patch based on Kaigai-san's custom/foreign join patch posted in the thread below. http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80 108c...@bpxm15gp.gisp.nec.co.jp Basically not changed from the version in the last CF, but as Robert commented before, N-way (not only 2-way) joins should be supported in the first version by construct SELECT SQL by containing source query in FROM clause as inline views (a.k.a. from clause subquery). 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com: 2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com: On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: I'm working on $SUBJECT and would like to get comments about the design. Attached patch is for the design below. I'm glad you are working on this. 1. Join source relations As described above, postgres_fdw (and most of SQL-based FDWs) needs to check that 1) all foreign tables in the join belong to a server, and 2) all foreign tables have same checkAsUser. In addition to that, I add extra limitation that both inner/outer should be plain foreign tables, not a result of foreign join. This limiation makes SQL generator simple. Fundamentally it's possible to join even join relations, so N-way join is listed as enhancement item below. It seems pretty important to me that we have a way to push the entire join nest down. Being able to push down a 2-way join but not more seems like quite a severe limitation. Hmm, I agree to support N-way join is very useful. Postgres-XC's SQL generator seems to give us a hint for such case, I'll check it out again. -- Shigeru HANADA -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow snapshot too old error, to prevent bloat
Tom Lane t...@sss.pgh.pa.us wrote: Jim Nasby jim.na...@bluetreble.com writes: On 2/15/15 10:36 AM, Tom Lane wrote: I wonder if we couldn't achieve largely the same positive effects through adding a simple transaction-level timeout option. We suggested this to our customer and got out of the meeting with it looking like it *might* fly. In the next meeting, however, they said they had run it by others and reviewed the code and it was completely unacceptable -- they would not consider pg with this as the solution. A common use-case is long-running reports hitting relatively stable data in a database that also has tables with a high churn rate (ie: queue tables). In those scenarios your only options right now are to suffer huge amounts of bloat in the high-churn or not do your reporting. A simple transaction timeout only solves this by denying you reporting queries. Agreed, but Kevin's proposal has exactly the same problem only worse, because (a) the reporting transactions might or might not fail (per Murphy, they'll fail consistently only when you're on deadline), and (b) if they do fail, there's nothing you can do short of increasing the slack db-wide. These they were comfortable with, and did *not* consider to be unpredictable or something they could not do something about. I really don't feel I can say more than that, though, without disclosing more than I should. An idea that I've had on this would be some way to lock down the tables that a long-running transaction could access. That would allow vacuum to ignore any snapshots that transaction had for tables it wasn't accessing. That's something that would be deterministic. While this option was not specifically suggested, based on their their reasons that numerous other options we suggested were unacceptable, I feel sure that this would not be acceptable. I think Tom hit the nail on the head when he said Maybe refugees from Oracle will think that sounds good... It is precisely those with very large code bases which have been modified over long periods of time to work with Oracle that would find this solution attractive, or perhaps *necessary* to consider a move to Postgres. That's a potential market we don't care to write off. There might be something in that, but again it's not much like this patch. The key point I think we're both making is that nondeterministic failures are bad, especially when you're talking about long-running, expensive-to- retry transactions. What the customer most doesn't want to be nondeterministic is whether the error is generated only when the snapshot has been used to read a page which has been modified since the snapshot was taken. If tables or materialized views are set up before a query and then not subsequently modified during the execution of the query, that query must not be canceled even if it runs for days, but it must not cause unconstrained bloat during the run. So far I don't see any way to avoid including the LSN with the snapshot or modifying the index AMs. Let's be clear on the footprint for that; for the btree implementation it is: src/backend/access/nbtree/nbtinsert.c | 7 --- src/backend/access/nbtree/nbtpage.c | 2 +- src/backend/access/nbtree/nbtsearch.c | 43 ++- src/include/access/nbtree.h | 7 --- 4 files changed, 43 insertions(+), 16 deletions(-) What this discussion has made me reconsider is the metric for considering a transaction too old. The number of transaction IDs consumed seems inferior as the unit of measure for that to LSN or time. It looks to me to be pretty trivial (on the order of maybe 30 lines of code) to specify this GUC in minutes rather than transaction IDs. At first glance this seems like it would be vulnerable to the usual complaints about mismanaged clocks, but those are easily answered by using a cached time in shared memory that we populate in such a way that it can never move backward. Done right, this could even allow the GUC to be changeable on reload rather than only at restart. A badly mismanaged system clock could not cause a query to generate incorrect results; the worst that could happen is that this feature would fail to control bloat as much as expected or reads of modified data could generate the snapshot too old error around the time of the clock adjustment. As before, this would default to a magic value to mean that you want the historical PostgreSQL behavior. If that makes the idea more palatable to anyone, I can submit a patch to that effect within the next 24 hours. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump gets attributes from tables in extensions
On Mon, Feb 16, 2015 at 4:45 PM, Michael Paquier michael.paqu...@gmail.comwrote: but not dumping any tables that are part of ext_member. Excuse the typo = s/but/by/. -- Michael
Re: [HACKERS] Join push-down support for foreign tables
Hi I've revised the patch based on Kaigai-san's custom/foreign join patch posted in the thread below. http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f80108c...@bpxm15gp.gisp.nec.co.jp Basically not changed from the version in the last CF, but as Robert commented before, N-way (not only 2-way) joins should be supported in the first version by construct SELECT SQL by containing source query in FROM clause as inline views (a.k.a. from clause subquery). 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com: 2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com: On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: I'm working on $SUBJECT and would like to get comments about the design. Attached patch is for the design below. I'm glad you are working on this. 1. Join source relations As described above, postgres_fdw (and most of SQL-based FDWs) needs to check that 1) all foreign tables in the join belong to a server, and 2) all foreign tables have same checkAsUser. In addition to that, I add extra limitation that both inner/outer should be plain foreign tables, not a result of foreign join. This limiation makes SQL generator simple. Fundamentally it's possible to join even join relations, so N-way join is listed as enhancement item below. It seems pretty important to me that we have a way to push the entire join nest down. Being able to push down a 2-way join but not more seems like quite a severe limitation. Hmm, I agree to support N-way join is very useful. Postgres-XC's SQL generator seems to give us a hint for such case, I'll check it out again. -- Shigeru HANADA -- Shigeru HANADA foreign_join.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Join push-down support for foreign tables
Kaigai-san, Oops. I rebased the patch onto your v4 custom/foreign join patch. But as you mentioned off-list, I found a flaw about inappropriate change about NestPath still remains in the patch... I might have made my dev branch into unexpected state. I'll check it soon. 2015-02-16 13:13 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com: Hanada-san, Your patch mixtures enhancement of custom-/foreign-scan interface and enhancement of contrib/postgres_fdw... Probably, it is a careless mis- operation. Please make your patch as differences from my infrastructure portion. Also, I noticed this Join pushdown support for foreign tables patch is unintentionally rejected in the last commit fest. https://commitfest.postgresql.org/3/20/ I couldn't register myself as reviewer. How do I operate it on the new commitfest application? Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada Sent: Monday, February 16, 2015 1:03 PM To: Robert Haas Cc: PostgreSQL-development Subject: Re: [HACKERS] Join push-down support for foreign tables Hi I've revised the patch based on Kaigai-san's custom/foreign join patch posted in the thread below. http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80 108c...@bpxm15gp.gisp.nec.co.jp Basically not changed from the version in the last CF, but as Robert commented before, N-way (not only 2-way) joins should be supported in the first version by construct SELECT SQL by containing source query in FROM clause as inline views (a.k.a. from clause subquery). 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com: 2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com: On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: I'm working on $SUBJECT and would like to get comments about the design. Attached patch is for the design below. I'm glad you are working on this. 1. Join source relations As described above, postgres_fdw (and most of SQL-based FDWs) needs to check that 1) all foreign tables in the join belong to a server, and 2) all foreign tables have same checkAsUser. In addition to that, I add extra limitation that both inner/outer should be plain foreign tables, not a result of foreign join. This limiation makes SQL generator simple. Fundamentally it's possible to join even join relations, so N-way join is listed as enhancement item below. It seems pretty important to me that we have a way to push the entire join nest down. Being able to push down a 2-way join but not more seems like quite a severe limitation. Hmm, I agree to support N-way join is very useful. Postgres-XC's SQL generator seems to give us a hint for such case, I'll check it out again. -- Shigeru HANADA -- Shigeru HANADA -- Shigeru HANADA foreign_join.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump gets attributes from tables in extensions
Hi all, While looking at the patch to fix pg_dump with extensions containing tables referencing each other, I got surprised by the fact that getTableAttrs tries to dump table attributes even for tables that are part of an extension. Is that normal? Attached is a patch that I think makes things right, but not dumping any tables that are part of ext_member. Thoughts? Regards, -- Michael diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 7e92b74..2c9356b 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -6657,6 +6657,10 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable if (!tbinfo-interesting) continue; + /* Ignore tables part of an extension */ + if (tbinfo-dobj.ext_member) + continue; + /* * Make sure we are in proper schema for this table; this allows * correct retrieval of formatted type names and default exprs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers