Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/09/2014 11:19 PM, Tom Lane wrote:
> Dean Rasheed dean.a.rash...@gmail.com writes:
>> My first thought was that it should just preprocess any security barrier quals in
>> subquery_planner() in the same way as other quals are preprocessed. But thinking about it
>> further, those quals are destined to become the quals of subqueries in the range table, so
>> we don't actually want to preprocess them at that stage --- that will happen later when the
>> new subquery is planned by recursion back into subquery_planner(). So I think the right
>> answer is to make adjust_appendrel_attrs() handle recursion into sublink subqueries.
>
> TBH, this sounds like doubling down on a wrong design choice. I see no good reason that
> updatable security views should require any fundamental rearrangements of the order of
> operations in the planner; and I doubt that this is the last bug you'll have if you insist
> on doing that.

I'd be quite happy to do this entirely within the rewriter. I've found two persistent obstacles to that, and frankly I'm stuck. I'm going to be reworking the RLS patches on top of Dean's functional patch unless I can find some way to progress with a rewriter-based approach.

The key problems are:

1. preprocess_targetlist in the planner assumes that the resultRelation is the correct RTE to set as the varno in a new Var it adds to fetch the row ctid (with label ctid1) as a resjunk attr for row-marking. This causes the tlist to have entries pointing to a different RTE from the one being scanned by the eventual seqscan / indexscan, though the underlying Relation is the same. tlist validation checks don't like that. There may be other places that need to add tlist entries pointing to the relation we're reading rows from. They'll also need to be able to deal with the fact that this is no longer the resultRelation.

2. Despite bashing my head against it for ages, I haven't figured out how to inject references to the base rel's ctid, oid (if WITH OIDS), and any tlist entries not specified in the DML statement into the subquery tree. These are only accessible at the deepest level of rewriting, when the final view is expanded into a subquery and processed with rewriteTargetListUD(..). At this point we don't have breadcrumbs to use to walk back up the nested subqueries adding the required tlist entries. I keep exploring ideas for this one, and hit a dead end with every one.

Without a way to move on these, I don't have much hope of adding updatable security barrier view support using work done in the rewriter.

It seems inevitable that we'll have to add the separate concepts of source relation (tuples to feed into HeapModifyTable for ctid, and for heap_modify_table after junk filtering) and result relation (target Relation of heap_modify_table to actually write tuples to, target of row-level locking operations).

There's also going to need to be some kind of breadcrumb chain to allow us to walk from the innermost expanded view's RTE_RELATION back up the expanded view subquery tlists, adding next-innermost refs to resjunk ctid and (if needed) oid, injecting defaults, and expanding the target list with Vars to match non-referenced attributes of the innermost RTE_RELATION. So far I haven't come up with a sensible form for that breadcrumb trail.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
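[Editorial note: for readers joining the thread, a minimal sketch of the feature under discussion, using the existing security_barrier reloption; table and column names are illustrative only.]

    CREATE TABLE accounts (id int PRIMARY KEY, owner text, balance numeric);

    -- A security barrier view: its WHERE clause must be enforced before
    -- any user-supplied quals run, to block leaky-function attacks.
    CREATE VIEW my_accounts WITH (security_barrier = true) AS
        SELECT id, owner, balance
        FROM accounts
        WHERE owner = current_user;

    -- The goal of this patch series: make such views updatable while
    -- still enforcing the barrier qual in a nested subquery.
    UPDATE my_accounts SET balance = balance - 100 WHERE id = 1;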
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/11/2014 12:40 AM, Peter Geoghegan wrote:
> My problem is that in general I'm not sold on the actual utility of making this kind of row
> locking work with exclusion constraints. I'm sincerely having a hard time thinking of a
> practical use-case (although, as I've said, I want to make it work with IGNORE). Even if you
> work all this row locking stuff out, and the spill-to-disk aspect out, the interface is
> still wrong, because you need to figure out a way to project more than one reject per slot.
> Maybe I lack imagination around how to make that work, but there are a lot of ifs and buts
> either way.

Exclusion constraints can be used to implement uniqueness checks with SP-GiST or GiST indexes. For example, if you want to enforce that there are no two tuples with the same x and y coordinates, ie. use a point as the key. You could add a b-tree index just to enforce the constraint, but it's better if you don't have to. In general, it's just always better if features don't have implementation-specific limitations like this.

- Heikki
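[Editorial note: a minimal sketch of the kind of constraint Heikki describes; names are illustrative, and ~= is the built-in "same as" operator for points, supported by the GiST point opclass.]

    CREATE TABLE places (
        id  serial PRIMARY KEY,
        pos point NOT NULL,
        -- uniqueness on a point, enforced via GiST instead of b-tree
        EXCLUDE USING gist (pos WITH ~=)
    );

    INSERT INTO places (pos) VALUES (point(1, 2));  -- succeeds
    INSERT INTO places (pos) VALUES (point(1, 2));  -- exclusion violation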
Re: [HACKERS] [PATCH] Filter error log statements by sqlstate
Hi Oskari,

I had a quick look over the patch (not compiled, though). Here are a few comments on the changes:

1. Documentation is missing, which makes it difficult to understand what exactly you are trying to do. In other words, users will be uncertain about how to use it efficiently.

2. Some more comments are required, at each new function and specifically at get_sqlstate_error_level().

3. Please add test cases if possible.

4. Some code does not comply with PostgreSQL indentation style. (This can be ignored as it will pass through pgindent, but better to fix it.)

5. You have used the "XX000:warning," string to get the maximum possible length of a valid sqlstate:level identifier. That's fine, but a small explanation of it there would be good. Also, if in the future we add another error level which exceeds this, we will need changes here too. Right?

I will look into this further, but please give your attention to the above points.

Thanks

On Fri, Jan 10, 2014 at 12:56 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
> Allow the default log_min_error_statement to be overridden per sqlstate to make it possible
> to filter out some error types while maintaining a low log_min_error_statement, or enable
> logging for some error types when the default is to not log anything.
>
> I've tried to do something like this using rsyslog filters, but that's pretty awkward and
> doesn't work at all when the statement is split to multiple syslog messages.
>
> https://github.com/saaros/postgres/compare/log-by-sqlstate
>
>  src/backend/utils/error/elog.c | 183 +++++++++++++++++++++++++++++++++-
>  src/backend/utils/misc/guc.c   |  14 +++-
>  src/include/utils/guc.h        |   4 +
>  src/include/utils/guc_tables.h |   1 +
>  4 files changed, 199 insertions(+), 3 deletions(-)
>
> / Oskari

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/11/2014 12:39 PM, Peter Geoghegan wrote:
> In any case, my patch is bound to win decisively for the other extreme, the insert-only
> case, because the overhead of doing an index scan first is always wasted there with your
> approach, and the overhead of extended btree leaf page locking has been shown to be quite
> low.

Quite possibly. Run the benchmark, and we'll see how big a difference we're talking about.

> In the past you've spoken of avoiding that overhead through an adaptive strategy based on
> statistics, but I think you'll have a hard time beating a strategy where the decision comes
> as late as possible, and is informed by highly localized page-level metadata already
> available. My implementation can abort an attempt to just read an existing would-be
> duplicate very inexpensively (with no strong locks), going back to just after the
> _bt_search() to get a heavyweight lock if just reading doesn't work out (if there is no
> duplicate found), so as to not waste all of its prior work.

Doing one of the two extremes of insert-mostly or update-only well is relatively easy; dynamically adapting to one or the other is much harder. Especially if it's a consistent mix of inserts and updates, where general observations aren't terribly useful.

Another way to optimize it is to keep the b-tree page pinned after doing the pre-check. Then you don't need to descend the tree again when doing the insert. That would require small indexam API changes, but wouldn't be too invasive, I think.

> All other concerns of mine still remain, including the concern over the extra locking of
> the proc array - I'm concerned about the performance impact of that on other parts of the
> system not exercised by this test.

Yeah, I'm not thrilled about that part either. Fortunately there are other ways to implement that. In fact, I think you could just not bother taking the ProcArrayLock when setting the fields. The danger is that another backend sees a mixed state of the fields, but that's OK. The worst that can happen is that it will do an unnecessary lock/release on the heavy-weight lock. And to reduce the overhead when reading the fields, you could merge the SpeculativeInsertionIsInProgress() check into TransactionIdIsInProgress(). The call site in tqual.c always calls it together with TransactionIdIsInProgress(), which scans the proc array anyway, while holding the lock.

- Heikki
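[Editorial note: a sketch of how the syntax proposed in this thread was intended to be used. The exact spelling, including RETURNING REJECTS, was still under discussion at this point; treat the details below as assumptions, not a committed feature.]

    -- Sketch only: lock the conflicting row on duplicate key, project
    -- the rejected tuples, and update them in a wCTE.
    WITH rejects AS (
        INSERT INTO upsert_target (key, val)
        VALUES (1, 'foo')
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING REJECTS *
    )
    UPDATE upsert_target t
    SET    val = r.val
    FROM   rejects r
    WHERE  t.key = r.key;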
Re: [HACKERS] ECPG regression tests generating warnings
On Sun, Jan 12, 2014 at 08:28:57AM -0800, Kevin Grittner wrote:
> desc.pgc:55: WARNING: descriptor outdesc does not exist
> desc.pgc:86: WARNING: descriptor outdesc does not exist

Thanks, I didn't notice; fixed.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 12 January 2014 10:12, Craig Ringer cr...@2ndquadrant.com wrote:
> On 01/09/2014 06:48 PM, Dean Rasheed wrote:
>> On 8 January 2014 10:19, Dean Rasheed dean.a.rash...@gmail.com wrote:
>>> The assertion failure with inheritance and sublinks is a separate issue ---
>>> adjust_appendrel_attrs() is not expecting to find any unplanned sublinks in the query
>>> tree when it is invoked, since they would normally have all been planned by that point.
>>> However, the addition of the new security barrier subqueries after inheritance expansion
>>> can now insert new sublinks which need to be planned. I'll look into how best to make
>>> that happen.
>>
>> The attached patch does that, which fixes the case you reported.
>
> Dean, any objections to adding this to the current CF, or to my doing so?

OK, I'll do that.

I've added a page to the wiki with a more in-depth description of how the patch works, and the test cases I've tried so far:

https://wiki.postgresql.org/wiki/Making_security_barrier_views_automatically_updatable

There's obviously still a lot more testing to do, but the early signs are encouraging.

Regards,
Dean
Re: [HACKERS] [PATCH] Filter error log statements by sqlstate
Hi,

On 13/01/14 10:26, Jeevan Chalke wrote:
> 1. Documentation is missing and thus becomes difficult to understand what exactly you are
> trying to do. Or in other words, user will be uncertain about using it more efficiently.

I figured I'd write documentation for this if it looks like a useful feature which would be accepted for 9.4, but I guess it would've helped to have a bit better description of this for the initial submission as well.

> 2. Some more comments required. At each new function and specifically at
> get_sqlstate_error_level().

Just after I submitted the patch I noticed that I had a placeholder for a comment about that function but never wrote the actual comment, sorry about that.

> 3. Please add test-case if possible.

Sure.

> 4. Some code part does not comply with PostgreSQL indentation style. (Can be ignored as it
> will pass through pg_indent, but better fix it).

I'll try to fix this for v2.

> 5. You have used "XX000:warning," string to get maximum possible length of the valid
> sqlstate:level identifier. It's perfect, but small explanation about that will be good
> there. Also in future if we have any other error level which exceeds this, we need changes
> here too. Right?

Good point, I'll address this in v2.

> I will look into this further. But please have your attention on above points.

Thanks for the review!

/ Oskari

On Fri, Jan 10, 2014 at 12:56 AM, Oskari Saarenmaa o...@ohmu.fi wrote:
> Allow the default log_min_error_statement to be overridden per sqlstate to make it possible
> to filter out some error types while maintaining a low log_min_error_statement or enable
> logging for some error types when the default is to not log anything. I've tried to do
> something like this using rsyslog filters, but that's pretty awkward and doesn't work at
> all when the statement is split to multiple syslog messages.
>
> https://github.com/saaros/postgres/compare/log-by-sqlstate
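[Editorial note: to make the proposal concrete, a configuration under this patch might look something like the following. The setting name and exact syntax are assumptions inferred from the "XX000:warning" format mentioned in the review, not the patch's final spelling.]

    # postgresql.conf sketch -- setting name and syntax are assumptions
    log_min_error_statement = error
    # Hypothetical per-sqlstate overrides: effectively silence statement
    # logging for unique violations (23505), but log statements for
    # serialization failures (40001) even at lower severity.
    log_sqlstate_error_statement = '23505:panic, 40001:warning'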
Re: [HACKERS] [BUG] Archive recovery failure on 9.3+.
On 01/09/2014 10:55 PM, Josh Berkus wrote:
> On 01/09/2014 12:05 PM, Heikki Linnakangas wrote:
>> Actually, why is the partially-filled 000000010000000000000002 file archived in the first
>> place? Looking at the code, it's been like that forever, but it seems like a bad idea. If
>> the original server is still up and running, and writing more data to that file, what will
>> happen is that when the original server later tries to archive it, it will fail because
>> the partial version of the file is already in the archive. Or worse, the partial version
>> overwrites a previously archived more complete version.
>
> Oh! This explains some transient errors I've seen.
>
>> Wouldn't it be better to not archive the old segment, and instead switch to a new segment
>> after writing the end-of-recovery checkpoint, so that the segment on the new timeline is
>> archived sooner?
>
> It would be better to zero-fill and switch segments, yes. We should NEVER be in a position
> of archiving two different versions of the same segment.

Ok, I think we're in agreement that that's the way to go for master. Now, what to do about back-branches? On one hand, I'd like to apply the same fix to all stable branches, as the current behavior is silly and always has been. On the other hand, we haven't heard any complaints about it, so we probably shouldn't fix what ain't broken. Perhaps we should apply it to 9.3, as that's where we have the acute problem the OP reported. Thoughts?

In summary, I propose that we change master and REL9_3_STABLE to not archive the partial segment from the previous timeline. Older branches will keep the current behavior.

- Heikki
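[Editorial note: independently of the fix proposed here, an archive_command that refuses to overwrite an existing file guards against a partial copy clobbering a more complete one. This is the example command from the PostgreSQL documentation; the archive path is illustrative.]

    # postgresql.conf: never overwrite an already-archived segment
    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'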
Re: [HACKERS] generic pseudotype IO functions?
On Mon, 2014-01-06 at 17:36 +0100, Andres Freund wrote:
> FWIW, I am perfectly fine with duplicating the functions for now - I just thought that that
> might not be the best way but I didn't (and still don't) have a strong opinion.

Could we just put 0 in for the functions' OID and have code elsewhere that errors "there is no input function for this type"?
Re: [HACKERS] Standalone synchronous master
On Sun, Jan 12, Amit Kapila wrote:
> How would that work? Would it be a tool in contrib? There already is a timeout, so if a
> tool checked more frequently than the timeout, it should work. The durable notification of
> the admin would happen in the tool, right?
>
> Well, you know what tool *I'm* planning to use. Thing is, when we talk about auto-degrade,
> we need to determine things like "Is the replica down or is this just a network blip?" and
> take action according to the user's desired configuration. This is not something,
> realistically, that we can do on a single request. Whereas it would be fairly simple for an
> external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts have failed)
> 2. Send ALTER SYSTEM SET to the master and change/disable the synch_replicas.

Will it be possible in the current mechanism, given that presently the master will not accept any new command when the sync replica is not available? Or is there something else which needs to be done along with the above 2 points to make it possible?

Since no WAL is written for the ALTER SYSTEM SET command, it should be able to handle this command even though the sync replica is not available.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] ISN extension bug? (with patch)
On 01/03/2014 07:53 PM, Fabien COELHO wrote:
>>> If so, there is only the one-liner patch to consider.
>>
>> This patch doesn't apply anymore. Please submit an updated patch for the commit fest.
>
> In src/include/utils/elog.h there is an include for utils/errcodes.h which is generated
> somehow when compiling postgresql but not present by default. So you have to compile
> postgresql and then the contrib, or use PGXS with an already installed version. With this
> caveat, the one-liner patch (4 characters removed) reattached does compile for me:

Thanks, applied.

- Heikki
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/09/2014 11:19 PM, Tom Lane wrote:
> TBH, this sounds like doubling down on a wrong design choice. I see no good reason that
> updatable security views should require any fundamental rearrangements of the order of
> operations in the planner.

In that case, would you mind offering a quick sanity check on the following alternative idea:

- Add sourceRelation to Query. This refers to the RTE that supplies tuple projections to feed into ExecModifyTable, with appropriate resjunk ctid and (if req'd) oid cols present.

- When expanding a target view into a subquery, set sourceRelation on the outer view to the index of the RTE of the newly expanded subquery.

- In rewriteTargetView, as now, reassign resultRelation to the target view's base rel. This is required so that any RETURNING and WITH CHECK OPTION fixups required to adjust the RETURNING list to the new result relation act on the final tuple after any BEFORE triggers act. Do not flatten the view subquery and merge the quals (as currently happens); allow it to be expanded as a subquery by the rewriter instead. Don't mess with the view tlist at this point except by removing the whole-row Var added by rewriteTargetListUD.

- When doing tlist expansion in preprocess_targetlist, when we process the outer Query (the only one for which the query type is not SELECT, and the only one that has a non-zero resultRelation), if resultRelation != sourceRelation, recursively follow the chain of sourceRelations to the bottom one with type RTE_RELATION. Do tlist expansion on that innermost Query first, using sourceRelation to supply the varno for injected TLEs, including injecting ctid, oid if req'd, etc. During call stack unwind, have each intermediate layer do regular tlist expansion, adding a Var pointing to each tlist entry of the inner subquery. At the outer level of preprocess_targetlist, sort the tlist, now expanded to include all required vars, into attribute order for the resultRelation (this level is the only one that has resultRelation set). Avoid invoking preprocess_targetlist on the inner Query again when it's processed in turn, or just bail out when we see sourceRelation set, since we know it's already been done. (Alternately, it might be possible to run preprocess_targetlist depth-first instead of the current outermost-first, but I haven't looked at that.)

The optimizer can still flatten non-security-barrier updatable views, following the chain of Vars as it collapses each layer. That's effectively what the current rewriteTargetView code is doing manually at each pass right now.

I'm sure there are some holes in this outline, but it's struck me as possibly workable. The key is to set sourceRelation on every inner subquery in the target query chain, not just the outer one, so it can be easily followed from the outer query through the subqueries into the innermost query with RTE_RELATION type.
The only alternative I've looked at is looking clumsier the longer I examine it: adding a back-reference in each subquery's Query struct, pointing to the Query containing it and the RTI of the subquery within the outer Query. That way, once rewriting hits the innermost rel with RTE_RELATION type, the rewriter can walk back up the Query tree doing tlist rewriting. I'm not sure if this is workable yet, and it creates ugly pointer-based backrefs *up* the Query chain, making what was previously a tree of Query* into a graph. That could get exciting, though there'd never be any need for mutators to follow the parent query pointer, so it wouldn't make tree rewrites harder.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] nested hstore patch
Thank you, Erik!

Oleg

On Mon, Jan 13, 2014 at 12:25 PM, Erik Rijkers e...@xs4all.nl wrote:
> On Mon, January 13, 2014 00:24, Erik Rijkers wrote:
>> On Sat, January 11, 2014 22:47, Andrew Dunstan wrote:
>>> On 01/11/2014 03:03 PM, Erik Rijkers wrote:
>>>> On Sat, January 11, 2014 20:30, Peter Eisentraut wrote:
>>>>> The documentation doesn't build.
>>>>
>>>> corrective patch is here:
>>>> http://www.postgresql.org/message-id/37b9f104d5a838eec9b75f3668517aa5.squir...@webmail.xs4all.nl
>>>
>>> It will be in the next version of the patch posted.
>>
>> Attached is another handful of doc-fixes...
>
> There are errors in the example expressions in Table F-6, "hstore Operators". Attached is a
> cumulative doc-patch (which includes the changes I sent earlier) which fixes these.
>
> I also attach a test perl program that shows the (small) differences in output between
> what's in that doc table and what one actually gets. (I found these too insignificant to
> change but perhaps you have a different opinion.)
>
> thanks,
>
> Erik Rijkers
Re: [HACKERS] Standalone synchronous master
On Jan12, 2014, at 04:18 , Josh Berkus j...@agliodbs.com wrote:
> Thing is, when we talk about auto-degrade, we need to determine things like "Is the replica
> down or is this just a network blip?" and take action according to the user's desired
> configuration. This is not something, realistically, that we can do on a single request.
> Whereas it would be fairly simple for an external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts have failed)
> 2. Send ALTER SYSTEM SET to the master and change/disable the synch_replicas.
>
> In other words, if we're going to have auto-degrade, the most intelligent place for it is
> in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we
> do *inside* Postgres is going to have a really, really hard time determining when to
> degrade.

+1

This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective.

best regards,
Florian Pflug
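[Editorial note: a sketch of what step 2 above might look like once ALTER SYSTEM (then a pending 9.4 feature) is available; the monitoring tool's decision logic is assumed to live elsewhere.]

    -- Run by an external monitor on the master after it decides the
    -- sync standby is gone for good: degrade to async replication.
    ALTER SYSTEM SET synchronous_standby_names = '';
    SELECT pg_reload_conf();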
[HACKERS] Where do we stand on 9.3 bugs?
What remaining issues are there blocking a 9.3.3 release? I know that there were unresolved multixact issues when we put out 9.3.2 --- are those all dealt with now? What else do people see as release-blockers?

regards, tom lane
Re: [HACKERS] generic pseudotype IO functions?
Peter Eisentraut pete...@gmx.net writes:
> On Mon, 2014-01-06 at 17:36 +0100, Andres Freund wrote:
>> FWIW, I am perfectly fine with duplicating the functions for now - I just thought that
>> that might not be the best way but I didn't (and still don't) have a strong opinion.
>
> Could we just put 0 in for the functions' OID and have code elsewhere that errors "there is
> no input function for this type"?

That doesn't seem like much of an improvement to me: that would be taking a catalog corruption condition and blessing it as a legitimate state of affairs, thereby reducing our ability to detect problems. One instance where it would create issues is that I'm pretty sure pg_dump would get confused by such a type. Admittedly, pg_dump will never try to dump the built-in pseudotypes, but do we really want them handled so differently from user-definable types?

regards, tom lane
Re: [HACKERS] Where do we stand on 9.3 bugs?
Tom Lane wrote:
> What remaining issues are there blocking a 9.3.3 release? I know that there were unresolved
> multixact issues when we put out 9.3.2 --- are those all dealt with now? What else do
> people see as release-blockers?

The only thing I'm aware of still outstanding in multixact land is the creation of a separate multixact freeze age GUC variable. All other issues were resolved. Bug #8470 is still unresolved, but that's a performance issue.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Where do we stand on 9.3 bugs?
On 2014-01-13 12:26:45 -0300, Alvaro Herrera wrote:
> The only thing I'm aware of still outstanding in multixact land is the creation of a
> separate multixact freeze age GUC variable. All other issues were resolved. Bug #8470 is
> still unresolved, but that's a performance issue.

It's not strictly a 9.3 issue, but there's still the issue of pg_multixact/members wrapping around into valid data, right?

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Where do we stand on 9.3 bugs?
Andres Freund wrote:
> It's not strictly a 9.3 issue, but there's still the issue of pg_multixact/members wrapping
> around into valid data, right?

Yes, but TBH I don't have much of an intention to fix that other than having the freeze limit. At least not for 9.3.3.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] nested hstore patch
Andrew Dunstan wrote:
> On 01/13/2014 03:25 AM, Erik Rijkers wrote:
>> There are errors in the example expressions in Table F-6, "hstore Operators". Attached is
>> a cumulative doc-patch (which includes the changes I sent earlier) which fixes these. I
>> also attach a test perl program that shows the (small) differences in output between
>> what's in that doc table and what one actually gets. (I found these too insignificant to
>> change but perhaps you have a different opinion.)
>
> A new version of the patch is attached. It includes all of Erik's docs fixes and a small
> fix by Alexander Korotkov for hstore hash ops.

Interestingly, this also includes transaction_commit event triggers. There are also a few PANIC elogs, probably not what's intended.

(I was just giving this a quick skim to see if there's support to build JSON objects incrementally from C source, i.e. not have to call functions using the fmgr interface. Apparently that's not the case, but if I'm wrong please let me know.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] nested hstore patch
Andrew, did you run the perl script? Actually, I found that the operator table needs to be fixed.

Oleg

On Mon, Jan 13, 2014 at 7:36 PM, Andrew Dunstan and...@dunslane.net wrote:
> On 01/13/2014 03:25 AM, Erik Rijkers wrote:
>> There are errors in the example expressions in Table F-6, "hstore Operators". [...]
>
> A new version of the patch is attached. It includes all of Erik's docs fixes and a small
> fix by Alexander Korotkov for hstore hash ops.
>
> cheers
>
> andrew
Re: [HACKERS] Add CREATE support to event triggers
Alvaro Herrera wrote:
> In an event trigger, the function pg_event_trigger_get_creation_commands() returns the
> following JSON blob:

After playing with this for a while, I realized something that must have seemed quite obvious to those paying attention: what this function is, is just a glorified sprintf() for JSON. So I propose we take our existing format(text) and use it to model a new format(json) function, which will be useful to the project at hand and be of more general applicability.

To make it a better fit, I have changed the spec slightly. The format string is now the "fmt" element in the topmost JSON. This format string can contain % escapes, which consist of:

* the literal % itself
* an element name, enclosed in braces { }. The name can optionally be followed by a colon and a possibly-empty array separator.
* a format specifier, which can be I (identifier), D (dotted name), or s (string)
* Alternatively, %% expands to a literal %, as usual.

For each such escape, the JSON object is searched using the element name as key. For identifiers, the element is expected to be a string, and will be quoted per identifier quoting rules. Dotted-names are used to format possibly-qualified relation names and such; the element must be an object with one, two or three string elements, each of which is quoted per identifier rules, and output separated by periods. Finally, for arrays we expand each element in the JSON array element, and separate them with the separator specified in the {} part of the format specifier.

For instance,

alvherre=# select format(json '{"fmt":"hello, %{who}s! This is %{name}I",
                                "who":"world", "name":"a function"}');
              format
------------------------------------
 hello, world! This is "a function"

Elements can be objects, in which case they are expanded recursively: a "fmt" element is looked up and expanded as described above.

I don't yet see a need for %L escapes (that is, literals that can expand to a single-quoted value or to NULL), but if I see it I will add that too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
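[Editorial note: a further sketch of the array and dotted-name escapes described above. This is an illustration following the spec, not an example from the patch; in particular the object keys used for the dotted name are assumptions.]

    -- Hypothetical use of %{...}D and an array with a ", " separator
    select format(json '{"fmt":"CREATE TABLE %{name}D (%{columns:, }s)",
                         "name":{"schema":"public","relname":"t"},
                         "columns":["a integer","b text"]}');
    -- expected result: CREATE TABLE public.t (a integer, b text)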
Re: [HACKERS] nested hstore patch
On 01/13/2014 11:03 AM, Alvaro Herrera wrote:
> Interestingly, this also includes transaction_commit event triggers.

Oh, wow, really? git really did something horrible, or I did something inadvertently. This is what comes from using the same directory for multiple development lines :-( Will fix.

> There are also a few PANIC elogs, probably not what's intended.

Oleg, Teodor, please address.

> (I was just giving this a quick skim to see if there's support to build JSON objects
> incrementally from C source, i.e. not have to call functions using the fmgr interface.
> Apparently that's not the case, but if I'm wrong please let me know.)

Erm, maybe you need the other json patch:

http://www.postgresql.org/message-id/52c76b33.1050...@dunslane.net

If we need to adjust some of that a bit to make it more friendly for internal use I'm happy to try to do that. Unfortunately, I don't think that's terribly easy for VARIADIC "any" functions like these.

cheers

andrew
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote:
> Implicit casts to text, anybody?

This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back onto a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was.

This project has no deprecation policy, and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer-term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and the win32 api. There are certain limited exceptions to this rule -- for example, security problems or gross violations of the standard (bringing row-wise comparison to spec comes to mind as an example of that).

merlin
[HACKERS] Linux kernel impact on PostgreSQL performance
Hi,

I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014 (LSF/MM). A CFP was sent out last month (https://lwn.net/Articles/575681/) that you may have seen already. In recent years we have had at least one topic that was shared between all three tracks that was led by a person outside of the usual kernel development community. I am checking if the PostgreSQL community would be willing to volunteer someone to lead a topic discussing PostgreSQL performance with recent kernels, or to highlight regressions or future developments you feel are potentially a problem. With luck, someone suitable is already travelling to the Collaboration Summit (http://events.linuxfoundation.org/events/collaboration-summit) and it would not be too inconvenient to drop in for LSF/MM as well.

There are two reasons why I'm suggesting this. First, PostgreSQL was the basis of a test used to highlight a scheduler problem around kernel 3.6, but otherwise in my experience it is rare that PostgreSQL is part of a bug report. I am skeptical that this particular bug report was a typical use case for PostgreSQL (pgbench, read-only, many threads, very small in-memory database). I wonder why reports related to PostgreSQL are not more common. One assumption would be that PostgreSQL is perfectly happy with the current kernel behaviour, in which case our discussion here is done.

This brings me to the second reason -- there is evidence that the PostgreSQL community is not happy with the current direction of kernel development. The most obvious example is this thread
http://postgresql.1045698.n5.nabble.com/Why-we-are-going-to-have-to-go-DirectIO-td5781471.html
but I suspect there are others. The thread alleges that the kernel community is in the business of pushing hackish changes into the IO stack without much thought or testing, although the linked article describes a VM and not a storage problem. I'm not here to debate the kernel's regression testing or development methodology, but LSF/MM is one place where a large number of people involved with the IO layers will be attending. If you have a concrete complaint, then here is a soap box.

Does the PostgreSQL community have a problem with recent kernels, particularly with respect to the storage, filesystem or memory management layers? If yes, do you have some data that can highlight this, and can you volunteer someone to represent your interests to the kernel community? Are current developments in the IO layer counter to the PostgreSQL requirements? If so, what developments, why are they a problem, and do you have a suggested alternative or some idea of what we should watch out for?

The track topic would be up to you, but just as a hint, we'd need something a lot more concrete than "you should test more".

-- 
Mel Gorman
SUSE Labs
Re: [HACKERS] GIN improvements part 1: additional information
On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz wrote:
> On 8.1.2014 22:58, Alexander Korotkov wrote:
>> Thanks for reporting. Fixed version is attached.
>
> I've tried to rerun the 'archie' benchmark with the current patch, and once again I got
>
>   PANIC: could not split GIN page, didn't fit
>
> I reran it with '--enable-cassert' and with that I got
>
>   TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1], items[i]) < 0),
>   File: "gindatapage.c", Line: 149)
>   LOG: server process (PID 5364) was terminated by signal 6: Aborted
>   DETAIL: Failed process was running: INSERT INTO messages ...
>
> so the assert in GinDataLeafPageGetUncompressed fails for some reason. I can easily
> reproduce it, but my knowledge in this area is rather limited so I'm not entirely sure what
> to look for.

I've fixed this bug and many other bugs. Now the patch passes the test suite that I've used earlier. The results are as follows.

Operations time:

         event         |     period
-----------------------+-----------------
 index_build           | 00:01:47.53915
 index_build_recovery  | 00:00:04
 index_update          | 00:05:24.388163
 index_update_recovery | 00:00:53
 search_new            | 00:24:02.289384
 search_updated        | 00:27:09.193343
(6 rows)

Index sizes:

     label     |   size
---------------+-----------
 new           | 384761856
 after_updates | 667942912
(2 rows)

Also, I made the following changes in the algorithms:

- Now there is a limit on the number of uncompressed TIDs on a page. After reaching this limit, they are encoded regardless of whether they fit on the page. That seems to me more desirable behaviour, and it somewhat accelerates search speed. Before this change the times were the following:

         event         |     period
-----------------------+-----------------
 index_build           | 00:01:51.467888
 index_build_recovery  | 00:00:04
 index_update          | 00:05:03.315155
 index_update_recovery | 00:00:51
 search_new            | 00:24:43.194882
 search_updated        | 00:28:36.316784
(6 rows)

- Pages are not fully re-encoded if it's enough to re-encode just the last segment.

README is updated.

-- 
With best regards, Alexander Korotkov.

Attachment: gin-packed-postinglists-varbyte5.patch.gz (GNU Zip compressed data)
[HACKERS] KNN-GiST with recheck
Hackers!

This patch was split from thread:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com

I've split it to a separate thread, because it's related to partial sort only conceptually, not technically. Also I renamed it to knn-gist-recheck from partial-knn as a more appropriate name. In the attached version the docs are updated. A possible weak point of this patch's design is that it fetches heap tuples from the GiST scan. However, I didn't receive any notes about its design, so I'm going to put it to the commitfest. Here goes a description of this patch, same as in the original thread.

KNN-GiST provides the ability to get ordered results from an index, but this order is based only on index information. For instance, a GiST index contains bounding rectangles for polygons, and we can't get the exact distance to a polygon from the index (a similar situation exists in PostGIS). In the attached patch, the GiST distance method can set a recheck flag (similar to the consistent method). This flag means that the distance method returned a lower bound of the distance and we should recheck it from the heap.

See an example.

create table test as (select id, polygon(3+(random()*10)::int,
    circle(point(random(), random()), 0.0003 + random()*0.001)) as p
    from generate_series(1,1000000) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test
           order by p <-> point(0.5,0.5) limit 10;

   id   |       ?column?
--------+----------------------
 755611 | 0.000405855808916853
 807562 | 0.000464123777564343
 437778 | 0.000738524708741959
 947860 |  0.00076250998760724
 389843 | 0.000886362723569568
  17586 | 0.000981960100555216
 411329 |  0.00145338112316853
 894191 |  0.00149399559703506
 391907 |   0.0016647896049741
 235381 |  0.00167554614889509
(10 rows)

It's fast using just an index scan.

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.29..1.86 rows=10 width=36)
        (actual time=0.180..0.230 rows=10 loops=1)
   ->  Index Scan using test_idx on test
        (cost=0.29..157672.29 rows=1000000 width=36)
        (actual time=0.179..0.228 rows=10 loops=1)
         Order By: (p <-> '(0.5,0.5)'::point)
 Total runtime: 0.305 ms
(4 rows)

-- 
With best regards, Alexander Korotkov.

Attachment: knn-gist-recheck-1.patch.gz (GNU Zip compressed data)
Re: [HACKERS] [PATCH] Add transforms feature
On Fri, Jan 10, 2014 at 10:40 PM, Peter Eisentraut pete...@gmx.net wrote:
> On Wed, 2013-12-11 at 11:07 -0500, Tom Lane wrote:
>> We should have learned by now that those are usually a bad idea. In this case, we've got
>> changes in the behavior of function calling, which seems like not only a nightmare for
>> debugging but a fertile source of security issues.
>
> I note that this is the same mechanism that we have elaborately designed for *avoiding*
> security issues from search_path.

And it works like crap.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] nested hstore patch
On 01/13/2014 11:16 AM, Oleg Bartunov wrote:
> Andrew, did you run the perl script? Actually, I found that the operator table needs to be
> fixed.

No. My build machine doesn't actually have DBD::Pg installed. Can you send me a patch if you don't want to push it yourself, or maybe Erik can send a patch to adjust the table.

cheers

andrew
Re: [HACKERS] PoC: Partial sort
On Tue, Dec 31, 2013 at 5:41 AM, Andreas Karlsson andr...@proxel.se wrote:
> On 12/29/2013 08:24 AM, David Rowley wrote:
>> If it was possible to devise some way to reuse any previous tuplesortstate, perhaps just
>> inventing a reset method which clears out tuples, then we could see performance exceed the
>> standard seqscan -> sort. The code the way it is seems to look up the sort functions from
>> the syscache for each group then allocate some sort space, so quite a bit of time is also
>> spent in palloc0() and pfree().
>>
>> If it was not possible to do this then maybe adding a cost to the number of sort groups
>> would be better, so that the optimization is skipped if there are too many sort groups.
>
> It should be possible. I have hacked a quick proof of concept for reusing the tuplesort
> state. Can you try it and see if the performance regression is fixed by this?
>
> One thing which has to be fixed with my patch is that we probably want to close the
> tuplesort once we have returned the last tuple from ExecSort().
>
> I have attached my patch and the incremental patch on Alexander's patch.

Thanks. It's included in the attached version of the patch, as well as estimation improvements, more comments, and a regression tests fix.

-- 
With best regards, Alexander Korotkov.

Attachment: partial-sort-5.patch.gz (GNU Zip compressed data)
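[Editorial note: for context, the shape of query the patch targets — a sketch assuming an index on the leading ORDER BY column only; names are illustrative.]

    CREATE TABLE t (a int, b int);
    CREATE INDEX t_a_idx ON t (a);

    -- Rows come back from the index already ordered by a, so a partial
    -- sort only needs to sort ties on a by b, group by group, instead
    -- of sorting the whole result set.
    SELECT * FROM t ORDER BY a, b LIMIT 10;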
Re: [HACKERS] Patch: show xid and xmin in pg_stat_activity and pg_stat_replication
On 12/17/2013 04:58 PM, Christian Kruse wrote:
> attached you will find a patch for showing the current transaction id (xid) and the xmin of
> a backend in pg_stat_activity and the xmin in pg_stat_replication.

Docs.

When an admin is looking for a long-running transaction that's blocking vacuum, he will currently rely on the timestamp fields, xact_start and query_start. I'm not sure how much extra value this adds over those timestamps in pg_stat_activity, but there are no such fields in pg_stat_replication, so that part is definitely useful. And if we're going to add xmin to pg_stat_replication, it makes sense to add it to pg_stat_activity too.

Unless someone can come up with something better to display for walsenders. The timestamp of the last commit record that's been replayed, perhaps? What else would a user want to do with these?

This definitely sounds useful to me as a developer, though. So I'm thinking we should add these for that reason, in any case.

- Heikki
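[Editorial note: a sketch of the admin use-case, assuming the patch exposes the snapshot horizon as a backend_xmin column; the column name here is an assumption, not necessarily what the patch uses.]

    -- Find the backends whose snapshots are most likely holding back
    -- vacuum, oldest xmin first.
    SELECT pid, usename, xact_start, age(backend_xmin) AS xmin_age
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC
    LIMIT 5;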
Re: [HACKERS] Standalone synchronous master
On 01/13/2014 04:12 PM, Florian Pflug wrote:
> On Jan12, 2014, at 04:18 , Josh Berkus j...@agliodbs.com wrote:
>> In other words, if we're going to have auto-degrade, the most intelligent place for it is
>> in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we
>> do *inside* Postgres is going to have a really, really hard time determining when to
>> degrade.
>
> +1
>
> This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE
> and COMMIT, and leaves it to a transaction manager to deal with issues that require a
> whole-cluster perspective.

++1

I like Simon's idea to have a pg_xxx function for switching between replication modes, which should be enough to support a monitor daemon doing the switching.

Maybe we could have a 'syncrep_taking_too_long_command' GUC which could be used to alert such a monitoring daemon, so it can immediately check whether to

a) switch the master to async rep or standalone mode (in the case of the sync slave becoming unavailable), or

b) fail over to the slave (in the almost equally likely case that it was the master which became disconnected from the world and the slave is available), or

c) do something else depending on circumstances/policy :)

NB! Note that in case b) 'syncrep_taking_too_long_command' will very likely also not reach the monitor daemon, so it cannot rely on this as the main trigger!

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] Standalone synchronous master
On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>> In other words, if we're going to have auto-degrade, the most intelligent place for it is
>> in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we
>> do *inside* Postgres is going to have a really, really hard time determining when to
>> degrade.
>
> ++1

+1

> I like Simon's idea to have a pg_xxx function for switching between replication modes,
> which should be enough to support a monitor daemon doing the switching.
>
> Maybe we could have a 'syncrep_taking_too_long_command' GUC which could be used to alert
> such a monitoring daemon, so it can immediately check whether to

I would think that would be a column in pg_stat_replication. Basically last_ack or something like that.

> a) switch the master to async rep or standalone mode (in the case of the sync slave
> becoming unavailable)

Yep.

> b) fail over to the slave (in the almost equally likely case that it was the master which
> became disconnected from the world and the slave is available)

I think this should be left to external tools.

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary act.", George Orwell
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
Mel,

> I'm the chair for Linux Storage, Filesystem and Memory Management Summit 2014 (LSF/MM).
> [...] I am checking if the PostgreSQL community would be willing to volunteer someone to
> lead a topic discussing PostgreSQL performance with recent kernels or to highlight
> regressions or future developments you feel are potentially a problem. With luck someone
> suitable is already travelling to the collaboration summit
> (http://events.linuxfoundation.org/events/collaboration-summit) and it would not be too
> inconvenient to drop in for LSF/MM as well.

We can definitely get someone there. I'll certainly be there; I'm hoping to get someone who has closer involvement with our kernel interaction as well.

> [...] I wonder why reports related to PostgreSQL are not more common. One assumption would
> be that PostgreSQL is perfectly happy with the current kernel behaviour in which case our
> discussion here is done.

To be frank, it's because most people are still running on 2.6.19, and as a result are completely unaware of recent developments. Second, because there's no obvious place to complain to ... lkml doesn't welcome bug reports, and where else do you go?

> Does the PostgreSQL community have a problem with recent kernels, particularly with respect
> to the storage, filesystem or memory management layers? If yes, do you have some data that
> can highlight this and can you volunteer someone to represent your interests to the kernel
> community?

Yes, and yes.

> Are current developments in the IO layer counter to the PostgreSQL requirements? If so,
> what developments, why are they a problem, do you have a suggested alternative or some idea
> of what we should watch out for?

Mostly the issue is changes to the IO scheduler which improve one use case at the expense of others, or set defaults which emphasize desktop hardware over server hardware. What also came up with the recent change to LRU is that the Postgres community apparently has more experience than the Linux community with buffer-clearing algorithms, and we ought to share that.

> The track topic would be up to you but just as a hint, we'd need something a lot more
> concrete than "you should test more".

How about "don't add major IO behavior changes with no backwards-compatibility switches"? ;-)

Seriously, one thing I'd like to get out of Collab would be a reasonable regimen for testing database performance on Linux kernels.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
Josh Berkus j...@agliodbs.com wrote:
>> Does the PostgreSQL community have a problem with recent kernels, particularly with
>> respect to the storage, filesystem or memory management layers?
>
> How about "don't add major IO behavior changes with no backwards-compatibility switches"?
> ;-)

I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Is that because there *are* configuration options that allow people to get decent performance once the issue is diagnosed? It seems like maybe there could be a better way to give a heads-up on hazards in a new kernel to the database world, but I don't know quite what that would be. For all I know, it is already available if you know where to look.

> Seriously, one thing I'd like to get out of Collab would be a reasonable regimen for
> testing database performance on Linux kernels.

... or perhaps you figure this is what would bring such issues to the community's attention before people are bitten in production environments?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Exclusion constraints can be used to implement uniqueness checks with SP-GiST or GiST indexes. For example, if you want to enforce that there are no two tuples with the same x and y coordinates, ie. use a point as the key. You could add a b-tree index just to enforce the constraint, but it's better if you don't have to. In general, it's just always better if features don't have implementation-specific limitations like this. That seems rather narrow. Among other things, I worry about the baggage for users in documenting support for SP-GiST/GiST. We support it, but it only really works for the case where you're using exclusion constraints as unique constraints, something that might make sense in certain narrow contexts, contrary to our earlier general statement that a unique index should be preferred there. We catalog amcanunique methods as the way that we support unique indexes. I really do feel that that's the appropriate level to support the feature at, and I have not precluded other amcanunique implementations from doing the same, having documented the intended value locking interface/contract for the benefit of any future amcanunique AM author. It's ON DUPLICATE KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of exclusion constraints and their arbitrary commutative operators. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 10:51 AM, Kevin Grittner wrote: How about don't add major IO behavior changes with no backwards-compatibility switches? ;-) I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Is that because there *are* configuration options that allow people to get decent performance once the issue is diagnosed? It seems like maybe there could be a better way to give a heads-up on hazards in a new kernel to the database world, but I don't know quite what that would be. For all I know, it is already available if you know where to look. Well, it was the lack of sysctl options which took the 2Q change from annoyance to potential disaster. We can't ever get away from the possibility that the Postgres use-case might be the minority use-case, and we might have to use non-default options. It's when those options aren't present *at all* that we're stuck. However, I agree that a worthwhile thing to talk about is having some better channel to notify the Postgres (and other DB) communities about major changes to IO and Memory management. Wanna go to Collab? Seriously, one thing I'd like to get out of Collab would be a reasonable regimen for testing database performance on Linux kernels. ... or perhaps you figure this is what would bring such issues to the community's attention before people are bitten in production environments? That, too. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
On Thu, Jan 9, 2014 at 10:57 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: I wrote: Robert Haas wrote: Hmm, fair point. But I'm still not convinced that we really need to add extra accounting for this. What's wrong with just reporting the number of exact and lossy pages? No. I intended to show the desired memory space for a TIDBitmap rather than the peak memory usage for that TIDBitmap. And I thought it'd be better for the latter to be displayed as additional information. However, I've removed the functionality for showing the desired memory space due to technical problems. Now I should probably remove the functionality for showing the peak memory usage too. Yes, as Andres mentioned, showing the peak memory usage is not a bad idea, I think. But I start to think it's not necessarily worth complicating the code ... If there are no objections of others, I'll remove extra accounting for showing the peak memory usage. Done. Please find attached a patch. Looks good to me, so committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
On Thu, Jan 9, 2014 at 11:06 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Jan 9, 2014 at 12:26 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 11:37 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Tue, Jan 7, 2014 at 12:52 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 9:48 AM, Amit Kapila amit.kapil...@gmail.com wrote: Couldn't we also handle this by postponing FreeConfigVariables until after the if (error) block? Wouldn't doing it that way lead to a bigger memory leak if the error level is ERROR? Though the current fix can also leak memory, it will be just for ErrorConfFile_save. I think a similar case can happen for 'pre_value' in the current code as well; that's why I have fixed it in a similar way in the patch. I was assuming that error-recovery would reset the containing memory context, but I'm not sure what memory context we're executing in at this point. In the current code, the only time it can go to the error path with elevel as ERROR is during Postmaster startup (context == PGC_POSTMASTER), at which point it will anyway upgrade ERROR to FATAL, so it should not be a problem to move the FreeConfigVariables() call after the error block check. However, if in future someone adds any more ERRORs (the chances of which seem quite low), it could cause a leak; maybe that's why the original code was written that way. If you think it's better to fix this by moving FreeConfigVariables() after the error block check, then I can update the patch by doing so and incorporate the other change (directly use PG_AUTOCONF_FILENAME) suggested by you as well? Yeah, let's do it that way. Okay, done. The attached patch fixes both the display of the wrong file name and the usage of PG_AUTOCONF_FILENAME. Committed with a comment change. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
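For readers following along, the control flow agreed on above is roughly this (a simplified sketch, not the actual ProcessConfigFile() code in guc-file.l; the function shape and parameters are illustrative assumptions):

/*
 * Simplified sketch of the ordering fix discussed above; not the real
 * ProcessConfigFile() code.  The function shape and parameters are
 * illustrative assumptions only.
 */
static void
config_reload_sketch(ConfigVariable *head, bool error,
                     int elevel, const char *conffile)
{
    if (error)
    {
        /*
         * Report which file contained the error.  When elevel >= ERROR
         * this longjmps away; today that only happens at postmaster
         * startup, where ERROR is promoted to FATAL anyway, so the
         * not-yet-freed list cannot leak repeatedly.
         */
        ereport(elevel,
                (errmsg("configuration file \"%s\" contains errors",
                        conffile)));
    }

    /*
     * Moved after the error block: reached on the normal path and,
     * when elevel is below ERROR, on the error path too, so both
     * paths free the list.
     */
    FreeConfigVariables(head);
}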
Re: [HACKERS] Planning time in explain/explain analyze
On Thu, Jan 9, 2014 at 11:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: On Thu, Jan 9, 2014 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: In short then, I think we should just add this to EXPLAIN and be done. -1 for sticking the info into PlannedStmt or anything like that. I'm confused. I thought I was arguing to support your suggestion that the initial planning store the time in the cached plan and explain should output the time the original planning took. Uh, no, wasn't my suggestion. Doesn't that design imply measuring *every* planning cycle, explain or no? I was thinking more of just putting the timing calls into explain.c. Currently the patch includes changes to prepare.c which is what seems odd to me. I think it'd be fine to say, hey, I can't give you the planning time in this EXPLAIN ANALYZE because I just used a cached plan and did not re-plan. But saying, hey, the planning time is $TINYVALUE, when what we really mean is that looking up the previously-cached plan took only that long, seems actively misleading to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
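Concretely, putting the timing calls into explain.c rather than prepare.c could look roughly like the sketch below. The INSTR_TIME macros, pg_plan_query(), and ExplainPropertyFloat() are real interfaces, but the placement and variable names are illustrative guesses, not the submitted patch; query, cursorOptions, params, and es are assumed to be in scope as in ExplainOneQuery()/ExplainOnePlan().

/*
 * Sketch: time only the planner call made on EXPLAIN's behalf, instead
 * of instrumenting every planning cycle.  Fragment only; placement is
 * illustrative and not the patch under discussion.
 */
instr_time   planstart, planduration;
PlannedStmt *plan;

INSTR_TIME_SET_CURRENT(planstart);
plan = pg_plan_query(query, cursorOptions, params);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);

/* ... and when emitting the plan: */
ExplainPropertyFloat("Planning Time",
                     1000.0 * INSTR_TIME_GET_DOUBLE(planduration),
                     3, es);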
Re: [HACKERS] Planning time in explain/explain analyze
* Robert Haas (robertmh...@gmail.com) wrote: Currently the patch includes changes to prepare.c which is what seems odd to me. I think it'd be fine to say, hey, I can't give you the planning time in this EXPLAIN ANALYZE because I just used a cached plan and did not re-plan. But saying, hey, the planning time is $TINYVALUE, when what we really mean is that looking up the previously-cached plan took only that long, seems actively misleading to me. My thought, at least, was to always grab the planning time and then provide it for explain and/or explain analyze, and then for re-plan cases, indicate if a cached plan was returned, if a replan happened, and if a replan happened, what the old plan time and the new plan time were. I don't think it makes any sense to report on the time returned from pulling a previously-cached plan. I understand that it's not completely free to track the plan time for every query but I'm in the camp that says we need better metrics and information for 99% of what we do and I'd like to see us eventually able to track average plan time (maybe on a per-query basis...), average run-time, how many times we do a hashjoin, mergejoin, the number of records in/out of each, memory usage, etc., etc. I don't think we need per-tuple timing information. I certainly wouldn't want to try and collect all of this through shared memory or our existing stats collector. Thanks, Stephen
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? The other thing that comes to mind is the kernel's caching behavior. We've talked a lot over the years about the difficulties of getting the kernel to write data out when we want it to and to not write data out when we don't want it to. When it writes data back to disk too aggressively, we get lousy throughput because the same page can get written more than once when caching it for longer would have allowed write-combining. When it doesn't write data to disk aggressively enough, we get huge latency spikes at checkpoint time when we call fsync() and the kernel says uh, what? you wanted that data *on the disk*? sorry boss! and then proceeds to destroy the world by starving the rest of the system for I/O for many seconds or minutes at a time. We've made some desultory attempts to use sync_file_range() to improve things here, but I'm not sure that's really the right tool, and if it is we don't know how to use it well enough to obtain consistent positive results. On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
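For concreteness, the sync_file_range() approach alluded to above looks roughly like this sketch of the general pattern (not code from the PostgreSQL tree; the 1MB chunk size and the error handling are arbitrary simplifications):

/*
 * Sketch of using sync_file_range() to trickle dirty data to disk ahead
 * of the checkpoint-time fsync(), so the final flush has less to do.
 * General pattern only, not PostgreSQL source.
 */
#define _GNU_SOURCE
#include <fcntl.h>

static int
trickle_writes(int fd, off_t file_size)
{
    const off_t chunk = 1024 * 1024;    /* 1MB at a time, arbitrary */
    off_t       off;

    for (off = 0; off < file_size; off += chunk)
    {
        /* start asynchronous writeback of this range */
        if (sync_file_range(fd, off, chunk, SYNC_FILE_RANGE_WRITE) < 0)
            return -1;
    }

    /*
     * Later, at the durability point, wait for writeback to finish.
     * Note this still does not flush the disk's own write cache, one
     * reason it may not be "the right tool" as noted above.
     */
    return sync_file_range(fd, 0, file_size,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}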
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Jan 13, 2014 at 1:53 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jan 13, 2014 at 12:23 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Exclusion constraints can be used to implement uniqueness checks with SP-GiST or GiST indexes. For example, if you want to enforce that there are no two tuples with the same x and y coordinates, ie. use a point as the key. You could add a b-tree index just to enforce the constraint, but it's better if you don't have to. In general, it's just always better if features don't have implementation-specific limitations like this. That seems rather narrow. Among other things, I worry about the baggage for users in documenting support for SP-GiST/GiST. We support it, but it only really works for the case where you're using exclusion constraints as unique constraints, something that might make sense in certain narrow contexts, contrary to our earlier general statement that a unique index should be preferred there. We catalog amcanunique methods as the way that we support unique indexes. I really do feel that that's the appropriate level to support the feature at, and I have not precluded other amcanunique implementations from doing the same, having documented the intended value locking interface/contract for the benefit of any future amcanunique AM author. It's ON DUPLICATE KEY, not ON OVERLAPPING KEY, or any other syntax suggestive of exclusion constraints and their arbitrary commutative operators. For what it's worth, I agree with Heikki. There's probably nothing sensible an upsert can do if it conflicts with more than one tuple, but if it conflicts with just exactly one, it oughta be OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
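The flag Claudio is referring to is posix_fadvise()'s POSIX_FADV_DONTNEED (there is no WONTNEED flag). A minimal sketch of the interaction, assuming plain file descriptors and 8kb blocks; PostgreSQL does not currently do this on every read:

/*
 * Sketch: drop the kernel's copy of a block after reading it into
 * shared_buffers, to limit double-buffering.  Illustrative only; as
 * noted downthread, re-populating the kernel cache later requires
 * real I/O, which is the policy problem.
 */
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192

static ssize_t
read_and_drop(int fd, off_t offset, char *buf)
{
    ssize_t n = pread(fd, buf, BLCKSZ, offset);

    if (n == BLCKSZ)
        (void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_DONTNEED);

    return n;
}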
Re: [HACKERS] Planning time in explain/explain analyze
On 01/13/2014 09:06 PM, Stephen Frost wrote: My thought, at least, was to always grab the planning time and then provide it for explain and/or explain analyze, and then for re-plan cases, indicate if a cached plan was returned, if a replan happened, and if a replan happened, what the old plan time and the new plan time were. I like this solution due to its correctness and because all the information is included. But I also think my original patch was fine in how it, in its simplicity, solved the problem without adding any overhead in the non-EXPLAIN case. Either solution would be fine by me. If the consensus is that we want to always measure it I will look at implementing that instead. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning time in explain/explain analyze
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 9, 2014 at 11:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Uh, no, wasn't my suggestion. Doesn't that design imply measuring *every* planning cycle, explain or no? I was thinking more of just putting the timing calls into explain.c. Currently the patch includes changes to prepare.c which is what seems odd to me. I think it'd be fine to say, hey, I can't give you the planning time in this EXPLAIN ANALYZE because I just used a cached plan and did not re-plan. But saying, hey, the planning time is $TINYVALUE, when what we really mean is that looking up the previously-cached plan took only that long, seems actively misleading to me. Meh. Why? This would only come into play for EXPLAIN EXECUTE stmtname. I don't think users would be surprised to see a report of minimal planning time for that. In fact, it might be a good thing, as it would make it easier to tell the difference between whether you were seeing a generic plan or a custom plan for the prepared statement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache). -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)
On Fri, Jan 10, 2014 at 2:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: On Jan10, 2014, at 19:08 , Tom Lane t...@sss.pgh.pa.us wrote: Although, having said that ... maybe build your own aggregate would be a reasonable suggestion for people who need this? I grant that it's going to be a minority requirement, maybe even a small minority requirement. People who have the chops to get this sort of thing right can probably manage a custom aggregate definition. So we'd put a footgun into the hands of people who don't know what they're doing, to be fired for performance's sake, and leave it to the people who know what they are doing to put the safety on? If I may put words in Kevin's mouth, I think his point is that having float8 sum() at all is a foot-gun, and that's hard to deny. You need to know how to use it safely. Yeah, but Florian's point is that not all foot-guns are created equal. The fact that we're walking around with a loaded BB-gun in our hip pocket is not a good reason to replace it with a howitzer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache). That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning time in explain/explain analyze
On Mon, Jan 13, 2014 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 9, 2014 at 11:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Uh, no, wasn't my suggestion. Doesn't that design imply measuring *every* planning cycle, explain or no? I was thinking more of just putting the timing calls into explain.c. Currently the patch includes changes to prepare.c which is what seems odd to me. I think it'd be fine to say, hey, I can't give you the planning time in this EXPLAIN ANALYZE because I just used a cached plan and did not re-plan. But saying, hey, the planning time is $TINYVALUE, when what we really mean is that looking up the previously-cached plan took only that long, seems actively misleading to me. Meh. Why? This would only come into play for EXPLAIN EXECUTE stmtname. I don't think users would be surprised to see a report of minimal planning time for that. In fact, it might be a good thing, as it would make it easier to tell the difference between whether you were seeing a generic plan or a custom plan for the prepared statement. It would also make it easier to be wrong. If you want to display that information explicitly, fine. But asking the user to use the elapsed time to guess whether or not we really planned anything is just going to confuse people who don't have enough experience with the system to know what the boundary is between the largest time that could be a cache lookup and the smallest time that could be real planning activity. And that means virtually everyone, me included. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/13/14, 2:27 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache). That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 5:32 PM, Jim Nasby j...@nasby.net wrote: That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. If I had a kernel developer hat, I'd put it on to say: I don't think allowing that last bit is wise for a kernel. It would violate oh-so-many separation rules and open an oh-so-big can-o-worms. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Planning time in explain/explain analyze
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 13, 2014 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Why? This would only come into play for EXPLAIN EXECUTE stmtname. I don't think users would be surprised to see a report of minimal planning time for that. In fact, it might be a good thing, as it would make it easier to tell the difference between whether you were seeing a generic plan or a custom plan for the prepared statement. It would also make it easier to be wrong. If you want to display that information explicitly, fine. But asking the user to use the elapsed time to guess whether or not we really planned anything is just going to confuse people who don't have enough experience with the system to know what the boundary is between the largest time that could be a cache lookup and the smallest time that could be real planning activity. And that means virtually everyone, me included. If you're saying that you'd like EXPLAIN to explicitly mention whether the plan was cached or custom, I don't have any great complaint about that. I'm just not seeing how you arrive at the conclusion that we mustn't report the amount of time EXPLAIN spent to get the plan. If we do what you're proposing we'll just have a different set of confused users, who will be wondering how EXPLAIN could have managed to spend 100 msec planning something when the EXPLAIN only took 10 msec in toto according to psql. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 2014-01-13 15:15:16 -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? I think that fixes some of the worst instances, but I've seen machines spending horrible amounts of CPU (& bus) time in page reclaim nonetheless. If I analyzed it correctly it's in RAM > working set workloads where RAM is pretty large and most of it is used as page cache. The kernel ends up spending a huge percentage of time finding and potentially defragmenting pages when looking for victim buffers. On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. I've wondered before if there wouldn't be a chance for postgres to say my dear OS, that the file range 0-8192 of file x contains y, no need to reread and do that when we evict a page from s_b but I never dared to actually propose that to kernel people... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
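Spelled out, the call being wished for might look like the following. To be clear, this is entirely hypothetical: the name, signature, and semantics are invented here purely to make the idea concrete.

/*
 * HYPOTHETICAL: no such syscall exists.  This only sketches the
 * "file range 0-8192 of file x contains y" idea above: donate a clean,
 * known-current page image to the kernel page cache without forcing a
 * read or a write.
 */
int cache_donate(int fd, off_t offset, size_t len, const void *data);

/*
 * Intended use, when evicting a clean page from shared_buffers:
 *
 *     cache_donate(fd, blkno * (off_t) 8192, 8192, page_image);
 *
 * How the kernel could ever trust "data" to match what is on disk is
 * the can of worms that kept this from being proposed.
 */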
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Sat, Jan 11, 2014 at 1:08 AM, Amit Kapila amit.kapil...@gmail.com wrote: Yes, currently this applies to update; what I have in mind is that in future if someone wants to use WAL compression for any other operation like 'full_page_writes', then it can be easily extendible. To be honest, I have not evaluated whether such a flag or compression would make sense for full page writes, but I think it should be possible while doing full page write (BkpBlock has RelFileNode) to check such a flag if it's present. Makes sense. The reason for adding the same chunk at the head of the list is that it uses the same technique as pglz_hist_add. Now in pglz, it will not have repeat steps from c~f, as it has the concept of good_match, which leads to getting this done in one go. That being said, I am really not sure how much real-world data falls in the above category and whether we should try to optimize based on the above example, but yes, it will save some CPU cycles in the current test we are using. In the Rabin algorithm, we shouldn't try to find a longer match. The match should end at the chunk end, period. Otherwise, you lose the shift-resistant property of the algorithm. But I do think there might be a bug here, which is that, unless I'm misinterpreting something, hp is NOT the end of the chunk. After calling pgrb_hash_init(), we've looked at the first FOUR bytes of the input. If we find that we have a zero hash value at that point, shouldn't the chunk size be 4, not 1? And similarly if we find it after sucking in one more byte, shouldn't the chunk size be 5, not 2? Right now, we're deciding where the chunks should end based on the data in the chunk plus the following 3 bytes, and that seems wonky. I would expect us to include all of those bytes in the chunk. It depends on how we define chunk; basically the chunk size will be based on the byte for which we consider the hindex. The hindex for any byte is calculated considering that byte and the following 3 bytes, so after calling pgrb_hash_init(), even though we have looked at 4 bytes, the hindex is still for the first byte and that's why it considers the chunk size as 1, not 4. Isn't it similar to how the current pglz works? Basically it also uses the next 4 bytes to calculate the index (pglz_hist_idx) but still does byte-by-byte comparison; here, if we try to map to Rabin's delta encoding, then the chunk size is always 1. I don't quite understand this. The point of the Rabin algorithm is to split the old tuple up into chunks and then search for those chunks in the new tuple. For example, suppose the old tuple is abcdefghijklmnopqrstuvwxyz. It might get split like this: abcdefg hijklmnopqrstuvw xyz. If any of those three chunks appear in the new tuple, then we'll use them for compression. If not, we'll just copy the literal bytes. If the chunks appear in the new tuple reordered or shifted or with stuff inserted between one chunk and the next, we'll still find them. Unless I'm confused, which is possible, what you're doing is essentially looking at the string and splitting it at those three places, but then recording the chunks as being three bytes shorter than they really are. I don't see how that can be right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
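To make the dispute concrete, here is a generic content-defined chunking sketch. It is not the patch's pgrb_* code; the window size, mask, and toy rolling hash are arbitrary illustrative choices. The point of contention above is visible in the boundary test: the chunk is taken to end at the byte whose trailing 4-byte window fired, so the whole window lands inside the chunk.

/* Generic content-defined chunking sketch (illustrative, not pgrb_*). */
#include <stddef.h>
#include <stdint.h>

#define WINDOW  4           /* rolling-hash window, as in the thread */
#define MASK    0x1F        /* ~1-in-32 boundary probability, arbitrary */

/* Record chunk boundaries in bounds[]; returns number of chunks found. */
static int
chunkify(const unsigned char *data, size_t len, size_t *bounds, int max)
{
    int       nchunks = 0;
    uint32_t  hash = 0;
    size_t    i;

    for (i = 0; i < len && nchunks < max; i++)
    {
        hash = (hash << 1) ^ data[i];           /* slide new byte in */
        if (i >= WINDOW)                        /* slide old byte out */
            hash ^= (uint32_t) data[i - WINDOW] << WINDOW;

        /*
         * Boundary test: the chunk ends AT byte i, so the last WINDOW
         * bytes (data[i-3..i]) are inside the chunk.  Ending it three
         * bytes earlier, as debated above, would make chunk boundaries
         * depend on bytes outside the chunk.
         */
        if (i + 1 >= WINDOW && (hash & MASK) == 0)
            bounds[nchunks++] = i + 1;          /* offset past chunk end */
    }
    return nchunks;
}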
Re: [HACKERS] Planning time in explain/explain analyze
On Mon, Jan 13, 2014 at 3:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 13, 2014 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Why? This would only come into play for EXPLAIN EXECUTE stmtname. I don't think users would be surprised to see a report of minimal planning time for that. In fact, it might be a good thing, as it would make it easier to tell the difference between whether you were seeing a generic plan or a custom plan for the prepared statement. It would also make it easier to be wrong. If you want to display that information explicitly, fine. But asking the user to use the elapsed time to guess whether or not we really planned anything is just going to confuse people who don't have enough experience with the system to know what the boundary is between the largest time that could be a cache lookup and the smallest time that could be real planning activity. And that means virtually everyone, me included. If you're saying that you'd like EXPLAIN to explicitly mention whether the plan was cached or custom, I don't have any great complaint about that. I'm just not seeing how you arrive at the conclusion that we mustn't report the amount of time EXPLAIN spent to get the plan. If we do what you're proposing we'll just have a different set of confused users, who will be wondering how EXPLAIN could have managed to spend 100 msec planning something when the EXPLAIN only took 10 msec in toto according to psql. What I'm saying is that if EXPLAIN reports something that's labelled Planning Time, it should *be* the planning time, and not anything else. When we retrieve a plan from cache, it would be sensible not to report the planning time at all, and IMHO it would also be sensible to report the time it actually took to plan whenever we originally did it. But reporting a value that is not the planning time and calling it the planning time does not seem like a good idea to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote: For what it's worth, I agree with Heikki. There's probably nothing sensible an upsert can do if it conflicts with more than one tuple, but if it conflicts with just exactly one, it oughta be OK. If there is exactly one, *and* the existing value is exactly the same as the value proposed for insertion (or, I suppose, a subset of the existing value, but that's so narrow that it might as well not apply). In short, when you're using an exclusion constraint as a unique constraint. Which is very narrow indeed. Weighing the costs and the benefits, that seems like far more cost than benefit, before we even consider anything beyond simply explaining the applicability and limitations of upserting with exclusion constraints. It's generally far cleaner to define speculative insertion as something that happens with unique indexes only. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
Hi Alexander, First, thanks a lot for working on this feature. This PostgreSQL shortcoming crops up all the time in web applications that implement paging by multiple sorted columns. I've been trying it out in a few situations. I implemented a new enable_partialsort GUC to make it easier to turn on/off, this way it's a lot easier to test. The attached patch applies on top of partial-sort-5.patch. I will spend more time reviewing the patch, but some of this planner code is over my head. If there's any way I can help to make sure this lands in the next version, let me know. The patch performs just as well as I would expect it to:

marti=# select ac.name, r.name from artist_credit ac join release r on (ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 9.830 ms
marti=# set enable_partialsort = off;
marti=# select ac.name, r.name from artist_credit ac join release r on (ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 1442.815 ms

A difference of almost 150x! There's a missed opportunity in that the code doesn't consider pushing new Sort steps into subplans. For example, if there's no index on language(name) then this query cannot take advantage of partial sorts:

marti=# explain select l.name, r.name from language l join release r on (l.id=r.language) order by l.name, r.name limit 1000;
 Limit  (cost=123203.20..123205.70 rows=1000 width=32)
   ->  Sort  (cost=123203.20..126154.27 rows=1180430 width=32)
         Sort Key: l.name, r.name
         ->  Hash Join  (cost=229.47..58481.49 rows=1180430 width=32)
               Hash Cond: (r.language = l.id)
               ->  Seq Scan on release r  (cost=0.00..31040.10 rows=1232610 width=26)
               ->  Hash  (cost=131.43..131.43 rows=7843 width=14)
                     ->  Seq Scan on language l  (cost=0.00..131.43 rows=7843 width=14)

But because there are only so few languages, it would be a lot faster to sort languages in advance and then do a partial sort:

 Limit  (rows=1000 width=31)
   ->  Partial sort  (rows=1180881 width=31)
         Sort Key: l.name, r.name
         Presorted Key: l.name
         ->  Nested Loop  (rows=1180881 width=31)
               ->  Sort  (rows=7843 width=10)
                     Sort Key: name
                     ->  Seq Scan on language  (rows=7843 width=14)
               ->  Index Scan using release_language_idx on release r  (rows=11246 width=25)
                     Index Cond: (language = l.id)

Even an explicit sorted CTE cannot take advantage of partial sorts:

marti=# explain with sorted_lang as (select id, name from language order by name)
marti-# select l.name, r.name from sorted_lang l join release r on (l.id=r.language) order by l.name, r.name limit 1000;
 Limit  (cost=3324368.83..3324371.33 rows=1000 width=240)
   CTE sorted_lang
     ->  Sort  (cost=638.76..658.37 rows=7843 width=14)
           Sort Key: language.name
           ->  Seq Scan on language  (cost=0.00..131.43 rows=7843 width=14)
   ->  Sort  (cost=3323710.46..3439436.82 rows=46290543 width=240)
         Sort Key: l.name, r.name
         ->  Merge Join  (cost=664.62..785649.92 rows=46290543 width=240)
               Merge Cond: (r.language = l.id)
               ->  Index Scan using release_language_idx on release r  (cost=0.43..87546.06 rows=1232610 width=26)
               ->  Sort  (cost=664.19..683.80 rows=7843 width=222)
                     Sort Key: l.id
                     ->  CTE Scan on sorted_lang l  (cost=0.00..156.86 rows=7843 width=222)

But even with these limitations, this will easily be the killer feature of the next release, for me at least.
Regards, Marti On Mon, Jan 13, 2014 at 8:01 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Dec 31, 2013 at 5:41 AM, Andreas Karlsson andr...@proxel.se wrote: On 12/29/2013 08:24 AM, David Rowley wrote: If it was possible to devise some way to reuse any previous tuplesortstate perhaps just inventing a reset method which clears out tuples, then we could see performance exceed the standard seqscan -> sort. The code the way it is seems to lookup the sort functions from the syscache for each group then allocate some sort space, so quite a bit of time is also spent in palloc0() and pfree(). If it was not possible to do this then maybe adding a cost to the number of sort groups would be better so that the optimization is skipped if there are too many sort groups. It should be possible. I have hacked a quick proof of concept for reusing the tuplesort state. Can you try it and see if the performance regression is fixed by this? One thing which has to be fixed with my patch is that we probably want to close the tuplesort once we have returned the last tuple from ExecSort(). I have attached my patch and the incremental patch on Alexander's patch. Thanks. It's included into the attached version of the patch, as well as estimation improvements, more comments and a regression test fix. -- With best regards, Alexander Korotkov. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 1/13/14, 2:37 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:32 PM, Jim Nasby j...@nasby.net wrote: That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. If I had a kernel developer hat, I'd put it on to say: I don't think allowing that last bit is wise for a kernel. It would violate oh-so-many separation rules and open an oh-so-big can-o-worms. Yeah, if it were me I'd probably want to keep a hash of the page and its address and only accept putting a page back into the kernel if it matched my hash. Otherwise you'd just have to treat it as a write. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 01/13/2014 10:53 PM, Peter Geoghegan wrote: On Mon, Jan 13, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote: For what it's worth, I agree with Heikki. There's probably nothing sensible an upsert can do if it conflicts with more than one tuple, but if it conflicts with just exactly one, it oughta be OK. If there is exactly one, *and* the existing value is exactly the same as the value proposed for insertion (or, I suppose, a subset of the existing value, but that's so narrow that it might as well not apply). In short, when you're using an exclusion constraint as a unique constraint. Which is very narrow indeed. Weighing the costs and the benefits, that seems like far more cost than benefit, before we even consider anything beyond simply explaining the applicability and limitations of upserting with exclusion constraints. It's generally far cleaner to define speculative insertion as something that happens with unique indexes only. Well, even if you don't agree that locking all the conflicting rows for update is sensible, it's still perfectly sensible to return the rejected rows to the user. For example, you're inserting N rows, and if some of them violate a constraint, you still want to insert the non-conflicting rows instead of rolling back the whole transaction. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 2014-01-13 15:53:36 -0500, Trond Myklebust wrote: I've wondered before if there wouldn't be a chance for postgres to say my dear OS, that the file range 0-8192 of file x contains y, no need to reread and do that when we evict a page from s_b but I never dared to actually propose that to kernel people... O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations? Because we like to let the OS handle part of postgres' caching. For one, it makes servers with several applications/databases much more realistic without seriously overallocating memory, for another it's a huge chunk of platform dependent code to get good performance everywhere. The above was explicitly not to avoid double buffering but to move a buffer away from postgres' own buffers to the kernel's buffers once it's not 100% clear we need it in buffers anymore. Part of the reason this is being discussed is because previously people suggested going the direct IO route and some people (most prominently J. Corbet in http://archives.postgresql.org/message-id/20131204083345.31c60dd1%40lwn.net ) and others disagreed because that goes the route of reinventing storage layers everywhere without improving the common codepaths. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 12:32 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:27 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache). That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. I think the above is pretty simple for both interaction (allow us to inject a clean page into the file page cache) and policy (forget it after you hand it to us, then remember it again when we hand it back to you clean). And I think it would pretty likely be an improvement over what we currently do. But I think it is probably the wrong way to get the improvement. I think the real problem is that we don't trust ourselves to manage more of the memory ourselves. As far as I know, we still don't have a publicly disclosable and readily reproducible test case for the reports of performance degradation when we have more than 8GB in shared_buffers. If we had one of those, we could likely reduce the double buffering problem by fixing our own scalability issues and therefore taking responsibility for more of the data ourselves. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. Which, in the case of it being clean, amounts to Here is data we don't want in memory any more because we think it is cold. But we don't trust ourselves, so please hold on to it anyway. That might be a tough sell to the kernel people. Cheers, Jeff
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 3:53 PM, Trond Myklebust tron...@gmail.com wrote: O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations? O_DIRECT was apparently designed by a deranged monkey on some serious mind-controlling substances. But don't take it from me, I have it on good authority: http://yarchive.net/comp/linux/o_direct.html One might even say the best authority. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
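Beyond the philosophical objections in that link, O_DIRECT is mechanically painful too: buffer address, file offset, and length must all satisfy device-dependent alignment. A minimal sketch of what a single direct read entails on Linux (4096 is a common alignment, assumed here for illustration only):

/*
 * Sketch of the alignment bookkeeping O_DIRECT imposes on Linux.
 * Illustrative only: 4096 is a common logical block size, but the
 * required alignment is device-dependent and must be discovered.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGN 4096

static ssize_t
direct_read(const char *path, off_t offset, size_t len, void **out)
{
    void   *buf;
    ssize_t n;
    int     fd;

    *out = NULL;

    /* offset and length must be block-aligned, or the read fails */
    if (offset % ALIGN != 0 || len % ALIGN != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* ... and so must the buffer's address */
    if (posix_memalign(&buf, ALIGN, len) != 0)
    {
        close(fd);
        return -1;
    }

    n = pread(fd, buf, len, offset);
    close(fd);

    *out = buf;             /* caller frees; the page cache was bypassed */
    return n;
}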
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: On 1/13/14, 2:27 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of when we read a page into shared buffers tell the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here kernel, please put this back into your cache). That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. This should be the madvise() interface (with MADV_WILLNEED and MADV_DONTNEED); is there something in that interface that is insufficient? James -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
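One concrete source of friction: madvise() operates on mappings, not file descriptors, so hinting an 8kb page that PostgreSQL accesses via plain read()/pread() would need its own mmap()/madvise()/munmap() round trip, roughly as sketched below (illustrative only; this per-page pattern is exactly what is reported to scale poorly in the reply further down):

/*
 * Sketch of per-page madvise() hinting.  Note also that for a
 * file-backed MAP_SHARED mapping, MADV_DONTNEED drops the mapping,
 * not necessarily the page-cache copy, which is part of why the
 * interface does not quite fit.
 */
#include <sys/mman.h>

#define BLCKSZ 8192

static int
hint_page_unneeded(int fd, off_t offset)
{
    void *p = mmap(NULL, BLCKSZ, PROT_READ, MAP_SHARED, fd, offset);

    if (p == MAP_FAILED)
        return -1;

    (void) madvise(p, BLCKSZ, MADV_DONTNEED);
    return munmap(p, BLCKSZ);
}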
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Jan 13, 2014, at 15:40, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-13 15:15:16 -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? I think that fixes some of the worst instances, but I've seen machines spending horrible amounts of CPU ( BUS) time in page reclaim nonetheless. If I analyzed it correctly it's in RAM working set workloads where RAM is pretty large and most of it is used as page cache. The kernel ends up spending a huge percentage of time finding and potentially defragmenting pages when looking for victim buffers. On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. I've wondered before if there wouldn't be a chance for postgres to say my dear OS, that the file range 0-8192 of file x contains y, no need to reread and do that when we evict a page from s_b but I never dared to actually propose that to kernel people... O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations? Cheers, Trond -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
Josh Berkus j...@agliodbs.com wrote: Wanna go to Collab? I don't think that works out for me, but thanks for suggesting it. I'd be happy to brainstorm with anyone who does go about issues to discuss; although the ones I keep running into have already been mentioned. Regarding the problems others have mentioned, there are a few features that might be a very big plus for us. Additional ways of hinting pages might be very useful. If we had a way to specify how many dirty pages were cached in PostgreSQL, the OS would count those in its calculations for writing dirty pages, and we could avoid the write avalanche which is currently so tricky to avoid without causing repeated writes to the same page. Or perhaps instead a way to hint a page as dirty so that the OS could not only count those, but discard the obsolete data from its cache if it is not already dirty at the OS level, and lower the write priority if it is dirty (to improve the odds of collapsing multiple writes). If there was a way to use DONTNEED or something similar with the ability to rescind it if the page still happened to be in the OS cache, that might help for when we discard a still-clean page from our buffers. And I seem to have a vague memory of there being cases where the OS is first reading pages when we ask to write them, which seems like avoidable I/O. (I'm not sure about that one, though.) Also, something like THP support should really have sysctl support rather than requiring people to put echo commands into scripts and tie those into runlevel changes. That's pretty ugly for something which has turned out to be necessary so often. I don't get too excited about changes to the default schedulers -- it's been pretty widely known for a long time that DEADLINE or NOOP perform better than any alternatives for most database loads. Anyone with a job setting up Linux machines to be used for database servers should know to cover that. As long as those two don't get broken, I'm good. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 2014-01-13 12:34:35 -0800, James Bottomley wrote: On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote: Well, if we were to collaborate with the kernel community on this then presumably we can do better than that for eviction... even to the extent of here's some data from this range in this file. It's (clean|dirty). Put it in your cache. Just trust me on this. This should be the madvise() interface (with MADV_WILLNEED and MADV_DONTNEED) is there something in that interface that is insufficient? For one, postgres doesn't use mmap for files (and can't without major new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has horrible consequences for performance/scalability - very quickly you contend on locks in the kernel. Also, that will mark that page dirty, which isn't what we want in this case. One major usecase is transplanting a page comming from postgres' buffers into the kernel's buffercache because the latter has a much better chance of properly allocating system resources across independent applications running. Oh, and the kernel's page-cache management while far from perfect, actually scales much better than postgres'. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 1/13/14, 12:21 PM, Joshua D. Drake wrote: On 01/13/2014 10:12 AM, Hannu Krosing wrote: In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. +1 This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective. ++1 +1 Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On 2014-01-13 15:14:21 -0600, Jim Nasby wrote: On 1/13/14, 12:21 PM, Joshua D. Drake wrote: On 01/13/2014 10:12 AM, Hannu Krosing wrote: In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. +1 This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective. ++1 +1 Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally. I think it'd be a fairly simple patch to re-check the state of syncrep config in SyncRepWaitForLsn(). Alternatively you can just write code that iterates over the procarray and sets each proc->syncRepState to SYNC_REP_WAIT_CANCELLED or such. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
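A rough sketch of the shape the first suggestion could take inside SyncRepWaitForLsn()'s wait loop. This leans on PostgreSQL internals (MyProc, SyncRepRequested(), the latch API) and elides the locking around syncRepState, so treat it as an illustration of the idea rather than a tested patch.

    for (;;)
    {
        /* A standby acknowledged our commit LSN: done waiting. */
        if (MyProc->syncRepState == SYNC_REP_WAIT_COMPLETE)
            break;

        /*
         * Re-check the syncrep configuration.  If sync rep has been
         * switched off since we started waiting, stop waiting and let
         * the (already locally committed) transaction finish as if it
         * were asynchronous.
         */
        if (!SyncRepRequested())
        {
            MyProc->syncRepState = SYNC_REP_NOT_WAITING;
            break;
        }

        WaitLatch(&MyProc->procLatch,
                  WL_LATCH_SET | WL_POSTMASTER_DEATH, -1L);
        ResetLatch(&MyProc->procLatch);
    }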
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 1/13/14, 10:40 AM, Merlin Moncure wrote: On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back into a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was. IMHO, I see no reason we need to plan on removing support. Presumably it's not that much burden on our codebase; it's only a PITA for users writing correct code. (It'd be very interesting to see how much user code would blow up if presented with anything other than 1 as the lower bound...) I'd be perfectly happy with an initdb option to allow for lower bound support if you wanted it, disabled by default. People that legitimately want/need non-1 lower bounds can set that up, but our general user population will end up protected from a class of heisenbugs. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund and...@2ndquadrant.com wrote: For one, postgres doesn't use mmap for files (and can't without major new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has horrible consequences for performance/scalability - very quickly you contend on locks in the kernel. I may as well dump this in this thread. We've discussed this in person a few times, including at least once with Ted Ts'o when he visited Dublin last year. The fundamental conflict is that the kernel better understands the hardware and the other software using the same resources, while Postgres better understands its own access patterns. We need to either add interfaces so Postgres can teach the kernel what it needs to know about its access patterns, or add interfaces so Postgres can find out what it needs to know about the hardware context. The more ambitious and interesting direction is to let Postgres tell the kernel what it needs to know to manage everything. To do that we would need the ability to control when pages are flushed out. This is absolutely necessary to maintain consistency. Postgres would need to be able to mark pages as unflushable until some point in time in the future when the journal is flushed. We discussed various ways that interface could work, but it would be tricky to keep it low enough overhead to be workable. The less exciting, more conservative option would be to add kernel interfaces to teach Postgres about things like raid geometries. Then Postgres could use directio and decide to do prefetching based on the raid geometry, how much I/O bandwidth and how many IOPS are available, etc. Reimplementing i/o schedulers and all the rest of the work that the kernel provides inside Postgres just seems like something outside our competency and that none of us is really excited about doing. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
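A sketch of the conservative direction Greg mentions: bypassing the page cache with O_DIRECT and doing our own aligned reads. The 4096-byte alignment and the file path are assumptions (real code would query the device and handle errors properly); this just shows the mechanics that direct I/O imposes.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        ssize_t n;
        int fd = open("datafile", O_RDONLY | O_DIRECT);  /* stand-in path */
        if (fd < 0)
            return 1;
        /* O_DIRECT requires buffer, offset and length to be aligned. */
        if (posix_memalign(&buf, 4096, 8192) != 0)
            return 1;
        n = pread(fd, buf, 8192, 0);   /* reads straight from the device */
        (void) n;
        free(buf);
        close(fd);
        return 0;
    }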
Re: [HACKERS] Standalone synchronous master
On 01/13/2014 01:14 PM, Jim Nasby wrote: On 1/13/14, 12:21 PM, Joshua D. Drake wrote: On 01/13/2014 10:12 AM, Hannu Krosing wrote: In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. +1 This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective. ++1 +1 Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally. I think it is extremely valuable, else we have lost those transactions which is exactly what we don't want. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc "In a time of universal deceit - telling the truth is a revolutionary act." - George Orwell -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Mon, Jan 13, 2014 at 10:40:57AM -0600, Merlin Moncure wrote: On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back into a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was. With utmost respect, it was not. Databases are no good if there are fixable things in them that cause them to produce incorrect results at random, as auto-casting to text did. This project has no deprecation policy, I believe it actually does, although it's not a formal, written policy. Would you like to help draft one up? and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and win32 api. Could you please help remind us what that was? There are certain limited exceptions to this rule -- for example security problems Probably not. or gross violations of the standard We're definitely there on lower bounds of arrays. The standard, for a wonder, is clear and unambiguous about them. Whether we should go there on the rest of our array implementation is a question for another thread. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.consistent_into
On 1/13/14, 1:44 AM, Pavel Stehule wrote: 2014/1/12 Florian Pflug f...@phlo.org On Jan12, 2014, at 22:37 , Pavel Stehule pavel.steh...@gmail.com wrote: There is a GUC for variable_conflict already too. In this case I would like to enable this functionality everywhere (it is a tool to simply eliminate some kinds of strange bugs), so it needs a GUC. We have had the GUC for plpgsql.variable_conflict for three years and I don't know about any problem. I must say I hate behaviour-changing GUCs with quite some passion. IMHO they tend to cause bugs, not avoid them, in the long run. The pattern usually is 1) Code gets written, depends on some particular set of settings to work correctly 2) Code gets reused, with little further testing since it's supposed to be battle-proven anyway. Settings get dropped. 3) Code blows up for those corner-cases where the settings actually matter. Debugging is hell, because you effectively have to go over the code line-by-line and check if it might be affected by some GUC or another. Only a few days ago I spent more than an hour tracking down a bug which, as it turned out, was caused by a regex which subtly changed its meaning depending on whether standard_conforming_strings is on or off. Some GUCs are unavoidable - standard_conforming_strings, for example, probably still was a good idea, since the alternative would have been to stick with the historical, non-standard behaviour forever. But in this case, my feeling is that the trouble such a GUC may cause outweighs the potential benefits. I'm all for having a directive like #consistent_into (though I feel that the name could convey the meaning better). If we *really* think that this ought to be the default from 9.4 onward, then we should *) Change it to always complain, except if the function explicitly specifies #consistent_into on or whatever. *) Have pg_dump add that to all plpgsql functions if the server version is 9.4 or whatever major release this ends up in That's all just my opinion of course. I am thinking a GUC and a plpgsql option can live together. If you want to emphasize some behaviour, then you can use the plpgsql option. On the other hand, I would like to use functionality that is safe without dirtying the source code with repeated options, but then I have to check (and accept the risk of) the GUC settings. One idea: a required GUC? It could be nice to have a way to ensure certain GUC settings, restoring those values or raising a warning otherwise. Back to the main topic. The described feature doesn't change the behaviour of the INTO clause. I can enable or disable this functionality, and well-written code should work without change (and without problems). When the check is disabled, execution is just less safe. So in this case, the impact of the GUC is significantly less than the issues you described. Does anybody know a use case where this check should be disabled? Probably we have different experiences with GUCs. I had a problem with standard_conforming_strings and the bytea format some years ago, and now I am preparing a document about required settings. But I can see (from my experience in the Czech area) more frequent problems related to effective_cache_size or from_collapse_limit and similar GUCs. These parameters are beyond the knowledge (and visibility) of the typical user. ISTM that in this case, it should be safe to make the new default behavior STRICT; if you forget to set the GUC to disable it then you'll get an error that points directly at the problem, at which point you'll go "Oh, yeah... I forgot to set X..."
Outside of the GUC, I believe the default should definitely be STRICT. If your app is relying on non-strict then you need to be made aware of that. We should be able to provide a DO block that will change this setting for every function you've got if someone isn't happy with STRICT mode. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Mon, Jan 13, 2014 at 4:45 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 13, 2014 at 10:40:57AM -0600, Merlin Moncure wrote: On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back into a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was. With utmost respect, it was not. Databases are no good if there are fixable things in them that cause them to produce incorrect results at random, as auto-casting to text did. With a precisely equal level of respect, that's a load of bunk. As has been discussed here many times in the past, those changes broke many applications that were just fine. Mine included, EnterpriseDB's included. It was designed to maximize rather than minimize breakage in ways that were completely unnecessary, and every time anyone dares to argue that the change was less than perfect in every respect, the responses evince a level of dubiousness normally reserved for the parole hearings of convicted murderers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [Lsf-pc] Linux kernel impact on PostgreSQL performance
Everyone, I am looking for one or more hackers to go to Collab with me to discuss this. If you think that might be you, please let me know and I'll look for funding for your travel. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 1/12/14, 9:35 PM, Andreas Karlsson wrote: On 01/12/2014 11:20 PM, Peter Geoghegan wrote: On Sun, Jan 12, 2014 at 8:12 AM, Andreas Karlsson andr...@proxel.se wrote: On 01/11/2014 11:42 PM, Peter Geoghegan wrote: I recently suggested that rather than RETURNING REJECTS, we could have a REJECTING clause, which would see a DML statement project strictly the complement of what RETURNING projects in the same context. So perhaps you could also see what RETURNING would not have projected because a before row trigger returned NULL (i.e. when a before trigger indicates to not proceed with insertion). That is certainly more general, and so is perhaps preferable. It's also less verbose, and it seems less likely to matter that we'll need to make REJECTING a fully reserved keyword, as compared to REJECTS. (RETURNING is already a fully reserved keyword not described by the standard, so this makes a certain amount of sense to me). If nothing else, REJECTING is more terse than RETURNING REJECTS. I do not entirely understand what you are proposing here. Any example of how this would look compared to your RETURNING REJECTS proposal? It's very similar - REJECTING is a total generalization of what I already have. The difference is only that REJECTING is accepted in all contexts that RETURNING is, and not just with INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. So you could potentially have REJECTING project the slot proposed for insertion on an UPDATE where RETURNING would not. If for example a BEFORE ROW trigger fired, and returned NULL, perhaps it'd then be possible to project the slot as it was before being passed to the trigger. Perhaps there is no real demand for that, but, as I said, from a usability perspective it may be easier to reason about a feature that projects strictly the complement of what RETURNING would project in the same context. So simply this? WITH rej AS ( INSERT INTO foo (a, b, c) VALUES (44, 1078, 'insert'), (55, 1088, 'insert') REJECTING a) UPDATE foo SET c = 'update' FROM rej WHERE foo.a = rej.a; Another question: have you given any thought to the case where you want to use both the successfully inserted tuples and the rejected ones in the CTE? Is that even something anyone would want? Would perhaps MERGE be more suited for that? Well, a common case for INSERT RETURNING is to get your set of surrogate keys back; so I think users would want the ability to RETURN what finally made it into the table. Also, if we want to support the case of identifying tuples where a BEFORE trigger disallowed the insert, we probably want to expose that that's why those tuples were rejected (as opposed to them being rejected due to a duplicate key violation). -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Mon, Jan 13, 2014 at 3:45 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 13, 2014 at 10:40:57AM -0600, Merlin Moncure wrote: This project has no deprecation policy, I believe it actually does, although it's not a formal, written policy. Would you like to help draft one up? Lack of a 'formal, written policy' is equivalent to 'no policy'. Regardless, the way things were done in the 7.x/8.x series may no longer apply today; the project has grown up and we need to be more serious about things, at least, IMNSHO. and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and win32 api. Could you please help remind us what that was? Let's take gets() for example. C11 finally ditched it 12 years (!) after it was formally deprecated in C99 and informally deprecated in endless man pages (don't use this!) for decades before that. And even then most compilers, at least the decent ones, should allow you to request previous standards for some time beyond that. The win32 API is also remarkably stable; ancient code written for it beyond the dim horizon of time will still compile and execute today. These are probably strong contenders for most popular APIs ever made -- see the connection? Now, comparing C APIs to an SQL implementation for deprecation purposes isn't quite apples to apples, but I'll stand by the analogy. or gross violations of the standard We're definitely there on lower bounds of arrays. The standard, for a wonder, is clear and unambiguous about them. Whether we should go there on the rest of our array implementation is a question for another thread. The SQL standard requests that standard syntax gives standard behavior. Alternate bounds is non-standard syntax giving non-standard behavior and is thus excepted. Naturally, non-standard syntax is dangerous because the standard may later implement it, in which case you then have a real problem (that may be the case here: I don't know). Our array implementation is a real mess on multiple levels, but at least it's an internally consistent mess. Maybe it really should be 'fixed', but not before the super un-fun discussion of how to ease the path for our hapless users happens first. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 03:15:16PM -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Ok, there are at least three potential problems there that you may or may not have run into. First, THP when it was first introduced was a bit of a disaster. In 3.0, it was *very* heavy-handed and would trash the system reclaiming memory to satisfy an allocation. When it did this, it would also writeback a bunch of data and block on it to boot. It was not the smartest move of all time but was improved over time, and in some cases the patches were also backported by 3.0.101. This is a problem that should have been alleviated over time. The general symptoms of the problem would be massive stalls, and monitoring the /proc/PID/stack of interesting processes would show it to be somewhere in do_huge_pmd_anonymous_page -> alloc_pages_nodemask -> try_to_free_pages -> migrate_pages or something similar. You may have worked around it by disabling THP with a command line switch or /sys/kernel/mm/transparent_hugepage/enabled in the past. This is not meant to happen any more, or at least it has been a while since a bug was filed against me in this area. There are corner cases though. If the underlying filesystem is NFS, the problem might still be experienced. That is the simple case. You might have also hit the case where THPages filled with zeros did not use the zero page. That would have looked like a larger footprint than anticipated and led to another range of problems. This has also been addressed since, but maybe not recently enough. It's less likely this is your problem though, as I expect you actually use your buffers, not leave them filled with zeros. You mention NUMA, but that problem is trickier to figure out without more context. THP can cause unexpected interleaving between NUMA nodes. Memory that would have been local on a 4K page boundary becomes remote accesses when THP is enabled, and performance would be hit (maybe 3-5% depending on the machine). It's not the only possibility though. If memory was being used sparsely and THP was in use then the overall memory footprint may be higher than it should be. This potentially would cause allocations to spill over to remote nodes while kswapd wakes up to reclaim local memory. That would lead to weird buffer aging inversion problems. This is a hell of a lot of guessing though, and we'd need a better handle on the reproduction case to pin it down. Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned to fit within NUMA nodes. On older kernels enabling it would sometimes cause massive stalls. I'm actually very surprised to hear it fixes anything and would be interested in hearing more about what sort of circumstances would convince you to enable that thing. The other thing that comes to mind is the kernel's caching behavior. We've talked a lot over the years about the difficulties of getting the kernel to write data out when we want it to and to not write data out when we don't want it to. Is sync_file_range() broken? When it writes data back to disk too aggressively, we get lousy throughput because the same page can get written more than once when caching it for longer would have allowed write-combining.
Do you think that is related to dirty_ratio or dirty_writeback_centisecs? If it's dirty_writeback_centisecs then that would be particularly tricky, because poor interactions there would basically come down to luck. When it doesn't write data to disk aggressively enough, we get huge latency spikes at checkpoint time when we call fsync() and the kernel says "uh, what? you wanted that data *on the disk*? sorry boss!" and then proceeds to destroy the world by starving the rest of the system for I/O for many seconds or minutes at a time. Ok, parts of that are somewhat expected. It *may* depend on the underlying filesystem. Some of them handle fsync better than others. If you are syncing the whole file though when you call fsync, then you are potentially burned by having to writeback dirty_ratio amounts of memory, which could take a substantial amount of time. We've made some desultory attempts to use sync_file_range() to improve things here, but I'm not sure that's really the right tool, and if it is we don't know how to use it well enough to obtain consistent positive results. That implies that sync_file_range() is broken in some fashion we (or at least I) are not aware of, and that needs kicking. On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow.
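For reference, the sync_file_range() interface under discussion, in the two-phase shape these attempts usually take: initiate writeback early without waiting, then force completion at checkpoint time instead of issuing one giant fsync(). The helper names and staging policy are invented; note also that the call only schedules data writeback -- it gives no metadata or journal durability guarantee, so it cannot replace the final fsync().

    #define _GNU_SOURCE
    #include <fcntl.h>

    static void
    stage_writeback(int fd, off_t offset, off_t nbytes)
    {
        /* Start I/O on dirty pages in the range; do not wait for it. */
        sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
    }

    static void
    complete_writeback(int fd, off_t offset, off_t nbytes)
    {
        /*
         * Wait for previously initiated writeback, write whatever is
         * still dirty, and wait again: the range's data is on disk
         * when this returns.
         */
        sync_file_range(fd, offset, nbytes,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER);
    }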
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 06:27:03PM -0200, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:23 PM, Jim Nasby j...@nasby.net wrote: On 1/13/14, 2:19 PM, Claudio Freire wrote: On Mon, Jan 13, 2014 at 5:15 PM, Robert Haas robertmh...@gmail.com wrote: On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. There you have a much harder algorithmic problem. You can basically control duplication with fadvise and WONTNEED. The problem here is not the kernel and whether or not it allows postgres to be smart about it. The problem is... what kind of smarts (algorithm) to use. Isn't this a fairly simple matter of, when we read a page into shared buffers, telling the kernel to forget that page? And a corollary to that for when we dump a page out of shared_buffers (here, kernel, please put this back into your cache). That's my point. In terms of kernel-postgres interaction, it's fairly simple. What's not so simple, is figuring out what policy to use. Remember, you cannot tell the kernel to put some page in its page cache without reading it or writing it. So, once you make the kernel forget a page, evicting it from shared buffers becomes quite expensive. posix_fadvise(POSIX_FADV_WILLNEED) is meant to cover this case by forcing readahead. If you evict it prematurely then you do get kinda screwed, because you pay the IO cost to read it back in again even if you had enough memory to cache it. Maybe this is the type of kernel-postgres interaction that is annoying you. If you don't evict, the kernel eventually steps in and evicts the wrong thing. If you do evict and it was unnecessary, you pay an IO cost. That could be something we look at. There are cases buried deep in the VM where pages get shuffled to the end of the LRU and get tagged for reclaim as soon as possible. Maybe you need access to something like that via posix_fadvise to say "reclaim this page if you need memory, but leave it resident if there is no memory pressure" or something similar. Not exactly sure what that interface would look like or, offhand, how it could be reliably implemented. -- Mel Gorman SUSE Labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
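A sketch of the cost asymmetry Mel describes, using the real advice values; the helper names, and the idea of pairing them around shared_buffers eviction, are hypothetical. The missing piece he points at -- "reclaim under pressure, else keep resident" -- has no advice value today.

    #include <fcntl.h>

    /*
     * Cheap but irreversible: once the kernel forgets the page, getting
     * it back costs a real read even if memory stayed plentiful.
     */
    static void
    forget_block(int fd, off_t off, off_t len)
    {
        posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    }

    /*
     * Forced readahead; this is the price paid whenever forget_block()
     * was called prematurely.
     */
    static void
    reload_block(int fd, off_t off, off_t len)
    {
        posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
    }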
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Jan 13, 2014 at 12:58 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Well, even if you don't agree that locking all the conflicting rows for update is sensible, it's still perfectly sensible to return the rejected rows to the user. For example, you're inserting N rows, and if some of them violate a constraint, you still want to insert the non-conflicting rows instead of rolling back the whole transaction. Right, but with your approach, can you really be sure that you have the right rejecting tuple ctid (not reject)? In other words, as you wait for the exclusion constraint to conclusively indicate that there is a conflict, minutes may have passed in which time other conflicts may emerge in earlier unique indexes. Whereas with an approach where values are locked, you are guaranteed that earlier unique indexes have no conflicting values. Maintaining that property seems useful, since we check in a well-defined order, and we're still projecting a ctid. Unlike when row locking is involved, we can make no assumptions or generalizations around where conflicts will occur. Although that may also be a general concern with your approach when row locking, for multi-master replication use-cases. There may be some value in knowing it cannot have been earlier unique indexes (and so the existing values for those unique indexes in the locked row should stay the same - don't many conflict resolution policies work that way?). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 2014-01-13 14:19:56 -0800, James Bottomley wrote: Frequently mmap()/madvise()/munmap()ing 8kb chunks has horrible consequences for performance/scalability - very quickly you contend on locks in the kernel. Is this because of problems in the mmap_sem? It's been a while since I looked at it, but yes, mmap_sem was part of it. I also seem to recall the amount of IPIs increasing far too much for it to be practical, but I am not sure anymore. Also, that will mark that page dirty, which isn't what we want in this case. You mean madvise (page_addr)? It shouldn't ... the state of the dirty bit should only be updated by actual writes. Which MADV_ primitive is causing the dirty marking, because we might be able to fix it (unless there's some weird corner case I don't know about). Not the madvise() itself, but transplanting the buffer from postgres' buffers to the mmap() area of the underlying file would, right? We also do have a way of transplanting pages: it's called splice. How do the semantics of splice differ from what you need? Hm. I don't really see how splice would allow us to seed the kernel's pagecache with content *without* marking the page as dirty in the kernel. We don't need zero-copy IO here; the important thing is just to fill the pagecache with content without a) rereading the page from disk or b) marking the page as dirty. One major usecase is transplanting a page coming from postgres' buffers into the kernel's buffercache, because the latter has a much better chance of properly allocating system resources across independent applications running. If you want to share pages between the application and the page cache, the only known interface is mmap ... perhaps we can discuss how better to improve mmap for you? I think purely using mmap() is pretty unlikely to work out - there are just too many constraints about when a page is allowed to be written out (e.g. it's interlocked with postgres' write ahead log). I also think that for many practical purposes using mmap() would result in an absurd number of mappings or mapping way too huge areas; e.g. large btree indexes are usually accessed in a quite fragmented manner. Oh, and the kernel's page-cache management while far from perfect, actually scales much better than postgres'. Well, then, it sounds like the best way forward would be to get postgres to use the kernel page cache more efficiently. No arguments there, although working on postgres scalability is a good idea as well ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
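Since splice(2) keeps coming up, a minimal sketch of what it actually provides today: moving data between a file and a pipe without a user-space copy (the helper is hypothetical). Note that Andres' objection stands -- splicing *into* a file is a write, so the target pages end up dirty, which is the opposite of what seeding the cache with already-on-disk data needs.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static int
    copy_range_via_splice(int src_fd, int dst_fd, loff_t off, size_t len)
    {
        int pipefd[2];
        loff_t in = off, out = off;

        if (pipe(pipefd) < 0)
            return -1;
        while (len > 0)
        {
            /* File -> pipe, then pipe -> file, with no user-space buffer. */
            ssize_t n = splice(src_fd, &in, pipefd[1], NULL, len,
                               SPLICE_F_MOVE);
            if (n <= 0)
                break;
            splice(pipefd[0], NULL, dst_fd, &out, (size_t) n,
                   SPLICE_F_MOVE);
            len -= (size_t) n;
        }
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
    }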
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 11:38:44PM +0100, Jan Kara wrote: On Mon 13-01-14 22:26:45, Mel Gorman wrote: The flipside is also meant to hold true. If you know data will be needed in the near future then posix_fadvise(POSIX_FADV_WILLNEED). Glancing at the implementation it does a forced read-ahead on the range of pages of interest. It doesn't look like it would block. That's not quite true. POSIX_FADV_WILLNEED still needs to map logical file offsets to physical disk blocks and create IO requests. This happens synchronously. So if your disk is congested and relevant metadata is out of cache, or we simply run out of free IO requests, POSIX_FADV_WILLNEED can block for a significant amount of time. Umm, yes, you're right. It also potentially stalls allocating the pages up front even though it will only try and direct reclaim pages once. That can stall in some circumstances, particularly if there are a number of processes trying to reclaim memory. That kinda sucks though. One point of discussion would be to check if this is an interface that can be used and if so, is it required to never block and if so is there something we can do about it -- queue the IO asynchronously if you can but if the kernel would block then do not bother. That does mean that fadvise is not guaranteeing that the pages will be resident in the future but it was not the intent of the interface anyway. -- Mel Gorman SUSE Labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
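Until an interface with those semantics exists, the only user-space workaround is to issue the advice from a context that is allowed to block. A sketch using a detached thread per request follows (helper names hypothetical, no worker pool, minimal error handling -- a real backend would batch these).

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>

    struct fadvise_req { int fd; off_t off; off_t len; };

    static void *
    fadvise_worker(void *arg)
    {
        struct fadvise_req *r = arg;

        /* May block on metadata lookups or IO request allocation --
         * but only this helper thread stalls, not the caller. */
        posix_fadvise(r->fd, r->off, r->len, POSIX_FADV_WILLNEED);
        free(r);
        return NULL;
    }

    static void
    prefetch_async(int fd, off_t off, off_t len)
    {
        pthread_t tid;
        struct fadvise_req *r = malloc(sizeof(*r));

        if (r == NULL)
            return;
        r->fd = fd; r->off = off; r->len = len;
        if (pthread_create(&tid, NULL, fadvise_worker, r) == 0)
            pthread_detach(tid);
        else
            free(r);
    }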
Re: [HACKERS] Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Jan 13, 2014 at 2:20 PM, Jim Nasby j...@nasby.net wrote: Well, a common case for INSERT RETURNING is to get your set of surrogate keys back; so I think users would want the ability to RETURN what finally made it into the table. Your update can also have a RETURNING clause. I'm not necessarily that attached to fully generalizing RETURNING REJECTS as REJECTING. It was just an idea. When an insert is rejected and you lock a conflicting row, it hardly matters what your surrogate key might have been had that insert succeeded. To get the surrogate key when the upsert inserts, do a regular INSERT ... RETURNING ..., and break the work up into multiple commands. That will almost always be sufficient, because you'll almost always know ahead of time where the conflict might be (certainly, the MySQL feature mandates that you do know). Also, if we want to support the case of identifying tuples where a BEFORE trigger disallowed the insert, we probably want to expose that that's why those tuples were rejected (as opposed to them being rejected due to a duplicate key violation). The ctid *won't* indicate a specific rejecting row then, I guess, which will do it. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Jan13, 2014, at 22:30 , Joshua D. Drake j...@commandprompt.com wrote: On 01/13/2014 01:14 PM, Jim Nasby wrote: On 1/13/14, 12:21 PM, Joshua D. Drake wrote: On 01/13/2014 10:12 AM, Hannu Krosing wrote: In other words, if we're going to have auto-degrade, the most intelligent place for it is in RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest* place. Anything we do *inside* Postgres is going to have a really, really hard time determining when to degrade. +1 This is also how 2PC works, btw - the database provides the building blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager to deal with issues that require a whole-cluster perspective. ++1 +1 Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally. I think it is extremely valuable, else we have lost those transactions which is exactly what we don't want. We *have* to "recover" waiting transactions upon switching off sync rep. A transaction that waits for a sync standby to respond has already committed locally (i.e., updated the clog), it just hasn't updated the proc array yet, and thus is still seen as in-progress by the rest of the system. But rolling back the transaction is nevertheless *impossible* at that point (except by PITR, hence the quotes around "recover"). So the only alternative to recovering them, i.e. having them abort their waiting, is to let them linger indefinitely, still holding their locks, preventing xmin from advancing, etc, until either the client disconnects or the server is restarted. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 02:26 PM, Mel Gorman wrote: Really? zone_reclaim_mode is often a complete disaster unless the workload is partitioned to fit within NUMA nodes. On older kernels enabling it would sometimes cause massive stalls. I'm actually very surprised to hear it fixes anything and would be interested in hearing more about what sort of circumstnaces would convince you to enable that thing. So the problem with the default setting is that it pretty much isolates all FS cache for PostgreSQL to whichever socket the postmaster is running on, and makes the other FS cache unavailable. This means that, for example, if you have two memory banks, then only one of them is available for PostgreSQL filesystem caching ... essentially cutting your available cache in half. And however slow moving cached pages between memory banks is, it's an order of magnitude faster than moving them from disk. But this isn't how the NUMA stuff is configured; it seems to assume that it's less expensive to get pages from disk than to move them between banks, so whatever you've got cached on the other bank, it flushes it to disk as fast as possible. I understand the goal was to make memory usage local to the processors stuff was running on, but that includes an implicit assumption that no individual process will ever want more than one memory bank worth of cache. So disabling all of the NUMA optimizations is the way to go for any workload I personally deal with. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
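Short of disabling the NUMA optimizations wholesale, the usual mitigation for the single-bank effect Josh describes is to interleave the big shared region explicitly across nodes -- what numactl --interleave=all does for the whole process. A sketch using libnuma (the region size is a stand-in, and treating the allocation as the buffer pool is an assumption):

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        void *pool;
        size_t sz = (size_t) 1 << 30;   /* 1 GB, stand-in size */

        if (numa_available() < 0)
        {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }
        /* Pages are spread round-robin over all nodes, so no single
         * memory bank fills up with the entire buffer pool. */
        pool = numa_alloc_interleaved(sz);
        if (pool == NULL)
            return 1;
        /* ... use as the shared buffer pool ... */
        numa_free(pool, sz);
        return 0;
    }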
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 01/14/2014 12:40 AM, Merlin Moncure wrote: On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). You're far from the only one, too. Until last year I was still seeing people saying they can't upgrade because of this. OTOH, that was a sudden and drastic change, with no BC switch like the removal of implicit joins had, that affected wide swaths of code. Lets not do that again. Removal of lower bounds for arrays is unlikely to even get noticed by the vast majority of users, and can be done progressively with BC features. The real issue IMO is how to get those few to stop using it so it can be truly removed. Past experience has shown that people just turn the compatibility flag on and forget they're using the deprecated feature. If there are warnings they'll silence them in their application and still forget they're using the deprecated feature. If there are log messages, they'll probably turn logging detail down to hide them and STILL forget they're using the deprecated feature. Then whine about it three years later when it gets removed. So I guess the question is: Is it worth all that hassle to remove a misfeature you have to go out of your way to use? Is support for non-1 lower bounds stopping us from doing something useful and important? Or is it just an irritation that it exists? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN improvements part 1: additional information
On 13.1.2014 18:07, Alexander Korotkov wrote: On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz wrote: On 8.1.2014 22:58, Alexander Korotkov wrote: Thanks for reporting. Fixed version is attached. I've tried to rerun the 'archie' benchmark with the current patch, and once again I got PANIC: could not split GIN page, didn't fit I reran it with '--enable-cassert' and with that I got TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1], items[i]) < 0), File: gindatapage.c, Line: 149) LOG: server process (PID 5364) was terminated by signal 6: Aborted DETAIL: Failed process was running: INSERT INTO messages ... so the assert in GinDataLeafPageGetUncompressed fails for some reason. I can easily reproduce it, but my knowledge in this area is rather limited so I'm not entirely sure what to look for. I've fixed this bug and many other bugs. Now the patch passes the test suite that I've used earlier. The results are so: OK, it seems the bug is gone. However now there's a memory leak somewhere. I'm loading pgsql mailing list archives (~600k messages) using this script https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py And after loading about 1/5 of the data, all the memory gets filled by the pgsql backends (loading the data in parallel) and the DB gets killed by the OOM killer. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.consistent_into
On Jan13, 2014, at 22:49 , Jim Nasby j...@nasby.net wrote: ISTM that in this case, it should be safe to make the new default behavior STRICT; if you forget to set the GUC to disable than you'll get an error that points directly at the problem, at which point you'll go Oh, yeah... I forgot to set X... What do you mean by STRICT? STRICT (which we already support) complains if the query doesn't return *exactly* one row. What Marko wants is to raise an error for a plain SELECT ... INTO if more than one row is returned, but to still convert zero rows to NULL. Outside of the GUC, I believe the default should definitely be STRICT. If your app is relying on non-strict then you need to be made aware of that. We should be able to provide a DO block that will change this setting for every function you've got if someone isn't happy with STRICT mode. If you mean that we should change SELECT ... INTO to always behave as if STRICT had been specified - why on earth would we want to do that? That would break perfectly fine code for no good reason whatsoever. In fact, after reading the documentation on SELECT ... INTO, I'm convinced that the whole consistent_into thing is a bad idea. The documentation states clearly that For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than one returned row, even when STRICT is not specified. This is because there is no option such as ORDER BY with which to determine which affected row should be returned. It therefore isn't an oversight that SELECT ... INTO allows multiple result rows but INSERT/UPDATE/DELETE forbids them; it's been done that way on purpose and for a reason. We shouldn't be second-guessing ourselves by changing that later - not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's IMHO just too late for that) best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.consistent_into
On 1/14/14, 12:41 AM, Florian Pflug wrote: In fact, after reading the documentation on SELECT ... INTO, I'm convinced the the whole consistent_into thing is a bad idea. The documentation states clearly that For INSERT/UPDATE/DELETE with RETURNING, PL/pgSQL reports an error for more than one returned row, even when STRICT is not specified. This is because there is no option such as ORDER BY with which to determine which affected row should be returned. It therefor isn't an oversight that SELECT ... INTO allows multiple result rows but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and for a reason. Yeah, it does state that. But it's a BS reason. In addition to ORDER BY, SELECT also has a LIMIT which you can use to get the first row behaviour. There's no way to go to the more sane behaviour from what we have right now. When I've worked with PL/PgSQL, this has been a source of a few bugs that would have been noticed during testing if the behaviour of INTO wasn't as dangerous as it is right now. Yes, it breaks backwards compatibility, but that's why there's a nice GUC. If we're not going to scrap PL/PgSQL and start over again, we are going to have to do changes like this to make the language better. Also I think that out of all the things we could do to break backwards compatibility, this is closer to harmless than a pain in the butt. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.consistent_into
On 01/13/2014 03:41 PM, Florian Pflug wrote: It therefor isn't an oversight that SELECT ... INTO allows multiple result rows but INSERT/UPDATE/DELETE forbids them, it's been done that way on purpose and for a reason. We shouldn't be second-guessing ourselves by changing that later - not, at least, unless we have a *very* good reason for it. Which, AFAICS, we don't. (And yeah, personally I'd prefer if we'd complain about multiple rows. But it's IMHO just too late for that) I *really* don't want to go through all my old code to find places where I used SELECT ... INTO just to pop off the first row and ignore the rest. I doubt anyone else does, either. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance
On 01/13/2014 09:53 PM, Trond Myklebust wrote: On Jan 13, 2014, at 15:40, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-13 15:15:16 -0500, Robert Haas wrote: On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote: I notice, Josh, that you didn't mention the problems many people have run into with Transparent Huge Page defrag and with NUMA access. Amen to that. Actually, I think NUMA can be (mostly?) fixed by setting zone_reclaim_mode; is there some other problem besides that? I think that fixes some of the worst instances, but I've seen machines spending horrible amounts of CPU (& bus) time in page reclaim nonetheless. If I analyzed it correctly it's in-RAM working set workloads where RAM is pretty large and most of it is used as page cache. The kernel ends up spending a huge percentage of time finding and potentially defragmenting pages when looking for victim buffers. On a related note, there's also the problem of double-buffering. When we read a page into shared_buffers, we leave a copy behind in the OS buffers, and similarly on write-out. It's very unclear what to do about this, since the kernel and PostgreSQL don't have intimate knowledge of what each other are doing, but it would be nice to solve somehow. I've wondered before if there wouldn't be a chance for postgres to say "my dear OS, the file range 0-8192 of file x contains y, no need to reread" and do that when we evict a page from s_b, but I never dared to actually propose that to kernel people... O_DIRECT was specifically designed to solve the problem of double buffering between applications and the kernel. Why are you not able to use that in these situations? What is asked for is the opposite of O_DIRECT - a write from a buffer inside PostgreSQL to the Linux *buffer cache*, telling Linux that it is the same as what is currently on disk, so it should never bother to write it back. This would avoid the current double-buffering between the PostgreSQL and Linux buffer caches while still making use of the Linux cache when possible. The use case is pages that PostgreSQL has moved into its buffer cache but which it has not modified. They will at some point be evicted from the PostgreSQL cache, but it is likely that they will still be needed sometime soon, so what is required is writing them back to the original file, only they should not really be written - or marked dirty for later writeback - at any level deeper than the Linux cache, as they *already* are on the disk. It is probably ok to put them in the LRU position as they are written out from PostgreSQL, though it may be better if we get some more control over where in the LRU order they would be placed. It may make sense to put them there based on when they were last read while residing inside the PostgreSQL cache. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 01/14/2014 12:33 AM, Craig Ringer wrote: On 01/14/2014 12:40 AM, Merlin Moncure wrote: On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). You're far from the only one, too. Until last year I was still seeing people saying they can't upgrade because of this. OTOH, that was a sudden and drastic change, with no BC switch like the removal of implicit joins had, that affected wide swaths of code. Lets not do that again. Removal of lower bounds for arrays is unlikely to even get noticed by the vast majority of users, and can be done progressively with BC features. The real issue IMO is how to get those few to stop using it so it can be truly removed. Past experience has shown that people just turn the compatibility flag on and forget they're using the deprecated feature. If there are warnings they'll silence them in their application and still forget they're using the deprecated feature. If there are log messages, they'll probably turn logging detail down to hide them and STILL forget they're using the deprecated feature. Then whine about it three years later when it gets removed. So I guess the question is: Is it worth all that hassle to remove a misfeature you have to go out of your way to use? Is support for non-1 lower bounds stopping us from doing something useful and important? Or is it just an irritation that it exists? Let's just add a user-defined operator for '[]' (weirdly positioned but 2-argument, almost infix :) ) and add that to JSON arrays to get 0-based ones into postgres ;) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 1/13/14, 5:05 PM, Peter Geoghegan wrote: On Mon, Jan 13, 2014 at 2:20 PM, Jim Nasby j...@nasby.net wrote: Well, a common case for INSERT RETURNING is to get your set of surrogate keys back; so I think users would want the ability to RETURN what finally made it into the table. Your update can also have a RETURNING clause. I'm not necessarily that attached to fully generalizing RETURNING REJECTS as REJECTING. It was just an idea. When an insert is rejected and you lock a conflicting row, it hardly matters what your surrogate key might have been had that insert succeeded. To get the surrogate key when it upsert inserts, do a regular INSERTRETURNING..., and break the work up into multiple commands. That will almost always be sufficient, because you'll almost always know ahead of time where the conflict might be (certainly, the MySQL feature mandates that you do know). As long as there's a way to get back what was ultimately inserted or updated that'd work... there might be some cases where you'd actually want to know what the result of the REJECTING command was (ie: did the update do something fancy?). Actually, you'd also want to know if triggers did anything. So we definitely want to keep the existing RETURNING behavior (sorry, I don't know offhand if you've kept that or not). Also, if we want to support the case of identifying tuples where a BEFORE trigger disallowed the insert, we probably want to expose that that's why those tuples were rejected (as opposed to them being rejected due to a duplicate key violation). The ctid *won't* indicate a specific rejecting row then, I guess, which will do it. Yeah, the only other thing you might want is the name of the trigger that returned NULL... that would allow you to do something different based on which trigger it was. Part of me thinks that'd be useful... part of me thinks it's just a foot-gun... -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql.consistent_into
On Jan14, 2014, at 00:52 , Marko Tiikkaja ma...@joh.to wrote: When I've worked with PL/PgSQL, this has been a source of a few bugs that would have been noticed during testing if the behaviour of INTO wasn't as dangerous as it is right now. The question is, how many bugs stemmed from wrong SQL queries, and what percentage of those would have been caught by this? The way I see it, there are thousands of ways to screw up a query, and having it return multiple rows instead of one is just one of them. Yes, it breaks backwards compatibility, but that's why there's a nice GUC. Which doesn't help, because the GUC isn't tied to the code. This *adds* an error case rather than removing one - now, instead of getting your code correct, you *also* have to get the GUC correct. If you even *know* that such a GUC exists. If we're not going to scrap PL/PgSQL and start over again, we are going to have to do changes like this to make the language better. Also I think that out of all the things we could do to break backwards compatibility, this is closer to harmless than a pain in the butt. I very strongly believe that languages don't get better by adding a thousand little knobs which subtly change semantics. Look at the mess that is PHP - we absolutely, certainly don't want to go there. The most important rule in language design is, in my opinion, "stick with your choices". C, C++ and Java all seem to follow this, and it's one of the reasons these languages are popular for big projects, I think. The way I see it, the only OK way to change existing behaviour is to have the concept of a language version, and force code to indicate the language version it expects. The important thing is that the language version is an attribute of code, not some global setting that you can change without ever looking at the code it'd affect. So if we really want to change this, I think we need to have a LANGUAGE_VERSION attribute on functions. Each time a major postgres release changes the behaviour of one of the procedural languages, we'd increment that language's version, and enable the old behaviour for all functions tagged with an older one. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore patch
On Mon, January 13, 2014 18:30, Andrew Dunstan wrote:

On 01/13/2014 11:16 AM, Oleg Bartunov wrote: Andrew, did you run the perl script? Actually, I found that the operator table needs to be fixed.

No. My build machine doesn't actually have DBD::Pg installed. Can you send me a patch if you don't want to push it yourself, or maybe Erik can send a patch to adjust the table.

[ nested_hstore_and_jsonb-2.patch ] (CentOS 6.5, gcc 4.8.2.) The patch applies and compiles with warnings (see below). The opr_sanity test fails during make check; regression.diffs attached. Also attached are changes to hstore.sgml: fixes to the operator and function tables, plus some typo fixes.

Thanks, Erik Rijkers

make:

jsonfuncs.c: In function 'each_object_field_end_jsonb':
jsonfuncs.c:1328:7: warning: assignment from incompatible pointer type [enabled by default]
    val = DatumGetPointer(DirectFunctionCall1(jsonb_in, CStringGetDatum(cstr)));

jsonfuncs.c: In function 'elements_array_element_end_jsonb':
jsonfuncs.c:1530:8: warning: assignment from incompatible pointer type [enabled by default]
    jbval = DatumGetPointer(DirectFunctionCall1(jsonb_in, CStringGetDatum(cstr)));

make contrib:

hstore_io.c: In function 'array_to_hstore':
hstore_io.c:1694:30: warning: 'result' may be used uninitialized in this function [-Wmaybe-uninitialized]
    PG_RETURN_POINTER(hstoreDump(result));

[Attachment: regression.diffs]

[Attachment: diff against doc/src/sgml/hstore.sgml. The recoverable changes correct the #- and replace() examples to use text[] path literals such as '{a,b,c,0}' and '{b,d}' in place of '[a,b,c,0]' and '[b,d]', and fix the replace() example output to a=>1, b=>{c=>3, d=>1}; the remaining hunks adjust other operator and function table entries.]
Re: [HACKERS] plpgsql.consistent_into
On 1/13/14, 5:57 PM, Josh Berkus wrote:

On 01/13/2014 03:41 PM, Florian Pflug wrote: It therefore isn't an oversight that SELECT ... INTO allows multiple result rows while INSERT/UPDATE/DELETE forbids them; it's been done that way on purpose and for a reason. We shouldn't be second-guessing ourselves by changing that later, at least not unless we have a *very* good reason for it. Which, AFAICS, we don't. (And yeah, personally I'd prefer it if we complained about multiple rows. But it's IMHO just too late for that.)

I *really* don't want to go through all my old code to find places where I used SELECT ... INTO just to pop off the first row and ignored the rest. I doubt anyone else does, either.

Do you regularly have use cases where you actually want just one RANDOM row? I suspect the far more likely scenario is that people write code assuming they'll get only one row, and they'll end up with extremely hard-to-trace bugs if that assumption is ever wrong.

--
Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
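To make the failure mode concrete, a small illustration, assuming a hypothetical orders(customer_id int, total numeric) table:

DO $$
DECLARE
    v_total numeric;
BEGIN
    -- Non-strict INTO: if customer 42 has three orders, this silently
    -- assigns whichever row the executor happens to produce first.
    SELECT total INTO v_total FROM orders WHERE customer_id = 42;
    RAISE NOTICE 'total = %', v_total;

    -- If "just one row" really is the intent, say so explicitly:
    SELECT total INTO v_total
      FROM orders
     WHERE customer_id = 42
     ORDER BY total DESC
     LIMIT 1;
END;
$$;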
Re: [HACKERS] plpgsql.consistent_into
On 1/13/14, 6:16 PM, Florian Pflug wrote:

On Jan14, 2014, at 00:52 , Marko Tiikkaja ma...@joh.to wrote: When I've worked with PL/PgSQL, this has been a source of a few bugs that would have been noticed during testing if the behaviour of INTO wasn't as dangerous as it is right now.

The question is, how many bugs stemmed from wrong SQL queries, and what percentage of those would have been caught by this? The way I see it, there are thousands of ways to screw up a query, and having it return multiple rows instead of one is just one of them.

A query that's simply wrong is more likely to fail consistently. Non-strict use of INTO is going to fail in very subtle ways (unless you actually DO want just the first row, in which case you should explicitly use LIMIT 1).

So if we really want to change this, I think we need a LANGUAGE_VERSION attribute on functions. Each time a major postgres release changes the behaviour of one of the procedural languages, we'd increment that language's version and enable the old behaviour for all functions tagged with an older one.

I like that idea. It allows us to fix past decisions that were ill-considered without hosing all existing code.

BTW, have we always had support for STRICT, or was it added at some point? It's in 8.4, but I don't know how far back it goes. And if we've always had it, why on earth didn't we make STRICT the default behavior?

--
Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
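For reference, a minimal sketch of what STRICT buys you, against the same hypothetical orders table as above:

DO $$
DECLARE
    v_total numeric;
BEGIN
    -- STRICT turns the silent pick-the-first-row behaviour into an error.
    SELECT total INTO STRICT v_total FROM orders WHERE customer_id = 42;
    RAISE NOTICE 'total = %', v_total;
EXCEPTION
    WHEN no_data_found THEN
        RAISE NOTICE 'customer 42 has no orders';
    WHEN too_many_rows THEN
        RAISE NOTICE 'customer 42 has more than one order';
END;
$$;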
Re: [HACKERS] Linux kernel impact on PostgreSQL performance
On Mon, Jan 13, 2014 at 7:36 PM, Mel Gorman mgor...@suse.de wrote: That could be something we look at. There are cases buried deep in the VM where pages get shuffled to the end of the LRU and get tagged for reclaim as soon as possible. Maybe you need access to something like that via posix_fadvise, to say "reclaim this page if you need memory, but leave it resident if there is no memory pressure", or something similar. Not exactly sure what that interface would look like, or offhand how it could be reliably implemented.

I don't see a reason not to make this behavior the default for POSIX_FADV_DONTNEED.
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On 1/13/14, 5:33 PM, Craig Ringer wrote: So I guess the question is: Is it worth all that hassle to remove a misfeature you have to go out of your way to use? Is support for non-1 lower bounds stopping us from doing something useful and important? Or is it just an irritation that it exists?

It's not an irritation to -hackers, but it is an irritation for anyone who cares about data quality, because you're forced to write all of your code to always look at array_lower(). Actually, now that I think about it, if you want to be really safe you would deliberately use a different lower bound yourself, so you're more likely to discover code that's broken. So it really is a big pain for users who know what's going on. And it will become a big pain for users who don't know, if they ever accidentally end up with arrays whose lower bound isn't 1. :)

--
Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
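A quick demonstration of the trap, using the array literal syntax that lets you spell out explicit bounds:

SELECT ('[0:2]={10,20,30}'::int[])[1];
-- returns 20: with a lower bound of 0, subscript 1 is the *second* element

SELECT arr[array_lower(arr, 1)] AS first_elem
  FROM (SELECT '[0:2]={10,20,30}'::int[] AS arr) s;
-- returns 10: robust code has to ask for the lower bound instead of assuming 1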