Re: [HACKERS] DDL Damage Assessment
Jim Nasby <jim.na...@bluetreble.com> writes:
> EXPLAIN ALTER TABLE
>
> I'm thinking it would be better to have something you could set at a
> session level, so you don't have to stick EXPLAIN in front of all your DDL.

Yeah, I'm coming into that camp too, and I think the Event Trigger idea gets us halfway there. Here's a detailed sketch of how it would work:

1. Preparatory step: install the Event Trigger.

       create extension norewrite;

2. Test run:

       psql -1 -f ddl.sql
       ERROR: Table Rewrite has been cancelled.

3. Well, actually we need to run that thing in production:

       BEGIN;
       ALTER EVENT TRIGGER norewrite DISABLE;
       \i ddl.sql
       ALTER EVENT TRIGGER norewrite ENABLE;
       COMMIT;

Then it's also possible to have another Event Trigger that would automatically issue a LOCK TABLE ... NOWAIT command before any DDL against a table is run, in another extension:

       create extension ddl_lock_nowait;

The same applies: if your production rollout is blocked repeatedly and you want to force it through at some point, it's possible to disable the event trigger within the DDL script/transaction.

As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a development environment, which is where you would simply deploy the real DDL using verbose mode and see what the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction is cancelled as soon as a Rewrite Event is fired once you have installed the protective trigger. It means that you won't see the next problem after the first one, so it's not a dry-run. But considering what you're saying here, it might well be enough.

Regards,
--
Dimitri Fontaine
06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
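[Editor's note] The `norewrite` extension sketched above is never spelled out in the thread. As a rough illustration of what its guts could look like, here is a sketch using the `table_rewrite` event that PostgreSQL later gained (in 9.5); the function and trigger names are illustrative, not the actual extension:

```sql
-- Minimal sketch of a "norewrite" guard (assumes PostgreSQL 9.5+,
-- where the table_rewrite event and its helper functions exist).
CREATE OR REPLACE FUNCTION norewrite_guard()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE EXCEPTION 'Table Rewrite has been cancelled: %',
        pg_event_trigger_table_rewrite_oid()::regclass;
END;
$$;

CREATE EVENT TRIGGER norewrite
    ON table_rewrite
    EXECUTE PROCEDURE norewrite_guard();
```

With this in place, a rewriting change such as `ALTER TABLE foo ALTER COLUMN c TYPE bigint` errors out, while non-rewriting DDL goes through untouched, which matches the test-run/production workflow described above.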
Re: [HACKERS] Promise index tuples for UPSERT
On 10/03/2014 11:07 AM, Simon Riggs wrote:
> On 1 October 2014 20:54, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>> On 10/01/2014 02:34 PM, Simon Riggs wrote:
>>> ... When later insert scans see the promise tuple they perform
>>> XactLockTableWait() and when they get control they look again for the
>>> key. If they find a promise tuple with an aborted xid they replace that
>>> value with their own xid and continue as a). Otherwise b).
>>
>> XactLockTableWait() waits until the end of transaction, and that's not
>> what you want here. If the backend that inserted the promise tuple
>> decides to not proceed with the insertion, and removes the promise
>> tuple, the backend waiting on it needs to be woken up more or less
>> immediately, not when the transaction completes.
>
> There is no "remove promise tuple" part in the above design.

I had this exact same issue in my earlier patch versions, and fixed it with a new kind of heavy-weight lock and a new field in PGPROC (http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com). That wasn't very pretty, but it got the job done. Some other design might be better.

> Currently, if some fool developer decides to insert rows that duplicate
> values already in the table, he inserts a heap row, then sees a potential
> conflict in the index and waits for the conflicting xact to end. His fool
> insert, his wait. That's how btree does this now. I don't see any reason
> why we need to do better for Upsert. If the row already exists we need to
> be able to quickly change into an update, but we still only support one
> write xact at a time.

That lowers the bar from what I thought everyone agreed on. Namely, if two backends run a similar UPSERT command concurrently on a table that has more than one unique constraint, they might deadlock, causing one of them to throw an error instead of INSERTing or UPDATEing anything. I'm sure that's useful enough in many applications, but I'd like to have a more robust implementation. The shorter we can keep the list of caveats, the better.
> The value in the index needs to be protected by a block level lock, so we
> can check it quickly, but the eventual heap work is serialized by
> transactional semantics. I think a little perspective is due here and we
> should stick to the main use case, not cater for bizarre edge cases.

I'm trying to bisect your thoughts on exactly what use cases you think we must support, and which ones you consider bizarre edge cases, and what exactly is acceptable behavior in those edge cases.

> What we are looking for is something that can decide whether it is an
> insert or an update and progress quickly with that. It needs to be
> correct, but there is no requirement to be faster than currently
> possible, in the case of concurrency.

I believe we all agree on that.

> Any form of tuple locking that uses the general lock manager will not be
> usable. I can't see it is worth the overhead of doing that to protect
> against deadlocks that would only be experienced by people doing foolish
> things.

Maybe, maybe not, but let's define the acceptable behavior first, and think about the implementation second. I'm pretty sure all of the approaches discussed so far can be made fast enough, and the bloat issues can be made small enough, that it doesn't matter much which one we choose from a performance point of view. The differences are in what use cases they can support, and the maintainability of the code.

- Heikki
Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
> On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
>> Hello,
>> I recently ran into trouble in development of my extension, which
>> utilizes shared buffers, when it releases each buffer page. This
>> extension transfers contents of the shared buffers to a GPU device
>> using the DMA feature, then kicks a device kernel code.
>
> Wow, that sounds crazy.
>
>> Once a backend/extension calls ReadBuffer(), resowner.c tracks which
>> buffer was referenced by the current resource owner, to ensure these
>> buffers are released at end of the transaction. However, it seems to me
>> the implementation of resowner.c didn't assume many buffers are
>> referenced by a particular resource owner simultaneously. It manages
>> the buffer index using an expandable array, then looks up the target
>> buffer by a sequential walk, but from the tail, because a recently
>> pinned buffer tends to be released first. This made trouble in my case.
>> My extension pinned multiple thousands of buffers, so owner->buffers[]
>> was enlarged and became expensive to walk. In my measurement,
>> ResourceOwnerForgetBuffer() takes 36 seconds in total during
>> hash-joining 2M rows, even though hash-joining itself takes less than
>> 16 seconds. What is the best way to solve the problem?
>
> How about creating a separate ResourceOwner for these buffer pins, and
> doing a wholesale ResourceOwnerRelease() on it when you're done?

Let me clarify your idea.

1. Create a separate ResourceOwner under the CurrentResourceOwner.
2. Switch CurrentResourceOwner to the self-constructed resource owner
   during ReadBuffer() on thousands of buffers.
3. Switch back to the original CurrentResourceOwner.
4. Kick DMA and run the GPU device kernel (actual job running...)
5. Switch CurrentResourceOwner to the self-constructed resource owner
   again, during ReleaseBuffer() in reverse order.
6. Switch back to the original CurrentResourceOwner, then call
   ResourceOwnerDelete().

Hmm... at this moment, I cannot find anything here that would be hard to implement. I'd like to try this idea, then report back later.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>
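[Editor's note] The six steps above can be sketched in backend C roughly as follows. This is only a sketch against the standard resowner.c API (ResourceOwnerCreate(), ResourceOwnerDelete(), the CurrentResourceOwner global); the chunk structure and its buffer arrays are hypothetical names for illustration:

```c
/* Sketch of per-chunk resource owner usage; "chunk" fields are hypothetical. */
ResourceOwner chunk_owner;
ResourceOwner saved_owner;
int           i;

/* 1. Create a separate owner under the current one. */
chunk_owner = ResourceOwnerCreate(CurrentResourceOwner, "chunk buffers");

/* 2. Pin the chunk's buffers while the private owner is current. */
saved_owner = CurrentResourceOwner;
CurrentResourceOwner = chunk_owner;
for (i = 0; i < chunk->nbuffers; i++)
    chunk->buffers[i] = ReadBuffer(rel, chunk->blocknos[i]);
/* 3. Switch back. */
CurrentResourceOwner = saved_owner;

/* 4. Kick DMA, run the GPU device kernel ... */

/* 5. Unpin in reverse pin order under the private owner. */
CurrentResourceOwner = chunk_owner;
for (i = chunk->nbuffers; --i >= 0;)
    ReleaseBuffer(chunk->buffers[i]);
/* 6. Switch back and destroy the now-empty owner. */
CurrentResourceOwner = saved_owner;
ResourceOwnerDelete(chunk_owner);
```

Releasing in reverse pin order matters: each ResourceOwnerForgetBuffer() then finds its entry at the tail of the owner's array, so every forget is O(1) instead of a long backward scan.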
Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale
> On 2014-10-03 10:35:42 +0300, Heikki Linnakangas wrote:
>> On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
>>> Hello,
>>> I recently ran into trouble in development of my extension, which
>>> utilizes shared buffers, when it releases each buffer page. This
>>> extension transfers contents of the shared buffers to a GPU device
>>> using the DMA feature, then kicks a device kernel code.
>>
>> Wow, that sounds crazy.
>
> Agreed. I doubt that pinning that many buffers is a sane thing to do. At
> the very least you'll heavily interfere with vacuum and such.

My assumption is that this extension is used to handle OLAP-type workloads, thus with a relatively small amount of write traffic to the database. Sorry, I forgot to mention that.

>>> Once a backend/extension calls ReadBuffer(), resowner.c tracks which
>>> buffer was referenced by the current resource owner, to ensure these
>>> buffers are released at end of the transaction. However, it seems to me
>>> the implementation of resowner.c didn't assume many buffers are
>>> referenced by a particular resource owner simultaneously. It manages
>>> the buffer index using an expandable array, then looks up the target
>>> buffer by a sequential walk, but from the tail, because a recently
>>> pinned buffer tends to be released first. This made trouble in my case.
>>> My extension pinned multiple thousands of buffers, so owner->buffers[]
>>> was enlarged and became expensive to walk. In my measurement,
>>> ResourceOwnerForgetBuffer() takes 36 seconds in total during
>>> hash-joining 2M rows, even though hash-joining itself takes less than
>>> 16 seconds. What is the best way to solve the problem?
>>
>> How about creating a separate ResourceOwner for these buffer pins, and
>> doing a wholesale ResourceOwnerRelease() on it when you're done?
>
> Or even just unpinning them in reverse order? That should already fix
> the performance issues?

In the case when multiple chunks (note: a chunk contains thousands of buffers as a unit of device kernel execution) are running asynchronously, the order of GPU job completion is not predictable.
So it does not help my situation if one resource owner tracks all the buffers. Probably Heikki suggested creating a separate resource owner per chunk. In that case, all the buffers in a particular chunk shall be released at the same time, so ReleaseBuffer() in reverse order makes sense.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>
Re: [HACKERS] Promise index tuples for UPSERT
On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> That lowers the bar from what I thought everyone agreed on. Namely, if two
> backends run a similar UPSERT command concurrently on a table that has
> more than one unique constraint, they might deadlock, causing one of them
> to throw an error instead of INSERTing or UPDATEing anything.

It lowers the bar to a level that I am not willing to limbo dance under. You don't even need two unique constraints. Nothing as complicated as that is required. When this happens with MySQL, they have the good sense to call it a bug [1], and even fix it. I find the comparison with conventional insertion entirely unconvincing.

> I'm sure that's useful enough in many applications, but I'd like to have
> a more robust implementation. The shorter we can keep the list of
> caveats, the better.

INSERT and UPDATE are supposed to be fairly well balanced here. Conflicts are the norm.

>> The value in the index needs to be protected by a block level lock, so
>> we can check it quickly, but the eventual heap work is serialized by
>> transactional semantics. I think a little perspective is due here and we
>> should stick to the main use case, not cater for bizarre edge cases.
>
> I'm trying to bisect your thoughts on exactly what use cases you think we
> must support, and which ones you consider bizarre edge cases, and what
> exactly is acceptable behavior in those edge cases.

Lots of concurrency is not an edge case.

>> Any form of tuple locking that uses the general lock manager will not be
>> usable. I can't see it is worth the overhead of doing that to protect
>> against deadlocks that would only be experienced by people doing foolish
>> things.
>
> Maybe, maybe not, but let's define the acceptable behavior first, and
> think about the implementation second.

+1. Updating a lot with UPSERT is not foolish. That's all it took to make earlier prototypes deadlock.
> I'm pretty sure all of the approaches discussed so far can be made fast
> enough, and the bloat issues can be made small enough, that it doesn't
> matter much which one we choose from a performance point of view. The
> differences are in what use cases they can support, and the
> maintainability of the code.

+1

What do we get for giving up on not having unprincipled deadlocks here? What's the advantage? Assuming that this is a bizarre edge case (note: it isn't), what do we get in return for giving up on fixing it?

[1] MySQL bug #52020
--
Peter Geoghegan
Re: [HACKERS] Join push-down support for foreign tables
2014-09-08 8:07 GMT+09:00 Shigeru HANADA <shigeru.han...@gmail.com>:
> (2014/09/04 21:37), Robert Haas wrote:
>> On Wed, Sep 3, 2014 at 5:16 AM,
>> Probably both the initial cost and final cost calculations should be
>> delegated to the FDW, but maybe within postgres_fdw, the initial cost
>> should do only the work that can be done without contacting the remote
>> server; then, let the final cost step do that if appropriate. But I'm
>> not entirely sure what is best here.
>
> Agreed. I'll design the planner API along that way for now.

I tried some patterns of implementation, but I haven't found a feasible way yet. So I'd like to hear hackers' ideas.

* Foreign join hook point

First I thought that following the existing cost-estimating manner was the way to go, but I tend to think it doesn't fit foreign joins, because a join method is tightly coupled to sort-ness, but a foreign join is not. In the current planner, add_paths_to_joinrel is conscious of sort-ness, and the functions directly called from it are conscious of the join method. But this seems not to fit the abstraction level of FDWs. FDWs are highly abstracted, unlike custom plan providers, so most of the work should be delegated to the FDW, including pathkeys consideration, IMO.

Besides that, the order of join consideration is another issue. First I tried to add foreign join consideration at the end (after hash join consideration), but after some thought I noticed that early elimination would work better if we try foreign joins first, because in most cases a foreign join is the cheapest way to accomplish a join between two foreign relations. So currently I'm thinking that delegating the whole join consideration to FDWs, before the other join considerations in add_paths_to_joinrel, by calling a new FDW API, would be promising. This means that FDWs can add multiple arbitrary paths to the RelOptInfo in one call. Of course this allows FDWs to do a round-trip per path, but that is an optimization issue, and they can compare their own candidates obtainable without a round-trip.
* Supported join types

INNER and (LEFT|RIGHT|FULL) OUTER joins would be safe to push down, even though some OUTER JOINs might not be much faster than a local join. I'm not sure that SEMI and ANTI joins are safe to push down. Can we leave the matter to FDWs, or should we forbid FDWs from pushing them down by not calling the foreign join API? Anyway, SEMI/ANTI would not be supported in the first version.

* Blockers of join push-down

Pushing down a join means that the foreign scans for inner and outer are skipped, so some elements block push-down. Basically the criteria are the same as for scan push-down and update push-down. After some thought, we should check only for unsafe expressions in the join qual and restrict qual. This limitation is necessary to avoid differences between the results of pushed-down and local execution. The target list seems to contain only Vars for necessary columns, but we should check that too.

* WIP patch

Attached is a WIP patch for reviewing the design. Work still to be done: 1) judging push-down or not, and 2) generating the join SQL. For 2), I'm thinking about referring to Postgres-XC's join shipping mechanism.

Any comments or questions are welcome.
--
Shigeru HANADA

Attachment: join_pushdown_wip.patch (binary data)
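[Editor's note] The user-visible goal of this thread can be illustrated with postgres_fdw roughly as join push-down eventually behaved once committed (in 9.6). Table names and the exact EXPLAIN output here are illustrative, not taken from the patch under review:

```sql
-- Two foreign tables on the same remote server (names made up):
--   ft_orders(id int, customer_id int, ...)
--   ft_customers(id int, name text, ...)
EXPLAIN (VERBOSE, COSTS OFF)
SELECT o.id, c.name
FROM ft_orders o JOIN ft_customers c ON o.customer_id = c.id;

-- Hoped-for plan shape (approximate):
--  Foreign Scan
--    Output: o.id, c.name
--    Relations: (public.ft_orders o) INNER JOIN (public.ft_customers c)
--    Remote SQL: SELECT r1.id, r2.name
--                FROM (orders r1 INNER JOIN customers r2
--                      ON ((r1.customer_id = r2.id)))
```

Instead of fetching both tables and joining locally, the whole join is shipped to the remote server as a single query, which is why a foreign join is usually the cheapest path for two foreign relations on the same server.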
Re: [HACKERS] Promise index tuples for UPSERT
On 3 October 2014 10:03, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> That lowers the bar from what I thought everyone agreed on. Namely, if two
> backends run a similar UPSERT command concurrently on a table that has
> more than one unique constraint, they might deadlock, causing one of them
> to throw an error instead of INSERTing or UPDATEing anything.

Now we get to a productive discussion, this is good.

When we first make requirements, obviously everyone agrees to a long list of things, since there is initially not much reason to say no to any of it. As we go towards implementation we begin to understand the true price of meeting each requirement. It was good that this detail was raised, and sensible to attempt to avoid unprincipled deadlocks. If the price of avoiding them is high, it is worth reconsidering how important that is.

My view is that I can't see the above use case happening in real situations, except by infrequent mistake. In most cases, unique indexes represent some form of object identity, and those don't change frequently in the real world. So I'd be surprised by someone changing two unique fields at the same time without that representing some form of business process error that people would like to see fail anyway. If someone has an example of that in a real, common case then I would like to see it, and I would revise my view accordingly.

We are frequently hampered by trying to design something that can sing and dance at the same time. That thought is exactly how we came to be looking at upsert now, not merge. So trimming our objectives to what makes sense is an accepted part of this project already.

>> Any form of tuple locking that uses the general lock manager will not be
>> usable. I can't see it is worth the overhead of doing that to protect
>> against deadlocks that would only be experienced by people doing foolish
>> things.
>
> Maybe, maybe not, but let's define the acceptable behavior first, and
> think about the implementation second.
Hand in hand, I think, given the other constraints of time, review, maintainability etc.

> I'm pretty sure all of the approaches discussed so far can be made fast
> enough, and the bloat issues can be made small enough, that it doesn't
> matter much which one we choose from a performance point of view. The
> differences are in what use cases they can support, and the
> maintainability of the code.

The discussion of approaches has up to now focused only on what impossibilities exist, with a "we must do this because feature A can't do aspect X" flavor. I haven't yet seen much discussion of maintainability of the code, but I would guess simpler is better, overall.

Realistically, I won't be coding any of the separate approaches, so this is down to Peter and maybe yourself, Heikki. I hope only to avoid foreclosing viable and simple approaches for the wrong reasons. There are many other considerations that make up the final view.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Promise index tuples for UPSERT
On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> My view is that I can't see the above use case from happening in real
> situations, except by infrequent mistake. In most cases, unique indexes
> represent some form of object identity and those don't change frequently
> in the real world. So to be changing two unique fields at the same time
> and it not representing some form of business process error that people
> would like to see fail anyway, I'd be surprised by.

Are we talking about two different things here? Unprincipled deadlocks can be seen without updating any constrained column in the UPSERT. The test case that originally highlighted the issue only had one unique index, and it was *not* in the update's targetlist. See:

https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking
--
Peter Geoghegan
Re: [HACKERS] Promise index tuples for UPSERT
On 3 October 2014 10:32, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Oct 3, 2014 at 2:03 AM, Heikki Linnakangas
> <hlinnakan...@vmware.com> wrote:
>> That lowers the bar from what I thought everyone agreed on. Namely, if
>> two backends run a similar UPSERT command concurrently on a table that
>> has more than one unique constraint, they might deadlock, causing one of
>> them to throw an error instead of INSERTing or UPDATEing anything.
>
> It lowers the bar to a level that I am not willing to limbo dance under.
> You don't even need two unique constraints. Nothing as complicated as
> that is required. When this happens with MySQL, they have the good sense
> to call it a bug [1], and even fix it. I find the comparison with
> conventional insertion entirely unconvincing.

Is there a test case that demonstrates the problem?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Promise index tuples for UPSERT
On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Is there a test case that demonstrates the problem?

Yes. See my e-mail to Heikki here:

http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com

Testcase is attached.
--
Peter Geoghegan
Re: [HACKERS] Promise index tuples for UPSERT
On 3 October 2014 10:57, Peter Geoghegan <p...@heroku.com> wrote:
> On Fri, Oct 3, 2014 at 2:50 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> My view is that I can't see the above use case from happening in real
>> situations, except by infrequent mistake. In most cases, unique indexes
>> represent some form of object identity and those don't change frequently
>> in the real world. So to be changing two unique fields at the same time
>> and it not representing some form of business process error that people
>> would like to see fail anyway, I'd be surprised by.
>
> Are we talking about two different things here? Unprincipled deadlocks
> can be seen without updating any constrained column in the UPSERT. The
> test-case that originally highlighted the issue only had one unique
> index, and it was *not* in the update's targetlist. See:
>
> https://wiki.postgresql.org/wiki/Value_locking#.22Unprincipled_Deadlocking.22_and_value_locking

I followed that to a wiki page, then clicked again to an old email. No test case.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Promise index tuples for UPSERT
On 10/03/2014 01:05 PM, Peter Geoghegan wrote:
> On Fri, Oct 3, 2014 at 3:04 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Is there a test case that demonstrates the problem?
>
> Yes. See my e-mail to Heikki here:
>
> http://www.postgresql.org/message-id/cam3swzshbe29kpod44cvc3vpzjgmder6k_6fghiszeozgmt...@mail.gmail.com
>
> Testcase is attached.

Simon's approach would actually pass that test case just fine. It inserts the (promise) index tuple first, and the heap tuple only after that. It will fail a test case with more than one unique index, however.

- Heikki
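[Editor's note] The "more than one unique index" failure mode can be sketched as follows. The schema is made up for illustration, and the statements are shown with the ON CONFLICT syntax that eventually shipped in 9.5 (the 2014 prototypes used different spellings, but the lock interleaving is the same):

```sql
CREATE TABLE upsert_race (
    a int UNIQUE,
    b int UNIQUE,
    v text
);
INSERT INTO upsert_race VALUES (1, 10, 'r1'), (2, 20, 'r2');

-- Session 1 upserts (1, 20): conflicts with r1 on column a
-- and with r2 on column b.
-- Session 2 upserts (2, 10): conflicts with r2 on column a
-- and with r1 on column b.
--
-- If each session value-locks its a-conflict first and then waits on
-- its b-conflict, session 1 holds r1 while waiting for r2, and
-- session 2 holds r2 while waiting for r1: a classic ABBA deadlock,
-- resolved only by the deadlock detector aborting one session.

-- Session 1:
INSERT INTO upsert_race VALUES (1, 20, 'x')
    ON CONFLICT (a) DO UPDATE SET v = 'x';
-- Session 2 (concurrently):
INSERT INTO upsert_race VALUES (2, 10, 'y')
    ON CONFLICT (a) DO UPDATE SET v = 'y';
```

A robust implementation has to release or order its value locks so that this interleaving serializes instead of deadlocking; that is the bar being discussed in this thread.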
Re: [HACKERS] GiST splitting on empty pages
On 10/03/2014 05:03 AM, Andrew Gierth wrote:
> This is from Bug #11555, which is still in moderation as I type this
> (analysis was done via IRC). The GiST insertion code appears to have no
> length checks at all on the inserted entry. index_form_tuple checks for
> length <= 8191 with the default blocksize, but obviously a tuple less
> than 8191 bytes may still not fit on the page due to page header info
> etc. However, the gist insertion code assumes that if the tuple doesn't
> fit, then it must have to split the page, with no check whether the page
> is already empty.
>
> [script to reproduce]

Thanks for the analysis!

> Suggested fixes (probably all of these are appropriate):
>
> 1. gistSplit should have check_stack_depth()
>
> 2. gistSplit should probably refuse to split anything if called with
> only one item (which is the value being inserted).
>
> 3. somewhere before reaching gistSplit it might make sense to check
> explicitly (e.g. in gistFormTuple) whether the tuple will actually fit
> on a page.
>
> 4. pg_trgm probably should do something more sensible with large leaf
> items, but this is a peripheral issue since ultimately the gist core
> must enforce these limits rather than rely on the opclass.

Fixed; I did 1 and 2. Number 3 would make a lot of sense, but I couldn't totally convince myself that we can safely put the check in gistFormTuple() without causing some cases to fail that currently work. I think the code sometimes forms tuples that are never added to the index as such, used only to union them with existing tuples on internal pages. Or maybe not, but in any case, the check in gistSplit() is enough to stop the crash.

- Heikki
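[Editor's note] A rough sketch of what fixes 1 and 2 look like at the top of gistSplit(); this paraphrases the shape of the actual fix, so treat the exact error wording and size expressions as illustrative:

```c
/* At the top of gistSplit() in src/backend/access/gist/gistsplit.c (sketch): */

/* gistSplit() is recursive, so guard against runaway recursion */
check_stack_depth();

/*
 * If a single index tuple doesn't fit on a page, splitting cannot
 * help: the entry is simply too large for this index.  Error out
 * instead of recursing forever trying to split an empty page.
 */
if (len == 1)
    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("index row size %zu exceeds maximum %zu for index \"%s\"",
                    IndexTupleSize(itup[0]),
                    GiSTPageSize,
                    RelationGetRelationName(r))));
```

This matches the division of labor Andrew describes: the opclass (pg_trgm here) may still produce oversized leaf items, but the GiST core now rejects them cleanly instead of crashing.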
Re: [HACKERS] TAP test breakage on MacOS X
On Thu, Oct 2, 2014 at 10:08 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> make check-world dies ingloriously for me, like this:
>
> FWIW, it works fine for me on my Mac laptop, using the Perl 5.16.2 that
> comes standard with OSX 10.9.5. I did have to install IPC::Run from CPAN
> though.
>
>> # Failed test 'initdb options handling'
>> # at /opt/local/lib/perl5/5.12.5/Test/Builder.pm line 229.
>
> This output seems to be pretty clear proof that you're not using Apple's
> Perl. What is it exactly (where did you get it from)? Also, noticing
> that what you're using is evidently Perl 5.12, I'm wondering whether our
> TAP test scripts require a fairly new Perl version. I recall some of my
> Salesforce colleagues griping that the TAP scripts didn't work with
> older Perls.

I installed it via MacPorts.

    [rhaas pgsql]$ which perl
    /opt/local/bin/perl
    [rhaas pgsql]$ perl -v

    This is perl 5, version 12, subversion 5 (v5.12.5) built for
    darwin-thread-multi-2level

    Copyright 1987-2012, Larry Wall

    Perl may be copied only under the terms of either the Artistic
    License or the GNU General Public License, which may be found in
    the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be
    found on this system using "man perl" or "perldoc perl". If you
    have access to the Internet, point your browser at
    http://www.perl.org/, the Perl Home Page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-03 10:30:19 +0900, Michael Paquier wrote:
> On Thu, Oct 2, 2014 at 12:44 AM, Andres Freund <and...@2ndquadrant.com> wrote:
>> I pushed the first part.
>
> Thanks. Attached is a rebased version of patch 2, implementing the actual
> feature. One thing I noticed with more testing is that if --create is
> used and the destination folder does not exist, pg_receivexlog was
> creating the slot, then left with an error. This does not look
> user-friendly, so I changed the logic a bit to check for the destination
> folder before creating any slot. This results in a bit of refactoring,
> but this way the behavior is more intuitive.

Ok.

> + <para>
> +  <application>pg_receivexlog</application> can run in one of two
> +  following modes, which control physical replication slot:

I don't think that's good enough. There's also the important mode where it's not doing --create/--drop at all.

> + /*
> +  * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
> +  * position.
> +  */
> + if (!RunIdentifySystem(conn, NULL, NULL, NULL, &db_name))
> +     disconnect_and_exit(1);
> +
> + /*
> +  * Check that there is a database associated with connection, none
> +  * should be defined in this context.
> +  */
> + if (db_name)
> + {
> +     fprintf(stderr,
> +             _("%s: database defined for replication connection \"%s\"\n"),
> +             progname, replication_slot);
> +     disconnect_and_exit(1);
> + }

I don't like 'defined' here. 'replication connection unexpectedly is database specific' or something would be better.

I do wonder whether --create/--drop aren't somewhat weird for pg_receivexlog. It's not that clear what they mean. It'd be ugly, but we could rename them --create-slot/--drop-slot.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-02 20:08:33 -0400, Greg Smith wrote:
> I did a fair dive into double-checking the decision to just leave
> xloginsert_locks fixed at 8 for 9.4. My conclusion: good call, move
> along. Further improvements beyond what the 8-way split gives sure are
> possible. But my guess from chasing them a little is that additional
> places will pop up as things that must also be tweaked, before you'll
> see those gains turn significant.

Thanks for doing this. I'd like to see that box re-opened at some point. But if we do that, I'm comfortable that it could end with an xloginsert_locks that tunes itself reasonably on large servers, similar to wal_buffers.

> There's nothing about this that makes me feel like it needs a GUC. I
> barely needed an exposed knob to do this evaluation.
>
> = Baseline =
>
> I rolled back a few commits to just before the GUC was removed and
> tested against that point in git time. Starting with the 4-client test
> case Heikki provided, the fastest runs on my 24-core server looked like
> this:
>
>     tps = 56.691855 (including connections establishing)
>
> Repeat runs do need to drop the table and rebuild, because eventually
> AV (autovacuum) kicks in on things in a big way, and then your test is
> toast until it's done. Attached is what I settled on for a test
> harness. Nothing here was so subtle I felt a more complicated harness
> was needed. Standard practice for me is to give pgbench more workers
> when worrying about any scalability tests.
> That gives a tiny improvement, to where this is typical with 4 clients
> and 4 workers:
>
>     tps = 60.942537 (including connections establishing)
>
> Increasing to 24 clients plus 24 workers gives roughly the same numbers,
> suggesting that the bottleneck here is certainly not the client count,
> and that the suggestion of 4 was high enough:
>
>     tps = 56.731581 (including connections establishing)
>
> Decreasing xloginsert_locks to 1, so back to the original problem, the
> rate normally looks like this instead:
>
>     tps = 25.384708 (including connections establishing)
>
> So the big return you get just fine with the default tuning; great. I'm
> happy to see it ship like this as good enough for 9.4.
>
> = More locks =
>
> For the next phase, I stuck to 24 clients and 24 workers. If I then bump
> up xloginsert_locks to something much larger, there is an additional
> small gain to be had. With 24 locks, so basically every client has their
> own, instead of 57-60 TPS, I managed to get as high as this:
>
>     tps = 66.790968 (including connections establishing)
>
> However, the minute I get into this territory, there's an obvious
> bottleneck shift going on in there too. The rate of creating new
> checkpoint segments becomes troublesome, as one example, with messages
> like this:
>
>     LOG: checkpoints are occurring too frequently (1 second apart)
>     HINT: Consider increasing the configuration parameter
>     "checkpoint_segments".
>
> When 9.4 is already giving a more than 100% gain on this targeted test
> case, I can't see that chasing after maybe an extra 10% is worth having
> yet another GUC around. Especially when it will probably take multiple
> tuning steps before you're done anyway; we don't really know the rest of
> them yet; and when we do, we probably won't need a GUC to cope with them
> in the end anyway.

I've modified the test slightly, by having the different backends insert into different relations. Even on my measly 5-year-old workstation I *do* see quite a bit more than 10%.
psql -f /tmp/prepare.sql
pgbench -n -f /tmp/fooinsert.sql -c 64 -j 64 -T 10

on a 2x E5520 server (2 sockets, 4 cores each, 2 threads per core) with the following configuration:

-c shared_buffers=2GB
-c wal_level=hot_standby
-c full_page_writes=off
-c checkpoint_segments=400
-c fsync=off (io system here is abysmally bad)
-c synchronous_commit=off

#define NUM_XLOGINSERT_LOCKS 1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 24
tps = 363.242111 (including connections establishing)

I'd not be surprised at all if you'd see bigger influence on a system with 4 sockets.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

fooinsert.sql:
insert into foo_:client_id select g from generate_series(1, 1) g;

prepare.sql:
CREATE OR REPLACE FUNCTION exec(text) returns text language plpgsql volatile AS $f$ BEGIN EXECUTE $1; RETURN $1; END; $f$;
\o /dev/null
SELECT exec('drop table if exists foo_'||g.i||'; create table foo_'||g.i||'(id int4);') FROM generate_series(1, 64) g(i);
\o
CHECKPOINT;

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DDL Damage Assessment
On Thu, Oct 2, 2014 at 5:22 PM, Stephen Frost sfr...@snowman.net wrote:
* Andres Freund (and...@2ndquadrant.com) wrote:
On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
That sounds extremely complex. You'd have to implement the fake columns, foreign keys, indexes, etc on most execution nodes, the planner, and even system views.

Eh? We have MVCC catalog access.

So you want to modify the catalog without actually doing the corresponding actions? That'll be heck of invasive. With changes all over the backend. We'll need to remove error checks (like for the existence of relfilenodes), remove rewriting, and such.

Yeah, I was getting at it being rather invasive earlier. It really depends on exactly what we'd support in this mode, which would depend on just what would be invasive and what wouldn't, I expect. I dislike the idea of not being able to actually run a real migration script though, as anything else opens the very real possibility that the real script and the 'explain' script don't do the same thing, making this capability not nearly as useful..

I think this is the real issue. Somebody's got to come up with an infrastructure for reporting locks to be taken without actually performing the corresponding actions, and it's got to go through basically the same code paths as it would if we really did it. That's going to be complicated. But if we make the EXPLAIN code completely separate from the real code, it will be easy for them to get out of sync with each other, and then the answers will be wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] DDL Damage Assessment
* Jim Nasby (jim.na...@bluetreble.com) wrote:
I'm thinking it would be better to have something you could set at a session level, so you don't have to stick EXPLAIN in front of all your DDL.

Right, I'm agreed there.

As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a development environment, which is where you would simply deploy the real DDL using verbose mode and see what the underlying commands actually do.

That's certainly an interesting point and perhaps what we'd do is, instead, have a "collect info on locks needed" mode - but otherwise, let everything run as-is. You could then take the report at the end of the transaction and use it to identify what would be needed in production and maybe even have a script created which grabs all the locks using 'nowait' or fails the whole thing if it isn't possible.. Of course, we kind of have that already... Just look at the locks you've acquired at the end of the transaction..

Thanks,

Stephen
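Stephen's "just look at the locks you've acquired at the end of the transaction" works today with the pg_locks view. A minimal sketch, assuming a hypothetical migration script ddl.sql:

```sql
BEGIN;
-- run the migration script under test (hypothetical file name)
\i ddl.sql
-- report every lock this transaction currently holds, before committing
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE pid = pg_backend_pid()
ORDER BY relation::text;
COMMIT;  -- or ROLLBACK; to treat the run purely as a rehearsal
```

On a development copy of the database this gives the lock report without any new server infrastructure, which is essentially the "we kind of have that already" observation above.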
Re: [HACKERS] WAL format and API changes (9.5)
On 2014-10-03 15:51:37 +0300, Heikki Linnakangas wrote:
After a lot of experimentation, I figured out that the slowdown is already apparent with the *first* patch, the one that just refactors existing XLogInsert, without any changes to the WAL format or to the callers. I fiddled with that for a long time, trying to tease apart the change that makes the difference, and was finally able to narrow it down. Attached are two patches. These are both just refactoring, with no changes to the WAL format or APIs. The first is the same as the refactoring patch I posted earlier, with only minor changes to #includes, comments and such (per Alvaro's and Michael's suggestions - thanks!). With the first patch, the test case I've been using to performance test this becomes somewhere between 5% - 10% slower. Applying the second patch on top of that restores the performance back to what you get without these patches. Strange. The second patch moves the CRC calculation from a separate loop over the XLogRecDatas to the earlier loop, that also iterates through the XLogRecDatas. The strange thing about this is that when I tried to make the same change to current git master, without applying the first patch, it didn't make any difference. The CRC calculation used to be integrated into the earlier loop in 9.1 and before, but in 9.2 it was moved to a separate loop for simplicity, because it didn't make any difference to performance. So I now have a refactoring patch ready that I'd like to commit (the attached two patches together), but to be honest, I have no idea why the second patch is so essential to performance.

Maybe a stupid question, but did you verify it's not caused by splitting xloginsert over two translation units and/or not inlining the separate function?
Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WAL format and API changes (9.5)
On 10/03/2014 04:10 PM, Andres Freund wrote:
On 2014-10-03 15:51:37 +0300, Heikki Linnakangas wrote:
After a lot of experimentation, I figured out that the slowdown is already apparent with the *first* patch, the one that just refactors existing XLogInsert, without any changes to the WAL format or to the callers. I fiddled with that for a long time, trying to tease apart the change that makes the difference, and was finally able to narrow it down. Attached are two patches. These are both just refactoring, with no changes to the WAL format or APIs. The first is the same as the refactoring patch I posted earlier, with only minor changes to #includes, comments and such (per Alvaro's and Michael's suggestions - thanks!). With the first patch, the test case I've been using to performance test this becomes somewhere between 5% - 10% slower. Applying the second patch on top of that restores the performance back to what you get without these patches. Strange. The second patch moves the CRC calculation from a separate loop over the XLogRecDatas to the earlier loop, that also iterates through the XLogRecDatas. The strange thing about this is that when I tried to make the same change to current git master, without applying the first patch, it didn't make any difference. The CRC calculation used to be integrated into the earlier loop in 9.1 and before, but in 9.2 it was moved to a separate loop for simplicity, because it didn't make any difference to performance. So I now have a refactoring patch ready that I'd like to commit (the attached two patches together), but to be honest, I have no idea why the second patch is so essential to performance.

Maybe a stupid question, but did you verify it's not caused by splitting xloginsert over two translation units and/or not inlining the separate function?

Hmph, now that I try this again, I don't see any difference with or without the second patch - they both perform just as well as unpatched master.
So I'm totally baffled again, but I guess the patch is good to go. I need a more stable test environment..

Yeah, I did try merging xlog.c and xloginsert.c into the same .c file at some point, and marking the XLogInsertRecData function as static, allowing the compiler to inline it, but I didn't see any difference. But you have to take that with a grain of salt - I'm apparently not capable of constructing a properly repeatable test case for this.

- Heikki
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 10/3/14, 8:26 AM, Andres Freund wrote:
#define NUM_XLOGINSERT_LOCKS 1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 24
tps = 363.242111 (including connections establishing)

Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed like people would likely need to tune all the usual things like checkpoint_segments, shared_buffers, etc. as well before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess.

I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too.

--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-03 10:07:39 -0400, Gregory Smith wrote:
On 10/3/14, 8:26 AM, Andres Freund wrote:
#define NUM_XLOGINSERT_LOCKS 1
tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 8
tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 16
tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 24
tps = 363.242111 (including connections establishing)

Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed like people would likely need to tune all the usual things like checkpoint_segments, shared_buffers, etc. as well before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess. I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too.

So 25% performance improvements on a relatively small machine aren't worth a GUC? Ones that are likely to be larger on a bigger machine? I utterly fail to see why that's a service to our users.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Escaping from blocked send() reprised.
On 09/28/2014 01:54 AM, Andres Freund wrote:
I've invested some more time in this:

Thanks!

In 0001, the select() codepath will not return (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was requested as a wake-event, and likewise for writeable, while the poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of the requested wake-events. I'm not sure which is actually better - a separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it is.

0002 now makes sense on its own and doesn't change anything around the interrupt handling. Oh, and it compiles without 0003.

WaitLatchOrSocket() can throw an error, so it's not totally safe to call that underneath OpenSSL. Admittedly the cases where it throws an error are "shouldn't happen" cases like "poll() failed" or "read() on self-pipe failed", but still. Perhaps those errors should be reclassified as FATAL; it's not clear you can just roll back and expect to continue running if any of them happens.

In secure_raw_write(), you need to save and restore errno, as WaitLatchOrSocket will not preserve it. If secure_raw_write() calls WaitLatchOrSocket(), and it returns because the latch was set, and we fall out of secure_raw_write, we will return -1 but the errno might not be set to anything sensible anymore.

0003 Sinval/notify processing got simplified further. There really isn't any need for DisableNotifyInterrupt/DisableCatchupInterrupt anymore. Also begin_client_read/client_read_ended don't make much sense anymore. Instead introduce ProcessClientReadInterrupt (which wants a better name). There's also a very WIP 0004: Allows secure_read/write to be interrupted when ProcDiePending is set. All of that happens via the latch mechanism, nothing happens inside signal handlers. So I do think it's quite an improvement over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the likelihood of clients not getting the error message if the client isn't actually dead. The likelihood of write() being blocked *temporarily* due to normal bandwidth constraints is quite high when you consider COPY FROM and similar. Right now we'll wait till we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is basically sound. I'm much, much less sure about allowing send() to be interrupted.

Yeah, 1-3 seem sane. 4 also looks OK to me at a quick glance. It basically enables handling the die interrupt immediately, if we're blocked in a read or write. It won't be handled in the signal handler, but within the secure_read/write call anyway.

- Heikki
Re: [HACKERS] Escaping from blocked send() reprised.
Hi,

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
On 09/28/2014 01:54 AM, Andres Freund wrote:
I've invested some more time in this:

Thanks!

In 0001, the select() codepath will not return (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was requested as a wake-event, and likewise for writeable, while the poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of the requested wake-events. I'm not sure which is actually better - a separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it is.

Hm. Right. I think we should only report the requested state. We can't really discern whether it's a hangup, error or actually readable/writable with select() - it just returns the socket as readable/writable as soon as it doesn't block anymore. Where not blocking includes the connection having gone bad. It took me a while to figure out whether that's actually guaranteed by the spec, but I'm pretty sure it is...

0002 now makes sense on its own and doesn't change anything around the interrupt handling. Oh, and it compiles without 0003.

WaitLatchOrSocket() can throw an error, so it's not totally safe to call that underneath OpenSSL.

Hm. Fair point.

Admittedly the cases where it throws an error are shouldn't happen cases like poll() failed or read() on self-pipe failed, but still. Perhaps those errors should be reclassified as FATAL; it's not clear you can just roll back and expect to continue running if any of them happens.

Fine with me.

In secure_raw_write(), you need to save and restore errno, as WaitLatchOrSocket will not preserve it. If secure_raw_write() calls WaitLatchOrSocket(), and it returns because the latch was set, and we fall out of secure_raw_write, we will return -1 but the errno might not be set to anything sensible anymore.

Oops.

0003 Sinval/notify processing got simplified further.
There really isn't any need for DisableNotifyInterrupt/DisableCatchupInterrupt anymore. Also begin_client_read/client_read_ended don't make much sense anymore. Instead introduce ProcessClientReadInterrupt (which wants a better name). There's also a very WIP 0004: Allows secure_read/write to be interrupted when ProcDiePending is set. All of that happens via the latch mechanism, nothing happens inside signal handlers. So I do think it's quite an improvement over what's been discussed in this thread. But it (and the other approaches) do noticeably increase the likelihood of clients not getting the error message if the client isn't actually dead. The likelihood of write() being blocked *temporarily* due to normal bandwidth constraints is quite high when you consider COPY FROM and similar. Right now we'll wait till we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is basically sound. I'm much, much less sure about allowing send() to be interrupted.

Yeah, 1-3 seem sane.

I think 3 also needs a careful look. Have you looked through it? While imo much less complex than before, there are some complex interactions in the touched code. And we have terrible coverage of both catchup interrupts and notify stuff... Tom, do you happen to have time to look at that bit? There's also the concern that using a latch for client communication increases the number of syscalls for the same work. We should at least try to quantify that...

4 also looks OK to me at a quick glance. It basically enables handling the die interrupt immediately, if we're blocked in a read or write. It won't be handled in the signal handler, but within the secure_read/write call anyway.

What are you thinking about the concern that it'll reduce the likelihood of transferring the error message to the client? I tried to reduce that by only allowing errors when write() blocks, but that's not an infrequent event.

Thanks for looking.
Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Trailing comma support in SELECT statements
Hi,

I have created a small patch to postgres source (in particular the psql part of it) that accepts trailing comma at the end of list in SELECT statement. The idea is to be able to say both (with the same result):

SELECT a, b, c from t;
SELECT a, b, c, from t;

Attached you can find a patch containing regression test (incorporated into the serial_schedule). My patch is relative to origin/REL9_4_STABLE branch as that is the one I started from. My plea is to have this change merged into the mainline so that it becomes available in upcoming releases.

This modification does not require any interaction with user. It does not create any backward compatibility issues. Nor does it have any performance impact.

regards
bogdan

From 450c339b4284887782b30e154766a0ee90d6f7ee Mon Sep 17 00:00:00 2001
From: Bogdan Pilch bogdan.pi...@opensynergy.com
Date: Sat, 16 Aug 2014 19:42:29 +0200
Subject: [PATCH 1/3] BPI: Added support for ignoring the trailing comma in select statement

---
 src/backend/parser/gram.y | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7b9895d..345c6cb 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -12470,6 +12470,7 @@ ctext_row: '(' ctext_expr_list ')' { $$ = $2; }
  */
 opt_target_list: target_list { $$ = $1; }
+ | target_list ',' { $$ = $1; }
 | /* EMPTY */ { $$ = NIL; }
 ;
--
1.9.1

From 9faf5eec4975eb99ad7c8901e30742ba92c0c4cb Mon Sep 17 00:00:00 2001
From: Bogdan Pilch bogdan.pi...@opensynergy.com
Date: Sun, 28 Sep 2014 13:12:24 +0200
Subject: [PATCH 3/3] Added regression test for trailing comma select feature.
---
 .../regress/expected/select_trailing_comma.out | 53 ++
 src/test/regress/serial_schedule               |  1 +
 src/test/regress/sql/select_trailing_comma.sql | 16 +++
 3 files changed, 70 insertions(+)
 create mode 100644 src/test/regress/expected/select_trailing_comma.out
 create mode 100644 src/test/regress/sql/select_trailing_comma.sql

diff --git a/src/test/regress/expected/select_trailing_comma.out b/src/test/regress/expected/select_trailing_comma.out
new file mode 100644
index 000..f84938c
--- /dev/null
+++ b/src/test/regress/expected/select_trailing_comma.out
@@ -0,0 +1,53 @@
+--
+-- SELECT WITH TRAILING COMMA
+--
+CREATE TEMP TABLE primes (p1 int, p2 int, p3 int);
+INSERT INTO primes VALUES (13, 7927, 7);
+SELECT * FROM primes;
+ p1 |  p2  | p3
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT *, FROM primes;
+ p1 |  p2  | p3
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT p1 FROM primes;
+ p1
+----
+ 13
+(1 row)
+
+SELECT p1, FROM primes;
+ p1
+----
+ 13
+(1 row)
+
+SELECT p1, p2 FROM primes;
+ p1 |  p2
+----+------
+ 13 | 7927
+(1 row)
+
+SELECT p1, p2, FROM primes;
+ p1 |  p2
+----+------
+ 13 | 7927
+(1 row)
+
+SELECT p1, p2, p3 FROM primes;
+ p1 |  p2  | p3
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
+SELECT p1, p2, p3, FROM primes;
+ p1 |  p2  | p3
+----+------+----
+ 13 | 7927 |  7
+(1 row)
+
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 16a1905..3571d14 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -79,6 +79,7 @@ test: select_distinct
 test: select_distinct_on
 test: select_implicit
 test: select_having
+test: select_trailing_comma
 test: subselect
 test: union
 test: case
diff --git a/src/test/regress/sql/select_trailing_comma.sql b/src/test/regress/sql/select_trailing_comma.sql
new file mode 100644
index 000..a2c922f
--- /dev/null
+++ b/src/test/regress/sql/select_trailing_comma.sql
@@ -0,0 +1,16 @@
+--
+-- SELECT WITH TRAILING COMMA
+--
+
+CREATE TEMP TABLE primes (p1 int, p2 int, p3 int);
+
+INSERT INTO primes VALUES (13, 7927, 7);
+
+SELECT * FROM primes;
+SELECT *, FROM primes;
+SELECT p1 FROM primes;
+SELECT p1, FROM primes;
+SELECT p1, p2 FROM primes;
+SELECT p1, p2, FROM primes;
+SELECT p1, p2, p3 FROM primes;
+SELECT p1, p2, p3, FROM primes;
--
1.9.1
Re: [HACKERS] Log notice that checkpoint is to be written on shutdown
Michael Banck wrote:
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a4dbb9..f2716ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8085,10 +8085,14 @@ CreateCheckPoint(int flags)
 	/*
 	 * If enabled, log checkpoint start. We postpone this until now so as not
-	 * to log anything if we decided to skip the checkpoint.
+	 * to log anything if we decided to skip the checkpoint. If we are during
+	 * shutdown and checkpoints are not being logged, add a log message that a
+	 * checkpoint is to be written and shutdown is potentially delayed.
 	 */
 	if (log_checkpoints)
 		LogCheckpointStart(flags, false);
+	else if (shutdown)
+		ereport(LOG, (errmsg("waiting for checkpoint ...")));

 	TRACE_POSTGRESQL_CHECKPOINT_START(flags);

I think if we're going to emit log messages for shutdown checkpoints, we ought to use the same format we already have, i.e. instead of having the separate "waiting for checkpoint" message, just test log_checkpoints || shutdown, then LogCheckpointStart. And for consistency also make sure that the checkpoint end log line is also reported on shutdown checkpoints regardless of log_checkpoints.

--
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Escaping from blocked send() reprised.
On 10/03/2014 05:26 PM, Andres Freund wrote: On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote: On 09/28/2014 01:54 AM, Andres Freund wrote: 0003 Sinval/notify processing got simplified further. There really isn't any need for DisableNotifyInterrupt/DisableCatchupInterrupt anymore. Also begin_client_read/client_read_ended don't make much sense anymore. Instead introduce ProcessClientReadInterrupt (which wants a better name). There's also a very WIP 0004 Allows secure_read/write be interrupted when ProcDiePending is set. All of that happens via the latch mechanism, nothing happens inside signal handlers. So I do think it's quite an improvement over what's been discussed in this thread. But it (and the other approaches) do noticeably increase the likelihood of clients not getting the error message if the client isn't actually dead. The likelihood of write() being blocked *temporarily* due to normal bandwidth constraints is quite high when you consider COPY FROM and similar. Right now we'll wait till we can put the error message into the socket afaics. 1-3 need some serious comment work, but I think the approach is basically sound. I'm much, much less sure about allowing send() to be interrupted. Yeah, 1-3 seem sane. I think 3 also needs a careful look. Have you looked through it? While imo much less complex than before, there's some complex interactions in the touched code. And we have terrible coverage of both catchup interrupts and notify stuff... I only looked at the .patch, I didn't apply it, so I didn't look at the context much. But I don't see any fundamental problem with it. I would like to have a closer look before it's committed, though. There's also the concern that using a latch for client communication increases the number of syscalls for the same work. We should at least try to quantify that... I'm not too concerned about that, since we only do extra syscalls when the socket isn't immediately available for reading/writing, i.e. 
when we have to sleep anyway.

4 also looks OK to me at a quick glance. It basically enables handling the die interrupt immediately, if we're blocked in a read or write. It won't be handled in the signal handler, but within the secure_read/write call anyway.

What are you thinking about the concern that it'll reduce the likelihood of transferring the error message to the client? I tried to reduce that by only allowing errors when write() blocks, but that's not an infrequent event.

I'm not too concerned about that either. I mean, it's probably true that it reduces the likelihood, but I don't particularly care myself. But if we care, we could use a timeout there, so that if we receive a SIGTERM while blocked on a send(), we wait for a few seconds to see if we can send whatever we were sending, before terminating the backend.

What should we do with this patch in the commitfest? Are you planning to clean up and commit these patches?

- Heikki
[HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
Hi Hackers,

I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of discussion on hackers. Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN.

This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file based incremental backup (and hopefully a block based incremental backup after it).

Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode file detection and LSN extraction code.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

Attachment: backup_profile_v2.patch.gz
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Thu, Oct 2, 2014 at 11:50:14AM +0200, Andres Freund wrote:
The first problem that comes to my mind about collecting enough data is that we have a very large number of lwlocks (fixed_number + 2 * shared_buffers). One 'trivial' way of implementing this is to have a per backend array collecting the information, and then a shared one accumulating data from it over time. But I'm afraid that's not going to fly :(. Hm. With the above sets of stats that'd be ~50MB per backend... Perhaps we should somehow encode this different for individual lwlock tranches? It's far less problematic to collect all this information for all but the buffer lwlocks...

First, I think this could be a major Postgres feature, and I am excited someone is working on this.

As far as gathering data, I don't think we are going to do any better in terms of performance/simplicity/reliability than to have a single PGPROC entry to record when we enter/exit a lock, and having a secondary process scan the PGPROC array periodically. What that gives us is almost zero overhead on backends, high reliability, and the ability of the scan daemon to give higher weights to locks that are held longer.

Basically, if you just stored the locks you held and released, you either have to add timing overhead to the backends, or you have no timing information collected. By scanning active locks, a short-lived lock might not be seen at all, while a longer-lived lock might be seen by multiple scans. What that gives us is a weighting of the lock time with almost zero overhead. If we want finer-grained lock statistics, we just increase the number of scans per second.

I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. I am having trouble seeing any other option that has such a good cost/benefit profile.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god.
+
Re: [HACKERS] Typo fixes in src/backend/rewrite/rewriteHandler.c
On Thu, Oct 2, 2014 at 2:55 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
Here are the comments in process_matched_tle() in rewriteHandler.c.

883 * such nodes; consider
884 * UPDATE tab SET col.fld1.subfld1 = x, col.fld2.subfld2 = y
885 * The two expressions produced by the parser will look like
886 * FieldStore(col, fld1, FieldStore(placeholder, subfld1, x))
887 * FieldStore(col, fld2, FieldStore(placeholder, subfld2, x))

I think the second one is not correct and should be
FieldStore(col, fld2, FieldStore(placeholder, subfld2, y))

Just like this,

891 * FieldStore(FieldStore(col, fld1,
892 * FieldStore(placeholder, subfld1, x)),
893 * fld2, FieldStore(placeholder, subfld2, x))

should be
FieldStore(FieldStore(col, fld1, FieldStore(placeholder, subfld1, x)), fld2, FieldStore(placeholder, subfld2, y))

Patch attached.

Looks right to me. Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian br...@momjian.us wrote: I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. I am having trouble seeing any other option that has such a good cost/benefit profile. I do think that the instrumentation data gathered by LWLOCK_STATS is useful - very useful. But it does have significant overhead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 10/03/2014 06:31 PM, Marco Nenciarini wrote: Hi Hackers, I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of discussion on hackers. Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. This patch adds an option to pg_basebackup and to replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file based incremental backup (and hopefully a block based incremental backup after it). I'd suggest jumping straight to block-based incremental backup. It's not significantly more complicated to implement, and if you implement both separately, then we'll have to support both forever. If you really need to, you can implement file-level diff as a special case, where the server sends all blocks in the file, if any of them have an LSN newer than the cutoff point. But I'm not sure if there's a point in that, once you have block-level support. If we're going to need a profile file - and I'm not convinced of that - is there any reason to not always include it in the backup? Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode files detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: Once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point that the client gave, you'll have to scan the whole file first, to see if there are any blocks with a higher LSN. At least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice, in the worst case. - Heikki
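The block-level filter Heikki describes (send a block only if its LSN is newer than the client's cutoff) relies on the page LSN stored at the start of every page header. The sketch below is a simplified Python illustration, not the pg_basebackup code: it assumes the default 8 kB block size and reads pd_lsn as two 32-bit halves (xlogid, then xrecoff) at offset 0, matching the standard page layout on a little-endian machine.

```python
import struct

BLCKSZ = 8192  # default PostgreSQL block size

def page_lsn(page: bytes) -> int:
    """Extract pd_lsn, the first 8 bytes of the page header
    (high 32 bits = xlogid, low 32 bits = xrecoff)."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def blocks_to_send(file_bytes: bytes, cutoff_lsn: int):
    """Yield (block number, block) for every block touched after the
    cutoff -- the block-level incremental filter."""
    for blkno in range(len(file_bytes) // BLCKSZ):
        block = file_bytes[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(block) > cutoff_lsn:
            yield blkno, block

def fake_block(lsn: int) -> bytes:
    """Build a zeroed block carrying only the given page LSN."""
    hdr = struct.pack("<II", lsn >> 32, lsn & 0xFFFFFFFF)
    return hdr + b"\x00" * (BLCKSZ - len(hdr))

# A fake two-block relation file: block 0 last touched at LSN 0x100,
# block 1 at LSN 0x500.  With a cutoff of 0x200, only block 1 is sent.
relfile = fake_block(0x100) + fake_block(0x500)
changed = [blkno for blkno, _ in blocks_to_send(relfile, cutoff_lsn=0x200)]
print(changed)  # [1]
```

The file-level variant Heikki mentions falls out of the same loop: stop at the first qualifying block and send the whole file, which is also why it has to scan each unchanged file fully before it can skip it.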
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Fri, Oct 3, 2014 at 5:33 PM, Bruce Momjian br...@momjian.us wrote: As far as gathering data, I don't think we are going to do any better in terms of performance/simplicity/reliability than to have a single PGPROC entry to record when we enter/exit a lock, and having a secondary process scan the PGPROC array periodically. That was the point. What that gives us is almost zero overhead on backends, high reliability, and the ability of the scan daemon to give higher weights to locks that are held longer. Basically, if you just stored the locks you held and released, you either have to add timing overhead to the backends, or you have no timing information collected. By scanning active locks, a short-lived lock might not be seen at all, while a longer-lived lock might be seen by multiple scans. What that gives us is a weighting of the lock time with almost zero overhead. If we want finer-grained lock statistics, we just increase the number of scans per second. So I could add the function, which will accumulate the data in some view/table (with weights etc). How should it be called? From a specific process? From some existing maintenance process such as autovacuum? Should I implement a GUC, for example lwlock_pull_rate: 0 for off, 1 to 10 for 1 to 10 samples per second? I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. Oracle and DB2 measure both, cumulative durations and counts. I am having trouble seeing any other option that has such a good cost/benefit profile. At least cost. The Oracle documentation clearly states that it is all about diagnostic convenience; the performance impact is significant. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell.
+4915144336040 i...@postgresql-consulting.com
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Fri, Oct 3, 2014 at 5:51 PM, Robert Haas robertmh...@gmail.com wrote: I do think that the instrumentation data gathered by LWLOCK_STATS is useful - very useful. Sure, quite useful. But how about this comment: /* * The LWLock stats will be updated within a critical section, which * requires allocating new hash entries. Allocations within a critical * section are normally not allowed because running out of memory would * lead to a PANIC, but LWLOCK_STATS is debugging code that's not normally * turned on in production, so that's an acceptable risk. The hash entries * are small, so the risk of running out of memory is minimal in practice. */ But it does have significant overhead. I will say that it is a bit more than just overhead for production use. -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 i...@postgresql-consulting.com
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
Stephen Frost wrote: * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: I am rather surprised that nobody has reported this problem before. I am now of the mind that this is clearly a bug that should be fixed all the way back. I'm coming around to that also, however, should we worry about users who set per-table settings and then simply forgot about them? I suppose that won't matter too much unless the table is really active, and if it is, they've probably already set it to zero. Right. For the cases where it's been set and forgotten, perhaps we can have an item in the release notes to tell people to look into tables with the parameters set in pg_class.reloptions (to any value different from zero) and to look for performance differences from previous versions where they are set. I have pushed this now, backpatched all the way back to 9.0. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] replicating DROP commands across servers
On 09/16/2014 09:09 PM, Brightwell, Adam wrote: I think there have been some changes to this patch since July, care to resend a new version? Sure, here it is. The only difference with the previous version is that it now also supports column defaults. This was found to be a problem when you drop a sequence that some column default depends on -- for example a column declared SERIAL, or a sequence marked with ALTER SEQUENCE OWNED BY. The new code is able to drop both the sequence and the default value (leaving, of course, the rest of the column intact.) This required adding support for such objects in get_object_address. I have given this patch the following review:
- Apply to current master (77e65bf) -- success
- check-world -- success
- multiple FIXME statements still exist -- are there plans to fix these items? Can the duplicated code be extracted to a static function?
Nothing seems to be happening to this, so I'm marking this as returned with feedback. - Heikki
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 03/10/14 17:53, Heikki Linnakangas wrote: If we're going to need a profile file - and I'm not convinced of that - is there any reason to not always include it in the backup? The main reason is to have a centralized list of files that need to be present. Without a profile, you have to insert some sort of placeholder for skipped files. Moreover, the profile allows you to quickly know the size of the recovered backup (by simply summing the individual sizes). Another use could be to 'validate' the presence of all required files in a backup. Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode files detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: Once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point that the client gave, you'll have to scan the whole file first, to see if there are any blocks with a higher LSN. At least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice, in the worst case. It's true. To solve this you would have to keep a central maxLSN directory, but I think it introduces more issues than it solves. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
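Marco's two stated uses for the profile, computing the recovered size by summing the listed sizes and validating that every required file is present, are simple enough to sketch. The entry format and paths below are invented for illustration; the real backup_profile format is whatever the patch defines.

```python
import os
import tempfile

# Hypothetical profile: a centralized list of (relative path, size)
# entries for every file the restored backup must contain.
profile = [
    ("base/16384/16385", 8192),
    ("base/16384/16388", 16384),
]

def recovered_size(profile):
    """Size of the fully recovered backup: just sum the listed sizes."""
    return sum(size for _, size in profile)

def validate(profile, backup_dir):
    """Return the profile entries missing from the backup directory."""
    return [relpath for relpath, _ in profile
            if not os.path.exists(os.path.join(backup_dir, relpath))]

with tempfile.TemporaryDirectory() as d:
    # Materialize only the first file, leaving the second one missing.
    os.makedirs(os.path.join(d, "base", "16384"))
    open(os.path.join(d, "base/16384/16385"), "wb").close()
    print(recovered_size(profile))   # 24576
    print(validate(profile, d))      # ['base/16384/16388']
```

The same list is what lets an incremental backup record placeholders for skipped files without inventing per-file marker files inside the data directory.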
Re: [HACKERS] Last Commitfest patches waiting review
Thanks to everyone who's reviewed a patch so far. One last crunch, and we'll be through. We have 7 patches left in Needs Review state:
- pg_receivexlog: addition of --create/--drop to create/drop repslots. Waiting for Magnus. Amit promised to take a look if Magnus continues to be busy.
- Sort support for text with strxfrm() poor man's keys. Robert? What do you think of Peter's latest patch?
- add latency limit to pgbench throttling (--rate). I'm waiting for Greg Smith to have a look at the latest patch.
- Escaping from blocked send() by pg_terminate_backend(). Andres posted a patch series using a completely different approach. I guess this should be just marked as returned with feedback, if we're going to pursue the different approach instead.
- Fix local_preload_libraries to work as expected.
- pg_dump refactor to remove global variables. Peter: Ping? Will you have the time to review these?
- Fix xpath() to return namespace definitions (fixes the issue with nested or repeated xpath()). Peter, you've done some XML stuff before; could you have a look at this too?

- Heikki
Re: [HACKERS] Trailing comma support in SELECT statements
Bogdan Pilch bog...@matfyz.cz writes: I have created a small patch to postgres source (in particular the psql part of it) that accepts trailing comma at the end of list in SELECT statement. This doesn't seem to me to be a remarkably good idea. What's the difference between this and accepting random misspellings of SELECT, allowing mismatched parentheses in expressions, etc etc? It's important in a computer language to be able to catch typos. If we were going to be lax about trailing commas, the SELECT list would hardly be the only candidate, or even the first candidate, for being lax that way. But I don't want to go there. regards, tom lane
Re: [HACKERS] Trailing comma support in SELECT statements
On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote: Hi, I have created a small patch to postgres source (in particular the psql part of it) that accepts trailing comma at the end of list in SELECT statement. The idea is to be able to say both (with the same result): SELECT a, b, c from t; SELECT a, b, c, from t; Attached you can find a patch containing regression test (incorporated into the serial_schedule). My patch is relative to origin/REL9_4_STABLE branch as that is the one I started from. My plea is to have this change merged into the main stream so that it becomes available in upcoming releases. This modification does not require any interaction with the user. It does not create any backward compatibility issues. Interesting --- I know some languages allow trailing delimiters, like Perl and JavaScript. Could this mask query errors? Does any other database accept this? Seems this would need to be done in many other places, like UPDATE, but let's first decide if we want this. FYI, it is usually better to discuss a feature before showing a patch. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 1:08 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode files detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: Once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point that the client gave, you'll have to scan the whole file first, to see if there are any blocks with a higher LSN. At least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice, in the worst case. It's true. To solve this you have to keep a central maxLSN directory, but I think it introduces more issues than it solves. I see that as a worthy optimization on the server side, regardless of whether file or block-level backups are used, since it allows efficient skipping of untouched segments (common for append-only tables). Still, it would be something to do after it works already (i.e., it's an optimization).
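The central maxLSN directory idea can be sketched as a per-segment map consulted before any file is scanned: if a segment's recorded maximum LSN is at or below the cutoff, no block in it can qualify, so the whole scan is skipped. That is what makes append-only tables cheap, since their old segments never change. The map contents and segment names below are invented; maintaining such a map correctly at write time is exactly the part Marco suspects would introduce more issues than it solves.

```python
# Hypothetical per-segment max-LSN map, maintained whenever a block in
# the segment is written.  A backup at a given cutoff only needs to
# scan segments whose recorded max LSN exceeds the cutoff.
segment_max_lsn = {
    "base/16384/16385":   0x150,   # untouched since before the cutoff
    "base/16384/16385.1": 0x900,   # contains newer blocks
}

def segments_needing_scan(cutoff_lsn):
    """Segments that may contain blocks newer than the cutoff; all
    other segments can be skipped without reading a single block."""
    return [seg for seg, max_lsn in sorted(segment_max_lsn.items())
            if max_lsn > cutoff_lsn]

print(segments_needing_scan(0x200))  # ['base/16384/16385.1']
```

Without the map, the worst case stays as Heikki described: every unchanged file is read end to end just to discover that nothing in it qualifies.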
Re: [HACKERS] CREATE IF NOT EXISTS INDEX
On Fri, Oct 3, 2014 at 5:26 AM, Marti Raudsepp ma...@juffo.org wrote: On Fri, Oct 3, 2014 at 6:25 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: Documentation: I would prefer if the explanation were consistent with Do not throw an error if the index already exists. A notice is issued in this case. Fixed in that way. Ok? And also Note that there is no guarantee that the existing index is anything like the one that would have been created. Fixed. I think ERRCODE_SYNTAX_ERROR makes more sense, it's something that we decided we *don't want* to support. I don't think so. It's the same as CREATE SCHEMA IF NOT EXISTS, which does not support including schema elements. IMO that's wrong too, the CREATE SCHEMA documentation doesn't list it as valid syntax. But now that you split the syntax in two, you can simply replace opt_index_name with index_name and it will naturally cause a syntax error without the need for an if(). What do you think? Patch attached, which applies on top of your v4 patch. Fine. Thanks. On Fri, Oct 3, 2014 at 6:29 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: I would also move this check to after all the attributes have been assigned, rather than splitting the assignments in half. Why? If you look at other places in gram.y, it's a common usage... Looks cleaner to me: first input all the fields, then validate. And there are examples like this too, like COPY select_with_parens. But this is moot now, if you agree with my grammar change. I agree with your grammar change. On Fri, Oct 3, 2014 at 6:35 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: And just a reminder, please add your review to the commitfest Thanks for reminding, I always forget to update the CommitFest... :( You're welcome. Version 5 (attached) contains everything discussed until now. Regards.
-- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index e469b17..2dfe2f4 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation refsynopsisdiv synopsis -CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/replaceable ] ON replaceable class=parametertable_name/replaceable [ USING replaceable class=parametermethod/replaceable ] +CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ IF NOT EXISTS replaceable class=parametername/replaceable | replaceable class=parametername/replaceable ] ON replaceable class=parametertable_name/replaceable [ USING replaceable class=parametermethod/replaceable ] ( { replaceable class=parametercolumn_name/replaceable | ( replaceable class=parameterexpression/replaceable ) } [ COLLATE replaceable class=parametercollation/replaceable ] [ replaceable class=parameteropclass/replaceable ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( replaceable class=PARAMETERstorage_parameter/replaceable = replaceable class=PARAMETERvalue/replaceable [, ... ] ) ] [ TABLESPACE replaceable class=parametertablespace_name/replaceable ] @@ -99,6 +99,17 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ replaceable class=parametername/ variablelist varlistentry + termliteralIF NOT EXISTS/literal/term + listitem + para +Do not throw an error if the index already exists. A notice is issued +in this case. Note that there is no guarantee that the existing index +is anything like the one that would have been created. 
+ /para + /listitem + /varlistentry + + varlistentry termliteralUNIQUE/literal/term listitem para diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index ee10594..8905e30 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -697,7 +697,8 @@ index_create(Relation heapRelation, bool allow_system_table_mods, bool skip_build, bool concurrent, - bool is_internal) + bool is_internal, + bool if_not_exists) { Oid heapRelationId = RelationGetRelid(heapRelation); Relation pg_class; @@ -773,10 +774,22 @@ index_create(Relation heapRelation, elog(ERROR, shared relations must be placed in pg_global tablespace); if (get_relname_relid(indexRelationName, namespaceId)) + { + if (if_not_exists) + { + ereport(NOTICE, + (errcode(ERRCODE_DUPLICATE_TABLE), + errmsg(relation \%s\ already exists, skipping, + indexRelationName))); + heap_close(pg_class, RowExclusiveLock); + return InvalidOid; + } + ereport(ERROR, (errcode(ERRCODE_DUPLICATE_TABLE), errmsg(relation \%s\ already exists, indexRelationName))); + } /* * construct tuple
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 04:11:30PM +0200, Andres Freund wrote: On 2014-10-03 10:07:39 -0400, Gregory Smith wrote: On 10/3/14, 8:26 AM, Andres Freund wrote: #define NUM_XLOGINSERT_LOCKS 1 tps = 52.711939 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 8 tps = 286.496054 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 16 tps = 346.113313 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 24 tps = 363.242111 (including connections establishing) Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed like people would likely need to tune all the usual things like checkpoint_segments, shared_buffers, etc. as well before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess. I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too. So 25% performance improvements on a relatively small machine aren't worth a GUC? Improvements that are likely to be larger on a bigger machine? I utterly fail to see why that's a service to our users. Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god.
+
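For readers outside the thread, the reason NUM_XLOGINSERT_LOCKS helps is that WAL inserters are spread across N insertion locks and only contend when two of them land on the same lock. A toy model (purely illustrative, not the real xloginsert code) reproduces the diminishing-returns shape of Andres' numbers:

```python
# Toy contention model for striped WAL-insertion locks.  Backends are
# assigned to locks round-robin; every backend that has to share a lock
# with someone else counts as a contender.  Not the real algorithm,
# just an illustration of why gains shrink as N grows.

def contending_backends(backends, num_locks):
    """Number of backends forced to queue behind another backend."""
    per_lock = [0] * num_locks
    for b in range(backends):
        per_lock[b % num_locks] += 1
    return sum(n - 1 for n in per_lock if n > 1)

# With 32 concurrent inserters, striping from 1 lock to 24 locks:
for n in (1, 8, 16, 24):
    print(n, contending_backends(32, n))
# prints: 1 31 / 8 24 / 16 16 / 24 8
```

The big cliff is from 1 lock to a handful, which matches the 52-to-286 tps jump in the benchmark; after that each extra lock buys progressively less, which is the core of the "is a GUC worth it" argument.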
Re: [HACKERS] Last Commitfest patches waiting review
On Fri, Oct 3, 2014 at 6:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Thanks to everyone who's reviewed a patch so far. One last crunch, and we'll be through. We have 7 patches left in Needs Review state: pg_receivexlog: addition of --create/--drop to create/drop repslots Waiting for Magnus. Amit promised to take a look if Magnus continues to be busy. Andres did, not Amit. And thanks Andres! :) And sorry about that one - no way I'll get to it until at the earliest next week, but almost certainly not until the week after :( So I really appreciate the pickup. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] replicating DROP commands across servers
Heikki Linnakangas wrote: On 09/16/2014 09:09 PM, Brightwell, Adam wrote: I have given this patch the following review: - Apply to current master (77e65bf) -- success - check-world -- success - multiple FIXME statements still exist -- are there plans to fix these items? Can the duplicated code be extracted to a static function? Nothing seems to be happening to this, so I'm marking this as returned with feedback. Meh. There are three fixmes in the code. One can be handled by just removing the line; we don't really care about duplicating 10 lines of boilerplate code. The other two mean missing support for domain constraints and for default ACLs. Is there absolutely no feedback to be had on the mechanism used by the patch? Since the patch has had good feedback and no further comments arise, I can just implement support for those two missing object types and push, and everybody will be happy. Right? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] replicating DROP commands across servers
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: There are three fixmes in the code. One can be handled by just removing the line; we don't really care about duplicating 10 lines of boilerplate code. The other two mean missing support for domain constraints and for default ACLs. Is there absolutely no feedback to be had on the mechanism used by the patch? Since the patch has had good feedback and no further comments arise, I can just implement support for those two missing object types and push, and everybody will be happy. Right? In general, I'd say yes, but I'll take a look at the patch now and provide feedback in a couple hours anyway. Thanks, Stephen
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On Wed, Oct 1, 2014 at 11:10 AM, Robert Haas robertmh...@gmail.com wrote: As far as I can tell, it's configured to run everything. I just checked, though, and found it wedged again. I'm not sure whether it was the same problem, though; I ended up just killing all of the postgres processes to fix it. We may be just at the beginning of an exciting debugging journey. After much more time spent on this, and some help from two people whose names are both six letters starting with Andre, I discovered the following: - anole apparently had not tried to build master in a long time - I think since May - because of a leftover inst directory instead HEAD's buildroot. - anole still refused to build either 9.4 or master until I removed a couple of leftover socket directories in /tmp. - after that it hung again trying to run the test_shm_mq regression tests, in pretty much the same way as before. The postmaster state looks like this: 0xc044a350:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1 (gdb) bt #0 0xc044a350:0 in _select_sys+0x30 () from /usr/lib/hpux64/libc.so.1 #1 0xc045f950:0 in select+0x90 () from /usr/lib/hpux64/libc.so.1 #2 0x40a4e510:0 in ServerLoop () at postmaster.c:1540 #3 0x40a48b20:0 in PostmasterMain (argc=3, argv=0x87ffd178) at postmaster.c:1223 #4 0x408d0a90:0 in main (argc=3, argv=0x87ffd178) at main.c:227 (gdb) p BackgroundWorkerList $1 = {head = {next = 0x600e8a58}} (gdb) p BackgroundWorkerList.head $2 = {next = 0x600e8a58} (gdb) p ((char *) BackgroundWorkerList.head) - 264 $3 = 0x600e8950 test_shm_mq (gdb) p (RegisteredBgWorker *) (((char *) BackgroundWorkerList.head) - 264) $4 = (struct RegisteredBgWorker *) 0x600e8950 (gdb) p ((RegisteredBgWorker *) (((char *) BackgroundWorkerList.head) - 264))[0] $5 = {rw_worker = { bgw_name = test_shm_mq\000\000\001?\000 \000\00511985, '\000' repeats 13 times, \005T\000\000\000\000\000\001?\000 \000\00512038\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time = BgWorkerStart_ConsistentState, 
bgw_restart_time = -1, bgw_main = 0, bgw_library_name = test_shm_mq\000\000\000\005t\000\000\000\000\000\001?\000 \000\00511917, '\000' repeats 13 times, \005?\000\000\000\000\000\001?\000 \000\0051209, bgw_function_name = test_shm_mq_main\000\000\000\000\000\001?\000 \000\00511903, '\000' repeats 13 times, \005?\000\000\000\000\000\001?\000 \000\0051199, bgw_main_arg = 1069289198, bgw_notify_pid = 12361}, rw_backend = 0x0, rw_pid = 12369, rw_child_slot = 35, rw_crashed_at = 0, rw_shmem_slot = 6, rw_terminate = 0 '\000', rw_lnode = {next = 0x600e8938}} (gdb) p $5.rw_lnode $6 = {next = 0x600e8938} (gdb) p ((RegisteredBgWorker *) (((char *) $5.rw_lnode) - 264))[0] $7 = {rw_worker = { bgw_name = test_shm_mq\000\000\001?x\000 \000\00511931, '\000' repeats 13 times, \004?\000\000\000\000\000\001?}\000 \000\00511939\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time = BgWorkerStart_ConsistentState, bgw_restart_time = -1, bgw_main = 0, bgw_library_name = test_shm_mq\000\000\000\004?\000\000\000\000\000\001?\000 \000\00511957, '\000' repeats 13 times, \004?\000\000\000\000\000\001?\000 \000\0051209, bgw_function_name = test_shm_mq_main\000\000\000\000\000\001?\000 \000\00511956, '\000' repeats 13 times, \005\024\000\000\000\000\000\001?\000 \000\0051192, bgw_main_arg = 1069289198, bgw_notify_pid = 12361}, rw_backend = 0x0, rw_pid = 0, rw_child_slot = 0, rw_crashed_at = 0, rw_shmem_slot = 5, rw_terminate = 0 '\000', rw_lnode = { next = 0x600e8818}} (gdb) p ((RegisteredBgWorker *) (((char *) $7.rw_lnode) - 264))[0] $8 = {rw_worker = { bgw_name = test_shm_mq\000\000\001?f\000 \000\00512051, '\000' repeats 13 times, \0044\000\000\000\000\000\001?g\000 \000\00512047\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time = BgWorkerStart_ConsistentState, bgw_restart_time = -1, bgw_main = 0, bgw_library_name = test_shm_mq\000\000\000\004T\000\000\000\000\000\001?j\000 \000\00511979, '\000' repeats 13 times, \004d\000\000\000\000\000\001?k\000 \000\0051197, bgw_function_name = 
test_shm_mq_main\000\000\000\000\000\001?m\000 \000\00511873, '\000' repeats 13 times, \004?\000\000\000\000\000\001?u\000 \000\0051196, bgw_main_arg = 1069289198, bgw_notify_pid = 12361}, rw_backend = 0x0, rw_pid = 0, rw_child_slot = 0, rw_crashed_at = 0, rw_shmem_slot = 4, rw_terminate = 0 '\000', rw_lnode = {next = 0x600e86f8}} (gdb) p ((RegisteredBgWorker *) (((char *) $8.rw_lnode) - 264))[0] $9 = {rw_worker = { bgw_name = test_shm_mq\000\000\001?\\\000 \000\00512085, '\000' repeats 13 times, \003?\000\000\000\000\000\001?]\000 \000\00512076\000\000\000\000\000\000, bgw_flags = 1, bgw_start_time = BgWorkerStart_ConsistentState, bgw_restart_time = -1, bgw_main = 0, bgw_library_name = test_shm_mq\000\000\000\003?\000\000\000\000\000\001?_\000 \000\00511885, '\000' repeats
Re: [HACKERS] replicating DROP commands across servers
On 10/03/2014 08:08 PM, Stephen Frost wrote: Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: There are three fixmes in the code. One can be handled by just removing the line; we don't really care about duplicating 10 lines of boilerplate code. The other two mean missing support for domain constraints and for default ACLs. Is there absolutely no feedback to be had on the mechanism used by the patch? Since the patch has had good feedback and no further comments arise, I can just implement support for those two missing object types and push, and everybody will be happy. Right? In general, I'd say yes, but I'll take a look at the patch now and provide feedback in a couple hours anyway. Thanks Stephen! I had a very brief look at the docs, and these extra outputs from pg_event_trigger_dropped_objects caught my eye:

+<row>
+ <entry><literal>address_names</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  An array that, together with <literal>address_args</literal>,
+  can be used by the C-language function getObjectAddress() to
+  recreate the object address in a remote server containing a similar object.
+ </entry>
+</row>
+<row>
+ <entry><literal>address_args</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  See <literal>address_names</literal> above.
+ </entry>
+</row>

I couldn't find a function called getObjectAddress anywhere. Typo? Also, is providing a C-language function the best we can do? The rest of the information returned by pg_event_trigger_dropped_objects is usable from any language. - Heikki
Re: [HACKERS] Proposal for updating src/timezone
I wrote: The urgency of updating our timezone code has risen quite a bit for me, because while testing an update of the data files to tzdata2014h I became aware that the -P option is failing to print a noticeable number of zone abbreviations that clearly exist in the data files. Since the -P hack itself is so simple, it's hard to come to any other conclusion than that the rest of the timezone code is buggy --- presumably because it's four years behind what IANA is targeting with the data files. Ah, scratch that. On further investigation, it turns out that my code for -P is in fact wrong; it misses out printing zone abbreviations in cases where an area reverted to an older meaning of a zone abbrev after having used another meaning for awhile. Which, in fact, is what the Russians have just done, and it seems it's happened rather often elsewhere as well. We do still need to work on syncing our code with IANA's updates one way or the other, because eventually they are going to start shipping zone definitions that break older copies of the library; this has been mentioned more than once on the IANA announcement list. But it doesn't seem like that has happened quite yet. Meanwhile, it looks like there's a fair amount of missed updates in our tznames/ files as a result of this error. Off I go to look at that. (The underlying motivation here is that I'd like to get the tzdata2014h updates installed before we wrap 9.4beta3. Since it looks like those changes are going to be rather invasive, it seems like getting some beta testing on them would be a good thing before we unleash them on back-branch releases.) regards, tom lane
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 1:40 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Oct 3, 2014 at 04:11:30PM +0200, Andres Freund wrote: On 2014-10-03 10:07:39 -0400, Gregory Smith wrote: On 10/3/14, 8:26 AM, Andres Freund wrote:

#define NUM_XLOGINSERT_LOCKS 1    tps = 52.711939 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 8    tps = 286.496054 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 16   tps = 346.113313 (including connections establishing)
#define NUM_XLOGINSERT_LOCKS 24   tps = 363.242111 (including connections establishing)

Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed like people would likely need to tune all the usual things like checkpoint_segments, shared_buffers, etc. as well before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess. I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too. So 25% performance improvements on a relatively small machine aren't worth a GUC? Improvements that are likely to be larger on a bigger machine? I utterly fail to see why that's a service to our users. Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments.
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + Not all GUCs need to be straightforward to tune. If the gains are worthy I don't see any reason not to have it.
Re: [HACKERS] pg_receivexlog and replication slots
On Fri, Oct 3, 2014 at 7:57 AM, Andres Freund and...@2ndquadrant.com wrote: I do wonder whether --create/--drop aren't somewhat weird for pg_receivexlog. It's not that clear what it means. It'd be ugly, but we could rename them --create-slot/drop-slot. +1 on doing it, -1 on it being ugly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] replicating DROP commands across servers
Heikki Linnakangas wrote: I had a very brief look at the docs, and these extra outputs from pg_event_trigger_dropped_objects caught my eye:

+<row>
+ <entry><literal>address_names</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  An array that, together with <literal>address_args</literal>,
+  can be used by the C-language function getObjectAddress() to
+  recreate the object address in a remote server containing a
+  similar object.
+ </entry>
+</row>
+<row>
+ <entry><literal>address_args</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  See <literal>address_names</literal> above.
+ </entry>
+</row>

I couldn't find a function called getObjectAddress anywhere. Typo? Ah, yeah, it's get_object_address actually. Also, is providing a C-language function the best we can do? The rest of the information returned by pg_event_trigger_dropped_objects is usable from any language. Well, the return value from get_object_address is an ObjectAddress. It's simple enough to create an SQL wrapper that takes the address_names/address_args arrays and return an ObjectAddress; would this be useful? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 03:00:56PM -0300, Arthur Silva wrote: I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments. Not all GUCs need to be straightforward to tune. If the gains are worthy I don't see any reason not to have it. Every GUC adds complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 02:07:45PM -0400, Bruce Momjian wrote: On Fri, Oct 3, 2014 at 03:00:56PM -0300, Arthur Silva wrote: I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments. Not all GUC need to be straight forward to tune. If the gains are worthy I don't see any reason not to have it. Every GUC add complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. Please see my blog post about the cost of adding GUCs: http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009 -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 06:08:47PM +0200, Marco Nenciarini wrote: Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode files detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: Once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point that the client gave, you'll have to scan the whole file first, to see if there are any blocks with a higher LSN. At least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice, in the worst case. It's true. To solve this you have to keep a central maxLSN directory, but I think it introduces more issues than it solves. The central issue Heikki is pointing out is whether we should implement a file-based system if we already know that a block-based system will be superior in every way. I agree with that and agree that implementing just file-based isn't worth it as we would have to support it forever. So, in summary, if you target just a file-based system, be prepared that it might be rejected. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replicating DROP commands across servers
On 10/03/2014 09:06 PM, Alvaro Herrera wrote: Heikki Linnakangas wrote: I had a very brief look at the docs, and these extra outputs from pg_event_trigger_dropped_objects caught my eye:

+<row>
+ <entry><literal>address_names</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  An array that, together with <literal>address_args</literal>,
+  can be used by the C-language function getObjectAddress() to
+  recreate the object address in a remote server containing a
+  similar object.
+ </entry>
+</row>
+<row>
+ <entry><literal>address_args</literal></entry>
+ <entry><type>text[]</type></entry>
+ <entry>
+  See <literal>address_names</literal> above.
+ </entry>
+</row>

I couldn't find a function called getObjectAddress anywhere. Typo? Ah, yeah, it's get_object_address actually. Also, is providing a C-language function the best we can do? The rest of the information returned by pg_event_trigger_dropped_objects is usable from any language. Well, the return value from get_object_address is an ObjectAddress. It's simple enough to create an SQL wrapper that takes the address_names/address_args arrays and return an ObjectAddress; would this be useful? An ObjectAddress consists of a classid, objid, and objsubid. pg_event_trigger_dropped_objects already returns all of those as separate fields. What am I missing? - Heikki
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 3:10 PM, Bruce Momjian br...@momjian.us wrote: On Fri, Oct 3, 2014 at 02:07:45PM -0400, Bruce Momjian wrote: On Fri, Oct 3, 2014 at 03:00:56PM -0300, Arthur Silva wrote: I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments. Not all GUCs need to be straightforward to tune. If the gains are worthy I don't see any reason not to have it. Every GUC adds complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. Please see my blog post about the cost of adding GUCs: http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009 -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + That's true, Bruce (nice post; it was a good read). But how can we ignore 25%+ improvements (from 8 to 24)? At the very least we should deliver some pretty good defaults.
Re: [HACKERS] replicating DROP commands across servers
Heikki Linnakangas wrote: On 10/03/2014 09:06 PM, Alvaro Herrera wrote: Well, the return value from get_object_address is an ObjectAddress. It's simple enough to create an SQL wrapper that takes the address_names/address_args arrays and return an ObjectAddress; would this be useful? An ObjectAddress consists of a classid, objid, and objsubid. pg_event_trigger_dropped_objects already returns all of those as separate fields. What am I missing? Precisely the point is not returning those values, because they are useless to identify the equivalent object in a remote database. What we need is the object names and other stuff used to uniquely identify it by user-visible name. We transmit those name arrays to a remote server, then on the remote server we can run get_object_address and get the ObjectAddress, which has the classid,objid,objsubid values you cite ... but for the remote server. The object can then be dropped there. Initially we thought that we would use the object_identity object for this (which is why we invented that functionality and added the column in 9.3), but this turned out not to work very well for unusual object types; hence this patch. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote: Further debugging reveals that sigusr1_handler() gets called repeatedly, to start autovacuum workers, and it keeps waking up and starting them. But that doesn't cause the background workers to get started either, because although sigusr1_handler() contains a call to maybe_start_bgworker, it only does that if start_bgworker = true, which only happens if CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true, which on these calls it isn't. And I think this might also be the missing ingredient answering Andres's question from before: why doesn't the 60-second select()-timeout cause the background worker to eventually start even if the SELECT doesn't get interrupted? There seems to be a SIGUSR1 arriving about every 3 seconds, and I bet that's resetting the timer every time. For now I propose to address this by committing the attached patch, which gets rid of the separate start_bgworker flag inside sigusr1_handler() and instead uses the same StartWorkerNeeded || HaveCrashedWorker test that we use inside ServerLoop() to decide whether to call maybe_start_bgworker(). Either more signals will arrive (in which case the signal handler will launch an additional background worker every time a signal arrives) or they won't (in which case the 60-second timeout will eventually expire, and ServerLoop() will kick into high gear and satisfy all outstanding requests). This isn't really right, because there might still be a quite noticeable delay waiting for workers to get launched, but at least the delay would be bounded to at most 60 seconds rather than, as at present, potentially infinite. A totally correct fix will require a bit more thought. A search of the git history reveals that the problem of a signal restarting the timeout is not new: Tom fixed a similar problem back in 2007 by making the autovacuum launcher sleep for at most a second at a time. 
Such a fix isn't ideal here, because we really don't want an up-to-1-second delay launching a newly-registered background worker if there's a way to avoid that -- it's probably OK for launching daemons, but it's not so hot for parallel query. However, we could: (1) Use the self-pipe trick. We could not directly use a latch, at least not without a new API, because we might be waiting on more than one socket. (2) Have the postmaster not set SA_RESTART for the sigusr1 handler. I don't know how platform-independent this approach would be. (3) Have sigusr1_handler repeatedly call maybe_start_bgworker() until StartWorkerNeeded becomes false, instead of just calling it once. ServerLoop() is carefully coded to call maybe_start_bgworker() just once per iteration, presumably to make sure the server stays responsive even if the bgworker-starting machinery is quite busy; looping inside the signal handler would give up that nice property unless we had some way to break out of the loop if there's activity on the socket. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ce920ab..6220a8e 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -4752,7 +4752,6 @@
 static void
 sigusr1_handler(SIGNAL_ARGS)
 {
 	int		save_errno = errno;
-	bool	start_bgworker = false;
 
 	PG_SETMASK(&BlockSig);
 
@@ -4760,7 +4759,7 @@ sigusr1_handler(SIGNAL_ARGS)
 	if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
 	{
 		BackgroundWorkerStateChange();
-		start_bgworker = true;
+		StartWorkerNeeded = true;
 	}
 
 	/*
@@ -4801,10 +4800,10 @@ sigusr1_handler(SIGNAL_ARGS)
 			pmState = PM_HOT_STANDBY;
 
 			/* Some workers may be scheduled to start now */
-			start_bgworker = true;
+			StartWorkerNeeded = true;
 		}
 
-	if (start_bgworker)
+	if (StartWorkerNeeded || HaveCrashedWorker)
 		maybe_start_bgworker();
 
 	if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER)
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Fri, Oct 3, 2014 at 05:53:59PM +0200, Ilya Kosmodemiansky wrote: What that gives us is almost zero overhead on backends, high reliability, and the ability of the scan daemon to give higher weights to locks that are held longer. Basically, if you just stored the locks you held and released, you either have to add timing overhead to the backends, or you have no timing information collected. By scanning active locks, a short-lived lock might not be seen at all, while a longer-lived lock might be seen by multiple scans. What that gives us is a weighting of the lock time with almost zero overhead. If we want finer-grained lock statistics, we just increase the number of scans per second. So I could add the function, which will accumulate the data in some view/table (with weights etc). How should it be called? From a specific process? From some existing maintenance process such as autovacuum? Should I implement a GUC, for example lwlock_pull_rate: 0 for off, 1 to 10 for 1 to 10 samples per second? Yes, that's the right approach. You would implement it as a background worker process, and a GUC as you described. I assume it would populate a view like we already do for the pg_stat_ views, and the counters could be reset somehow. I would pattern it after how we handle the pg_stat_ views. I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. Oracle and DB2 measure both cumulative durations and counts. Well, the big question is whether counts are really useful. You did a good job of explaining that when you find heavy clog or xlog lock usage you would adjust your server. What I am unclear about is why you would adjust your server based on lock _counts_ and not cumulative lock duration. I don't think we want the overhead of accumulating information that isn't useful. I am having trouble seeing any other option that has such a good cost/benefit profile. At least cost.
The Oracle documentation clearly states that it is all about diagnostic convenience; the performance impact is significant. Oh, we don't want to go there then, and I think this approach is a big win. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Fixed xloginsert_locks for 9.4
Bruce Momjian br...@momjian.us writes: On Fri, Oct 3, 2014 at 03:00:56PM -0300, Arthur Silva wrote: Not all GUC need to be straight forward to tune. If the gains are worthy I don't see any reason not to have it. Every GUC add complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. In particular, the cost of putting this one back would be documenting what it does and how to tune it. As mentioned upthread, we're not following that Informix precedent ;-) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 03:30:35PM -0300, Arthur Silva wrote: Every GUC add complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. Please see my blog post about the cost of adding GUCs: http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009 That's true Bruce (nice post, it was a good reading). But how can we ignore 25%+ improvements (from 8 to 24)? At very least we should delivery some pretty good defaults. Well, checkpoint_segments was a similar case where we couldn't give good tuning advice so we went with a server log file warning if it needed to be increased --- this might be a similar case. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 10/03/2014 09:42 PM, Bruce Momjian wrote: On Fri, Oct 3, 2014 at 03:30:35PM -0300, Arthur Silva wrote: Every GUC add complexity to the system because people have to understand it to know if they should tune it. No GUC is zero-cost. Please see my blog post about the cost of adding GUCs: http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009 That's true Bruce (nice post, it was a good reading). But how can we ignore 25%+ improvements (from 8 to 24)? At very least we should delivery some pretty good defaults. Well, checkpoint_segments was a similar case where we couldn't give good tuning advice so we went with a server log file warning if it needed to be increased --- this might be a similar case. I have no idea how to decide at runtime whether it should be increased or not. If that was feasible, we probably could make it tune itself on the fly - it's not like checkpoint_segments where you need more disk space if you increase it. I stand by my decision to make it a #define, at least until someone voices their objection in the form of a documentation patch. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 2:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I stand by my decision to make it a #define, at least until someone voices their objection in the form of a documentation patch. I think that's exactly right. If we knew users should tune this, then we might be able to make it self-tuning, which would be best of all. At the least, we'd have useful information to provide to people who want to change it. However, if we *can't* provide tuning advice, then all making it a GUC does is give users a knob that says turning this might make things better! or worse!. Adding such knobs does little harm individually, but in the aggregate it makes for a product that is mysterious, hard to use, and hard to maintain and improve. In practice, I suspect few people will get the amount of benefit that Andres saw in his tests. You have to be xloginsert-bound, and a lot of workloads aren't. And if they were before, they aren't now that we have 8 xloginsert slots by default. My suspicion, which I think is what Andres was getting at as well, is that you'll want more slots if you have more CPUs. One way to measure the effectiveness of a given value (and maybe even auto-tune) would be to count how often you run out of slots. Of course, you'd need some countervailing form of pressure based on the effort that you're spending locking however many you have rather than some smaller number. You can't just measure the upside without measuring the downside. The xlog slots use the kind of algorithm that I quite frankly hate even when it works. It avoids one kind of loss but in a way that's not scalable; as you crank up the number of slots you start to incur cost elsewhere. Getting any more scalability beyond that point will require doing something fundamentally different, or at least that's my guess. So I feel like we've put our finger in the dike more than anything, but that still beats letting the dike collapse. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 2:20 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Oct 3, 2014 at 2:49 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I stand by my decision to make it a #define, at least until someone voices their objection in the form of a documentation patch. I think that's exactly right. If we knew users should tune this, then we might be able to make it self-tuning, which would be best of all. At the least, we'd have useful information to provide to people who want to change it. However, if we *can't* provide tuning advice, then all making it a GUC does is give users a knob that says turning this might make things better! or worse!. Adding such knobs does little harm individually, but in the aggregate it makes for a product that is mysterious, hard to use, and hard to maintain and improve. 100% agree. It's a very simple standard: if you provide a performance-affecting GUC, please provide guidelines to the end user regarding the tradeoffs or performance interactions being managed. Failure to do this causes an interesting problem; let's take the case of shared_buffers, for example, which isn't explained very well. Failure to properly document or explain the interactions causes the user to make invalid assumptions about the setting (more memory = faster!) or take obsolete advice (Tom Lane said in 1997 that 128mb is about right) as gospel. This has been a long-running gripe of mine. merlin
Re: [HACKERS] replicating DROP commands across servers
Alvaro Herrera wrote: Precisely the point is not returning those values, because they are useless to identify the equivalent object in a remote database. What we need is the object names and other stuff used to uniquely identify it by user-visible name. We transmit those name arrays to a remote server, then on the remote server we can run get_object_address and get the ObjectAddress, which has the classid,objid,objsubid values you cite ... but for the remote server. The object can then be dropped there. Initially we thought that we would use the object_identity object for this (which is why we invented that functionality and added the column in 9.3), but this turned out not to work very well for unusual object types; hence this patch. The other thing to keep in mind is that with all those ObjectAddress thingies you got, you cannot simply construct a DROP obj command: 1. The objects in the list might be of different types; say a table and a view that are dropped by the same command because of CASCADE. (You could just pass the CASCADE to the other side and hope that it happens to do the same thing; but if the schemas are slightly different, it might not.) 2. DROP OWNED or other commands might have dropped several objects, again of varying types. So what we do in BDR is stuff all those ObjectAddress in an array of them, and then call performMultipleDeletions. There is no way to get this functionality in non-C code; so it's hard to see that it's very useful to have a non-C way to use the original objnames/objargs arrays. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replicating DROP commands across servers
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Right. In the add to objname cases, there is already some other routine that initialized it previously by filling in some stuff; in the case above, this happens in the getRelationIdentity() immediately preceding this. In the other cases we initialize on that spot. ahh, ok, that makes a bit more sense, sorry for missing it. Still makes me wonder why objargs gets special treatment at the top of the function and objnames doesn't- seems like both should be initialized either before being passed in (and perhaps an Assert to verify that they are), or they should both be initialized, but I tend to prefer just Assert'ing that they are correct on entry- either both are valid pointers to empty lists, or both NULL. I'm also not a huge fan of the big object_type_map, but I also don't have a better solution. Agreed. We have the ObjectProperty array also in the same file; it kinda looks like there is duplication here, but actually there isn't. Yeah, I did notice that, and noticed that it's not duplication. This whole issue is just fallout from the fact that we have three different ways to identify object classes: the ObjectClass enum, the ObjectType enum, and the relation OIDs of each catalog (actually a fourth one, see below). I don't see any other nice way around this; I guess we could try to auto-generate these tables somehow from a master text file, or something. Not material for this patch, I think. Agreed that this patch doesn't need to address it and not sure that a master text file would actually be an improvement.. I had been thinking if there was some way to have a single mapping which could be used in either direction, but I didn't see any sensible way to make that work given that it's not *quite* the same backwards and forewards. Note my DDL deparse patch adds a comment: +/* XXX merge this with ObjectTypeMap? 
*/ static event_trigger_support_data event_trigger_support[] = { Yes, I saw that, and that you added a comment that the new map needs to be updated when changes are made, which is also good. and a late revision to that patch added a new function in event_triggers.c (not yet posted I think) due to GRANT having its own enum of object types, AclObjectKind. Yeah. Perhaps one day we'll unify all of these, though I'm not 100% sure it'd be possible... Thanks! Stephen signature.asc Description: Digital signature
[HACKERS] Re: Valgrind warnings in master branch (Invalid read of size 8) originating within CreatePolicy()
Peter, * Peter Geoghegan (p...@heroku.com) wrote: I see the following Valgrind warnings in a recent build of the master branch: Fix pushed, thanks! Stephen
Re: [HACKERS] Missing newlines in verbose logs of pg_dump, introduced by RLS patch
* Stephen Frost (sfr...@snowman.net) wrote: * Michael Paquier (michael.paqu...@gmail.com) wrote: On Mon, Sep 29, 2014 at 10:07 AM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: The schema name is missing... attached patch adds it. Ah, right, thanks. It didn't occur to me immediately :) Your patch looks good to me, and you are updating as well the second message that missed the schema name in getRowSecurity. Thanks to you both- will review. Fix pushed- thanks! Stephen
Re: [HACKERS] Trailing comma support in SELECT statements
On 10/03/2014 12:20 PM, Bruce Momjian wrote: On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote: Hi, I have created a small patch to the postgres source (in particular the psql part of it) that accepts a trailing comma at the end of the select list in a SELECT statement. The idea is to be able to say both (with the same result): SELECT a, b, c from t; SELECT a, b, c, from t; Attached you can find a patch containing a regression test (incorporated into the serial_schedule). My patch is relative to the origin/REL9_4_STABLE branch as that is the one I started from. My plea is to have this change merged into the main stream so that it becomes available in upcoming releases. This modification does not require any interaction with the user. It does not create any backward compatibility issues. Interesting --- I know some languages allow trailing delimiters, like Perl and Javascript. Could this mask query errors? Does any other database accept this? Seems this would need to be done in many other places, like UPDATE, but let's first decide if we want this. FYI, it is usually better to discuss a feature before showing a patch. Javascript might accept it, but it's not valid JSON. The case for doing it is that then you can easily comment out any entry at all in a select list:

select foo as f1,
       bar as f2,
--     baz as f3,
from blurfl

cheers andrew
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 03/10/14 17:53, Heikki Linnakangas wrote: If we're going to need a profile file - and I'm not convinced of that - is there any reason to not always include it in the backup? The main reason is to have a centralized list of files that need to be present. Without a profile, you have to insert some sort of placeholder for skipped files. Why do you need to do that? And where do you need to do that? It seems to me that there are three interesting operations: 1. Take a full backup. Basically, we already have this. In the backup label file, make sure to note the newest LSN guaranteed to be present in the backup. 2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely. 3. Apply a differential backup to a full backup to create an updated full backup. This is just a matter of scanning the full backup and the differential backup and applying the changes in the differential backup to the full backup. You might want combinations of these, like something that does 2+3 as a single operation, for efficiency, or a way to copy a full backup and apply a differential backup to it as you go. But that's it, right? What else do you need? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
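The three operations above can be made concrete with a small sketch. The following Python model is purely illustrative (the names, the dict-of-pages representation, and the record layout are invented for the example; none of it is pg_basebackup's actual format): a full backup is a map from a buffer tag (relfilenode, forknum, blockno) to an 8kB page, a differential backup is an ordered list of tagged records, and truncation/removal uses the negative-forknum convention described above.

```python
# Toy model of the three operations sketched above; the record layout and
# names are illustrative assumptions, not pg_basebackup's actual format.
BLOCK_SIZE = 8192
REMOVED = None  # stand-in for InvalidBlockNumber ("relfilenode removed")

def apply_differential(full, diff):
    """Apply a differential backup to a full backup (operation 3).

    full: dict mapping buffer tag (relfilenode, forknum, blockno) -> 8kB page.
    diff: ordered records; a changed block is ((rel, fork, blkno), page),
          while a truncate/remove record uses fork = -(forknum + 1) and
          carries the new block count (or REMOVED) in the blkno slot.
    """
    for (rel, fork, blkno), page in diff:
        if fork < 0:                      # truncation / removal record
            real_fork = -fork - 1
            for tag in [t for t in full if t[0] == rel and t[1] == real_fork]:
                if blkno is REMOVED or tag[2] >= blkno:
                    del full[tag]         # block no longer exists
        else:                             # newer block image wins
            full[(rel, fork, blkno)] = page
    return full
```

A 2+3 combined operation would just stream `diff` records into this loop while copying `full`, which is why Robert notes it needs no extra metadata beyond the records themselves.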
Re: [HACKERS] replicating DROP commands across servers
Stephen Frost wrote: * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Right. In the add to objname cases, there is already some other routine that initialized it previously by filling in some stuff; in the case above, this happens in the getRelationIdentity() immediately preceding this. In the other cases we initialize on that spot. ahh, ok, that makes a bit more sense, sorry for missing it. Still makes me wonder why objargs gets special treatment at the top of the function and objnames doesn't- seems like both should be initialized either before being passed in (and perhaps an Assert to verify that they are), or they should both be initialized, but I tend to prefer just Assert'ing that they are correct on entry- either both are valid pointers to empty lists, or both NULL. I guess I could initialize objnames to NIL also. I initialize objargs because that one is unused for a lot of object types (so I would have to set it to NIL in cases where it's not used), whereas objnames is always used and thus we know it's always initialized later. Maybe what I need here is just a longer comment explaining this ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replicating DROP commands across servers
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: Stephen Frost wrote: ahh, ok, that makes a bit more sense, sorry for missing it. Still makes me wonder why objargs gets special treatment at the top of the function and objnames doesn't- seems like both should be initialized either before being passed in (and perhaps an Assert to verify that they are), or they should both be initialized, but I tend to prefer just Assert'ing that they are correct on entry- either both are valid pointers to empty lists, or both NULL. I guess I could initialize objnames to NIL also. I initialize objargs because that one is unused for a lot of object types (so I would have to set it to NIL in cases where it's not used), whereas objnames is always used and thus we know it's always initialized later. Maybe what I need here is just a longer comment explaining this ... A one-line comment that it's always reset below would be sufficient for me. Thanks for explaining it :), Stephen
Re: [HACKERS] replicating DROP commands across servers
On Fri, Oct 3, 2014 at 2:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Heikki Linnakangas wrote: On 10/03/2014 09:06 PM, Alvaro Herrera wrote: Well, the return value from get_object_address is an ObjectAddress. It's simple enough to create an SQL wrapper that takes the address_names/address_args arrays and return an ObjectAddress; would this be useful? An ObjectAddress consists of a classid, objid, and objsubid. pg_event_trigger_dropped_objects already returns all of those as separate fields. What am I missing? Precisely the point is not returning those values, because they are useless to identify the equivalent object in a remote database. What we need is the object names and other stuff used to uniquely identify it by user-visible name. We transmit those name arrays to a remote server, then on the remote server we can run get_object_address and get the ObjectAddress, which has the classid,objid,objsubid values you cite ... but for the remote server. The object can then be dropped there. Initially we thought that we would use the object_identity object for this (which is why we invented that functionality and added the column in 9.3), but this turned out not to work very well for unusual object types; hence this patch. I'm not really very convinced that it's a good idea to expose this instead of just figuring out a way to parse the object identity. But I expect to lose that argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replicating DROP commands across servers
Robert Haas wrote: I'm not really very convinced that it's a good idea to expose this instead of just figuring out a way to parse the object identity. That's the first thing I tried. But it's not pretty: you have to extract schema names by splitting at a period (and what if a schema name contains a period?), split on ON for certain object types, figure out parens and argument types and names for functions and aggregates, etc. It's just not sane to try to parse such text strings. But I expect to lose that argument. Good :-) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
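Alvaro's "period inside a schema name" objection is easy to demonstrate with a toy example. These helpers are invented for the illustration and are not PostgreSQL code; they only show why a naive split on '.' cannot recover the name array from an identity string, while even a minimal quote-aware splitter must carry extra state.

```python
# Toy illustration of why parsing object-identity text is fragile; both
# helpers are invented for this example and are not PostgreSQL code.
def naive_parse(identity):
    """What not to do: assume '.' always separates schema and object."""
    return identity.split(".")

def quote_aware_parse(identity):
    """Minimal splitter honouring double-quoted identifiers (still ignores
    doubled quotes, ON clauses, function argument lists, ...)."""
    parts, buf, in_quotes = [], "", False
    for ch in identity:
        if ch == '"':
            in_quotes = not in_quotes     # strip the quotes, track state
        elif ch == "." and not in_quotes:
            parts.append(buf)
            buf = ""
        else:
            buf += ch
    parts.append(buf)
    return parts
```

And that is before handling operators, casts, or argument lists, which is the point of exposing get_object_address instead.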
Re: [HACKERS] DDL Damage Assessment
On 10/03/2014 11:02 AM, Dimitri Fontaine wrote: Jim Nasby jim.na...@bluetreble.com writes: EXPLAIN ALTER TABLE I'm thinking it would be better to have something you could set at a session level, so you don't have to stick EXPLAIN in front of all your DDL. We were considering the potential needs of accidental DBAs here at first, if memory serves me well. I definitely see the value of EXPLAIN [DDL STATEMENT]... even if implemented as a regular "this is what should happen", without even attempting to run a thing (not even dry-run transactions), but including the full catalog search / attribute resolution etc. Providing insight on the locking that would happen sounds like a real life-saver for many real-life situations (i.e. would this ALTER completely lock my application due to the constant logging-to-table?). This is, obviously, IMHO. Yeah I'm coming into that camp too, and I think the Event Trigger idea gets us halfway there. Here's a detailed sketch of how it would work: 1. preparatory steps: install the Event Trigger create extension norewrite; 2. test run: psql -1 -f ddl.sql ERROR: Table Rewrite has been cancelled. 3. Well actually we need to run that thing in production BEGIN; ALTER EVENT TRIGGER norewrite DISABLE; \i ddl.sql ALTER EVENT TRIGGER norewrite ENABLE; COMMIT; Then it's also possible to have another Event Trigger that would automatically issue a LOCK table NOWAIT; command before any DDL against a table is run, in another extension: create extension ddl_lock_nowait; The same applies, if your production rollout is blocked repeatedly and you want to force it through at some point, it's possible to disable the event trigger within the DDL script/transaction. This serves a different purpose which is, at least, as worthwhile as the former: provide a real dry-run mechanism for advanced users. Stephen's delta fork sounds like a promising approach ... even if a bit too Oracle-ish (sounds an awful lot like UNDO logs!) for my liking. 
As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a development environment, which is where you would simply deploy the real DDL using verbose mode and see what the underlying commands actually do. The major drawback of the Event Trigger idea is that the transaction is cancelled as soon as a Rewrite Event is fired when you have installed the protective trigger. It means that you won't see the next problem after the first one, so it's not a dry-run. But considering what you're saying here, it might well be enough. It is a very convenient first step (minimally invasive, and good use of existing infrastructure)... since it allows an easy testing phase in order to iron out potential shortcomings and gather input on some other applications. Thanks, / Jose -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote: I have two questions I hope you can clarify. I'm having trouble parsing what this statement means: ... the SQL standard does not require that MERGE be atomic in the sense of atomically providing either an INSERT or UPDATE, ... My understanding is that the standard logically requires (without concern for implementation details) that the second specified table (via table name or subquery -- which could be a VALUES statement) is scanned, and for each row it attempts to match a row in the target table. That will either match or not, according to the boolean expression in the ON clause. You can have one WHEN MATCHED THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause. If both clauses are present, I believe that it is guaranteed that one or the other (but not both) will fire for each row in the second table. The usual guarantees for each isolation level must not be violated (although an implementation is not required to generate anomalies which could not happen with serial execution). That seems like a statement that isn't getting to the heart of the matter. Which is: simple UPSERT cases will get things like duplicate key violations under concurrency with MERGE, in SQL Server, Oracle, and DB2. I think serialization failures are inevitable and appropriate when not using READ COMMITTED mode with MVCC, but these duplicate violations are sort of like READ COMMITTED serialization failures in disguise. Or they would be, if MERGE (as described by the standard, or in practice) promised to care about atomicity in that sense. It doesn't, so they're off the hook. I think we should care about atomicity (in the sense of always getting an insert or update, never an error at RC) for the simple reason that that's what people want. It seems like your remarks here don't acknowledge the reality: That there are slightly different ideas of each row that are in play here. 
So as usual for a transaction involving multiple tables, serialization anomalies are possible if the transaction isolation level is REPEATABLE READ or less. Is that what you're getting at, or something else? My complaint is quite straightforward, really. I don't want to have to tell users to do this: http://stackoverflow.com/a/2249 At the same time, I also don't want to have to live with the consequences of implementing a MERGE that does not exhibit that behavior. Which is to say, the consequences of doing something like selectively using different types of snapshots (MVCC or dirty - the two different ideas of each row that are in tension) based on the exact clauses used. That seems like asking for trouble, TBH. -- Predicate within UPDATE auxiliary statement -- (row is still locked when the UPDATE predicate -- isn't satisfied): INSERT INTO upsert(key, val) VALUES(1, 'insert') -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE row-level triggers ON CONFLICT UPDATE SET val = CONFLICTING(val) -- Predicate has interesting semantics visibility-wise: WHERE val != 'delete'; Can you explain what the WHERE clause there does? Sure. Having already locked the tuple on the basis on finding a conflict TID, but before an UPDATE, the qual is evaluated using EvalPlanQual rescanning. The predicate must be satisfied in order for the UPDATE to actually proceed. If, in this instance, the locked tuple happened to have a val of 'delete', then we would not UPDATE at all. It would still be locked, though (possibly without being visible to our MVCC snapshot, just like when we might do that when we do UPDATE). There are some interesting implications visibility-wise here that must be considered. These are discussed on the Wiki page: https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29 Obviously higher isolation levels just throw an error here instead. 
-- Peter Geoghegan
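The client-side workaround Peter would rather not have to recommend (the retry loop behind the Stack Overflow link above) looks roughly like this. The `Table` class is a stand-in invented purely to make the race concrete; with a real database the `UniqueViolation` arises when a concurrent session inserts the key between the failed UPDATE and the INSERT attempt.

```python
# Toy illustration of the classic retry-loop upsert that native
# INSERT ... ON CONFLICT UPDATE is meant to replace.  The Table class is a
# hypothetical stand-in for a database connection, not a real API.
class UniqueViolation(Exception):
    pass

class Table:
    """In-memory table with one unique key column."""
    def __init__(self):
        self.rows = {}
    def update(self, key, val):
        if key in self.rows:
            self.rows[key] = val
            return 1                      # rows affected
        return 0
    def insert(self, key, val):
        if key in self.rows:
            raise UniqueViolation(key)    # duplicate key violation
        self.rows[key] = val

def upsert(table, key, val, max_tries=10):
    # Try UPDATE; if nothing matched, try INSERT; if a concurrent session
    # inserted the key in between, go round again as an UPDATE.
    for _ in range(max_tries):
        if table.update(key, val) == 1:
            return "updated"
        try:
            table.insert(key, val)
            return "inserted"
        except UniqueViolation:
            continue                      # lost the race; retry
    raise RuntimeError("upsert gave up")
```

The loop is livelock-prone and duplicates logic in every application, which is the "atomicity in the sense of always getting an insert or update" that the proposed syntax provides server-side.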
Re: [HACKERS] Trailing comma support in SELECT statements
Andrew Dunstan wrote: On 10/03/2014 12:20 PM, Bruce Momjian wrote: On Sun, Sep 28, 2014 at 01:42:46PM +0200, Bogdan Pilch wrote: Hi, I have created a small patch to postgres source (in particular the psql part of it) that accepts trailing comma at the end of list in SELECT statement. The idea is to be able to say both (with the same result): SELECT a, b, c from t; SELECT a, b, c, from t; Attached you can find a patch containing regression test (incorporated into the serial_schedule). My patch is relative to origin/REL9_4_STABLE branch as that is the one I started from. My plea is to have this change merged into the main stream so that it becomes available in upcoming releases. This modification does not require any interaction with the user. It does not create any backward compatibility issues. Interesting --- I know some languages allow trailing delimiters, like Perl and Javascript. Could this mask query errors? Does any other database accept this? Seems this would need to be done in many other places, like UPDATE, but let's first decide if we want this. FYI, it is usually better to discuss a feature before showing a patch. Javascript might accept it, but it's not valid JSON. The case for doing it is that then you can easily comment out any entry at all in a select list: select foo as f1, bar as f2, -- baz as f3, from blurfl Should we also allow: SELECT , col1 , col2 , col3 FROM ... ? The other reason for this would be to build dynamic SQL more easily via a loop. Barring arguments showing danger in allowing it, I don't see a reason to reject this; let people decide whether to use it on stylistic or compatibility grounds. David J. 
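For what it's worth, tolerating a trailing comma is a one-line relaxation in a typical list grammar. This toy Python splitter has nothing to do with the actual bison grammar in Bogdan's patch; it just shows the acceptance rules under discussion, with the hypothetical `allow_trailing` flag marking the behavior change:

```python
# Toy sketch of the acceptance rules being debated; not the real parser.
def parse_select_list(text, allow_trailing=True):
    """Split a comma-separated target list, optionally tolerating exactly
    one trailing comma; empty items anywhere else stay errors."""
    items = [item.strip() for item in text.split(",")]
    if allow_trailing and items and items[-1] == "":
        items.pop()                       # tolerate "a, b, c,"
    if not items or any(item == "" for item in items):
        raise ValueError("empty item in select list")
    return items
```

Note that David's leading-comma variant (`SELECT , col1 , col2 ...`) would still be rejected here; accepting it too would need a second, symmetric relaxation.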
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote: I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of discussion on hackers. Compared to first version, we switched from a timestamp+checksum based approach to one based on LSN. This patch adds an option to pg_basebackup and to replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file based incremental backup (and hopefully a block based incremental backup after it). Any comment will be appreciated. In particular I'd appreciate comments on correctness of relnode files detection and LSN extraction code. Can you describe the algorithm you implemented in words? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] superuser() shortcuts
All, Thanks! Please add it to the next commitfest. Sounds good. I'll update the patch and add accordingly. Attached is an updated patch. -Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c new file mode 100644 index 133143d..695a13c *** a/src/backend/access/transam/xlogfuncs.c --- b/src/backend/access/transam/xlogfuncs.c *** *** 27,32 --- 27,33 #include miscadmin.h #include replication/walreceiver.h #include storage/smgr.h + #include utils/acl.h #include utils/builtins.h #include utils/numeric.h #include utils/guc.h *** pg_start_backup(PG_FUNCTION_ARGS) *** 54,60 backupidstr = text_to_cstring(backupid); ! if (!superuser() !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg(must be superuser or replication role to run a backup))); --- 55,61 backupidstr = text_to_cstring(backupid); ! if (!has_replication_privilege(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg(must be superuser or replication role to run a backup))); *** pg_stop_backup(PG_FUNCTION_ARGS) *** 82,88 { XLogRecPtr stoppoint; ! if (!superuser() !has_rolreplication(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg(must be superuser or replication role to run a backup; --- 83,89 { XLogRecPtr stoppoint; ! 
if (!has_replication_privilege(GetUserId())) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), (errmsg(must be superuser or replication role to run a backup; diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c new file mode 100644 index d30612c..4d26b02 *** a/src/backend/catalog/aclchk.c --- b/src/backend/catalog/aclchk.c *** static AclMode restrict_and_check_grant( *** 143,148 --- 143,149 AttrNumber att_number, const char *colname); static AclMode pg_aclmask(AclObjectKind objkind, Oid table_oid, AttrNumber attnum, Oid roleid, AclMode mask, AclMaskHow how); + static bool has_catupdate_privilege(Oid roleid); #ifdef ACLDEBUG *** aclcheck_error_type(AclResult aclerr, Oi *** 3425,3431 /* Check if given user has rolcatupdate privilege according to pg_authid */ static bool ! has_rolcatupdate(Oid roleid) { bool rolcatupdate; HeapTuple tuple; --- 3426,3432 /* Check if given user has rolcatupdate privilege according to pg_authid */ static bool ! has_catupdate_privilege(Oid roleid) { bool rolcatupdate; HeapTuple tuple; *** pg_class_aclmask(Oid table_oid, Oid role *** 3630,3636 if ((mask (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) IsSystemClass(table_oid, classForm) classForm-relkind != RELKIND_VIEW ! !has_rolcatupdate(roleid) !allowSystemTableMods) { #ifdef ACLDEBUG --- 3631,3637 if ((mask (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE | ACL_USAGE)) IsSystemClass(table_oid, classForm) classForm-relkind != RELKIND_VIEW ! !has_catupdate_privilege(roleid) !allowSystemTableMods) { #ifdef ACLDEBUG *** has_createrole_privilege(Oid roleid) *** 5078,5083 --- 5079,5106 ReleaseSysCache(utup); } return result; + } + + /* + * Check whether specified role has REPLICATION priviledge (or is a superuser) + */ + bool + has_replication_privilege(Oid roleid) + { + bool result = false; + HeapTuple utup; + + /* Superusers bypass all permission checking. 
*/ + if (superuser_arg(roleid)) + return true; + + utup = SearchSysCache1(AUTHOID, ObjectIdGetDatum(roleid)); + if (HeapTupleIsValid(utup)) + { + result = ((Form_pg_authid) GETSTRUCT(utup))-rolreplication; + ReleaseSysCache(utup); + } + return result; } bool diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c new file mode 100644 index c9a9baf..ed89b23 *** a/src/backend/commands/alter.c --- b/src/backend/commands/alter.c *** AlterObjectOwner_internal(Relation rel, *** 807,852 bool *nulls; bool *replaces; ! /* Superusers can bypass permission checks */ ! if (!superuser()) { ! AclObjectKind aclkind = get_object_aclkind(classId); ! /* must be owner */ ! if (!has_privs_of_role(GetUserId(), old_ownerId)) { ! char *objname; ! char namebuf[NAMEDATALEN]; ! ! if (Anum_name != InvalidAttrNumber) ! { ! datum = heap_getattr(oldtup, Anum_name, ! RelationGetDescr(rel), isnull); ! Assert(!isnull); ! objname = NameStr(*DatumGetName(datum)); ! } ! else ! { ! snprintf(namebuf, sizeof(namebuf), %u, ! HeapTupleGetOid(oldtup)); ! objname = namebuf;
Re: [HACKERS] Trailing comma support in SELECT statements
2014-09-28 13:42 GMT+02:00 Bogdan Pilch bog...@matfyz.cz: Hi, I have created a small patch to postgres source (in particular the psql part of it) that accepts trailing comma at the end of list in SELECT statement. Is this ANSI SQL? Why should we enable it? We can be tolerant of this, but then developers will hate us when they try to port to other servers. -1 from me Regards Pavel The idea is to be able to say both (with the same result): SELECT a, b, c from t; SELECT a, b, c, from t; Attached you can find a patch containing regression test (incorporated into the serial_schedule). My patch is relative to origin/REL9_4_STABLE branch as that is the one I started from. My plea is to have this change merged into the main stream so that it becomes available in upcoming releases. This modification does not require any interaction with the user. It does not create any backward compatibility issues. Nor does it have any performance impact. regards bogdan
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On 2014-10-03 11:33:18 -0400, Bruce Momjian wrote: On Thu, Oct 2, 2014 at 11:50:14AM +0200, Andres Freund wrote: The first problem that comes to my mind about collecting enough data is that we have a very large number of lwlocks (fixed_number + 2 * shared_buffers). One 'trivial' way of implementing this is to have a per-backend array collecting the information, and then a shared one accumulating data from it over time. But I'm afraid that's not going to fly :(. Hm. With the above sets of stats that'd be ~50MB per backend... Perhaps we should somehow encode this differently for individual lwlock tranches? It's far less problematic to collect all this information for all but the buffer lwlocks... First, I think this could be a major Postgres feature, and I am excited someone is working on this. As far as gathering data, I don't think we are going to do any better in terms of performance/simplicity/reliability than to have a single PGPROC entry to record when we enter/exit a lock, and having a secondary process scan the PGPROC array periodically. I don't think that'll give meaningful results given the very short times most lwlocks are held. And it'll give not very interesting results for multiple lwlocks held at the same time - most of the time the 'earlier' held ones are more interesting than the last acquired one... I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. I actually don't think that's true. Every lock acquisition implies a number of atomic operations. Those are expensive. And if you see individual locks acquired a high number of times in multiple processes that's something important. It causes significant bus traffic between sockets, while not necessarily visible in the lock held times. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
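Andres's objection to periodic PGPROC scanning can be illustrated with a simulated timeline: a lock acquired very often but held only microseconds is nearly invisible to a fixed-interval sampler, while a cumulative counter records every acquisition. Everything below (the integer-microsecond timeline, the numbers, the function name) is invented for illustration and has nothing to do with actual PGPROC layout.

```python
# Toy timeline (integer microseconds) contrasting periodic sampling of a
# "currently held lock" slot with cumulative per-acquisition counters.
def sample_held_locks(hold_intervals, sample_period_us, horizon_us):
    """Count how many fixed-interval samples catch the lock held."""
    seen = 0
    for t in range(0, horizon_us, sample_period_us):
        if any(start <= t < end for start, end in hold_intervals):
            seen += 1
    return seen

# A lock taken 1000 times per second, but held only 10us each time:
intervals = [(i * 1000 + 137, i * 1000 + 147) for i in range(1000)]
acquisitions = len(intervals)     # what a cumulative counter would record
sampled = sample_held_locks(intervals, sample_period_us=10_000,
                            horizon_us=1_000_000)
```

The sampler at a 10ms period sees essentially nothing of a hot lock like this, even though its 1000 acquisitions (and their atomic operations and cross-socket traffic) are exactly what Andres says matters.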
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On 2014-10-03 11:51:46 -0400, Robert Haas wrote: On Fri, Oct 3, 2014 at 11:33 AM, Bruce Momjian br...@momjian.us wrote: I am assuming almost no one cares about the number of locks, but rather they care about cummulative lock durations. I am having trouble seeing any other option that has such a good cost/benefit profile. I do think that the instrumentation data gathered by LWLOCK_STATS is useful - very useful. But it does have significant overhead. Have you ever analyzed where that overhead is with the current code? I do wonder if having a per backend array in shmem indexed by the lockid (inside its tranche) wouldn't be quite doable for the smaller tranches. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Promise index tuples for UPSERT
On Fri, Oct 3, 2014 at 3:54 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Simon's approach would actually pass that test case just fine. It inserts the (promise) index tuple first, and heap tuple only after that. It will fail the test case with more than one unique index, however. Oh, I see. Still, I don't think you need to UPDATE a uniquely-constrained attribute - even if updating constrained attributes is rare (dubious), non-HOT updates will have the same effect, no? I still think that's unacceptable. In any case, I still don't see what this buys us over the other two designs. What's the pay-off for giving up on the general avoidance of unprincipled deadlocks? -- Peter Geoghegan
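The "unprincipled deadlock" with more than one unique index can be made concrete with a toy wait-for-graph model. Nothing below reflects the actual btree or lock-manager code; it is a deliberately crude sketch in which each session is assumed to hold every index entry lock but its last and then block on the last one, which is the worst case for two UPSERTs arriving at two unique indexes in opposite orders.

```python
# Toy wait-for-graph model of the two-unique-index deadlock scenario.
def waits_for_cycle(orders):
    """orders: {session: [index locks in acquisition order]}.  Each session
    is modelled as holding everything but its last lock and blocking on the
    last one; returns True if the wait-for graph contains a cycle."""
    holder = {}
    for session, locks in orders.items():
        for lock in locks[:-1]:
            holder[lock] = session        # crude: last writer wins
    edges = {}
    for session, locks in orders.items():
        wanted = locks[-1]
        if wanted in holder and holder[wanted] != session:
            edges[session] = holder[wanted]
    for start in edges:                   # follow chains looking for a cycle
        seen, node = set(), start
        while node in edges:
            if node in seen:
                return True
            seen.add(node)
            node = edges[node]
    return False
```

With a consistent acquisition order one session simply waits behind the other and no cycle forms, which is why the value-locking designs put so much effort into avoiding order-dependent waits.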
Re: [HACKERS] Last Commitfest patches waiting review
On 2014-10-03 19:14:14 +0300, Heikki Linnakangas wrote: Thanks to everyone's who's reviewed a patch so far. One last crunch, and we'll be through. We have 7 patches left in Needs Review state: pg_receivexlog: addition of --create/--drop to create/drop repslots Waiting for Magnus. Amit promised to take a look if Magnus continues to be busy. I've committed half of it, and will commit the second half soon. I guess you meant Andres instead of Amit? Escaping from blocked send() by pg_terminate_backend(). Andres posted a patch series using a completely different approach. I guess this should be just marked as returned with feedback, if we're going to pursue the different approach instead. Sounds fair to me. I'll comment with details on the corresponding thread. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replicating DROP commands across servers
On 2014-10-03 14:02:09 -0300, Alvaro Herrera wrote: Since the patch has had good feedback and no further comments arise, I can just implement support for those two missing object types and push, and everybody will be happy. Right? I'd like to see a new version before that out here... I don't think there's fundamental issues, but it's complicated enough to warrant that. And I don't think it has gotten enough detailed review. I hope to be able to give a bit more of that... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote: On Fri, Oct 3, 2014 at 04:11:30PM +0200, Andres Freund wrote: On 2014-10-03 10:07:39 -0400, Gregory Smith wrote: On 10/3/14, 8:26 AM, Andres Freund wrote: #define NUM_XLOGINSERT_LOCKS 1 tps = 52.711939 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 8 tps = 286.496054 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 16 tps = 346.113313 (including connections establishing) #define NUM_XLOGINSERT_LOCKS 24 tps = 363.242111 (including connections establishing) Just to clarify: that 10% number I threw out was meant as a rough estimate for a system with the default configuration, which is all that I tested. It seemed like people would likely need to tune all the usual things like checkpoint_segments, shared_buffers, etc. as well before seeing much better. You did all that, and sure enough the gain went up; thanks for confirming my guess. I still don't think that means this needs a GUC for 9.4. Look at that jump from 1 to 8. The low-hanging fruit here hasn't just been knocked off. It's been blended into a frozen drink, poured into a glass, and had a little paper umbrella put on top. I think that's enough for 9.4. But, yes, let's see if we can add delivery to the side of the pool in the next version too. So a 25% performance improvement on a relatively small machine isn't worth a GUC? One that's likely to be larger on a bigger machine? I utterly fail to see why that's a service to our users. Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. Sure. And the default looks to be a good one. So it's not bad that they don't tune it further. But. How are we ever going to be able to tune it further without feedback from actual production usage? It's possible to convince customers to play with a performance-influencing parameter and see what the results are. 
Even in production. It's near impossible to do so if that requires to download source packages, change a define in some .c file, rebuild the packages, distribute them to the respective servers. And then continue to do so for every release thereafter. Not only is that a *significant* amount of work. It often involves a different set of people (sysadmin, not dba-ish people). And often complicated procedures to allow patching the source code at all. I remember Informix had a setting that had no description except try different values to see if it helps performance --- we don't want to do that. I'm pretty sure we can come up with a better description than that. What if we emit a server message if the setting is too low? That's how we handle checkpoint_segments. I don't think it's realistically possible in this case. The instrumentation we'd need to add would be too expensive for something running as frequently as this. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
Peter Geoghegan p...@heroku.com wrote: On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote: I'm having trouble parsing what this statement means: ... the SQL standard does not require that MERGE be atomic in the sense of atomically providing either an INSERT or UPDATE, ... ... always getting an insert or update, never an error ... I've never seen atomic used to mean you never get an error before. Perhaps it would be clearer to replace atomic with error-free or some such. It certainly would be less confusing for me. Atomic in most cases is a property that can be enforced by generating an error where it would otherwise be violated. For example: http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29 Although implementations vary depending on factors such as concurrency issues, the principle of atomicity — i.e. complete success or complete failure — remains. When you say atomic you mean something quite different. My complaint is quite straightforward, really. I don't want to have to tell users to do this: http://stackoverflow.com/a/2249 I think you are confusing syntax with semantics. I grant that a reasonable person could have concerns about using the MERGE syntax to implement the semantics you want in the special case that an appropriate unique index exists, but pretending that the semantics can't be achieved with that syntax is just silly. At the same time, I also don't want to have to live with the consequences of implementing a MERGE that does not exhibit that behavior. Which is to say, the consequences of doing something like selectively using different types of snapshots (MVCC or dirty - the two different ideas of each row that are in tension) based on the exact clauses used. That seems like asking for trouble, TBH. Now *that* is getting more to a real issue. 
We routinely pick very different plans based on the presence or absence of an index, and we use special snapshots in the course of executing many DML statements (if FK triggers are fired), but this would be the first place I can think of that the primary DML statement behaves that way. You and a couple others have expressed concern about that, but it seems pretty vague and hand-wavey. If a different execution node is used for the special behavior, and that node is generated based on the unique index being used, I'm really having problems seeing where the problem lies. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
On 2014-10-03 14:38:10 -0400, Robert Haas wrote: On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote: Further debugging reveals that sigusr1_handler() gets called repeatedly, to start autovacuum workers, and it keeps waking up and starting them. But that doesn't cause the background workers to get started either, because although sigusr1_handler() contains a call to maybe_start_bgworker, it only does that if start_bgworker = true, which only happens if CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true, which on these calls it isn't. And I think this might also be the missing ingredient answering Andres's question from before: why doesn't the 60-second select()-timeout cause the background worker to eventually start even if the select() doesn't get interrupted? There seems to be a SIGUSR1 arriving about every 3 seconds, and I bet that's resetting the timer every time. Ick. There are a couple of reports about HPUX behaving that way for select(), so that seems like a reasonable theory. And a good explanation why only anole was seeing this. For now I propose to address this by committing the attached patch, which gets rid of the separate start_bgworker flag inside sigusr1_handler() and instead uses the same StartWorkerNeeded || HaveCrashedWorker test that we use inside ServerLoop() to decide whether to call maybe_start_bgworker(). That seems sane. Either more signals will arrive (in which case the signal handler will launch an additional background worker every time a signal arrives) or they won't (in which case the 60-second timeout will eventually expire, and ServerLoop() will kick into high gear and satisfy all outstanding requests). This isn't really right, because there might still be a quite noticeable delay waiting for workers to get launched, but at least the delay would be bounded to at most 60 seconds rather than, as at present, potentially infinite. Right, it will somewhat fix the current issue. 
But aren't the consequences of a potentially never/very seldom run ServerLoop() more severe? There's a couple more things done in that loop :( A totally correct fix will require a bit more thought. A search of the git history reveals that the problem of a signal restarting the timeout is not new: Tom fixed a similar problem back in 2007 by making the autovacuum launcher sleep for at most a second at a time. Such a fix isn't ideal here, because we really don't want an up-to-1-second delay launching a newly-registered background worker if there's a way to avoid that -- it's probably OK for launching daemons, but it's not so hot for parallel query. However, we could: (1) Use the self-pipe trick. We could not directly use a latch, at least not without a new API, because we might be waiting on more than one socket. I think this is the way to go. And I think we should make the latch API support that case, instead of reinventing it. That'll also have the advantage of automatically using poll() (which is more efficient), rather than select(), when supported by the platform. (2) Have the postmaster not set SA_RESTART for the sigusr1 handler. I don't know how platform-independent this approach would be. This makes me wary. Postmaster doesn't do too much stuff, but it'd still be annoying to have to make sure everything always loops around EINTR. Greetings, Andres Freund
Re: [HACKERS] pg_receivexlog and replication slots
On 2014-10-03 14:02:08 -0400, Robert Haas wrote: On Fri, Oct 3, 2014 at 7:57 AM, Andres Freund and...@2ndquadrant.com wrote: I do wonder whether --create/--drop aren't somewhat weird for pg_receivexlog. It's not that clear what they mean. It'd be ugly, but we could rename them --create-slot/--drop-slot. +1 on doing it, -1 on it being ugly. The reason I'm calling it ugly is that it's different from pg_recvlogical. We could change it there, too? A bit late, but probably better than having a discrepancy forever? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-03 17:55:19 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote: Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. It's possible to convince customers to play with a performance-influencing parameter and see what the results are. Even in production. I'm a bit dubious that people will be willing to experiment in production with a GUC that requires a database restart to change. I've convinced customers to restart databases with several different shared_buffers settings... So I'm pretty sure it's possible, in *some* cases, for xloginsert_slots. And even if it's just test/pre-production machines - they're not going to benchmark settings they can't reasonably set in production. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fixed xloginsert_locks for 9.4
Andres Freund and...@2ndquadrant.com writes: On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote: Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. It's possible to convince customers to play with a performance-influencing parameter and see what the results are. Even in production. I'm a bit dubious that people will be willing to experiment in production with a GUC that requires a database restart to change. regards, tom lane
Re: [HACKERS] Dynamic LWLock tracing via pg_stat_lwlock (proof of concept)
On Fri, Oct 3, 2014 at 11:15:13PM +0200, Andres Freund wrote: As far as gathering data, I don't think we are going to do any better in terms of performance/simplicity/reliability than to have a single PGPROC entry to record when we enter/exit a lock, and having a secondary process scan the PGPROC array periodically. I don't think that'll give meaningful results given the very short times most lwlocks are held. I figured you could get more data the more often you sampled. And it'll give not very interesting results for multiple lwlocks held at the same time - most of the time the 'earlier' held ones are more interesting than the last acquired one... Well, I thought lock _waiting_ would be the most interesting measure, not locks held. Doesn't pg_locks show locks held? I am assuming almost no one cares about the number of locks, but rather they care about cumulative lock durations. I actually don't think that's true. Every lock acquisition implies a number of atomic operations. Those are expensive. And if you see individual locks acquired a high number of times in multiple processes that's something important. It causes significant bus traffic between sockets, while not necessarily visible in the lock held times. True, but I don't think users are going to get much value from those numbers, and they are hard to get. Server developers might want to know lock counts, but in those cases performance might not be as important. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Fri, Oct 3, 2014 at 11:58:14PM +0200, Andres Freund wrote: On 2014-10-03 17:55:19 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote: Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. It's possible to convince customers to play with a performance-influencing parameter and see what the results are. Even in production. I'm a bit dubious that people will be willing to experiment in production with a GUC that requires a database restart to change. I've convinced customers to restart databases with several different shared_buffers settings... So I'm pretty sure it's possible, in *some* cases, for xloginsert_slots. And even if it's just test/pre-production machines - they're not going to benchmark settings they can't reasonably set in production. Do we really want to expose a setting a few of us _might_ ask customers to change? I don't think so --- create a custom build if you want that. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-03 18:08:56 -0400, Bruce Momjian wrote: On Fri, Oct 3, 2014 at 11:58:14PM +0200, Andres Freund wrote: On 2014-10-03 17:55:19 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-10-03 12:40:21 -0400, Bruce Momjian wrote: Well, I think the issue is that having a GUC that can't reasonably be tuned by 95% of our users is nearly useless. Few users are going to run benchmarks to see what the optimal value is. It's possible to convince customers to play with a performance-influencing parameter and see what the results are. Even in production. I'm a bit dubious that people will be willing to experiment in production with a GUC that requires a database restart to change. I've convinced customers to restart databases with several different shared_buffers settings... So I'm pretty sure it's possible, in *some* cases, for xloginsert_slots. And even if it's just test/pre-production machines - they're not going to benchmark settings they can't reasonably set in production. Do we really want to expose a setting a few of us _might_ ask customers to change? They also will try that themselves. Our customers aren't a horde of dumb people. Some of them are willing to try things if they hit scalability problems. And *lots* of people hit scalability problems with postgres. In fact I've seen big users migrate away from postgres because of them. And it's not like this only affects absurd cases. Even a parallel restore will benefit. I don't think so --- create a custom build if you want that. Won't work in the majority of real world cases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On Sat, Oct 4, 2014 at 12:13:00AM +0200, Andres Freund wrote: Do we really want to expose a setting a few of us _might_ ask customers to change? They also will try that themselves. Our customers aren't a horde of dumb people. Some of them are willing to try things if they hit scalability problems. And *lots* of people hit scalability problems with postgres. In fact I've seen big users migrate away from postgres because of them. And it's not like this only affects absurd cases. Even a parallel restore will benefit. I disagree. I just don't see the value in having such undefined variables. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Fixed xloginsert_locks for 9.4
On 2014-10-03 18:16:28 -0400, Bruce Momjian wrote: On Sat, Oct 4, 2014 at 12:13:00AM +0200, Andres Freund wrote: Do we really want to expose a setting a few of us _might_ ask customers to change? They also will try that themselves. Our customers aren't a horde of dumb people. Some of them are willing to try things if they hit scalability problems. And *lots* of people hit scalability problems with postgres. In fact I've seen big users migrate away from postgres because of them. And it's not like this only affects absurd cases. Even a parallel restore will benefit. I disagree. I just don't see the value in having such undefined variables. Undefined variables? I'm not arguing that we don't need documentation for it. Obviously we'd need that. I'm arguing against taking away significant scalability possibilities from our users. My bet is that it's more than 50% on a bigger machine. I don't think we can offer absolutely accurate tuning advice, but I'm sure we can give some guidance. Let me try. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 3, 2014 at 2:44 PM, Kevin Grittner kgri...@ymail.com wrote: I've never seen atomic used to mean you never get an error before. When you are saying atomic you mean something quite different. Perhaps I should have been more careful on that point. Just to be crystal clear: I don't intend that you'll never get an error (some cases clearly *should* error). But I want confidence that reasonable, simple UPSERTs will never error due to a duplicate violation that they thought the statement was supposed to avoid. That's the point of UPSERT. Guarantees matter. I would like to make guarantees that are at least as good as the upsert looping subxact pattern. Teradata refers to an atomic UPSERT. It also has a MERGE command. My complaint is quite straightforward, really. I don't want to have to tell users to do this: http://stackoverflow.com/a/2249 I think you are confusing syntax with semantics. I grant that a reasonable person could have concerns about using the MERGE syntax to implement the semantics you want in the special case that an appropriate unique index exists, but pretending that the semantics can't be achieved with that syntax is just silly. I am not pretending that it can't be done - just, as I said, that to do so would have bad consequences when time came to generalize the syntax to support more advanced cases. At the same time, I also don't want to have to live with the consequences of implementing a MERGE that does not exhibit that behavior. Which is to say, the consequences of doing something like selectively using different types of snapshots (MVCC or dirty - the two different ideas of each row that are in tension) based on the exact clauses used. That seems like asking for trouble, TBH. Now *that* is getting more to a real issue. Yes, it is. I am opposed to using the MERGE syntax for this *because* MERGE is useful (independently useful, when done properly), not because it is not useful. 
We routinely pick very different plans based on the presence or absence of an index, and we use special snapshots in the course of executing many DML statements (if FK triggers are fired), but this would be the first place I can think of that the primary DML statement behaves that way. You and a couple others have expressed concern about that, but it seems pretty vague and hand-wavey. If a different execution node is used for the special behavior, and that node is generated based on the unique index being used, I'm really having problems seeing where the problem lies. We need to avoid duplicate violations. The authoritative source of truth here is a unique index, because the presence or absence of a conclusively-visible conflict tuple controls whether our insert fails or succeeds respectively. We need a dirty snapshot to see tuples not in our MVCC snapshot, since they need to be handled too. Otherwise, when we fail to handle them, the MERGE's INSERT has a dup violation (which is not what I want, for practical reasons). If we're seq scanning a table, even with a dirty snapshot, there is no authoritative source of truth. Heap tuples will be inserted concurrently, and we'll have no idea what to do about it without reference to a unique index as a choke point/single source of final truth about what is and isn't going to be duplicated. As implemented, my UPSERT patch will have UPSERTs not really use their MVCC snapshot for simple cases, just like INSERTs in general. UPDATEs always use their MVCC snapshot, as the ModifyTable node pulls up tuples from an ordinary relation scan. UPSERT needs a DirtySnapshot scan of the unique index to find duplicates, because that's always how we find duplicates. At the same time, MERGE is not at liberty to not work just because of the absence of a unique index (plus even row locking must be prepared for conflicts). 
And since the MERGE is driven by an outer join - *not* an INSERT-like search for duplicates - we'll find ourselves providing one behavior when detecting one case, and another when detecting the other, with subtle and confusing differences between each case. If we fake an outer join by doing an INSERT-like search for duplicates in a unique index (to make sure we see duplicates), then we get into trouble elsewhere. Like, we cannot let users not ask for an INSERT in their MERGE, because we'd be conclusively deciding to *not* (say) UPDATE on the basis of the absence of a value in a unique index, as opposed to the presence. Suddenly, we're in the business of proving a negative... or are we? My head hurts. -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote: Yes, it is. I am opposed to using the MERGE syntax for this *because* MERGE is useful (independently useful, when done properly), not because it is not useful. As I've mentioned, there is also the practical argument: No one else does this. That suggests to me that it's hard. -- Peter Geoghegan
Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote: We routinely pick very different plans based on the presence or absence of an index, and we use special snapshots in the course of executing many DML statements (if FK triggers are fired) Apart from FK snapshots, we also use dirty snapshots for unique index enforcement, obviously. That doesn't mean that it's the command/xact snapshot, though. It's just a special constraint enforcement mechanism. -- Peter Geoghegan